About Metrics that are Monitored for Aviatrix Resources
Aviatrix Controller captures system metric and network metric information about the virtual machines (instances/hosts) that Aviatrix Gateways run on. Health-type metric information is also captured for Controller and CoPilot virtual machines. See Global Control Plane Health Alert. Metrics that are monitored by Aviatrix Controller and Aviatrix CoPilot include the following:
- System Metrics for Triggering Notifications or Other Actions.
- Network Metrics for Triggering Notifications or Other Actions.
- Health Metrics for Triggering Notifications or Other Actions.
About Percentage Metrics
Several network metrics in the tables below are expressed as percentages (metrics with names beginning with per_). All percentage metrics share the same denominator: total attempted packets per second (rate_pkt_attempted), which is the sum of successfully processed packets (rx + tx) and all failure events across both directions. Because the denominator is bidirectional, a directional percentage (such as per_rx_drop) represents that failure type as a fraction of all traffic, not just inbound or outbound traffic.
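The effect of the shared bidirectional denominator can be illustrated with a small worked example. This is a minimal sketch, not CoPilot's implementation: the function name, the zero-traffic behavior, and all traffic figures are assumptions chosen for illustration.

```python
def percentage_of_attempted(failure_rate: float, rate_pkt_attempted: float) -> float:
    """Express a per-second failure rate as a percentage of attempted packets."""
    if rate_pkt_attempted <= 0:
        # Assumption: report 0% when no traffic was attempted.
        return 0.0
    return 100.0 * failure_rate / rate_pkt_attempted


# Illustrative (made-up) rates, all in events or packets per second.
rx_pps = 6000.0        # successfully received packets
tx_pps = 3900.0        # successfully transmitted packets
rate_rx_drop = 100.0   # the only failure type in this example

# Denominator: successful packets in both directions plus all failure events.
rate_pkt_attempted = rx_pps + tx_pps + rate_rx_drop

# per_rx_drop is measured against ALL attempted traffic (both directions).
per_rx_drop = percentage_of_attempted(rate_rx_drop, rate_pkt_attempted)

# For comparison only: the same drops measured against inbound traffic alone.
inbound_only = 100.0 * rate_rx_drop / (rx_pps + rate_rx_drop)

print(f"per_rx_drop = {per_rx_drop:.2f}%")
print(f"inbound-only view = {inbound_only:.2f}%")
```

With these numbers the directional percentage comes out lower than an inbound-only calculation would, which is worth keeping in mind when choosing alert thresholds.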
Health Metrics for Triggering Notifications or Other Actions
The following health metrics are available in CoPilot. They are listed in alphabetical order, by the name used in the CoPilot UI.
| Name (Health Metric) | Description | Internal Metric Name | Accessible by API |
|---|---|---|---|
| BGP Peering Status | Any BGP peering status change triggers an alert. | BGPpeeringStatus | |
| Connection Status | Any connection status change on the specified gateways/connections triggers an alert. | ConnectionStatus | |
| Gateway Status | Any gateway status change triggers an alert. | GatewayStatus | |
| Underlay Connection Status | Monitors the syslog from any connection that includes the host as the source or destination. The alert is triggered when syslog data from both directions of the connection between that host and another host indicates a potential problem within 30 seconds of each other. If syslog data from either direction of the same connection later indicates that the problem is resolved, the alert is automatically resolved. | UnderlayConnectionStatus | |
System Metrics for Triggering Notifications or Other Actions
For Aviatrix Controller and Aviatrix gateways, you can configure alerts based on the following system metrics. Aviatrix gateways report live Linux system statistics (such as memory, CPU, I/O, processes, and swap) for the instances/virtual machines on which they run. Metrics are listed in alphabetical order, by the name used in the CoPilot UI.
| Name (System Metric) | Description | Internal Metric Name | Accessible by API |
|---|---|---|---|
| CPU Idle (%) | Of the total CPU time, the percentage of time the CPU(s) spent idle. Collected as a 3-second average from the gateway. | cpu_idle | ✓ |
| CPU Kernel Space (%) | Of the total CPU time, the percentage of time spent running kernel code (system mode). | cpu_ks | ✓ |
| CPU Steal (%) | Of the total CPU time, the percentage of time a virtual CPU waited for a real CPU while the hypervisor serviced another virtual processor. Relevant for shared-tenancy instances. | cpu_steal | |
| CPU Used (%) | The percentage of CPU used, calculated as 100% minus CPU Idle. | cpu_used_per | |
| CPU User Space (%) | Of the total CPU time, the percentage of time spent running user-space (non-kernel) code. | cpu_us | ✓ |
| CPU Wait (%) | Of the total CPU time, the percentage of time spent waiting for I/O operations to complete. | cpu_wait | ✓ |
| Disk Free | The free (unused) storage space on the disk volume, in bytes. | hdisk_free | |
| Disk Free (%) | Of the total storage space on the disk volume, the percentage that is free and unused. | hdisk_free_per | |
| Disk Total | The total storage capacity of the disk volume, in bytes. | hdisk_tot | |
| IO Blocks In | The number of blocks per second received from block devices during the sampling interval. | io_blk_in | |
| IO Blocks Out | The number of blocks per second sent to block devices during the sampling interval. | io_blk_out | |
| Memory Available | The amount of memory (in bytes) available to be allocated to new or existing processes, including free memory and reclaimable caches. | memory_available | ✓ |
| Memory Available (%) | Of the total memory, the percentage that is available to be allocated to new or existing processes. | memory_available_per | |
| Memory Buffer | The amount of memory (in bytes) used by kernel buffers. | memory_buf | ✓ |
| Memory Cache | The amount of memory (in bytes) used by the page cache. | memory_cached | ✓ |
| Memory Free | The amount of memory (in bytes) that is completely unused and available. Unlike Memory Available, this does not include reclaimable caches. | memory_free | ✓ |
| Memory Swapped | The amount of memory (in bytes) written to swap space. Reports 0 when swap is not in use. | memory_swpd | ✓ |
| Memory Total | The total physical memory (in bytes) on the host. | memory_tot | |
| Memory Used | The amount of memory (in bytes) actively in use by processes. | memory_used | |
| Memory Used (%) | Of the total memory, the percentage actively in use by processes. | memory_used_per | |
| Processes Uninterruptible Sleep | The number of processes in an uninterruptible sleep state, typically blocked waiting for I/O to complete. | nproc_non_int_sleep | |
| Processes Waiting To Be Run | The number of processes that are currently running or are in the run queue waiting for CPU time. | nproc_running | |
| Swaps From Disk | The amount of memory (in kilobytes) swapped in from disk per second. | swap_from_disk | |
| Swaps To Disk | The amount of memory (in kilobytes) swapped out to disk per second. | swap_to_disk | |
| System Context Switches | The number of CPU context switches per second. | system_cs | |
| System Interrupts | The number of hardware interrupts per second, including the clock interrupt. | system_int | |
Per-vCPU Metrics
Starting in CoPilot 4.32, the following per-vCPU metrics are available through the Metrics API. These metrics provide CPU utilization broken down by individual virtual CPU core for each gateway, enabling identification of single-core bottlenecks that aggregated CPU metrics may mask.
| Name (vCPU Metric) | Description | Internal Metric Name | Accessible by API |
|---|---|---|---|
| vCPU Average Usage (%) | The average CPU utilization percentage for an individual vCPU core over the sampling interval. | vcpu_avg_usage | ✓ |
| vCPU Minimum Usage (%) | The minimum CPU utilization percentage observed for an individual vCPU core during the sampling interval. | vcpu_min_usage | ✓ |
| vCPU Maximum Usage (%) | The maximum CPU utilization percentage observed for an individual vCPU core during the sampling interval. | vcpu_max_usage | ✓ |
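To show how an aggregate CPU figure can hide a single saturated core, the following sketch compares the mean of per-core vcpu_avg_usage values with the individual readings. The function name, the 90% threshold, the dictionary layout, and the sample values are all assumptions for illustration; they do not describe the Metrics API response format.

```python
from statistics import mean
from typing import Dict, List


def find_hot_vcpus(vcpu_avg_usage: Dict[int, float], threshold: float = 90.0) -> List[int]:
    """Return the indexes of vCPU cores whose average usage meets the threshold."""
    return sorted(core for core, usage in vcpu_avg_usage.items() if usage >= threshold)


# Hypothetical vcpu_avg_usage readings (%) for a 4-vCPU gateway, keyed by core index.
samples = {0: 97.0, 1: 12.0, 2: 9.0, 3: 10.0}

aggregate_usage = mean(samples.values())  # what an aggregated CPU metric would suggest
hot_cores = find_hot_vcpus(samples)       # cores that are individually saturated

print(f"aggregate usage: {aggregate_usage:.1f}%")
print(f"hot cores: {hot_cores}")
```

In this example the aggregate looks healthy while core 0 is close to saturation, which is the situation the per-vCPU metrics are meant to expose.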
Network Metrics for Triggering Notifications or Other Actions
For Aviatrix Controller and Aviatrix gateways, you can configure alerts based on the following network metrics. Metrics are listed in alphabetical order, by the name used in the CoPilot UI.
Cumulative Counters
Cumulative counters represent running totals since the interface was last reset. CoPilot uses the difference between consecutive readings to compute per-second rate and percentage metrics.
| Name (Network Metric) | Description | Internal Metric Name | Accessible by API |
|---|---|---|---|
| Bandwidth Egress Limit Exceeded | (AWS Only) The cumulative count of events where the outbound (egress) bandwidth allowance for the instance type was exceeded. Sourced from the Elastic Network Adapter (ENA) driver. | bandwidth_egress_limit_exceeded | |
| Bandwidth Ingress Limit Exceeded | (AWS Only) The cumulative count of events where the inbound (ingress) bandwidth allowance for the instance type was exceeded. Sourced from the ENA driver. | bandwidth_ingress_limit_exceeded | ✓ |
| Collisions during Transmission | The cumulative count of collisions detected during packet transmission on the interface. | tx_colls | |
| Compressed Packets Received | The cumulative count of compressed packets received by the interface. | rx_compressed | |
| Compressed Packets Transmitted | The cumulative count of compressed packets transmitted by the interface. | tx_compressed | |
| Conntrack Allowance Available | (AWS Only) The number of tracked connections that can still be established before the instance’s connection-tracking allowance is exhausted. Sourced from the ENA driver. | conntrack_allowance_available | |
| Conntrack Limit Exceeded | (AWS Only) The cumulative count of events where the connection-tracking (conntrack) table limit for the instance type was exceeded. Sourced from the ENA driver. | conntrack_limit_exceeded | |
| Errored Packets Received | The cumulative count of packets received with errors as reported by the network interface (e.g., CRC errors, framing errors). | rx_errs | |
| Errored Packets Transmitted | The cumulative count of packets that encountered errors during transmission. | tx_errs | |
| Linklocal Limit Exceeded | (AWS Only) The cumulative count of events where the link-local packet rate limit for the instance type was exceeded. Sourced from the ENA driver. | linklocal_limit_exceeded | |
| Multicast Packets Received | The cumulative count of multicast packets received by the interface. | rx_multicast | |
| Packets Dropped during Transmission | The cumulative count of outbound packets dropped by the interface, typically due to resource constraints such as transmit queue overflow. | tx_drop | ✓ |
| Packets Dropped while Receiving | The cumulative count of inbound packets dropped by the interface, typically due to resource limitations such as receive buffer overflow. | rx_drop | ✓ |
| PPS Limit Exceeded | (AWS Only) The cumulative count of events where the packets-per-second allowance for the instance type was exceeded. This is a single aggregate counter covering both inbound and outbound directions. Sourced from the ENA driver. | pps_limit_exceeded | ✓ |
| Received Bytes | The cumulative count of bytes received by the interface. | rx_bytes | |
| Received Frames | The cumulative count of frame alignment errors on received packets. | rx_frame | |
| Received Packets | The cumulative count of packets successfully received by the interface. | rx_packets | |
| Receiver FIFO Frames | The cumulative count of FIFO buffer overflow events when receiving packets. | rx_fifo | |
| Transmission FIFO Frames | The cumulative count of FIFO buffer underrun events when transmitting packets. | tx_fifo | |
| Transmitted Bytes | The cumulative count of bytes transmitted by the interface. | tx_bytes | |
| Transmitted Carrier Frames | The cumulative count of carrier sense errors encountered during transmission (e.g., loss of link signal). | tx_carrier | |
| Transmitted Packets | The cumulative count of packets successfully transmitted by the interface. | tx_packets | |
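The conversion from cumulative counters to per-second values can be sketched as follows. This is an illustration under stated assumptions, not CoPilot's actual code: the function names, the 60-second interval, the sample counter values, and the handling of a counter reset (returning 0 for that sample) are all hypothetical.

```python
def per_second_rate(prev_count: int, curr_count: int, interval_seconds: float) -> float:
    """Turn two consecutive cumulative counter readings into a per-second rate."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    delta = curr_count - prev_count
    if delta < 0:
        # Assumption: a negative delta means the counter was reset between
        # readings, so this sample is reported as 0 rather than a negative rate.
        return 0.0
    return delta / interval_seconds


def bits_per_second(prev_bytes: int, curr_bytes: int, interval_seconds: float) -> float:
    """Turn two byte-counter readings into throughput in bits per second."""
    return 8 * per_second_rate(prev_bytes, curr_bytes, interval_seconds)


# Hypothetical readings taken 60 seconds apart.
drop_rate = per_second_rate(prev_count=1200, curr_count=1260, interval_seconds=60)
received_bps = bits_per_second(prev_bytes=1_000_000, curr_bytes=1_750_000, interval_seconds=60)

print(f"drops per second: {drop_rate}")
print(f"received bits per second: {received_bps}")
```

The byte counters are multiplied by 8 because the throughput rates in this document are reported in bits per second.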
Per-Second Rates
CoPilot computes per-second rates from cumulative counter deltas between consecutive collection intervals. Throughput rates (rate_sent, rate_received, rate_total) are reported in bits per second.
| Name (Network Metric) | Description | Internal Metric Name | Accessible by API |
|---|---|---|---|
| Bandwidth Egress Limit Exceeded Rate | (AWS Only) The per-second rate of egress bandwidth-limit-exceeded events. Sourced from the ENA driver. | rate_bandwidth_egress_limit_exceeded | |
| Bandwidth Ingress Limit Exceeded Rate | (AWS Only) The per-second rate of ingress bandwidth-limit-exceeded events. Sourced from the ENA driver. | rate_bandwidth_ingress_limit_exceeded | |
| Collisions Rate during Transmission | The per-second rate of collisions during packet transmission. | rate_tx_colls | |
| Compressed Packets Received Rate | The per-second rate of compressed packets received. | rate_rx_compressed | |
| Compressed Packets Transmitted Rate | The per-second rate of compressed packets transmitted. | rate_tx_compressed | |
| Conntrack Limit Exceeded Rate | (AWS Only) The per-second rate of connection-tracking limit-exceeded events. | rate_conntrack_limit_exceeded | |
| Conntrack Usage Rate | (AWS Only) The rate at which connection-tracking capacity is being consumed, reported in packets per second. Only available on instances where the Conntrack Allowance Available metric is present. | conntrack_usage_rate | |
| Drop Rate during Transmission | The per-second rate of packets dropped during transmission. | rate_tx_drop | ✓ |
| Drop Rate while Receiving | The per-second rate of packets dropped while receiving. | rate_rx_drop | ✓ |
| Errored Packets Received Rate | The per-second rate of packets received with errors. | rate_rx_errs | |
| Errored Packets Transmitted Rate | The per-second rate of packet transmission errors. | rate_tx_errs | |
| Limit Exceeded Rate (PPS) - AWS Only | (AWS Only) The per-second rate of packets-per-second limit-exceeded events on the instance. | rate_pps_limit_exceeded | |
| Linklocal Limit Exceeded Rate | (AWS Only) The per-second rate of link-local packet rate limit-exceeded events. | rate_linklocal_limit_exceeded | |
| Multicast Packets Received Rate | The per-second rate of multicast packets received. | rate_rx_multicast | |
| Packet Drop Rate | The per-second rate of all dropped packets across both directions. Computed as the sum of Drop Rate during Transmission and Drop Rate while Receiving. | rate_pkt_drop | ✓ |
| Packet Failure Rate | The aggregate per-second rate of all network failure events. This is the sum of 10 individual failure-type rates: bandwidth egress/ingress limit exceeded, conntrack limit exceeded, linklocal limit exceeded, PPS limit exceeded, rx/tx drops, rx/tx errors, and received frame errors. | rate_pkt_fail | |
| Peak Received Rate | The peak inbound throughput in bits per second as reported by the gateway for the collection interval. | rate_peak_received | |
| Peak Total Rate | The peak bidirectional throughput in bits per second as reported by the gateway for the collection interval. | rate_peak_total | |
| Peak Transmitted Rate | The peak outbound throughput in bits per second as reported by the gateway for the collection interval. | rate_peak_sent | |
| Received Frames Rate | The per-second rate of frame alignment errors on received packets. | rate_rx_frame | |
| Received Rate | The inbound throughput in bits per second on the interface. Computed from the byte counter delta. | rate_received | ✓ |
| Received Rate (PPS) | The inbound packet throughput in packets per second. | pkt_rx_rate | |
| Receiver FIFO Frames Rate | The per-second rate of receive FIFO buffer overflow events. | rate_rx_fifo | |
| Total Attempted Rate | The total bidirectional packet rate including both successfully processed packets and all failure events, in packets per second. Computed as Total Rate (in packets) + Packet Failure Rate. Used as the denominator for all percentage metrics. | rate_pkt_attempted | |
| Total Rate | The total bidirectional throughput in bits per second on the interface. Computed as the sum of Received Rate and Transmitted Rate. | rate_total | ✓ |
| Total Rate (in packets) | The total bidirectional packet throughput in packets per second. Computed as the sum of Received Rate (PPS) and Transmitted Rate (PPS). Instance size determines how many packets per second a gateway can handle. | pkt_rate_total | |
| Transmission FIFO Frames Rate | The per-second rate of transmit FIFO buffer underrun events. | rate_tx_fifo | |
| Transmitted Carrier Frames Rate | The per-second rate of carrier sense errors during transmission. | rate_tx_carrier | |
| Transmitted Rate | The outbound throughput in bits per second on the interface. Computed from the byte counter delta. | rate_sent | ✓ |
| Transmitted Rate (PPS) | The outbound packet throughput in packets per second. | pkt_tx_rate | |
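The relationships among the aggregate rates in the table above can be expressed as a short sketch. The metric names are the internal names listed in the table; the function name, the dictionary layout, the sample values, and the choice to treat a missing metric (for example, an AWS-only metric on another cloud) as 0 are assumptions for illustration.

```python
from typing import Dict, Mapping

# The 10 failure-type rates that the table lists as inputs to rate_pkt_fail.
FAILURE_RATE_KEYS = (
    "rate_bandwidth_egress_limit_exceeded",
    "rate_bandwidth_ingress_limit_exceeded",
    "rate_conntrack_limit_exceeded",
    "rate_linklocal_limit_exceeded",
    "rate_pps_limit_exceeded",
    "rate_rx_drop",
    "rate_tx_drop",
    "rate_rx_errs",
    "rate_tx_errs",
    "rate_rx_frame",
)


def derive_aggregates(rates: Mapping[str, float]) -> Dict[str, float]:
    """Compute the aggregate rates from individual per-second rates."""
    def get(key: str) -> float:
        # Assumption: a metric that is not reported counts as 0.
        return rates.get(key, 0.0)

    rate_pkt_drop = get("rate_tx_drop") + get("rate_rx_drop")
    rate_pkt_fail = sum(get(key) for key in FAILURE_RATE_KEYS)
    pkt_rate_total = get("pkt_rx_rate") + get("pkt_tx_rate")
    rate_pkt_attempted = pkt_rate_total + rate_pkt_fail
    return {
        "rate_pkt_drop": rate_pkt_drop,
        "rate_pkt_fail": rate_pkt_fail,
        "pkt_rate_total": pkt_rate_total,
        "rate_pkt_attempted": rate_pkt_attempted,
    }


# Hypothetical per-second rates for one interface.
sample = {
    "pkt_rx_rate": 5000.0,
    "pkt_tx_rate": 4980.0,
    "rate_rx_drop": 12.0,
    "rate_tx_drop": 3.0,
    "rate_rx_errs": 2.0,
    "rate_tx_errs": 1.0,
    "rate_pps_limit_exceeded": 2.0,
}
aggregates = derive_aggregates(sample)
print(aggregates)
```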
Percentage Metrics
Percentage metrics express a specific failure rate as a fraction of total attempted packets. All percentage metrics use the same bidirectional denominator: rate_pkt_attempted (see About Percentage Metrics).
| Name (Network Metric) | Description | Internal Metric Name | Accessible by API |
|---|---|---|---|
| Bandwidth Egress Limit Exceeded (%) | (AWS Only) Egress bandwidth-limit-exceeded events as a percentage of total attempted packets. | per_bandwidth_egress_limit | |
| Bandwidth Ingress Limit Exceeded (%) | (AWS Only) Ingress bandwidth-limit-exceeded events as a percentage of total attempted packets. | per_bandwidth_ingress_limit_exceeded | |
| Conntrack Limit Exceeded (%) | (AWS Only) Connection-tracking limit-exceeded events as a percentage of total attempted packets. | per_conntrack_limit_exceeded | |
| Interface Drops during Transmission (%) | Packets dropped during transmission as a percentage of total attempted packets. | per_tx_drop | |
| Interface Drops while Receiving (%) | Packets dropped while receiving as a percentage of total attempted packets. | per_rx_drop | |
| Interface Errors during Transmission (%) | Transmission errors as a percentage of total attempted packets. | per_tx_errs | |
| Interface Errors while Receiving (%) | Receive errors as a percentage of total attempted packets. | per_rx_errs | |
| Linklocal Limit Exceeded (%) | (AWS Only) Link-local rate limit-exceeded events as a percentage of total attempted packets. | per_linklocal_limit_exceeded | |
| Packet Drop (%) | All dropped packets (rx + tx) as a percentage of total attempted packets. | per_pkt_drop | |
| Packet Failure (%) | All failure events (drops, errors, and limit-exceeded events) as a percentage of total attempted packets. This is the broadest failure percentage metric. | per_pkt_fail | |
| PPS Limit Exceeded Drop (%) | (AWS Only) Packets-per-second limit-exceeded events as a percentage of total attempted packets. | per_pps_limit_exceeded | |