The system and network metrics available in CoPilot are captured by the Aviatrix Controller. The Controller pulls the data from the virtual machines (instances/hosts) that Aviatrix Gateways run on and feeds it to CoPilot. Some metrics can trigger actions such as alerting and Gateway Scaling. You can also use metrics to monitor the performance of Gateway hosts from the CoPilot Monitor > Performance page. In addition, the Aviatrix Network Insights API lets you analyze the performance and health of your Aviatrix-managed resources in external monitoring systems. For details, see External Monitoring with the Metrics and Status APIs.

About Metrics that are Monitored for Aviatrix Resources

Aviatrix Controller captures system and network metric information about the virtual machines (instances/hosts) that Aviatrix Gateways run on. Health-type metric information is also captured for Controller and CoPilot virtual machines. See Global Control Plane Health Alert. The metrics monitored by Aviatrix Controller and Aviatrix CoPilot fall into the health, system, and network categories listed in the sections below.

On the CoPilot Monitor > Performance page, you can select metrics to monitor performance on your resource VMs. On the CoPilot Monitor > Notifications > Alert Configurations page, you can configure how the pre-existing set of metrics is used to send notifications about events that occur in your network, such as performance bottlenecks or other problems.

To better understand how notifications and alerts work and how to configure them in CoPilot, see Notifications (Alerts) About Network Events. For more information about integrating Aviatrix metric and status APIs with external monitoring tools, see External Monitoring with the Metrics and Status APIs.

About Percentage Metrics

Several network metrics in the tables below are expressed as percentages (metrics with names beginning with per_). All percentage metrics share the same denominator: total attempted packets per second (rate_pkt_attempted), which is the sum of successfully processed packets (rx + tx) and all failure events across both directions. Because the denominator is bidirectional, a directional percentage (such as per_rx_drop) represents that failure type as a fraction of all traffic, not just inbound or outbound traffic.
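
The effect of the shared bidirectional denominator is easiest to see with numbers. The following is a minimal sketch, not CoPilot's implementation: the function name `percentage_metric`, the sample values, and the choice to report 0% for an idle interface are all assumptions made for illustration.

```python
def percentage_metric(failure_rate: float, rate_pkt_attempted: float) -> float:
    """Express a failure rate as a percentage of total attempted packets per second."""
    if rate_pkt_attempted <= 0:
        # Assumption: report 0% for an idle interface instead of dividing by zero.
        return 0.0
    return 100.0 * failure_rate / rate_pkt_attempted


# Hypothetical sample, all values per second.
pkt_rx_rate = 6000.0   # packets received successfully
pkt_tx_rate = 3900.0   # packets transmitted successfully
rate_rx_drop = 100.0   # inbound drops, the only failure type in this sample

# Denominator: successful packets in both directions plus all failure events.
rate_pkt_attempted = pkt_rx_rate + pkt_tx_rate + rate_rx_drop

per_rx_drop = percentage_metric(rate_rx_drop, rate_pkt_attempted)

# For contrast only: an inbound-only denominator, which per_rx_drop does NOT use.
inbound_only = 100.0 * rate_rx_drop / (pkt_rx_rate + rate_rx_drop)

print(f"per_rx_drop = {per_rx_drop:.2f}%")          # 1.00%
print(f"inbound-only ratio = {inbound_only:.2f}%")  # 1.64%
```

In this sample the directional percentage (1.00%) is lower than the inbound-only ratio (1.64%) because outbound traffic is also counted in the denominator. Keep this in mind when choosing alert thresholds for directional metrics.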

Health Metrics for Triggering Notifications or Other Actions

The following health metrics are available in CoPilot. They are listed in alphabetical order, by the name used in the CoPilot UI.
| Name (Health Metric) | Description | Internal Metric Name | Accessible by API |
| --- | --- | --- | --- |
| BGP Peering Status | Any BGP peering status change triggers an alert. | BGPpeeringStatus | |
| Connection Status | Any connection status change on the specified gateways/connections triggers an alert. | ConnectionStatus | |
| Gateway Status | Any gateway status change triggers an alert. | GatewayStatus | |
| Underlay Connection Status | Monitors the syslog from any connection that includes the host as the source or destination. When syslog data indicates a potential problem from each direction of the connection between that host and another host within 30 seconds of the other, the alert is triggered. On the same connection, if the syslog data later indicates the problem is resolved from either direction, the alert is automatically resolved. | UnderlayConnectionStatus | |
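
The trigger condition for Underlay Connection Status can be read as a pairing rule: a problem must be seen from both directions of the connection, and the two observations must fall within 30 seconds of each other. The sketch below illustrates that reading only. The function name, the timestamp inputs, and the inclusive 30-second boundary are assumptions; the actual syslog parsing and matching logic is not documented here.

```python
from typing import Iterable

WINDOW_SECONDS = 30.0  # from the description: "within 30 seconds of the other"


def underlay_alert_triggered(
    a_to_b_problem_times: Iterable[float],
    b_to_a_problem_times: Iterable[float],
    window_seconds: float = WINDOW_SECONDS,
) -> bool:
    """Return True if a problem was seen from both directions within the window.

    Inputs are hypothetical timestamps (in seconds) of problem indications
    extracted from syslog for each direction of one connection.
    """
    b_times = list(b_to_a_problem_times)
    return any(
        abs(t_a - t_b) <= window_seconds  # assumption: boundary is inclusive
        for t_a in a_to_b_problem_times
        for t_b in b_times
    )


print(underlay_alert_triggered([100.0], [120.0]))  # both directions, 20 s apart
print(underlay_alert_triggered([100.0], [140.0]))  # both directions, 40 s apart
print(underlay_alert_triggered([100.0], []))       # one direction only
```

Under these assumptions, only the first call reports a triggered alert. A problem reported from a single direction, or from both directions more than 30 seconds apart, does not qualify.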

System Metrics for Triggering Notifications or Other Actions

For Aviatrix Controller and Aviatrix gateways, you can configure alerts based on the following system metrics. Aviatrix gateways report live Linux system statistics (such as memory, CPU, I/O, processes, and swap) for the instances/virtual machines on which they run. Metrics are listed in alphabetical order, by the name used in the CoPilot UI.
| Name (System Metric) | Description | Internal Metric Name | Accessible by API |
| --- | --- | --- | --- |
| CPU Idle (%) | Of the total CPU time, the percentage of time the CPU(s) spent idle. Collected as a 3-second average from the gateway. | cpu_idle | |
| CPU Kernel Space (%) | Of the total CPU time, the percentage of time spent running kernel code (system mode). | cpu_ks | |
| CPU Steal (%) | Of the total CPU time, the percentage of time a virtual CPU waited for a real CPU while the hypervisor serviced another virtual processor. Relevant for shared-tenancy instances. | cpu_steal | |
| CPU Used (%) | The percentage of CPU used, calculated as 100% minus CPU Idle. | cpu_used_per | |
| CPU User Space (%) | Of the total CPU time, the percentage of time spent running user-space (non-kernel) code. | cpu_us | |
| CPU Wait (%) | Of the total CPU time, the percentage of time spent waiting for I/O operations to complete. | cpu_wait | |
| Disk Free | The free (unused) storage space on the disk volume, in bytes. | hdisk_free | |
| Disk Free (%) | Of the total storage space on the disk volume, the percentage that is free and unused. | hdisk_free_per | |
| Disk Total | The total storage capacity of the disk volume, in bytes. | hdisk_tot | |
| IO Blocks In | The number of blocks per second received from block devices during the sampling interval. | io_blk_in | |
| IO Blocks Out | The number of blocks per second sent to block devices during the sampling interval. | io_blk_out | |
| Memory Available | The amount of memory (in bytes) available to be allocated to new or existing processes, including free memory and reclaimable caches. | memory_available | |
| Memory Available (%) | Of the total memory, the percentage that is available to be allocated to new or existing processes. | memory_available_per | |
| Memory Buffer | The amount of memory (in bytes) used by kernel buffers. | memory_buf | |
| Memory Cache | The amount of memory (in bytes) used by the page cache. | memory_cached | |
| Memory Free | The amount of memory (in bytes) that is completely unused and available. Unlike Memory Available, this does not include reclaimable caches. | memory_free | |
| Memory Swapped | The amount of memory (in bytes) written to swap space. Reports 0 when swap is not in use. | memory_swpd | |
| Memory Total | The total physical memory (in bytes) on the host. | memory_tot | |
| Memory Used | The amount of memory (in bytes) actively in use by processes. | memory_used | |
| Memory Used (%) | Of the total memory, the percentage actively in use by processes. | memory_used_per | |
| Processes Uninterruptible Sleep | The number of processes in an uninterruptible sleep state, typically blocked waiting for I/O to complete. | nproc_non_int_sleep | |
| Processes Waiting To Be Run | The number of processes that are currently running or are in the run queue waiting for CPU time. | nproc_running | |
| Swaps From Disk | The amount of memory (in kilobytes) swapped in from disk per second. | swap_from_disk | |
| Swaps To Disk | The amount of memory (in kilobytes) swapped out to disk per second. | swap_to_disk | |
| System Context Switches | The number of CPU context switches per second. | system_cs | |
| System Interrupts | The number of hardware interrupts per second, including the clock interrupt. | system_int | |

Per-vCPU Metrics

Starting in CoPilot 4.32, the following per-vCPU metrics are available through the Metrics API. These metrics provide CPU utilization broken down by individual virtual CPU core for each gateway, enabling identification of single-core bottlenecks that aggregated CPU metrics may mask.
| Name (vCPU Metric) | Description | Internal Metric Name | Accessible by API |
| --- | --- | --- | --- |
| vCPU Average Usage (%) | The average CPU utilization percentage for an individual vCPU core over the sampling interval. | vcpu_avg_usage | |
| vCPU Minimum Usage (%) | The minimum CPU utilization percentage observed for an individual vCPU core during the sampling interval. | vcpu_min_usage | |
| vCPU Maximum Usage (%) | The maximum CPU utilization percentage observed for an individual vCPU core during the sampling interval. | vcpu_max_usage | |
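
To show why per-vCPU data can reveal what an aggregated CPU metric hides, the sketch below compares the mean across cores with the busiest individual core. The input shape (a mapping of core index to vcpu_avg_usage), the function name `hot_cores`, and the 90% threshold are assumptions for illustration, not the response format of the Metrics API.

```python
from statistics import mean
from typing import Dict, List


def hot_cores(vcpu_avg_usage: Dict[int, float], threshold: float = 90.0) -> List[int]:
    """Return the indexes of vCPU cores whose average usage meets the threshold."""
    return sorted(core for core, usage in vcpu_avg_usage.items() if usage >= threshold)


# Hypothetical vcpu_avg_usage readings for a 4-vCPU gateway, keyed by core index.
sample = {0: 97.0, 1: 12.0, 2: 9.0, 3: 14.0}

aggregate = mean(sample.values())  # what an all-core average would show
saturated = hot_cores(sample)      # cores at or above the (assumed) 90% threshold

print(f"aggregate usage = {aggregate:.1f}%")  # 33.0%
print(f"saturated cores = {saturated}")       # [0]
```

Here the aggregate looks healthy at 33%, yet core 0 is close to saturation. An alert on the aggregate alone would miss that pattern.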

Network Metrics for Triggering Notifications or Other Actions

For Aviatrix Controller and Aviatrix gateways, you can configure alerts based on the following network metrics. Metrics are listed in alphabetical order, by the name used in the CoPilot UI.

Cumulative Counters

Cumulative counters represent running totals since the interface was last reset. CoPilot uses the difference between consecutive readings to compute per-second rate and percentage metrics.
| Name (Network Metric) | Description | Internal Metric Name | Accessible by API |
| --- | --- | --- | --- |
| Bandwidth Egress Limit Exceeded | (AWS Only) The cumulative count of events where the outbound (egress) bandwidth allowance for the instance type was exceeded. Sourced from the Elastic Network Adapter (ENA) driver. | bandwidth_egress_limit_exceeded | |
| Bandwidth Ingress Limit Exceeded | (AWS Only) The cumulative count of events where the inbound (ingress) bandwidth allowance for the instance type was exceeded. Sourced from the ENA driver. | bandwidth_ingress_limit_exceeded | |
| Collisions during Transmission | The cumulative count of collisions detected during packet transmission on the interface. | tx_colls | |
| Compressed Packets Received | The cumulative count of compressed packets received by the interface. | rx_compressed | |
| Compressed Packets Transmitted | The cumulative count of compressed packets transmitted by the interface. | tx_compressed | |
| Conntrack Allowance Available | (AWS Only) The number of tracked connections that can still be established before the instance’s connection-tracking allowance is exhausted. Sourced from the ENA driver. | conntrack_allowance_available | |
| Conntrack Limit Exceeded | (AWS Only) The cumulative count of events where the connection-tracking (conntrack) table limit for the instance type was exceeded. Sourced from the ENA driver. | conntrack_limit_exceeded | |
| Errored Packets Received | The cumulative count of packets received with errors as reported by the network interface (e.g., CRC errors, framing errors). | rx_errs | |
| Errored Packets Transmitted | The cumulative count of packets that encountered errors during transmission. | tx_errs | |
| Linklocal Limit Exceeded | (AWS Only) The cumulative count of events where the link-local packet rate limit for the instance type was exceeded. Sourced from the ENA driver. | linklocal_limit_exceeded | |
| Multicast Packets Received | The cumulative count of multicast packets received by the interface. | rx_multicast | |
| Packets Dropped during Transmission | The cumulative count of outbound packets dropped by the interface, typically due to resource constraints such as transmit queue overflow. | tx_drop | |
| Packets Dropped while Receiving | The cumulative count of inbound packets dropped by the interface, typically due to resource limitations such as receive buffer overflow. | rx_drop | |
| PPS Limit Exceeded | (AWS Only) The cumulative count of events where the packets-per-second allowance for the instance type was exceeded. This is a single aggregate counter covering both inbound and outbound directions. Sourced from the ENA driver. | pps_limit_exceeded | |
| Received Bytes | The cumulative count of bytes received by the interface. | rx_bytes | |
| Received Frames | The cumulative count of frame alignment errors on received packets. | rx_frame | |
| Received Packets | The cumulative count of packets successfully received by the interface. | rx_packets | |
| Receiver FIFO Frames | The cumulative count of FIFO buffer overflow events when receiving packets. | rx_fifo | |
| Transmission FIFO Frames | The cumulative count of FIFO buffer underrun events when transmitting packets. | tx_fifo | |
| Transmitted Bytes | The cumulative count of bytes transmitted by the interface. | tx_bytes | |
| Transmitted Carrier Frames | The cumulative count of carrier sense errors encountered during transmission (e.g., loss of link signal). | tx_carrier | |
| Transmitted Packets | The cumulative count of packets successfully transmitted by the interface. | tx_packets | |
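
If you collect these cumulative counters yourself, a per-second rate can be derived from two consecutive readings in the way described above. The following is a minimal sketch under stated assumptions: the function name `counter_rate`, the sample readings, the 60-second interval, and the handling of a counter reset (returning no value for that interval) are illustrative choices, not CoPilot's documented behavior.

```python
from typing import Optional


def counter_rate(previous: int, current: int, interval_seconds: float) -> Optional[float]:
    """Per-second rate from two consecutive readings of a cumulative counter."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    if current < previous:
        # Assumption: a smaller reading means the counter was reset, so this
        # interval has no usable delta.
        return None
    return (current - previous) / interval_seconds


# Hypothetical rx_drop readings taken 60 seconds apart.
rate_rx_drop = counter_rate(previous=1_200, current=1_500, interval_seconds=60.0)

# Hypothetical rx_bytes readings; a byte delta is multiplied by 8 to get bits per second.
bytes_per_second = counter_rate(previous=10_000_000, current=17_500_000, interval_seconds=60.0)
bits_per_second = bytes_per_second * 8 if bytes_per_second is not None else None

print(rate_rx_drop)     # 5.0 drops per second
print(bits_per_second)  # 1000000.0 bits per second
```

Because the counters are running totals since the last interface reset, a single reading says little on its own. Only the difference between readings reflects current behavior.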

Per-Second Rates

CoPilot computes per-second rates from cumulative counter deltas between consecutive collection intervals. Throughput rates (rate_sent, rate_received, rate_total) are reported in bits per second.
| Name (Network Metric) | Description | Internal Metric Name | Accessible by API |
| --- | --- | --- | --- |
| Bandwidth Egress Limit Exceeded Rate | (AWS Only) The per-second rate of egress bandwidth-limit-exceeded events. Sourced from the ENA driver. | rate_bandwidth_egress_limit_exceeded | |
| Bandwidth Ingress Limit Exceeded Rate | (AWS Only) The per-second rate of ingress bandwidth-limit-exceeded events. Sourced from the ENA driver. | rate_bandwidth_ingress_limit_exceeded | |
| Collisions Rate during Transmission | The per-second rate of collisions during packet transmission. | rate_tx_colls | |
| Compressed Packets Received Rate | The per-second rate of compressed packets received. | rate_rx_compressed | |
| Compressed Packets Transmitted Rate | The per-second rate of compressed packets transmitted. | rate_tx_compressed | |
| Conntrack Limit Exceeded Rate | (AWS Only) The per-second rate of connection-tracking limit-exceeded events. | rate_conntrack_limit_exceeded | |
| Conntrack Usage Rate | (AWS Only) The rate at which connection-tracking capacity is being consumed, reported in packets per second. Only available on instances where the Conntrack Allowance Available metric is present. | conntrack_usage_rate | |
| Drop Rate during Transmission | The per-second rate of packets dropped during transmission. | rate_tx_drop | |
| Drop Rate while Receiving | The per-second rate of packets dropped while receiving. | rate_rx_drop | |
| Errored Packets Received Rate | The per-second rate of packets received with errors. | rate_rx_errs | |
| Errored Packets Transmitted Rate | The per-second rate of packet transmission errors. | rate_tx_errs | |
| Limit Exceeded Rate (PPS) - AWS Only | (AWS Only) The per-second rate of packets-per-second limit-exceeded events on the instance. | rate_pps_limit_exceeded | |
| Linklocal Limit Exceeded Rate | (AWS Only) The per-second rate of link-local packet rate limit-exceeded events. | rate_linklocal_limit_exceeded | |
| Multicast Packets Received Rate | The per-second rate of multicast packets received. | rate_rx_multicast | |
| Packet Drop Rate | The per-second rate of all dropped packets across both directions. Computed as the sum of Drop Rate during Transmission and Drop Rate while Receiving. | rate_pkt_drop | |
| Packet Failure Rate | The aggregate per-second rate of all network failure events. This is the sum of 10 individual failure-type rates: bandwidth egress/ingress limit exceeded, conntrack limit exceeded, linklocal limit exceeded, PPS limit exceeded, rx/tx drops, rx/tx errors, and received frame errors. | rate_pkt_fail | |
| Peak Received Rate | The peak inbound throughput in bits per second as reported by the gateway for the collection interval. | rate_peak_received | |
| Peak Total Rate | The peak bidirectional throughput in bits per second as reported by the gateway for the collection interval. | rate_peak_total | |
| Peak Transmitted Rate | The peak outbound throughput in bits per second as reported by the gateway for the collection interval. | rate_peak_sent | |
| Received Frames Rate | The per-second rate of frame alignment errors on received packets. | rate_rx_frame | |
| Received Rate | The inbound throughput in bits per second on the interface. Computed from the byte counter delta. | rate_received | |
| Received Rate (PPS) | The inbound packet throughput in packets per second. | pkt_rx_rate | |
| Receiver FIFO Frames Rate | The per-second rate of receive FIFO buffer overflow events. | rate_rx_fifo | |
| Total Attempted Rate | The total bidirectional packet rate including both successfully processed packets and all failure events, in packets per second. Computed as Total Rate (in packets) + Packet Failure Rate. Used as the denominator for all percentage metrics. | rate_pkt_attempted | |
| Total Rate | The total bidirectional throughput in bits per second on the interface. Computed as the sum of Received Rate and Transmitted Rate. | rate_total | |
| Total Rate (in packets) | The total bidirectional packet throughput in packets per second. Computed as the sum of Received Rate (PPS) and Transmitted Rate (PPS). Instance size determines how many packets per second a gateway can handle. | pkt_rate_total | |
| Transmission FIFO Frames Rate | The per-second rate of transmit FIFO buffer underrun events. | rate_tx_fifo | |
| Transmitted Carrier Frames Rate | The per-second rate of carrier sense errors during transmission. | rate_tx_carrier | |
| Transmitted Rate | The outbound throughput in bits per second on the interface. Computed from the byte counter delta. | rate_sent | |
| Transmitted Rate (PPS) | The outbound packet throughput in packets per second. | pkt_tx_rate | |
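
Several of the rates above are sums of other rates. The sketch below shows how the aggregate metrics relate to their components, which can be useful when reproducing them in an external monitoring system. The metric names come from the table. The function name `derive_aggregates`, the sample values, and the choice to treat a missing component (for example, an AWS-only rate on another cloud) as 0 are assumptions.

```python
from typing import Dict

# The 10 failure-type rates that the table lists as components of rate_pkt_fail.
FAILURE_RATE_NAMES = (
    "rate_bandwidth_egress_limit_exceeded",
    "rate_bandwidth_ingress_limit_exceeded",
    "rate_conntrack_limit_exceeded",
    "rate_linklocal_limit_exceeded",
    "rate_pps_limit_exceeded",
    "rate_rx_drop",
    "rate_tx_drop",
    "rate_rx_errs",
    "rate_tx_errs",
    "rate_rx_frame",
)


def derive_aggregates(rates: Dict[str, float]) -> Dict[str, float]:
    """Compute the aggregate packet rates from their component rates."""

    def get(name: str) -> float:
        # Assumption: a component that is not reported counts as 0.
        return rates.get(name, 0.0)

    pkt_rate_total = get("pkt_rx_rate") + get("pkt_tx_rate")
    rate_pkt_drop = get("rate_rx_drop") + get("rate_tx_drop")
    rate_pkt_fail = sum(get(name) for name in FAILURE_RATE_NAMES)
    return {
        "pkt_rate_total": pkt_rate_total,
        "rate_pkt_drop": rate_pkt_drop,
        "rate_pkt_fail": rate_pkt_fail,
        "rate_pkt_attempted": pkt_rate_total + rate_pkt_fail,
    }


# Hypothetical per-second sample for one gateway interface.
sample = {
    "pkt_rx_rate": 5000.0,
    "pkt_tx_rate": 4000.0,
    "rate_rx_drop": 20.0,
    "rate_tx_drop": 10.0,
    "rate_rx_errs": 5.0,
    "rate_pps_limit_exceeded": 15.0,
}
aggregates = derive_aggregates(sample)
print(aggregates)
```

With this sample, pkt_rate_total is 9000, rate_pkt_drop is 30, rate_pkt_fail is 50, and rate_pkt_attempted is 9050. Drops are counted once inside rate_pkt_fail, so rate_pkt_drop is not added again to the attempted rate.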

Percentage Metrics

Percentage metrics express a specific failure rate as a fraction of total attempted packets. All percentage metrics use the same bidirectional denominator: rate_pkt_attempted (see About Percentage Metrics).
| Name (Network Metric) | Description | Internal Metric Name | Accessible by API |
| --- | --- | --- | --- |
| Bandwidth Egress Limit Exceeded (%) | (AWS Only) Egress bandwidth-limit-exceeded events as a percentage of total attempted packets. | per_bandwidth_egress_limit | |
| Bandwidth Ingress Limit Exceeded (%) | (AWS Only) Ingress bandwidth-limit-exceeded events as a percentage of total attempted packets. | per_bandwidth_ingress_limit_exceeded | |
| Conntrack Limit Exceeded (%) | (AWS Only) Connection-tracking limit-exceeded events as a percentage of total attempted packets. | per_conntrack_limit_exceeded | |
| Interface Drops during Transmission (%) | Packets dropped during transmission as a percentage of total attempted packets. | per_tx_drop | |
| Interface Drops while Receiving (%) | Packets dropped while receiving as a percentage of total attempted packets. | per_rx_drop | |
| Interface Errors during Transmission (%) | Transmission errors as a percentage of total attempted packets. | per_tx_errs | |
| Interface Errors while Receiving (%) | Receive errors as a percentage of total attempted packets. | per_rx_errs | |
| Linklocal Limit Exceeded (%) | (AWS Only) Link-local rate limit-exceeded events as a percentage of total attempted packets. | per_linklocal_limit_exceeded | |
| Packet Drop (%) | All dropped packets (rx + tx) as a percentage of total attempted packets. | per_pkt_drop | |
| Packet Failure (%) | All failure events (drops, errors, and limit-exceeded events) as a percentage of total attempted packets. This is the broadest failure percentage metric. | per_pkt_fail | |
| PPS Limit Exceeded Drop (%) | (AWS Only) Packets-per-second limit-exceeded events as a percentage of total attempted packets. | per_pps_limit_exceeded | |