The Aviatrix Metrics and Status APIs allow you to integrate CoPilot with third-party monitoring platforms such as Datadog, Splunk, Grafana, or Prometheus. This guide covers which APIs to use, what data is available, how to configure alerting that stays consistent with CoPilot’s built-in monitoring, and how to avoid common pitfalls like double-counted traffic or noisy metrics. CoPilot exposes two complementary APIs for external consumption:

Metrics API

Endpoint: /metrics-api/v1/gateways
Performance metrics including CPU, memory, throughput, and packet drops.
Scrape interval: Every 5 minutes

Status API

Endpoint: /status-api/v1/
Availability status for gateways, tunnels, and BGP peerings.
Scrape interval: Every 1 minute
Both APIs support Prometheus text format and JSON output (?format=json). All requests are served over HTTPS on port 443.

Authentication

Both APIs use the same API key, passed as a Bearer token:
Authorization: Bearer <your-api-key>
The API key is created when you enable the API in CoPilot under Settings > Configuration > General. A single key grants access to both endpoints.

Enabling the API and Downloading Specifications

To use the APIs, you need to enable API access in CoPilot and create an authentication key.
The Aviatrix API uses port 443, the same port as the CoPilot UI. Ensure that port 443 is accessible and not restricted by any security groups.
The API key is displayed only once, when it is created. Save it in a secure place. If you lose the key, you must reset it (see Resetting the API Key).
  1. In CoPilot, navigate to Settings > Configuration > General.
  2. Scroll down to Features and select Metrics API or Status API.
  3. Click Download to download the associated OpenAPI .yaml specification.
The OpenAPI specification files provide complete endpoint documentation, request/response schemas, and examples. Download the latest versions from CoPilot rather than relying on external copies, as the specifications are updated with each CoPilot release.

Testing API Access

Verify access to your CoPilot instance with a curl command:
curl -k -X 'GET' \
  'https://<copilot-address>/metrics-api/v1/gateways?format=json' \
  -H 'Authorization: Bearer <your-api-key>'
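A 200 response with metric data confirms the API is working. A 403 response indicates an incorrect or expired key. The -k flag skips TLS certificate verification; omit it if your client trusts the certificate that CoPilot presents.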
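
If your platform polls the APIs with a custom script instead of a Prometheus scraper, the same check can be done in Python. This is a minimal sketch using only the standard library; the function names are hypothetical, and insecure=True is an assumed convenience that mirrors curl's -k for self-signed certificates.

```python
import ssl
import urllib.error
import urllib.request


def build_request(copilot_address: str, api_key: str,
                  path: str = "/metrics-api/v1/gateways",
                  as_json: bool = True) -> urllib.request.Request:
    """Build a GET request that passes the API key as a Bearer token."""
    url = f"https://{copilot_address}{path}"
    if as_json:
        url += "?format=json"  # omit for the default Prometheus text format
    return urllib.request.Request(
        url, method="GET", headers={"Authorization": f"Bearer {api_key}"})


def make_context(insecure: bool = False) -> ssl.SSLContext:
    """TLS context; insecure=True skips certificate checks like curl -k."""
    ctx = ssl.create_default_context()
    if insecure:
        ctx.check_hostname = False  # must be disabled before setting CERT_NONE
        ctx.verify_mode = ssl.CERT_NONE
    return ctx


def check_access(copilot_address: str, api_key: str,
                 insecure: bool = False) -> int:
    """Return the HTTP status: 200 means working, 403 means a bad or expired key."""
    req = build_request(copilot_address, api_key)
    try:
        with urllib.request.urlopen(req, timeout=30,
                                    context=make_context(insecure)) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code
```

For example, check_access("<copilot-address>", "<your-api-key>", insecure=True) should return 200 when the key is valid.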

Metrics API

Endpoints

GET https://<copilot-address>/metrics-api/v1/gateways
GET https://<copilot-address>/metrics-api/v1/gateways?format=json
The default format is Prometheus text. Append ?format=json for structured JSON output.

How Data Is Collected

CoPilot collects performance data from every managed gateway once per minute via the Aviatrix Controller. The Metrics API serves a snapshot of the most recent collection, rounded to the nearest 5-minute boundary with a 15-minute lookback window. Values represent point-in-time samples, not averages or aggregations.
Recommended scrape interval: 5 minutes. Polling more frequently returns the same cached data.
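
Because polling more often than every 5 minutes returns the same cached data, a custom poller may want to discard repeated snapshots and any sample older than the 15-minute lookback window. The sketch below assumes that the trailing millisecond timestamp on each Prometheus line (as in the example output on this page) identifies the collection; SnapshotTracker is a hypothetical helper, not part of any Aviatrix SDK.

```python
LOOKBACK_MS = 15 * 60 * 1000  # 15-minute lookback window of the Metrics API


class SnapshotTracker:
    """Hypothetical helper that remembers the last timestamp seen per series."""

    def __init__(self) -> None:
        self._last_seen: dict[str, int] = {}

    def accept(self, series: str, ts_ms: int, now_ms: int) -> bool:
        """Return True only for samples that are new and inside the lookback."""
        if now_ms - ts_ms > LOOKBACK_MS:
            return False  # stale: older than the lookback window
        if self._last_seen.get(series) == ts_ms:
            return False  # same cached snapshot as the previous poll
        self._last_seen[series] = ts_ms
        return True
```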

Gateway-Level Metrics

These metrics are reported once per gateway, with a gateway label.

| Metric | Description | Unit |
|---|---|---|
| cpu_idle | CPU idle time | Percent (0-100) |
| cpu_used_per | CPU utilization | Percent (0-100) |
| cpu_us | CPU time spent in user space | Percent |
| cpu_ks | CPU time spent in kernel space | Percent |
| cpu_wait | CPU time waiting on I/O | Percent |
| memory_available | Memory available to applications | Bytes |
| memory_free | Completely unused memory | Bytes |
| memory_cached | Memory used for disk cache | Bytes |
| memory_buf | Memory used for kernel buffers | Bytes |
| memory_swpd | Memory written to swap | Bytes |
| memory_used_per | Memory utilization | Percent (0-100) |

cpu_used_per and memory_used_per are available in CoPilot 4.32+. On older versions, derive CPU utilization as 100 - cpu_idle. For memory percentage on older versions, you must know the instance type’s total memory from your cloud provider.
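
On versions before 4.32, both percentages can be derived on the client. A minimal sketch: cpu_used_percent follows the 100 - cpu_idle rule above, while memory_used_percent assumes utilization is measured from memory_available against the instance's total RAM (the same basis as the older-version memory alert on this page), which may differ slightly from how CoPilot computes memory_used_per.

```python
def cpu_used_percent(cpu_idle: float) -> float:
    """Pre-4.32 substitute for cpu_used_per: 100 - cpu_idle."""
    return 100.0 - cpu_idle


def memory_used_percent(memory_available: int, total_memory_bytes: int) -> float:
    """Approximate memory_used_per from memory_available and total RAM.

    total_memory_bytes is not in the API; it must come from the cloud
    provider's specifications for the gateway's instance type.
    """
    if total_memory_bytes <= 0:
        raise ValueError("total_memory_bytes must be positive")
    return 100.0 * (1 - memory_available / total_memory_bytes)
```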

Per-vCPU Metrics

On gateways with multiple virtual CPUs, the API reports per-core utilization with gateway and vcpu_name labels.

| Metric | Description | Unit |
|---|---|---|
| vcpu_avg_usage | Average CPU usage for this vCPU | Percent (0-100) |
| vcpu_min_usage | Minimum CPU usage for this vCPU | Percent (0-100) |
| vcpu_max_usage | Maximum CPU usage for this vCPU | Percent (0-100) |

These metrics are useful for identifying core imbalance, for example one vCPU pegged at 100% while others are idle, which may indicate a single-threaded bottleneck.
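
A check for this kind of core imbalance could compare each vCPU's vcpu_avg_usage with the least-busy core on the same gateway. The 90% hot threshold and 50-point spread below are illustrative assumptions, not Aviatrix recommendations, and find_core_imbalance is a hypothetical helper.

```python
def find_core_imbalance(vcpu_usage: dict[str, float], hot: float = 90.0,
                        spread: float = 50.0) -> list[str]:
    """Return vCPUs that run hot while another core on the gateway sits far below.

    vcpu_usage maps vcpu_name -> vcpu_avg_usage for a single gateway.
    """
    if len(vcpu_usage) < 2:
        return []  # imbalance needs at least two cores to compare
    coolest = min(vcpu_usage.values())
    return sorted(name for name, usage in vcpu_usage.items()
                  if usage >= hot and usage - coolest >= spread)
```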

Interface-Level Metrics

These metrics are reported per gateway and per network interface, with gateway and interface labels.

| Metric | Description | Unit |
|---|---|---|
| rate_received | Inbound throughput | Bits/sec |
| rate_sent | Outbound throughput | Bits/sec |
| rate_total | Combined throughput | Bits/sec |
| rx_drop | Inbound packet drops (cumulative) | Count |
| tx_drop | Outbound packet drops (cumulative) | Count |
| rate_rx_drop | Inbound drop rate | Drops/sec |
| rate_tx_drop | Outbound drop rate | Drops/sec |
| rate_pkt_drop | Combined drop rate | Drops/sec |
| bandwidth_ingress_limit_exceeded | Times ingress bandwidth limit was exceeded (cumulative) | Count |
| pps_limit_exceeded | Times packets-per-second limit was exceeded (cumulative) | Count |

Throughput metrics (rate_sent, rate_received, rate_total) are reported in bits per second, not bytes. The pps_limit_exceeded and bandwidth_ingress_limit_exceeded counters are cumulative counts of packets throttled by the cloud provider’s instance-type network limits (AWS ENA driver). These counters are only present on AWS instances.
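
Two details trip up many dashboards: throughput arrives in bits per second, and the drop and limit counters are cumulative, so only their increase between scrapes is meaningful. A minimal sketch of both conversions, assuming that a counter which goes down has been reset to zero (for example, after a gateway restart), which is how Prometheus's increase() treats decreases:

```python
def bits_to_megabytes_per_sec(bits_per_sec: float) -> float:
    """rate_* throughput is in bits/s: divide by 8 for bytes, then by 1e6 for MB/s."""
    return bits_per_sec / 8 / 1_000_000


def counter_increase(samples: list[float]) -> float:
    """Total increase of a cumulative counter such as pps_limit_exceeded.

    samples are successive scrape values, oldest first. A drop in value is
    treated as a reset, so the new value itself counts as the increase.
    """
    total = 0.0
    for prev, cur in zip(samples, samples[1:]):
        total += cur - prev if cur >= prev else cur
    return total
```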

Example Prometheus Output

cpu_idle{gateway="spoke-gw-us-east-1"} 74.3 1734006896789
cpu_used_per{gateway="spoke-gw-us-east-1"} 25.7 1734006896789
memory_available{gateway="spoke-gw-us-east-1"} 3221225472 1734006896789
memory_used_per{gateway="spoke-gw-us-east-1"} 30.2 1734006896789
memory_swpd{gateway="spoke-gw-us-east-1"} 0 1734006896789
rate_received{gateway="spoke-gw-us-east-1", interface="eth0"} 125400.5 1734006896789
rate_pkt_drop{gateway="spoke-gw-us-east-1", interface="eth0"} 0 1734006896789
pps_limit_exceeded{gateway="spoke-gw-us-east-1", interface="eth0"} 0 1734006896789
vcpu_avg_usage{gateway="spoke-gw-us-east-1", vcpu_name="0"} 30.5 1734006896789
vcpu_avg_usage{gateway="spoke-gw-us-east-1", vcpu_name="1"} 20.9 1734006896789
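
Platforms without a native Prometheus scraper can parse this output directly. The sketch below handles the line shape shown above (metric name, optional labels, value, optional millisecond timestamp). It is simplified: it assumes label values contain no } characters and does not unescape escaped quotes.

```python
import re

# name{labels} value [timestamp_ms]
LINE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'
    r'(?:\{(?P<labels>[^}]*)\})?'
    r'\s+(?P<value>\S+)(?:\s+(?P<ts>-?\d+))?$')
# label="value" pairs, with optional whitespace and trailing comma
LABEL_RE = re.compile(r'\s*([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"\s*,?')


def parse_line(line: str):
    """Return (name, labels, value, timestamp_ms), or None for blank/comment lines."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    m = LINE_RE.match(line)
    if m is None:
        raise ValueError(f"unparseable line: {line!r}")
    labels = dict(LABEL_RE.findall(m.group("labels") or ""))
    ts = int(m.group("ts")) if m.group("ts") else None
    return m.group("name"), labels, float(m.group("value")), ts
```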

Status API

Endpoints

GET https://<copilot-address>/status-api/v1/
GET https://<copilot-address>/status-api/v1/?format=json

How Data Is Collected

CoPilot polls gateway, tunnel, and BGP status every minute. The Status API serves the latest cached state with no windowing or aggregation.
Recommended scrape interval: 1 minute. The Status API refreshes at this rate, and availability events are time-sensitive.

Gateway Status

Reported as status with a gateway label.

| Value | State | Meaning |
|---|---|---|
| 1 | Up | Gateway is operating normally |
| 0 | Down | Gateway is not responding |
| -1 | Degraded | Keepalive failure, configuration failure, upgrade failure, or transitional state |

status{gateway="transit-gw-1"} 1 1734006896789
status{gateway="spoke-gw-1"} 0 1734006896789
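
Because Degraded (-1) is neither up nor down, a client should treat any value other than 1 as unhealthy, which is what the Gateway Down expression status{gateway=~".+"} != 1 on this page does. A minimal sketch of that decoding; GatewayStatus and unhealthy_gateways are hypothetical names, and the input is assumed to be a mapping of gateway name to value.

```python
from enum import IntEnum


class GatewayStatus(IntEnum):
    """Values of the Status API `status` series that carry a gateway label."""
    DEGRADED = -1  # keepalive, configuration, or upgrade failure, or transitional
    DOWN = 0
    UP = 1


def unhealthy_gateways(samples: dict[str, float]) -> dict[str, GatewayStatus]:
    """Return every gateway whose status is not UP (the `!= 1` rule)."""
    return {
        gateway: GatewayStatus(int(value))
        for gateway, value in samples.items()
        if int(value) != GatewayStatus.UP
    }
```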

Tunnel Status

Includes both gateway-to-gateway peering tunnels and Site-to-Cloud (S2C) tunnels, reported as status with a tunnel label.

| Value | Meaning |
|---|---|
| 1 | Up |
| 0 | Down |

status{tunnel="spoke-gw-1.transit-gw-1"} 1 1734006896789
status{tunnel="s2c-branch-office"} 0 1734006896789

BGP Peering Status

Reported as bgp_status with gateway and bgp_neighbor labels.

| Value | Meaning |
|---|---|
| 1 | Established |
| 0 | Not established |

bgp_status{gateway="transit-gw-1",bgp_neighbor="vgw(169.254.254.185)"} 1 1734006896789
bgp_status{gateway="transit-gw-1",bgp_neighbor="peer(10.0.0.1)"} 0 1734006896789
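
Gateway and tunnel availability share the metric name status and are distinguished only by the gateway or tunnel label, so a client must check labels before interpreting a 0. A minimal sketch, assuming samples have already been parsed into (name, labels, value) tuples; down_links is a hypothetical helper.

```python
def down_links(samples: list[tuple[str, dict[str, str], float]]) -> list[str]:
    """Describe tunnels and BGP peerings reported as down (value 0)."""
    down = []
    for name, labels, value in samples:
        if value != 0:
            continue
        if name == "status" and "tunnel" in labels:
            down.append(f"tunnel {labels['tunnel']} down")
        elif name == "bgp_status":
            down.append(f"BGP {labels.get('gateway')} -> "
                        f"{labels.get('bgp_neighbor')} not established")
        # status samples with a gateway label are gateway health, not links
    return down
```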

Network Interface Guidance

Each gateway reports interface-level metrics for every network interface on the instance. Understanding which interfaces to monitor is essential for accurate dashboards and alerts.

Interfaces to Monitor

| Interface | Description | Recommendation |
|---|---|---|
| eth0 | Primary management and data interface | Always monitor: carries production traffic |
| eth1, eth2, etc. | Additional data-plane interfaces (if present) | Monitor: carries production traffic |

Interfaces to Exclude

| Interface | Description | Recommendation |
|---|---|---|
| lo | Loopback (127.0.0.1) | Exclude: internal-only, no operational value |
| tun-* | IPsec/GRE tunnel interfaces | Exclude in most cases (see Why Exclude Tunnel Interfaces? below) |

Why Exclude Tunnel Interfaces?

Traffic on tun-* interfaces is a subset of traffic already counted on the underlying eth interface. Including both leads to double-counted bandwidth in dashboards and inflated throughput numbers. For example, a packet traversing an IPsec tunnel from spoke-gw to transit-gw is counted once on eth0 (encrypted) and once on tun-abc123 (decrypted). Summing both overstates actual bandwidth consumption.
If you need per-tunnel visibility (for example, to identify which S2C tunnel is experiencing packet drops), you may collect tun-* metrics separately. In that case, do not sum them with eth interface metrics.
In Prometheus-based systems, filter at query time or drop the unwanted series at ingestion with metric_relabel_configs:
# Throughput on production interfaces only
rate_total{interface=~"eth.*"}
# Drop loopback and tunnel metrics at ingestion
metric_relabel_configs:
  - source_labels: [interface]
    regex: '(lo|tun-.*)'
    action: drop
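
The same exclusion can be applied in a custom collector before summing per-gateway throughput. A minimal sketch that mirrors the relabel regex above; Prometheus anchors relabel regexes, so fullmatch() is used to behave the same way. gateway_throughput is a hypothetical helper.

```python
import re

# Same pattern as the metric_relabel_configs rule above.
EXCLUDED_INTERFACES = re.compile(r"lo|tun-.*")


def gateway_throughput(rate_total_by_interface: dict[str, float]) -> float:
    """Sum rate_total (bits/s) over production interfaces, skipping lo and tun-*."""
    return sum(
        bits_per_sec
        for interface, bits_per_sec in rate_total_by_interface.items()
        if not EXCLUDED_INTERFACES.fullmatch(interface)
    )
```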

Matching CoPilot’s Built-in Alerts

CoPilot ships with three default alert definitions. This section maps each to the equivalent external alert you can configure in your monitoring platform.

Global Control Plane Health

CoPilot monitors Controller and CoPilot CPU, memory, and disk health. Any condition true for 15 minutes triggers the alert.

| CoPilot Condition | External Equivalent | Notes |
|---|---|---|
| CPU Usage > 90% | cpu_used_per > 90 for 15m | CoPilot 4.32+. On older versions: (100 - cpu_idle) > 90 |
| Memory Usage > 90% | memory_used_per > 90 for 15m | CoPilot 4.32+. On older versions: memory_available < (total_memory * 0.10) |
| Disk Free < 5% | Not available via API | Use CoPilot webhook |
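
Outside Prometheus, the "for 15m" part of these conditions has to be implemented by the client. A minimal sketch of one way to do it, assuming samples arrive as (unix seconds, value) pairs in time order: the condition holds only when the samples span the full window and every sample in it breaches the threshold. This approximates, rather than reproduces, a Prometheus for clause or CoPilot's own evaluation.

```python
from datetime import timedelta


def sustained(samples: list[tuple[float, float]], threshold: float,
              duration: timedelta) -> bool:
    """True if the samples cover `duration` and every sample in it exceeds threshold.

    samples: (unix_seconds, value) pairs, oldest first, e.g. cpu_used_per
    scraped every 5 minutes for one gateway.
    """
    if not samples:
        return False
    end = samples[-1][0]
    start = end - duration.total_seconds()
    if samples[0][0] > start:
        return False  # not enough history yet to judge the full window
    window = [value for ts, value in samples if ts >= start]
    return all(value > threshold for value in window)
```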

Global Network Health

CoPilot monitors all gateways and their network interfaces. Any condition true for 15 minutes triggers the alert.
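
| CoPilot Condition | External Equivalent | API |
|---|---|---|
| Gateway Status: DOWN, KEEPALIVE_FAIL, CONFIG_FAIL, UPGRADE_FAIL | status{gateway=~".+"} != 1 for 5m | Status API |
| PPS Limit Exceeded > 1% | increase(pps_limit_exceeded[15m]) > 0 | Metrics API |
| Packet Failure > 5% | Not available via API | Use CoPilot webhook |
| Memory Usage > 90% | memory_used_per > 90 for 15m | Metrics API |

With a 5-minute scrape interval, a [5m] range usually contains only one sample, and increase() needs at least two samples to return a result. The PPS expression therefore uses a 15-minute range, which also matches CoPilot's 15-minute evaluation window.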

Global Memory Swap Surge

CoPilot monitors all gateways for unexpected swap usage on instances with sufficient RAM. All conditions must be true for 15 minutes.

| CoPilot Condition | External Equivalent | API |
|---|---|---|
| Swap > 0 bytes | memory_swpd > 0 for 15m | Metrics API |
| Total Memory > 1 GB | Filter by instance type (known RAM) | External knowledge |
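
The API reports only memory_swpd, so the Total Memory > 1 GB condition needs a lookup from instance type to RAM that you maintain outside CoPilot, along with your own mapping from gateway name to instance type. A minimal sketch; the instance table holds a few illustrative AWS sizes (confirm against your provider's documentation), "1 GB" is assumed to mean 1 GiB, and swap_surge is a hypothetical helper that evaluates one sample (combine it with a 15-minute duration check).

```python
GIB = 1024**3

# Illustrative instance-type RAM table; confirm values with your cloud provider.
INSTANCE_MEMORY_BYTES = {
    "t3.micro": 1 * GIB,
    "t3.small": 2 * GIB,
    "c5.xlarge": 8 * GIB,
}


def swap_surge(memory_swpd: float, instance_type: str) -> bool:
    """Both Swap Surge conditions for one sample: swap in use, more than 1 GiB of RAM."""
    total = INSTANCE_MEMORY_BYTES.get(instance_type)
    if total is None:
        raise KeyError(f"unknown instance type: {instance_type}")
    return memory_swpd > 0 and total > 1 * GIB
```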

Tier 1 — Critical Alerts

Configure these in every deployment. They cover the most impactful failure conditions.

| Alert Name | Source | Expression | Duration | Severity |
|---|---|---|---|---|
| Gateway Down | Status API | status{gateway=~".+"} != 1 | 5 min | Critical |
| Tunnel Down | Status API | status{tunnel=~".+"} == 0 | 5 min | Critical |
| BGP Peer Down | Status API | bgp_status != 1 | 5 min | Critical |
| High CPU | Metrics API | cpu_used_per > 90 | 15 min | Critical |
| Memory Exhaustion | Metrics API | memory_used_per > 90 | 15 min | Critical |
| Swap Active | Metrics API | memory_swpd > 0 | 15 min | Warning |

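Adjust memory thresholds to your gateway instance sizes. Because memory_used_per is a percentage, it scales with instance size, but an absolute threshold on memory_available does not: 10% of a 2 GB gateway is about 200 MB, while 10% of a 16 GB gateway is about 1.6 GB.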

Tier 2 — Operational Alerts

These provide early warning for capacity issues and performance degradation.

| Alert Name | Source | Expression | Duration | Severity |
|---|---|---|---|---|
| Cloud PPS Limit Hit | Metrics API | increase(pps_limit_exceeded{interface=~"eth.*"}[15m]) > 0 | 15 min | Warning |
| Cloud BW Limit Hit | Metrics API | increase(bandwidth_ingress_limit_exceeded{interface=~"eth.*"}[15m]) > 0 | 15 min | Warning |
| Packet Drops | Metrics API | rate_pkt_drop{interface=~"eth.*"} > 0 | 5 min | Warning |
| I/O Wait | Metrics API | cpu_wait > 20 | 15 min | Warning |
| High Throughput | Metrics API | rate_total{interface="eth0"} > <80% of instance limit> | 15 min | Info |
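
Because rate_total is reported in bits per second while instance network limits are usually quoted in Gbps, the High Throughput threshold needs a unit conversion. A minimal sketch, assuming you supply the sustained (baseline) bandwidth for the instance type rather than a burst figure; both function names are hypothetical.

```python
def throughput_threshold_bps(instance_limit_gbps: float,
                             fraction: float = 0.8) -> float:
    """rate_total alert threshold in bits/s: `fraction` of the instance limit."""
    return instance_limit_gbps * 1_000_000_000 * fraction


def near_instance_limit(rate_total_bps: float, instance_limit_gbps: float) -> bool:
    """True when eth0 throughput exceeds 80% of the instance's network limit."""
    return rate_total_bps > throughput_threshold_bps(instance_limit_gbps)
```

For a 10 Gbps instance, the threshold works out to 8,000,000,000 bits/s.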

Tier 3 — CoPilot Webhook Alerts

The following conditions cannot be monitored via the external APIs. Configure these as CoPilot alert definitions with a webhook notification channel that forwards events to your monitoring platform.

| Condition | Why It Requires a Webhook |
|---|---|
| Disk Free < 5% | Disk metrics are not exposed in the Metrics API |
| Packet Failure Rate > 5% | The per_pkt_fail metric is not exposed in the Metrics API |
| Underlay Connection Down | DPD status is tracked internally but not exposed in the Status API |

To configure a webhook channel in CoPilot, navigate to Notifications > Alert Configuration > Channels and create a webhook channel pointing to your monitoring platform’s ingest URL.

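Scrape Configuration

The following Prometheus scrape configuration polls each API at its recommended interval and drops loopback and tunnel interface metrics at ingestion: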

scrape_configs:
  # Performance metrics - 5-minute resolution
  - job_name: 'aviatrix-metrics'
    scrape_interval: 5m
    scrape_timeout: 30s
    metrics_path: /metrics-api/v1/gateways
    scheme: https
    authorization:
      type: Bearer
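      credentials: '<your-api-key>'
    # If CoPilot uses a self-signed certificate (curl -k above), uncomment:
    # tls_config:
    #   insecure_skip_verify: true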
    static_configs:
      - targets: ['<copilot-address>']
    # Exclude loopback and tunnel interfaces
    metric_relabel_configs:
      - source_labels: [interface]
        regex: '(lo|tun-.*)'
        action: drop

  # Availability status - 1-minute resolution
  - job_name: 'aviatrix-status'
    scrape_interval: 1m
    scrape_timeout: 15s
    metrics_path: /status-api/v1/
    scheme: https
    authorization:
      type: Bearer
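      credentials: '<your-api-key>'
    # If CoPilot uses a self-signed certificate (curl -k above), uncomment:
    # tls_config:
    #   insecure_skip_verify: true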
    static_configs:
      - targets: ['<copilot-address>']

Data Freshness and Timing

| Aspect | Metrics API | Status API |
|---|---|---|
| Collection frequency | Every 1 minute (internal) | Every 1 minute |
| API cache resolution | 5-minute intervals | Real-time (latest cache) |
| Recommended scrape | Every 5 minutes | Every 1 minute |
| Lag vs. CoPilot alerts | ~4-5 minutes | ~1 minute |

CoPilot’s built-in alert engine evaluates metrics every 60 seconds against a real-time internal cache. External monitoring introduces a small lag:
  • Status alerts (gateway/tunnel/BGP down): Expect approximately 1-2 minutes of lag compared to CoPilot’s built-in alerting.
  • Performance alerts (CPU, memory, drops): Expect approximately 5-10 minutes of lag due to the Metrics API’s 5-minute caching window.
This lag is by design, and it is small compared with the 15-minute evaluation windows that CoPilot's built-in alerts use by default to avoid alert fatigue from transient issues. If you need more granular data, use CoPilot as a drill-down tool, configure CoPilot webhook-based alerts, or forward Syslog through the Aviatrix SIEM connector.

CoPilot Performance Page vs. Metrics API

Values shown on CoPilot’s Monitor > Performance page may not exactly match Metrics API output. This is expected — the two serve different purposes.

| Aspect | CoPilot Performance Page | Metrics API |
|---|---|---|
| Purpose | Processed insights and trend analysis | Raw data for external processing |
| Data points | Time-series of aggregated values over a selected range | Single latest raw sample per gateway |
| Aggregation | Configurable: Average (default), Min, or Max | None: returns the most recent raw data point |
| Time range | User-selectable (last hour to 60+ days) | Fixed: latest sample from last 15 minutes |
| Time resolution | Dynamic buckets (1 min for last hour, 30 min for last 24 hours, etc.) | Single snapshot rounded to 5-minute boundary |
| Metrics available | ~90 metrics including derived fields | ~24 curated metrics (4.32+) |

Aggregation is the primary cause of differences. The Performance page displays the average of all 1-minute samples within each time bucket. The Metrics API returns a single raw sample. Dashboards built from the Metrics API will appear noisier than CoPilot’s Performance charts. Your external monitoring platform should apply its own aggregation and smoothing functions.
Trends should align. If CoPilot’s Performance page shows CPU usage climbing over time, your external dashboards should show the same trend. The individual data points may differ, but the direction and magnitude of changes will be consistent.
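
The effect of aggregation is easy to reproduce. A minimal sketch, assuming fixed buckets aligned to the Unix epoch (the Performance page's exact bucketing may differ): five 1-minute CPU samples that alternate between 10% and 90% average to 42% in one 5-minute bucket, while a single raw sample, such as the last one, reads 10%. bucket_average is a hypothetical helper.

```python
from statistics import mean


def bucket_average(samples: list[tuple[int, float]],
                   bucket_seconds: int) -> dict[int, float]:
    """Average (unix_seconds, value) samples into fixed buckets keyed by bucket start."""
    buckets: dict[int, list[float]] = {}
    for ts, value in samples:
        buckets.setdefault(ts - ts % bucket_seconds, []).append(value)
    return {start: mean(values) for start, values in sorted(buckets.items())}
```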

API Coverage Reference

| Data | Metrics API | Status API | Alternative |
|---|---|---|---|
| Gateway up/down | No | Yes | |
| Tunnel up/down | No | Yes | |
| BGP peer state | No | Yes | |
| CPU utilization | Yes | No | |
| Memory utilization | Yes | No | |
| Swap usage | Yes | No | |
| Throughput | Yes | No | |
| Packet drops | Yes | No | |
| Cloud instance limits | Yes | No | |
| Disk utilization | No | No | CoPilot webhook |
| Packet failure rate | No | No | CoPilot webhook |
| Underlay connection state | No | No | CoPilot webhook |
| Memory total | No | No | Cloud provider API or known instance specs |

Resetting the API Key

You can reset the API key from CoPilot. The Network Insights API card displays on the Configuration page only if the feature has been enabled.
Resetting the key purges the old key from the system, and it cannot be retrieved. After the reset, update every script and scrape configuration that uses the old key with the new one.
  1. Navigate to Settings > Configuration > General.
  2. Scroll to the Features section.
  3. Under Network Insights API Key, click Reset API Key.
  4. Select the checkbox for “I understand the implications,” and then click Reset.
  5. Copy the key and close the confirmation window.
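
Because a reset purges the old key, scripts that read the key from a single place are quicker to update. A minimal sketch, assuming the key is supplied through an environment variable; the name COPILOT_API_KEY is hypothetical. For Prometheus, the authorization block's credentials_file setting serves the same purpose in the scrape configuration.

```python
import os

# Hypothetical variable name; use whatever your secrets tooling provides.
API_KEY_ENV = "COPILOT_API_KEY"


def load_api_key() -> str:
    """Read the CoPilot API key from the environment so a reset needs one update."""
    key = os.environ.get(API_KEY_ENV, "").strip()
    if not key:
        raise RuntimeError(
            f"{API_KEY_ENV} is not set; copy the new key from CoPilot after a reset")
    return key
```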