Global Network Health Alert

The Global Network Health alert is included by default in CoPilot. It monitors the virtual machines of all your Aviatrix Gateways.

An alert is triggered when the virtual machine of any gateway meets the configured conditions. The default trigger settings for the Global Network Health Alert include the following:

  • Condition: default is matches any condition (OR).

    If any condition is met, an alert is triggered. This can be modified.

  • Gateway Status: default values are Down, Upgrade Fail, Config Fail, or Keep Alive Fail.

    Alert is triggered if the gateway is in any of the selected states. The Gateway Status condition cannot be changed, but values can be added or deleted.

  • PPS Limit Exceeded Drop (%): default is more than 1%.

    Alert is triggered if more than 1% of bidirectional packets are dropped by the Aviatrix gateway because of the cloud service provider's (CSP) packets-per-second (PPS) rate limiting. The condition cannot be changed, but the percentage value can be modified.

  • Packet Drop (%): default is more than 5%.

    Alert is triggered if more than 5% of total bidirectional transmitted packets are dropped. The condition cannot be changed, but the percentage value can be modified.

  • Memory Used (%): default is more than 90%.

    Alert is triggered if more than 90% of the total memory is used. The condition cannot be changed, but the percentage value can be modified.

  • Evaluation Period: default is 15 minutes.

    Every 15 minutes, CoPilot evaluates the average of the metric(s) specified to trigger the alert. This setting can be modified.

    For more information about the Evaluation Period, see Setting an Evaluation Period for an Alert.

  • Minimum Count of Matching Entities: default is 1.

    If one or more entities (in this case, gateways) meet the conditions, an alert is triggered. This setting can be modified.

    For an alert configuration, if you set the minimum count of matching entities to be higher than the number of entities being monitored, no alerts will be triggered.

    To see the total number of gateways being monitored, go to the table on Cloud Fabric > Gateways > Gateway Management.
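
The way the default trigger settings above combine can be sketched in Python. This is only an illustration under stated assumptions, not CoPilot's implementation: the names GatewaySample, gateway_matches, and alert_triggered are hypothetical, and each percentage field is assumed to already hold the average over the Evaluation Period.

```python
from dataclasses import dataclass

# Hypothetical names throughout; thresholds mirror the documented defaults.
ALERT_STATUSES = {"Down", "Upgrade Fail", "Config Fail", "Keep Alive Fail"}


@dataclass
class GatewaySample:
    name: str
    status: str
    pps_limit_drop_pct: float  # assumed to be averaged over the evaluation period
    packet_drop_pct: float
    memory_used_pct: float


def gateway_matches(gw, match_all=False):
    """Return True if the gateway meets the alert conditions."""
    checks = [
        gw.status in ALERT_STATUSES,   # Gateway Status
        gw.pps_limit_drop_pct > 1.0,   # PPS Limit Exceeded Drop (%)
        gw.packet_drop_pct > 5.0,      # Packet Drop (%)
        gw.memory_used_pct > 90.0,     # Memory Used (%)
    ]
    # Default is "matches any condition (OR)"; match_all models the other operator.
    return all(checks) if match_all else any(checks)


def alert_triggered(gateways, min_count=1, match_all=False):
    """Apply the Minimum Count of Matching Entities setting."""
    matching = [gw.name for gw in gateways if gateway_matches(gw, match_all)]
    return len(matching) >= min_count, matching
```

With two monitored gateways, setting min_count to 3 in this sketch can never return True, which mirrors the note above about setting the minimum count higher than the number of monitored entities.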

All percentage-based network metrics are calculated against attempted sent + attempted received packets, where attempted = successes (packets sent normally) + failures (for example, errors and drops).
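
As a worked illustration of that denominator, the following Python sketch computes a drop percentage from four counters. The function name and parameters are hypothetical, and the choice of numerator (failures in both directions) is an assumption; the text above only defines the denominator.

```python
def drop_percentage(sent_ok, sent_failed, recv_ok, recv_failed):
    """Percentage of attempted packets that failed, in both directions.

    Hypothetical helper: attempted = success + failures, per direction.
    """
    attempted_sent = sent_ok + sent_failed
    attempted_recv = recv_ok + recv_failed
    attempted_total = attempted_sent + attempted_recv
    if attempted_total == 0:
        return 0.0  # no traffic attempted, so nothing was dropped
    # Assumption: the numerator counts failures in both directions.
    return 100.0 * (sent_failed + recv_failed) / attempted_total


# 700 failures out of 20,000 attempted packets
print(drop_percentage(9400, 600, 9900, 100))  # 3.5
```

In this example the result stays below the default Packet Drop (%) threshold of 5%, so that condition alone would not be met.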

When the alert triggers, the event is listed in CoPilot on the Monitor > Notifications > Alerts page. On the Alerts page, search for Global Network Health to see how many times the alert was triggered on each date.

If the Global Network Health alert is triggered often, it is recommended that you increase the size of the virtual machine running the gateway that triggered the alert.

You can customize the configuration or configure CoPilot to send a message to a notification channel (email or webhook) of your choice when this alert is triggered. See Configuring the Global Network Health Alert.

Configuring the Global Network Health Alert

The Global Network Health alert is configured in CoPilot as a default alert. You can modify the alert, but you cannot delete the alert configuration.

You can also configure CoPilot to send a message to a notification channel (email or webhook) when this alert is triggered.

  1. Go to the CoPilot > Monitor > Notifications page and click on Alert Configurations.

  2. In the Search box, type Global Network Health to quickly locate the relevant row in the table list and click the pencil icon in the row.

  3. In the Edit Alert dialog, do any of the following:

    • Change the OR operator if needed. By default, the alert triggers when any individual condition is met (Matches any condition (OR)). If you want the alert to trigger when all conditions are met, change the operator to Matches All Conditions (ALL).

    • Change the condition threshold of the Limit Exceeded Rate (PPS) metric. This is the number of packets per second, processed bidirectionally by the Aviatrix gateway, that exceed the maximum for the instance type. The condition threshold for packets per second can be any positive number.

    • Change the Evaluation Period CoPilot uses for triggering the alert. The period is the length of time (in minutes) over which CoPilot averages the metric(s) specified to trigger the alert. When the average value is outside the ideal operating range of your conditions, CoPilot triggers the alert. The higher the number, the more data points CoPilot includes in the average calculation.

    • Set or change the recipients (email or webhook channels) of notification messages for the alert.

    • Specify whether to send a separate notification message for each gateway that met the alert condition. If this option is off, CoPilot sends one notification message showing all gateways that met the alert condition.

    • (Reset to Defaults) Revert to default configuration values if you decide not to customize the alert’s configuration.

  4. Click Save.

CoPilot automatically resolves this alert when the alert metric(s) no longer meet the condition to trigger the alert. CoPilot resolves this alert only if the Limit Exceeded Rate (PPS) metric on the affected gateway virtual machines no longer meets the conditions.
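
The interaction between the Evaluation Period average and automatic resolution can be sketched as follows. This is a minimal illustration with assumed names (average_over_period, alert_is_open) and an assumed sample format of (minute, value) pairs; how CoPilot actually stores and windows its samples is not described here.

```python
from statistics import mean


def average_over_period(samples, period_minutes, now_minute):
    """Average the values whose timestamp falls inside the evaluation period.

    samples is a list of (minute, value) pairs; this format is an assumption.
    """
    window = [
        value
        for minute, value in samples
        if now_minute - period_minutes < minute <= now_minute
    ]
    return mean(window) if window else None


def alert_is_open(currently_open, average, threshold):
    """Open the alert when the average exceeds the threshold, resolve it otherwise."""
    if average is None:
        return currently_open  # no data in the window: leave the state unchanged
    return average > threshold


samples = [(0, 0.5), (5, 3.0), (10, 2.5), (20, 0.2), (25, 0.1)]

avg_early = average_over_period(samples, 15, now_minute=10)
state = alert_is_open(False, avg_early, threshold=1.0)   # average is high: alert opens

avg_late = average_over_period(samples, 15, now_minute=25)
state = alert_is_open(state, avg_late, threshold=1.0)    # average is low: alert resolves
```

A longer period pulls more samples into the window, so a brief spike moves the average less than it would with a short period.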