Overview
Aviatrix Controller HA in AWS uses an Auto Scaling group and a Lambda function to monitor the health of the current Controller, launch a new Controller, and restore the configuration when the active Controller instance becomes unreachable. When a new Controller is launched, the existing Controller is terminated, its EIP is associated with the newly launched Controller, and the private IP is created in the new Controller subnet. The existing configuration is restored, resulting in a seamless experience when failover happens.
Prerequisites
- Existing AVX Controller. If you have not yet launched an AVX
Controller, please see AWS Getting Started Guide.
- AMI aviatrix_cloud_services_gateway_043018_YYYY-xxxxxx or later. If you are on an older AMI, see Migrating Your Aviatrix Controller to migrate to the latest Controller AMI first (Controller > Settings > Maintenance > Backup & Restore).
- The Controller’s VPC should have one or more public subnets, preferably in different AZs for HA across multiple AZs.
- To use Controller HA with an ELB, see Load Balancing.
- Controller backups must be enabled.
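The multi-AZ prerequisite above can be checked before you launch the stack. The following is a minimal sketch, assuming you have already fetched the subnet descriptions (for example with boto3's `describe_subnets`) and passed in its `Subnets` list. The function names and sample subnet IDs are hypothetical, and the sketch does not verify that a subnet is public, which depends on its route table.

```python
from collections import defaultdict


def subnets_by_az(subnets):
    """Group subnet IDs by Availability Zone.

    `subnets` is a list of dicts shaped like the "Subnets" entries returned
    by EC2 DescribeSubnets; only SubnetId and AvailabilityZone are read.
    """
    grouped = defaultdict(list)
    for subnet in subnets:
        grouped[subnet["AvailabilityZone"]].append(subnet["SubnetId"])
    return dict(grouped)


def spans_multiple_azs(subnets):
    """True when the subnets cover at least two Availability Zones."""
    return len(subnets_by_az(subnets)) >= 2


# Hypothetical sample data for illustration only.
sample = [
    {"SubnetId": "subnet-aaa", "AvailabilityZone": "us-east-1a"},
    {"SubnetId": "subnet-bbb", "AvailabilityZone": "us-east-1b"},
]

print(subnets_by_az(sample))
print(spans_multiple_azs(sample))
```

A single subnet still satisfies the minimum requirement; the check only tells you whether the selection gives redundancy across AZs.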
Controller HA Details
Aviatrix Controller HA relies on an AWS Auto Scaling Group (ASG). This ASG has a desired capacity of 1, a minimum capacity of 0, and a maximum capacity of 1. If the Controller EC2 instance is stopped or terminated, the ASG automatically redeploys it. An AWS Lambda script is notified via SNS when the ASG launches a new instance. This script configures the new instance from a recent Controller backup file. The Aviatrix Controller manages these backups once they are enabled.
Restoring the Aviatrix Controller on a newly built instance requires access to the S3 bucket to retrieve the latest backup file. To do this, the newly built EC2 Controller instance must be granted permission to read files in the bucket. The simplest method is an IAM user with programmatic access to the S3 bucket. The Lambda script also requires access to the S3 bucket. We recommend that the backup bucket be in the same account that was used to launch the Controller.
Enabling AWS Controller High Availability
- Log in to the AWS console and switch to the region where your existing AVX Controller is installed.
- Launch a CloudFormation template stack:
- For Controller version 7.2 or later, use this CloudFormation template.
- For Controller earlier than version 7.2, use this CloudFormation template.
Template v4 creates its own execute-api VPC endpoint. If the VPC already has a centralized execute-api endpoint with PrivateDnsEnabled: true, that endpoint overrides DNS resolution and redirects traffic away from the template endpoint, causing a 403 AccessDeniedException.
In this case, you have two options to resolve the issue: either remove the centralized endpoint from the Controller VPC or opt out of it (recommended), or use template v3 with a Lambda Function URL as a fallback.
- Click Next on the Create stack page to accept the CloudFormation stack defaults:
- Prerequisite - Prepare template: Template is ready
- Specify template: Amazon S3 URL
- On the Specify Stack details page, populate the fields as follows:
| Field | Expected Value |
|---|---|
| Stack name | Any valid stack name. |
| Network Configuration | |
| Enter VPC of existing controller instance | Select the VPC in this region where the AVX Controller is installed. |
| Enter one or more subnets in different Availability zones within that VPC | Select a PUBLIC subnet of the controller VPC. Optionally, select one additional subnet for redundancy. |
| Aviatrix Controller Backup Configuration | |
| Enter Name tag of the existing Aviatrix Controller instance | Enter the Name tag for the existing Controller EC2 instance. |
| Enter the controller IAM EC2 role name | Enter the controller IAM APP role name if it is different from the default. |
| Enter S3 Bucket which will be used to store backup files. | Name of S3 bucket that stores the backup files from the AVX Controller. |
| Enter an email to receive notifications for autoscaling group events | Enter an email address that will be notified whenever a new Controller is provisioned. |
The S3 bucket you use or create for Controller HA and Backups does not need to have public access enabled and should be configured to restrict general public access.
- Click Next.
- Populate any additional CloudFormation Options.
- Click Next.
- Check “I acknowledge that AWS CloudFormation might create IAM resources with custom names.”
- Click Create.
- Refresh the Stacks page and wait for the status of this stack to change to CREATE_COMPLETE
If the stack fails (and ends with a status of ROLLBACK_COMPLETE), check
the log messages in the Events section. If you see an error that says
“Failed to create resource. AMI is not latest. Cannot enable Controller
HA. Please backup/restore to the latest AMI before enabling controller
HA.”, then follow the steps outlined
here.
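If you prefer to inspect a failed stack from a script rather than the Events section, the check can be sketched as follows. This is an illustration only: it assumes you have already retrieved the stack events (for example with boto3's `describe_stack_events`, reading its `StackEvents` list), and the function names and sample logical resource IDs are hypothetical.

```python
from datetime import datetime, timezone

# Text taken from the error message quoted in this guide.
AMI_ERROR = "AMI is not latest"


def failed_events(stack_events):
    """Return events whose status ends in _FAILED, oldest first.

    `stack_events` is a list of dicts shaped like the "StackEvents" entries
    returned by CloudFormation DescribeStackEvents.
    """
    failed = [e for e in stack_events if e["ResourceStatus"].endswith("_FAILED")]
    return sorted(failed, key=lambda e: e["Timestamp"])


def needs_ami_migration(stack_events):
    """True if any failed event carries the 'AMI is not latest' reason."""
    return any(
        AMI_ERROR in e.get("ResourceStatusReason", "")
        for e in failed_events(stack_events)
    )


# Hypothetical events; the logical resource IDs are made up for illustration.
sample_events = [
    {
        "LogicalResourceId": "ExampleStack",
        "ResourceStatus": "ROLLBACK_COMPLETE",
        "Timestamp": datetime(2024, 1, 1, 12, 5, tzinfo=timezone.utc),
    },
    {
        "LogicalResourceId": "ExampleLambdaTrigger",
        "ResourceStatus": "CREATE_FAILED",
        "ResourceStatusReason": "Failed to create resource. AMI is not latest.",
        "Timestamp": datetime(2024, 1, 1, 12, 3, tzinfo=timezone.utc),
    },
    {
        "LogicalResourceId": "ExampleTopic",
        "ResourceStatus": "CREATE_COMPLETE",
        "Timestamp": datetime(2024, 1, 1, 12, 1, tzinfo=timezone.utc),
    },
]

for event in failed_events(sample_events):
    print(event["LogicalResourceId"], "-", event.get("ResourceStatusReason", ""))

if needs_ami_migration(sample_events):
    print("Migrate the Controller to the latest AMI before enabling HA.")
```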
This CloudFormation template creates the following:
Core HA Components
- An Aviatrix Auto Scaling Group (size 1): Maintains exactly one Controller instance at all times. If the instance fails, a replacement automatically launches to ensure continuous availability
- A new security group: Creates fresh firewall rules to control network access to the Controller instance
- An SNS topic with the same name as the existing Controller instance
- An email subscription to the SNS topic (optional)
- A Lambda function for setting up HA and restoring configuration automatically
- An Aviatrix Role for Lambda with corresponding role policy and required permissions
- VPC Endpoint
- Private API Gateway: The Controller can call the Lambda function’s /controller_version endpoint privately (without going over public internet). This endpoint is restricted to access from the VPC Endpoint only
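Because template v4 creates its own execute-api VPC endpoint, it can help to check, before creating the stack, whether the Controller VPC already has an execute-api endpoint with private DNS enabled. The sketch below assumes you have fetched the endpoint descriptions yourself (for example with boto3's `describe_vpc_endpoints`, reading its `VpcEndpoints` list). The function name and sample IDs are hypothetical.

```python
def conflicting_execute_api_endpoints(endpoints, vpc_id):
    """Return IDs of execute-api endpoints in `vpc_id` with private DNS on.

    `endpoints` is a list of dicts shaped like the "VpcEndpoints" entries
    returned by EC2 DescribeVpcEndpoints.
    """
    return [
        ep["VpcEndpointId"]
        for ep in endpoints
        if ep["VpcId"] == vpc_id
        and ep["ServiceName"].endswith(".execute-api")
        and ep.get("PrivateDnsEnabled", False)
    ]


# Hypothetical sample data for illustration only.
sample_endpoints = [
    {
        "VpcEndpointId": "vpce-111",
        "VpcId": "vpc-abc",
        "ServiceName": "com.amazonaws.us-east-1.execute-api",
        "PrivateDnsEnabled": True,
    },
    {
        "VpcEndpointId": "vpce-222",
        "VpcId": "vpc-abc",
        "ServiceName": "com.amazonaws.us-east-1.s3",
        "PrivateDnsEnabled": False,
    },
    {
        "VpcEndpointId": "vpce-333",
        "VpcId": "vpc-other",
        "ServiceName": "com.amazonaws.us-east-1.execute-api",
        "PrivateDnsEnabled": True,
    },
]

print(conflicting_execute_api_endpoints(sample_endpoints, "vpc-abc"))
```

Run a check like this before deploying the template; afterwards, the endpoint created by the template itself may also appear in the results.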
Please note that if you change the Controller name or change the backup
destination bucket on S3, your Controller HA will not work as expected.
You would have to delete the Controller HA CloudFormation Stack and
redeploy it.
While the HA Controller is spinning up after the active Controller stops or is accidentally terminated, the new Controller does not appear in the AWS console for a few minutes. This is expected.
Disabling AWS Controller High Availability
You can disable Controller HA by deleting the Controller HA CloudFormation stack.
- Please take a backup from the Controller first: Go to Controller > Settings > Maintenance > Backup & Restore > Backup Now. Verify that the S3 bucket now contains these backup files.
- Check the ASG capacity first. It should be minimum capacity=0, maximum capacity=1, desired capacity=1. If these values have been changed, deleting the Controller HA CloudFormation stack could affect your current Controller.
- Log in to AWS Console, go to CloudFormation Service, identify the CloudFormation stack you used to enable Controller HA and delete the stack.
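The ASG capacity check in the steps above can be sketched as a small helper. This is only an illustration: it assumes you have already fetched the group description (for example with boto3's `describe_auto_scaling_groups`, reading one entry of its `AutoScalingGroups` list). The function name and sample group name are hypothetical.

```python
# Expected values stated in this guide: minimum 0, maximum 1, desired 1.
EXPECTED_CAPACITY = {"MinSize": 0, "MaxSize": 1, "DesiredCapacity": 1}


def capacity_mismatches(asg):
    """Return {field: (actual, expected)} for every capacity field that differs.

    `asg` is a dict shaped like one "AutoScalingGroups" entry returned by
    Auto Scaling DescribeAutoScalingGroups. An empty result means the
    capacity settings match the expected values.
    """
    return {
        field: (asg.get(field), expected)
        for field, expected in EXPECTED_CAPACITY.items()
        if asg.get(field) != expected
    }


# Hypothetical sample data for illustration only.
healthy = {"AutoScalingGroupName": "example-ha-asg",
           "MinSize": 0, "MaxSize": 1, "DesiredCapacity": 1}
modified = {"AutoScalingGroupName": "example-ha-asg",
            "MinSize": 1, "MaxSize": 1, "DesiredCapacity": 1}

print(capacity_mismatches(healthy))
print(capacity_mismatches(modified))
```

If the helper reports any mismatch, review the ASG settings before deleting the stack.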
AWS High Availability FAQ
- How can I know which version of the HA script I am running?
- How can I get notifications for H/A events?
- My H/A event failed. What can I do?
- How do I ensure that lambda is pointing to the right backup?
- Where do I find logs related to Controller H/A?
- How do I make lambda talk to the Controller privately within the VPC?
- Can two Controllers in two different regions be linked such that they can detect if one or the other is down? Is this possible?
- Could a Controller in a different region be used to restore a saved configuration in case of disaster recovery? Will the change in the Controller’s IP cause any issues?
- How do I manage the Controller HA stack if the controller instance’s disk is encrypted?
- What do I need to do after I change the Controller name?
- How do I update from CloudFormation template version 3 to version 4?
- Direct stack update (recommended): Click Update Stack in the CloudFormation console and apply the v4 template. This is faster than deleting and recreating the stack.
- Full replacement: If there are changes to the controller configuration, delete the current stack and create a new one with the v4 template.