Controller HA in AWS¶
Aviatrix Controller HA in AWS uses an Auto Scaling group and a Lambda function to monitor the health of the current Controller, launch a new Controller, and restore the configuration when the active Controller instance becomes unreachable.
When a new Controller is launched, the existing Controller is terminated and its EIP is associated with the newly launched Controller. The existing configuration is then restored, so failover is seamless.
Prerequisites¶
- An existing AVX Controller. If you have not yet launched an AVX Controller, please follow this guide.
- The Controller’s VPC should have one or more public subnets, preferably in different AZs for HA across multiple AZs.
- To use Controller HA with an ELB, see here.
- The Controller must have the backup function enabled.
Controller HA Details¶
Aviatrix Controller HA operates by relying on an AWS Auto Scaling Group. This ASG has a desired capacity of 1. If the Controller EC2 instance is stopped or terminated, it will be automatically re-deployed by the ASG.
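As a rough illustration of a single-instance Auto Scaling Group, the sketch below builds the keyword arguments that could be passed to the boto3 `create_auto_scaling_group` call. It is not the configuration the Aviatrix stack actually creates: the group name, the launch template name, and the choice to pin `MinSize` and `MaxSize` to 1 alongside the desired capacity are all assumptions made for the example.

```python
def single_instance_asg_kwargs(group_name, launch_template_name, subnet_ids):
    """Build arguments for an Auto Scaling Group that keeps exactly one instance.

    All names passed in are hypothetical; nothing here is taken from the
    Aviatrix CloudFormation template.
    """
    if not subnet_ids:
        raise ValueError("at least one subnet is required")
    return {
        "AutoScalingGroupName": group_name,
        "LaunchTemplate": {
            "LaunchTemplateName": launch_template_name,
            "Version": "$Latest",
        },
        # Pinning min, max, and desired to 1 means a stopped or terminated
        # instance is replaced, but a second one is never added.
        "MinSize": 1,
        "MaxSize": 1,
        "DesiredCapacity": 1,
        # Subnets in different AZs let the replacement land in another AZ.
        "VPCZoneIdentifier": ",".join(subnet_ids),
    }


if __name__ == "__main__":
    print(single_instance_asg_kwargs(
        "example-controller-ha", "example-controller-template",
        ["subnet-aaaa1111", "subnet-bbbb2222"],
    ))
```

Listing subnets from more than one Availability Zone is what allows the replacement instance to come up in a different AZ if the original one is impaired.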
An AWS Lambda script is notified via SNS when new instances are launched by the Auto Scaling Group. This script restores the configuration from a recent Controller backup file. Once backups are enabled, the Aviatrix Controller manages them.
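To make the notification flow more concrete, here is a minimal sketch of how a Lambda handler might pick out instance launches from an SNS-delivered Auto Scaling notification. It is not the Aviatrix script. The function names are hypothetical, and the sketch assumes the standard Auto Scaling notification body, in which the SNS `Message` field is a JSON string carrying `Event` and `EC2InstanceId`.

```python
import json

LAUNCH_EVENT = "autoscaling:EC2_INSTANCE_LAUNCH"


def launched_instance_ids(event):
    """Return instance IDs from Auto Scaling launch notifications in an SNS event."""
    instance_ids = []
    for record in event.get("Records", []):
        raw = record.get("Sns", {}).get("Message", "")
        try:
            message = json.loads(raw)
        except (TypeError, ValueError):
            # Not a JSON payload (for example a test message); skip it.
            continue
        if not isinstance(message, dict):
            continue
        if message.get("Event") == LAUNCH_EVENT and "EC2InstanceId" in message:
            instance_ids.append(message["EC2InstanceId"])
    return instance_ids


def lambda_handler(event, context):
    """Hypothetical entry point: report which new instances need a restore."""
    ids = launched_instance_ids(event)
    # A real handler would restore the latest backup onto each new instance here.
    return {"launched": ids}
```

Termination notifications and non-JSON messages are ignored, so only a freshly launched instance would trigger the restore step.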
Restoring the Aviatrix Controller from a newly built instance requires access to the S3 bucket to retrieve the latest backup file. In order to do this, the newly built EC2 Controller instance must be granted permission to read files in the bucket. The simplest method of doing this is via an IAM user with programmatic access to the S3 bucket.
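A read-only grant of this kind could look roughly like the policy document built below. This is a sketch, not the policy the HA stack attaches: the function name and the `Sid` values are made up for the example, and it assumes that listing the bucket plus `s3:GetObject` on its objects is enough to locate and download the latest backup.

```python
import json


def backup_read_policy(bucket_name):
    """Build an IAM policy document granting read-only access to one S3 bucket."""
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing is granted on the bucket itself.
                "Sid": "ListBackupBucket",
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [bucket_arn],
            },
            {
                # Reading is granted on the objects inside the bucket.
                "Sid": "ReadBackupObjects",
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": [f"{bucket_arn}/*"],
            },
        ],
    }


if __name__ == "__main__":
    # "example-backup-bucket" is a placeholder bucket name.
    print(json.dumps(backup_read_policy("example-backup-bucket"), indent=2))
```

The two statements are separate because `s3:ListBucket` applies to the bucket ARN while `s3:GetObject` applies to object ARNs under it.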
Steps to Enable Controller HA¶
Launch CloudFormation Stack¶
Log in to the AWS console and switch to the region where your existing AVX Controller is installed.
Launch this CloudFormation stack.
Populate the fields as follows:
| Field | Expected Value |
| --- | --- |
| Stack name | Any valid stack name. |
| Enter VPC of existing controller instance. | Select the VPC in this region where the AVX Controller is installed. |
| Enter one or more subnets in different Availability zones within that VPC. | Select the subnet where the Controller is installed and optionally one additional subnet for redundancy. |
| Enter Name tag of the existing Aviatrix Controller instance. | Enter the Name tag for the existing Controller EC2 instance. |
| Enter S3 Bucket which will be used to store backup files. | Name of S3 bucket that stores the backup files from the AVX Controller. |
| Enter an email to receive notifications for autoscaling group events | Enter an email address that will be notified whenever a new Controller is provisioned. |
Populate any additional CloudFormation Options.
Check “I acknowledge that AWS CloudFormation might create IAM resources with custom names.”
Refresh the Stacks page and wait for the status of this stack to change to CREATE_COMPLETE.
If the stack fails (and ends with a status of ROLLBACK_COMPLETE), check the log messages in the Events section. If you see an error that says “Failed to create resource. AMI is not latest. Cannot enable Controller HA. Please backup/restore to the latest AMI before enabling controller HA.”, then follow the steps outlined here.
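If you prefer to script the wait instead of refreshing the console, the status check could be sketched as follows. The helper names are hypothetical, and `get_status` stands for any callable you supply that returns the current stack status string (for example, a wrapper around a CloudFormation describe call). Treating CREATE_FAILED and ROLLBACK_FAILED as failures in addition to ROLLBACK_COMPLETE is an assumption of this sketch.

```python
import time


def classify_stack_status(status):
    """Map a CloudFormation stack status to 'ready', 'failed', or 'pending'."""
    if status == "CREATE_COMPLETE":
        return "ready"
    if status in ("ROLLBACK_COMPLETE", "ROLLBACK_FAILED", "CREATE_FAILED"):
        return "failed"
    # CREATE_IN_PROGRESS, ROLLBACK_IN_PROGRESS, and anything else: keep waiting.
    return "pending"


def wait_for_stack(get_status, poll_seconds=15, max_polls=80, sleep=time.sleep):
    """Poll get_status() until the stack is ready or failed, or polls run out.

    Returns a tuple of (outcome, last_status_seen).
    """
    status = None
    for _ in range(max_polls):
        status = get_status()
        outcome = classify_stack_status(status)
        if outcome != "pending":
            return outcome, status
        sleep(poll_seconds)
    return "pending", status
```

On a `failed` outcome, the Events section of the stack remains the place to look for the underlying error message.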
This stack creates the following:
- An Auto Scaling group of size 1 and an associated security group
- An SNS topic with the same name as the existing Controller instance
- An email subscription to the SNS topic (optional)
- A Lambda function for setting up HA and restoring configuration automatically
- An IAM role for Lambda and a corresponding role policy with the required permissions
Additional instructions and code are available here.
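For scripted deployments, the values entered in the stack form could be assembled into the `ParameterKey`/`ParameterValue` list that the CloudFormation API accepts. The sketch below only shows the shape of that list. The parameter keys (`VpcId`, `SubnetIds`, and so on) are hypothetical placeholders, not the keys defined in the Aviatrix template, so check the template before reusing them.

```python
def build_stack_parameters(vpc_id, subnet_ids, controller_name_tag,
                           backup_bucket, notification_email=""):
    """Build a CloudFormation parameter list from the stack form fields.

    The keys used here are illustrative placeholders only.
    """
    if not subnet_ids:
        raise ValueError("at least one subnet is required")
    values = {
        "VpcId": vpc_id,
        # List-type parameters are passed as one comma-separated string.
        "SubnetIds": ",".join(subnet_ids),
        "ControllerNameTag": controller_name_tag,
        "BackupBucket": backup_bucket,
        "NotificationEmail": notification_email,
    }
    return [
        {"ParameterKey": key, "ParameterValue": value}
        for key, value in values.items()
    ]


if __name__ == "__main__":
    for parameter in build_stack_parameters(
        "vpc-0abc1234", ["subnet-aaaa1111", "subnet-bbbb2222"],
        "ExampleController", "example-backup-bucket", "ops@example.com",
    ):
        print(parameter)
```

Because the stack creates IAM resources with custom names, an API-driven launch would also need to acknowledge that capability, just as the console checkbox does.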
Steps to Disable Controller HA¶
You can disable Controller HA by deleting the Controller HA CloudFormation stack.
Log in to the AWS Console, go to the CloudFormation service, identify the CloudFormation stack you used to enable Controller HA, and delete it.
FAQ¶
- Can two Controllers in two different regions be linked so that each can detect when the other is down?
  - The Controller HA script relies on EC2 Auto Scaling, which does not work across regions but does work across AZs. The script automatically brings up a new Controller if the existing Controller enters an unhealthy state.
- Could a Controller in a different region be used to restore a saved configuration for disaster recovery? Will the change in the Controller’s IP cause any issues?
  - A Controller can be manually launched in a different region and the backed-up configuration restored on it. The Controller’s new EIP should not cause any issues unless SAML VPN authentication is being used (all peering tunnels will still work). In that case, the SAML VPN client needs to reach the Controller’s IP address. If an FQDN hostname is used for the Controller for SAML, it should work once the Route 53 record is changed to resolve to the correct EIP in the other region.
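The Route 53 change mentioned in the last answer amounts to pointing the Controller's A record at the new EIP. A minimal sketch of the change batch is shown below, in the shape accepted by the Route 53 `ChangeResourceRecordSets` API. The function name, the hostname, the TTL of 60 seconds, and the sample address are assumptions for the example.

```python
import ipaddress


def controller_dns_change(fqdn, new_eip, ttl=60):
    """Build a Route 53 change batch that points fqdn at new_eip."""
    # Raises ValueError if new_eip is not a well-formed IPv4 address.
    ipaddress.IPv4Address(new_eip)
    return {
        "Comment": "Point Controller hostname at the restored Controller",
        "Changes": [
            {
                # UPSERT creates the record if absent, otherwise replaces it.
                "Action": "UPSERT",
                "ResourceRecordSet": {
                    "Name": fqdn,
                    "Type": "A",
                    "TTL": ttl,
                    "ResourceRecords": [{"Value": new_eip}],
                },
            }
        ],
    }


if __name__ == "__main__":
    # Placeholder hostname and documentation-range IP address.
    print(controller_dns_change("controller.example.com", "203.0.113.10"))
```

A short TTL keeps the delay before SAML VPN clients resolve the new address small, at the cost of more frequent DNS lookups.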