Using the Neptune Blue/Green solution to perform blue-green updates - Amazon Neptune

Amazon Neptune engine upgrades can require application downtime because the database is unavailable while the updates are being installed and verified. This is true whether they are initiated manually or automatically.

Neptune provides a Blue/Green deployment solution that you can run using an Amazon CloudFormation stack and that greatly reduces such downtime. It creates a green staging environment that is synchronized with your blue production environment. You can then update that staging environment to perform a minor or major engine version upgrade, a graph data model change, or an operating-system update, and test the result. Finally, you can switch it over quickly to become your production environment, with very little downtime.

The Neptune Blue/Green solution goes through two phases, as illustrated in this diagram:

High-level flow diagram of the blue-green deployment strategy

Phase 1 creates a Green DB cluster identical to your production cluster

The solution creates a DB cluster with a unique blue/green deployment identifier and with the same cluster topology as your production cluster. That is, it has the same number and sizes of DB instances, the same parameter groups, and the same configuration as the production (blue) DB cluster, except that it has been upgraded to the target engine version that you specified, which must be higher than your current (blue) engine version. The target can be either a minor or a major engine version. If necessary, the solution performs any intermediate upgrades required to reach the specified target engine version. This new cluster becomes the green staging environment.

Phase 2 sets up continuous data synchronization

After the green environment has been fully prepared, the solution sets up continuous replication between the source (blue) cluster and the target (green) cluster using Neptune streams. When the replication difference between them reaches zero, the staging environment is ready for testing. At that point you must pause writing to the blue cluster to avoid any further replication lag.

Your target engine version may have new features or dependencies that affect your applications. Check the target engine release page and intervening engine release pages under Engine releases to see what has changed since your current engine version. It's best to run integration tests or verify your applications manually on the green cluster before promoting it to the production environment.

After you have tested and qualified the changes in the green cluster, just switch the database endpoint in your applications from the blue to the green cluster.

After switchover, the Neptune Blue/Green solution does not delete the old blue production environment. You will still have access to it for additional validation and testing if needed. Standard billing charges do apply to its instances until you delete them. The Blue/Green solution also uses other Amazon services, the costs for which are billed at normal prices. Details on deleting the solution when you're done with it are covered in the clean up section.

Prerequisites for running the Neptune Blue/Green stack

Before launching the Neptune Blue/Green stack:

  • Be sure to enable Neptune streams on your production (blue) cluster.

  • All the instances in your blue cluster must be in the available state. You can check instance states in the Neptune console or by using the describe-db-instances API.

  • All instances must also be in sync with the DB cluster parameter group.

  • The Neptune Blue/Green solution requires a DynamoDB VPC endpoint in the VPC where your blue cluster is located. See Using Amazon VPC endpoints to access DynamoDB.

  • Choose a time to run the solution when the write workload on your blue production DB cluster will be as light as possible. Avoid, for example, running the solution when a bulk load will be taking place, or when there's likely to be a large number of write operations for any other reason.
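The instance-state, parameter-group, and streams checks in the list above can also be scripted. The following is a minimal sketch, not part of the solution itself. It assumes that `neptune` behaves like a boto3 Neptune client, that streams are controlled by the `neptune_streams` cluster parameter with a value of 1 meaning enabled, and that the parameter list fits in a single response page. The function name `check_prerequisites` is hypothetical, and the DynamoDB VPC endpoint requirement is not checked here.

```python
def check_prerequisites(neptune, cluster_id):
    """Return a list of problems found; an empty list means these checks passed.

    `neptune` is assumed to behave like boto3.client("neptune").
    """
    problems = []

    cluster = neptune.describe_db_clusters(
        DBClusterIdentifier=cluster_id
    )["DBClusters"][0]

    # Every instance must be in sync with the DB cluster parameter group.
    for member in cluster["DBClusterMembers"]:
        status = member.get("DBClusterParameterGroupStatus")
        if status != "in-sync":
            problems.append(
                f"{member['DBInstanceIdentifier']}: parameter group status is {status!r}"
            )

    # Every instance in the blue cluster must be in the available state.
    instances = neptune.describe_db_instances(
        Filters=[{"Name": "db-cluster-id", "Values": [cluster_id]}]
    )["DBInstances"]
    for instance in instances:
        if instance["DBInstanceStatus"] != "available":
            problems.append(
                f"{instance['DBInstanceIdentifier']}: state is "
                f"{instance['DBInstanceStatus']!r}"
            )

    # Neptune streams must be enabled (assumption: neptune_streams == "1").
    # Only the first page of parameters is inspected in this sketch.
    parameters = neptune.describe_db_cluster_parameters(
        DBClusterParameterGroupName=cluster["DBClusterParameterGroup"]
    )["Parameters"]
    streams_value = next(
        (p.get("ParameterValue") for p in parameters
         if p["ParameterName"] == "neptune_streams"),
        None,
    )
    if streams_value != "1":
        problems.append("neptune_streams is not set to 1 on the cluster parameter group")

    return problems

# Example usage (requires boto3 and AWS credentials; the cluster ID is a placeholder):
#   import boto3
#   for problem in check_prerequisites(boto3.client("neptune"), "my-blue-cluster"):
#       print(problem)
```

Passing the client in as an argument keeps the checks easy to exercise against a stub before pointing them at a real cluster.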

Using an Amazon CloudFormation template to run the Neptune Blue/Green solution

You can use Amazon CloudFormation to deploy the Neptune Blue/Green solution. The CloudFormation template creates an Amazon EC2 instance in the same VPC as your blue source Neptune database, installs the solution there, and runs it. You can monitor its progress in CloudWatch logs, as explained in Monitoring progress.

You can use these links to review the solution template, or select the Launch Stack button to launch it in the Amazon CloudFormation console:

In the console, choose the Amazon region where you want to run the solution from the dropdown at the upper right of the window.

Set the stack parameters as follows:

  • DeploymentID   –   An identifier that is unique to each Neptune Blue/Green deployment.

    It is used as the green DB cluster identifier, and as a prefix for naming new resources created during the deployment.

  • NeptuneSourceClusterId   –   The identifier of the blue DB cluster that you want to upgrade.

  • NeptuneTargetClusterVersion   –   The Neptune engine version that you want to upgrade the blue DB cluster to.

    This must be higher than the current blue DB cluster's engine version.

  • DeploymentMode   –   Indicates whether this is a new deployment or an attempt to resume a previous deployment. When you are using the same DeploymentID as a previous deployment, set DeploymentMode to resume.

    Valid values are: new (the default), and resume.

  • GraphQueryType   –   The graph data type for your database.

    Valid values are: propertygraph (the default), and rdf.

  • SubnetId   –   A subnet ID from the same VPC that your blue DB cluster is located in (see Connecting to a Neptune DB Cluster from an Amazon EC2 instance in the same VPC).

    Provide the ID of a public subnet if you want to SSH to the instance through EC2 Connect.

  • InstanceSecurityGroup   –   A security group for your Amazon EC2 instance.

    The security group must have access to your blue DB cluster, and you must be able to SSH to the instance. See Create a security group using the VPC console.
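If you launch the stack from a script rather than the console, the parameters above can be assembled and validated first. This is a minimal sketch: the parameter keys and valid values come from the list above, while the helper name `build_stack_parameters` and all example values are hypothetical. The template location is not reproduced here, so the usage comment leaves it as a placeholder that you must supply.

```python
VALID_DEPLOYMENT_MODES = {"new", "resume"}
VALID_GRAPH_QUERY_TYPES = {"propertygraph", "rdf"}


def build_stack_parameters(deployment_id, source_cluster_id, target_version,
                           subnet_id, security_group_id,
                           deployment_mode="new", graph_query_type="propertygraph"):
    """Build the Parameters list in the shape CloudFormation's CreateStack expects."""
    if deployment_mode not in VALID_DEPLOYMENT_MODES:
        raise ValueError(f"DeploymentMode must be one of {sorted(VALID_DEPLOYMENT_MODES)}")
    if graph_query_type not in VALID_GRAPH_QUERY_TYPES:
        raise ValueError(f"GraphQueryType must be one of {sorted(VALID_GRAPH_QUERY_TYPES)}")

    values = {
        "DeploymentID": deployment_id,
        "NeptuneSourceClusterId": source_cluster_id,
        "NeptuneTargetClusterVersion": target_version,
        "DeploymentMode": deployment_mode,
        "GraphQueryType": graph_query_type,
        "SubnetId": subnet_id,
        "InstanceSecurityGroup": security_group_id,
    }
    return [{"ParameterKey": key, "ParameterValue": value}
            for key, value in values.items()]

# Example usage (requires boto3 and AWS credentials; all values are placeholders):
#   import boto3
#   cfn = boto3.client("cloudformation")
#   cfn.create_stack(
#       StackName="my-blue-green-stack",
#       TemplateURL=TEMPLATE_URL,  # placeholder: the solution template's URL
#       Parameters=build_stack_parameters(
#           "bg-deploy-1", "my-blue-cluster", "1.3.0.0", "subnet-0abc", "sg-0abc"),
#   )
```

Depending on what the template creates, the `create_stack` call may also need a `Capabilities` argument; check the template before relying on this sketch.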

Wait until the stack is complete. As soon as it's done, the solution starts. You can then monitor the deployment process using CloudWatch logs, as described in the next section.

Monitoring the progress of a Neptune Blue/Green deployment

You can monitor the progress of the Neptune Blue/Green solution by going to the CloudWatch console and looking at logs in the /aws/neptune/(Neptune Blue/Green deployment ID) CloudWatch log group. You can find a link to the CloudWatch logs in the outputs of the solution's Amazon CloudFormation stack:

Screenshot of the Blue/Green Amazon CloudFormation stack output

If you provided a public subnet as a stack parameter, you can also SSH to your Amazon EC2 instance created as part of the stack and refer to the log in /var/log/cloud-init-output.log.
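The same log group can also be read programmatically instead of through the console. The sketch below assumes that `logs` behaves like a boto3 CloudWatch Logs client; the helper names are hypothetical, and only the first page of results is returned.

```python
def log_group_name(deployment_id):
    """Log group used by the solution: /aws/neptune/<deployment ID>."""
    return f"/aws/neptune/{deployment_id}"


def recent_messages(logs, deployment_id, start_time_ms, limit=50):
    """Return log messages written at or after start_time_ms (epoch milliseconds).

    `logs` is assumed to behave like boto3.client("logs").
    """
    response = logs.filter_log_events(
        logGroupName=log_group_name(deployment_id),
        startTime=start_time_ms,
        limit=limit,
    )
    return [event["message"] for event in response["events"]]

# Example usage (requires boto3 and AWS credentials; the deployment ID is a placeholder):
#   import time, boto3
#   ten_minutes_ago = int((time.time() - 600) * 1000)
#   for line in recent_messages(boto3.client("logs"), "bg-deploy-1", ten_minutes_ago):
#       print(line)
```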

The log shows the actions taken by the Neptune Blue/Green solution, as shown in this screenshot:

Screenshot of the Neptune Blue/Green log screen

Log messages show the sync status between the blue and green clusters:

Screenshot of Neptune Blue/Green solution log messages

The sync process checks the replication lag by computing the difference between the latest stream eventID on the blue cluster and the replication checkpoint present in the DynamoDB checkpoint table created by the Neptune-to-Neptune replication stack. Using these messages, you can monitor the current replication difference.
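The lag calculation described above amounts to a subtraction. The following sketch illustrates it under two stated assumptions: the latest event ID is taken from the `lastEventId.commitNum` field of a Neptune streams response, and the checkpoint is available as a plain commit number. How the checkpoint is stored in the DynamoDB table is not described here, so reading it is left to the caller. Both function names are hypothetical, and the operation number within a commit is ignored for simplicity.

```python
def latest_commit_num(stream_response):
    """Extract the commit number of the last event from a streams response dict."""
    return stream_response["lastEventId"]["commitNum"]


def replication_lag(blue_latest_commit_num, checkpoint_commit_num):
    """Number of commits the green cluster still has to replicate."""
    return blue_latest_commit_num - checkpoint_commit_num

# Example with made-up values:
#   response = {"lastEventId": {"commitNum": 120, "opNum": 1}}
#   replication_lag(latest_commit_num(response), 117)
```

A result of zero corresponds to the state in which the green cluster has caught up with the blue cluster.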

Cutting over from the production blue cluster to the updated green cluster

Before promoting the green cluster to production, ensure that the commit difference between the blue and green clusters is zero and then disable all write traffic to the blue cluster. Continuing to write to the blue cluster while switching the database endpoint to the green cluster can result in data corruption caused by writing partial data to both clusters. You may not need to disable read traffic yet.

If you have enabled IAM authentication on the source (blue) cluster, be sure to update any IAM policies used in your applications to point to the green cluster (for an example of such a policy, see this unrestricted access policy).

After disabling write traffic, wait for replication to finish and then enable write traffic on the green cluster (but not on the blue cluster). Switch read traffic from the blue to the green cluster as well.
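The "wait for replication to finish" step can be expressed as a simple polling loop. This is a sketch only: `get_lag` is a hypothetical caller-supplied function that returns the current replication difference, and the default poll interval and timeout are arbitrary choices rather than recommended values.

```python
import time


def wait_for_zero_lag(get_lag, poll_seconds=5.0, timeout_seconds=600.0,
                      sleep=time.sleep, clock=time.monotonic):
    """Poll get_lag() until it returns 0.

    Returns True once the lag is zero, or False if timeout_seconds elapses first.
    `sleep` and `clock` are injectable so the loop can be tested without waiting.
    """
    deadline = clock() + timeout_seconds
    while True:
        if get_lag() == 0:
            return True
        if clock() >= deadline:
            return False
        sleep(poll_seconds)

# Example usage, after write traffic to the blue cluster has been disabled:
#   if wait_for_zero_lag(my_lag_function):   # my_lag_function is a placeholder
#       ...  # enable writes on the green cluster and switch read traffic
#   else:
#       ...  # investigate before cutting over
```

Returning False on timeout, rather than raising, leaves the decision about whether to proceed with the cutover to the operator.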

Cleaning up after the Neptune Blue/Green solution has completed

After you have promoted the staging (green) cluster to production, clean up the resources created by the Neptune Blue/Green solution:

  • Delete the Amazon EC2 instance that was created to run the solution.

  • Delete the Amazon CloudFormation stacks for the Neptune streams-based replication that kept the green cluster in sync with the blue cluster. The main one has the stack name that you provided earlier, and the other is named with the deployment ID followed by "-replication": that is, (DeploymentID)-replication.

Deleting the Amazon CloudFormation stacks doesn't delete the clusters themselves. Once you have verified that the green cluster is working as expected, you can optionally take a snapshot before manually deleting the blue cluster.
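The CloudFormation part of the cleanup can be scripted as well. The sketch below derives the two names described above and requests their deletion. It assumes that `cfn` behaves like a boto3 CloudFormation client; the helper names are hypothetical. It does not wait for deletion to finish, and it does not touch either DB cluster.

```python
def solution_stack_names(main_stack_name, deployment_id):
    """The main stack plus the replication stack named <DeploymentID>-replication."""
    return [main_stack_name, f"{deployment_id}-replication"]


def delete_solution_stacks(cfn, main_stack_name, deployment_id):
    """Request deletion of both stacks and return the names that were submitted.

    `cfn` is assumed to behave like boto3.client("cloudformation").
    """
    names = solution_stack_names(main_stack_name, deployment_id)
    for name in names:
        cfn.delete_stack(StackName=name)
    return names

# Example usage (requires boto3 and AWS credentials; names are placeholders):
#   import boto3
#   delete_solution_stacks(boto3.client("cloudformation"),
#                          "my-blue-green-stack", "bg-deploy-1")
```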

Neptune Blue/Green solution best practices

  • Before switching your green cluster over to production, it is worth thoroughly verifying that it is functioning properly. Check the consistency of the data and the configuration of the database. It is possible that some of the new engine versions require client upgrades as well. Check the engine release notes before you upgrade. It is worth testing all this in development, testing, and pre-production environments before starting a blue/green upgrade in production.

  • It is best to perform the switch-over from the blue to the green cluster during your maintenance window.

  • To ensure that everything is working properly after upgrading and synchronizing, it's worth keeping your original cluster for some period of time before deleting it. It could prove useful if an unforeseen issue arises.

  • Avoid heavy write operations such as bulk loads when running the Neptune Blue/Green solution, because they can cause replication lag that introduces significant downtime. Ideally, the time between turning off writes to your blue cluster and turning them on for your green cluster is just a few moments.

Troubleshooting the Neptune Blue/Green solution

Errors raised by the Neptune Blue/Green solution
  • Cluster with id = (blue_green_deployment_id) already exists   –   There is an existing cluster with identifier (blue_green_deployment_id).

    Provide a new deployment ID or set the deployment mode to resume if the cluster was created in a previous Neptune Blue/Green run.

  • Streams should be enabled on the source Cluster for Blue Green Deployment   –   Enable Neptune streams on the blue (source) cluster.

  • No Bulkload should be in progress on source cluster: (cluster_id)   –   The Neptune Blue/Green solution terminates if it identifies an ongoing bulk load.

    This is to ensure that the sync process is able to catch up with writes being made. Avoid or cancel any ongoing bulk load job before starting the Neptune Blue/Green solution.

  • Blue Green deployment requires instances to be in sync with db cluster parameter group   –   Any changes to the cluster parameter group must be in sync across all instances in the DB cluster. See Amazon Neptune parameter groups.

  • Invalid target engine version for Blue Green Deployment   –   The target engine version must be listed as active in Engine releases for Amazon Neptune, and must be higher than the current engine release of the source (blue) cluster.
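The condition behind the last error can be checked before launching the stack. The sketch below assumes that engine versions are dotted numeric strings such as 1.2.1.0, and that the set of active versions is supplied by the caller (for example, copied from the engine releases page). The function names are hypothetical.

```python
def parse_engine_version(version):
    """Turn a dotted numeric version such as "1.2.1.0" into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))


def is_valid_upgrade_target(current_version, target_version, active_versions):
    """True if the target is listed as active and is higher than the current version."""
    return (
        target_version in active_versions
        and parse_engine_version(target_version) > parse_engine_version(current_version)
    )

# Example with made-up version numbers:
#   is_valid_upgrade_target("1.2.1.0", "1.3.0.0", {"1.2.1.0", "1.3.0.0"})
```

Comparing tuples of integers rather than raw strings avoids ordering mistakes when a version component has more than one digit.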