Best Practices for using Read Replicas - Amazon ElastiCache
Services or capabilities described in Amazon Web Services documentation might vary by Region. To see the differences applicable to the China Regions, see Getting Started with Amazon Web Services in China (PDF).

Best Practices for using Read Replicas

Many applications, such as session stores, leaderboards, and recommendation engines, require high availability and handle significantly more read operations than write operations. These applications can often tolerate slightly stale data (eventual consistency), meaning that it's acceptable if different users momentarily see slightly different versions of the same data. For example:

  • Cached query results can often tolerate slightly stale data, especially for cache-aside patterns where the source of truth is external.

  • In a gaming leaderboard, a few seconds delay in updated scores often won't significantly impact the user experience.

  • For session stores, some slight delays in propagating session data across replicas rarely affect application functionality.

  • Recommendation engines typically use historical data analysis, so real-time consistency is less critical.

Eventual consistency means that all replica nodes will eventually return the same data once the replication process is complete, typically within milliseconds. For such use cases, implementing read replicas is an effective strategy to reduce latency when reading from your ElastiCache instance.

Using read replicas in Amazon ElastiCache can provide significant performance benefits through:

Enhanced Read Scalability

  • Distributes read operations across multiple replica nodes

  • Offloads read traffic from the primary node

  • Reduces read latency by serving requests from geographically closer replicas

Optimized Primary Node Performance

  • Dedicates primary node resources to write operations

  • Reduces connection overhead on the primary node

  • Improves write performance and maintains better response times during peak traffic periods

Using Read from Replica in ElastiCache Serverless

ElastiCache serverless provides two different endpoints, for different consistency requirements. The two endpoints use the same DNS name but different ports. In order to use the read-from-replica port, you must authorize access to both ports from your client application by configuring the security groups and network access control lists of your VPC.

Primary endpoint (Port 6379)

  • Use for operations requiring immediate consistency

  • Guarantees reading the most up-to-date data

  • Best for critical transactions and write operations

  • Necessary for write operations

  • Example: test-12345.serverless.use1.cache.amazonaws.com:6379

Latency optimized endpoint (Port 6380)

  • Optimized for read operations that can tolerate eventual consistency

  • When possible, ElastiCache serverless automatically routes read requests to the replica node in the client's local Availability Zone. This optimization provides lower latency by avoiding the additional network latency incurred when retrieving data from a node in a different availability zone.

  • ElastiCache serverless automatically selects available nodes in other zones if a local node is unavailable

  • Example: test-12345.serverless.use1.cache.amazonaws.com:6380

  • Clients like Glide and Lettuce will automatically detect and route reads to the latency optimized endpoint if you provide the read from replica configuration. If your client doesn’t support routing configuration (e.g., valkey-java and older jedis versions), you must define the right port and client configuration to read from replicas.

Connecting to read replicas in ElastiCache Serverless - Valkey and Glide

The following code snippet shows how you can configure read from replica for ElastiCache Serverless in the Valkey glide library. You don’t need to specify port for read from replicas, but you need to configure the routing configuration ReadFrom.PREFER_REPLICA.

package glide.examples; import glide.api.GlideClusterClient; import glide.api.logging.Logger; import glide.api.models.configuration.GlideClusterClientConfiguration; import glide.api.models.configuration.NodeAddress; import glide.api.models.exceptions.ClosingException; import glide.api.models.exceptions.ConnectionException; import glide.api.models.exceptions.TimeoutException; import glide.api.models.configuration.ReadFrom; import java.util.concurrent.CompletableFuture; import java.util.concurrent.ExecutionException; public class ClusterExample { public static void main(String[] args) { // Set logger configuration Logger.setLoggerConfig(Logger.Level.INFO); GlideClusterClient client = null; try { System.out.println("Connecting to Valkey Glide..."); // Configure the Glide Client GlideClusterClientConfiguration config = GlideClusterClientConfiguration.builder() .address(NodeAddress.builder() .host("your-endpoint") .port(6379) .build()) .useTLS(true) .readFrom(ReadFrom.PREFER_REPLICA) .build(); // Create the GlideClusterClient client = GlideClusterClient.createClient(config).get(); System.out.println("Connected successfully."); // Perform SET operation CompletableFuture<String> setResponse = client.set("key", "value"); System.out.println("Set key 'key' to 'value': " + setResponse.get()); // Perform GET operation CompletableFuture<String> getResponse = client.get("key"); System.out.println("Get response for 'key': " + getResponse.get()); // Perform PING operation CompletableFuture<String> pingResponse = client.ping(); System.out.println("PING response: " + pingResponse.get()); } catch (ClosingException | ConnectionException | TimeoutException | ExecutionException e) { System.err.println("An exception occurred: "); e.printStackTrace(); } catch (InterruptedException e) { Thread.currentThread().interrupt(); } finally { // Close the client connection if (client != null) { try { client.close(); System.out.println("Client connection closed."); } catch (ClosingException | ExecutionException e) { System.err.println("Error closing client: " + e.getMessage()); } } } } }