Amazon Athena Redis connector - Amazon Athena
Services or capabilities described in Amazon Web Services documentation might vary by Region. To see the differences applicable to the China Regions, see Getting Started with Amazon Web Services in China (PDF).

Amazon Athena Redis connector

The Amazon Athena Redis connector enables Amazon Athena to communicate with your Redis instances so that you can query your Redis data with SQL. You can use the Amazon Glue Data Catalog to map your Redis key-value pairs into virtual tables.

Unlike traditional relational data stores, Redis does not have the concept of a table or a column. Instead, Redis offers key-value access patterns where the key is essentially a string and the value is a string, z-set, or hmap.

You can use the Amazon Glue Data Catalog to create schema and configure virtual tables. Special table properties tell the Athena Redis connector how to map your Redis keys and values into a table. For more information, see Setting up databases and tables in Amazon Glue later in this document.

If you have Lake Formation enabled in your account, the IAM role for your Athena federated Lambda connector that you deployed in the Amazon Serverless Application Repository must have read access in Lake Formation to the Amazon Glue Data Catalog.

The Amazon Athena Redis connector supports Amazon MemoryDB for Redis and Amazon ElastiCache for Redis.

Prerequisites

Parameters

Use the Lambda environment variables in this section to configure the Redis connector.

  • spill_bucket – Specifies the Amazon S3 bucket for data that exceeds Lambda function limits.

  • spill_prefix – (Optional) Defaults to a subfolder in the specified spill_bucket called athena-federation-spill. We recommend that you configure an Amazon S3 storage lifecycle on this location to delete spills older than a predetermined number of days or hours.

  • spill_put_request_headers – (Optional) A JSON encoded map of request headers and values for the Amazon S3 putObject request that is used for spilling (for example, {"x-amz-server-side-encryption" : "AES256"}). For other possible headers, see PutObject in the Amazon Simple Storage Service API Reference.

  • kms_key_id – (Optional) By default, any data that is spilled to Amazon S3 is encrypted using the AES-GCM authenticated encryption mode and a randomly generated key. To have your Lambda function use stronger encryption keys generated by KMS like a7e63k4b-8loc-40db-a2a1-4d0en2cd8331, you can specify a KMS key ID.

  • disable_spill_encryption – (Optional) When set to True, disables spill encryption. Defaults to False so that data that is spilled to S3 is encrypted using AES-GCM – either using a randomly generated key or KMS to generate keys. Disabling spill encryption can improve performance, especially if your spill location uses server-side encryption.

  • glue_catalog – (Optional) Use this option to specify a cross-account Amazon Glue catalog. By default, the connector attempts to get metadata from its own Amazon Glue account.

Setting up databases and tables in Amazon Glue

To enable an Amazon Glue table for use with Redis, you can set the following table properties on the table: redis-endpoint, redis-value-type, and either redis-keys-zset or redis-key-prefix.

In addition, any Amazon Glue database that contains Redis tables must have a redis-db-flag in the URI property of the database. To set the redis-db-flag URI property, use the Amazon Glue console to edit the database.

The following list describes the table properties.

  • redis-endpoint – (Required) The hostname:port:password of the Redis server that contains data for this table (for example, athena-federation-demo.cache.amazonaws.com:6379) Alternatively, you can store the endpoint, or part of the endpoint, in Amazon Secrets Manager by using ${Secret_Name} as the table property value.

Note

To use the Athena Federated Query feature with Amazon Secrets Manager, the VPC connected to your Lambda function should have internet access or a VPC endpoint to connect to Secrets Manager.

  • redis-keys-zset – (Required if redis-key-prefix is not used) A comma-separated list of keys whose value is a zset (for example, active-orders,pending-orders). Each of the values in the zset is treated as a key that is part of the table. Either the redis-keys-zset property or the redis-key-prefix property must be set.

  • redis-key-prefix – (Required if redis-keys-zset is not used) A comma separated list of key prefixes to scan for values in the table (for example, accounts-*,acct-). Either the redis-key-prefix property or the redis-keys-zset property must be set.

  • redis-value-type – (Required) Defines how the values for the keys defined by either redis-key-prefix or redis-keys-zset are mapped to your table. A literal maps to a single column. A zset also maps to a single column, but each key can store many rows. A hash enables each key to be a row with multiple columns (for example, a hash, literal, or zset.)

  • redis-ssl-flag – (Optional) When True, creates a Redis connection that uses SSL/TLS. The default is False.

  • redis-cluster-flag – (Optional) When True, enables support for clustered Redis instances. The default is False.

  • redis-db-number – (Optional) Applies only to standalone, non-clustered instances.) Set this number (for example 1, 2, or 3) to read from a non-default Redis database. The default is Redis logical database 0. This number does not refer to a database in Athena or Amazon Glue, but to a Redis logical database. For more information, see SELECT index in the Redis documentation.

Data types

The Redis connector supports the following data types. Redis streams are not supported.

All Redis values are retrieved as the string data type. Then they are converted to one of the following Apache Arrow data types based on how your tables are defined in the Amazon Glue Data Catalog.

Amazon Glue data type Apache Arrow data type
int INT
string VARCHAR
bigint BIGINT
double FLOAT8
float FLOAT4
smallint SMALLINT
tinyint TINYINT
boolean BIT
binary VARBINARY

Required Permissions

For full details on the IAM policies that this connector requires, review the Policies section of the athena-redis.yaml file. The following list summarizes the required permissions.

  • Amazon S3 write access – The connector requires write access to a location in Amazon S3 in order to spill results from large queries.

  • Athena GetQueryExecution – The connector uses this permission to fast-fail when the upstream Athena query has terminated.

  • Amazon Glue Data Catalog – The Redis connector requires read only access to the Amazon Glue Data Catalog to obtain schema information.

  • CloudWatch Logs – The connector requires access to CloudWatch Logs for storing logs.

  • Amazon Secrets Manager read access – If you choose to store Redis endpoint details in Secrets Manager, you must grant the connector access to those secrets.

  • VPC access – The connector requires the ability to attach and detach interfaces to your VPC so that it can connect to it and communicate with your Redis instances.

Performance

The Athena Redis connector attempts to parallelize queries against your Redis instance according to the type of table that you have defined (for example, zset keys or prefix keys).

The Athena Redis connector performs predicate pushdown to decrease the data scanned by the query. However, queries containing a predicate against the primary key fail with timeout. LIMIT clauses reduce the amount of data scanned, but if you do not provide a predicate, you should expect SELECT queries with a LIMIT clause to scan at least 16 MB of data. The Redis connector is resilient to throttling due to concurrency.

License information

The Amazon Athena Redis connector project is licensed under the Apache-2.0 License.

See also

For additional information about this connector, visit the corresponding site on GitHub.com.