Working with Amazon OpenSearch Service direct queries with Amazon S3 (preview)

This is prerelease documentation for Amazon OpenSearch Service direct queries with Amazon S3, which is in preview release. The documentation and the feature are both subject to change. We recommend that you use this feature only in test environments, and not in production environments. For preview terms and conditions, see Betas and Previews in Amazon Service Terms.

You can use Amazon OpenSearch Service direct queries to analyze data in Amazon S3, such as operational logs and data lakes based in Amazon S3, without switching between services. You can analyze data in cloud object stores while also using the operational analytics and visualizations of OpenSearch Service.

When you use direct queries with Amazon S3, you don't need to build complex ETL pipelines or pay to duplicate data in both OpenSearch Service and Amazon S3 storage. You can also install integrations based on templates for popular log types. Each template includes predefined dashboards, and you can configure data accelerations tailored to that log type. Templates are available for VPC Flow Logs, Amazon CloudTrail logs, and Amazon S3 logs. The available accelerations are skipping indexes, materialized views, and covering indexes.

Pricing

You pay for the existing OpenSearch Service and Amazon S3 resources that are used to create and process direct queries. Queries sent to Amazon S3 consume billable compute, which is measured in OpenSearch Compute Units (OCUs) per hour.

Direct queries with Amazon S3 are of two types: interactive queries and index maintenance queries.

Interactive queries perform analytics on your data in Amazon S3. When you run a new query, OpenSearch Service starts a new session that lasts for a minimum of ten minutes. OpenSearch Service keeps the session active so that subsequent queries run quickly.

Index maintenance queries use compute to maintain indexes in OpenSearch Service. These queries usually take longer because they ingest a configurable amount of data into OpenSearch Service, which makes interactive queries run faster.

For more information, see Amazon OpenSearch Service Pricing.

Limitations

The following limitations apply to OpenSearch Service direct queries with Amazon S3.

  • Your OpenSearch domain must run OpenSearch version 2.11 or later to support OpenSearch Service direct queries.

  • OpenSearch Service direct queries with Amazon S3 only support Spark tables in the Amazon Glue Data Catalog. Hive tables aren't supported because they don't support Spark streaming, which is needed to keep indexes up to date.

  • Only the Parquet, CSV, and JSON data formats are supported.

  • Amazon CloudFormation templates aren't supported in the preview release of direct queries.

  • Your OpenSearch domain and Amazon Glue Data Catalog must be in the same Amazon Web Services account. Your Amazon S3 tables can be in a different account, but must be in the same Amazon Web Services Region as your domain.

  • Nested Spark structures aren't supported. If your source data uses nested structures, you must explode them to rows.

  • Tables created via Athena are not supported.

  • If columns are missing from your data, you might need to use the COALESCE SQL function for queries to return results.

  • Direct queries with Amazon S3 aren't available in Amazon OpenSearch Serverless.

  • Data must be flattened before querying. Alternatively, you can use SQL in OpenSearch Service to convert nested columns into dedicated columns.
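To make the flattening and missing-column limitations more concrete, the following plain Python sketch shows the shape of the transformation on a single record. It is an illustration only, not the Spark or OpenSearch Service implementation: in Spark you would typically use functions such as `pyspark.sql.functions.explode` instead. The helper names `explode`, `flatten`, and `coalesce`, the `parent_child` column naming, and the sample record's field names are all hypothetical choices for this sketch.

```python
from typing import Any, Optional


def explode(record: dict[str, Any], field: str) -> list[dict[str, Any]]:
    """Return one row per element of an array field, copying every other column.

    Like Spark's explode, a missing or empty array produces no rows.
    """
    items = record.get(field) or []
    base = {key: value for key, value in record.items() if key != field}
    return [{**base, field: item} for item in items]


def flatten(record: dict[str, Any], prefix: str = "") -> dict[str, Any]:
    """Promote nested (struct-like) dict fields to top-level columns named parent_child."""
    flat: dict[str, Any] = {}
    for key, value in record.items():
        column = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{column}_"))
        else:
            flat[column] = value
    return flat


def coalesce(*values: Any) -> Optional[Any]:
    """Return the first value that is not None, mirroring SQL COALESCE."""
    for value in values:
        if value is not None:
            return value
    return None


# Hypothetical nested log record; the field names are illustrative only.
record = {
    "request_id": "r1",
    "source": {"ip": "10.0.0.1", "port": 443},
    "events": [{"type": "GET"}, {"type": "PUT", "status": 200}],
}

# Explode the array into rows first, then flatten each row's nested fields.
rows = [flatten(row) for row in explode(record, "events")]
for row in rows:
    # The first row has no events_status column, so fall back to a default value.
    print(row["request_id"], row["events_type"], coalesce(row.get("events_status"), "unknown"))
```

The second exploded row carries an `events_status` column that the first row lacks, which is the kind of missing column that COALESCE handles in a SQL query.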

Quotas

Your account has the following quotas related to OpenSearch Service direct queries with Amazon S3. Each time you initiate a query, OpenSearch Service opens a session and keeps it alive for at least ten minutes. This reduces query latency by removing session startup time in subsequent queries.

Quota                                  Maximum
Connections per domain                 20
Data sources per domain                20
Indexes per domain                     50
Concurrent sessions per data source    100
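If you plan a test setup in advance, a small check against these quotas can catch problems before you create resources. The following Python sketch is a hypothetical helper, not an AWS API: the dictionary keys are names chosen for this example, and the values are copied from the quota table, which might change while the feature is in preview.

```python
# Preview quotas copied from the quota table; they might change during the preview.
# The dictionary keys are hypothetical names chosen for this sketch, not AWS API fields.
PREVIEW_QUOTAS: dict[str, int] = {
    "connections_per_domain": 20,
    "data_sources_per_domain": 20,
    "indexes_per_domain": 50,
    "concurrent_sessions_per_data_source": 100,
}


def quota_violations(planned: dict[str, int]) -> list[str]:
    """Return a message for every planned value that exceeds its documented maximum."""
    violations = []
    for name, value in planned.items():
        if name not in PREVIEW_QUOTAS:
            raise KeyError(f"unknown quota: {name}")
        limit = PREVIEW_QUOTAS[name]
        if value > limit:
            violations.append(f"{name}: planned {value} exceeds maximum {limit}")
    return violations


# Example: a plan with 55 indexes on one domain exceeds the 50-index quota.
plan = {"indexes_per_domain": 55, "data_sources_per_domain": 3}
for message in quota_violations(plan):
    print(message)
```

Unknown keys raise an error instead of being ignored, so a typo in a planned value can't silently pass the check.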

Supported Regions

The following Regions are available for OpenSearch Service direct queries with Amazon S3: Asia Pacific (Tokyo), Europe (Frankfurt), Europe (Ireland), US East (N. Virginia), US East (Ohio), and US West (Oregon).
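The Region list, the minimum domain version, the same-Region requirement for Amazon S3 data, and the supported data formats can be combined into a single preflight check. The following Python sketch is a hypothetical helper that only encodes the requirements listed on this page and does not call any AWS API. It assumes version strings such as `2.11` or `OpenSearch_2.11`, and it maps the listed Regions to their standard Region codes.

```python
# Region codes for the Regions listed as available for direct queries with Amazon S3.
SUPPORTED_REGIONS = {
    "ap-northeast-1",  # Asia Pacific (Tokyo)
    "eu-central-1",    # Europe (Frankfurt)
    "eu-west-1",       # Europe (Ireland)
    "us-east-1",       # US East (N. Virginia)
    "us-east-2",       # US East (Ohio)
    "us-west-2",       # US West (Oregon)
}
SUPPORTED_FORMATS = {"parquet", "csv", "json"}
MIN_VERSION = (2, 11)


def parse_version(version: str) -> tuple[int, int]:
    """Parse '2.11' or 'OpenSearch_2.11' into (2, 11) so versions compare numerically."""
    major, minor = version.removeprefix("OpenSearch_").split(".")[:2]
    return int(major), int(minor)


def preflight(domain_version: str, domain_region: str, s3_region: str, data_format: str) -> list[str]:
    """Return the documented preview requirements that a planned setup would not meet."""
    problems = []
    if parse_version(domain_version) < MIN_VERSION:
        problems.append(f"domain version {domain_version} is earlier than 2.11")
    if domain_region not in SUPPORTED_REGIONS:
        problems.append(f"Region {domain_region} is not in the preview Region list")
    if s3_region != domain_region:
        problems.append("Amazon S3 data must be in the same Region as the domain")
    if data_format.lower() not in SUPPORTED_FORMATS:
        problems.append(f"format {data_format} is not Parquet, CSV, or JSON")
    return problems


print(preflight("OpenSearch_2.11", "us-east-1", "us-east-1", "Parquet"))
print(preflight("OpenSearch_2.9", "eu-west-2", "eu-west-1", "ORC"))
```

Versions are compared as integer tuples rather than strings, because a string comparison would incorrectly treat `2.9` as later than `2.11`.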