Save Storage by Using Derived Source - Amazon OpenSearch Service

Save Storage by Using Derived Source

By default, OpenSearch Serverless stores each ingested document in the _source field, which contains the original JSON document body, and indexes individual fields for search. The _source field is not searchable; it is retained so that the full document can be returned by fetch requests such as get and search. When derived source is enabled, OpenSearch Serverless skips storing the _source field and instead reconstructs it on demand, for example during search, get, mget, reindex, or update operations. Enabling derived source can reduce storage usage by up to 50%.

Configuration

To configure derived source for your index, create the index using the index.derived_source.enabled setting:

PUT my-index1
{
  "settings": {
    "index": {
      "derived_source": {
        "enabled": true
      }
    }
  }
}
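
The same request can also be assembled programmatically. The following is a minimal Python sketch that only builds the HTTP method, path, and JSON body; the helper name build_create_index_request is hypothetical, and actually sending the request (including authentication against your collection endpoint) is left to whichever client you use.

```python
import json


def build_create_index_request(index_name, enabled=True):
    """Return (method, path, body) for creating an index with derived source.

    Hypothetical helper for illustration only; it does not send anything.
    """
    body = {
        "settings": {
            "index": {
                # Setting name taken from the request shown above.
                "derived_source": {"enabled": enabled}
            }
        }
    }
    return "PUT", f"/{index_name}", json.dumps(body)


method, path, body = build_create_index_request("my-index1")
print(method, path)
print(body)
```

Keeping the body construction separate from the transport makes it easy to review the exact settings before the index is created, which matters because the setting cannot be changed afterwards.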

Important considerations

  • Only certain field types are supported. For a list of supported fields and limitations, refer to the OpenSearch documentation. If you create an index with derived source and an unsupported field, index creation will fail. If you attempt to ingest a document with an unsupported field in a derived source-enabled index, ingestion will fail. Use this feature only when you are aware of the field types that will be added to your index.

  • The index.derived_source.enabled setting is static. It cannot be changed after the index is created.
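
Because index creation and ingestion fail on unsupported field types, it can help to check a planned mapping before creating the index. The sketch below walks a mapping and reports fields whose type is not in a set you supply. The function name find_unsupported_fields and the SUPPORTED_TYPES_PLACEHOLDER set are assumptions made for illustration; the placeholder is not the authoritative list, which should be taken from the OpenSearch documentation.

```python
def find_unsupported_fields(mapping, supported_types, prefix=""):
    """Return [(field_path, field_type)] for fields whose type is not supported."""
    unsupported = []
    for name, spec in mapping.get("properties", {}).items():
        path = f"{prefix}{name}"
        # A field with no explicit type but with sub-fields is an object field.
        field_type = spec.get("type", "object")
        if field_type == "object" and "properties" in spec:
            # Recurse into the object's sub-fields.
            unsupported.extend(
                find_unsupported_fields(spec, supported_types, f"{path}.")
            )
        elif field_type not in supported_types:
            unsupported.append((path, field_type))
    return unsupported


# Placeholder only: replace with the supported types listed in the
# OpenSearch documentation. This set is NOT an authoritative list.
SUPPORTED_TYPES_PLACEHOLDER = {"keyword", "date", "geo_point"}

mapping = {
    "properties": {
        "status": {"type": "keyword"},
        "pickup": {"properties": {"location": {"type": "geo_point"}}},
        "notes": {"type": "example_unsupported_type"},
    }
}

problems = find_unsupported_fields(mapping, SUPPORTED_TYPES_PLACEHOLDER)
print(problems)
```

An empty result under the real supported-type list suggests the mapping is safe to create with derived source enabled; it does not guard against fields added later through dynamic mapping.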

Limitations on query responses

Enabling derived source imposes the following limitations on how query responses are generated and returned:

  • Date fields with multiple formats specified are always returned in the first format in the list, for all requested documents, regardless of the format in which the value was ingested.

  • Geopoint values are returned in a fixed {"lat": lat_val, "lon": lon_val} format and may lose some precision.

  • Multi-value arrays may be sorted, and keyword fields may be deduplicated.
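
One practical consequence of the limitations above is that a returned document may not be byte-for-byte identical to the one that was ingested. The following sketch shows one way a client-side check could compare the two while tolerating sorted arrays and deduplicated keyword values. It is an illustration under stated assumptions, not a description of how OpenSearch reconstructs _source: the function names are hypothetical, documents are assumed to be flat, and each array is assumed to hold values of a single comparable type.

```python
def normalize_field(value, is_keyword=False):
    """Normalize a field value so that ordering (and, for keyword fields,
    duplicates) do not affect comparison."""
    if not isinstance(value, list):
        return value
    if is_keyword:
        value = set(value)  # keyword fields may be deduplicated
    return sorted(value)    # multi-value arrays may be sorted


def same_modulo_normalization(original, returned, keyword_fields=()):
    """Compare two flat documents, ignoring array order and keyword duplicates."""
    keys = set(original) | set(returned)
    return all(
        normalize_field(original.get(k), k in keyword_fields)
        == normalize_field(returned.get(k), k in keyword_fields)
        for k in keys
    )


original = {"tags": ["b", "a", "b"], "fares": [3, 1, 3]}
returned = {"tags": ["a", "b"], "fares": [1, 3, 3]}
print(same_modulo_normalization(original, returned, keyword_fields={"tags"}))
```

A check like this is only useful in tests that compare round-tripped documents; applications that depend on the original array order or on duplicate keyword values should not enable derived source for that index.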

For more details, refer to the OpenSearch blog.

Performance benchmarking

In benchmark testing with the nyc_taxi dataset, derived source achieved a 58% reduction in index size compared to the baseline.

Metric                                      Derived Source
Index Size Reduction                        58.3%
Indexing Throughput Change                  3.7%
Indexing p90 Latency Change                 6.9%
Match-all Query p90 Latency Improvement     19%
Range Query p90 Latency Improvement         -18.8%
Distance Amount p90 Agg Latency Improvement -7.3%

For more details, refer to the OpenSearch blog.
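
As a rough planning aid, the index size reduction reported above can be applied to an existing index size. This is a back-of-the-envelope sketch: the helper name projected_index_size and the 100 GiB input are made up for illustration, and the 58.3% figure comes from the nyc_taxi benchmark, so results for other datasets may differ.

```python
def projected_index_size(current_size_gib, reduction_pct):
    """Estimate index size after applying a percentage reduction."""
    return current_size_gib * (1 - reduction_pct / 100)


# 100 GiB is an arbitrary example size; 58.3 is the benchmark figure above.
estimate = round(projected_index_size(100, 58.3), 1)
print(estimate)  # 41.7
```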