Supported plugins and options for Amazon OpenSearch Ingestion pipelines
Amazon OpenSearch Ingestion supports a subset of sources, processors, and sinks within open source OpenSearch Data Prepper.
Note
OpenSearch Ingestion doesn't support any buffer plugins because it automatically configures a default buffer. You receive a validation error if you include a buffer in your pipeline configuration.
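As an illustration of what this validation rejects, the following Python sketch scans a pipeline definition for a buffer section. It assumes the YAML has already been parsed into a dictionary (for example, with a YAML library) in which each top-level key other than version names a sub-pipeline. The function find_buffer_sections and the sample pipeline are hypothetical; this isn't how OpenSearch Ingestion implements its own validation.

```python
from typing import Any


def find_buffer_sections(config: dict[str, Any]) -> list[str]:
    """Return the names of sub-pipelines that declare a `buffer` section.

    Hypothetical pre-flight check: `config` is an already-parsed pipeline
    definition. Every top-level key except `version` is treated as a
    sub-pipeline name.
    """
    offenders = []
    for name, body in config.items():
        if name == "version" or not isinstance(body, dict):
            continue
        if "buffer" in body:
            offenders.append(name)
    return offenders


# Hypothetical pipeline that would be rejected because it sets a buffer.
pipeline = {
    "version": "2",
    "log-pipeline": {
        "source": {"http": {"path": "/log/ingest"}},
        "buffer": {"bounded_blocking": {"buffer_size": 1024}},
    },
}

print(find_buffer_sections(pipeline))  # ['log-pipeline']
```

Removing the buffer key from log-pipeline makes the function return an empty list, leaving OpenSearch Ingestion to configure its default buffer.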
Supported plugins
OpenSearch Ingestion supports the following Data Prepper plugins:
Sources:
Processors:
- Mutate event (series of processors)
- Mutate string (series of processors)
Sinks:
- OpenSearch (supports OpenSearch Service, OpenSearch Serverless, and Elasticsearch 6.8 or later)
Sink codecs:
Stateless versus stateful processors
Stateless processors perform operations like transformations and filtering, while stateful processors perform operations like aggregations, which remember the result of the previous run. OpenSearch Ingestion supports the stateful processors Aggregate
For pipelines that contain only stateless processors, the maximum capacity limit is 96 Ingestion OCUs. If a pipeline contains any stateful processors, the maximum capacity limit is 48 Ingestion OCUs. However, if a pipeline has persistent buffering enabled, it can have a maximum of 384 Ingestion OCUs with only stateless processors, or 192 Ingestion OCUs if it contains any stateful processors. For more information, see Scaling pipelines in Amazon OpenSearch Ingestion.
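The four capacity ceilings above depend on two facts about a pipeline: whether it contains any stateful processor, and whether persistent buffering is enabled. The following Python sketch encodes that lookup; max_ingestion_ocus is a hypothetical helper, not part of any AWS SDK.

```python
def max_ingestion_ocus(has_stateful_processor: bool, persistent_buffering: bool) -> int:
    """Return the maximum Ingestion OCUs allowed for a pipeline.

    Encodes the limits stated above: 96 or 48 OCUs without persistent
    buffering, 384 or 192 OCUs with it, where the lower figure applies
    when the pipeline contains any stateful processor.
    """
    if persistent_buffering:
        return 192 if has_stateful_processor else 384
    return 48 if has_stateful_processor else 96


for stateful in (False, True):
    for buffered in (False, True):
        limit = max_ingestion_ocus(stateful, buffered)
        print(f"stateful={stateful}, persistent_buffering={buffered}: {limit} OCUs")
```

Laid out this way, the pattern is easy to see: a stateful processor halves the ceiling, and persistent buffering quadruples it.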
End-to-end acknowledgment is only supported for stateless processors. For more information, see End-to-end acknowledgement.
Configuration requirements and constraints
Unless otherwise specified below, all options described in the Data Prepper configuration reference for the supported plugins listed above are allowed in OpenSearch Ingestion pipelines. The following sections explain the constraints that OpenSearch Ingestion places on certain plugin options.
Many options, such as authentication and acm_certificate_arn, are configured and managed internally by OpenSearch Ingestion. Other options, such as thread_count and request_timeout, affect performance if changed manually, so OpenSearch Ingestion sets these values internally to ensure optimal performance of your pipelines.
Finally, some options, such as ism_policy_file and sink_template, refer to local files when you run open source Data Prepper, so you can't pass them to OpenSearch Ingestion. These options aren't supported.
General pipeline options
The following general pipeline options are set by OpenSearch Ingestion and aren't supported in pipeline configurations:
- workers
- delay
Grok processor
The following Grok processor options aren't supported:
- patterns_directories
- patterns_files_glob
HTTP source
The HTTP source plugin has the following requirements and limitations:
- The path option is required. The path is a string such as /log/ingest, which represents the URI path for log ingestion. This path defines the URI that you use to send data to the pipeline, for example, https://log-pipeline.us-west-2.osis.amazonaws.com/log/ingest. The path must start with a slash (/), and can contain the special characters '-', '_', '.', and '/', as well as the ${pipelineName} placeholder.
- The following HTTP source options are set by OpenSearch Ingestion and aren't supported in pipeline configurations:
  - port
  - ssl
  - ssl_key_file
  - ssl_certificate_file
  - aws_region
  - authentication
  - unauthenticated_health_check
  - use_acm_certificate_for_ssl
  - thread_count
  - request_timeout
  - max_connection_count
  - max_pending_requests
  - health_check_service
  - acm_private_key_password
  - acm_certificate_timeout_millis
  - acm_certificate_arn
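The path rules above can be checked before you create a pipeline. The following Python sketch is a hypothetical client-side check (is_valid_ingest_path isn't an AWS API). It assumes that letters and digits are allowed alongside the listed special characters, which the rule implies but doesn't state, and it treats each ${pipelineName} placeholder as a single unit.

```python
import re

# Assumption: letters and digits are allowed in addition to the special
# characters listed above ('-', '_', '.', '/').
_ALLOWED_PATH = re.compile(r"[A-Za-z0-9_./-]*")
_PLACEHOLDER = "${pipelineName}"


def is_valid_ingest_path(path: str) -> bool:
    """Hypothetical client-side check of a source `path` value."""
    if not path.startswith("/"):
        return False
    # Drop each ${pipelineName} placeholder, then check the remaining characters.
    remainder = path.replace(_PLACEHOLDER, "")
    return _ALLOWED_PATH.fullmatch(remainder) is not None


for candidate in ("/log/ingest", "/${pipelineName}/logs", "log/ingest", "/log ingest"):
    print(f"{candidate!r}: {is_valid_ingest_path(candidate)}")
```

The first two candidates pass; the third fails because it doesn't start with a slash, and the fourth fails because a space isn't an allowed character.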
OpenSearch sink
The OpenSearch sink has the following requirements and limitations:
- The aws option is required, and must contain the following options:
  - sts_role_arn
  - region
  - hosts
  - serverless (if the sink is an OpenSearch Serverless collection)
- The sts_role_arn option must point to the same role for each sink within a YAML definition file.
- The hosts option must specify an OpenSearch Service domain endpoint or an OpenSearch Serverless collection endpoint. You can't specify a custom endpoint for a domain; it must be the standard endpoint.
- If the hosts option is a serverless collection endpoint, you must set the serverless option to true. In addition, if your YAML definition file contains the index_type option, it must be set to management_disabled; otherwise, validation fails.
- The following options aren't supported:
  - username
  - password
  - cert
  - proxy
  - dlq_file: if you want to offload failed events to a dead letter queue (DLQ), you must use the dlq option and specify an S3 bucket.
  - ism_policy_file
  - socket_timeout
  - template_file
  - insecure
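Several of the sink rules above can be checked mechanically. The following Python sketch, a hypothetical helper named check_opensearch_sinks, takes a parsed sink list and reports missing aws settings, unsupported options, a Serverless endpoint without serverless set to true or with an index_type other than management_disabled, and differing sts_role_arn values across sinks. It reads hosts directly under the opensearch entry, which is where Data Prepper's sink examples place it, and it assumes that Serverless collection endpoints can be recognized by the .aoss.amazonaws.com domain suffix.

```python
from typing import Any

# Options that the section above lists as unsupported for the OpenSearch sink.
UNSUPPORTED_SINK_OPTIONS = frozenset({
    "username", "password", "cert", "proxy", "dlq_file",
    "ism_policy_file", "socket_timeout", "template_file", "insecure",
})
# Assumption: OpenSearch Serverless collection endpoints use this domain suffix.
SERVERLESS_SUFFIX = ".aoss.amazonaws.com"


def check_opensearch_sinks(sinks: list[dict[str, Any]]) -> list[str]:
    """Hypothetical pre-flight check over a parsed `sink` list."""
    errors: list[str] = []
    role_arns: set[str] = set()
    for i, entry in enumerate(sinks):
        if "opensearch" not in entry:
            continue  # some other sink type
        sink = entry["opensearch"] or {}
        aws = sink.get("aws")
        if not isinstance(aws, dict):
            errors.append(f"sink {i}: the aws option is required")
            continue
        for key in ("sts_role_arn", "region"):
            if key not in aws:
                errors.append(f"sink {i}: aws.{key} is required")
        if "sts_role_arn" in aws:
            role_arns.add(aws["sts_role_arn"])
        for key in sorted(sink.keys() & UNSUPPORTED_SINK_OPTIONS):
            errors.append(f"sink {i}: option {key!r} isn't supported")
        hosts = sink.get("hosts") or []
        if any(SERVERLESS_SUFFIX in host for host in hosts):
            if aws.get("serverless") is not True:
                errors.append(f"sink {i}: aws.serverless must be true for a collection endpoint")
            # index_type is optional, but if present it must be management_disabled.
            if sink.get("index_type", "management_disabled") != "management_disabled":
                errors.append(f"sink {i}: index_type must be management_disabled")
    if len(role_arns) > 1:
        errors.append("every OpenSearch sink must use the same sts_role_arn")
    return errors


role = "arn:aws:iam::111122223333:role/pipeline-role"
sinks = [{
    "opensearch": {
        "hosts": ["https://abc123.us-east-1.aoss.amazonaws.com"],
        "aws": {"sts_role_arn": role, "region": "us-east-1"},
        "index_type": "custom",
        "insecure": True,
    }
}]
for problem in check_opensearch_sinks(sinks):
    print(problem)
```

The sample sink reports three problems: the unsupported insecure option, the missing serverless flag, and an index_type other than management_disabled.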
OTel metrics source, OTel trace source, and OTel logs source
The OTel metrics source, OTel trace source, and OTel logs source have the following requirements and limitations:
- The path option is required. The path is a string such as /log/ingest, which represents the URI path for log ingestion. This path defines the URI that you use to send data to the pipeline, for example, https://log-pipeline.us-west-2.osis.amazonaws.com/log/ingest. The path must start with a slash (/), and can contain the special characters '-', '_', '.', and '/', as well as the ${pipelineName} placeholder.
- The following options are set by OpenSearch Ingestion and aren't supported in pipeline configurations:
  - port
  - ssl
  - sslKeyFile
  - sslKeyCertChainFile
  - authentication
  - unauthenticated_health_check
  - useAcmCertForSSL
  - unframed_requests
  - proto_reflection_service
  - thread_count
  - request_timeout
  - max_connection_count
  - acmPrivateKeyPassword
  - acmCertIssueTimeOutMillis
  - health_check_service
  - acmCertificateArn
  - awsRegion
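Because the options above are set by OpenSearch Ingestion and aren't supported in pipeline configurations, it can help to flag them before you submit a definition. The following Python sketch, a hypothetical helper named service_managed_options, lists such options in a parsed source block. It assumes the Data Prepper plugin names otel_metrics_source, otel_trace_source, and otel_logs_source. The check is case-sensitive and uses the camelCase spellings listed above, which differ from the HTTP source's snake_case names (sslKeyFile rather than ssl_key_file, for example).

```python
from typing import Any

# Assumption: these are the Data Prepper plugin names of the three OTel sources.
OTEL_SOURCES = ("otel_metrics_source", "otel_trace_source", "otel_logs_source")

# Options that the section above says OpenSearch Ingestion sets for OTel sources.
SERVICE_MANAGED_OTEL_OPTIONS = frozenset({
    "port", "ssl", "sslKeyFile", "sslKeyCertChainFile", "authentication",
    "unauthenticated_health_check", "useAcmCertForSSL", "unframed_requests",
    "proto_reflection_service", "thread_count", "request_timeout",
    "max_connection_count", "acmPrivateKeyPassword", "acmCertIssueTimeOutMillis",
    "health_check_service", "acmCertificateArn", "awsRegion",
})


def service_managed_options(source: dict[str, Any]) -> list[str]:
    """Return service-managed options found in the OTel plugins of a `source` block.

    Option names are matched exactly (case-sensitive).
    """
    found: list[str] = []
    for plugin in OTEL_SOURCES:
        settings = source.get(plugin)
        if isinstance(settings, dict):
            matches = sorted(settings.keys() & SERVICE_MANAGED_OTEL_OPTIONS)
            found.extend(f"{plugin}.{key}" for key in matches)
    return found


# Hypothetical source block; port and ssl would have to be removed.
source = {"otel_trace_source": {"path": "/v1/traces", "port": 21890, "ssl": False}}
print(service_managed_options(source))  # ['otel_trace_source.port', 'otel_trace_source.ssl']
```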
OTel trace group processor
The OTel trace group processor has the following requirements and limitations:
- The aws option is required, and must contain the following options:
  - sts_role_arn
  - region
  - hosts
- The sts_role_arn option must specify the same role as the pipeline role that you specify in the OpenSearch sink configuration.
- The username, password, cert, and insecure options aren't supported.
- The aws_sigv4 option is required and must be set to true.
- The serverless option within the OpenSearch sink plugin isn't supported. The OTel trace group processor doesn't currently work with OpenSearch Serverless collections.
- The number of otel_trace_group processors within the pipeline configuration body can't exceed 8.
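Two of these rules, the aws_sigv4 requirement and the limit of 8 otel_trace_group processors, lend themselves to a quick check. The following Python sketch (check_trace_group is hypothetical) counts otel_trace_group processors across every sub-pipeline of a parsed definition, since the limit applies to the whole configuration body, and flags any processor that doesn't set aws_sigv4 to true or that uses the unsupported username, password, cert, or insecure options.

```python
from collections.abc import Iterator
from typing import Any

MAX_TRACE_GROUP_PROCESSORS = 8  # limit stated in the section above
UNSUPPORTED_OPTIONS = frozenset({"username", "password", "cert", "insecure"})


def _trace_group_settings(config: dict[str, Any]) -> Iterator[tuple[str, dict[str, Any]]]:
    """Yield (sub-pipeline name, settings) for each otel_trace_group processor."""
    for name, body in config.items():
        if name == "version" or not isinstance(body, dict):
            continue
        for entry in body.get("processor") or []:
            if isinstance(entry, dict) and "otel_trace_group" in entry:
                yield name, entry["otel_trace_group"] or {}


def check_trace_group(config: dict[str, Any]) -> list[str]:
    """Hypothetical pre-flight check for otel_trace_group processors."""
    found = list(_trace_group_settings(config))
    errors: list[str] = []
    if len(found) > MAX_TRACE_GROUP_PROCESSORS:
        errors.append(
            f"{len(found)} otel_trace_group processors exceed the limit of "
            f"{MAX_TRACE_GROUP_PROCESSORS}"
        )
    for name, settings in found:
        if settings.get("aws_sigv4") is not True:
            errors.append(f"{name}: aws_sigv4 must be set to true")
        for key in sorted(settings.keys() & UNSUPPORTED_OPTIONS):
            errors.append(f"{name}: option {key!r} isn't supported")
    return errors


config = {
    "version": "2",
    "trace-pipeline": {
        "processor": [{"otel_trace_group": {"aws_sigv4": True, "insecure": True}}],
    },
}
print(check_trace_group(config))  # ["trace-pipeline: option 'insecure' isn't supported"]
```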
OTel trace processor
The OTel trace processor has the following limitation:
- The value of the trace_flush_interval option can't exceed 300 seconds.
Service-map processor
The Service-map processor has the following limitation:
- The value of the window_duration option can't exceed 300 seconds.
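Both 300-second caps, trace_flush_interval for the OTel trace processor and window_duration for the Service-map processor, can be enforced with one lookup table. The following Python sketch (check_second_limits is hypothetical) assumes the processor plugin names otel_traces and service_map used by recent Data Prepper releases, and assumes that both options are given as a number of seconds.

```python
from typing import Any

# Upper limits in seconds, per plugin and option, from the sections above.
# Assumption: "otel_traces" and "service_map" are the processor plugin names.
SECOND_LIMITS: dict[str, dict[str, int]] = {
    "otel_traces": {"trace_flush_interval": 300},
    "service_map": {"window_duration": 300},
}


def check_second_limits(processors: list[dict[str, Any]]) -> list[str]:
    """Hypothetical check of a parsed `processor` list against the 300-second caps."""
    errors: list[str] = []
    for entry in processors:
        for plugin, settings in entry.items():
            for option, limit in SECOND_LIMITS.get(plugin, {}).items():
                value = (settings or {}).get(option)
                if value is not None and value > limit:
                    errors.append(f"{plugin}.{option} = {value}s exceeds the {limit}s limit")
    return errors


processors = [
    {"otel_traces": {"trace_flush_interval": 600}},
    {"service_map": {"window_duration": 300}},  # exactly 300 is allowed
]
print(check_second_limits(processors))
```

Keeping the limits in a table rather than in separate if statements means another capped option needs only one new entry.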
S3 source
The S3 source plugin has the following requirements and limitations:
- The aws option is required, and must contain the region and sts_role_arn options.
- The value of the records_to_accumulate option can't exceed 200.
- The value of the maximum_messages option can't exceed 10.
- If specified, the disable_bucket_ownership_validation option must be set to false.
- If specified, the input_serialization option must be set to parquet.
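The following Python sketch (check_s3_source is hypothetical) applies the S3 source rules above to a parsed s3 block. Because these options may sit at different nesting levels of the Data Prepper s3 source (for example, maximum_messages is typically set inside the sqs block), the sketch searches the whole block by option name instead of assuming exact positions. It also assumes that numeric options are given as integers.

```python
from collections.abc import Iterator
from typing import Any


def _walk(node: Any, prefix: str = "") -> Iterator[tuple[str, str, Any]]:
    """Yield (dotted path, key, value) for every key in nested dictionaries."""
    if isinstance(node, dict):
        for key, value in node.items():
            path = f"{prefix}.{key}" if prefix else key
            yield path, key, value
            yield from _walk(value, path)


def check_s3_source(s3: dict[str, Any]) -> list[str]:
    """Hypothetical check of the settings under `source: s3:`."""
    errors: list[str] = []
    aws = s3.get("aws") or {}
    for key in ("region", "sts_role_arn"):
        if key not in aws:
            errors.append(f"aws.{key} is required")
    for path, key, value in _walk(s3):
        if key == "records_to_accumulate" and value > 200:
            errors.append(f"{path} = {value} exceeds 200")
        elif key == "maximum_messages" and value > 10:
            errors.append(f"{path} = {value} exceeds 10")
        elif key == "disable_bucket_ownership_validation" and value is not False:
            errors.append(f"{path} must be false if specified")
        elif key == "input_serialization" and value != "parquet":
            errors.append(f"{path} must be parquet if specified")
    return errors


s3 = {
    "aws": {"region": "us-east-1"},
    "records_to_accumulate": 500,
    "sqs": {
        "queue_url": "https://sqs.us-east-1.amazonaws.com/111122223333/my-queue",
        "maximum_messages": 10,
    },
}
print(check_s3_source(s3))
```

For the sample block, the sketch reports the missing aws.sts_role_arn and the records_to_accumulate value of 500; maximum_messages passes because 10 is at, not above, the limit.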