Granting Amazon OpenSearch Ingestion pipelines access to collections - Amazon OpenSearch Service
Granting Amazon OpenSearch Ingestion pipelines access to collections

An Amazon OpenSearch Ingestion pipeline can write to an OpenSearch Serverless public collection or VPC collection. To provide access to the collection, you configure an Amazon Identity and Access Management (IAM) pipeline role with a permissions policy that grants access to the collection. Before you specify the role in your pipeline configuration, you must configure it with an appropriate trust relationship, and then grant it data access permissions through a data access policy.

During pipeline creation, OpenSearch Ingestion creates an Amazon PrivateLink connection between the pipeline and the OpenSearch Serverless collection. All traffic from the pipeline goes through this VPC endpoint and is routed to the collection. In order to reach the collection, the endpoint must be granted access to the collection through a network access policy.

Limitations

The following limitations apply for pipelines that write to OpenSearch Serverless collections:

  • The OTel trace group processor doesn't currently work with OpenSearch Serverless collection sinks.

  • Currently, OpenSearch Ingestion only supports the legacy _template operation, while OpenSearch Serverless supports the composable _index_template operation. Therefore, if your pipeline configuration includes the index_type option, it must be set to management_disabled.
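The second limitation can be checked before you submit a configuration. The following is a minimal sketch, assuming the sink has already been parsed into a Python dictionary. The function name `check_serverless_sink` is hypothetical and not part of any AWS SDK.

```python
from __future__ import annotations


def check_serverless_sink(sink: dict) -> list[str]:
    """Return a list of problems found in one parsed `opensearch` sink.

    `sink` is the mapping found under the `opensearch` key of a pipeline sink.
    Hypothetical helper for illustration; not an AWS API.
    """
    problems = []
    aws = sink.get("aws", {})
    if not aws.get("serverless", False):
        # Not a Serverless sink, so the limitation does not apply.
        return problems
    index_type = sink.get("index_type")
    if index_type is not None and index_type != "management_disabled":
        problems.append(
            f"index_type is {index_type!r}; for a Serverless sink it must be "
            "'management_disabled'"
        )
    return problems


# Example sink that violates the limitation.
example_sink = {
    "hosts": ["https://{collection-id}.{region}.aoss.amazonaws.com"],
    "index": "my-index",
    "index_type": "custom",
    "aws": {"serverless": True},
}
print(check_serverless_sink(example_sink))
```

An empty list means the sink passes this one check. The sketch does not validate anything else about the configuration.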

Providing network access to pipelines

Each collection that you create in OpenSearch Serverless has at least one network access policy associated with it. Network access policies determine whether the collection is accessible over the internet from public networks, or whether it must be accessed privately. For more information about network policies, see Network access for Amazon OpenSearch Serverless.

Within a network access policy, you can only specify OpenSearch Serverless-managed VPC endpoints. For more information, see Access Amazon OpenSearch Serverless using an interface endpoint (Amazon PrivateLink). However, in order for the pipeline to write to the collection, the policy must also grant access to the VPC endpoint that OpenSearch Ingestion automatically creates between the pipeline and the collection. Therefore, when you create a pipeline that has an OpenSearch Serverless collection sink, you must provide the name of the associated network policy using the network_policy_name option.

For example:

    ...
    sink:
      - opensearch:
          hosts: [ "https://{collection-id}.{region}.aoss.amazonaws.com" ]
          index: "my-index"
          aws:
            serverless: true
            serverless_options:
              network_policy_name: "{network-policy-name}"

During pipeline creation, OpenSearch Ingestion checks for the existence of the specified network policy. If it doesn't exist, OpenSearch Ingestion creates it. If it does exist, OpenSearch Ingestion updates it by adding a new rule to it. The rule grants access to the VPC endpoint that connects the pipeline and the collection.

For example:

    {
      "Rules": [
        {
          "Resource": [
            "collection/my-collection"
          ],
          "ResourceType": "collection"
        }
      ],
      "SourceVPCEs": [
        "vpce-0c510712627e27269"
      ],
      "Description": "Created by Data Prepper"
    }

In this rule, vpce-0c510712627e27269 is the ID of the VPC endpoint that OpenSearch Ingestion creates between the pipeline and the collection.
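The create-or-update behavior described in this section can be modeled on an in-memory policy. This is an illustrative sketch only: `add_pipeline_endpoint` is a hypothetical function, the policy is assumed to be a list of rule entries (the shape used by the network policy examples in this topic), and the code does not claim to reproduce how OpenSearch Ingestion performs the update internally.

```python
import json


def add_pipeline_endpoint(policy, collection_name, vpce_id):
    """Return a new policy list with one extra rule that allows `vpce_id`.

    `policy` is a list of rule entries, or None if the policy doesn't exist yet.
    The input list is not modified.
    """
    updated = list(policy) if policy is not None else []
    updated.append({
        "Rules": [{
            "Resource": [f"collection/{collection_name}"],
            "ResourceType": "collection",
        }],
        "SourceVPCEs": [vpce_id],
        "Description": "Created by Data Prepper",
    })
    return updated


# An existing policy with one private-access rule (placeholder values).
existing = [{
    "Description": "Rule 1",
    "Rules": [{"ResourceType": "collection", "Resource": ["collection/my-collection"]}],
    "AllowFromPublic": False,
    "SourceVPCEs": ["vpce-050f79086ee71ac05"],
}]

updated = add_pipeline_endpoint(existing, "my-collection", "vpce-0c510712627e27269")
print(json.dumps(updated, indent=2))
```

Passing `None` as the policy models the case where the named policy doesn't exist and a new one is created with a single rule.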

In the console, any rules that OpenSearch Ingestion adds to your network policies are named "Created by Data Prepper".

Note

In general, a rule that specifies public access for a collection overrides a rule that specifies private access. Therefore, if the policy already had public access configured, this new rule that OpenSearch Ingestion adds doesn't actually change the behavior of the policy. For more information, see Policy precedence.

If you stop or delete the pipeline, OpenSearch Ingestion deletes the VPC endpoint between the pipeline and the collection. It also modifies the network policy to remove the VPC endpoint from the list of allowed endpoints. If you restart the pipeline, it recreates the VPC endpoint and re-updates the network policy with the endpoint ID.
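The cleanup on stop or delete can be pictured as a filter over the rule entries of the policy. A minimal sketch, assuming an in-memory policy (a list of rule entries): `remove_pipeline_endpoint` is a hypothetical name, and dropping an entry once it has no endpoints left is an assumption of this sketch, not documented behavior.

```python
def remove_pipeline_endpoint(policy, vpce_id):
    """Return a new policy list with `vpce_id` removed from every SourceVPCEs list."""
    cleaned = []
    for entry in policy:
        endpoints = entry.get("SourceVPCEs", [])
        if vpce_id not in endpoints:
            cleaned.append(entry)  # unrelated entry, kept unchanged
            continue
        remaining = [e for e in endpoints if e != vpce_id]
        if remaining:
            cleaned.append({**entry, "SourceVPCEs": remaining})
        # Assumption: an entry left with no endpoints is dropped entirely.
    return cleaned


# Placeholder policy: one user-defined rule and one rule for the pipeline endpoint.
policy = [
    {
        "Description": "Rule 1",
        "Rules": [{"ResourceType": "collection", "Resource": ["collection/my-collection"]}],
        "AllowFromPublic": False,
        "SourceVPCEs": ["vpce-050f79086ee71ac05"],
    },
    {
        "Description": "Created by Data Prepper",
        "Rules": [{"ResourceType": "collection", "Resource": ["collection/my-collection"]}],
        "SourceVPCEs": ["vpce-0c510712627e27269"],
    },
]

after_stop = remove_pipeline_endpoint(policy, "vpce-0c510712627e27269")
print([entry["Description"] for entry in after_stop])
```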

Step 1: Create a pipeline role

The role that you specify in the sts_role_arn parameter of a pipeline configuration must have an attached permissions policy that allows it to send data to the collection sink. It must also have a trust relationship that allows OpenSearch Ingestion to assume the role. For instructions on how to attach a policy to a role, see Adding IAM identity permissions in the IAM User Guide.

The following sample policy shows the minimum permissions that the role specified in sts_role_arn needs in order to write to collections:

    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Sid": "Statement1",
          "Effect": "Allow",
          "Action": [
            "aoss:APIAccessAll",
            "aoss:BatchGetCollection",
            "aoss:CreateSecurityPolicy",
            "aoss:GetSecurityPolicy",
            "aoss:UpdateSecurityPolicy"
          ],
          "Resource": "*"
        }
      ]
    }

The role must have the following trust relationship, which allows OpenSearch Ingestion to assume it:

    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Principal": {
            "Service": "osis-pipelines.amazonaws.com"
          },
          "Action": "sts:AssumeRole"
        }
      ]
    }
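If you script role creation, both documents can be generated and serialized with the Python standard library. The following is a minimal sketch: the function names are hypothetical, and the output is the JSON text of the two policies shown above, which you can then supply to whatever IAM tooling you use.

```python
import json


def pipeline_permissions_policy():
    """Permissions policy for the pipeline role, as shown in this step."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "Statement1",
            "Effect": "Allow",
            "Action": [
                "aoss:APIAccessAll",
                "aoss:BatchGetCollection",
                "aoss:CreateSecurityPolicy",
                "aoss:GetSecurityPolicy",
                "aoss:UpdateSecurityPolicy",
            ],
            "Resource": "*",
        }],
    }


def pipeline_trust_policy():
    """Trust relationship that lets OpenSearch Ingestion assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"Service": "osis-pipelines.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }],
    }


# Serialize both documents to JSON text.
permissions_json = json.dumps(pipeline_permissions_policy(), indent=2)
trust_json = json.dumps(pipeline_trust_policy(), indent=2)
print(trust_json)
```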

Step 2: Create a collection

Create an OpenSearch Serverless collection with the following settings. For instructions on creating a collection, see Creating collections.

Data access policy

Create a data access policy for the collection that grants the required permissions to the pipeline role. For example:

    [
      {
        "Rules": [
          {
            "Resource": [
              "index/{collection-name}/*"
            ],
            "Permission": [
              "aoss:CreateIndex",
              "aoss:UpdateIndex",
              "aoss:DescribeIndex",
              "aoss:WriteDocument"
            ],
            "ResourceType": "index"
          }
        ],
        "Principal": [
          "arn:aws:iam::{account-id}:role/{pipeline-role}"
        ],
        "Description": "Pipeline role access"
      }
    ]

Note

In the Principal element, specify the Amazon Resource Name (ARN) of the pipeline role that you created in the previous step.
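To avoid hand-editing the placeholders, the data access policy can be built from its three inputs. A minimal sketch: `build_data_access_policy` is a hypothetical helper, and the collection name, account ID, and role name in the usage lines are placeholder values, not real resources.

```python
import json


def build_data_access_policy(collection_name, account_id, role_name):
    """Build the data access policy shown in this step for one pipeline role."""
    role_arn = f"arn:aws:iam::{account_id}:role/{role_name}"
    return [{
        "Rules": [{
            "Resource": [f"index/{collection_name}/*"],
            "Permission": [
                "aoss:CreateIndex",
                "aoss:UpdateIndex",
                "aoss:DescribeIndex",
                "aoss:WriteDocument",
            ],
            "ResourceType": "index",
        }],
        "Principal": [role_arn],
        "Description": "Pipeline role access",
    }]


# Placeholder inputs for illustration only.
data_policy = build_data_access_policy("my-collection", "123456789012", "PipelineRole")
print(json.dumps(data_policy, indent=2))
```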

Network access policy

Create a network access policy for the collection. You can ingest data into a public collection or a VPC collection. For example, the following policy provides access to a single OpenSearch Serverless-managed VPC endpoint:

    [
      {
        "Description": "Rule 1",
        "Rules": [
          {
            "ResourceType": "collection",
            "Resource": [
              "collection/{collection-name}"
            ]
          }
        ],
        "AllowFromPublic": false,
        "SourceVPCEs": [
          "vpce-050f79086ee71ac05"
        ]
      }
    ]

Important

You must specify the name of the network policy within the network_policy_name option in the pipeline configuration. At the time of pipeline creation, OpenSearch Ingestion updates this network policy to allow access to the VPC endpoint that it automatically creates between the pipeline and the collection. See step 3 for an example pipeline configuration. For more information, see Providing network access to pipelines.
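A private network access policy like the one above can also be generated from a collection name and a list of endpoint IDs. A minimal sketch: `build_private_network_policy` is a hypothetical helper, and the regular expression used to sanity-check endpoint IDs is an assumption about their general shape, not an official validation rule.

```python
import json
import re

# Assumption: VPC endpoint IDs look like "vpce-" followed by lowercase
# letters and digits. This is a sanity check, not an official rule.
VPCE_ID_PATTERN = re.compile(r"vpce-[0-9a-z]+")


def build_private_network_policy(collection_name, vpce_ids, description="Rule 1"):
    """Build a network policy that allows only the given VPC endpoints."""
    bad = [v for v in vpce_ids if not VPCE_ID_PATTERN.fullmatch(v)]
    if bad:
        raise ValueError(f"not VPC endpoint IDs: {bad}")
    return [{
        "Description": description,
        "Rules": [{
            "ResourceType": "collection",
            "Resource": [f"collection/{collection_name}"],
        }],
        "AllowFromPublic": False,
        "SourceVPCEs": list(vpce_ids),
    }]


network_policy = build_private_network_policy("my-collection", ["vpce-050f79086ee71ac05"])
print(json.dumps(network_policy, indent=2))
```

The name under which you save this policy is the value to use for network_policy_name in the pipeline configuration.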

Step 3: Create a pipeline

Finally, create a pipeline in which you specify the pipeline role and collection details. The pipeline assumes this role in order to sign requests to the OpenSearch Serverless collection sink.

Make sure to do the following:

  • For the hosts option, specify the endpoint of the collection that you created in step 2.

  • For the sts_role_arn option, specify the Amazon Resource Name (ARN) of the pipeline role that you created in step 1.

  • Set the serverless option to true.

  • Set the network_policy_name option to the name of the network policy attached to the collection. OpenSearch Ingestion automatically updates this network policy to allow access from the VPC endpoint that it creates between the pipeline and the collection. For more information, see Providing network access to pipelines.

    version: "2"
    log-pipeline:
      source:
        http:
          path: "/log/ingest"
      processor:
        - date:
            from_time_received: true
            destination: "@timestamp"
      sink:
        - opensearch:
            hosts: [ "https://{collection-id}.{region}.aoss.amazonaws.com" ]
            index: "my-index"
            aws:
              serverless: true
              serverless_options:
                network_policy_name: "{network-policy-name}"  # If the policy doesn't exist, a new policy is created.
              region: "us-east-1"
              sts_role_arn: "arn:aws:iam::{account-id}:role/{pipeline-role}"
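The checklist above amounts to filling four values into a fixed configuration. The following is a minimal sketch that renders the configuration with plain string formatting. `render_pipeline_config` is a hypothetical helper, the argument values in the usage lines are placeholders, and the endpoint format is taken from the hosts value shown in this topic.

```python
PIPELINE_TEMPLATE = """\
version: "2"
log-pipeline:
  source:
    http:
      path: "/log/ingest"
  processor:
    - date:
        from_time_received: true
        destination: "@timestamp"
  sink:
    - opensearch:
        hosts: [ "{endpoint}" ]
        index: "{index}"
        aws:
          serverless: true
          serverless_options:
            network_policy_name: "{network_policy_name}"
          region: "{region}"
          sts_role_arn: "{role_arn}"
"""


def render_pipeline_config(collection_id, region, index, network_policy_name, role_arn):
    """Fill the pipeline template with collection, policy, and role details."""
    endpoint = f"https://{collection_id}.{region}.aoss.amazonaws.com"
    return PIPELINE_TEMPLATE.format(
        endpoint=endpoint,
        index=index,
        network_policy_name=network_policy_name,
        region=region,
        role_arn=role_arn,
    )


# Placeholder values for illustration only.
config = render_pipeline_config(
    collection_id="abc123example",
    region="us-east-1",
    index="my-index",
    network_policy_name="my-network-policy",
    role_arn="arn:aws:iam::123456789012:role/PipelineRole",
)
print(config)
```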

For a full reference of required and unsupported parameters, see Supported plugins and options for Amazon OpenSearch Ingestion pipelines.