Using an OpenSearch Ingestion pipeline with OpenSearch

You can use an OpenSearch Ingestion pipeline with self-managed OpenSearch or Elasticsearch to migrate data to Amazon OpenSearch Service domains and OpenSearch Serverless collections. OpenSearch Ingestion supports both public and private network configurations for the migration of data from self-managed OpenSearch and Elasticsearch.

Connectivity to public OpenSearch clusters

You can use OpenSearch Ingestion pipelines to migrate data from a self-managed OpenSearch or Elasticsearch cluster with a public configuration, which means that the domain DNS name can be publicly resolved. To do so, set up an OpenSearch Ingestion pipeline with self-managed OpenSearch or Elasticsearch as the source and OpenSearch Service or OpenSearch Serverless as the destination. This effectively migrates your data from a self-managed source cluster to an Amazon-managed destination domain or collection.

Prerequisites

Before you create your OpenSearch Ingestion pipeline, perform the following steps:

  1. Create a self-managed OpenSearch or Elasticsearch cluster that contains the data you want to migrate, and configure a public DNS name for it.

  2. Create an OpenSearch Service domain or OpenSearch Serverless collection where you want to migrate data to. For more information, see Creating OpenSearch Service domains and Creating collections.

  3. Set up authentication on your self-managed cluster with Amazon Secrets Manager. Enable secrets rotation by following the steps in Rotate Amazon Secrets Manager secrets.

  4. Attach a resource-based policy to your domain or a data access policy to your collection. These access policies allow OpenSearch Ingestion to write data from your self-managed cluster to your domain or collection.

    The following sample domain access policy allows the pipeline role, which you create in the next step, to write data to a domain. Make sure that you update the resource with your own ARN.

    { "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Principal": { "AWS": "arn:aws:iam::{pipeline-account-id}:role/pipeline-role" }, "Action": [ "es:DescribeDomain", "es:ESHttp*" ], "Resource": [ "arn:aws:es:{region}:{account-id}:domain/domain-name" ] } ] }

    To create an IAM role with the correct permissions to write data to the collection or domain, see Required permissions for domains and Required permissions for collections. If your destination is a collection, a sample data access policy follows this procedure.
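
For a collection destination, the following is a minimal sketch of a data access policy that lets the pipeline role create and write to indexes in the collection. The collection name, index pattern, and permission list are placeholder assumptions; adjust them for your own collection.

[
  {
    "Rules": [
      {
        "ResourceType": "index",
        "Resource": ["index/collection-name/*"],
        "Permission": [
          "aoss:CreateIndex",
          "aoss:UpdateIndex",
          "aoss:DescribeIndex",
          "aoss:WriteDocument"
        ]
      }
    ],
    "Principal": ["arn:aws:iam::{pipeline-account-id}:role/pipeline-role"]
  }
]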

Step 1: Configure the pipeline role

After you have your OpenSearch pipeline prerequisites set up, configure the pipeline role that you want to use in your pipeline configuration, and add permission to write to an OpenSearch Service domain or OpenSearch Serverless collection, as well as permission to read secrets from Secrets Manager.
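
For a domain destination, the permissions policy attached to the pipeline role might look like the following sketch, which pairs the domain actions from the access policy above with read access to the Secrets Manager secret. The domain ARN and secret ARN are placeholders; for a collection destination, grant the corresponding OpenSearch Serverless permissions instead, as described in Required permissions for collections.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "es:DescribeDomain",
        "es:ESHttp*"
      ],
      "Resource": "arn:aws:es:{region}:{account-id}:domain/domain-name"
    },
    {
      "Effect": "Allow",
      "Action": "secretsmanager:GetSecretValue",
      "Resource": "arn:aws:secretsmanager:{region}:{account-id}:secret:secret-name"
    }
  ]
}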

Step 2: Create the pipeline

You can then configure an OpenSearch Ingestion pipeline like the following, which specifies OpenSearch as the source.

You can specify multiple OpenSearch Service domains as destinations for your data. This capability enables conditional routing or replication of incoming data into multiple OpenSearch Service domains.
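
For example, the following sketch shows one way to split documents between two domains by defining named routes and referencing them in each sink. The pipeline name, field name, route conditions, and domain endpoints are placeholder assumptions, and the remaining source and sink options are the same as in the full example that follows.

opensearch-routing-sketch-pipeline:
  source:
    opensearch:
      # ... same source options as in the full example below
  route:
    - logs: '/type == "log"'
    - metrics: '/type == "metric"'
  sink:
    - opensearch:
        hosts: [ "https://search-logs-domain.us-east-1.es.amazonaws.com" ]
        routes: [ "logs" ]
        # ... remaining sink options (aws, index, dlq)
    - opensearch:
        hosts: [ "https://search-metrics-domain.us-east-1.es.amazonaws.com" ]
        routes: [ "metrics" ]
        # ... remaining sink options (aws, index, dlq)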

You can also migrate data from a source OpenSearch or Elasticsearch cluster to an OpenSearch Serverless VPC collection. Ensure you provide a network access policy within the pipeline configuration.

version: "2" opensearch-migration-pipeline: source: opensearch: acknowledgments: true host: [ "https://my-self-managed-cluster-name:9200" ] indices: include: - index_name_regex: "include-.*" exclude: - index_name_regex: '\..*' authentication: username: ${aws_secrets:secret:username} password: ${aws_secrets:secret:password} scheduling: interval: "PT2H" index_read_count: 3 start_time: "2023-06-02T22:01:30.00Z" sink: - opensearch: hosts: ["https://search-mydomain.us-east-1.es.amazonaws.com"] aws: sts_role_arn: "arn:aws:iam::{account-id}:role/pipeline-role" region: "us-east-1" #Uncomment the following lines if your destination is an OpenSearch Serverless collection #serverless: true # serverless_options: # network_policy_name: "network-policy-name" index: "${getMetadata(\"opensearch-index\")}" document_id: "${getMetadata(\"opensearch-document_id\")}" enable_request_compression: true dlq: s3: bucket: "bucket-name" key_path_prefix: "apache-log-pipeline/logs/dlq" region: "us-east-1" sts_role_arn: "arn:aws:iam::{account-id}:role/pipeline-role" extension: aws: secrets: secret: secret_id: "my-opensearch-secret" region: "us-east-1" sts_role_arn: "arn:aws:iam::{account-id}:role/pipeline-role" refresh_interval: PT1H

You can use a preconfigured blueprint to create this pipeline. For more information, see Using blueprints to create a pipeline.

Connectivity to OpenSearch clusters in a VPC

You can also use OpenSearch Ingestion pipelines to migrate data from a self-managed OpenSearch or Elasticsearch cluster running in a VPC. To do so, set up an OpenSearch Ingestion pipeline with self-managed OpenSearch or Elasticsearch as the source and OpenSearch Service or OpenSearch Serverless as the destination. This effectively migrates your data from a self-managed source cluster to an Amazon-managed destination domain or collection.

Prerequisites

Before you create your OpenSearch Ingestion pipeline, perform the following steps:

  1. Create a self-managed OpenSearch or Elasticsearch cluster with a VPC network configuration that contains the data you want to migrate.

  2. Create an OpenSearch Service domain or OpenSearch Serverless collection where you want to migrate data to. For more information, see Creating OpenSearch Service domains and Creating collections.

  3. Set up authentication on your self-managed cluster with Amazon Secrets Manager. Enable secrets rotation by following the steps in Rotate Amazon Secrets Manager secrets. A sample secret structure appears after this procedure.

  4. Obtain the ID of the VPC that has access to self-managed OpenSearch or Elasticsearch. Choose the VPC CIDR to be used by OpenSearch Ingestion.

    Note

    If you're using the Amazon Web Services Management Console to create your pipeline, you must also attach your OpenSearch Ingestion pipeline to your VPC in order to use self-managed OpenSearch or Elasticsearch. To do so, find the Network configuration section, select the Attach to VPC checkbox, and choose your CIDR from one of the provided default options, or select your own.

    To provide a custom CIDR, select Other from the dropdown menu. To avoid a collision in IP addresses between OpenSearch Ingestion and self-managed OpenSearch, ensure that the self-managed OpenSearch VPC CIDR is different from the CIDR for OpenSearch Ingestion.

  5. Attach a resource-based policy to your domain or a data access policy to your collection. These access policies allow OpenSearch Ingestion to write data from your self-managed cluster to your domain or collection.

    The following sample domain access policy allows the pipeline role, which you create in the next step, to write data to a domain. Make sure that you update the resource with your own ARN.

    { "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Principal": { "AWS": "arn:aws:iam::{pipeline-account-id}:role/pipeline-role" }, "Action": [ "es:DescribeDomain", "es:ESHttp*" ], "Resource": [ "arn:aws:es:{region}:{account-id}:domain/domain-name" ] } ] }

    To create an IAM role with the correct permissions to write data to the collection or domain, see Required permissions for domains and Required permissions for collections.
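
The authentication block in the pipeline configuration in Step 2 reads the keys username and password from the secret that you create in step 3. A minimal sketch of such a secret value, assuming a JSON key/value secret, looks like the following; the credentials shown are placeholders.

{
  "username": "admin",
  "password": "your-cluster-password"
}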

Step 1: Configure the pipeline role

After you have your pipeline prerequisites set up, configure the pipeline role that you want to use in your pipeline configuration, and add the following permissions to the role:

{ "Version": "2012-10-17", "Statement": [ { "Sid": "SecretsManagerReadAccess", "Effect": "Allow", "Action": [ "secretsmanager:GetSecretValue" ], "Resource": ["arn:aws:secretsmanager:{region}:{account-id}:secret:secret-name"] }, { "Effect": "Allow", "Action": [ "ec2:AttachNetworkInterface", "ec2:CreateNetworkInterface", "ec2:CreateNetworkInterfacePermission", "ec2:DeleteNetworkInterface", "ec2:DeleteNetworkInterfacePermission", "ec2:DetachNetworkInterface", "ec2:DescribeNetworkInterfaces" ], "Resource": [ "arn:aws:ec2:*:{account-id}:network-interface/*", "arn:aws:ec2:*:{account-id}:subnet/*", "arn:aws:ec2:*:{account-id}:security-group/*" ] }, { "Effect": "Allow", "Action": [ "ec2:DescribeDhcpOptions", "ec2:DescribeRouteTables", "ec2:DescribeSecurityGroups", "ec2:DescribeSubnets", "ec2:DescribeVpcs", "ec2:Describe*" ], "Resource": "*" }, { "Effect": "Allow", "Action": [ "ec2:CreateTags" ], "Resource": "arn:aws:ec2:*:*:network-interface/*", "Condition": { "StringEquals": { "aws:RequestTag/OSISManaged": "true" } } } ] }

You must provide the above Amazon EC2 permissions on the IAM role that you use to create the OpenSearch Ingestion pipeline because the pipeline uses these permissions to create and delete a network interface in your VPC. The pipeline can only access the OpenSearch cluster through this network interface.
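
The pipeline role also needs a trust relationship that allows OpenSearch Ingestion to assume it. The following is a minimal sketch of such a trust policy, assuming the osis-pipelines.amazonaws.com service principal.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "osis-pipelines.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}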

Step 2: Create the pipeline

You can then configure an OpenSearch Ingestion pipeline like the following, which specifies OpenSearch as the source.

You can specify multiple OpenSearch Service domains as destinations for your data. This capability enables conditional routing or replication of incoming data into multiple OpenSearch Service domains.

You can also migrate data from a source OpenSearch or Elasticsearch cluster to an OpenSearch Serverless VPC collection. Ensure you provide a network access policy within the pipeline configuration.

version: "2" opensearch-migration-pipeline: source: opensearch: acknowledgments: true host: [ "https://my-self-managed-cluster-name:9200" ] indices: include: - index_name_regex: "include-.*" exclude: - index_name_regex: '\..*' authentication: username: ${aws_secrets:secret:username} password: ${aws_secrets:secret:password} scheduling: interval: "PT2H" index_read_count: 3 start_time: "2023-06-02T22:01:30.00Z" sink: - opensearch: hosts: ["https://search-mydomain.us-east-1.es.amazonaws.com"] aws: sts_role_arn: "arn:aws:iam::{account-id}:role/pipeline-role" region: "us-east-1" #Uncomment the following lines if your destination is an OpenSearch Serverless collection #serverless: true # serverless_options: # network_policy_name: "network-policy-name" index: "${getMetadata(\"opensearch-index\")}" document_id: "${getMetadata(\"opensearch-document_id\")}" enable_request_compression: true dlq: s3: bucket: "bucket-name" key_path_prefix: "apache-log-pipeline/logs/dlq" region: "us-east-1" sts_role_arn: "arn:aws:iam::{account-id}:role/pipeline-role" extension: aws: secrets: secret: secret_id: "my-opensearch-secret" region: "us-east-1" sts_role_arn: "arn:aws:iam::{account-id}:role/pipeline-role" refresh_interval: PT1H

You can use a preconfigured blueprint to create this pipeline. For more information, see Using blueprints to create a pipeline.