Using an OpenSearch Ingestion pipeline with Confluent Cloud Kafka

You can use an OpenSearch Ingestion pipeline to stream data from Confluent Cloud Kafka clusters to Amazon OpenSearch Service domains and OpenSearch Serverless collections. OpenSearch Ingestion supports both public and private network configurations for streaming data from Confluent Cloud Kafka clusters to domains or collections managed by OpenSearch Service or OpenSearch Serverless.

Connectivity to Confluent Cloud public Kafka clusters

You can use OpenSearch Ingestion pipelines to migrate data from a Confluent Cloud Kafka cluster with a public configuration, which means that the cluster's DNS name can be publicly resolved. To do so, set up an OpenSearch Ingestion pipeline with a Confluent Cloud public Kafka cluster as the source and OpenSearch Service or OpenSearch Serverless as the destination. This processes your streaming data from a Confluent Cloud source cluster to an Amazon-managed destination domain or collection.

Prerequisites

Before you create your OpenSearch Ingestion pipeline, perform the following steps:

  1. Create a Confluent Cloud Kafka cluster to act as the source. The cluster should contain the data you want to ingest into OpenSearch Service.

  2. Create an OpenSearch Service domain or OpenSearch Serverless collection where you want to migrate data to. For more information, see Creating OpenSearch Service domains and Creating collections.

  3. Set up authentication on your Confluent Cloud Kafka cluster with Amazon Secrets Manager, as shown in the sketch after this list. Enable secrets rotation by following the steps in Rotate Amazon Secrets Manager secrets.

  4. Attach a resource-based policy to your domain or a data access policy to your collection. These access policies allow OpenSearch Ingestion to write data from your Confluent Cloud cluster to your domain or collection.

    The following sample domain access policy allows the pipeline role, which you create in the next step, to write data to a domain. Make sure that you update the resource with your own ARN.

    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Principal": {
            "AWS": "arn:aws:iam::{pipeline-account-id}:role/pipeline-role"
          },
          "Action": [
            "es:DescribeDomain",
            "es:ESHttp*"
          ],
          "Resource": [
            "arn:aws:es:{region}:{account-id}:domain/domain-name"
          ]
        }
      ]
    }

    To create an IAM role with the correct permissions to write data to the collection or domain, see Required permissions for domains and Required permissions for collections.
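For step 3, the following is a minimal boto3 sketch of storing Confluent Cloud credentials in Secrets Manager. The secret name and key names are placeholder assumptions; the key names must match the ${aws_secrets:...} references in your pipeline configuration.

import json

import boto3

# Store the Confluent Cloud Kafka credentials as a JSON secret.
secretsmanager = boto3.client("secretsmanager", region_name="us-west-2")

response = secretsmanager.create_secret(
    Name="my-kafka-secret",  # placeholder; referenced as secret_id in the pipeline
    SecretString=json.dumps({
        "username": "confluent-api-key",     # placeholder value
        "password": "confluent-api-secret",  # placeholder value
    }),
)

# Use this ARN in the pipeline role's Secrets Manager read policy.
print(response["ARN"])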

Step 1: Configure the pipeline role

After you have your Confluent Cloud Kafka cluster prerequisites set up, configure the pipeline role that you want to use in your pipeline configuration. Add permission to write to an OpenSearch Service domain or OpenSearch Serverless collection, as well as permission to read secrets from Secrets Manager.

The following permission is needed to manage the network interface:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ec2:AttachNetworkInterface",
        "ec2:CreateNetworkInterface",
        "ec2:CreateNetworkInterfacePermission",
        "ec2:DeleteNetworkInterface",
        "ec2:DeleteNetworkInterfacePermission",
        "ec2:DetachNetworkInterface",
        "ec2:DescribeNetworkInterfaces"
      ],
      "Resource": [
        "arn:aws:ec2:*:{account-id}:network-interface/*",
        "arn:aws:ec2:*:{account-id}:subnet/*",
        "arn:aws:ec2:*:{account-id}:security-group/*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "ec2:DescribeDhcpOptions",
        "ec2:DescribeRouteTables",
        "ec2:DescribeSecurityGroups",
        "ec2:DescribeSubnets",
        "ec2:DescribeVpcs",
        "ec2:Describe*"
      ],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "ec2:CreateTags"
      ],
      "Resource": "arn:aws:ec2:*:*:network-interface/*",
      "Condition": {
        "StringEquals": {
          "aws:RequestTag/OSISManaged": "true"
        }
      }
    }
  ]
}

The following permission is needed to read secrets from Amazon Secrets Manager:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "SecretsManagerReadAccess",
      "Effect": "Allow",
      "Action": [
        "secretsmanager:GetSecretValue"
      ],
      "Resource": [
        "arn:aws:secretsmanager:{region}:{account-id}:secret:{secret-name}"
      ]
    }
  ]
}

The following permissions are needed to write to an Amazon OpenSearch Service domain:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "es:DescribeDomain",
        "es:ESHttp*"
      ],
      "Resource": "arn:aws:es:{region}:{your-account-id}:domain/{domain-name}/*"
    }
  ]
}
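As a rough illustration, the following boto3 sketch creates a role that OpenSearch Ingestion pipelines can assume and attaches the domain-write permissions above as an inline policy. The role name, policy name, and ARN are placeholder assumptions; osis-pipelines.amazonaws.com is the OpenSearch Ingestion service principal.

import json

import boto3

iam = boto3.client("iam")

# Trust policy so OpenSearch Ingestion pipelines can assume the role.
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "osis-pipelines.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}

# Identity policy granting write access to the destination domain
# (mirrors the policy shown above; replace the placeholder ARN).
domain_write_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["es:DescribeDomain", "es:ESHttp*"],
            "Resource": "arn:aws:es:us-west-2:123456789012:domain/domain-name/*",
        }
    ],
}

iam.create_role(
    RoleName="pipeline-role",  # placeholder; must match sts_role_arn in the pipeline
    AssumeRolePolicyDocument=json.dumps(trust_policy),
)
iam.put_role_policy(
    RoleName="pipeline-role",
    PolicyName="opensearch-domain-write",  # placeholder policy name
    PolicyDocument=json.dumps(domain_write_policy),
)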

Step 2: Create the pipeline

You can then configure an OpenSearch Ingestion pipeline like the following, which specifies your Confluent Cloud Kafka cluster as the source.

You can specify multiple OpenSearch Service domains as destinations for your data. This capability enables conditional routing or replication of incoming data into multiple OpenSearch Service domains.

You can also migrate data from a source Confluent Cloud Kafka cluster to an OpenSearch Serverless VPC collection. Make sure that you provide a network access policy within the pipeline configuration. You can use a Confluent schema registry to define the schema of your data.

version: "2"
kafka-pipeline:
  source:
    kafka:
      encryption:
        type: "ssl"
      topics:
        - name: "topic-name"
          group_id: "group-id"
      bootstrap_servers:
        - "bootstrap-server.us-west-2.aws.private.confluent.cloud:9092"
      authentication:
        sasl:
          plain:
            username: ${aws_secrets:confluent-kafka-secret:username}
            password: ${aws_secrets:confluent-kafka-secret:password}
      schema:
        type: confluent
        registry_url: https://my-registry.us-west-2.aws.confluent.cloud
        api_key: "${{aws_secrets:schema-secret:schema_registry_api_key}}"
        api_secret: "${{aws_secrets:schema-secret:schema_registry_api_secret}}"
        basic_auth_credentials_source: "USER_INFO"
  sink:
    - opensearch:
        hosts: ["https://search-mydomain.us-west-2.es.amazonaws.com"]
        aws:
          sts_role_arn: "arn:aws:iam::{account-id}:role/pipeline-role"
          region: "us-west-2"
        index: "confluent-index"
extension:
  aws:
    secrets:
      confluent-kafka-secret:
        secret_id: "my-kafka-secret"
        region: "us-west-2"
        sts_role_arn: "arn:aws:iam::{account-id}:role/pipeline-role"
      schema-secret:
        secret_id: "my-self-managed-kafka-schema"
        region: "us-west-2"
        sts_role_arn: "arn:aws:iam::{account-id}:role/pipeline-role"
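After you save this configuration to a file, one way to create the pipeline is with the boto3 osis client, as in the following minimal sketch. The pipeline name, capacity values, and file name are placeholder assumptions.

import boto3

osis = boto3.client("osis", region_name="us-west-2")

# Read the YAML pipeline configuration shown above.
with open("confluent-pipeline.yaml") as f:
    pipeline_yaml = f.read()

response = osis.create_pipeline(
    PipelineName="confluent-kafka-pipeline",  # placeholder name
    MinUnits=1,  # minimum Ingestion OCUs; size for your throughput
    MaxUnits=4,  # maximum Ingestion OCUs
    PipelineConfigurationBody=pipeline_yaml,
)
print(response["Pipeline"]["Status"])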

You can use a preconfigured blueprint to create this pipeline. For more information, see Using blueprints to create a pipeline.

Connectivity to Confluent Cloud Kafka clusters in a VPC

You can also use OpenSearch Ingestion pipelines to migrate data from a Confluent Cloud Kafka cluster running in a VPC. To do so, set up an OpenSearch Ingestion pipeline with a Confluent Cloud Kafka cluster as a source and OpenSearch Service or OpenSearch Serverless as the destination. This processes your streaming data from a Confluent Cloud Kafka source cluster to an Amazon-managed destination domain or collection.

OpenSearch Ingestion supports Confluent Cloud Kafka clusters configured in any of the network modes that Confluent supports. The following network configurations are supported as a source in OpenSearch Ingestion:

  • Amazon VPC peering

  • Amazon PrivateLink for dedicated clusters

  • Amazon PrivateLink for Enterprise clusters

  • Amazon Transit Gateway

Prerequisites

Before you create your OpenSearch Ingestion pipeline, perform the following steps:

  1. Create a Confluent Cloud Kafka cluster with a VPC network configuration that contains the data you want to ingest into OpenSearch Service.

  2. Create an OpenSearch Service domain or OpenSearch Serverless collection where you want to migrate data to. For more information, see Creating OpenSearch Service domains and Creating collections.

  3. Set up authentication on your Confluent Cloud Kafka cluster with Amazon Secrets Manager. Enable secrets rotation by following the steps in Rotate Amazon Secrets Manager secrets.

  4. Obtain the ID of the VPC that has access to your Confluent Cloud Kafka cluster. Choose the VPC CIDR to be used by OpenSearch Ingestion.

    Note

    If you're using the Amazon Web Services Management Console to create your pipeline, you must also attach your OpenSearch Ingestion pipeline to your VPC in order to use your Confluent Cloud Kafka cluster. To do so, find the Network configuration section, select the Attach to VPC checkbox, and choose your CIDR from one of the provided default options, or provide your own.

    To provide a custom CIDR, select Other from the dropdown menu. To avoid a collision in IP addresses between OpenSearch Ingestion and your Kafka cluster, ensure that the Kafka cluster's VPC CIDR is different from the CIDR for OpenSearch Ingestion.

  5. Attach a resource-based policy to your domain or a data access policy to your collection. These access policies allow OpenSearch Ingestion to write data from your Confluent Cloud cluster to your domain or collection.

    Note

    If you are using Amazon PrivateLink to connect to your Confluent Cloud Kafka cluster, you need to configure your VPC's DHCP options. DNS hostnames and DNS resolution must be enabled, as shown in the sketch after this list.

    The following sample domain access policy allows the pipeline role, which you create in the next step, to write data to a domain. Make sure that you update the resource with your own ARN.

    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Principal": {
            "AWS": "arn:aws:iam::{pipeline-account-id}:role/pipeline-role"
          },
          "Action": [
            "es:DescribeDomain",
            "es:ESHttp*"
          ],
          "Resource": [
            "arn:aws:es:{region}:{account-id}:domain/domain-name"
          ]
        }
      ]
    }

    To create an IAM role with the correct permissions to write data to the collection or domain, see Required permissions for domains and Required permissions for collections.
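As noted in step 5, Amazon PrivateLink connections require DNS support and DNS hostnames to be enabled on the VPC. The following is a minimal boto3 sketch of enabling both attributes; the VPC ID is a placeholder assumption.

import boto3

ec2 = boto3.client("ec2", region_name="us-west-2")

vpc_id = "vpc-0123456789abcdef0"  # placeholder: the VPC attached to the pipeline

# modify_vpc_attribute accepts one attribute per call, so call it twice.
ec2.modify_vpc_attribute(VpcId=vpc_id, EnableDnsSupport={"Value": True})
ec2.modify_vpc_attribute(VpcId=vpc_id, EnableDnsHostnames={"Value": True})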

Step 1: Configure the pipeline role

After you have your pipeline prerequisites set up, configure the pipeline role that you want to use in your pipeline configuration, and add the following permissions to the role:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "SecretsManagerReadAccess",
      "Effect": "Allow",
      "Action": [
        "secretsmanager:GetSecretValue"
      ],
      "Resource": [
        "arn:aws:secretsmanager:{region}:{account-id}:secret:{secret-name}"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "ec2:AttachNetworkInterface",
        "ec2:CreateNetworkInterface",
        "ec2:CreateNetworkInterfacePermission",
        "ec2:DeleteNetworkInterface",
        "ec2:DeleteNetworkInterfacePermission",
        "ec2:DetachNetworkInterface",
        "ec2:DescribeNetworkInterfaces"
      ],
      "Resource": [
        "arn:aws:ec2:*:{account-id}:network-interface/*",
        "arn:aws:ec2:*:{account-id}:subnet/*",
        "arn:aws:ec2:*:{account-id}:security-group/*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "ec2:DescribeDhcpOptions",
        "ec2:DescribeRouteTables",
        "ec2:DescribeSecurityGroups",
        "ec2:DescribeSubnets",
        "ec2:DescribeVpcs",
        "ec2:Describe*"
      ],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "ec2:CreateTags"
      ],
      "Resource": "arn:aws:ec2:*:*:network-interface/*",
      "Condition": {
        "StringEquals": {
          "aws:RequestTag/OSISManaged": "true"
        }
      }
    }
  ]
}

You must provide the preceding Amazon EC2 permissions on the IAM role that you use to create the OpenSearch Ingestion pipeline, because the pipeline uses these permissions to create and delete a network interface in your VPC. The pipeline can access the Kafka cluster only through this network interface.

Step 2: Create the pipeline

You can then configure an OpenSearch Ingestion pipeline like the following, which specifies your Confluent Cloud Kafka cluster as the source.

You can specify multiple OpenSearch Service domains as destinations for your data. This capability enables conditional routing or replication of incoming data into multiple OpenSearch Service domains.

You can also migrate data from a source Confluent Cloud Kafka cluster to an OpenSearch Serverless VPC collection. Make sure that you provide a network access policy within the pipeline configuration. You can use a Confluent schema registry to define the schema of your data.

version: "2"
kafka-pipeline:
  source:
    kafka:
      encryption:
        type: "ssl"
      topics:
        - name: "topic-name"
          group_id: "group-id"
      bootstrap_servers:
        - "bootstrap-server.us-west-2.aws.private.confluent.cloud:9092"
      authentication:
        sasl:
          plain:
            username: ${aws_secrets:confluent-kafka-secret:username}
            password: ${aws_secrets:confluent-kafka-secret:password}
      schema:
        type: confluent
        registry_url: https://my-registry.us-west-2.aws.confluent.cloud
        api_key: "${{aws_secrets:schema-secret:schema_registry_api_key}}"
        api_secret: "${{aws_secrets:schema-secret:schema_registry_api_secret}}"
        basic_auth_credentials_source: "USER_INFO"
  sink:
    - opensearch:
        hosts: ["https://search-mydomain.us-west-2.es.amazonaws.com"]
        aws:
          sts_role_arn: "arn:aws:iam::{account-id}:role/pipeline-role"
          region: "us-west-2"
        index: "confluent-index"
extension:
  aws:
    secrets:
      confluent-kafka-secret:
        secret_id: "my-kafka-secret"
        region: "us-west-2"
        sts_role_arn: "arn:aws:iam::{account-id}:role/pipeline-role"
      schema-secret:
        secret_id: "my-self-managed-kafka-schema"
        region: "us-west-2"
        sts_role_arn: "arn:aws:iam::{account-id}:role/pipeline-role"
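Because this pipeline reaches the Kafka cluster over your VPC, pass VPC options when you create it programmatically. The following minimal boto3 sketch assumes placeholder subnet and security group IDs; the security group must allow outbound traffic to the Kafka brokers.

import boto3

osis = boto3.client("osis", region_name="us-west-2")

# Read the YAML pipeline configuration shown above.
with open("confluent-vpc-pipeline.yaml") as f:
    pipeline_yaml = f.read()

response = osis.create_pipeline(
    PipelineName="confluent-vpc-pipeline",  # placeholder name
    MinUnits=1,
    MaxUnits=4,
    PipelineConfigurationBody=pipeline_yaml,
    VpcOptions={
        "SubnetIds": ["subnet-0123456789abcdef0"],     # placeholder subnet in your VPC
        "SecurityGroupIds": ["sg-0123456789abcdef0"],  # placeholder security group
    },
)
print(response["Pipeline"]["Status"])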

You can use a preconfigured blueprint to create this pipeline. For more information, see Using blueprints to create a pipeline.