Step 5: Create a job that uses the OpenSearch connection - Amazon Glue
Services or capabilities described in Amazon Web Services documentation might vary by Region. To see the differences applicable to the China Regions, see Getting Started with Amazon Web Services in China (PDF).

Step 5: Create a job that uses the OpenSearch connection

After creating a role for your ETL job, you can create a job in Amazon Glue Studio that uses the connection and the Elasticsearch Spark connector.

If your job runs within an Amazon Virtual Private Cloud (Amazon VPC), make sure the VPC is configured correctly. For more information, see Configure a VPC for your ETL job.

To create a job that uses the Elasticsearch Spark Connector
  1. In Amazon Glue Studio, choose Connectors.

  2. In the Your connections list, select the connection you just created and choose Create job.

  3. In the visual job editor, choose the Data source node. On the right, on the Data source properties - Connector tab, configure additional information for the connector.

    1. Choose Add schema and enter the schema of the data set in the data source. Connections do not use tables stored in the Data Catalog, which means that Amazon Glue Studio doesn't know the schema of the data. You must manually provide this schema information. For instructions on how to use the schema editor, see Editing the schema in a custom transform node.

    2. Expand Connection options.

    3. Choose Add new option and enter each connector option that you did not already store in the Amazon secret:

      • es.nodes: https://<OpenSearch domain endpoint>

      • es.port: 443

      • path: test

      • es.nodes.wan.only: true

      For an explanation of these connection options, see the Elasticsearch for Apache Hadoop configuration reference at https://www.elastic.co/guide/en/elasticsearch/hadoop/current/configuration.html.

  4. Add a target node to the graph.

    Your data target can be Amazon S3, or it can use information from an Amazon Glue Data Catalog or a connector to write data in a different location. For example, you can use a Data Catalog table to write to a database in Amazon RDS, or you can use a connector as your data target to write to data stores that are not natively supported in Amazon Glue.

    If you choose a connector for your data target, you must choose a connection created for that connector. If the connector provider requires it, you must also add options that supply additional information to the connector. If the connection references an Amazon secret, you don’t need to provide the user name and password in the connection options.

  5. Optionally, add more data sources and one or more transform nodes, as described in Editing Amazon Glue managed data transform nodes.

  6. Configure the job properties as described in Modify the job properties, starting with step 3, and save the job.
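The options entered in step 3 are plain key-value strings. As a minimal sketch, assuming you want to keep them in code (for example, to check them before entering them in the console), the hypothetical helper `build_es_connection_options` below builds the same map for a given domain endpoint. The function name, its URL validation, and the placeholder endpoint are illustrative assumptions; only the four option keys and their values come from this guide. The sketch also assumes that `path` names the index to read.

```python
from urllib.parse import urlparse


def build_es_connection_options(domain_endpoint: str, index: str = "test") -> dict[str, str]:
    """Return the connector options from step 3 as a string-to-string map.

    Hypothetical helper for illustration only; it is not part of Amazon Glue
    or elasticsearch-hadoop. `index` fills the `path` option, which this
    sketch assumes names the index to read.
    """
    # Accept a bare host name (as copied from the OpenSearch console) or a full URL.
    url = domain_endpoint if "://" in domain_endpoint else f"https://{domain_endpoint}"
    parsed = urlparse(url)
    # Reject plain http:// and empty hosts; this guide uses https on port 443.
    if parsed.scheme != "https" or not parsed.hostname:
        raise ValueError(f"expected an https:// domain endpoint, got {domain_endpoint!r}")
    return {
        "es.nodes": f"https://{parsed.hostname}",  # the OpenSearch domain endpoint
        "es.port": "443",                          # HTTPS port of the domain endpoint
        "path": index,                             # index to read ("test" in this guide)
        # Disable node discovery and connect only through the node(s) in es.nodes.
        "es.nodes.wan.only": "true",
    }


if __name__ == "__main__":
    # Placeholder endpoint (assumption); substitute your own domain endpoint.
    print(build_es_connection_options("search-my-domain.us-east-1.es.amazonaws.com"))
```

In a job script, a map like this would typically be passed as the connection options when reading from the connector, together with the connection name. That is an assumption, so compare it against the script that Amazon Glue Studio generates for your job on the Script tab rather than relying on this sketch.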

Next step

Step 6: Run the job