Tutorial: Using the Amazon Glue Connector for Elasticsearch
Elasticsearch is a popular open-source search and analytics engine for use cases such as log
analytics, real-time application monitoring, and clickstream analysis. You can use OpenSearch as a
data store for your extract, transform, and load (ETL) jobs by configuring the
Amazon Glue Connector for Elasticsearch in Amazon Glue Studio. This connector
is available for free from Amazon Web Services Marketplace.
Note
The Amazon Web Services Marketplace
Elasticsearch Spark Connector
In this tutorial, we will show how to connect to your Amazon OpenSearch Service nodes with a minimal number of steps.
Topics
- Prerequisites
- Step 1: (Optional) Create an Amazon secret for your OpenSearch cluster information
- Step 2: Subscribe to the connector
- Step 3: Activate the connector in Amazon Glue Studio and create a connection
- Step 4: Configure an IAM role for your ETL job
- Step 5: Create a job that uses the OpenSearch connection
- Step 6: Run the job
Prerequisites
To use this tutorial, you must have the following:
-
Access to Amazon Glue Studio
-
Access to an OpenSearch cluster in the Amazon Cloud
-
(Optional) Access to Amazon Secrets Manager.
Step 1: (Optional) Create an Amazon secret for your OpenSearch cluster information
To safely store and use your connection credential, save your credential in Amazon Secrets Manager. The secret you create will be used later in the tutorial by the connection. The credential key-value pairs will be fed into the Amazon Glue Connector for Elasticsearch as normal connection options.
For more information about creating secrets, see Creating and Managing Secrets with Amazon Secrets Manager in the Amazon Secrets Manager User Guide.
To create an Amazon secret
-
Sign in to the Amazon Secrets Manager console.
-
On either the service introduction page or the Secrets list page, choose Store a new secret.
-
On the Store a new secret page, choose Other type of secret. This option means that you must supply the structure and details of your secret.
-
Add a Key and Value pair for the OpenSearch cluster user name. For example:
es.net.http.auth.user: username
-
Choose + Add row, and enter another key-value pair for the password. For example:
es.net.http.auth.pass: password
-
Choose Next.
-
Enter a secret name. For example: my-es-secret. You can optionally include a description.
Record the secret name, which is used later in this tutorial, and then choose Next.
-
Choose Next again, and then choose Store to create the secret.
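The console steps above can also be sketched programmatically. The following is a minimal illustration, not the documented procedure: it assumes a Secrets Manager client such as the one returned by boto3.client("secretsmanager"), and the helper names build_secret_string and store_secret are hypothetical. The key names are the connector options shown in the steps above.

```python
import json


def build_secret_string(username: str, password: str) -> str:
    """Return the JSON body of the secret, keyed by the connector option names."""
    return json.dumps(
        {
            "es.net.http.auth.user": username,
            "es.net.http.auth.pass": password,
        }
    )


def store_secret(client, name: str, username: str, password: str) -> dict:
    """Create the secret using a Secrets Manager client passed in by the caller.

    The client is assumed to expose create_secret(Name=..., SecretString=...),
    as the boto3 Secrets Manager client does.
    """
    return client.create_secret(
        Name=name,
        SecretString=build_secret_string(username, password),
    )
```

For example, store_secret(boto3.client("secretsmanager"), "my-es-secret", "username", "password") would create a secret with the same name and keys used in this tutorial, provided the caller has permission to create secrets.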
Step 2: Subscribe to the connector
The Amazon Glue Connector for Elasticsearch is available for free from Amazon Web Services Marketplace.
To subscribe to the Amazon Glue Connector for Elasticsearch on Amazon Web Services Marketplace
-
If you have not already configured your Amazon account to use License Manager, do the following:
-
Open the Amazon License Manager console at https://console.amazonaws.cn/license-manager.
-
Choose Create customer managed license.
-
In the IAM permissions (one-time setup) window, choose I grant Amazon License Manager the required permissions, and then choose Grant permissions.
If you do not see this window, then you have already configured the necessary permissions.
-
Open the Amazon Glue Studio console at https://console.amazonaws.cn/gluestudio/.
-
In the Amazon Glue Studio console, expand the menu icon, and then choose Connectors in the navigation pane.
-
On the Connectors page, choose Go to Amazon Web Services Marketplace.
-
In Amazon Web Services Marketplace, in the Search Amazon Glue Studio products section, enter Amazon Glue Connector for Elasticsearch in the search field, and then press Enter.
-
Choose the name of the connector, Amazon Glue Connector for Elasticsearch.
-
On the product page for the connector, use the tabs to view information about the connector. When you're ready to continue, choose Continue to Subscribe.
-
Review the terms of use for the software, and then choose Accept Terms.
-
When the subscription process completes, you see the notification "Thank you for subscribing to this product! You can now configure your software." Choose the Continue to Configuration button, which appears above the banner.
-
Choose the Fulfillment option on the Configure this software page. You can choose either Amazon Glue 1.0/2.0 or Amazon Glue 3.0. Then, choose Continue to Launch.
Step 3: Activate the connector in Amazon Glue Studio and create a connection
After you choose Continue to Launch, you see the Launch this software page in Amazon Web Services Marketplace. After you use the link to activate the connector in Amazon Glue Studio, you create a connection.
To deploy the connector and create a connection in Amazon Glue Studio
-
On the Launch this software page in the Amazon Web Services Marketplace console, choose Usage Instructions, and then choose the link in the window that appears.
Your browser is redirected to the Amazon Glue Studio console Create marketplace connection page.
-
Enter a name for the connection. For example: my-es-connection.
-
In the Connection access section, for Connection credential type, choose User name and password.
-
For the Amazon secret, enter the name of your secret. For example: my-es-secret.
-
In the Network options section, enter the VPC information needed to connect to the OpenSearch cluster.
-
Choose Create connection and activate connector.
Step 4: Configure an IAM role for your ETL job
When you create the Amazon Glue ETL job, you specify an Amazon Identity and Access Management (IAM) role for the job to use. The role must grant access to all resources used by the job, including Amazon S3 (for any sources, targets, scripts, driver files, and temporary directories), and also Amazon Glue Data Catalog objects.
The IAM role assumed by the Amazon Glue ETL job must also have access to the secret that you
created in Step 1, if you created one. By default, the managed policy AWSGlueServiceRole
does not grant access to the secret. To set up access control for your secrets, see Authentication
and Access Control for Amazon Secrets Manager and Limiting Access to Specific Secrets.
To configure an IAM role for your ETL job
-
Configure the permissions described in Review IAM permissions needed for ETL jobs.
-
Configure the additional permissions needed when using connectors with Amazon Glue Studio, as described in Permissions required for using connectors.
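As an illustration of the secret access mentioned above, the following sketch builds an IAM policy document that allows reading a single secret. It is an assumption about what a minimal statement could look like, not the complete set of permissions a job needs. The helper name secret_read_policy, the account ID, and the Region in the example ARN are hypothetical placeholders.

```python
import json


def secret_read_policy(secret_arn: str) -> dict:
    """Build a policy document that allows reading one secret's value."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["secretsmanager:GetSecretValue"],
                "Resource": secret_arn,
            }
        ],
    }


# Hypothetical ARN: replace the partition, Region, and account ID with your own.
# The trailing "??????" matches the random suffix Secrets Manager appends to secret names.
EXAMPLE_ARN = "arn:aws-cn:secretsmanager:cn-north-1:111122223333:secret:my-es-secret-??????"

print(json.dumps(secret_read_policy(EXAMPLE_ARN), indent=2))
```

The printed JSON could be attached to the job's role as an inline policy. Check the Secrets Manager access control topics referenced above before relying on it, because your setup may require further actions or conditions.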
Step 5: Create a job that uses the OpenSearch connection
After creating a role for your ETL job, you can create a job in Amazon Glue Studio that uses the connection and the Amazon Glue Connector for Elasticsearch.
If your job runs within an Amazon Virtual Private Cloud (Amazon VPC), make sure the VPC is configured correctly. For more information, see Configure a VPC for your ETL job.
To create a job that uses the Elasticsearch Spark Connector
-
In Amazon Glue Studio, choose Connectors.
-
In the Your connections list, select the connection you just created and choose Create job.
-
In the visual job editor, choose the Data source node. On the right, on the Data source properties - Connector tab, configure additional information for the connector.
-
Choose Add schema and enter the schema of the data set in the data source. Connections do not use tables stored in the Data Catalog, which means that Amazon Glue Studio doesn't know the schema of the data. You must manually provide this schema information. For instructions on how to use the schema editor, see Editing the schema in a custom transform node.
-
Expand Connection options.
-
Choose Add new option and enter the information needed for the connector that was not entered in the Amazon secret:
-
es.nodes: https://<OpenSearch domain endpoint>
-
es.port: 443
-
path: test
-
es.nodes.wan.only: true
For an explanation of these connection options, refer to https://www.elastic.co/guide/en/elasticsearch/hadoop/current/configuration.html.
-
Add a target node to the graph.
Your data target can be Amazon S3, or it can use information from an Amazon Glue Data Catalog or a connector to write data in a different location. For example, you can use a Data Catalog table to write to a database in Amazon RDS, or you can use a connector as your data target to write to data stores that are not natively supported in Amazon Glue.
If you choose a connector for your data target, you must choose a connection created for that connector. Also, if required by the connector provider, you must add options to provide additional information to the connector. If you use a connection that contains information for an Amazon secret, then you don’t need to provide the user name and password authentication in the connection options.
-
Optionally, add additional data sources and one or more transform nodes as described in Transform data with Amazon Glue managed transforms.
-
Configure the job properties as described in Modify the job properties, starting with step 3, and save the job.
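For readers who prefer to inspect or edit the job script, the connection options entered in the visual editor can be sketched as a Python dictionary. This is a rough illustration under stated assumptions: the connection type "marketplace.spark" and the connectionName key reflect what Amazon Glue Studio typically generates for Marketplace Spark connectors, so compare them with the Script tab of your own job. The helper names and the transformation_ctx value are hypothetical.

```python
def build_connection_options(endpoint: str, index_path: str, connection_name: str) -> dict:
    """Collect the options from the Connection options section into one dictionary."""
    return {
        "connectionName": connection_name,   # assumed key; for example my-es-connection
        "es.nodes": f"https://{endpoint}",   # OpenSearch domain endpoint
        "es.port": "443",
        "path": index_path,                  # index to read; "test" in this tutorial
        "es.nodes.wan.only": "true",
    }


def read_from_opensearch(glue_context, options: dict):
    """Read the index as a DynamicFrame using a GlueContext supplied by the job script."""
    return glue_context.create_dynamic_frame.from_options(
        connection_type="marketplace.spark",  # assumed value for Marketplace Spark connectors
        connection_options=options,
        transformation_ctx="opensearch_source",
    )
```

The user name and password are deliberately absent from the dictionary, because the connection supplies them from the secret.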
Step 6: Run the job
After you save your job, you can run the job to perform the ETL operations.
To run the job you created for the Amazon Glue Connector for Elasticsearch
-
Using the Amazon Glue Studio console, on the visual editor page, choose Run.
-
In the success banner, choose Run Details, or you can choose the Runs tab of the visual editor to view information about the job run.
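The job can also be started and monitored outside the console. The sketch below assumes a Glue client such as boto3.client("glue") and uses its start_job_run and get_job_run calls. The function name run_job_and_wait, the polling interval, and the job name in the usage note are hypothetical.

```python
import time

# Job run states after which the run no longer changes.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "STOPPED", "TIMEOUT", "ERROR"}


def run_job_and_wait(glue_client, job_name: str, poll_seconds: float = 30.0) -> str:
    """Start a job run and poll until it reaches a terminal state; return that state."""
    run_id = glue_client.start_job_run(JobName=job_name)["JobRunId"]
    while True:
        run = glue_client.get_job_run(JobName=job_name, RunId=run_id)["JobRun"]
        state = run["JobRunState"]
        if state in TERMINAL_STATES:
            return state
        time.sleep(poll_seconds)
```

For example, run_job_and_wait(boto3.client("glue"), "my-es-job") would block until the run finishes and return its final state, such as SUCCEEDED or FAILED.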