

# Tutorial: Using the Amazon Glue Connector for Elasticsearch
<a name="tutorial-elastisearch-connector"></a>

Elasticsearch is a popular open-source search and analytics engine for use cases such as log analytics, real-time application monitoring, and clickstream analysis. You can use OpenSearch as a data store for your extract, transform, and load (ETL) jobs by configuring the Amazon Glue Connector for Elasticsearch in Amazon Glue Studio. This connector is available for free from [Amazon Web Services Marketplace](https://aws.amazon.com/marketplace/pp/prodview-v5ygernwn2gb6). 

**Note**  
The [Amazon Web Services Marketplace Elasticsearch Spark Connector](https://aws.amazon.com/marketplace/pp/B08PPT2V5J) has been deprecated. Use the [Amazon Glue Connector for Elasticsearch](https://aws.amazon.com/marketplace/pp/prodview-v5ygernwn2gb6) instead.

This tutorial shows how to connect to your Amazon OpenSearch Service nodes in a minimal number of steps.

**Topics**
+ [Prerequisites](#tutorial-prerequisites)
+ [Step 1: (Optional) Create an Amazon secret for your OpenSearch cluster information](#tutorial-step1)
+ [Step 2: Subscribe to the connector](#tutorial-step2)
+ [Step 3: Activate the connector in Amazon Glue Studio and create a connection](#tutorial-step3)
+ [Step 4: Configure an IAM role for your ETL job](#tutorial-step4)
+ [Step 5: Create a job that uses the OpenSearch connection](#tutorial-step5)
+ [Step 6: Run the job](#tutorial-step6)

## Prerequisites
<a name="tutorial-prerequisites"></a>

To use this tutorial, you must have the following:
+ Access to Amazon Glue Studio
+ Access to an OpenSearch cluster in the Amazon Cloud
+ (Optional) Access to Amazon Secrets Manager

## Step 1: (Optional) Create an Amazon secret for your OpenSearch cluster information
<a name="tutorial-step1"></a>

To safely store and use your connection credentials, save them in Amazon Secrets Manager. The connection that you create later in this tutorial uses this secret. The credential key-value pairs are passed to the Amazon Glue Connector for Elasticsearch as normal connection options.

For more information about creating secrets, see [Creating and Managing Secrets with Amazon Secrets Manager](https://docs.amazonaws.cn/secretsmanager/latest/userguide/managing-secrets.html) in the *Amazon Secrets Manager User Guide*.

**To create an Amazon secret**

1. Sign in to the [Amazon Secrets Manager console](https://console.amazonaws.cn/secretsmanager/).

1. On either the service introduction page or the **Secrets** list page, choose **Store a new secret**.

1. On the **Store a new secret** page, choose **Other type of secret**. This option means that you must supply the structure and details of your secret.

1. Add a **Key** and **Value** pair for the OpenSearch cluster user name. For example:

   `es.net.http.auth.user`: {{username}}

1. Choose **\+ Add row**, and enter another key-value pair for the password. For example:

   `es.net.http.auth.pass`: {{password}}

1. Choose **Next**.

1. Enter a secret name. For example: **my-es-secret**. You can optionally include a description.

   Record the secret name, which is used later in this tutorial, and then choose **Next**.

1. Choose **Next** again, and then choose **Store** to create the secret.
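
If you prefer to create the secret from a script instead of the console, the procedure above could be sketched in Python roughly as follows. This is a minimal sketch, not the documented method: it assumes that `boto3` is installed and that your credentials allow `secretsmanager:CreateSecret`. The function names and the description text are hypothetical, and the key names are the ones used in this step.

```python
import json


def build_secret_string(username: str, password: str) -> str:
    """Return the JSON body of the secret, using the two keys from this step."""
    return json.dumps(
        {
            "es.net.http.auth.user": username,
            "es.net.http.auth.pass": password,
        }
    )


def store_secret(name: str, username: str, password: str) -> str:
    """Create the secret and return its ARN.

    Requires boto3 and valid credentials. The import is deferred so that
    build_secret_string can be used without boto3 installed.
    """
    import boto3

    client = boto3.client("secretsmanager")
    response = client.create_secret(
        Name=name,
        Description="OpenSearch credentials for an Amazon Glue connection",  # hypothetical text
        SecretString=build_secret_string(username, password),
    )
    return response["ARN"]
```

For example, `store_secret("my-es-secret", "<username>", "<password>")` would create a secret with the example name used in this tutorial. Avoid hard-coding real credentials in a script.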

### Next step
<a name="tutorial-step1.2"></a>

 [Step 2: Subscribe to the connector](#tutorial-step2) 

## Step 2: Subscribe to the connector
<a name="tutorial-step2"></a>

The Amazon Glue Connector for Elasticsearch is available for free from [Amazon Web Services Marketplace](https://aws.amazon.com/marketplace/pp/prodview-v5ygernwn2gb6#pdp-pricing). 

**To subscribe to the Amazon Glue Connector for Elasticsearch on Amazon Web Services Marketplace**

1. If you have not already configured your Amazon account to use License Manager, do the following:

   1. Open the Amazon License Manager console at [https://console.amazonaws.cn/license-manager](https://console.amazonaws.cn/license-manager).

   1. Choose **Create customer managed license**.

   1. In the **IAM permissions (one-time setup)** window, choose **I grant Amazon License Manager the required permissions**, and then choose **Grant permissions**.

      If you do not see this window, then you have already configured the necessary permissions.

1. Open the Amazon Glue Studio console at [https://console.amazonaws.cn/gluestudio/](https://console.amazonaws.cn/gluestudio/).

1. In the Amazon Glue Studio console, expand the menu icon (![3 short, horizontal lines in a vertical stack](http://docs.amazonaws.cn/en_us/glue/latest/dg/images/nav-menu-icon.png)), and then choose **Connectors** in the navigation pane.

1. On the **Connectors** page, choose **Go to Amazon Web Services Marketplace**.

1. In Amazon Web Services Marketplace, in the **Search Amazon Glue Studio products** section, enter **Amazon Glue Connector for Elasticsearch** in the search field, and then press Enter.

1.  Choose the name of the connector, **Amazon Glue Connector for Elasticsearch**. 

1. On the product page for the connector, use the tabs to view information about the connector. When you're ready to continue, choose **Continue to Subscribe**.

1. Review the terms of use for the software, and then choose **Accept Terms**.

1. When the subscription process completes, you see the notification "Thank you for subscribing to this product\! You can now configure your software." Above the banner, choose **Continue to Configuration**.

1. On the **Configure this software** page, choose a fulfillment option: either Amazon Glue 1.0/2.0 or Amazon Glue 3.0. Then choose **Continue to Launch**.

### Next step
<a name="tutorial-step2.1"></a>

 [Step 3: Activate the connector in Amazon Glue Studio and create a connection](#tutorial-step3) 

## Step 3: Activate the connector in Amazon Glue Studio and create a connection
<a name="tutorial-step3"></a>

After you choose **Continue to Launch**, you see the **Launch this software** page in Amazon Web Services Marketplace. After you use the link to activate the connector in Amazon Glue Studio, you create a connection. 

**To deploy the connector and create a connection in Amazon Glue Studio**

1. On the **Launch this software** page in the Amazon Web Services Marketplace console, choose **Usage Instructions**, and then choose the link in the window that appears.

   Your browser is redirected to the Amazon Glue Studio console **Create marketplace connection** page.

1. Enter a name for the connection. For example: **my-es-connection**.

1. In the **Connection access** section, for **Connection credential type**, choose **User name and password**. 

1. For the **Amazon secret**, enter the name of your secret. For example: **my-es-secret**.

1. In the **Network options** section, enter the VPC information needed to connect to your OpenSearch cluster.

1. Choose **Create connection and activate connector**.

### Next step
<a name="tutorial-step3.1"></a>

 [Step 4: Configure an IAM role for your ETL job](#tutorial-step4) 

## Step 4: Configure an IAM role for your ETL job
<a name="tutorial-step4"></a>

When you create the Amazon Glue ETL job, you specify an Amazon Identity and Access Management (IAM) role for the job to use. The role must grant access to all resources used by the job, including Amazon S3 (for any sources, targets, scripts, driver files, and temporary directories), and also Amazon Glue Data Catalog objects.

The IAM role assumed by the Amazon Glue ETL job must also have access to the secret that you created in Step 1. By default, the managed policy `AWSGlueServiceRole` does not grant access to the secret. To set up access control for your secrets, see [Authentication and Access Control for Amazon Secrets Manager](https://docs.amazonaws.cn/secretsmanager/latest/userguide/auth-and-access.html) and [Limiting Access to Specific Secrets](https://docs.amazonaws.cn/secretsmanager/latest/userguide/auth-and-access_identity-based-policies.html#permissions_grant-limited-resources).
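
As an illustration of limiting access to one secret, the following Python sketch builds an identity-based policy that allows reading a single secret and attaches it to a role as an inline policy. It is a sketch under assumptions: the ARN, role name, and policy name are hypothetical placeholders, and the single action `secretsmanager:GetSecretValue` is assumed to be sufficient. Your setup might need additional actions, so check the linked Secrets Manager topics.

```python
import json


def secret_read_policy(secret_arn: str) -> dict:
    """Build a policy document that allows reading exactly one secret."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["secretsmanager:GetSecretValue"],
                "Resource": secret_arn,
            }
        ],
    }


def attach_inline_policy(role_name: str, policy_name: str, secret_arn: str) -> None:
    """Attach the policy to the job role. Requires boto3 and IAM permissions."""
    import boto3  # deferred so the policy builder works without boto3

    iam = boto3.client("iam")
    iam.put_role_policy(
        RoleName=role_name,
        PolicyName=policy_name,
        PolicyDocument=json.dumps(secret_read_policy(secret_arn)),
    )


# Hypothetical placeholder values, shown only to illustrate the shape.
EXAMPLE_ARN = "arn:aws-cn:secretsmanager:cn-north-1:111122223333:secret:my-es-secret-AbCdEf"
print(json.dumps(secret_read_policy(EXAMPLE_ARN), indent=2))
```

Scoping `Resource` to the one secret ARN, rather than `*`, keeps the job role from reading unrelated secrets.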

**To configure an IAM role for your ETL job**

1. Configure the permissions described in [Review IAM permissions needed for ETL jobs](getting-started-min-privs-job.md).

1. Configure the additional permissions needed when using connectors with Amazon Glue Studio, as described in [Permissions required for using connectors](getting-started-min-privs-job.md#getting-started-min-privs-connectors).

### Next step
<a name="tutorial-step4.1"></a>

 [Step 5: Create a job that uses the OpenSearch connection](#tutorial-step5) 

## Step 5: Create a job that uses the OpenSearch connection
<a name="tutorial-step5"></a>

After creating a role for your ETL job, you can create a job in Amazon Glue Studio that uses the connection and the Amazon Glue Connector for Elasticsearch.

If your job runs within an Amazon Virtual Private Cloud (Amazon VPC), make sure the VPC is configured correctly. For more information, see [Configure a VPC for your ETL job](getting-started-vpc-config.md).

**To create a job that uses the Amazon Glue Connector for Elasticsearch**

1. In Amazon Glue Studio, choose **Connectors**.

1. In the **Your connections** list, select the connection you just created and choose **Create job**.

1. In the visual job editor, choose the Data source node. On the right, on the **Data source properties - Connector** tab, configure additional information for the connector. 

   1. Choose **Add schema** and enter the schema of the data set in the data source. Connections do not use tables stored in the Data Catalog, which means that Amazon Glue Studio doesn't know the schema of the data. You must manually provide this schema information. For instructions on how to use the schema editor, see [Editing the schema in a custom transform node](transforms-custom.md#transforms-custom-editschema).

   1. Expand **Connection options**.

   1. Choose **Add new option** and enter the information needed for the connector that was not entered in the Amazon secret:
      +  **es.nodes**: https://*<OpenSearch domain endpoint>* 
      +  **es.port**: 443
      +  **path**: test 
      +  **es.nodes.wan.only**: true 

      For an explanation of these connection options, refer to: [https://www.elastic.co/guide/en/elasticsearch/hadoop/current/configuration.html](https://www.elastic.co/guide/en/elasticsearch/hadoop/current/configuration.html).

1. Add a target node to the graph. 

   Your data target can be Amazon S3, or it can use information from an Amazon Glue Data Catalog or a connector to write data in a different location. For example, you can use a Data Catalog table to write to a database in Amazon RDS, or you can use a connector as your data target to write to data stores that are not natively supported in Amazon Glue.

   If you choose a connector for your data target, you must choose a connection created for that connector. Also, if required by the connector provider, you must add options to provide additional information to the connector. If you use a connection that contains information for an Amazon secret, then you don’t need to provide the user name and password authentication in the connection options.

1. Optionally, add additional data sources and one or more transform nodes as described in [Transform data with Amazon Glue managed transforms](edit-jobs-transforms.md).

1. Configure the job properties as described in [Modify the job properties](managing-jobs-chapter.md#edit-jobs-properties), starting with step 3, and save the job.
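
The connection options entered in step 3 can be pictured as a plain dictionary of string values. The following Python sketch collects them and shows how a job script might pass them to a Marketplace connector source. It is illustrative only: the function names, the endpoint, and the `transformation_ctx` value are hypothetical, and the `marketplace.spark` connection type and the `connectionName` key are assumptions about the script that Amazon Glue Studio generates, so compare them with your own job's script.

```python
def build_connection_options(domain_endpoint: str, index_path: str, connection_name: str) -> dict:
    """Collect the options from step 3 that are not stored in the secret."""
    return {
        "es.nodes": f"https://{domain_endpoint}",
        "es.port": "443",
        "path": index_path,  # the index to read, "test" in this tutorial
        "es.nodes.wan.only": "true",
        "connectionName": connection_name,  # assumed key that names the Glue connection
    }


def read_source(glue_context, options: dict):
    """Read from the connector inside a Glue job.

    glue_context is expected to be an awsglue GlueContext, which exists
    only in the Amazon Glue runtime, so this function is not called here.
    """
    return glue_context.create_dynamic_frame.from_options(
        connection_type="marketplace.spark",  # assumed type for Marketplace Spark connectors
        connection_options=options,
        transformation_ctx="opensearch_source",  # hypothetical name
    )


options = build_connection_options(
    "search-example.cn-north-1.es.amazonaws.com.cn",  # hypothetical endpoint
    "test",
    "my-es-connection",
)
print(options)
```

The user name and password are absent from the dictionary because the connection supplies them from the secret.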

### Next step
<a name="tutorial-step5.1"></a>

 [Step 6: Run the job](#tutorial-step6) 

## Step 6: Run the job
<a name="tutorial-step6"></a>

After you save your job, you can run the job to perform the ETL operations.

**To run the job you created for the Amazon Glue Connector for Elasticsearch**

1. Using the Amazon Glue Studio console, on the visual editor page, choose **Run**.

1. To view information about the job run, choose **Run Details** in the success banner, or choose the **Runs** tab of the visual editor.
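
A saved job can also be started and monitored from a script. The following Python sketch starts a run and polls its state until it finishes or a polling limit is reached. It assumes that `boto3` is installed and that your credentials allow `glue:StartJobRun` and `glue:GetJobRun`. The function names and the job name are hypothetical, and the set of terminal states is an assumption based on commonly documented job run states, so it might not be exhaustive.

```python
import time

# Assumed terminal states for a job run; this set might not be exhaustive.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "STOPPED", "TIMEOUT", "ERROR"}


def is_terminal(state: str) -> bool:
    """Return True when the job run has finished, successfully or not."""
    return state in TERMINAL_STATES


def run_job_and_wait(job_name: str, poll_seconds: int = 30, max_polls: int = 120) -> str:
    """Start the job, poll its state, and return the last state seen.

    Requires boto3 and valid credentials; the import is deferred so that
    is_terminal can be used without boto3 installed.
    """
    import boto3

    glue = boto3.client("glue")
    run_id = glue.start_job_run(JobName=job_name)["JobRunId"]
    state = "STARTING"
    for _ in range(max_polls):
        state = glue.get_job_run(JobName=job_name, RunId=run_id)["JobRun"]["JobRunState"]
        if is_terminal(state):
            break
        time.sleep(poll_seconds)
    return state
```

For example, `run_job_and_wait("my-es-job")` would return `"SUCCEEDED"` for a successful run of a job with that hypothetical name. The `max_polls` limit keeps the loop from running indefinitely if the run never reaches one of the listed states.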