
Amazon Q data integration in Amazon Glue

Amazon Q data integration in Amazon Glue is a new generative AI capability of Amazon Glue that enables data engineers and ETL developers to build data integration jobs using natural language. Engineers and developers can ask Amazon Q to author jobs, troubleshoot issues, and answer questions about Amazon Glue and data integration.

What is Amazon Q?

Note

Powered by Amazon Bedrock: Amazon implements automated abuse detection. Because Amazon Q data integration is built on Amazon Bedrock, users can take full advantage of the controls implemented in Amazon Bedrock to enforce safety, security, and the responsible use of artificial intelligence (AI).

Amazon Q is a generative artificial intelligence (AI) powered conversational assistant that can help you understand, build, extend, and operate Amazon applications. The model that powers Amazon Q has been augmented with high-quality Amazon content to give you more complete, actionable, and referenced answers that accelerate your building on Amazon. For more information, see What is Amazon Q?

What is Amazon Q data integration in Amazon Glue?

Amazon Q data integration in Amazon Glue includes the following capabilities:

  • Chat – Amazon Q data integration in Amazon Glue can answer natural language questions in English about Amazon Glue and data integration topics, such as Amazon Glue source and destination connectors, Amazon Glue ETL jobs, the Data Catalog, crawlers, and Amazon Lake Formation, as well as other feature documentation and best practices. It responds with step-by-step instructions and includes references to its information sources.

  • Data integration code generation – Amazon Q data integration in Amazon Glue can answer questions about Amazon Glue ETL scripts and generate new code from a natural language question in English.

  • Troubleshoot – Amazon Q data integration in Amazon Glue is purpose-built to help you understand errors in Amazon Glue jobs, and provides step-by-step instructions to identify the root cause and resolve your issues.

Note

Amazon Q data integration in Amazon Glue does not use the context of your conversation to inform future responses for the duration of your conversation. Each conversation with Amazon Q data integration in Amazon Glue is independent of your prior or future conversations.

Working with Amazon Q data integration in Amazon Glue

In the Amazon Q panel, you can ask Amazon Q to generate code for an Amazon Glue ETL script, to answer a question about Amazon Glue features, or to help troubleshoot an error. For code requests, the response is an ETL script in PySpark with step-by-step instructions to customize, review, and run the script. For questions, the response is generated from the data integration knowledge base and includes a summary and source URLs for reference.

For example, you can ask Amazon Q, "Please provide a Glue script that reads from Snowflake, renames the fields, and writes to Redshift." In response, Amazon Q data integration in Amazon Glue returns an Amazon Glue job script that can perform the requested action. Review the generated code to ensure that it fulfills the requested intent. If you are satisfied, you can deploy it as an Amazon Glue job in production. You can also troubleshoot jobs by asking Amazon Q to explain errors and failures and to propose solutions, and you can ask questions about Amazon Glue or data integration best practices.

The following are example questions that demonstrate how Amazon Q data integration in Amazon Glue can help you build on Amazon Glue:

Amazon Glue ETL code generation:

  • Write an Amazon Glue script that reads JSON from S3, transforms fields using apply mapping and writes to Amazon Redshift

  • How do I write an Amazon Glue script for reading from DynamoDB, applying the DropNullFields transform and writing to S3 as Parquet?

  • Give me an Amazon Glue script that reads from MySQL, drops some fields based on my business logic, and writes to Snowflake

  • Write an Amazon Glue job to read from DynamoDB and write to S3 as JSON

  • Help me develop an Amazon Glue script for Amazon Glue Data Catalog to S3

  • Write an Amazon Glue job to read JSON from S3, drop nulls and write to Redshift
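
As a rough illustration of the kind of PySpark script these prompts ask for, the following sketch reads JSON from S3, renames fields with ApplyMapping, drops null fields, and writes Parquet back to S3. It is not actual Amazon Q output. The bucket paths, field names, and the run_job helper are hypothetical, and the awsglue imports resolve only inside the Amazon Glue runtime.

```python
import sys

# Hypothetical locations and field names, chosen only for illustration.
SOURCE_PATH = "s3://example-bucket/input/"
TARGET_PATH = "s3://example-bucket/output/"

# ApplyMapping takes (source_field, source_type, target_field, target_type) tuples.
MAPPINGS = [
    ("id", "string", "customer_id", "string"),
    ("ts", "string", "event_time", "timestamp"),
    ("amt", "double", "amount", "double"),
]


def run_job(argv):
    # Imported inside the function because these modules exist only in the Glue runtime.
    from awsglue.context import GlueContext
    from awsglue.job import Job
    from awsglue.transforms import ApplyMapping, DropNullFields
    from awsglue.utils import getResolvedOptions
    from pyspark.context import SparkContext

    args = getResolvedOptions(argv, ["JOB_NAME"])
    glue_context = GlueContext(SparkContext.getOrCreate())
    job = Job(glue_context)
    job.init(args["JOB_NAME"], args)

    # Read JSON files from S3 into a DynamicFrame.
    source = glue_context.create_dynamic_frame.from_options(
        connection_type="s3",
        connection_options={"paths": [SOURCE_PATH]},
        format="json",
    )

    # Rename and cast fields, then remove fields that contain only nulls.
    renamed = ApplyMapping.apply(frame=source, mappings=MAPPINGS)
    cleaned = DropNullFields.apply(frame=renamed)

    # Write the result to S3 as Parquet.
    glue_context.write_dynamic_frame.from_options(
        frame=cleaned,
        connection_type="s3",
        connection_options={"path": TARGET_PATH},
        format="parquet",
    )
    job.commit()


# Glue passes --JOB_NAME when it launches the script; elsewhere this is a no-op.
if "--JOB_NAME" in sys.argv:
    run_job(sys.argv)
```

Keeping the mappings in a module-level list makes the rename logic easy to review separately from the read and write steps. A script for a different source or target would change only the connection options.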

Amazon Glue feature explanations:

  • How do I use Amazon Glue Data Quality?

  • How to use Amazon Glue job bookmarks?

  • How do I enable Amazon Glue autoscaling?

  • What is the difference between Amazon Glue dynamic frames and Spark data frames?

  • What are the different types of connections supported by Amazon Glue?

Amazon Glue troubleshooting:

  • How to troubleshoot Out Of Memory (OOM) errors on Amazon Glue jobs?

  • What are some error messages you may see when setting up Amazon Glue Data Quality and how can you fix them?

  • How do I fix an Amazon Glue job with the error Amazon S3 access denied?

  • How do I resolve issues with data shuffle on Amazon Glue jobs?

Best practices for interacting with Amazon Q data integration

The following are best practices for interacting with Amazon Q data integration:

  • When interacting with Amazon Q data integration, ask specific questions, iterate when you have complex requests, and verify the answers for accuracy.

  • When providing data integration prompts in natural language, be as specific as possible to help the assistant understand exactly what you need. Instead of asking "extract data from S3," provide more detail, such as "write an Amazon Glue script that extracts JSON files from S3."

  • Review the generated script before running it to ensure accuracy. If the generated script has errors or does not match your intent, provide instructions to the assistant on how to correct it.

  • Generative AI technology is new, and responses can contain mistakes, sometimes called hallucinations. Test and review all code for errors and vulnerabilities before using it in your environment or workload.
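
One lightweight way to begin reviewing a generated script is to parse it without executing it. The following sketch uses Python's standard ast module to catch syntax errors and to list the modules and method names that the script references. The review_script helper and the sample script are hypothetical and are not part of Amazon Glue or Amazon Q. A check like this supplements a manual review and testing and does not replace them.

```python
import ast


def review_script(source: str) -> dict:
    """Parse a script without running it and summarize what it references."""
    try:
        tree = ast.parse(source)
    except SyntaxError as exc:
        return {
            "ok": False,
            "error": f"line {exc.lineno}: {exc.msg}",
            "imports": [],
            "calls": [],
        }

    imports, calls = set(), set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            imports.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            imports.add(node.module)
        elif isinstance(node, ast.Call) and isinstance(node.func, ast.Attribute):
            # Record method-style calls such as DropNullFields.apply(...).
            calls.add(node.func.attr)

    return {
        "ok": True,
        "error": None,
        "imports": sorted(imports),
        "calls": sorted(calls),
    }


# Hypothetical fragment standing in for a generated script.
generated = '''
from awsglue.transforms import DropNullFields
cleaned = DropNullFields.apply(frame=source)
'''

report = review_script(generated)
print(report)
```

If the report lists a module or call that you did not expect, that is a cue to read the relevant part of the script closely or to ask the assistant for a correction.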

Amazon Q data integration in Amazon Glue service improvement

To help Amazon Q data integration in Amazon Glue provide the most relevant information about Amazon services, we may use certain content from Amazon Q, such as questions that you ask Amazon Q and its responses, for service improvement.

For information about what content we use and how to opt out, see Amazon Q Developer service improvement in the Amazon Q Developer User Guide.

Considerations

Consider the following items before you use Amazon Q data integration in Amazon Glue:

  • Currently, code generation works only with the PySpark kernel. The generated code is for Amazon Glue jobs based on Python Spark.

  • For information about the supported combinations of code generation abilities of Amazon Q data integration in Amazon Glue, see Supported code generation abilities.