Replicating data using Amazon Services - General SAP Guides
Services or capabilities described in Amazon Web Services documentation might vary by Region. To see the differences applicable to the China Regions, see Getting Started with Amazon Web Services in China (PDF).

Replicating data using Amazon Services

Data replication using Managed Services

Amazon Glue

Amazon Glue is a serverless data integration service that makes it easy for analytics users to discover, prepare, move, and integrate data from multiple sources. With Amazon Glue, you can discover and connect to SAP using OData and manage your data in a centralized data catalog. You can visually create, run, and monitor extract, transform, and load (ETL) pipelines to load SAP data into your data lakes and data warehouses.

The Connecting to SAP OData using Glue user guide offers comprehensive instructions for setting up Glue ETL jobs, configuring SAP OData connections, and reading data from SAP, including handling incremental transfers.

Amazon Glue Zero-ETL is a set of fully managed integrations by Amazon that minimizes the need to build ETL data pipelines for common ingestion and replication use cases. It makes data available in Amazon SageMaker Lakehouse and Amazon Redshift from multiple operational, transactional, and application sources. Leveraging the SAP OData Connectors, you can create full data replication jobs from SAP, with fully managed replication (Inserts, updates and deletions) as well as schema evolution.

Amazon Glue and Glue Zero-ETL serve distinct roles in data integration, with each offering unique advantages for different use cases. While Amazon Glue excels in complex ETL operations, data discovery, preparation, and extraction, particularly for specialized scenarios like SAP ODP-based replication. Amazon Glue Zero-ETL is designed as a more streamlined, no-code solution for fully managed data replication scenarios.

Amazon Glue requires more hands-on management, including code deployment and maintenance, but offers greater flexibility and control over data transformation processes. Amazon Glue performance is enhanced by its serverless, scale-out Apache Spark environment, which allows you to allocate Data Processing Units (DPUs) for scalable compute. This allows parallel processing and event-driven execution.