

# Creating an SAP OData job


Refer to [Building visual ETL jobs with AWS Glue Studio](https://docs.aws.amazon.com/glue/latest/dg/author-job-glue.html).

# Operational Data Provisioning (ODP) Sources


Operational Data Provisioning (ODP) provides a technical infrastructure that supports data extraction and replication for various target applications, including delta mechanisms for these scenarios. In the case of a delta procedure, data from a source (the ODP provider) is either automatically written to a delta queue (the Operational Delta Queue, or ODQ) by an update process, or passed to the delta queue through an extractor interface. An ODP provider can be a DataSource (extractor), an ABAP Core Data Services view (ABAP CDS view), SAP BW or SAP BW/4HANA, SAP Landscape Transformation Replication Server (SLT), or an SAP HANA information view (calculation view). The target applications (referred to as ODQ "subscribers" or, more generally, "ODP consumers") retrieve the data from the delta queue and continue processing it.
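As a conceptual illustration only (not connector code), the provider/queue/subscriber relationship described above can be modeled as a toy delta queue, where each subscriber independently tracks how far it has read:

```python
# Toy model of an Operational Delta Queue (ODQ): the provider appends
# change records, and each subscriber retrieves only what it has not
# yet seen. Class and method names here are illustrative, not SAP APIs.
class DeltaQueue:
    def __init__(self):
        self._records = []
        self._offsets = {}  # subscriber name -> index of next unread record

    def write(self, record):
        # The ODP provider (e.g. an extractor) pushes a change into the queue
        self._records.append(record)

    def read_delta(self, subscriber):
        # The ODP consumer retrieves only records added since its last read
        start = self._offsets.get(subscriber, 0)
        delta = self._records[start:]
        self._offsets[subscriber] = len(self._records)
        return delta
```

Because offsets are tracked per subscriber, multiple target applications can consume the same queue at their own pace.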

## Full Load


In the context of SAP OData and ODP entities, a **Full Load** refers to the process of extracting all available data from an ODP entity in a single operation. This operation retrieves the complete dataset from the source system, ensuring that the target system has a comprehensive and up-to-date copy of the entity's data. Full loads are typically used for sources that do not support incremental loads or when a refresh of the target system is required.

**Example**

You can explicitly set the `ENABLE_CDC` flag to `false` when creating the DynamicFrame. Note: `ENABLE_CDC` is `false` by default; if you don’t want to initialize the delta queue, you can omit this flag or set it explicitly to `false`. Not setting this flag to `true` results in a full load extraction.

```
sapodata_df = glueContext.create_dynamic_frame.from_options(
    connection_type="SAPOData",
    connection_options={
        "connectionName": "connectionName",
        "ENTITY_NAME": "entityName",
        "ENABLE_CDC": "false"
    }, transformation_ctx=key)
```

## Incremental Load


An **incremental load** in the context of ODP (Operational Data Provisioning) entities involves extracting only the new or changed data (deltas) from the source system since the last extraction, avoiding reprocessing of records that have already been handled. This approach significantly reduces data transfer volumes and processing time, and keeps the source and target systems efficiently synchronized, especially for large datasets that change frequently.

## Delta Token-based Incremental Transfers


To enable Incremental Transfer using Change Data Capture (CDC) for ODP-enabled entities that support it, follow these steps:

1. Create the Incremental Transfer job in script mode.

1. When creating the DataFrame or AWS Glue DynamicFrame, pass the option `"ENABLE_CDC": "true"`. This option ensures that you receive a delta token from SAP, which can be used for subsequent retrieval of changed data.

The delta token will be present in the last row of the dataframe, in the `DELTA_TOKEN` column. This token can be used as a connector option in subsequent calls to incrementally retrieve the next set of data.

**Example**
+ We set the `ENABLE_CDC` flag to `true` when creating the DynamicFrame. Note: `ENABLE_CDC` is `false` by default; omitting this flag or setting it to `false` does not initialize the delta queue and results in a full load extraction.

  ```
  sapodata_df = glueContext.create_dynamic_frame.from_options(
      connection_type="SAPOData",
      connection_options={
          "connectionName": "connectionName",
          "ENTITY_NAME": "entityName",
          "ENABLE_CDC": "true"
      }, transformation_ctx=key)
  
  # Extract the delta token from the last row of the DELTA_TOKEN column
  delta_token_1 = your_logic_to_extract_delta_token(sapodata_df) # e.g., D20241029164449_000370000
  ```
+ The extracted delta token can be passed as an option to retrieve new events.

  ```
  sapodata_df_2 = glueContext.create_dynamic_frame.from_options(
      connection_type="SAPOData",
      connection_options={
          "connectionName": "connectionName",
          "ENTITY_NAME": "entityName",
          # passing the delta token retrieved in the last run
          "DELTA_TOKEN": delta_token_1
      }, transformation_ctx=key)
  
  # Extract the new delta token for the next run
  delta_token_2 = your_logic_to_extract_delta_token(sapodata_df_2)
  ```

Note that the last record, which contains the `DELTA_TOKEN`, is not a transactional record from the source; it exists only to pass the delta token value.
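The helper `your_logic_to_extract_delta_token` used above is a placeholder. A minimal sketch of the idea, assuming the frame has been collected into a list of row dictionaries (for example via `sapodata_df.toDF().collect()`), might look like this:

```python
def extract_delta_token(rows, token_field="DELTA_TOKEN"):
    """Return (transactional_rows, delta_token).

    The final row only carries the delta token, so it is split off
    from the actual records before further processing.
    """
    if not rows or not rows[-1].get(token_field):
        return rows, None
    return rows[:-1], rows[-1][token_field]


# Hypothetical sample batch: two records plus the trailing token row
rows = [
    {"ID": 1, "DELTA_TOKEN": None},
    {"ID": 2, "DELTA_TOKEN": None},
    {"ID": None, "DELTA_TOKEN": "D20241029164449_000370000"},
]
records, token = extract_delta_token(rows)
```

Splitting the token row off here also ensures the synthetic final record is not written to your target alongside the real data.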

Apart from the `DELTA_TOKEN`, the following fields are returned in each row of the dataframe:
+ **GLUE_FETCH_SQ**: A sequence field generated from the epoch timestamp in the order the record was received; it is unique for each record. You can use it to establish the order of changes in the source system. This field is present only for ODP-enabled entities.
+ **DML_STATUS**: Shows `UPDATED` for all newly inserted and updated records from the source, and `DELETED` for records that have been deleted from the source.
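As a hedged illustration (not connector code), these two fields can drive a simple merge of a change batch into a target keyed by record ID, replaying changes in `GLUE_FETCH_SQ` order; the field and key names below follow the description above, but your entity's key field will differ:

```python
def apply_changes(target, changes, key="ID"):
    """Merge a batch of CDC rows into `target`, a dict keyed by record ID."""
    # Replay changes in the order they occurred in the source system
    for row in sorted(changes, key=lambda r: r["GLUE_FETCH_SQ"]):
        if row["DML_STATUS"] == "DELETED":
            target.pop(row[key], None)
        else:  # "UPDATED" covers both newly inserted and updated records
            target[row[key]] = row
    return target


# Hypothetical change batch, deliberately out of order
changes = [
    {"ID": "A", "GLUE_FETCH_SQ": 2, "DML_STATUS": "DELETED"},
    {"ID": "A", "GLUE_FETCH_SQ": 1, "DML_STATUS": "UPDATED", "qty": 5},
    {"ID": "B", "GLUE_FETCH_SQ": 3, "DML_STATUS": "UPDATED", "qty": 7},
]
target = apply_changes({}, changes)  # "A" is inserted, then deleted
```

Sorting by the sequence field matters: applying the delete for `"A"` before its insert would leave a stale record behind.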

For more details about how to manage state and reuse the delta token to retrieve changed records through an example refer to the [Using the SAP OData state management script](sap-odata-state-management-script.md) section.

### Delta Token Invalidation


A delta token is associated with a service collection and a user. If a new initial pull with `"ENABLE_CDC": "true"` is initiated for the same service collection and user, all delta tokens issued by a previous initialization are invalidated by the SAP OData service. Invoking the connector with an expired delta token leads to an exception:

`Could not open data access via extraction API RODPS_REPL_ODP_OPEN` 
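One hedged way to handle this in a job script is to detect the error message and fall back to re-initializing the delta queue. The marker string below comes from the exception text above; the `fetch` and `reinitialize` callables are hypothetical wrappers around the connector calls shown earlier:

```python
TOKEN_INVALIDATION_MARKER = "RODPS_REPL_ODP_OPEN"


def is_delta_token_invalidated(exc):
    """Heuristically detect an expired or invalidated delta token."""
    return TOKEN_INVALIDATION_MARKER in str(exc)


def read_incremental(fetch, delta_token, reinitialize):
    """Try a delta read; on token invalidation, restart with a full init.

    fetch(delta_token) performs the DELTA_TOKEN read; reinitialize()
    performs a fresh initial pull with ENABLE_CDC set to "true".
    """
    try:
        return fetch(delta_token)
    except Exception as exc:
        if is_delta_token_invalidated(exc):
            return reinitialize()
        raise
```

Re-initializing re-extracts the full dataset, so in practice you would also reset any stored token state at the same time.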

# OData Services (Non-ODP Sources)


## Full Load


For Non-ODP (Operational Data Provisioning) systems, a **Full Load** involves extracting the entire dataset from the source system and loading it into the target system. Since Non-ODP systems do not inherently support advanced data extraction mechanisms like deltas, the process is straightforward but can be resource-intensive depending on the size of the data.

## Incremental Load


For systems or entities that do not support **ODP (Operational Data Provisioning)**, incremental data transfer can be managed manually by implementing a timestamp-based mechanism to track and extract changes.

**Timestamp-based Incremental Transfers**

For non-ODP-enabled entities (or for ODP-enabled entities that don’t use the `ENABLE_CDC` flag), you can use the `filteringExpression` option in the connector to indicate the `datetime` interval for which you want to retrieve data. This method relies on a timestamp field in your data that represents when each record was last created or modified.

**Example**

Retrieving records changed on or after `2024-01-01T00:00:00.000`:

```
sapodata_df = glueContext.create_dynamic_frame.from_options(
    connection_type="SAPOData",
    connection_options={
        "connectionName": "connectionName",
        "ENTITY_NAME": "entityName",
        "filteringExpression": "LastChangeDateTime >= 2024-01-01T00:00:00.000"
    }, transformation_ctx=key)
```

Note: In this example, `LastChangeDateTime` is the field that represents when each record was last modified. The actual field name may vary depending on your specific SAP OData entity.

To get a new subset of data in subsequent runs, you would update the `filteringExpression` with a new timestamp. Typically, this would be the maximum timestamp value from the previously retrieved data.

**Example**

```
max_timestamp = get_max_timestamp(sapodata_df)  # Function to get the max timestamp from the previous run
next_filtering_expression = f"LastChangeDateTime > {max_timestamp}"

# Use this next_filtering_expression in your next run
```
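A minimal sketch of the hypothetical `get_max_timestamp` helper, assuming the frame has been collected into row dictionaries and the timestamp field holds ISO-8601 strings (which sort correctly as plain text when the format is uniform):

```python
def get_max_timestamp(rows, field="LastChangeDateTime"):
    """Return the latest timestamp seen in this batch, or None if empty."""
    values = [r[field] for r in rows if r.get(field)]
    # Uniform ISO-8601 strings compare correctly lexicographically
    return max(values) if values else None


# Hypothetical batch from a previous run
rows = [
    {"ID": 1, "LastChangeDateTime": "2024-01-03T08:15:00.000"},
    {"ID": 2, "LastChangeDateTime": "2024-02-10T12:00:00.000"},
]
```

Returning `None` for an empty batch lets the caller keep the previous watermark instead of overwriting it, so no records are skipped on the next run.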

In the next section, we will provide an automated approach to manage these timestamp-based incremental transfers, eliminating the need to manually update the filtering expression between runs.