Pass Data Between Steps
When you need to retrieve information from the output of a pipeline step, you can use
JsonGet. JsonGet helps you extract information from Amazon S3 or property files. The
following sections explain methods you can use to extract step outputs with JsonGet.
Pass data between steps with Amazon S3
You can use JsonGet in a ConditionStep to fetch the JSON output directly from Amazon S3.
The Amazon S3 URI can be a Std:Join function containing primitive strings, pipeline run
variables, or pipeline parameters. The following example shows how you can use JsonGet
in a ConditionStep:
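from sagemaker.workflow.conditions import ConditionLessThanOrEqualTo
from sagemaker.workflow.functions import JsonGet

# Example JSON file in the S3 bucket, generated by a processing step:
# { "Output": [5, 10] }

cond_lte = ConditionLessThanOrEqualTo(
    left=JsonGet(
        step_name="<step-name>",
        s3_uri="<s3-path-to-json>",
        json_path="Output[1]",
    ),
    right=6.0,
)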
If you use JsonGet with an Amazon S3 path in the condition step, you must explicitly add
a dependency between the condition step and the step that generates the JSON output. In
the following example, the condition step is created with a dependency on the processing
step:
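from sagemaker.workflow.condition_step import ConditionStep

cond_step = ConditionStep(
    name="<step-name>",
    conditions=[cond_lte],
    if_steps=[fail_step],
    else_steps=[register_model_step],
    # The S3 path alone does not create a dependency, so declare it explicitly.
    depends_on=[processing_step],
)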
Pass data between steps with property files
Use property files to store information from the output of a processing step. This is
particularly useful when analyzing the results of a processing step to decide how a
conditional step should be executed. The JsonGet function processes a property file and
enables you to use JsonPath notation to query the property JSON file. For more
information on JsonPath notation, see the JsonPath repo.
To store a property file for later use, you must first create a PropertyFile instance
with the following format. The path parameter is the name of the JSON file to which the
property file is saved. The output_name must match the output_name of the
ProcessingOutput that you define in your processing step. This enables the property file
to capture the ProcessingOutput in the step.
from sagemaker.workflow.properties import PropertyFile

<property_file_instance> = PropertyFile(
    name="<property_file_name>",
    output_name="<processingoutput_output_name>",
    path="<path_to_json_file>",
)
When you create your ProcessingStep instance, add the property_files parameter to list
all of the property files that the Amazon SageMaker Model Building Pipelines service
must index. This saves the property file for later use.
property_files=[<property_file_instance>]
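The JSON file that a property file points to is produced by the processing script itself. The following is a minimal sketch of such a script, not the documented method: the helper name write_evaluation_report, the file name evaluation.json, and the "mse" key are all assumptions chosen for illustration. In a real processing job, the output directory would be the local path configured as the source of the matching ProcessingOutput (for example, a directory such as /opt/ml/processing/evaluation, which is also an assumption here); the demo below writes to a temporary directory so that it runs anywhere.

```python
import json
import tempfile
from pathlib import Path


def write_evaluation_report(output_dir, mse):
    """Write a small JSON report that a property file could later point at.

    The file name and the "mse" key are hypothetical; use whatever your
    PropertyFile path and json_path expect.
    """
    output_path = Path(output_dir)
    output_path.mkdir(parents=True, exist_ok=True)
    report_file = output_path / "evaluation.json"
    report_file.write_text(json.dumps({"mse": mse}))
    return report_file


# Demo: write the report to a temporary directory and read it back.
with tempfile.TemporaryDirectory() as tmp_dir:
    report_file = write_evaluation_report(tmp_dir, mse=4.2)
    loaded = json.loads(report_file.read_text())

print(loaded)
```

With a report shaped like this, the path parameter of the PropertyFile would be the file name (evaluation.json in this sketch), and a json_path of "mse" would select the top-level value.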
To use your property file in a condition step, add the property_file to the condition
that you pass to your condition step, as shown in the following example. The json_path
parameter queries the JSON file for your desired property.
cond_lte = ConditionLessThanOrEqualTo(
    left=JsonGet(
        step_name=step_eval.name,
        property_file=<property_file_instance>,
        json_path="mse",
    ),
    right=6.0,
)
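As a rough illustration of how a json_path expression such as "mse" or "Output[1]" selects a value from a JSON document, the following sketch resolves simple paths in plain Python. It is not how SageMaker evaluates JsonGet; the helper resolve_json_path is hypothetical and supports only dotted keys and numeric indexes, a small subset of JsonPath notation.

```python
import re


def resolve_json_path(document, json_path):
    """Resolve a simplified path such as "Output[1]" or "metrics.mse"."""
    current = document
    # Each match is either a bare key (group 1) or a numeric [index] (group 2).
    for key, index in re.findall(r"([^.\[\]]+)|\[(\d+)\]", json_path):
        if key:
            current = current[key]
        else:
            current = current[int(index)]
    return current


# The JSON document used in the Amazon S3 example.
report = {"Output": [5, 10]}
value = resolve_json_path(report, "Output[1]")

# Same comparison as ConditionLessThanOrEqualTo(left=..., right=6.0).
condition_holds = value <= 6.0
print(value, condition_holds)  # prints: 10 False
```

Because Output[1] is 10, the less-than-or-equal comparison against 6.0 is false for this sample document, so a condition step built on it would take its else branch.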
For more in-depth examples, see Property File.