Troubleshooting
If you are having errors in Amazon SageMaker Batch Transform, refer to the following troubleshooting tips.
Max timeout errors
If you are getting max timeout errors when running batch transform jobs, try the following:
1. Begin with the single-record BatchStrategy, a batch size of the default (6 MB) or smaller, which you specify in the MaxPayloadInMB parameter, and a small sample dataset. Tune the maximum timeout parameter InvocationsTimeoutInSeconds (which has a maximum of 1 hour) until you receive a successful invocation response. A starting configuration is sketched after these steps.

2. After you receive a successful invocation response, increase the MaxPayloadInMB (which has a maximum of 100 MB) and the InvocationsTimeoutInSeconds parameters together to find the maximum batch size that can support your desired model timeout. You can use either the single-record or multi-record BatchStrategy in this step.

   Note: Exceeding the MaxPayloadInMB limit causes an error. This might happen with a large dataset if it can't be split, the SplitType parameter is set to None, or individual records within the dataset exceed the limit.

3. (Optional) Tune the MaxConcurrentTransforms parameter, which specifies the maximum number of parallel requests that can be sent to each instance in a batch transform job. However, the value of MaxConcurrentTransforms * MaxPayloadInMB must not exceed 100 MB.
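One way to express the starting point in step 1 with the AWS SDK for Python (Boto3) is sketched below. The job name, model name, S3 URIs, content type, and instance settings are placeholder assumptions, not values from this guide; substitute your own.

import boto3

sagemaker = boto3.client("sagemaker")

# Hypothetical names and locations; replace with your own model and buckets.
sagemaker.create_transform_job(
    TransformJobName="my-batch-transform-debug",
    ModelName="my-model",
    BatchStrategy="SingleRecord",           # step 1: start with single-record batches
    MaxPayloadInMB=6,                       # default payload size; raise later (max 100 MB)
    ModelClientConfig={
        "InvocationsTimeoutInSeconds": 600  # tune upward as needed (max 3600 seconds)
    },
    MaxConcurrentTransforms=1,              # step 3 (optional): raise once payloads are stable
    TransformInput={
        "DataSource": {
            "S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": "s3://my-bucket/small-sample/",
            }
        },
        "ContentType": "text/csv",
        "SplitType": "Line",                # records must be splittable to stay under MaxPayloadInMB
    },
    TransformOutput={"S3OutputPath": "s3://my-bucket/transform-output/"},
    TransformResources={"InstanceType": "ml.m5.xlarge", "InstanceCount": 1},
)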
Incomplete output
SageMaker uses the Amazon S3 Multipart Upload API to upload results from a batch transform job to Amazon S3. If an error occurs, the uploaded results are removed from Amazon S3. In some cases, such as when a network outage occurs, an incomplete multipart upload might remain in Amazon S3. An incomplete upload might also occur if you have multiple input files but some of the files can’t be processed by SageMaker Batch Transform. The input files that couldn’t be processed won’t have corresponding output files in Amazon S3.
To avoid incurring storage charges, we recommend that you add a lifecycle rule to the S3 bucket's lifecycle configuration. This rule deletes incomplete multipart uploads that might be stored in the S3 bucket. For more information, see Object Lifecycle Management.
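As a rough sketch, assuming a hypothetical output bucket name, such a lifecycle rule could be created with the AWS SDK for Python (Boto3) as follows:

import boto3

s3 = boto3.client("s3")

# Hypothetical bucket name; apply this to the bucket that receives transform output.
s3.put_bucket_lifecycle_configuration(
    Bucket="my-transform-output-bucket",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "abort-incomplete-multipart-uploads",
                "Status": "Enabled",
                "Filter": {"Prefix": ""},  # apply to the whole bucket
                # Delete parts of multipart uploads that never completed.
                "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": 7},
            }
        ]
    },
)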
Job shows as failed
If a batch transform job fails to process an input file because of a problem with the dataset, SageMaker marks the job as failed. If an input file contains a bad record, the transform job doesn't create an output file for that input file, because doing so prevents it from maintaining the same order in the transformed data as in the input file. When your dataset has multiple input files, a transform job continues to process input files even if it fails to process one. The processed files still generate usable results.
If you are using your own algorithms, you can use placeholder text, such as ERROR, when the algorithm finds a bad record in an input file. For example, if the last record in a dataset is bad, the algorithm places the placeholder text for that record in the output file.
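The details depend entirely on your container's inference code, but as a minimal, hypothetical sketch of that pattern (the predict function below is a stand-in for your model's own scoring logic, not a SageMaker API), it might look like this:

def predict(features):
    # Placeholder scoring function; a real algorithm would apply its model here.
    return sum(features)

def transform_records(raw_lines):
    """Return one output line per input record, substituting placeholder text
    for records that can't be parsed so record order is preserved."""
    outputs = []
    for line in raw_lines:
        try:
            features = [float(value) for value in line.strip().split(",")]
            outputs.append(str(predict(features)))
        except ValueError:
            # Bad record: emit the placeholder instead of failing the whole file.
            outputs.append("ERROR")
    return outputs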