StartMLDataProcessingJob - Neptune Data API

StartMLDataProcessingJob

Creates a new Neptune ML data processing job for processing the graph data exported from Neptune for training. See The dataprocessing command.

When invoking this operation in a Neptune cluster that has IAM authentication enabled, the IAM user or role making the request must have a policy attached that allows the neptune-db:StartMLModelDataProcessingJob IAM action in that cluster.
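As a rough illustration of the permission described above, the following sketch builds a policy document that allows the action. The region, account ID, and cluster resource ID are placeholders, the helper name `build_policy` is hypothetical, and the resource ARN layout is an assumption based on the usual `neptune-db` data-access format; verify it against your own cluster before use.

```python
import json


def build_policy(region: str, account_id: str, cluster_resource_id: str) -> dict:
    """Return an IAM policy document allowing the data processing action."""
    # Assumed ARN layout for Neptune data-access resources (placeholder values).
    resource = f"arn:aws:neptune-db:{region}:{account_id}:{cluster_resource_id}/*"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "neptune-db:StartMLModelDataProcessingJob",
                "Resource": resource,
            }
        ],
    }


# Placeholder identifiers, not real resources.
policy = build_policy("us-east-1", "123456789012", "cluster-EXAMPLE")
print(json.dumps(policy, indent=2))
```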

Request Syntax

POST /ml/dataprocessing HTTP/1.1
Content-type: application/json

{
   "configFileName": "string",
   "id": "string",
   "inputDataS3Location": "string",
   "modelType": "string",
   "neptuneIamRoleArn": "string",
   "previousDataProcessingJobId": "string",
   "processedDataS3Location": "string",
   "processingInstanceType": "string",
   "processingInstanceVolumeSizeInGB": number,
   "processingTimeOutInSeconds": number,
   "s3OutputEncryptionKMSKey": "string",
   "sagemakerIamRoleArn": "string",
   "securityGroupIds": [ "string" ],
   "subnets": [ "string" ],
   "volumeEncryptionKMSKey": "string"
}

URI Request Parameters

The request does not use any URI parameters.

Request Body

The request accepts the following data in JSON format.

configFileName

A data specification file that describes how to load the exported graph data for training. The file is automatically generated by the Neptune export toolkit. The default is training-data-configuration.json.

Type: String

Required: No

id

A unique identifier for the new job. The default is an autogenerated UUID.

Type: String

Required: No

inputDataS3Location

The URI of the Amazon S3 location where you want SageMaker to download the data needed to run the data processing job.

Type: String

Required: Yes

modelType

One of the two model types that Neptune ML currently supports: heterogeneous graph models (heterogeneous) and knowledge graph models (kge). The default is none. If you do not specify a model type, Neptune ML chooses one automatically based on the data.

Type: String

Required: No

neptuneIamRoleArn

The Amazon Resource Name (ARN) of an IAM role that SageMaker can assume to perform tasks on your behalf. This must be listed in your DB cluster parameter group or an error will occur.

Type: String

Required: No

previousDataProcessingJobId

The job ID of a completed data processing job run on an earlier version of the data.

Type: String

Required: No

processedDataS3Location

The URI of the Amazon S3 location where you want SageMaker to save the results of a data processing job.

Type: String

Required: Yes

processingInstanceType

The type of ML instance used during data processing. Its memory should be large enough to hold the processed dataset. The default is the smallest ml.r5 type whose memory is ten times larger than the size of the exported graph data on disk.

Type: String

Required: No
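The default rule for processingInstanceType can be sketched as a simple lookup. This is only an illustration of the rule as stated, not Neptune ML's actual selection logic: the function name is hypothetical, the memory figures are the standard sizes of the r5 family in GiB, and "ten times larger" is interpreted here as "at least ten times the export size". Instance availability may differ by region.

```python
from typing import Optional

# Memory per ml.r5 size in GiB, smallest first (standard r5 family sizes).
ML_R5_MEMORY_GIB = [
    ("ml.r5.large", 16),
    ("ml.r5.xlarge", 32),
    ("ml.r5.2xlarge", 64),
    ("ml.r5.4xlarge", 128),
    ("ml.r5.8xlarge", 256),
    ("ml.r5.12xlarge", 384),
    ("ml.r5.16xlarge", 512),
    ("ml.r5.24xlarge", 768),
]


def pick_default_instance(export_size_gib: float) -> Optional[str]:
    """Return the smallest ml.r5 type with memory >= 10x the export size."""
    needed_gib = 10 * export_size_gib
    for name, memory_gib in ML_R5_MEMORY_GIB:
        if memory_gib >= needed_gib:
            return name
    # No ml.r5 type in the table is large enough under this rule.
    return None


print(pick_default_instance(2.0))
```

Under this reading, a 2 GiB export needs at least 20 GiB of memory, so the sketch selects the first type in the table with that much.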

processingInstanceVolumeSizeInGB

The disk volume size of the processing instance, in gigabytes. Both input data and processed data are stored on disk, so the volume must be large enough to hold both data sets. The default is 0. If this value is not specified or is 0, Neptune ML chooses the volume size automatically based on the data size.

Type: Integer

Required: No

processingTimeOutInSeconds

Timeout in seconds for the data processing job. The default is 86,400 (1 day).

Type: Integer

Required: No

s3OutputEncryptionKMSKey

The AWS Key Management Service (AWS KMS) key that SageMaker uses to encrypt the output of the processing job. The default is none.

Type: String

Required: No

sagemakerIamRoleArn

The ARN of an IAM role for SageMaker execution. This must be listed in your DB cluster parameter group or an error will occur.

Type: String

Required: No

securityGroupIds

The VPC security group IDs. The default is none.

Type: Array of strings

Required: No

subnets

The IDs of the subnets in the Neptune VPC. The default is none.

Type: Array of strings

Required: No

volumeEncryptionKMSKey

The AWS Key Management Service (AWS KMS) key that SageMaker uses to encrypt data on the storage volume attached to the ML compute instances that run the processing job. The default is none.

Type: String

Required: No
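To tie the request body fields together, here is a minimal sketch that assembles a JSON body and checks the two fields marked as required above (inputDataS3Location and processedDataS3Location). The helper name `build_request_body` and the S3 URIs are hypothetical; the sketch only builds the payload and does not send, sign, or authenticate a request.

```python
import json
from typing import Any

# The only fields this reference marks as "Required: Yes".
REQUIRED_FIELDS = ("inputDataS3Location", "processedDataS3Location")


def build_request_body(**fields: Any) -> str:
    """Serialize the request body, rejecting calls that omit required fields."""
    missing = [name for name in REQUIRED_FIELDS if not fields.get(name)]
    if missing:
        raise ValueError("missing required field(s): " + ", ".join(missing))
    # Drop optional fields left as None so the service applies its defaults.
    body = {key: value for key, value in fields.items() if value is not None}
    return json.dumps(body)


# Placeholder bucket names for illustration only.
body = build_request_body(
    inputDataS3Location="s3://example-bucket/export/",
    processedDataS3Location="s3://example-bucket/processed/",
    modelType=None,
    processingTimeOutInSeconds=3600,
)
print(body)
```

Omitting optional fields rather than sending nulls keeps the request aligned with the documented defaults, such as automatic model type selection.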

Response Syntax

HTTP/1.1 200
Content-type: application/json

{
   "arn": "string",
   "creationTimeInMillis": number,
   "id": "string"
}

Response Elements

If the action is successful, the service sends back an HTTP 200 response.

The following data is returned in JSON format by the service.

arn

The ARN of the data processing job.

Type: String

creationTimeInMillis

The time it took to create the new processing job, in milliseconds.

Type: Long

id

The unique ID of the new data processing job.

Type: String
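A client might read the three response elements along these lines. This is a sketch only: the class and function names are hypothetical, and the sample values (including the ARN) are made-up placeholders rather than real service output.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class DataProcessingJob:
    """The three elements returned by a successful call."""
    id: str
    arn: str
    creation_time_in_millis: int


def parse_response(raw: str) -> DataProcessingJob:
    """Parse the JSON response body; raises KeyError if an element is absent."""
    data = json.loads(raw)
    return DataProcessingJob(
        id=data["id"],
        arn=data["arn"],
        creation_time_in_millis=int(data["creationTimeInMillis"]),
    )


# Placeholder response body for illustration only.
sample = '{"arn": "arn:aws:example", "creationTimeInMillis": 1700000000000, "id": "example-job-id"}'
job = parse_response(sample)
print(job.id)
```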

Errors

For information about the errors that are common to all actions, see Common Errors.

BadRequestException

Raised when a request is submitted that cannot be processed.

HTTP Status Code: 400

ClientTimeoutException

Raised when a request timed out in the client.

HTTP Status Code: 408

ConstraintViolationException

Raised when a value in a request field did not satisfy required constraints.

HTTP Status Code: 400

IllegalArgumentException

Raised when an argument in a request is not supported.

HTTP Status Code: 400

InvalidArgumentException

Raised when an argument in a request has an invalid value.

HTTP Status Code: 400

InvalidParameterException

Raised when a parameter value is not valid.

HTTP Status Code: 400

MissingParameterException

Raised when a required parameter is missing.

HTTP Status Code: 400

MLResourceNotFoundException

Raised when a specified machine-learning resource could not be found.

HTTP Status Code: 404

PreconditionsFailedException

Raised when a precondition for processing a request is not satisfied.

HTTP Status Code: 400

TooManyRequestsException

Raised when the number of requests being processed exceeds the limit.

HTTP Status Code: 429

UnsupportedOperationException

Raised when a request attempts to initiate an operation that is not supported.

HTTP Status Code: 400
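The error names and status codes listed above can be collected into a lookup table for client-side handling. In this sketch, treating 408 and 429 as retryable and using exponential backoff are common client conventions assumed for illustration, not behavior this reference prescribes; the function names are hypothetical.

```python
from typing import List

# Error name -> HTTP status code, as listed in this reference.
ERROR_STATUS = {
    "BadRequestException": 400,
    "ClientTimeoutException": 408,
    "ConstraintViolationException": 400,
    "IllegalArgumentException": 400,
    "InvalidArgumentException": 400,
    "InvalidParameterException": 400,
    "MissingParameterException": 400,
    "MLResourceNotFoundException": 404,
    "PreconditionsFailedException": 400,
    "TooManyRequestsException": 429,
    "UnsupportedOperationException": 400,
}

# Assumption: timeouts and throttling are worth retrying; other errors are not.
RETRYABLE_STATUS = {408, 429}


def is_retryable(error_name: str) -> bool:
    """Return True if the named error maps to a status assumed retryable."""
    return ERROR_STATUS.get(error_name) in RETRYABLE_STATUS


def backoff_delays(attempts: int, base_seconds: float = 1.0) -> List[float]:
    """Return exponentially growing wait times, one per retry attempt."""
    return [base_seconds * (2 ** i) for i in range(attempts)]


print(is_retryable("TooManyRequestsException"), backoff_delays(3))
```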

See Also

For more information about using this API in one of the language-specific AWS SDKs, see the following: