Neptune ML data-processing API - Amazon Neptune
Services or capabilities described in Amazon Web Services documentation might vary by Region. To see the differences applicable to the China Regions, see Getting Started with Amazon Web Services in China (PDF).

Neptune ML data-processing API

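Data-processing actions:

  • StartMLDataProcessingJob (action)
  • ListMLDataProcessingJobs (action)
  • GetMLDataProcessingJob (action)
  • CancelMLDataProcessingJob (action)

ML general-purpose structures:

  • MlResourceDefinition (structure)
  • MlConfigDefinition (structure)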

StartMLDataProcessingJob (action)

        The Amazon CLI name for this API is: start-ml-data-processing-job.

Creates a new Neptune ML data processing job for processing the graph data exported from Neptune for training. See The dataprocessing command.

When invoking this operation in a Neptune cluster that has IAM authentication enabled, the IAM user or role making the request must have a policy attached that allows the neptune-db:StartMLModelDataProcessingJob IAM action in that cluster.
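The following Python sketch builds an identity-based IAM policy that allows the data-processing actions named on this page. The operation-to-action mapping comes from this page. The helper name build_policy and the example cluster resource ARN (partition, Region, account ID, and cluster resource ID) are hypothetical placeholders that you would replace with your own values.

```python
import json

# IAM actions for the data-processing operations, as listed on this page.
DATA_PROCESSING_IAM_ACTIONS = {
    "StartMLDataProcessingJob": "neptune-db:StartMLModelDataProcessingJob",
    "ListMLDataProcessingJobs": "neptune-db:ListMLDataProcessingJobs",
    "GetMLDataProcessingJob": "neptune-db:GetMLDataProcessingJobStatus",
    "CancelMLDataProcessingJob": "neptune-db:CancelMLDataProcessingJob",
}


def build_policy(operations, cluster_resource_arn):
    """Return an IAM policy document that allows the given operations.

    cluster_resource_arn is a placeholder such as
    "arn:aws-cn:neptune-db:<region>:<account-id>:<cluster-resource-id>/*".
    """
    unknown = [op for op in operations if op not in DATA_PROCESSING_IAM_ACTIONS]
    if unknown:
        raise ValueError(f"unknown operations: {unknown}")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # Deduplicate and sort so the output is stable.
                "Action": sorted({DATA_PROCESSING_IAM_ACTIONS[op] for op in operations}),
                "Resource": cluster_resource_arn,
            }
        ],
    }


if __name__ == "__main__":
    # Hypothetical ARN; substitute your own Region, account ID, and cluster resource ID.
    arn = "arn:aws-cn:neptune-db:cn-north-1:123456789012:cluster-EXAMPLE/*"
    print(json.dumps(build_policy(["StartMLDataProcessingJob"], arn), indent=2))
```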

Request

  • configFileName  (in the CLI: --config-file-name) –  a String, of type: string (a UTF-8 encoded string).

    A data specification file that describes how to load the exported graph data for training. The file is automatically generated by the Neptune export toolkit. The default is training-data-configuration.json.

  • id  (in the CLI: --id) –  a String, of type: string (a UTF-8 encoded string).

    A unique identifier for the new job. The default is an autogenerated UUID.

  • inputDataS3Location  (in the CLI: --input-data-s3-location) –  Required: a String, of type: string (a UTF-8 encoded string).

    The URI of the Amazon S3 location where you want SageMaker to download the data needed to run the data processing job.

  • modelType  (in the CLI: --model-type) –  a String, of type: string (a UTF-8 encoded string).

    One of the two model types that Neptune ML currently supports: heterogeneous graph models (heterogeneous) and knowledge graph models (kge). The default is none; if not specified, Neptune ML chooses the model type automatically based on the data.

  • neptuneIamRoleArn  (in the CLI: --neptune-iam-role-arn) –  a String, of type: string (a UTF-8 encoded string).

    The Amazon Resource Name (ARN) of an IAM role that SageMaker can assume to perform tasks on your behalf. This must be listed in your DB cluster parameter group or an error will occur.

  • previousDataProcessingJobId  (in the CLI: --previous-data-processing-job-id) –  a String, of type: string (a UTF-8 encoded string).

    The job ID of a completed data processing job run on an earlier version of the data.

  • processedDataS3Location  (in the CLI: --processed-data-s3-location) –  Required: a String, of type: string (a UTF-8 encoded string).

    The URI of the Amazon S3 location where you want SageMaker to save the results of a data processing job.

  • processingInstanceType  (in the CLI: --processing-instance-type) –  a String, of type: string (a UTF-8 encoded string).

    The type of ML instance used during data processing. Its memory should be large enough to hold the processed dataset. The default is the smallest ml.r5 type whose memory is ten times larger than the size of the exported graph data on disk.

  • processingInstanceVolumeSizeInGB  (in the CLI: --processing-instance-volume-size-in-gb) –  an Integer, of type: integer (a signed 32-bit integer).

    The disk volume size of the processing instance. Both input data and processed data are stored on disk, so the volume size must be large enough to hold both data sets. The default is 0. If not specified or 0, Neptune ML chooses the volume size automatically based on the data size.

  • processingTimeOutInSeconds  (in the CLI: --processing-time-out-in-seconds) –  an Integer, of type: integer (a signed 32-bit integer).

    Timeout in seconds for the data processing job. The default is 86,400 (1 day).

  • s3OutputEncryptionKMSKey  (in the CLI: --s-3-output-encryption-kms-key) –  a String, of type: string (a UTF-8 encoded string).

    The Amazon Key Management Service (Amazon KMS) key that SageMaker uses to encrypt the output of the processing job. The default is none.

  • sagemakerIamRoleArn  (in the CLI: --sagemaker-iam-role-arn) –  a String, of type: string (a UTF-8 encoded string).

    The ARN of an IAM role for SageMaker execution. This must be listed in your DB cluster parameter group or an error will occur.

  • securityGroupIds  (in the CLI: --security-group-ids) –  a String, of type: string (a UTF-8 encoded string).

    The VPC security group IDs. The default is None.

  • subnets  (in the CLI: --subnets) –  a String, of type: string (a UTF-8 encoded string).

    The IDs of the subnets in the Neptune VPC. The default is None.

  • volumeEncryptionKMSKey  (in the CLI: --volume-encryption-kms-key) –  a String, of type: string (a UTF-8 encoded string).

    The Amazon Key Management Service (Amazon KMS) key that SageMaker uses to encrypt data on the storage volume attached to the ML compute instances that run the processing job. The default is None.
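The following Python sketch assembles a StartMLDataProcessingJob request body from the parameters above, keeping only the fields you set so that Neptune ML applies its documented defaults for the rest. It also approximates the documented default for processingInstanceType. The helper names, the s3:// prefix check, and the ml.r5 memory table (assumed to match the EC2 r5 family) are illustrative assumptions, and the sketch reads "ten times larger" as "at least ten times".

```python
# Hypothetical client-side helpers; not part of any AWS SDK.

# Memory in GiB for ml.r5 sizes (assumed to match the EC2 r5 family).
ML_R5_MEMORY_GIB = {
    "ml.r5.large": 16,
    "ml.r5.xlarge": 32,
    "ml.r5.2xlarge": 64,
    "ml.r5.4xlarge": 128,
    "ml.r5.8xlarge": 256,
    "ml.r5.12xlarge": 384,
    "ml.r5.16xlarge": 512,
    "ml.r5.24xlarge": 768,
}

# Optional request fields, as listed on this page.
OPTIONAL_FIELDS = {
    "configFileName", "id", "modelType", "neptuneIamRoleArn",
    "previousDataProcessingJobId", "processingInstanceType",
    "processingInstanceVolumeSizeInGB", "processingTimeOutInSeconds",
    "s3OutputEncryptionKMSKey", "sagemakerIamRoleArn", "securityGroupIds",
    "subnets", "volumeEncryptionKMSKey",
}
MODEL_TYPES = {"heterogeneous", "kge"}


def estimate_default_instance_type(exported_data_gib: float) -> str:
    """Smallest ml.r5 type whose memory is >= 10x the exported data size on disk."""
    needed = 10 * exported_data_gib
    for name, memory in sorted(ML_R5_MEMORY_GIB.items(), key=lambda item: item[1]):
        if memory >= needed:
            return name
    raise ValueError(f"no ml.r5 type has {needed} GiB of memory")


def build_start_request(input_data_s3_location: str,
                        processed_data_s3_location: str,
                        **optional) -> dict:
    """Build a StartMLDataProcessingJob request body, dropping unset fields."""
    for field, value in (("inputDataS3Location", input_data_s3_location),
                         ("processedDataS3Location", processed_data_s3_location)):
        if not value.startswith("s3://"):
            raise ValueError(f"{field} must be an s3:// URI, got {value!r}")
    unknown = set(optional) - OPTIONAL_FIELDS
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    body = {"inputDataS3Location": input_data_s3_location,
            "processedDataS3Location": processed_data_s3_location}
    # Omitting a field lets Neptune ML apply its documented default.
    body.update({k: v for k, v in optional.items() if v is not None})
    if "modelType" in body and body["modelType"] not in MODEL_TYPES:
        raise ValueError(f"modelType must be one of {sorted(MODEL_TYPES)}")
    return body
```

For example, a 10 GiB export needs at least 100 GiB of memory, so this sketch picks ml.r5.4xlarge (128 GiB); the service itself may choose differently.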

Response

  • arn   – a String, of type: string (a UTF-8 encoded string).

    The ARN of the data processing job.

  • creationTimeInMillis   – a Long, of type: long (a signed 64-bit integer).

    The time it took to create the new processing job, in milliseconds.

  • id   – a String, of type: string (a UTF-8 encoded string).

    The unique ID of the new data processing job.

ListMLDataProcessingJobs (action)

        The Amazon CLI name for this API is: list-ml-data-processing-jobs.

Returns a list of Neptune ML data processing jobs. See Listing active data-processing jobs using the Neptune ML dataprocessing command.

When invoking this operation in a Neptune cluster that has IAM authentication enabled, the IAM user or role making the request must have a policy attached that allows the neptune-db:ListMLDataProcessingJobs IAM action in that cluster.

Request

  • maxItems  (in the CLI: --max-items) –  a ListMLDataProcessingJobsInputMaxItemsInteger, of type: integer (a signed 32-bit integer), not less than 1 or more than 1024.

    The maximum number of items to return (from 1 to 1024; the default is 10).

  • neptuneIamRoleArn  (in the CLI: --neptune-iam-role-arn) –  a String, of type: string (a UTF-8 encoded string).

    The ARN of an IAM role that provides Neptune access to SageMaker and Amazon S3 resources. This must be listed in your DB cluster parameter group or an error will occur.

Response

  • ids   – a String, of type: string (a UTF-8 encoded string).

    A page listing data processing job IDs.
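A minimal Python sketch of validating maxItems before sending a list request. It assumes, without confirmation from this page, that the parameters are passed as URL query parameters to the /ml/dataprocessing HTTP path used by the dataprocessing command; build_list_query is a hypothetical helper.

```python
from urllib.parse import urlencode

MAX_ITEMS_LIMIT = 1024  # documented upper bound; the documented default is 10


def build_list_query(max_items=None, neptune_iam_role_arn=None) -> str:
    """Return a URL query string for listing data-processing jobs (hypothetical)."""
    params = {}
    if max_items is not None:
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(max_items, bool) or not isinstance(max_items, int):
            raise TypeError("maxItems must be an integer")
        if not 1 <= max_items <= MAX_ITEMS_LIMIT:
            raise ValueError(f"maxItems must be from 1 to {MAX_ITEMS_LIMIT}")
        params["maxItems"] = max_items
    if neptune_iam_role_arn is not None:
        params["neptuneIamRoleArn"] = neptune_iam_role_arn
    # Omitting maxItems leaves the service default (10) in effect.
    return urlencode(params)
```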

GetMLDataProcessingJob (action)

        The Amazon CLI name for this API is: get-ml-data-processing-job.

Retrieves information about a specified data processing job. See The dataprocessing command.

When invoking this operation in a Neptune cluster that has IAM authentication enabled, the IAM user or role making the request must have a policy attached that allows the neptune-db:GetMLDataProcessingJobStatus IAM action in that cluster.

Request

  • id  (in the CLI: --id) –  Required: a String, of type: string (a UTF-8 encoded string).

    The unique identifier of the data-processing job to be retrieved.

  • neptuneIamRoleArn  (in the CLI: --neptune-iam-role-arn) –  a String, of type: string (a UTF-8 encoded string).

    The ARN of an IAM role that provides Neptune access to SageMaker and Amazon S3 resources. This must be listed in your DB cluster parameter group or an error will occur.

Response

  • id   – a String, of type: string (a UTF-8 encoded string).

    The unique identifier of this data-processing job.

  • processingJob   – A MlResourceDefinition object.

    Definition of the data processing job.

  • status   – a String, of type: string (a UTF-8 encoded string).

    Status of the data processing job.
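Because GetMLDataProcessingJob returns the job's current status, a caller typically polls it until the job finishes. The Python sketch below takes the retrieval call as a function argument, so it can wrap whatever client you use (for example, the get_ml_data_processing_job method of the AWS SDK for Python neptunedata client, if your SDK version provides it). The terminal status values Completed, Failed, and Stopped are assumptions, because this page does not list the possible statuses; wait_for_job is a hypothetical helper.

```python
import time
from typing import Callable, Mapping

# Assumed terminal status values; this page does not enumerate them.
TERMINAL_STATUSES = {"Completed", "Failed", "Stopped"}


def wait_for_job(get_job: Callable[[str], Mapping],
                 job_id: str,
                 poll_seconds: float = 60.0,
                 timeout_seconds: float = 86_400.0,
                 sleep: Callable[[float], None] = time.sleep) -> Mapping:
    """Call get_job(job_id) until the returned status is terminal.

    Raises TimeoutError once the accumulated wait reaches timeout_seconds.
    """
    waited = 0.0
    while True:
        response = get_job(job_id)
        if response.get("status") in TERMINAL_STATUSES:
            return response
        if waited >= timeout_seconds:
            raise TimeoutError(f"job {job_id} is still {response.get('status')!r}")
        sleep(poll_seconds)
        waited += poll_seconds
```

Injecting sleep keeps the loop testable without real delays; the default timeout mirrors the one-day default of processingTimeOutInSeconds.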

CancelMLDataProcessingJob (action)

        The Amazon CLI name for this API is: cancel-ml-data-processing-job.

Cancels a Neptune ML data processing job. See The dataprocessing command.

When invoking this operation in a Neptune cluster that has IAM authentication enabled, the IAM user or role making the request must have a policy attached that allows the neptune-db:CancelMLDataProcessingJob IAM action in that cluster.

Request

  • clean  (in the CLI: --clean) –  a Boolean, of type: boolean (a Boolean (true or false) value).

    If set to TRUE, this flag specifies that all Neptune ML S3 artifacts should be deleted when the job is stopped. The default is FALSE.

  • id  (in the CLI: --id) –  Required: a String, of type: string (a UTF-8 encoded string).

    The unique identifier of the data-processing job.

  • neptuneIamRoleArn  (in the CLI: --neptune-iam-role-arn) –  a String, of type: string (a UTF-8 encoded string).

    The ARN of an IAM role that provides Neptune access to SageMaker and Amazon S3 resources. This must be listed in your DB cluster parameter group or an error will occur.

Response

  • status   – a String, of type: string (a UTF-8 encoded string).

    The status of the cancellation request.
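The Python sketch below builds, but does not send, a cancellation request that optionally sets the clean flag. It assumes the DELETE /ml/dataprocessing/{id} HTTP form described in The dataprocessing command; build_cancel_request and the endpoint URL are hypothetical. On a cluster with IAM authentication enabled, the request would also need a Signature Version 4 signature, which is not shown.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request


def build_cancel_request(endpoint: str, job_id: str, clean: bool = False) -> Request:
    """Build (but do not send) a DELETE request that cancels a data-processing job."""
    # Percent-encode the job ID so it is safe as a single path segment.
    url = f"{endpoint.rstrip('/')}/ml/dataprocessing/{quote(job_id, safe='')}"
    if clean:
        # clean=true asks Neptune ML to delete the job's S3 artifacts as well.
        url += "?" + urlencode({"clean": "true"})
    return Request(url, method="DELETE")
```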

ML general-purpose structures:

MlResourceDefinition (structure)

Defines a Neptune ML resource.

Fields
  • arn – This is a String, of type: string (a UTF-8 encoded string).

    The resource ARN.

  • cloudwatchLogUrl – This is a String, of type: string (a UTF-8 encoded string).

    The CloudWatch log URL for the resource.

  • failureReason – This is a String, of type: string (a UTF-8 encoded string).

    The failure reason, in case of a failure.

  • name – This is a String, of type: string (a UTF-8 encoded string).

    The resource name.

  • outputLocation – This is a String, of type: string (a UTF-8 encoded string).

    The output location.

  • status – This is a String, of type: string (a UTF-8 encoded string).

    The resource status.

MlConfigDefinition (structure)

Contains a Neptune ML configuration.

Fields
  • arn – This is a String, of type: string (a UTF-8 encoded string).

    The ARN for the configuration.

  • name – This is a String, of type: string (a UTF-8 encoded string).

    The configuration name.
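As an illustration, the two structures above can be mirrored as Python dataclasses that tolerate missing or extra fields in a response. The from_dict helper is hypothetical and is not part of any AWS SDK.

```python
from dataclasses import dataclass, fields
from typing import Any, Mapping, Optional, Type, TypeVar

T = TypeVar("T")


def from_dict(cls: Type[T], data: Mapping[str, Any]) -> T:
    """Build a dataclass from a response mapping, ignoring unknown keys."""
    known = {f.name for f in fields(cls)}
    return cls(**{k: v for k, v in data.items() if k in known})


@dataclass(frozen=True)
class MlResourceDefinition:
    # Every field is optional; absent fields stay None.
    arn: Optional[str] = None
    cloudwatchLogUrl: Optional[str] = None
    failureReason: Optional[str] = None
    name: Optional[str] = None
    outputLocation: Optional[str] = None
    status: Optional[str] = None


@dataclass(frozen=True)
class MlConfigDefinition:
    arn: Optional[str] = None
    name: Optional[str] = None
```

For example, the processingJob field of a GetMLDataProcessingJob response could be passed through from_dict(MlResourceDefinition, response["processingJob"]).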