Example: Loading Data into a Neptune DB Instance
This example shows how to load data into Amazon Neptune. Unless stated otherwise, you must follow these steps from an Amazon Elastic Compute Cloud (Amazon EC2) instance in the same Amazon Virtual Private Cloud (VPC) as your Neptune DB instance.
Prerequisites for the Data Loading Example
Before you begin, you must have the following:
- A Neptune DB instance.
For information about launching a Neptune DB instance, see Creating an Amazon Neptune cluster.
- An Amazon Simple Storage Service (Amazon S3) bucket to put the data files in.
You can use an existing bucket. If you don't have an S3 bucket, see Create a Bucket in the Amazon S3 Getting Started Guide.
- Graph data to load, in one of the formats supported by the Neptune loader:
  - If you are using Gremlin to query your graph, Neptune can load data in a comma-separated-values (CSV) format, as described in Gremlin load data format. (A small sample of this format appears after this list.)
  - If you are using openCypher to query your graph, Neptune can also load data in an openCypher-specific CSV format, as described in Load format for openCypher data.
  - If you are using SPARQL, Neptune can load data in a number of RDF formats, as described in RDF load data formats.
- An IAM role for the Neptune DB instance to assume that has an IAM policy allowing access to the data files in the S3 bucket. The policy must grant Read and List permissions.
For information about creating a role that has access to Amazon S3 and then associating it with a Neptune cluster, see Prerequisites: IAM Role and Amazon S3 Access. (A sketch of such a policy appears after this list.)
Note
The Neptune Load API needs read access to the data files only. The IAM policy doesn't need to allow write access or access to the entire bucket.
- An Amazon S3 VPC endpoint. For more information, see the Creating an Amazon S3 VPC Endpoint section.
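To make the Gremlin CSV format concrete, here is a minimal, hypothetical pair of data files. The file names, IDs, labels, and the name property are invented for this sketch; the ~id, ~label, ~from, and ~to headers are the system columns of the Gremlin load data format:

# Hypothetical vertex file: one row per vertex.
cat > vertices.csv <<'EOF'
~id,~label,name
v1,person,alice
v2,person,bob
EOF

# Hypothetical edge file: ~from and ~to reference vertex IDs.
cat > edges.csv <<'EOF'
~id,~from,~to,~label
e1,v1,v2,knows
EOF

The following is a minimal sketch of an IAM policy granting the read and list access described above. The bucket name is a placeholder, and in the China Regions the ARN prefix is arn:aws-cn rather than arn:aws:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:Get*",
                "s3:List*"
            ],
            "Resource": [
                "arn:aws:s3:::bucket-name",
                "arn:aws:s3:::bucket-name/*"
            ]
        }
    ]
}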
Creating an Amazon S3 VPC Endpoint
The Neptune loader requires a VPC endpoint for Amazon S3.
To set up access for Amazon S3
- Sign in to the Amazon Web Services Management Console and open the Amazon VPC console at https://console.amazonaws.cn/vpc/.
- In the left navigation pane, choose Endpoints.
- Choose Create Endpoint.
- Choose the Service Name com.amazonaws.region.s3, where region is your Region.
Note
If the Region shown here is incorrect, make sure that the console Region is correct.
- Choose the VPC that contains your Neptune DB instance.
- Select the check box next to the route tables that are associated with the subnets related to your cluster. If you have only one route table, you must select that box.
- Choose Create Endpoint.
For information about creating the endpoint, see VPC Endpoints in the Amazon VPC User Guide. For information about the limitations of VPC endpoints, see VPC Endpoints for Amazon S3.
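If you prefer to script this step, you can also create the endpoint with the Amazon CLI. This is only a sketch; the VPC ID, Region, and route table ID below are placeholders that you must replace with your own values:

# Create a gateway VPC endpoint for Amazon S3 (placeholder IDs).
aws ec2 create-vpc-endpoint \
    --vpc-id vpc-0abc1234def567890 \
    --service-name com.amazonaws.us-east-1.s3 \
    --route-table-ids rtb-0abc1234def567890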
To load data into a Neptune DB instance
- Copy the data files to an Amazon S3 bucket. The S3 bucket must be in the same Amazon Region as the cluster that loads the data.
You can use the following Amazon CLI command to copy the files to the bucket.
Note
This command does not need to be run from the Amazon EC2 instance.
aws s3 cp data-file-name s3://bucket-name/object-key-name
Note
In Amazon S3, an object key name is the entire path of a file, including the file name.
Example: In the command aws s3 cp datafile.txt s3://examplebucket/mydirectory/datafile.txt, the object key name is mydirectory/datafile.txt.
Alternatively, you can use the Amazon Web Services Management Console to upload files to the S3 bucket. Open the Amazon S3 console at https://console.amazonaws.cn/s3/, and choose a bucket. In the upper-left corner, choose Upload to upload files.
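If your data is split across several files in a local folder, you can copy the whole folder with a single command. The folder and bucket names here are placeholders:

# Copy every file in the local data folder to the bucket (placeholder names).
aws s3 cp ./data s3://bucket-name/data/ --recursive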
- From a command line window, enter the following to run the Neptune loader, using the correct values for your endpoint, Amazon S3 path, format, and IAM role ARN.
The format parameter can be any of the following values: csv for Gremlin, opencypher for openCypher, or ntriples, nquads, turtle, and rdfxml for RDF. For information about the other parameters, see Neptune Loader Command.
For information about finding the hostname of your Neptune DB instance, see the Connecting to Amazon Neptune Endpoints section.
The region parameter must match the Region of the cluster and the S3 bucket.
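If you are not sure which Region an existing bucket is in, you can check from the command line (the bucket name is a placeholder; for a bucket in us-east-1 this command returns a null LocationConstraint):

# Look up the Region of an existing bucket.
aws s3api get-bucket-location --bucket bucket-name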
Amazon Neptune is available in the following Amazon Regions:
- US East (N. Virginia): us-east-1
- US East (Ohio): us-east-2
- US West (N. California): us-west-1
- US West (Oregon): us-west-2
- Canada (Central): ca-central-1
- South America (São Paulo): sa-east-1
- Europe (Stockholm): eu-north-1
- Europe (Spain): eu-south-2
- Europe (Ireland): eu-west-1
- Europe (London): eu-west-2
- Europe (Paris): eu-west-3
- Europe (Frankfurt): eu-central-1
- Middle East (Bahrain): me-south-1
- Middle East (UAE): me-central-1
- Israel (Tel Aviv): il-central-1
- Africa (Cape Town): af-south-1
- Asia Pacific (Hong Kong): ap-east-1
- Asia Pacific (Tokyo): ap-northeast-1
- Asia Pacific (Seoul): ap-northeast-2
- Asia Pacific (Osaka): ap-northeast-3
- Asia Pacific (Singapore): ap-southeast-1
- Asia Pacific (Sydney): ap-southeast-2
- Asia Pacific (Jakarta): ap-southeast-3
- Asia Pacific (Mumbai): ap-south-1
- China (Beijing): cn-north-1
- China (Ningxia): cn-northwest-1
- Amazon GovCloud (US-West): us-gov-west-1
- Amazon GovCloud (US-East): us-gov-east-1
curl -X POST \
    -H 'Content-Type: application/json' \
    https://your-neptune-endpoint:port/loader -d '
    {
      "source" : "s3://bucket-name/object-key-name",
      "format" : "format",
      "iamRoleArn" : "arn:aws:iam::account-id:role/role-name",
      "region" : "region",
      "failOnError" : "FALSE",
      "parallelism" : "MEDIUM",
      "updateSingleCardinalityProperties" : "FALSE",
      "queueRequest" : "TRUE",
      "dependencies" : ["load_A_id", "load_B_id"]
    }'
For information about creating and associating an IAM role with a Neptune cluster, see Prerequisites: IAM Role and Amazon S3 Access.
Note
See Neptune Loader Request Parameters for detailed information about load request parameters. In brief:
- The source parameter accepts an Amazon S3 URI that points to either a single file or a folder. If you specify a folder, Neptune loads every data file in the folder. The folder can contain multiple vertex files and multiple edge files.
The URI can be in any of the following formats:
  - s3://bucket_name/object-key-name
  - https://s3.amazonaws.com/bucket_name/object-key-name
  - https://s3-us-east-1.amazonaws.com/bucket_name/object-key-name
- The format parameter can be one of the following:
  - Gremlin CSV format (csv) for Gremlin property graphs
  - openCypher CSV format (opencypher) for openCypher property graphs
  - N-Triples (ntriples) format for RDF / SPARQL
  - N-Quads (nquads) format for RDF / SPARQL
  - RDF/XML (rdfxml) format for RDF / SPARQL
  - Turtle (turtle) format for RDF / SPARQL
- The optional parallelism parameter lets you restrict the number of threads used in the bulk load process. It can be set to LOW, MEDIUM, HIGH, or OVERSUBSCRIBE.
- When updateSingleCardinalityProperties is set to "FALSE", the loader returns an error if more than one value is provided in a source file being loaded for an edge or single-cardinality vertex property.
- Setting queueRequest to "TRUE" causes the load request to be placed in a queue if there is already a load job running.
- The dependencies parameter makes execution of the load request contingent on the successful completion of one or more load jobs that have already been placed in the queue.
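If you use queueRequest and dependencies, you may need the IDs of jobs already submitted. Listing recent load jobs is one way to find them; this sketch reuses the placeholder endpoint from the load command above:

# List the IDs of recent load jobs on this cluster.
curl -G 'https://your-neptune-endpoint:port/loader'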
- The Neptune loader returns a job id that allows you to check the status or cancel the loading process; for example:
{
    "status" : "200 OK",
    "payload" : {
        "loadId" : "ef478d76-d9da-4d94-8ff1-08d9d4863aa5"
    }
}
- Enter the following to get the status of the load with the loadId from Step 3:
curl -G 'https://your-neptune-endpoint:port/loader/ef478d76-d9da-4d94-8ff1-08d9d4863aa5'
If the status of the load lists an error, you can request more detailed status and a list of the errors. For more information and examples, see Neptune Loader Get-Status API.
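For example, the following requests per-file details and the first page of error records, using the details, errors, page, and errorsPerPage parameters described in the Neptune Loader Get-Status API:

# Request detailed status plus a page of error records for this load job.
curl -G 'https://your-neptune-endpoint:port/loader/ef478d76-d9da-4d94-8ff1-08d9d4863aa5?details=true&errors=true&page=1&errorsPerPage=3'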
- (Optional) Cancel the Load job.
Enter the following to delete the loader job with the job id from Step 3:
curl -X DELETE 'https://your-neptune-endpoint:port/loader/ef478d76-d9da-4d94-8ff1-08d9d4863aa5'
The DELETE command returns the HTTP code 200 OK upon successful cancellation.
Data from files that the load job has already finished loading is not rolled back; that data remains in the Neptune DB instance.