Export parameter fields in the params top-level JSON object - Amazon Neptune

Export parameter fields in the params top-level JSON object

The Neptune export params JSON object allows you to control the export, including the type and format of the exported data.

List of possible fields in the export parameters params object

Listed below are all the possible top-level fields that can appear in a params object. Only a subset of these fields appear in any one object.

Fields common to all types of export

cloneCluster field in params

(Optional). Default: false.

If the cloneCluster parameter is set to true, the export process uses a fast clone of your DB cluster:

"cloneCluster" : true

By default, the export process exports data from the DB cluster that you specify using the endpoint, endpoints or clusterId parameters. However, if your DB cluster is in use while the export is going on, and data is changing, the export process cannot guarantee the consistency of the data being exported.

To ensure that the exported data is consistent, use the cloneCluster parameter to export from a static clone of your DB cluster instead.

The cloned DB cluster is created in the same VPC as the source DB cluster and inherits the security group, subnet group and IAM database authentication settings of the source. When the export is complete, Neptune deletes the cloned DB cluster.

By default, a cloned DB cluster consists of a single instance of the same instance type as the primary instance in the source DB cluster. You can change the instance type used for the cloned DB cluster by specifying a different one using cloneClusterInstanceType.

Note

If you don't use the cloneCluster option, and are exporting directly from your main DB cluster, you might need to increase the timeout on the instances from which data is being exported. For large data sets, the timeout should be set to several hours.

cloneClusterInstanceType field in params

(Optional).

If the cloneCluster parameter is present and set to true, you can use the cloneClusterInstanceType parameter to specify the instance type used for the cloned DB cluster:

"cloneClusterInstanceType" : "(for example, r5.12xlarge)"

By default, a cloned DB cluster consists of a single instance of the same instance type as the primary instance in the source DB cluster.

cloneClusterReplicaCount field in params

(Optional).

If the cloneCluster parameter is present and set to true, you can use the cloneClusterReplicaCount parameter to specify the number of read-replica instances created in the cloned DB cluster:

"cloneClusterReplicaCount" : (for example, 3)

By default, a cloned DB cluster consists of a single primary instance. The cloneClusterReplicaCount parameter lets you specify how many additional read-replica instances should be created.
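The three clone-related fields are usually set together. The following is a minimal Python sketch of how they might be assembled and serialized. The helper name build_clone_params and the cluster ID "my-cluster-id" are hypothetical placeholders, not part of any Neptune tooling.

```python
import json


def build_clone_params(cluster_id, instance_type=None, replica_count=None):
    """Assemble a params object that exports from a clone (hypothetical helper)."""
    params = {"clusterId": cluster_id, "cloneCluster": True}
    # Both fields are optional and only apply when cloneCluster is true.
    if instance_type is not None:
        params["cloneClusterInstanceType"] = instance_type
    if replica_count is not None:
        params["cloneClusterReplicaCount"] = replica_count
    return params


# Placeholder cluster ID; instance type and replica count follow the examples above.
params = build_clone_params("my-cluster-id", "r5.12xlarge", 3)
print(json.dumps(params, indent=2))
```

Leaving out the two optional arguments yields a params object that relies on the defaults: a single instance of the same type as the source primary.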

clusterId field in params

(Optional).

The clusterId parameter specifies the ID of a DB cluster to use:

"clusterId" : "(the ID of your DB cluster)"

If you use the clusterId parameter, the export process uses all available instances in that DB cluster to extract data.

Note

The endpoint, endpoints, and clusterId parameters are mutually exclusive. Use one and only one of them.

endpoint field in params

(Optional).

Use endpoint to specify an endpoint of a Neptune instance in your DB cluster that the export process can query to extract data (see Endpoint Connections). This is the DNS name only, and does not include the protocol or port:

"endpoint" : "(a DNS endpoint of your DB cluster)"

Use a cluster or instance endpoint, but not the main reader endpoint.

Note

The endpoint, endpoints, and clusterId parameters are mutually exclusive. Use one and only one of them.

endpoints field in params

(Optional).

Use endpoints to specify a JSON array of endpoints in your DB cluster that the export process can query to extract data (see Endpoint Connections). These are DNS names only, and do not include the protocol or port:

"endpoints": [ "(one endpoint in your DB cluster)", "(another endpoint in your DB cluster)", "(a third endpoint in your DB cluster)" ]

If you have multiple instances in your cluster (a primary and one or more read replicas), you can improve export performance by using the endpoints parameter to distribute queries across a list of those endpoints.

Note

The endpoint, endpoints, and clusterId parameters are mutually exclusive. Use one and only one of them.
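Because exactly one of these three fields must be present, it can be worth checking a params object before submitting it. The sketch below is one way to do that in Python; check_source_fields is a hypothetical helper written for illustration, and the export process itself may report such errors differently.

```python
SOURCE_FIELDS = ("endpoint", "endpoints", "clusterId")


def check_source_fields(params):
    """Return the single source field present, or raise ValueError (hypothetical check)."""
    present = [name for name in SOURCE_FIELDS if name in params]
    if len(present) != 1:
        raise ValueError(
            f"expected exactly one of {SOURCE_FIELDS}, found {present or 'none'}"
        )
    return present[0]


# Placeholder cluster ID for illustration only.
print(check_source_fields({"clusterId": "my-cluster-id", "cloneCluster": True}))
```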

profile field in params

(Required to export training data for Neptune ML, unless the neptune_ml field is present in the additionalParams field).

The profile parameter provides sets of pre-configured parameters for specific workloads. At present, the export process supports only the neptune_ml profile.

If you are exporting training data for Neptune ML, add the following parameter to the params object:

"profile" : "neptune_ml"

useIamAuth field in params

(Optional). Default: false.

If the database from which you are exporting data has IAM authentication enabled, you must include the useIamAuth parameter set to true:

"useIamAuth" : true

includeLastEventId field in params

If you set includeLastEventId to true, and the database from which you are exporting data has Neptune Streams enabled, the export process writes a lastEventId.json file to your specified export location. This file contains the commitNum and opNum of the last event in the stream.

"includeLastEventId" : true

A cloned database created by the export process inherits the streams setting of its parent. If the parent has streams enabled, the clone will likewise have streams enabled. The contents of the stream on the clone will reflect the contents of the parent (including the same event IDs) at the point in time the clone was created.

Fields for property-graph export

concurrency field in params

(Optional). Default: 4.

The concurrency parameter specifies the number of parallel queries that the export process should use:

"concurrency" : (for example, 24)

A good guideline is to set the concurrency level to twice the number of vCPUs on all the instances from which you are exporting data. An r5.xlarge instance, for example, has 4 vCPUs. If you are exporting from a cluster of 3 r5.xlarge instances, you can set the concurrency level to 24 (= 3 x 2 x 4).

If you are using the Neptune-Export service, the concurrency level is limited by the jobSize setting. A small job, for example, supports a concurrency level of 8. If you try to specify a concurrency level of 24 for a small job using the concurrency parameter, the effective level remains at 8.

If you export from a cloned cluster, the export process calculates an appropriate concurrency level based on the size of the cloned instances and the job size.
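The guideline above amounts to simple arithmetic, sketched here in Python. The function name recommended_concurrency and its job_size_limit argument are hypothetical; the only limit stated above is 8 for a small job, so limits for other job sizes are left for the caller to supply. The sketch also assumes that all instances are the same size.

```python
def recommended_concurrency(vcpus_per_instance, instance_count, job_size_limit=None):
    """Twice the total vCPU count, optionally capped by a job-size limit."""
    level = 2 * vcpus_per_instance * instance_count
    if job_size_limit is not None:
        # The effective level never exceeds what the job size supports.
        level = min(level, job_size_limit)
    return level


print(recommended_concurrency(4, 3))     # 3 r5.xlarge instances: 24
print(recommended_concurrency(4, 3, 8))  # same cluster, small job: 8
```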

edgeLabels field in params

(Optional).

Use edgeLabels to export only those edges that have labels that you specify:

"edgeLabels" : ["(a label)", "(another label)"]

Each label in the JSON array must be a single, simple label.

The scope parameter takes precedence over the edgeLabels parameter, so if the scope value does not include edges, the edgeLabels parameter has no effect.

filter field in params

(Optional).

Use filter to specify that only nodes and/or edges with specific labels should be exported, and to filter the properties that are exported for each node or edge.

The general structure of a filter object, either inline or in a filter-configuration file, is as follows:

"filter" : { "nodes": [ (array of node label and properties objects) ], "edges": [ (array of edge definition and properties objects) ] }
  • nodes   –   Contains a JSON array of nodes and node properties in the following form:

    "nodes" : [ { "label": "(node label)", "properties": [ "(a property name)", "(another property name)", ( ... ) ] } ]
    • label  –   The node's property-graph label or labels.

      Takes a single value or, if the node has multiple labels, an array of values.

    • properties  –   Contains an array of the names of the node's properties that you want to export.

  • edges   –   Contains a JSON array of edge definitions in the following form:

    "edges" : [ { "label": "(edge label)", "properties": [ "(a property name)", "(another property name)", ( ... ) ] } ]
    • label   –   The edge's property-graph label. Takes a single value.

    • properties  –   Contains an array of the names of the edge's properties that you want to export.
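To make the structure described above concrete, here is a small Python sketch that builds a filter object and serializes it to JSON. All labels and property names (Person, Admin, knows, name, age, since) are hypothetical and stand in for whatever your graph actually contains.

```python
import json

# Hypothetical labels and property names, for illustration only.
filter_config = {
    "nodes": [
        # A node with a single label takes a single value.
        {"label": "Person", "properties": ["name", "age"]},
        # A node with multiple labels takes an array of values.
        {"label": ["Admin", "Person"], "properties": ["name"]},
    ],
    "edges": [
        # An edge label always takes a single value.
        {"label": "knows", "properties": ["since"]},
    ],
}

params = {"filter": filter_config}
print(json.dumps(params, indent=2))
```

The same filter_config dictionary, serialized on its own, has the form that a filter-configuration file takes.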

filterConfigFile field in params

(Optional).

Use filterConfigFile to specify a JSON file that contains a filter configuration in the same form that the filter parameter takes:

"filterConfigFile" : "s3://(your Amazon S3 bucket)/neptune-export/(the name of the JSON file)"

See filter for the format of the filterConfigFile file.

format field used for property-graph data in params

(Optional). Default: csv (comma-separated values)

The format parameter specifies the output format of the exported property graph data:

"format" : (one of: csv, csvNoHeaders, json, neptuneStreamsJson)
  • csv   –   Comma-separated value (CSV) formatted output, with column headings formatted according to the Gremlin load data format.

  • csvNoHeaders   –   CSV formatted data with no column headings.

  • json   –   JSON formatted data.

  • neptuneStreamsJson   –   JSON formatted data that uses the GREMLIN_JSON change serialization format.

gremlinFilter field in params

(Optional).

The gremlinFilter parameter allows you to supply a Gremlin snippet, such as a has() step, that is used to filter both nodes and edges:

"gremlinFilter" : (a Gremlin snippet)

Field names and string values should be surrounded by escaped double quotes. For dates and times, you can use the datetime method.
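If you generate the params object programmatically, a JSON serializer normally takes care of this escaping. The Python sketch below assumes that you hold the Gremlin snippet as an ordinary string and lets json.dumps escape the embedded double quotes. The snippet itself is only an illustration.

```python
import json

# The snippet is written with plain double quotes inside a Python string.
snippet = 'has("created", gt(datetime("2021-10-10")))'

# json.dumps escapes each embedded double quote as \" in the JSON text.
params_json = json.dumps({"gremlinFilter": snippet})
print(params_json)
```

Parsing the resulting text with json.loads returns the original, unescaped snippet, which is a quick way to confirm that nothing was escaped twice.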

The following example exports only those nodes and edges with a created date property whose value is later than 2021-10-10:

"gremlinFilter" : "has(\"created\", gt(datetime(\"2021-10-10\")))"

gremlinNodeFilter field in params

(Optional).

The gremlinNodeFilter parameter allows you to supply a Gremlin snippet, such as a has() step, that is used to filter nodes:

"gremlinNodeFilter" : (a Gremlin snippet)

Field names and string values should be surrounded by escaped double quotes. For dates and times, you can use the datetime method.

The following example exports only those nodes with a deleted Boolean property whose value is true:

"gremlinNodeFilter" : "has(\"deleted\", true)"

gremlinEdgeFilter field in params

(Optional).

The gremlinEdgeFilter parameter allows you to supply a Gremlin snippet, such as a has() step, that is used to filter edges:

"gremlinEdgeFilter" : (a Gremlin snippet)

Field names and string values should be surrounded by escaped double quotes. For dates and times, you can use the datetime method.

The following example exports only those edges with a strength numerical property whose value is 5:

"gremlinEdgeFilter" : "has(\"strength\", 5)"

nodeLabels field in params

(Optional).

Use nodeLabels to export only those nodes that have labels you specify:

"nodeLabels" : ["(a label)", "(another label)"]

Each label in the JSON array must be a single, simple label.

The scope parameter takes precedence over the nodeLabels parameter, so if the scope value does not include nodes, the nodeLabels parameter has no effect.

scope field in params

(Optional). Default: all.

The scope parameter specifies whether to export only nodes, or only edges, or both nodes and edges:

"scope" : (one of: nodes, edges, or all)
  • nodes   –   Export nodes and their properties only.

  • edges   –   Export edges and their properties only.

  • all   –   Export both nodes and edges and their properties (the default).
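Because scope takes precedence over nodeLabels and edgeLabels, a label filter can be silently ignored. The following Python sketch shows one way to work out which label filters actually apply to a given params object. The function effective_label_filters and the labels used are hypothetical, and the logic mirrors only the precedence rule described in this section.

```python
def effective_label_filters(params):
    """Return the label filters that still apply once scope is considered."""
    scope = params.get("scope", "all")  # "all" is the documented default
    effective = {}
    if scope in ("nodes", "all") and "nodeLabels" in params:
        effective["nodeLabels"] = params["nodeLabels"]
    if scope in ("edges", "all") and "edgeLabels" in params:
        effective["edgeLabels"] = params["edgeLabels"]
    return effective


# Hypothetical labels: with scope set to nodes, edgeLabels has no effect.
print(effective_label_filters(
    {"scope": "nodes", "nodeLabels": ["Person"], "edgeLabels": ["knows"]}
))
```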

Fields for RDF export

format field used for RDF data in params

(Optional). Default: turtle

The format parameter specifies the output format of the exported RDF data:

"format" : (one of: turtle, nquads, ntriples, neptuneStreamsJson)
  • turtle   –   Turtle formatted output.

  • nquads   –   N-Quads formatted data.

  • ntriples   –   N-Triples formatted data.

  • neptuneStreamsJson   –   JSON formatted data that uses the SPARQL NQUADS change serialization format.

rdfExportScope field in params

(Optional). Default: graph.

The rdfExportScope parameter specifies the scope of the RDF export:

"rdfExportScope" : (one of: graph, edges, or query)
  • graph   –   Export all RDF data.

  • edges   –   Export only those triples that represent edges.

  • query   –   Export data retrieved by a SPARQL query that is supplied using the sparql field.

sparql field in params

(Optional).

The sparql parameter allows you to specify a SPARQL query to retrieve the data to export:

"sparql" : (a SPARQL query)

If you supply a query using the sparql field, you must also set the rdfExportScope field to query.

namedGraph field in params

(Optional).

The namedGraph parameter allows you to specify an IRI to limit the export to a single named graph:

"namedGraph" : (Named graph IRI)

The namedGraph parameter can only be used with the rdfExportScope field set to graph.
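The sparql and namedGraph fields each depend on a particular rdfExportScope value. The Python sketch below checks those two rules for a params object. The function check_rdf_params is a hypothetical helper, and the query and IRI shown are placeholders.

```python
def check_rdf_params(params):
    """Return a list of problems with the RDF-specific fields (hypothetical check)."""
    scope = params.get("rdfExportScope", "graph")  # "graph" is the documented default
    problems = []
    if "sparql" in params and scope != "query":
        problems.append("sparql requires rdfExportScope to be 'query'")
    if "namedGraph" in params and scope != "graph":
        problems.append("namedGraph requires rdfExportScope to be 'graph'")
    return problems


# Placeholder query; rdfExportScope is omitted, so the default of graph applies.
print(check_rdf_params({"sparql": "SELECT * WHERE { ?s ?p ?o }"}))
# Placeholder IRI; namedGraph is valid with the default scope.
print(check_rdf_params({"namedGraph": "http://example.org/graph"}))
```

An empty list means that neither rule is violated.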