Parameters used to control the Neptune export process
Whether you are using the Neptune-Export service or the neptune-export command line utility, the parameters you use to control the export are mostly the same. They are contained in a JSON object that is passed to the Neptune-Export endpoint or to neptune-export on the command line.
The object passed to the export process has up to five top-level fields:
-d '{
      "command" : "(either export-pg or export-rdf)",
      "outputS3Path" : "s3://(your Amazon S3 bucket)/(path to the folder for exported data)",
      "jobSize" : "(for Neptune-Export service only)",
      "params" : { (a JSON object that contains export-process parameters) },
      "additionalParams" : { (a JSON object that contains parameters for training configuration) }
    }'
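As a rough illustration of the template above, the following Python sketch assembles the top-level object and serializes it to JSON. The helper name build_export_request and the bucket name example-bucket are hypothetical, and the key is spelled jobSize to match the section heading, which is an assumption about the exact spelling the service expects.

```python
import json


def build_export_request(output_s3_path, command=None, job_size=None,
                         params=None, additional_params=None):
    """Assemble the top-level export object, leaving out fields that were not supplied.

    Hypothetical helper for illustration; it is not part of neptune-export.
    """
    # outputS3Path is the one field the text describes as required.
    request = {"outputS3Path": output_s3_path}
    optional_fields = {
        "command": command,
        "jobSize": job_size,  # assumed spelling; only used by the Neptune-Export service
        "params": params,
        "additionalParams": additional_params,
    }
    for key, value in optional_fields.items():
        if value is not None:
            request[key] = value
    return request


if __name__ == "__main__":
    request = build_export_request(
        "s3://example-bucket/neptune-export",  # placeholder bucket and folder
        command="export-pg",
        job_size="small",
        params={},
    )
    print(json.dumps(request, indent=2))
```

The resulting string is what would be passed after -d to the endpoint or to neptune-export on the command line.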
Contents
- The command parameter
- The outputS3Path parameter
- The jobSize parameter
- The params object
- The additionalParams object
- Export parameter fields in the params top-level JSON object
- List of possible fields in the export parameters params object
- Fields common to all types of export
- Fields for property-graph export
- concurrency field in params
- edgeLabels field in params
- filter field in params
- filterConfigFile field in params
- format field used for property-graph data in params
- gremlinFilter field in params
- gremlinNodeFilter field in params
- gremlinEdgeFilter field in params
- nodeLabels field in params
- scope field in params
- Fields for RDF export
- Examples of filtering what is exported
- Filtering the export of property-graph data
- Example of using scope to export only edges
- Example of using nodeLabels and edgeLabels to export only nodes and edges having specific labels
- Example of using filter to export only specified nodes, edges and properties
- Example that uses gremlinFilter
- Example that uses gremlinNodeFilter
- Example that uses gremlinEdgeFilter
- Combining filter, gremlinNodeFilter, nodeLabels, edgeLabels and scope
- Filtering the export of RDF data
The command parameter
The command top-level parameter determines whether to export property-graph data or RDF data. If you omit the command parameter, the export process defaults to exporting property-graph data.
- export-pg – Export property-graph data.
- export-rdf – Export RDF data.
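The defaulting behavior described above can be sketched in Python as follows. The function resolve_command is a hypothetical client-side helper, not something the export tooling provides; it only mirrors the rule that an omitted command means a property-graph export.

```python
VALID_COMMANDS = ("export-pg", "export-rdf")
DEFAULT_COMMAND = "export-pg"  # property-graph export is the documented default


def resolve_command(request):
    """Return the command an export request would run, applying the default.

    Hypothetical helper for illustration only.
    """
    command = request.get("command", DEFAULT_COMMAND)
    if command not in VALID_COMMANDS:
        raise ValueError(f"command must be one of {VALID_COMMANDS}, got {command!r}")
    return command
```

Rejecting unknown values on the client side is a design choice of this sketch; the text does not say how the service itself responds to an unrecognized command.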
The outputS3Path parameter
The outputS3Path top-level parameter is required, and must contain the URI of an Amazon S3 location to which the exported files can be published:
"outputS3Path" : "s3://(your Amazon S3 bucket)/(path to output folder)"
The value must begin with s3://, followed by a valid bucket name and optionally a folder path within the bucket.
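A client could check a value against these rules before submitting a job. The Python sketch below is one way to do that; split_output_s3_path is a hypothetical helper, and its bucket-name pattern is a simplified approximation of the Amazon S3 naming rules (3 to 63 characters, lowercase letters, digits, dots and hyphens, starting and ending with a letter or digit), not a complete validator.

```python
import re

# Simplified approximation of S3 bucket naming rules (an assumption of this sketch).
_BUCKET_PATTERN = re.compile(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]")


def split_output_s3_path(uri):
    """Split an outputS3Path value into (bucket, folder_path).

    Raises ValueError if the value does not start with s3:// or the bucket
    name fails the simplified check. folder_path is "" when no folder is given.
    """
    scheme = "s3://"
    if not uri.startswith(scheme):
        raise ValueError("outputS3Path must begin with s3://")
    bucket, _, folder_path = uri[len(scheme):].partition("/")
    if not _BUCKET_PATTERN.fullmatch(bucket):
        raise ValueError(f"invalid bucket name: {bucket!r}")
    return bucket, folder_path
```

For example, split_output_s3_path("s3://example-bucket/exports/run1") yields the bucket example-bucket and the folder path exports/run1, while a value that starts with a single slash after s3: is rejected.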
The jobSize parameter
The jobSize top-level parameter is optional and is used only with the Neptune-Export service, not with the neptune-export command line utility. It lets you characterize the size of the export job you are starting, which helps determine the amount of compute resources devoted to the job and its maximum concurrency level.
"jobSize" : "(one of four size descriptors)"
The four valid size descriptors are:
- small – Maximum concurrency: 8. Suitable for storage volumes up to 10 GB.
- medium – Maximum concurrency: 32. Suitable for storage volumes up to 100 GB.
- large – Maximum concurrency: 64. Suitable for storage volumes over 100 GB but less than 1 TB.
- xlarge – Maximum concurrency: 96. Suitable for storage volumes over 1 TB.
By default, an export initiated on the Neptune-Export service runs as a small job.
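The size descriptors and their limits can be captured in a small lookup, sketched here in Python. The names suggest_job_size and max_concurrency are hypothetical, and two details are assumptions of this sketch rather than statements from the text: 1 TB is treated as 1024 GB, and a volume of exactly 1 TB is mapped to xlarge.

```python
# Maximum concurrency per size descriptor, as listed above.
MAX_CONCURRENCY = {"small": 8, "medium": 32, "large": 64, "xlarge": 96}
DEFAULT_JOB_SIZE = "small"  # the service default when jobSize is omitted
GB_PER_TB = 1024  # assumption: the text does not say whether 1 TB means 1000 or 1024 GB


def suggest_job_size(storage_gb):
    """Pick a size descriptor from the storage volume size in GB (illustrative only)."""
    if storage_gb <= 10:
        return "small"
    if storage_gb <= 100:
        return "medium"
    if storage_gb < GB_PER_TB:
        return "large"
    return "xlarge"  # assumption: exactly 1 TB is treated as xlarge


def max_concurrency(job_size=None):
    """Return the maximum concurrency for a descriptor, using the default if omitted."""
    return MAX_CONCURRENCY[job_size or DEFAULT_JOB_SIZE]
```

The maximum is a ceiling, not the concurrency a job actually runs with.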
The performance of an export depends not only on the jobSize setting, but also on the number of database instances that you're exporting from, the size of each instance, and the effective concurrency level of the job.
For property-graph exports, you can configure the number of database instances using the cloneClusterReplicaCount parameter, and you can configure the job's effective concurrency level using the concurrency parameter.
The params object
The params top-level parameter is a JSON object that contains parameters that you use to control the export process itself, as explained in Export parameter fields in the params top-level JSON object. Some of the fields in the params object are specific to property-graph exports, some to RDF.
The additionalParams object
The additionalParams top-level parameter is a JSON object that contains parameters you can use to control actions that are applied to the data after it has been exported. At present, additionalParams is used only for exporting training data for Neptune ML.