Migrating from Blazegraph to Amazon Neptune - Amazon Neptune

Migrating from Blazegraph to Amazon Neptune

If you have a graph in the open-source Blazegraph RDF triplestore, you can migrate your graph data to Amazon Neptune using the following steps:

  • Provision Amazon infrastructure. Begin by provisioning the required Neptune infrastructure using an Amazon CloudFormation template (see Create Neptune cluster).

  • Export data from Blazegraph. There are two main methods for exporting data from Blazegraph, namely using SPARQL CONSTRUCT queries or using the Blazegraph Export utility.

This approach is also generally applicable for migrating from other RDF triplestore databases.

Blazegraph to Neptune compatibility

Before you migrate your graph data to Neptune, be aware of several significant differences between Blazegraph and Neptune. These differences can require changes to queries, to the application architecture, or to both, and in some cases can make migration impractical:

  • Full-text search   –   In Blazegraph, you can either use internal full-text search or external full-text search capabilities through an integration with Apache Solr. If you use either of these features, stay informed about the latest updates on the full-text search features that Neptune supports. See Neptune full text search.

  • Query hints   –   Both Blazegraph and Neptune extend SPARQL using the concept of query hints. During a migration, you need to migrate any query hints you use. For information about the latest query hints Neptune supports, see SPARQL query hints.

  • Inference   –   Blazegraph supports inference as a configurable option in triples mode, but not in quads mode. Neptune does not yet support inference.

  • Geospatial search   –   Blazegraph supports the configuration of namespaces that enable geospatial support. This feature is not yet available in Neptune.

  • Multi-tenancy   –   Blazegraph supports multi-tenancy within a single database. In Neptune, multi-tenancy is supported either by storing data in named graphs and using the USING NAMED clauses for SPARQL queries, or by creating a separate database cluster for each tenant.

  • Federation   –   Neptune currently supports SPARQL 1.1 federation to locations accessible to the Neptune instance, such as within the private VPC, across VPCs, or to external internet endpoints. Depending on the specific setup and required federation endpoints, you may need some additional network configuration.

  • Blazegraph standards extensions   –   Blazegraph includes multiple extensions to both the SPARQL and REST API standards, whereas Neptune is only compatible with the standards specifications themselves. This may require changes to your application, or make migration difficult.
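As an illustration of the named-graph approach to multi-tenancy listed above, the following Python sketch builds a read query scoped to one tenant's graph. The graph IRI scheme (http://example.org/tenants/<tenant id>) and both function names are hypothetical and not part of Neptune. The sketch uses FROM NAMED, which is the dataset clause for SPARQL queries; USING NAMED is the analogous clause in SPARQL Update requests.

```python
def tenant_graph_iri(tenant_id, base="http://example.org/tenants/"):
    """Return the named-graph IRI for a tenant (hypothetical naming scheme)."""
    # Allow only letters, digits, '-' and '_' so the id cannot break out of <...>.
    if not tenant_id or not tenant_id.replace("-", "").replace("_", "").isalnum():
        raise ValueError(f"unsafe tenant id: {tenant_id!r}")
    return f"{base}{tenant_id}"


def tenant_select_query(tenant_id, limit=10):
    """Build a SELECT query restricted to a single tenant's named graph."""
    graph = tenant_graph_iri(tenant_id)
    return (
        "SELECT ?s ?p ?o\n"
        f"FROM NAMED <{graph}>\n"
        f"WHERE {{ GRAPH <{graph}> {{ ?s ?p ?o }} }}\n"
        f"LIMIT {int(limit)}"
    )


if __name__ == "__main__":
    print(tenant_select_query("acme"))
```

Keeping the IRI construction in one place makes it harder for a query to read another tenant's graph by accident. A separate cluster per tenant remains the stronger isolation boundary.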

Provisioning Amazon infrastructure for Neptune

Although you can construct the required Amazon infrastructure manually through the Amazon Web Services Management Console or Amazon CLI, it's often more convenient to use a CloudFormation template instead, as described below:

Provisioning Neptune using a CloudFormation template:
  1. Navigate to Creating an Amazon Neptune cluster using Amazon CloudFormation.

  2. Choose Launch Stack in your preferred region.

  3. Set the required parameters (stack name and EC2SSHKeyPairName). Also set the following optional parameters to ease the migration process:

    • Set AttachBulkloadIAMRoleToNeptuneCluster to true. This parameter allows for creating and attaching the appropriate IAM role to your cluster to allow for bulk loading data.

    • Set NotebookInstanceType to your preferred instance type. This parameter creates a Neptune notebook that you use to run the bulk load into Neptune and validate the migration.

  4. Choose Next.

  5. Set any other stack options you want.

  6. Choose Next.

  7. Review your options and select both check boxes to acknowledge that Amazon CloudFormation may require additional capabilities.

  8. Choose Create stack.

The stack creation process can take a few minutes.
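If you prefer to script the console steps above, a minimal sketch using boto3 might look like the following. The parameter names come from the procedure above. The helper names, and the choice to pass the template URL and the capability list as arguments, are assumptions: take the template URL from the Launch Stack page for your Region, and check which of CAPABILITY_IAM, CAPABILITY_NAMED_IAM, and CAPABILITY_AUTO_EXPAND correspond to the two acknowledgement check boxes for that template.

```python
def build_stack_parameters(key_pair_name, notebook_instance_type):
    """Return parameters in the ParameterKey/ParameterValue shape CloudFormation expects."""
    values = {
        "EC2SSHKeyPairName": key_pair_name,
        "AttachBulkloadIAMRoleToNeptuneCluster": "true",
        "NotebookInstanceType": notebook_instance_type,
    }
    return [{"ParameterKey": k, "ParameterValue": v} for k, v in values.items()]


def create_neptune_stack(stack_name, template_url, parameters, capabilities, region):
    """Create the stack and block until CloudFormation reports that it is complete."""
    import boto3  # third-party; imported here so the helper above has no dependencies

    cfn = boto3.client("cloudformation", region_name=region)
    cfn.create_stack(
        StackName=stack_name,
        TemplateURL=template_url,   # assumption: supplied by the caller
        Parameters=parameters,
        Capabilities=capabilities,  # assumption: verify against the template
    )
    cfn.get_waiter("stack_create_complete").wait(StackName=stack_name)


if __name__ == "__main__":
    # Placeholder values for illustration only.
    print(build_stack_parameters("my-key-pair", "ml.t3.medium"))
```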

Exporting data from Blazegraph

The next step is to export data out of Blazegraph in a format that is compatible with the Neptune bulk loader.

Depending on how the data is stored in Blazegraph (triples or quads) and how many named graphs are in use, Blazegraph may require that you perform the export process multiple times and generate multiple data files:

  • If the data is stored as triples, you need to run one export for each named graph.

  • If the data is stored as quads, you may choose to either export data in N-Quads format or export each named graph in a triples format.

Below we assume that you export a single namespace as N-Quads, but you can repeat the process for additional namespaces or desired export formats.

If you need Blazegraph to be online and available during the migration, use SPARQL CONSTRUCT queries. This requires that you install, configure, and run a Blazegraph instance with an accessible SPARQL endpoint.

If you don't need Blazegraph to be online, use the Blazegraph Export utility. To do this, you must download Blazegraph, and the data file and configuration files must be accessible, but the server doesn't need to be running.

Exporting data from Blazegraph using SPARQL CONSTRUCT

SPARQL CONSTRUCT is a feature of SPARQL that returns an RDF graph matching a specified query template. For this use case, you use it to export your data one namespace at a time, using a query like the following:

    CONSTRUCT WHERE {
      hint:Query hint:analytic "true" .
      hint:Query hint:constructDistinctSPO "false" .
      ?s ?p ?o
    }

Although other RDF tools exist to export this data, the easiest way to run this query is by using the REST API endpoint provided by Blazegraph. The following Python (3.6+) script exports the data as N-Quads:

    import requests

    # Configure the URL here, e.g. http://localhost:9999/sparql
    url = "http://localhost:9999/sparql"

    payload = {
        'query': 'CONSTRUCT WHERE { hint:Query hint:analytic "true" . '
                 'hint:Query hint:constructDistinctSPO "false" . ?s ?p ?o }'
    }

    # Set the export format to N-Quads
    headers = {'Accept': 'text/x-nquads'}

    # Run the HTTP request and fail early on an error response
    response = requests.post(url, headers=headers, data=payload)
    response.raise_for_status()

    # Write the results to a file; the with block closes the file handle
    with open("export.nq", "w", encoding="utf-8") as f:
        f.write(response.text)

If the data is stored as triples, you need to change the Accept header parameter to export data in an appropriate format (N-Triples, RDF/XML, or Turtle) using the values specified on the Blazegraph GitHub repo.
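For larger graphs, holding the whole response in memory before writing it out may not be practical. The sketch below is one possible variation on the script above: it selects the Accept header by format name and streams the response to disk in chunks. Only the N-Quads value (text/x-nquads) comes from this page. The other MIME types, the file extensions, and the helper names are assumptions that you should check against the values listed in the Blazegraph GitHub repo.

```python
QUERY = (
    'CONSTRUCT WHERE { hint:Query hint:analytic "true" . '
    'hint:Query hint:constructDistinctSPO "false" . ?s ?p ?o }'
)

# format name -> (Accept header value, file extension)
EXPORT_FORMATS = {
    "N-Quads": ("text/x-nquads", ".nq"),         # value used in the script above
    "N-Triples": ("text/plain", ".nt"),          # assumption: verify before use
    "RDF/XML": ("application/rdf+xml", ".rdf"),  # assumption: verify before use
    "Turtle": ("application/x-turtle", ".ttl"),  # assumption: verify before use
}


def write_chunks(chunks, path):
    """Write an iterable of byte chunks to path and return the bytes written."""
    written = 0
    with open(path, "wb") as f:
        for chunk in chunks:
            if chunk:  # skip empty keep-alive chunks
                f.write(chunk)
                written += len(chunk)
    return written


def export_graph(url, rdf_format="N-Quads", basename="export"):
    """Run the CONSTRUCT query and stream the result to <basename><extension>."""
    import requests  # third-party; imported here so write_chunks has no dependencies

    accept, extension = EXPORT_FORMATS[rdf_format]
    path = basename + extension
    with requests.post(url, headers={"Accept": accept},
                       data={"query": QUERY}, stream=True) as response:
        response.raise_for_status()
        write_chunks(response.iter_content(chunk_size=1 << 20), path)
    return path
```

Calling export_graph("http://localhost:9999/sparql") would write export.nq in the current directory, assuming the endpoint from the earlier script.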

Using the Blazegraph export utility to export data

Blazegraph includes a utility for exporting data, the ExportKB class. ExportKB facilitates exporting data from Blazegraph but, unlike the previous method, requires that the server be offline while the export is running. This makes it the ideal method to use when you can take Blazegraph offline during migration, or when the migration can occur from a backup of the data.

You run the utility from a Java command line on a machine that has Blazegraph installed but not running. The easiest way to run this command is to download the latest blazegraph.jar release located on GitHub. Running this command requires several parameters:

  • log4j.primary.configuration   –   The location of the log4j properties file.

  • log4j.configuration   –   The location of the log4j properties file.

  • output   –   The output directory for the exported data, passed as -outdir in the example that follows. Files are located as a tar.gz in a subdirectory named as documented in the knowledge base.

  • format   –   The desired output format followed by the location of the RWStore.properties file. If you’re working with triples, you need to change the -format parameter to N-Triples, Turtle, or RDF/XML.

For example, if you have the Blazegraph journal file and properties files, export data as N-Quads using the following code:

    java -cp blazegraph.jar \
        com.bigdata.rdf.sail.ExportKB \
        -outdir ~/temp/ \
        -format N-Quads \
        ./RWStore.properties
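If you run the export from a script, for example once per namespace, it can help to assemble the command as an argument list. The following sketch mirrors the command above. The function name is hypothetical, and passing the two log4j parameters as -D JVM system properties is an assumption based on how they are named in the parameter list.

```python
import os


def build_export_command(properties_file, outdir, rdf_format="N-Quads",
                         jar="blazegraph.jar", log4j_config=None):
    """Return the ExportKB command as an argument list suitable for subprocess.run."""
    command = ["java"]
    if log4j_config:
        # Assumption: both log4j parameters are passed as JVM system properties.
        command += [
            f"-Dlog4j.primary.configuration={log4j_config}",
            f"-Dlog4j.configuration={log4j_config}",
        ]
    command += [
        "-cp", jar,
        "com.bigdata.rdf.sail.ExportKB",
        "-outdir", os.path.expanduser(outdir),  # expand ~ because no shell is involved
        "-format", rdf_format,
        properties_file,
    ]
    return command


if __name__ == "__main__":
    # To run the export: subprocess.run(command, check=True)
    print(" ".join(build_export_command("./RWStore.properties", "~/temp/")))
```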

If the export is successful, you see output like this:

    Exporting kb as N-Quads on /home/ec2-user/temp/kb
    Effective output directory: /home/ec2-user/temp/kb
    Writing /home/ec2-user/temp/kb/kb.properties
    Writing /home/ec2-user/temp/kb/data.nq.gz
    Done
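Before moving on, you may want a rough check that the export contains what you expect. Because N-Quads stores one statement per line, a line count of the compressed file gives the number of exported statements, which you can compare against a count taken from Blazegraph. This is a minimal sketch using only the Python standard library; the function name is hypothetical and the path is the one from the sample output above.

```python
import gzip


def count_statements(path):
    """Count non-blank, non-comment lines in a gzip-compressed N-Quads file."""
    count = 0
    with gzip.open(path, "rt", encoding="utf-8") as f:
        for line in f:
            stripped = line.strip()
            if stripped and not stripped.startswith("#"):
                count += 1
    return count


if __name__ == "__main__":
    print(count_statements("/home/ec2-user/temp/kb/data.nq.gz"))
```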

Create an Amazon Simple Storage Service (Amazon S3) bucket and copy the exported data into it

After you export your data from Blazegraph, create an Amazon Simple Storage Service (Amazon S3) bucket in the same Region as the target Neptune DB cluster. The Neptune bulk loader imports the data from this bucket.

For instructions on how to create an Amazon S3 bucket, see How do I create an S3 Bucket? and Examples of creating a bucket, both in the Amazon Simple Storage Service User Guide.

For instructions about how to copy the data files you have exported into the new Amazon S3 bucket, see Uploading an object to a bucket in the Amazon Simple Storage Service User Guide, or Using high-level (s3) commands with the Amazon CLI. You can also use Python code like the following to copy the files one by one:

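    import boto3

    region = 'region name'        # Region of the target Neptune DB cluster
    bucket_name = 'bucket name'

    # Pass the Region explicitly so that it is actually used
    s3 = boto3.resource('s3', region_name=region)
    s3.meta.client.upload_file('export.nq', bucket_name, 'export.nq')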
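When an export produces several files, for example one per named graph, the same upload call can be applied to each file in a directory. The following is a sketch under stated assumptions: the helper names, the key prefix, and the list of file suffixes are illustrative and not part of any AWS API. Only upload_file is a boto3 call.

```python
from pathlib import Path


def plan_uploads(export_dir, prefix="",
                 suffixes=(".nq", ".nq.gz", ".nt", ".ttl", ".rdf")):
    """Return sorted (local_path, s3_key) pairs for export files in export_dir."""
    pairs = []
    for path in sorted(Path(export_dir).iterdir()):
        if path.is_file() and path.name.endswith(tuple(suffixes)):
            pairs.append((str(path), f"{prefix}{path.name}"))
    return pairs


def upload_exports(export_dir, bucket_name, region, prefix=""):
    """Upload every planned file to the bucket, one object per file."""
    import boto3  # third-party; imported here so plan_uploads has no dependencies

    s3 = boto3.client("s3", region_name=region)
    for local_path, key in plan_uploads(export_dir, prefix):
        s3.upload_file(local_path, bucket_name, key)
```

Separating the planning step from the upload lets you print and review the list of keys before any data is transferred.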