Neptune Best Practices Using openCypher and Bolt
Follow these best practices when using the openCypher query language and Bolt protocol with Neptune. For information about using openCypher in Neptune, see Accessing the Neptune Graph with openCypher.
Prefer directed to bi-directional edges in queries
When Neptune performs query optimizations, bi-directional edges make it difficult to create optimal query plans. Sub-optimal plans require the engine to do unnecessary work and result in poorer performance.
Therefore, use directed edges rather than bi-directional ones whenever possible. For example, use:
MATCH p=(:airport {code: 'ANC'})-[:route]->(d) RETURN p
instead of:
MATCH p=(:airport {code: 'ANC'})-[:route]-(d) RETURN p
Most data models don't actually need to traverse edges in both directions, so queries can achieve significant performance improvements by making the switch to using directed edges.
If your data model does require traversing bi-directional edges, make the first node (left-hand side) in the MATCH pattern the node with the most restrictive filtering.
Take the example, "Find me all the routes to and from the ANC airport". Write this query to start at the ANC airport, like this:
MATCH p=(src:airport {code: 'ANC'})-[:route]-(d) RETURN p
The engine can perform the minimal amount of work to satisfy the query, because the most restricted node is placed as the first node (left-hand side) in the pattern. The engine can then optimize the query.
This is far preferable to filtering on the ANC airport at the end of the pattern, like this:
MATCH p=(d)-[:route]-(src:airport {code: 'ANC'}) RETURN p
When the most restricted node is not placed first in the pattern, the engine must perform additional work because it can't optimize the query and has to perform additional lookups to arrive at the results.
Neptune does not support multiple concurrent queries in a transaction
Although the Bolt driver itself allows concurrent queries in a transaction, Neptune does not support multiple queries in a transaction running concurrently. Instead, Neptune requires that multiple queries in a transaction be run sequentially, and that the results of each query be completely consumed before the next query is initiated.
The example below shows how to use Bolt to run multiple queries sequentially in a transaction, so that the results of each one are completely consumed before the next one begins:
final String query = "MATCH (n) RETURN n";

try (Driver driver = getDriver(HOST_BOLT, getDefaultConfig())) {
    try (Session session = driver.session(readSessionConfig)) {
        try (Transaction trx = session.beginTransaction()) {
            final Result res_1 = trx.run(query);
            Assert.assertEquals(10000, res_1.list().size());

            final Result res_2 = trx.run(query);
            Assert.assertEquals(10000, res_2.list().size());
        }
    }
}
Create a new connection after failover
In case of a failover, the Bolt driver can continue connecting to the old writer instance rather than the new active one, because the DNS name was resolved to a specific IP address when the connection was established.
To prevent this, close and then reconnect the Driver object after any failover.
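One way to follow this practice is to wrap the driver in a small holder that can be discarded and rebuilt when a failover-related error is detected. The sketch below is illustrative only: DriverHolder and its methods are hypothetical names, not part of the Bolt driver or Neptune APIs.

```java
import java.util.function.Supplier;

// Illustrative sketch (hypothetical helper, not a Neptune or Bolt API):
// wraps any AutoCloseable connection object (such as a Bolt Driver) so it
// can be discarded and rebuilt after a failover, which forces DNS to be
// re-resolved on the next connection.
class DriverHolder<T extends AutoCloseable> {
    private final Supplier<T> factory; // e.g. () -> GraphDatabase.driver(url, auth, config)
    private T current;

    public DriverHolder(Supplier<T> factory) {
        this.factory = factory;
    }

    // Lazily create (or re-create) the underlying driver.
    public synchronized T get() {
        if (current == null) {
            current = factory.get();
        }
        return current;
    }

    // Call this from your error handler when a failover is suspected,
    // then call get() again to connect to the new active instance.
    public synchronized void reset() {
        if (current != null) {
            try {
                current.close(); // release connections to the stale IP address
            } catch (Exception e) {
                // the old driver is being discarded anyway
            }
            current = null;
        }
    }
}
```

On a failover-related exception from the driver, call reset() and retry the operation with a freshly obtained driver.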
Connection handling for long-lived applications
When building long-lived applications, such as those running within containers or on Amazon EC2 instances, instantiate a Driver object once and then reuse that object for the lifetime of the application. The Driver object is thread-safe, and the overhead of initializing it is considerable.
Connection handling for AWS Lambda
Bolt drivers are not recommended for use within AWS Lambda functions, because of their connection overhead and management requirements. Use the HTTPS endpoint instead.
Close driver objects when you're done
Be sure to close the client when you are finished with it, so that the Bolt connections are closed by the server and all resources associated with the connections are released. This happens automatically if you close the driver using driver.close().
If the driver is not closed properly, Neptune terminates all idle Bolt connections after 20 minutes, or after 10 days if you are using IAM authentication.
Neptune supports no more than 1000 concurrent Bolt connections. If you don't explicitly close connections when you're done with them, and the number of live connections reaches that limit of 1000, any new connection attempts fail.
Use explicit transaction modes for reading and writing
When using transactions with Neptune and the Bolt driver, it is best to explicitly set the appropriate access mode for both read and write transactions.
Read-only transactions
For read-only transactions, if you don't pass in the appropriate access mode configuration when building the session, the default isolation level is used, which is mutation query isolation. As a result, it's important for read-only transactions to set the access mode to read explicitly.
Auto-commit read transaction example:
SessionConfig sessionConfig = SessionConfig
    .builder()
    .withFetchSize(1000)
    .withDefaultAccessMode(AccessMode.READ)
    .build();

Session session = driver.session(sessionConfig);

try {
    (Add your application code here)
} catch (final Exception e) {
    throw e;
} finally {
    driver.close();
}
Read transaction example:
Driver driver = GraphDatabase.driver(url, auth, config);

SessionConfig sessionConfig = SessionConfig
    .builder()
    .withDefaultAccessMode(AccessMode.READ)
    .build();

driver.session(sessionConfig).readTransaction(
    new TransactionWork<List<String>>() {
        @Override
        public List<String> execute(org.neo4j.driver.Transaction tx) {
            (Add your application code here)
        }
    }
);
In both cases, SNAPSHOT isolation is achieved using Neptune read-only transaction semantics.
Because read replicas only accept read-only queries, any query submitted to a read replica runs under SNAPSHOT isolation semantics.
There are no dirty reads or non-repeatable reads for read-only transactions.
Mutation queries
For mutation queries, there are three different mechanisms to create a write transaction, each of which is illustrated below:
Implicit write transaction example:
Driver driver = GraphDatabase.driver(url, auth, config);

SessionConfig sessionConfig = SessionConfig
    .builder()
    .withDefaultAccessMode(AccessMode.WRITE)
    .build();

driver.session(sessionConfig).writeTransaction(
    new TransactionWork<List<String>>() {
        @Override
        public List<String> execute(org.neo4j.driver.Transaction tx) {
            (Add your application code here)
        }
    }
);
Auto-commit write transaction example:
SessionConfig sessionConfig = SessionConfig
    .builder()
    .withFetchSize(1000)
    .withDefaultAccessMode(AccessMode.WRITE)
    .build();

Session session = driver.session(sessionConfig);

try {
    (Add your application code here)
} catch (final Exception e) {
    throw e;
} finally {
    driver.close();
}
Explicit write transaction example:
Driver driver = GraphDatabase.driver(url, auth, config);

SessionConfig sessionConfig = SessionConfig
    .builder()
    .withFetchSize(1000)
    .withDefaultAccessMode(AccessMode.WRITE)
    .build();

Transaction beginWriteTransaction = driver.session(sessionConfig).beginTransaction();

(Add your application code here)

beginWriteTransaction.commit();
driver.close();
Isolation levels for write transactions
Reads made as part of mutation queries are run under READ COMMITTED transaction isolation. There are no dirty reads for reads made as part of mutation queries.
Records and ranges of records are locked when reading in a mutation query.
When a range of the index has been read by a mutation transaction, there is a strong guarantee that this range will not be modified by any concurrent transactions until the end of the read.
Mutation queries are not thread safe.
For conflicts, see Conflict Resolution Using Lock-Wait Timeouts.
Mutation queries are not automatically retried in case of failure.
Retry logic for exceptions
For all exceptions that allow a retry, it is generally best to use an exponential backoff and retry strategy. This is especially important for handling ConcurrentModificationException errors. The following shows an example of an exponential backoff and retry pattern:
public static void main(String[] args) throws Exception {
    try (Driver driver = getDriver(HOST_BOLT, getDefaultConfig())) {
        retriableOperation(driver, "CREATE (n {prop:'1'})")
            .withRetries(5)
            .withExponentialBackoff(true)
            .maxWaitTimeInMilliSec(500)
            .call();
    }
}

protected RetryableWrapper<Void> retriableOperation(final Driver driver, final String query) {
    return new RetryableWrapper<Void>() {
        @Override
        public Void submit() {
            log.info("Performing graph operation with retries......");
            try (Session session = driver.session(writeSessionConfig)) {
                try (Transaction trx = session.beginTransaction()) {
                    trx.run(query).consume();
                    trx.commit();
                }
            }
            return null;
        }

        @Override
        public boolean isRetryable(Exception e) {
            if (isCME(e)) {
                log.debug("Retrying on exception.... {}", e);
                return true;
            }
            return false;
        }

        private boolean isCME(Exception ex) {
            return ex.getMessage().contains("Operation failed due to conflicting concurrent operations");
        }
    };
}

/**
 * Wrapper that can retry an operation when a retryable condition is met.
 */
@Log4j2
@Getter
public abstract class RetryableWrapper<T> {

    private long retries = 5;
    private long maxWaitTimeInMillis = 1;
    private boolean exponentialBackoff = true;

    /**
     * Override with the operation that will be called in the retryable block.
     */
    public abstract T submit() throws Exception;

    /**
     * Override with custom logic deciding which exceptions to retry on.
     */
    public abstract boolean isRetryable(final Exception e);

    /**
     * Define the number of retries.
     *
     * @param retries - number of retries.
     */
    public RetryableWrapper<T> withRetries(final long retries) {
        this.retries = retries;
        return this;
    }

    /**
     * Maximum wait time before making the next call.
     *
     * @param time - maximum polling interval, in milliseconds.
     */
    public RetryableWrapper<T> maxWaitTimeInMilliSec(final long time) {
        this.maxWaitTimeInMillis = time;
        return this;
    }

    /**
     * Enable or disable exponential backoff.
     */
    public RetryableWrapper<T> withExponentialBackoff(final boolean expo) {
        this.exponentialBackoff = expo;
        return this;
    }

    /**
     * Call the client method wrapped in submit().
     */
    public T call() throws Exception {
        int count = 0;
        Exception exceptionForMitigationPurpose = null;

        do {
            try {
                return submit();
            } catch (Exception e) {
                exceptionForMitigationPurpose = e;
                if (isRetryable(e) && count < retries) {
                    // Wait time grows exponentially with the attempt number,
                    // capped at the configured maximum.
                    final long waitTime = exponentialBackoff
                        ? Math.min(getWaitTimeExp(count + 1), maxWaitTimeInMillis)
                        : 0;
                    Thread.sleep(waitTime);
                    log.debug("Retrying on exception attempt - {} on exception cause - {}", count, e.getMessage());
                } else if (!isRetryable(e)) {
                    log.error(e.getMessage());
                    throw new RuntimeException(e);
                }
            }
        } while (++count < retries);

        throw new IOException(String.format(
            "Retry was unsuccessful.... attempts %d. Hence throwing exception "
                + "back to the caller...", count), exceptionForMitigationPurpose);
    }

    /*
     * Returns the next wait interval, in milliseconds, using an exponential
     * backoff algorithm.
     */
    private long getWaitTimeExp(final long retryCount) {
        if (0 == retryCount) {
            return 0;
        }
        return ((long) Math.pow(2, retryCount) * 100L);
    }
}
Set multiple properties at once using a single SET clause
Instead of using multiple SET clauses to set individual properties, use a map to set multiple properties for an entity at once.
You can use:
MATCH (n:SomeLabel {`~id`: 'id1'}) SET n += {property1: 'value1', property2: 'value2', property3: 'value3'}
Instead of:
MATCH (n:SomeLabel {`~id`: 'id1'}) SET n.property1 = 'value1' SET n.property2 = 'value2' SET n.property3 = 'value3'
The SET clause accepts either a single property or a map. When updating multiple properties on a single entity, using a single SET clause with a map allows the updates to be performed in a single operation instead of multiple operations, which can be executed more efficiently.
Use the SET clause to remove multiple properties at once
When using the openCypher language, REMOVE is used to remove properties from an entity. In Neptune, each property being removed requires a separate operation, adding query latency. You can instead use SET with a map to set all the property values to null, which in Neptune is equivalent to removing properties. This performs significantly better when multiple properties on a single entity need to be removed.
Use:
WITH {prop1: null, prop2: null, prop3: null} as propertiesToRemove MATCH (n) SET n += propertiesToRemove
Instead of:
MATCH (n) REMOVE n.prop1, n.prop2, n.prop3
Use parameterized queries
It is recommended to always use parameterized queries when querying with openCypher. The query engine can leverage repeated parameterized queries for features like the query plan cache, where repeated invocations of the same parameterized structure with different parameters can reuse cached plans. The query plan generated for a parameterized query is cached and reused only when the query completes within 100ms and the parameter types are NUMBER, BOOLEAN, or STRING.
Use:
MATCH (n:foo) WHERE id(n) = $id RETURN n
With parameters:
parameters={"id": "first"} parameters={"id": "second"} parameters={"id": "third"}
Instead of:
MATCH (n:foo) WHERE id(n) = "first" RETURN n MATCH (n:foo) WHERE id(n) = "second" RETURN n MATCH (n:foo) WHERE id(n) = "third" RETURN n
Use flattened maps instead of nested maps in UNWIND clause
A deeply nested structure can restrict the ability of the query engine to generate an optimal query plan. To partially alleviate this issue, the following patterns will create optimal plans:
- Scenario 1: UNWIND with a list of cypher literals, which includes NUMBER, STRING, and BOOLEAN.
- Scenario 2: UNWIND with a list of flattened maps, which include only cypher literals (NUMBER, STRING, BOOLEAN) as values.
When writing a query containing an UNWIND clause, use the above recommendations to improve performance.
Scenario 1 example:
UNWIND $ids as x MATCH(t:ticket {`~id`: x})
With parameters:
parameters={ "ids": [1, 2, 3] }
An example for Scenario 2 is to generate a list of nodes to CREATE or MERGE. Instead of issuing multiple statements, use the following pattern to define the properties as a set of flattened maps:
UNWIND $props as p CREATE(t:ticket {title: p.title, severity:p.severity})
With parameters:
parameters={ "props": [ {"title": "food poisoning", "severity": "2"}, {"title": "Simone is in office", "severity": "3"} ] }
Instead of nested node objects like:
UNWIND $nodes as n CREATE(t:ticket n.properties)
With parameters:
parameters={ "nodes": [ {"id": "ticket1", "properties": {"title": "food poisoning", "severity": "2"}}, {"id": "ticket2", "properties": {"title": "Simone is in office", "severity": "3"}} ] }
Place more restrictive nodes on the left side in Variable-Length Path (VLP) expressions
In Variable-Length Path (VLP) queries, the query engine optimizes the evaluation by choosing to start the traversal on the left or right side of the expression. The decision is based on the cardinality of the patterns on the left and right side. Cardinality is the number of nodes matching the specified pattern.
- If the right pattern has a cardinality of one, the right side will be the starting point.
- If both the left and the right side have a cardinality of one, the expansion is checked on both sides and the traversal starts on the side with the smaller expansion. Expansion is the number of outgoing or incoming edges for the node on the left and the node on the right side of the VLP expression. This part of the optimization is only used if the VLP relationship is unidirectional and the relationship type is provided.
- Otherwise, the left side will be the starting point.
For a chain of VLP expressions, this optimization can only be applied to the first expression. The other VLPs are evaluated starting with the left side. As an example, let the cardinality of (a) and (b) be one, and the cardinality of (c) be greater than one.
- (a)-[*1..]->(c): Evaluation starts with (a).
- (c)-[*1..]->(a): Evaluation starts with (a).
- (a)-[*1..]-(c): Evaluation starts with (a).
- (c)-[*1..]-(a): Evaluation starts with (a).
Now let the incoming edges of (a) be two, the outgoing edges of (a) be three, the incoming edges of (b) be four, and the outgoing edges of (b) be five.
- (a)-[*1..]->(b): Evaluation starts with (a), as the outgoing edges of (a) are fewer than the incoming edges of (b).
- (a)<-[*1..]-(b): Evaluation starts with (a), as the incoming edges of (a) are fewer than the outgoing edges of (b).
As a general rule, place the more restrictive pattern on the left side of a VLP expression.
Avoid redundant node label checks by using granular relationship names
When optimizing for performance, using relationship labels that are exclusive to node patterns allows the removal of label filtering on nodes. Consider a graph model where the relationship likes is only used to define a relationship between two person nodes. We could write the following query to find this pattern:
MATCH (n:person)-[:likes]->(m:person) RETURN n, m
The person label check on n and m is redundant, as we defined the relationship to only appear when both are of the type person. To optimize for performance, we can write the query as follows:
MATCH (n)-[:likes]->(m) RETURN n, m
This pattern can also apply when properties are exclusive to a single node label. Assume that only person nodes have the property email, so verifying that the node label matches person is redundant. Writing this query as:
MATCH (n:person) WHERE n.email = 'xxx@gmail.com' RETURN n
Is less efficient than writing this query as:
MATCH (n) WHERE n.email = 'xxx@gmail.com' RETURN n
You should only adopt this pattern when performance is important and you have checks in your modeling process to ensure these edge labels are not reused for patterns involving other node labels. If you later introduce an email property on another node label, such as company, then the results will differ between these two versions of the query.
Specify edge labels where possible
It is recommended to provide an edge label where possible when specifying an edge in a pattern. Consider the following example query, which is used to link all of the people living in a city with all of the people who visited that city.
MATCH (person)-->(city {country: "US"})-->(anotherPerson) RETURN person, anotherPerson
If your graph model links people to nodes other than just cities using multiple edge labels, then by not specifying the edge label, Neptune needs to evaluate additional paths that are later discarded. In the above query, since an edge label was not given, the engine does more work first and then filters out values to obtain the correct result. A better version of the above query might be:
MATCH (person)-[:livesIn]->(city {country: "US"})-[:visitedBy]->(anotherPerson) RETURN person, anotherPerson
This not only helps in evaluation, but enables the query planner to create better plans. You could even combine this best practice with redundant node label checks to remove the city label check and write the query as:
MATCH (person)-[:livesIn]->({country: "US"})-[:visitedBy]->(anotherPerson) RETURN person, anotherPerson
Avoid using the WITH clause when possible
The WITH clause in openCypher acts as a boundary where everything before it executes, and the resulting values are then passed to the remaining portions of the query. The WITH clause is needed when you require interim aggregation or want to limit the number of results, but aside from that you should try to avoid using it. The general guidance is to remove simple WITH clauses (without aggregation, ORDER BY, or LIMIT) to enable the query planner to work on the entire query and create a globally optimal plan. As an example, assume you wrote a query to return all people living in India:
MATCH (person)-[:lives_in]->(city) WITH person, city MATCH (city)-[:part_of]->(country {name: 'India'}) RETURN collect(person) AS result
In the above version, the WITH clause restricts the placement of the pattern (city)-[:part_of]->(country {name: 'India'}) (which is more restrictive) before (person)-[:lives_in]->(city). This makes the plan sub-optimal. An optimization for this query is to remove the WITH clause and let the planner compute the best plan:
MATCH (person)-[:lives_in]->(city) MATCH (city)-[:part_of]->(country {name: 'India'}) RETURN collect(person) AS result
Place restrictive filters as early in the query as possible
In all scenarios, early placement of filters in the query helps in reducing the intermediate solutions a query plan must consider. This means less memory and fewer compute resources are needed to execute the query.
The following example helps you understand these impacts. Suppose you write a query to return all of the people who live in India. One version of the query could be:
MATCH (n)-[:lives_in]->(city)-[:part_of]->(country) WITH country, collect(n.firstName + " " + n.lastName) AS result WHERE country.name = 'India' RETURN result
The above version of the query is not the most optimal way to achieve this use case. The filter country.name = 'India' appears late in the query pattern. The query will first collect all persons and where they live, group them by country, and only then filter for the group where country.name = 'India'. The optimal way is to filter for only the people living in India first, and then perform the collect aggregation:
MATCH (n)-[:lives_in]->(city)-[:part_of]->(country) WHERE country.name = 'India' RETURN collect(n.firstName + " " + n.lastName) AS result
A general rule is to place a filter as soon as possible after the variable is introduced.
Explicitly check whether properties exist
Based on openCypher semantics, when a property is accessed it is equivalent to an optional join and must retain all rows even if the property does not exist. If you know based on your graph schema that a particular property will always exist for that entity, explicitly checking that property for existence allows the query engine to create optimal plans and improve performance.
Consider a graph model where nodes of type person always have a property name. Instead of doing this:
MATCH (n:person) RETURN n.name
Explicitly verify the property existence in the query with an IS NOT NULL check:
MATCH (n:person) WHERE n.name IS NOT NULL RETURN n.name
Do not use named paths (unless required)
A named path in a query always comes at an additional cost, which can add penalties in terms of higher latency and memory usage. Consider the following query:
MATCH p = (n)-[:commentedOn]->(m) WITH p, m, n, n.score + m.score as total WHERE total > 100 MATCH (m)-[:commentedON]->(o) WITH p, m, n, distinct(o) as o1 RETURN p, m.name, n.name, o1.name
In the above query, assuming we only want to know the properties of the nodes, the use of the path p is unnecessary. By specifying the named path as a variable, the aggregation operation using DISTINCT becomes expensive both in terms of time and memory usage. A more optimized version of the above query could be:
MATCH (n)-[:commentedOn]->(m) WITH m, n, n.score + m.score as total WHERE total > 100 MATCH (m)-[:commentedON]->(o) WITH m, n, distinct(o) as o1 RETURN m.name, n.name, o1.name
Avoid COLLECT(DISTINCT())
COLLECT(DISTINCT()) is used whenever a list containing distinct values is to be formed. COLLECT is an aggregation function, and grouping is done based on the additional keys being projected in the same statement. When DISTINCT is used, the input is split into multiple chunks, where each chunk denotes one group for reduction. Performance degrades as the number of groups increases. In Neptune, it is much more efficient to perform the DISTINCT before actually collecting/forming the list. This allows grouping to be done directly on the grouping keys for the whole chunk.
Consider the following query:
MATCH (n:Person)-[:commented_on]->(p:Post) WITH n, collect(distinct(p.post_id)) as post_list RETURN n, post_list
A more optimal way of writing this query is:
MATCH (n:Person)-[:commented_on]->(p:Post) WITH DISTINCT n, p.post_id as postId WITH n, collect(postId) as post_list RETURN n, post_list
Prefer the properties function over individual property lookup when retrieving all property values
The properties() function returns a map containing all the properties of an entity, and is much more efficient than returning the properties individually.
Assuming your Person nodes contain five properties, firstName, lastName, age, dept, and company, the following query would be preferred:
MATCH (n:Person) WHERE n.dept = 'AWS' RETURN properties(n) as personDetails
Rather than using:
MATCH (n:Person) WHERE n.dept = 'AWS' RETURN n.firstName, n.lastName, n.age, n.dept, n.company

or:

MATCH (n:Person) WHERE n.dept = 'AWS' RETURN {firstName: n.firstName, lastName: n.lastName, age: n.age, department: n.dept, company: n.company} as personDetails
Perform static computations outside of the query
It is recommended to resolve static computations (simple mathematical/string operations) on the client-side. Consider this example where you want to find all people one year older or less than the author:
MATCH (m:Message)-[:HAS_CREATOR]->(p:person) WHERE p.age <= ($age + 1) RETURN m
Here, $age is injected into the query via parameters and is then added to a fixed value. The result is then compared with p.age. A better approach is to do the addition on the client side and pass the calculated value as a parameter, $ageplusone. This helps the query engine create optimized plans, and avoids performing the static computation for each incoming row. Following these guidelines, a more efficient version of the query would be:
MATCH (m:Message)-[:HAS_CREATOR]->(p:person) WHERE p.age <= $ageplusone RETURN m
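The client-side computation can be sketched as follows. This is an illustrative example: the ParamPrecompute class and buildParams helper are hypothetical names, and the session.run call is shown only as a comment because executing it requires a live Neptune endpoint and the Bolt driver.

```java
import java.util.Map;

// Hypothetical helper illustrating client-side static computation: the
// arithmetic is resolved once here, and only the final value is sent as
// the $ageplusone query parameter.
class ParamPrecompute {
    public static Map<String, Object> buildParams(int age) {
        int agePlusOne = age + 1; // computed once on the client, not per row in the engine
        return Map.of("ageplusone", agePlusOne);
    }

    public static void main(String[] args) {
        Map<String, Object> params = buildParams(30);
        // With a Bolt session (requires a live Neptune endpoint):
        // session.run("MATCH (m:Message)-[:HAS_CREATOR]->(p:person) "
        //     + "WHERE p.age <= $ageplusone RETURN m", params);
        System.out.println(params);
    }
}
```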
Batch inputs using UNWIND instead of individual statements
Whenever the same query needs to be executed for different inputs, instead of executing one query per input, it would be much more performant to run a query for a batch of inputs.
If you want to merge on a set of nodes, one option is to run a merge query per input:
MERGE (n:Person {`~id`: $id}) SET n.name = $name, n.age = $age, n.employer = $employer
With parameters:
params = {id: '1', name: 'john', age: 25, employer: 'Amazon'}
The above query needs to be executed for every input. While this approach works, it may require many queries to be executed for a large set of inputs. In this scenario, batching may help reduce the number of queries executed on the server, as well as improve the overall throughput.
Use the following pattern:
UNWIND $persons as person MERGE (n:Person {`~id`: person.id}) SET n += person
With parameters:
params = {persons: [{id: '1', name: 'john', age: 25, employer: 'Amazon'}, {id: '2', name: 'jack', age: 28, employer: 'Amazon'}, {id: '3', name: 'alice', age: 24, employer: 'Amazon'}...]}
Experimentation with different batch sizes is recommended to determine what works best for your workload.
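To experiment with batch sizes, you can split the full input list into fixed-size chunks on the client and send each chunk as the $persons parameter of one UNWIND query. The BatchUtil class below is an illustrative sketch, not part of any Neptune or Bolt API:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative helper: split a large input list into fixed-size batches so
// that each batch can be sent as the $persons parameter of a single
// "UNWIND $persons as person MERGE ..." query.
class BatchUtil {
    public static <T> List<List<T>> partition(List<T> items, int batchSize) {
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < items.size(); i += batchSize) {
            // subList is a view over the original list; copy it so each
            // batch is an independent list.
            batches.add(new ArrayList<>(items.subList(i, Math.min(i + batchSize, items.size()))));
        }
        return batches;
    }
}
```

Each batch then becomes one query execution, for example 500 persons per UNWIND call instead of 500 individual MERGE statements.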
Prefer using custom IDs for nodes and relationships
Neptune allows users to explicitly assign IDs to nodes and relationships. The ID must be globally unique in the dataset and deterministic to be useful. A deterministic ID can be used as a lookup or filtering mechanism just like properties; however, using an ID is much more optimized from a query execution perspective than using properties. There are several benefits to using custom IDs:
- Properties can be null for an existing entity, but the ID must exist. This allows the query engine to use an optimized join during execution.
- When concurrent mutation queries are executed, the chances of concurrent modification exceptions (CMEs) are reduced significantly when IDs are used to access nodes, because fewer locks are taken on IDs than on properties, due to their enforced uniqueness.
- Using IDs avoids the chance of creating duplicate data, as Neptune enforces uniqueness on IDs, unlike properties.
The following query example uses a custom ID:
Note: The property ~id is used to specify the ID, whereas id is just stored as any other property.
CREATE (n:Person {`~id`: '1', name: 'alice'})
Without using a custom ID:
CREATE (n:Person {id: '1', name: 'alice'})
If using the latter mechanism, there is no uniqueness enforcement and you could later execute the query:
CREATE (n:Person {id: '1', name: 'john'})
This creates a second node with id=1, named john. In this scenario, you would now have two nodes with id=1, each having a different name (alice and john).
Avoid doing ~id computations in the query
When using custom IDs in the queries, always perform static computations outside the queries and provide these values in the parameters. When static values are provided, the engine is better able to optimize lookups and avoid scanning and filtering these values.
If you want to create edges between nodes that already exist in the database, one option could be:
UNWIND $sections as section MATCH (s:Section {`~id`: 'Sec-' + section.id}) MERGE (s)-[:IS_PART_OF]->(g:Group {`~id`: 'g1'})
With parameters:
parameters={sections: [{id: '1'}, {id: '2'}]}
In the query above, the id of the section is being computed in the query. Since the computation is dynamic, the engine cannot statically inline the ids, and ends up scanning all Section nodes. The engine then performs post-filtering for the required nodes. This can be costly if there are many Section nodes in the database.
A better way to achieve this is to prepend Sec- to the ids being passed into the database:
UNWIND $sections as section MATCH (s:Section {`~id`: section.id}) MERGE (s)-[:IS_PART_OF]->(g:Group {`~id`: 'g1'})
With parameters:
parameters={sections: [{id: 'Sec-1'}, {id: 'Sec-2'}]}