Neptune full-text search parameters
Amazon Neptune uses the following parameters for specifying full-text OpenSearch queries in both Gremlin and SPARQL:
queryType – (Required) The type of OpenSearch query. (For a list of query types, see the OpenSearch documentation.) Neptune supports the following OpenSearch query types:
simple_query_string – Returns documents based on a provided query string, using a parser with a limited but fault-tolerant Lucene syntax. This is the default query type. The query uses a simple syntax to parse and split the provided query string into terms based on special operators, and then analyzes each term independently before returning matching documents.
While its syntax is more limited than that of the query_string query, the simple_query_string query does not return errors for invalid syntax. Instead, it ignores any invalid parts of the query string.
match – The match query is the standard query for performing a full-text search, including options for fuzzy matching.
prefix – Returns documents that contain a specific prefix in a provided field.
fuzzy – Returns documents that contain terms similar to the search term, as measured by a Levenshtein edit distance. An edit distance is the number of one-character changes needed to turn one term into another. These changes can include:
Changing a character (box to fox).
Removing a character (black to lack).
Inserting a character (sic to sick).
Transposing two adjacent characters (act to cat).
To find similar terms, the fuzzy query creates a set of all possible variations and expansions of the search term within a specified edit distance and then returns exact matches for each of those variants.
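To make the edit-distance idea concrete, here is a minimal Python sketch (illustrative only, not OpenSearch code) that counts the four kinds of change listed above. Because it treats swapping two adjacent characters as a single edit, it computes the optimal string alignment variant of Damerau-Levenshtein distance rather than plain Levenshtein distance. The function name `edit_distance` is hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Optimal string alignment distance: changing, removing, inserting,
    and transposing adjacent characters each cost one edit."""
    rows, cols = len(a) + 1, len(b) + 1
    # d[i][j] = number of edits needed to turn a[:i] into b[:j]
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # remove every character of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert every character of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # remove a character
                d[i][j - 1] + 1,         # insert a character
                d[i - 1][j - 1] + cost,  # change a character (or keep it)
            )
            # Transpose two adjacent characters, e.g. "act" -> "cat".
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


# Each example pair from the list above is one edit apart.
for src, dst in [("box", "fox"), ("black", "lack"), ("sic", "sick"), ("act", "cat")]:
    print(src, "->", dst, edit_distance(src, dst))
```

A fuzzy query with a maximum edit distance of 1 would therefore treat each of these pairs as similar terms.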
term – Returns documents that contain an exact match of a specified term in one of the specified fields. You can use the term query to find documents based on a precise value such as a price, a product ID, or a username.
Warning
Avoid using the term query for text fields. By default, OpenSearch changes the values of text fields as part of its analysis, which can make finding exact matches for text field values difficult.
To search text field values, use the match query instead.
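The following Python sketch illustrates why this warning matters. It uses a deliberately simplified stand-in for OpenSearch text analysis (lowercasing and splitting on non-alphanumeric characters, an assumption rather than the real standard analyzer). An exact term lookup against the analyzed tokens misses a value that a match-style lookup finds, because the match lookup analyzes the query text in the same way. The helper names are hypothetical.

```python
import re


def analyze(text: str) -> list[str]:
    """Simplified analyzer: lowercase, then split on runs of non-alphanumerics."""
    return [tok for tok in re.split(r"[^0-9a-z]+", text.lower()) if tok]


def term_matches(field_value: str, term: str) -> bool:
    # A term query compares the raw, unanalyzed term with the indexed tokens.
    return term in analyze(field_value)


def match_matches(field_value: str, query: str) -> bool:
    # A match query analyzes the query text too; with its default OR operator,
    # any shared token is enough to match.
    return bool(set(analyze(query)) & set(analyze(field_value)))


stored = "Quick Brown Fox"
print(term_matches(stored, "Quick Brown Fox"))   # False: no indexed token equals it
print(match_matches(stored, "Quick Brown Fox"))  # True
```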
query_string – Returns documents based on a provided query string, using a parser with a strict syntax (Lucene syntax). This query uses a syntax to parse and split the provided query string based on operators, such as AND or NOT. The query then analyzes each resulting piece of text independently before returning matching documents.
You can use the query_string query to create a complex search that includes wildcard characters, searches across multiple fields, and more. While versatile, the query is strict and returns an error if the query string includes any invalid syntax.
Warning
Because it returns an error for any invalid syntax, we don’t recommend using the query_string query for search boxes.
If you don’t need to support a query syntax, consider using the match query. If you need the features of a query syntax, use the simple_query_string query, which is less strict.
field – The field in OpenSearch against which to run the search. This can be omitted only if the queryType allows it (as simple_query_string and query_string do), in which case the search runs against all fields. In Gremlin, the field is implicit. Multiple fields can be specified if the query type allows it, as simple_query_string and query_string do.
query – (Required) The query to run against OpenSearch. The contents of this parameter can vary according to the queryType, because different query types accept different syntaxes (as Regexp does, for example). In Gremlin, the query is implicit.
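As a sketch of how these parameters surface in Gremlin, the hypothetical Python helper below assembles a traversal string. It assumes the side-effect names used in Neptune's Gremlin full-text search examples (Neptune#fts.endpoint and Neptune#fts.queryType), with the field given as the property key passed to has() and the query given as the text after the Neptune#fts prefix in the has() value. The endpoint URL is a placeholder, and the helper only builds a string; it does not connect to Neptune or OpenSearch.

```python
SUPPORTED_QUERY_TYPES = {
    "simple_query_string", "match", "prefix", "fuzzy", "term", "query_string",
}


def gremlin_fts(endpoint: str, prop: str, query: str,
                query_type: str = "simple_query_string") -> str:
    """Build a Gremlin full-text search traversal as a string (sketch only)."""
    if query_type not in SUPPORTED_QUERY_TYPES:
        raise ValueError(f"unsupported queryType: {query_type}")

    def quote(s: str) -> str:
        # Escape backslashes and single quotes for a single-quoted Gremlin string.
        return "'" + s.replace("\\", "\\\\").replace("'", "\\'") + "'"

    return (
        f"g.withSideEffect('Neptune#fts.endpoint', {quote(endpoint)})"
        f".withSideEffect('Neptune#fts.queryType', {quote(query_type)})"
        # The field is the has() key; the query follows the 'Neptune#fts ' prefix.
        f".V().has({quote(prop)}, {quote('Neptune#fts ' + query)})"
    )


print(gremlin_fts("https://search.example.com", "name", "mich*", "query_string"))
```

Validating queryType against the supported list up front avoids sending a query type that Neptune does not support.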
maxResults – The maximum number of results to return. The default is the index.max_result_window OpenSearch setting, which itself defaults to 10,000. The maxResults parameter can specify any number lower than that.
Important
If you set maxResults to a value higher than the OpenSearch index.max_result_window value and try to retrieve more than index.max_result_window results, OpenSearch fails with a "Result window is too large" error. However, Neptune handles this gracefully without propagating the error. Keep this in mind if you are trying to fetch more than index.max_result_window results.
minScore – The minimum score a search result must have to be returned. See the OpenSearch relevance documentation for an explanation of result scoring.
batchSize – Neptune always fetches data in batches (the default batch size is 100). You can use this parameter to tune performance. The batch size cannot exceed the index.max_result_window OpenSearch setting, which defaults to 10,000.
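The limits above can be checked before a query is sent. The following Python sketch is a hypothetical pre-flight helper, not part of Neptune. It rejects a batchSize larger than index.max_result_window, flags a maxResults above the window, and, assuming each batch returns up to batchSize hits, estimates how many batch requests a fetch would take.

```python
import math

DEFAULT_MAX_RESULT_WINDOW = 10_000  # OpenSearch default for index.max_result_window
DEFAULT_BATCH_SIZE = 100            # Neptune's default batch size


def plan_fetch(max_results: int,
               batch_size: int = DEFAULT_BATCH_SIZE,
               max_result_window: int = DEFAULT_MAX_RESULT_WINDOW) -> int:
    """Return the number of batches needed to fetch max_results hits."""
    if max_results <= 0 or batch_size <= 0:
        raise ValueError("maxResults and batchSize must be positive")
    if batch_size > max_result_window:
        raise ValueError("batchSize cannot exceed index.max_result_window")
    if max_results > max_result_window:
        # OpenSearch would reject reads past the window
        # ("Result window is too large"), so flag it up front.
        raise ValueError("maxResults exceeds index.max_result_window")
    return math.ceil(max_results / batch_size)


print(plan_fetch(10_000))    # 100 batches at the default batch size
print(plan_fetch(250, 100))  # 3 batches: 100 + 100 + 50
```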
sortBy – An optional parameter that lets you sort the results returned by OpenSearch by one of the following:
A particular string field in the document – For example, in a SPARQL query, you could specify:
neptune-fts:config neptune-fts:sortBy foaf:name .
In a similar Gremlin query, you could specify:
.withSideEffect('Neptune#fts.sortBy', 'name')
A particular non-string field (long, double, etc.) in the document – When sorting on a non-string field, you need to append .value to the field name to differentiate it from a string field. For example, in a SPARQL query, you could specify:
neptune-fts:config neptune-fts:sortBy foaf:name.value .
In a similar Gremlin query, you could specify:
.withSideEffect('Neptune#fts.sortBy', 'name.value')
score – Sort by match score. This is the default sort key: if the sortOrder parameter is present but sortBy is not, the results are sorted by score in the order specified by sortOrder.
id – Sort by ID, which means the SPARQL subject URI or the Gremlin vertex or edge ID. For example, in a SPARQL query, you could specify:
neptune-fts:config neptune-fts:sortBy 'Neptune#fts.entity_id' .
In a similar Gremlin query, you could specify:
.withSideEffect('Neptune#fts.sortBy', 'Neptune#fts.entity_id')
label – Sort by label. For example, in a SPARQL query, you could specify:
neptune-fts:config neptune-fts:sortBy 'Neptune#fts.entity_type' .
In a similar Gremlin query, you could specify:
.withSideEffect('Neptune#fts.sortBy', 'Neptune#fts.entity_type')
doc_type – Sort by document type (that is, SPARQL or Gremlin). For example, in a SPARQL query, you could specify:
neptune-fts:config neptune-fts:sortBy 'Neptune#fts.document_type' .
In a similar Gremlin query, you could specify:
.withSideEffect('Neptune#fts.sortBy', 'Neptune#fts.document_type')
By default, OpenSearch results are not sorted and their order is non-deterministic, meaning that the same query may return items in a different order each time it is run. For this reason, if the result set is larger than max_result_window, a substantially different subset of the total results could be returned every time a query is run. Sorting makes the results of different runs more directly comparable.
If no sortOrder parameter accompanies sortBy, descending (DESC) order, from greatest to least, is used.
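Because the SPARQL and Gremlin forms above follow a fixed pattern, they can be generated from one description of the sort target. The Python sketch below is a hypothetical helper, not a Neptune API. It applies the .value suffix for non-string fields and quotes the built-in keys as shown in the examples. SortKey is an illustrative name, and score is left out because the text above gives no literal for it (it is the default anyway).

```python
from enum import Enum
from typing import Union


class SortKey(Enum):
    """Built-in sort keys from the examples above (score is omitted)."""
    ID = "Neptune#fts.entity_id"
    LABEL = "Neptune#fts.entity_type"
    DOC_TYPE = "Neptune#fts.document_type"


def _sort_value(target: Union[str, SortKey], numeric: bool) -> tuple[str, bool]:
    """Return (value, is_builtin); non-string fields get the '.value' suffix."""
    if isinstance(target, SortKey):
        return target.value, True
    return (f"{target}.value" if numeric else target), False


def sparql_sort_by(target: Union[str, SortKey], numeric: bool = False) -> str:
    value, builtin = _sort_value(target, numeric)
    # Built-in keys are quoted literals; fields are predicate names like foaf:name.
    shown = f"'{value}'" if builtin else value
    return f"neptune-fts:config neptune-fts:sortBy {shown} ."


def gremlin_sort_by(target: Union[str, SortKey], numeric: bool = False) -> str:
    value, _ = _sort_value(target, numeric)
    return f".withSideEffect('Neptune#fts.sortBy', '{value}')"


print(sparql_sort_by("foaf:name", numeric=True))
print(gremlin_sort_by(SortKey.LABEL))
```

Using an enum for the built-in keys keeps them distinct from a user property that happens to be called id or label.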
sortOrder – An optional parameter that lets you specify whether OpenSearch results are sorted from least to greatest or from greatest to least (the default):
ASC – Ascending order, from least to greatest.
DESC – Descending order, from greatest to least. This is the default value, used when the sortBy parameter is present but no sortOrder is specified.
If neither sortBy nor sortOrder is present, OpenSearch results are not sorted.
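The interaction between the two sorting parameters can be summarized in a few lines. This Python sketch is a hypothetical helper, not Neptune code. It encodes the rules stated above: no sorting when both parameters are absent, score order when only sortOrder is set, and DESC when only sortBy is set.

```python
from typing import Optional, Tuple


def effective_sort(sort_by: Optional[str] = None,
                   sort_order: Optional[str] = None) -> Optional[Tuple[str, str]]:
    """Resolve (sort key, order) from the optional sortBy/sortOrder parameters.

    Returns None when results are left unsorted (order is non-deterministic).
    """
    if sort_order is not None and sort_order not in ("ASC", "DESC"):
        raise ValueError("sortOrder must be 'ASC' or 'DESC'")
    if sort_by is None and sort_order is None:
        return None                          # neither present: unsorted
    if sort_by is None:
        return ("score", sort_order)         # sortOrder alone sorts by score
    return (sort_by, sort_order or "DESC")   # sortBy alone defaults to DESC


print(effective_sort())                  # None
print(effective_sort(sort_order="ASC"))  # ('score', 'ASC')
print(effective_sort("name"))            # ('name', 'DESC')
```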