Creating and interpreting a query plan

You can use the query plan to get information on the individual operations required to run a query. Before you work with a query plan, we recommend that you first understand how Amazon Redshift processes queries and creates query plans. For more information, see Query planning and execution workflow.

To create a query plan, run the EXPLAIN command followed by the actual query text. The query plan gives you the following information:

  • What operations the execution engine performs, reading the results from bottom to top.

  • What type of step each operation performs.

  • Which tables and columns are used in each operation.

  • How much data is processed in each operation, in terms of number of rows and data width in bytes.

  • The relative cost of the operation. Cost is a measure that compares the relative execution times of the steps within a plan. Cost does not provide any precise information about actual execution times or memory consumption, nor does it provide a meaningful comparison between execution plans. It does give you an indication of which operations in a query are consuming the most resources.

The EXPLAIN command doesn't actually run the query. It only shows the plan that Amazon Redshift runs if the query is run under current operating conditions. If you change the schema or data for a table and run ANALYZE again to update the statistical metadata, the query plan might be different.
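
For example, after loading new data into the EVENT table (from the TICKIT sample database, which the examples in this topic use), you can refresh the statistics and request a new plan. This is a minimal sketch:

-- Refresh the statistical metadata that the planner uses for its estimates.
analyze event;

-- Rerun EXPLAIN; the row and width estimates now reflect the updated statistics.
explain select eventname, count(*) from event group by eventname;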

The query plan output by EXPLAIN is a simplified, high-level view of query execution. It doesn't illustrate the details of parallel query processing. To see detailed information, run the query itself, and then get query summary information from the SVL_QUERY_SUMMARY or SVL_QUERY_REPORT view. For more information about using these views, see Analyzing the query summary.
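
For example, after running a query (not just EXPLAIN), you can look up its per-step metrics in the same session. This is a minimal sketch; it assumes the query you want to inspect is the most recent query run in the current session, so that PG_LAST_QUERY_ID() returns its query ID:

-- Summarize each step of the last query run in this session.
select query, seg, step, label, rows, bytes, is_diskbased
from svl_query_summary
where query = pg_last_query_id()
order by seg, step;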

The following example shows the EXPLAIN output for a simple GROUP BY query on the EVENT table:

explain select eventname, count(*) from event group by eventname;

                           QUERY PLAN
-------------------------------------------------------------------
XN HashAggregate  (cost=131.97..133.41 rows=576 width=17)
  ->  XN Seq Scan on event  (cost=0.00..87.98 rows=8798 width=17)

EXPLAIN returns the following metrics for each operation:

Cost

A relative value that is useful for comparing operations within a plan. Cost consists of two decimal values separated by two periods, for example cost=131.97..133.41. The first value, in this case 131.97, provides the relative cost of returning the first row for this operation. The second value, in this case 133.41, provides the relative cost of completing the operation. The costs in the query plan are cumulative as you read up the plan, so the HashAggregate cost in this example (131.97..133.41) includes the cost of the Seq Scan below it (0.00..87.98).

Rows

The estimated number of rows to return. In this example, the scan is expected to return 8798 rows. The HashAggregate operator on its own is expected to return 576 rows (after duplicate event names are discarded from the result set).

Note

The rows estimate is based on the available statistics generated by the ANALYZE command. If ANALYZE has not been run recently, the estimate is less reliable.
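
One way to gauge whether statistics are stale is the STATS_OFF column of the SVV_TABLE_INFO system view, which reports how far a table's statistics have drifted (0 indicates current statistics). A minimal sketch, assuming the EVENT table:

-- A high stats_off value suggests running ANALYZE before relying on the row estimates.
select "table", tbl_rows, stats_off
from svv_table_info
where "table" = 'event';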

Width

The estimated width of the average row, in bytes. In this example, the average row is expected to be 17 bytes wide.

EXPLAIN operators

This section briefly describes the operators that you see most often in the EXPLAIN output. For a complete list of operators, see EXPLAIN in the SQL Commands section.

Sequential scan operator

The sequential scan operator (Seq Scan) indicates a table scan. Seq Scan scans each column in the table sequentially from beginning to end and evaluates query constraints (in the WHERE clause) for every row.

Join operators

Amazon Redshift selects join operators based on the physical design of the tables being joined, the location of the data required for the join, and the specific requirements of the query itself.

  • Nested Loop

    The least optimal join, a nested loop is used mainly for cross-joins (Cartesian products) and some inequality joins.

  • Hash Join and Hash

    Typically faster than a nested loop join, a hash join and hash are used for inner joins and left and right outer joins. These operators are used when joining tables where the join columns are not both distribution keys and sort keys. The hash operator creates the hash table for the inner table in the join; the hash join operator reads the outer table, hashes the joining column, and finds matches in the inner hash table.

  • Merge Join

    Typically the fastest join, a merge join is used for inner joins and outer joins. The merge join is not used for full joins. This operator is used when joining tables where the join columns are both distribution keys and sort keys, and when less than 20 percent of the rows in the joining tables are unsorted. It reads two sorted tables in order and finds the matching rows. To view the percent of unsorted rows, query the SVV_TABLE_INFO system table, as shown in the sketch after this list.

  • Spatial Join

    Typically a fast join based on proximity of spatial data, used for GEOMETRY and GEOGRAPHY data types.
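
The following sketch shows one way to check the unsorted percentage mentioned in the merge join description, assuming the SALES and LISTING tables from the TICKIT sample database:

-- unsorted is the approximate percent of rows that are not in sort key order.
-- A merge join is typically considered only when this value is low.
select "table", unsorted, tbl_rows
from svv_table_info
where "table" in ('sales', 'listing');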

Aggregate operators

The query plan uses the following operators in queries that involve aggregate functions and GROUP BY operations.

  • Aggregate

    Operator for scalar aggregate functions such as AVG and SUM.

  • HashAggregate

    Operator for unsorted grouped aggregate functions.

  • GroupAggregate

    Operator for sorted grouped aggregate functions.
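
For example, the following two statements (assuming the SALES table from the TICKIT sample database) illustrate the difference: a scalar aggregate with no GROUP BY typically produces an Aggregate operator, while a grouped aggregate on an unsorted column typically produces a HashAggregate operator. The exact plan depends on your data and sort keys:

-- Scalar aggregate: no GROUP BY clause.
explain select sum(pricepaid) from sales;

-- Grouped aggregate: grouping on a column that is not the sort key.
explain select eventid, sum(pricepaid) from sales group by eventid;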

Sort operators

The query plan uses the following operators when queries have to sort or merge result sets.

  • Sort

    Evaluates the ORDER BY clause and other sort operations, such as sorts required by UNION queries and joins, SELECT DISTINCT queries, and window functions.

  • Merge

    Produces final sorted results according to intermediate sorted results that derive from parallel operations.

UNION, INTERSECT, and EXCEPT operators

The query plan uses the following operators for queries that involve set operations with UNION, INTERSECT, and EXCEPT.

  • Subquery

    Used to run UNION queries.

  • Hash Intersect Distinct

    Used to run INTERSECT queries.

  • SetOp Except

    Used to run EXCEPT (or MINUS) queries.
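
For example, a UNION query such as the following (assuming the EVENT and CATEGORY tables from the TICKIT sample database) typically produces a plan that includes Subquery operators for each branch and, because UNION removes duplicates, a Unique operator above them. The exact plan and costs depend on your data:

explain
select eventname from event
union
select catname from category;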

Other operators

The following operators also appear frequently in EXPLAIN output for routine queries.

  • Unique

    Removes duplicates for SELECT DISTINCT queries and UNION queries.

  • Limit

    Processes the LIMIT clause.

  • Window

    Runs window functions.

  • Result

    Runs scalar functions that do not involve any table access.

  • Subplan

    Used for certain subqueries.

  • Network

    Sends intermediate results to the leader node for further processing.

  • Materialize

    Saves rows for input to nested loop joins and some merge joins.

Joins in EXPLAIN

The query optimizer uses different join types to retrieve table data, depending on the structure of the query and the underlying tables. The EXPLAIN output references the join type, the tables used, and the way the table data is distributed across the cluster to describe how the query is processed.

Join type examples

The following examples show the different join types that the query optimizer can use. The join type used in the query plan depends on the physical design of the tables involved.

Example: Hash join two tables

The following query joins EVENT and CATEGORY on the CATID column. CATID is the distribution and sort key for CATEGORY but not for EVENT. A hash join is performed with EVENT as the outer table and CATEGORY as the inner table. Because CATEGORY is the smaller table, the planner broadcasts a copy of it to the compute nodes during query processing by using DS_BCAST_INNER. The join cost in this example accounts for most of the cumulative cost of the plan.

explain select * from category, event where category.catid=event.catid;

                               QUERY PLAN
-------------------------------------------------------------------------
XN Hash Join DS_BCAST_INNER  (cost=0.14..6600286.07 rows=8798 width=84)
  Hash Cond: ("outer".catid = "inner".catid)
  ->  XN Seq Scan on event  (cost=0.00..87.98 rows=8798 width=35)
  ->  XN Hash  (cost=0.11..0.11 rows=11 width=49)
        ->  XN Seq Scan on category  (cost=0.00..0.11 rows=11 width=49)

Note

Aligned indents for operators in the EXPLAIN output sometimes indicate that those operations do not depend on each other and can start in parallel. In the preceding example, although the scan on the EVENT table and the hash operation are aligned, the EVENT scan must wait until the hash operation has fully completed.

Example: Merge join two tables

The following query also uses SELECT *, but it joins SALES and LISTING on the LISTID column, where LISTID has been set as both the distribution and sort key for both tables. A merge join is chosen, and no redistribution of data is required for the join (DS_DIST_NONE).

explain select * from sales, listing where sales.listid = listing.listid;

                                 QUERY PLAN
-----------------------------------------------------------------------------
XN Merge Join DS_DIST_NONE  (cost=0.00..6285.93 rows=172456 width=97)
  Merge Cond: ("outer".listid = "inner".listid)
  ->  XN Seq Scan on listing  (cost=0.00..1924.97 rows=192497 width=44)
  ->  XN Seq Scan on sales  (cost=0.00..1724.56 rows=172456 width=53)

The following example demonstrates the different types of joins within the same query. As in the previous example, SALES and LISTING are merge joined, but the third table, EVENT, must be hash joined with the results of the merge join. Again, the hash join incurs a broadcast cost.

explain select * from sales, listing, event
where sales.listid = listing.listid and sales.eventid = event.eventid;

                                 QUERY PLAN
----------------------------------------------------------------------------
XN Hash Join DS_BCAST_INNER  (cost=109.98..3871130276.17 rows=172456 width=132)
  Hash Cond: ("outer".eventid = "inner".eventid)
  ->  XN Merge Join DS_DIST_NONE  (cost=0.00..6285.93 rows=172456 width=97)
        Merge Cond: ("outer".listid = "inner".listid)
        ->  XN Seq Scan on listing  (cost=0.00..1924.97 rows=192497 width=44)
        ->  XN Seq Scan on sales  (cost=0.00..1724.56 rows=172456 width=53)
  ->  XN Hash  (cost=87.98..87.98 rows=8798 width=35)
        ->  XN Seq Scan on event  (cost=0.00..87.98 rows=8798 width=35)

Example: Join, aggregate, and sort

The following query runs a hash join of the SALES and EVENT tables, followed by aggregation and sort operations to account for the grouped SUM function and the ORDER BY clause. The initial sort operator runs in parallel on the compute nodes. Then the Network operator sends the results to the leader node, where the Merge operator produces the final sorted results.

explain select eventname, sum(pricepaid) from sales, event
where sales.eventid=event.eventid group by eventname order by 2 desc;

                                      QUERY PLAN
---------------------------------------------------------------------------------
XN Merge  (cost=1002815366604.92..1002815366606.36 rows=576 width=27)
  Merge Key: sum(sales.pricepaid)
  ->  XN Network  (cost=1002815366604.92..1002815366606.36 rows=576 width=27)
        Send to leader
        ->  XN Sort  (cost=1002815366604.92..1002815366606.36 rows=576 width=27)
              Sort Key: sum(sales.pricepaid)
              ->  XN HashAggregate  (cost=2815366577.07..2815366578.51 rows=576 width=27)
                    ->  XN Hash Join DS_BCAST_INNER  (cost=109.98..2815365714.80 rows=172456 width=27)
                          Hash Cond: ("outer".eventid = "inner".eventid)
                          ->  XN Seq Scan on sales  (cost=0.00..1724.56 rows=172456 width=14)
                          ->  XN Hash  (cost=87.98..87.98 rows=8798 width=21)
                                ->  XN Seq Scan on event  (cost=0.00..87.98 rows=8798 width=21)

Data redistribution

The EXPLAIN output for joins also specifies a method for how data is moved around a cluster to facilitate the join. This data movement can be either a broadcast or a redistribution. In a broadcast, the data values from one side of a join are copied from each compute node to every other compute node, so that every compute node ends up with a complete copy of the data. In a redistribution, participating data values are sent from their current slice to a new slice (possibly on a different node). Data is typically redistributed to match the distribution key of the other table participating in the join if that distribution key is one of the joining columns. If neither of the tables has distribution keys on one of the joining columns, either both tables are distributed or the inner table is broadcast to every node.

The EXPLAIN output also references inner and outer tables. The inner table is scanned first, and appears nearer the bottom of the query plan. The inner table is the table that is probed for matches. It is usually held in memory, is usually the source table for hashing, and if possible, is the smaller table of the two being joined. The outer table is the source of rows to match against the inner table. It is usually read from disk. The query optimizer chooses the inner and outer table based on database statistics from the latest run of the ANALYZE command. The order of tables in the FROM clause of a query doesn't determine which table is inner and which is outer.
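
To anticipate how data will be moved for a join, it can help to check how the joined tables are distributed before reading the plan. A minimal sketch using SVV_TABLE_INFO, assuming the TICKIT sample tables:

-- diststyle shows KEY(column), EVEN, or ALL for each table.
-- Joins on the distribution key of both tables can avoid data movement (DS_DIST_NONE).
select "table", diststyle
from svv_table_info
where "table" in ('sales', 'listing', 'event', 'category');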

Use the following attributes in query plans to identify how data is moved to facilitate a query:

  • DS_BCAST_INNER

    A copy of the entire inner table is broadcast to all compute nodes.

  • DS_DIST_ALL_NONE

    No redistribution is required, because the inner table has already been distributed to every node using DISTSTYLE ALL.

  • DS_DIST_NONE

    No tables are redistributed. Collocated joins are possible because corresponding slices are joined without moving data between nodes.

  • DS_DIST_INNER

    The inner table is redistributed.

  • DS_DIST_OUTER

    The outer table is redistributed.

  • DS_DIST_ALL_INNER

    The entire inner table is redistributed to a single slice because the outer table uses DISTSTYLE ALL.

  • DS_DIST_BOTH

    Both tables are redistributed.
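
For example, one way to avoid a broadcast of a small dimension table at query time is to store it with DISTSTYLE ALL, so that every node already holds a full copy. The following sketch illustrates the idea; the table name category_all is illustrative, and the exact plan depends on your data. A join in which the DISTSTYLE ALL table is the inner table should show DS_DIST_ALL_NONE instead of DS_BCAST_INNER:

-- Create a copy of the small CATEGORY table with a full copy on every node.
create table category_all diststyle all as
select * from category;

-- Request the plan for a join against the copy; no broadcast step should be needed.
explain
select * from category_all, event
where category_all.catid = event.catid;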