Query performance improvement - Amazon Redshift
Services or capabilities described in Amazon Web Services documentation might vary by Region. To see the differences applicable to the China Regions, see Getting Started with Amazon Web Services in China (PDF).

Query performance improvement

Following are some common issues that affect Amazon Redshift query performance, with instructions on ways to diagnose and resolve them.

Table statistics missing or out of date

If table statistics are missing or out of date, you might see the following:

  • A warning message in EXPLAIN command results.

  • A missing statistics alert event in STL_ALERT_EVENT_LOG. For more information, see Reviewing query alerts.

To fix this issue, run ANALYZE.

Nested loop

If a nested loop is present, you might see a nested loop alert event in STL_ALERT_EVENT_LOG. You can also identify this type of event by running the query at Identifying queries with nested loops. For more information, see Reviewing query alerts.

To fix this, review your query for cross-joins and remove them if possible. Cross-joins are joins without a join condition that result in the Cartesian product of two tables. They are typically run as nested loop joins, which are the slowest of the possible join types.

Hash join

If a hash join is present, you might see the following:

To fix this issue, you can take a couple of approaches:

  • Rewrite the query to use a merge join if possible. You can do this by specifying join columns that are both distribution keys and sort keys.

  • If the HJOIN step in SVL_QUERY_SUMMARY has a very high value in the rows field compared to the rows value in the final RETURN step in the query, check whether you can rewrite the query to join on a unique column. When a query does not join on a unique column, such as a primary key, that increases the number of rows involved in the join.

Ghost rows or uncommitted rows

If ghost rows or uncommitted rows are present, you might see an alert event in STL_ALERT_EVENT_LOG that indicates excessive ghost rows. For more information, see Reviewing query alerts.

To fix this issue, you can take a couple of approaches:

  • Check the Loads tab of your Amazon Redshift console for active load operations on any of the query tables. If you see active load operations, wait for those to complete before taking action.

  • If there are no active load operations, run VACUUM on the query tables to remove deleted rows.

Unsorted or missorted rows

If unsorted or missorted rows are present, you might see a very selective filter alert event in STL_ALERT_EVENT_LOG. For more information, see Reviewing query alerts.

You can also check to see if any of the tables in your query have large unsorted areas by running the query in Identifying tables with data skew or unsorted rows.

To fix this issue, you can take a couple of approaches:

  • Run VACUUM on the query tables to re-sort the rows.

  • Review the sort keys on the query tables to see if any improvements can be made. Remember to weigh the performance of this query against the performance of other important queries and the system overall before making any changes. For more information, see Sort keys.

Suboptimal data distribution

If data distribution is suboptimal, you might see the following:

  • A serial execution, large broadcast, or large distribution alert event appears in STL_ALERT_EVENT_LOG. For more information, see Reviewing query alerts.

  • Slices are not processing approximately the same number of rows for a given step. For more information, see Using the SVL_QUERY_REPORT view.

  • Slices are not taking approximately the same amount of time for a given step. For more information, see Using the SVL_QUERY_REPORT view.

If none of the preceding is true, you can also see if any of the tables in your query have data skew by running the query in Identifying tables with data skew or unsorted rows.

To fix this issue, review the distribution styles for the tables in the query and see if any improvements can be made. Remember to weigh the performance of this query against the performance of other important queries and the system overall before making any changes. For more information, see Data distribution for query optimization.

Insufficient memory allocated to the query

If insufficient memory is allocated to your query, you might see a step in SVL_QUERY_SUMMARY that has an is_diskbased value of true. For more information, see Using the SVL_QUERY_SUMMARY view.

To fix this issue, allocate more memory to the query by temporarily increasing the number of query slots it uses. Workload Management (WLM) reserves slots in a query queue equivalent to the concurrency level set for the queue. For example, a queue with a concurrency level of 5 has 5 slots. Memory assigned to the queue is allocated equally to each slot. Assigning several slots to one query gives that query access to the memory for all of those slots. For more information on how to temporarily increase the slots for a query, see wlm_query_slot_count.

Suboptimal WHERE clause

If your WHERE clause causes excessive table scans, you might see a SCAN step in the segment with the highest maxtime value in SVL_QUERY_SUMMARY. For more information, see Using the SVL_QUERY_SUMMARY view.

To fix this issue, add a WHERE clause to the query based on the primary sort column of the largest table. This approach helps minimize scanning time. For more information, see Amazon Redshift best practices for designing tables.

Insufficiently restrictive predicate

If your query has an insufficiently restrictive predicate, you might see a SCAN step in the segment with the highest maxtime value in SVL_QUERY_SUMMARY that has a very high rows value compared to the rows value in the final RETURN step in the query. For more information, see Using the SVL_QUERY_SUMMARY view.

To fix this issue, try adding a predicate to the query or making the existing predicate more restrictive to narrow the output.

Very large result set

If your query returns a very large result set, consider rewriting the query to use UNLOAD to write the results to Amazon S3. This approach improves the performance of the RETURN step by taking advantage of parallel processing. For more information on checking for a very large result set, see Using the SVL_QUERY_SUMMARY view.

Large SELECT list

If your query has an unusually large SELECT list, you might see a bytes value that is high relative to the rows value for any step (in comparison to other steps) in SVL_QUERY_SUMMARY. This high bytes value can be an indicator that you are selecting a lot of columns. For more information, see Using the SVL_QUERY_SUMMARY view.

To fix this issue, review the columns you are selecting and see if any can be removed.