SELECT
Retrieves rows of data from zero or more tables.
Note
This topic provides summary information for reference. Comprehensive information about
using SELECT and the SQL language is beyond the scope of this
documentation. For information about using SQL that is specific to Athena, see Considerations and limitations for SQL queries
in Amazon Athena and
Run SQL queries in Amazon Athena. For
an example of creating a database, creating a table, and running a SELECT
query on the table in Athena, see Get started.
Synopsis
[ WITH with_query [, ...] ]
SELECT [ ALL | DISTINCT ] select_expression [, ...]
[ FROM from_item [, ...] ]
[ WHERE condition ]
[ GROUP BY [ ALL | DISTINCT ] grouping_element [, ...] ]
[ HAVING condition ]
[ { UNION | INTERSECT | EXCEPT } [ ALL | DISTINCT ] select ]
[ ORDER BY expression [ ASC | DESC ] [ NULLS FIRST | NULLS LAST] [, ...] ]
[ OFFSET count [ ROW | ROWS ] ]
[ LIMIT [ count | ALL ] ]
Note
Reserved words in SQL SELECT statements must be enclosed in double quotes. For more information, see Reserved keywords to escape in SQL SELECT statements.
Parameters
- [ WITH with_query [, ....] ]
-
You can use
WITHto flatten nested queries, or to simplify subqueries.Using the
WITHclause to create recursive queries is supported starting in Athena engine version 3. The maximum recursion depth is 10.The
WITHclause precedes theSELECTlist in a query and defines one or more subqueries for use within theSELECTquery.Each subquery defines a temporary table, similar to a view definition, which you can reference in the
FROMclause. The tables are used only when the query runs.with_querysyntax is:subquery_table_name [ ( column_name [, ...] ) ] AS (subquery)Where:
-
subquery_table_nameis a unique name for a temporary table that defines the results of theWITHclause subquery. Eachsubquerymust have a table name that can be referenced in theFROMclause. -
column_name [, ...]is an optional list of output column names. The number of column names must be equal to or less than the number of columns defined bysubquery. -
subqueryis any query statement.
-
- [ ALL | DISTINCT ] select_expression
-
select_expressiondetermines the rows to be selected. Aselect_expressioncan use one of the following formats:expression [ [ AS ] column_alias ] [, ...]row_expression.* [ AS ( column_alias [, ...] ) ]relation.**-
The
expression [ [ AS ] column_alias ]syntax specifies an output column. The optional[AS] column_aliassyntax specifies a custom heading name to be used for the column in the output. -
For
row_expression.* [ AS ( column_alias [, ...] ) ],row_expressionis an arbitrary expression of data typeROW. The fields of the row define the output columns to be included in the result. -
For
relation.*, the columns ofrelationare included in the result. This syntax does not permit the use of column aliases. -
The asterisk
*specifies that all columns be included in the result set. -
In the result set, the order of columns is the same as the order of their specification by the select expression. If a select expression returns multiple columns, the column order follows the order used in the source relation or row type expression.
-
When column aliases are specified, the aliases override preexisting column or row field names. If the select expression does not have column names, zero-indexed anonymous column names (
_col0,_col1,_col2, ...) are displayed in the output. -
ALLis the default. UsingALLis treated the same as if it were omitted; all rows for all columns are selected and duplicates are kept. -
Use
DISTINCTto return only distinct values when a column contains duplicate values.
-
- FROM from_item [, ...]
-
Indicates the input to the query, where
from_itemcan be a view, a join construct, or a subquery as described below.The
from_itemcan be either:-
table_name [ [ AS ] alias [ (column_alias [, ...]) ] ]Where
table_nameis the name of the target table from which to select rows,aliasis the name to give the output of theSELECTstatement, andcolumn_aliasdefines the columns for thealiasspecified.
-OR-
-
join_type from_item [ ON join_condition | USING ( join_column [, ...] ) ]Where
join_typeis one of:-
[ INNER ] JOIN -
LEFT [ OUTER ] JOIN -
RIGHT [ OUTER ] JOIN -
FULL [ OUTER ] JOIN -
CROSS JOIN -
ON join_condition | USING (join_column [, ...])Where usingjoin_conditionallows you to specify column names for join keys in multiple tables, and usingjoin_columnrequiresjoin_columnto exist in both tables.
-
-
- [ WHERE condition ]
-
Filters results according to the
conditionyou specify, whereconditiongenerally has the following syntax.column_nameoperatorvalue[[[AND | OR]column_nameoperatorvalue] ...]The
operatorcan be one of the comparators=,>,<,>=,<=,<>,!=.The following subquery expressions can also be used in the
WHEREclause.-
[NOT] BETWEEN– Specifies a range between two integers, as in the following example. If the column data type isinteger_AANDinteger_Bvarchar, the column must be cast to integer first.SELECT DISTINCT processid FROM "webdata"."impressions" WHERE cast(processid as int) BETWEEN 1500 and 1800 ORDER BY processid -
[NOT] LIKE– Searches for the pattern specified. Use the percent sign (value%) as a wildcard character, as in the following example.SELECT * FROM "webdata"."impressions" WHERE referrer LIKE '%.org' -
[NOT] IN (– Specifies a list of possible values for a column, as in the following example.value[,value[, ...])SELECT * FROM "webdata"."impressions" WHERE referrer IN ('example.com','example.net','example.org')
-
- [ GROUP BY [ ALL | DISTINCT ] grouping_expressions [, ...] ]
-
Divides the output of the
SELECTstatement into rows with matching values.ALLandDISTINCTdetermine whether duplicate grouping sets each produce distinct output rows. If omitted,ALLis assumed.grouping_expressionsallow you to perform complex grouping operations. You can use complex grouping operations to perform analysis that requires aggregation on multiple sets of columns in a single query.The
grouping_expressionselement can be any function, such asSUM,AVG, orCOUNT, performed on input columns.GROUP BYexpressions can group output by input column names that don't appear in the output of theSELECTstatement.All output expressions must be either aggregate functions or columns present in the
GROUP BYclause.You can use a single query to perform analysis that requires aggregating multiple column sets.
Athena supports complex aggregations using
GROUPING SETS,CUBEandROLLUP.GROUP BY GROUPING SETSspecifies multiple lists of columns to group on.GROUP BY CUBEgenerates all possible grouping sets for a given set of columns.GROUP BY ROLLUPgenerates all possible subtotals for a given set of columns. Complex grouping operations do not support grouping on expressions composed of input columns. Only column names are allowed.You can often use
UNION ALLto achieve the same results as theseGROUP BYoperations, but queries that useGROUP BYhave the advantage of reading the data one time, whereasUNION ALLreads the underlying data three times and may produce inconsistent results when the data source is subject to change. - [ HAVING condition ]
-
Used with aggregate functions and the
GROUP BYclause. Controls which groups are selected, eliminating groups that don't satisfycondition. This filtering occurs after groups and aggregates are computed. - [ { UNION | INTERSECT | EXCEPT } [ ALL | DISTINCT ] union_query] ]
-
UNION,INTERSECT, andEXCEPTcombine the results of more than oneSELECTstatement into a single query.ALLorDISTINCTcontrol the uniqueness of the rows included in the final result set.UNIONcombines the rows resulting from the first query with the rows resulting from the second query. To eliminate duplicates,UNIONbuilds a hash table, which consumes memory. For better performance, consider usingUNION ALLif your query does not require the elimination of duplicates. MultipleUNIONclauses are processed left to right unless you use parentheses to explicitly define the order of processing.INTERSECTreturns only the rows that are present in the results of both the first and the second queries.EXCEPTreturns the rows from the results of the first query, excluding the rows found by the second query.ALLcauses all rows to be included, even if the rows are identical.DISTINCTcauses only unique rows to be included in the combined result set. - [ ORDER BY expression [ ASC | DESC ] [ NULLS FIRST | NULLS LAST] [, ...] ]
-
Sorts a result set by one or more output
expression.When the clause contains multiple expressions, the result set is sorted according to the first
expression. Then the secondexpressionis applied to rows that have matching values from the first expression, and so on.Each
expressionmay specify output columns fromSELECTor an ordinal number for an output column by position, starting at one.ORDER BYis evaluated as the last step after anyGROUP BYorHAVINGclause.ASCandDESCdetermine whether results are sorted in ascending or descending order. The default sorting order is ascending (ASC). The default null ordering isNULLS LAST, regardless of ascending or descending sort order. - [ OFFSET count [ ROW | ROWS ] ]
-
Use the
OFFSETclause to discard a number of leading rows from the result set. If theORDER BYclause is present, theOFFSETclause is evaluated over a sorted result set, and the set remains sorted after the skipped rows are discarded. If the query has noORDER BYclause, it is arbitrary which rows are discarded. If the count specified byOFFSETequals or exceeds the size of the result set, the final result is empty. - LIMIT [ count | ALL ]
-
Restricts the number of rows in the result set to
count.LIMIT ALLis the same as omitting theLIMITclause. If the query has noORDER BYclause, the results are arbitrary. - TABLESAMPLE [ BERNOULLI | SYSTEM ] (percentage)
-
Optional operator to select rows from a table based on a sampling method.
BERNOULLIselects each row to be in the table sample with a probability ofpercentage. All physical blocks of the table are scanned, and certain rows are skipped based on a comparison between the samplepercentageand a random value calculated at runtime.With
SYSTEM, the table is divided into logical segments of data, and the table is sampled at this granularity.Either all rows from a particular segment are selected, or the segment is skipped based on a comparison between the sample
percentageand a random value calculated at runtime.SYSTEMsampling is dependent on the connector. This method does not guarantee independent sampling probabilities. - [ UNNEST (array_or_map) [WITH ORDINALITY] ]
-
Expands an array or map into a relation. Arrays are expanded into a single column. Maps are expanded into two columns (key, value).
You can use
UNNESTwith multiple arguments, which are expanded into multiple columns with as many rows as the highest cardinality argument.Other columns are padded with nulls.
The
WITH ORDINALITYclause adds an ordinality column to the end.UNNESTis usually used with aJOINand can reference columns from relations on the left side of theJOIN.
Getting the file locations for source data in Amazon S3
To see the Amazon S3 file location for the data in a table row, you can use
"$path" in a SELECT query, as in the following
example:
SELECT "$path" FROM "my_database"."my_table" WHERE year=2019;
This returns a result like the following:
s3://amzn-s3-demo-bucket/datasets_mytable/year=2019/data_file1.json
To return a sorted, unique list of the S3 filename paths for the data in a table, you
can use SELECT DISTINCT and ORDER BY, as in the following
example.
SELECT DISTINCT "$path" AS data_source_file FROM sampledb.elb_logs ORDER By data_source_file ASC
To return only the filenames without the path, you can pass "$path" as a
parameter to an regexp_extract function, as in the following
example.
SELECT DISTINCT regexp_extract("$path", '[^/]+$') AS data_source_file FROM sampledb.elb_logs ORDER By data_source_file ASC
To return the data from a specific file, specify the file in the WHERE
clause, as in the following example.
SELECT *,"$path" FROM my_database.my_table WHERE "$path" = 's3://amzn-s3-demo-bucket/my_table/my_partition/file-01.csv'
For more information and examples, see the Knowledge Center article How can
I see the Amazon S3 source file for a row in an Athena table?
Note
In Athena, the Hive or Iceberg hidden metadata columns $bucket,
$file_modified_time, $file_size, and
$partition are not supported for views.
Escaping single quotes
To escape a single quote, precede it with another single quote, as in the following example. Do not confuse this with a double quote.
Select 'O''Reilly'
Results
O'Reilly
Additional resources
For more information about using SELECT statements in Athena, see the
following resources.
| For information about this | See this |
|---|---|
| Running queries in Athena | Run SQL queries in Amazon Athena |
Using SELECT to create a table |
Create a table from query results (CTAS) |
Inserting data from a SELECT query into another
table |
INSERT INTO |
Using built-in functions in SELECT statements |
Functions in Amazon Athena |
Using user defined functions in SELECT
statements |
Query with user defined functions |
| Querying Data Catalog metadata | Query the Amazon Glue Data Catalog |