Query statements scan one or more tables or expressions and return the computed result rows.
SQL Syntax
query_statement: [ WITH with_query_name AS ( select ) [, ...] ]
query_expr query_expr: { select | ( query_expr ) | query_expr set_op query_expr } [ ORDER BY expression [{ ASC | DESC }] [, ...] ] [ LIMIT count [ OFFSET skip_rows ] ] select: SELECT [{ ALL | DISTINCT }] { [ expression. ]* [ EXCEPT ( column_name [, ...] ) ]
[ REPLACE ( expression [ AS ] column_name [, ...] ) ]
| expression [ [ AS ] alias ] } [, ...] [ FROM from_item [, ...] ] [ WHERE bool_expression ] [ GROUP BY expression [, ...] ] [ HAVING bool_expression ] [ WINDOW window_name AS ( window_definition ) [, ...] ] set_op: UNION { ALL | DISTINCT } from_item: { table_name [ [ AS ] alias ] | join | ( select ) [ [ AS ] alias ] | field_path | { UNNEST( array_expression ) | UNNEST( array_path ) | array_path } [ [ AS ] alias ] [ WITH OFFSET [ [ AS ] alias ] ] | with_query_name [ [ AS ] alias ] } join: from_item join_type JOIN from_item [ { ON bool_expression | USING ( join_column [, ...] ) } ] join_type: { [INNER] | CROSS | FULL [OUTER] | LEFT [OUTER] | RIGHT [OUTER] }
Notation:
- Square brackets "[ ]" indicate optional clauses.
- Parentheses "( )" indicate literal parentheses.
- The vertical bar "|" indicates a logical OR.
- Curly braces "{ }" enclose a set of options.
- A comma followed by an ellipsis within square brackets "[, ... ]" indicates that the preceding item can repeat in a comma-separated list.
SELECT list
Syntax:
SELECT [{ ALL | DISTINCT }] { [ expression. ]* [ EXCEPT ( column_name [, ...] ) ]
[ REPLACE ( expression [ AS ] column_name [, ...] ) ]
| expression [ [ AS ] alias ] } [, ...]
The SELECT list defines the columns that the query will return. Expressions in
the SELECT list can refer to columns in any of the from_items in its
corresponding FROM clause.
Each item in the SELECT list is one of:
- *
expressionexpression.*
SELECT *
SELECT *, often referred to as select star, produces one output column for
each column that is visible after executing the full query.
SELECT * FROM (SELECT "apple" AS fruit, "carrot" AS vegetable);
+-------+-----------+
| fruit | vegetable |
+-------+-----------+
| apple | carrot |
+-------+-----------+
SELECT expression
Items in a SELECT list can be expressions. These expressions evaluate to a
single value and produce one output column, with an optional explicit alias.
If the expression does not have an explicit alias, it receives an implicit alias according to the rules for implicit aliases, if possible. Otherwise, the column is anonymous and you cannot refer to it by name elsewhere in the query.
SELECT expression.*
An item in a SELECT list can also take the form of expression.*. This
produces one output column for each column or top-level field of expression.
The expression must either be a table alias or evaluate to a single value of a
data type with fields, such as a STRUCT.
The following query produces one output column for each column in the table
groceries, aliased as g.
WITH groceries AS
(SELECT "milk" AS dairy,
"eggs" AS protein,
"bread" AS grain)
SELECT g.*
FROM groceries AS g;
+-------+---------+-------+
| dairy | protein | grain |
+-------+---------+-------+
| milk | eggs | bread |
+-------+---------+-------+
More examples:
WITH locations AS
(SELECT STRUCT("Seattle" AS city, "Washington" AS state) AS location
UNION ALL
SELECT STRUCT("Phoenix" AS city, "Arizona" AS state) AS location)
SELECT l.location.*
FROM locations l;
+---------+------------+
| city | state |
+---------+------------+
| Seattle | Washington |
| Phoenix | Arizona |
+---------+------------+
WITH locations AS
(SELECT ARRAY<STRUCT<city STRING, state STRING>>[("Seattle", "Washington"),
("Phoenix", "Arizona")] AS location)
SELECT l.LOCATION[offset(0)].*
FROM locations l;
+---------+------------+
| city | state |
+---------+------------+
| Seattle | Washington |
+---------+------------+
SELECT modifiers
You can modify the results returned from a SELECT query, as follows.
SELECT DISTINCT
A SELECT DISTINCT statement discards duplicate rows and returns only the
remaining rows. SELECT DISTINCT cannot return columns of the following types:
- STRUCT
- ARRAY
SELECT * EXCEPT
A SELECT * EXCEPT statement specifies the names of one or more columns to
exclude from the result. All matching column names are omitted from the output.
WITH orders AS
(SELECT 5 as order_id,
"sprocket" as item_name,
200 as quantity)
SELECT * EXCEPT (order_id)
FROM orders;
+-----------+----------+
| item_name | quantity |
+-----------+----------+
| sprocket | 200 |
+-----------+----------+
Note: SELECT * EXCEPT does not exclude columns that do not have names.
SELECT * REPLACE
A SELECT * REPLACE statement specifies one or more
expression AS identifier clauses. Each identifier must match a column name
from the SELECT * statement. In the output column list, the column that
matches the identifier in a REPLACE clause is replaced by the expression in
that REPLACE clause.
A SELECT * REPLACE statement does not change the names or order of columns.
However, it can change the value and the value type.
WITH orders AS
(SELECT 5 as order_id,
"sprocket" as item_name,
200 as quantity)
SELECT * REPLACE ("widget" AS item_name)
FROM orders;
+----------+-----------+----------+
| order_id | item_name | quantity |
+----------+-----------+----------+
| 5 | widget | 200 |
+----------+-----------+----------+
WITH orders AS
(SELECT 5 as order_id,
"sprocket" as item_name,
200 as quantity)
SELECT * REPLACE (quantity/2 AS quantity)
FROM orders;
+----------+-----------+----------+
| order_id | item_name | quantity |
+----------+-----------+----------+
| 5 | sprocket | 100 |
+----------+-----------+----------+
Note: SELECT * REPLACE does not replace columns that do not have names.
SELECT ALL
A SELECT ALL statement returns all rows, including duplicate rows.
SELECT ALL is the default behavior of SELECT.
Aliases
See Aliases for information on syntax and visibility for
SELECT list aliases.
Analytic functions
Clauses related to analytic functions are documented elsewhere.
-
OVERClause andPARTITION BY: See Analytic Functions. -
WINDOWClause and Window Functions: See WINDOW Clause.
FROM clause
The FROM clause indicates the table or tables from which to retrieve rows, and
specifies how to join those rows together to produce a single stream of
rows for processing in the rest of the query.
Syntax
from_item: { table_name [ [ AS ] alias ] | join | ( select ) [ [ AS ] alias ] | field_path | { UNNEST( array_expression ) | UNNEST( array_path ) | array_path } [ [ AS ] alias ] [ WITH OFFSET [ [ AS ] alias ] ] | with_query_name [ [ AS ] alias ] }
table_name
The name (optionally qualified) of an existing table.
SELECT * FROM Roster; SELECT * FROM dataset.Roster; SELECT * FROM project.dataset.Roster;
join
See JOIN Types below.
select
( select ) [ [ AS ] alias ] is a table subquery.
field_path
In the FROM clause, field_path is any path that
resolves to a field within a data type. field_path can go
arbitrarily deep into a nested data structure.
Some examples of valid field_path values include:
SELECT * FROM T1 t1, t1.array_column; SELECT * FROM T1 t1, t1.struct_column.array_field; SELECT (SELECT ARRAY_AGG(c) FROM t1.array_column c) FROM T1 t1; SELECT a.struct_field1 FROM T1 t1, t1.array_of_structs a; SELECT (SELECT STRING_AGG(a.struct_field1) FROM t1.array_of_structs a) FROM T1 t1;
Field paths in the FROM clause must end in an
array field. In
addition, field paths cannot contain arrays before the end of the path. For example, the path
array_column.some_array.some_array_field is invalid because it
contains an array before the end of the path.
Note: If a path has only one name, it is interpreted as a table.
To work around this, wrap the path using UNNEST, or use the
fully-qualified path.
UNNEST
In the FROM clause, UNNEST iterates through an ARRAY and produces one row
for each ARRAY element. The resulting column has an optional alias, which
can be used to refer to the column elsewhere in the query. You can also use
UNNEST outside the FROM clause wherever it is valid to use an expression
that returns a column.
ARRAY unnesting can be either explicit or implicit. In explicit unnesting,
array_expression must return an ARRAY value but does not need to resolve
to an ARRAY, and the UNNEST keyword is required.
Example:
SELECT * FROM UNNEST ([1, 2, 3]);
In implicit unnesting, array_path must resolve to an ARRAY and the
UNNEST keyword is optional.
Example:
SELECT x FROM mytable AS t, t.struct_typed_column.array_typed_field1 AS x;
In this scenario, array_path can go arbitrarily deep into a data
structure, but the last field must be ARRAY-typed. No previous field in the
expression can be ARRAY-typed because it is not possible to extract a named
field from an ARRAY.
UNNEST treats NULLs as follows:
- NULL and empty ARRAYs produces zero rows.
- An ARRAY containing NULLs produces rows containing NULL values.
The optional WITH OFFSET clause returns a separate
column containing the "offset" value (i.e. counting starts at zero) for each row
produced by the UNNEST operation. This column has an optional
alias; the default alias is offset.
Example:
SELECT * FROM UNNEST ( ) WITH OFFSET AS num;
with_query_name
The query names in a WITH clause (see WITH Clause) act like names of temporary tables that you
can reference anywhere in the FROM clause. In the example below,
subQ1 and subQ2 are with_query_names.
Example:
WITH subQ1 AS (SELECT * FROM Roster WHERE SchoolID = 52), subQ2 AS (SELECT SchoolID FROM subQ1) SELECT DISTINCT * FROM subQ2;
The WITH clause hides any permanent tables with the same name
for the duration of the query, unless you qualify the table name, e.g.
dataset.Roster or project.dataset.Roster.
Subqueries
A subquery is a query that appears inside another statement, and is written
inside parentheses. These are also referred to as "sub-SELECTs" or
"nested SELECTs". The full SELECT syntax is valid in
subqueries.
There are two types of subquery:
- Expression Subqueries, which you can use in a query wherever expressions are valid. Expression subqueries return a single value.
- Table subqueries, which you can use only in a
FROMclause. The outer query treats the result of the subquery as a table.
Note that there must be parentheses around both types of subqueries.
Example:
SELECT AVG ( PointsScored )
FROM
( SELECT PointsScored
FROM Stats
WHERE SchoolID = 77 )
Optionally, a table subquery can have an alias.
Example:
SELECT r.LastName
FROM
( SELECT * FROM Roster) AS r;
Aliases
See Aliases for information on syntax and visibility for
FROM clause aliases.
JOIN types
Syntax
join: from_item join_type JOIN from_item [ ON bool_expression | USING ( join_column [, ...] ) ] join_type: { [INNER] | CROSS | FULL [OUTER] | LEFT [OUTER] | RIGHT [OUTER] }
The JOIN clause merges two from_items so that the SELECT clause can
query them as one source. The join_type and ON or USING clause (a
"join condition") specify how to combine and discard rows from the two
from_items to form a single source.
All JOIN clauses require a join_type.
A JOIN clause requires a join condition unless one of the following conditions
is true:
join_typeisCROSS.- One or both of the
from_items is not a table, e.g. anarray_pathorfield_path.
[INNER] JOIN
An INNER JOIN, or simply JOIN, effectively calculates the Cartesian product
of the two from_items and discards all rows that do not meet the join
condition. "Effectively" means that it is possible to implement an INNER JOIN
without actually calculating the Cartesian product.
CROSS JOIN
CROSS JOIN returns the Cartesian product of the two from_items. In
other words, it retains all rows from both from_items and combines each
row from the first from_items with each row from the second
from_items.
Comma cross joins
CROSS JOINs can be written explicitly (see directly above) or implicitly using
a comma to separate the from_items.
Example of an implicit "comma cross join":
SELECT * FROM Roster, TeamMascot;
Here is the explicit cross join equivalent:
SELECT * FROM Roster CROSS JOIN TeamMascot;
You cannot write comma cross joins inside parentheses.
Invalid - comma cross join inside parentheses:
SELECT * FROM t CROSS JOIN (Roster, TeamMascot); // INVALID.
See Sequences of JOINs for details on how a comma cross join behaves in a sequence of JOINs.
FULL [OUTER] JOIN
A FULL OUTER JOIN (or simply FULL JOIN) returns all fields for all rows in
both from_items that meet the join condition.
FULL indicates that all rows from both from_items are
returned, even if they do not meet the join condition.
OUTER indicates that if a given row from one from_item does not
join to any row in the other from_item, the row will return with NULLs
for all columns from the other from_item.
LEFT [OUTER] JOIN
The result of a LEFT OUTER JOIN (or simply LEFT JOIN) for two
from_items always retains all rows of the left from_item in the
JOIN clause, even if no rows in the right from_item satisfy the join
predicate.
LEFT indicates that all rows from the left from_item are
returned; if a given row from the left from_item does not join to any row
in the right from_item, the row will return with NULLs for all
columns from the right from_item. Rows from the right from_item that
do not join to any row in the left from_item are discarded.
RIGHT [OUTER] JOIN
The result of a RIGHT OUTER JOIN (or simply RIGHT JOIN) is similar and
symmetric to that of LEFT OUTER JOIN.
ON clause
The ON clause contains a bool_expression. A combined row (the result of
joining two rows) meets the join condition if bool_expression returns
TRUE.
Example:
SELECT * FROM Roster INNER JOIN PlayerStats
ON Roster.LastName = PlayerStats.LastName;
USING clause
The USING clause requires a column_list of one or more columns which
occur in both input tables. It performs an equality comparison on that column,
and the rows meet the join condition if the equality comparison returns TRUE.
In most cases, a statement with the USING keyword is equivalent to using the
ON keyword. For example, the statement:
SELECT FirstName
FROM Roster INNER JOIN PlayerStats
USING (LastName);
is equivalent to:
SELECT FirstName
FROM Roster INNER JOIN PlayerStats
ON Roster.LastName = PlayerStats.LastName;
The results from queries with USING do differ from queries that use ON when
you use SELECT *. To illustrate this, consider the query:
SELECT * FROM Roster INNER JOIN PlayerStats
USING (LastName);
This statement returns the rows from Roster and PlayerStats where
Roster.LastName is the same as PlayerStats.LastName. The results include a
single LastName column.
By contrast, consider the following query:
SELECT * FROM Roster INNER JOIN PlayerStats
ON Roster.LastName = PlayerStats.LastName;
This statement returns the rows from Roster and PlayerStats where
Roster.LastName is the same as PlayerStats.LastName. The results include
two LastName columns; one from Roster and one from PlayerStats.
Sequences of JOINs
The FROM clause can contain multiple JOIN clauses in sequence.
Example:
SELECT * FROM a LEFT JOIN b ON TRUE LEFT JOIN c ON TRUE;
where a, b, and c are any from_items. JOINs are bound from left to
right, but you can insert parentheses to group them in a different order.
Consider the following queries: A (without parentheses) and B (with parentheses)
are equivalent to each other but not to C. The FULL JOIN in bold binds
first.
A.
SELECT * FROM Roster FULL JOIN TeamMascot USING (SchoolID) FULL JOIN PlayerStats USING (LastName);
B.
SELECT * FROM ( (Roster FULL JOIN TeamMascot USING (SchoolID)) FULL JOIN PlayerStats USING (LastName));
C.
SELECT * FROM (Roster FULL JOIN (TeamMascot FULL JOIN PlayerStats USING (LastName)) USING (SchoolID)) ;
When comma cross joins are present in a query with a sequence of JOINs, they
group from left to right like other JOIN types.
Example:
SELECT * FROM a JOIN b ON TRUE, b JOIN c ON TRUE;
The query above is equivalent to
SELECT * FROM ((a JOIN b ON TRUE) CROSS JOIN b) JOIN c ON TRUE);
There cannot be a RIGHT JOIN or FULL JOIN after a comma join.
Invalid - RIGHT JOIN after a comma cross join:
SELECT * FROM Roster, TeamMascot RIGHT JOIN PlayerStats ON TRUE; // INVALID.
WHERE clause
Syntax
WHERE bool_expression
The WHERE clause filters out rows by evaluating each row against
bool_expression, and discards all rows that do not return TRUE (that is,
rows that return FALSE or NULL).
Example:
SELECT * FROM Roster
WHERE SchoolID = 52;
The bool_expression can contain multiple sub-conditions.
Example:
SELECT * FROM Roster
WHERE STARTS_WITH(LastName, "Mc") OR STARTS_WITH(LastName, "Mac");
You cannot reference column aliases from the SELECT list in the WHERE
clause.
Expressions in an INNER JOIN have an equivalent expression in the
WHERE clause. For example, a query using INNER JOIN and ON has an
equivalent expression using CROSS JOIN and WHERE.
Example - this query:
SELECT * FROM Roster INNER JOIN TeamMascot
ON Roster.SchoolID = bar.SchoolID;
is equivalent to:
SELECT * FROM Roster CROSS JOIN TeamMascot
WHERE Roster.SchoolID = bar.SchoolID;
GROUP BY clause
Syntax
GROUP BY expression [, ...]
The GROUP BY clause groups together rows in a table which have the same value
for the expression in the GROUP BY clause. For multiple rows with the
same value for expression, the GROUP BY clause produces a single
combined row. GROUP BY is commonly used when aggregate functions are present
in the SELECT list, or to eliminate redundancy in the output.
Example:
SELECT SUM(PointsScored), LastName
FROM PlayerStats
GROUP BY LastName;
The GROUP BY clause can refer to expression names in the SELECT list. The
GROUP BY clause also allows expressions in the SELECT list to be referenced
ordinally, using integer values. 1 refers to the first expression in the
SELECT list, 2 the second, and so forth. Ordinals and expression names can
be mixed.
Example:
SELECT SUM(PointsScored), LastName, FirstName
FROM PlayerStats
GROUP BY LastName, FirstName;
The query above is equivalent to:
SELECT SUM(PointsScored), LastName, FirstName
FROM PlayerStats
GROUP BY 2, FirstName;
GROUP BY clauses may also refer to aliases. If a query contains aliases in
the SELECT clause, those aliases override names in the corresponding
FROM clause.
SELECT SUM(PointsScored), LastName as last_name
FROM PlayerStats
GROUP BY last_name;
HAVING clause
Syntax
HAVING bool_expression
The HAVING clause is similar to the WHERE clause: it filters out rows that
do not return TRUE when they are evaluated against the bool_expression.
As with the WHERE clause, the bool_expression can be any expression
that returns a boolean, and can contain multiple sub-conditions.
The HAVING clause differs from the WHERE clause in that:
- The
HAVINGclause requiresGROUP BYor aggregation to be present in the query. - The
HAVINGclause occurs afterGROUP BYand aggregation, and beforeORDER BY. This means that theHAVINGclause is evaluated once for every aggregated row in the result set. This differs from theWHEREclause, which is evaluated beforeGROUP BYand aggregation.
The HAVING clause can reference columns available via the FROM clause, as
well as SELECT list aliases. Expressions referenced in the HAVING clause
must either appear in the GROUP BY clause or they must be the result of an
aggregate function:
SELECT LastName
FROM Roster
GROUP BY LastName
HAVING SUM(PointsScored) > 15;
If a query contains aliases in the SELECT clause, those aliases override names
in a FROM clause.
SELECT LastName, SUM(PointsScored) AS ps
FROM Roster
GROUP BY LastName
HAVING ps > 0;
Mandatory aggregation
Aggregation does not have to be present in the HAVING clause itself, but
aggregation must be present in at least one of the following forms:
Aggregation function in the SELECT list.
SELECT LastName, SUM(PointsScored) AS total
FROM PlayerStats
GROUP BY LastName
HAVING total > 15;
Aggregation function in the 'HAVING' clause.
SELECT LastName
FROM PlayerStats
GROUP BY LastName
HAVING SUM(PointsScored) > 15;
Aggregation in both the SELECT list and HAVING clause.
When aggregation functions are present in both the SELECT list and HAVING
clause, the aggregation functions and the columns they reference do not need
to be the same. In the example below, the two aggregation functions,
COUNT() and SUM(), are different and also use different columns.
SELECT LastName, COUNT(*)
FROM PlayerStats
GROUP BY LastName
HAVING SUM(PointsScored) > 15;
ORDER BY clause
Syntax
ORDER BY expression [{ ASC | DESC }] [, ...]
The ORDER BY clause specifies a column or expression as the sort criterion for
the result set. If an ORDER BY clause is not present, the order of the results
of a query is not defined. The default sort direction is ASC, which sorts the
results in ascending order of expression values. DESC sorts the results in
descending order. Column aliases from a FROM clause or SELECT list are
allowed. If a query contains aliases in the SELECT clause, those aliases
override names in the corresponding FROM clause.
It is possible to order by multiple columns. In the example below, the result
set is ordered first by SchoolID and then by LastName:
SELECT LastName, PointsScored, OpponentID
FROM PlayerStats
ORDER BY SchoolID, LastName;
The following rules apply when ordering values:
- NULLs: In the context of the
ORDER BYclause, NULLs are the minimum possible value; that is, NULLs appear first inASCsorts and last inDESCsorts. - Floating point data types: see Floating Point Semantics on ordering and grouping.
When used in conjunction with UNION, the ORDER BY clause applies to the result set of the entire query; it does not
apply only to the closest SELECT statement. For this reason, it can be helpful
(though it is not required) to use parentheses to show the scope of the ORDER
BY.
This query without parentheses:
SELECT * FROM Roster
UNION ALL
SELECT * FROM TeamMascot
ORDER BY SchoolID;
is equivalent to this query with parentheses:
( SELECT * FROM Roster
UNION ALL
SELECT * FROM TeamMascot )
ORDER BY SchoolID;
but is not equivalent to this query, where the ORDER BY clause applies only to
the second SELECT statement:
SELECT * FROM Roster
UNION ALL
( SELECT * FROM TeamMascot
ORDER BY SchoolID );
You can also use integer literals as column references in ORDER BY clauses. An
integer literal becomes an ordinal (for example, counting starts at 1) into
the SELECT list.
Example - the following two queries are equivalent:
SELECT SUM(PointsScored), LastName
FROM PlayerStats
ORDER BY LastName;
SELECT SUM(PointsScored), LastName
FROM PlayerStats
ORDER BY 2;
UNION
Syntax
UNION { ALL | DISTINCT }
The UNION operator
combines the result sets of two or more input queries by pairing columns from
the result set of each query and vertically concatenating them. You
must specify ALL or DISTINCT; if you specify ALL, then all rows are
retained. If DISTINCT is specified, duplicate rows are discarded.
If a given row R appears exactly m times in the first input query and n times in the second input query (m >= 0, n >= 0):
- For
UNION ALL, R appears exactly m + n times in the result. - For
UNION DISTINCT, theDISTINCTis computed after theUNIONis computed, so R appears exactly one time. - If there are more than two input queries, the above operations generalize and the output is the same as if the inputs were combined incrementally from left to right.
The following rules apply:
- For
UNION DISTINCT, all column types must support equality comparison. - The input queries on each side of the operator must return the same number of columns.
- The operators pair the columns returned by each input query according to
the columns' positions in their respective
SELECTlists. That is, the first column in the first input query is paired with the first column in the second input query. - The result set always uses the column names from the first input query.
- The result set always uses the supertypes of input types in corresponding columns, so paired columns must also have either the same data type or a common supertype.
- You must use parentheses to separate multiple operations of different
types (
UNION ALLandUNION DISTINCTare different types). If the statement only contains multiple operation of the same type (for example,UNION ALLandUNION ALL), parentheses are not required.
Examples:
query1 UNION ALL (query2 UNION DISTINCT query3)
query1 UNION ALL query2 UNION ALL query3
Invalid:
query1 UNION ALL query2 UNION DISTINCT query3; // INVALID.
LIMIT clause and OFFSET clause
Syntax
LIMIT count [ OFFSET skip_rows ]
LIMIT specifies a non-negative count of type INT64,
and no more than count rows will be returned. LIMIT 0 returns 0 rows. If
there is a UNION
operation, LIMIT is applied after the
UNION operation
is evaluated.
OFFSET specifies a non-negative skip_rows of type
INT64, and only rows from
that offset in the table will be considered.
These clauses accept only literal or parameter values.
The rows that are returned by LIMIT and OFFSET is unspecified unless these
operators are used after ORDER BY.
WITH clause
The WITH clause contains one or more named subqueries whose output acts as a
temporary table which subsequent SELECT statements can reference in any clause
or subquery. This includes any SELECT statements on either side of a set
operator such as UNION.
Example:
WITH subQ1 AS (SELECT SchoolID FROM Roster),
subQ2 AS (SELECT OpponentID FROM PlayerStats)
SELECT * FROM subQ1
UNION ALL
SELECT * FROM subQ2;
Another useful role of the WITH clause is to break up more complex queries
into a WITH SELECT statement and WITH clauses, where the less desirable
alternative is writing nested table subqueries. If a WITH clause contains
multiple subqueries, the subquery names cannot repeat.
BigQuery supports WITH clauses in subqueries, such as table
subqueries, expression subqueries, and so on.
WITH q1 AS (my_query)
SELECT *
FROM
(WITH q2 AS (SELECT * FROM q1) SELECT * FROM q2)
The following are scoping rules for WITH clauses:
- Aliases are scoped so that the aliases introduced in a
WITHclause are visible only in the later subqueries in the sameWITHclause, and in the query under theWITHclause. - Aliases introduced in the same
WITHclause must be unique, but the same alias can be used in multipleWITHclauses in the same query. The local alias overrides any outer aliases anywhere that the local alias is visible. - Aliased subqueries in a
WITHclause can never be correlated. No columns from outside the query are visible. The only names from outside that are visible are otherWITHaliases that were introduced earlier in the sameWITHclause.
Here's an example of a statement that uses aliases in WITH subqueries:
WITH q1 AS (my_query)
SELECT *
FROM
(WITH q2 AS (SELECT * FROM q1), # q1 resolves to my_query
q3 AS (SELECT * FROM q1), # q1 resolves to my_query
q1 AS (SELECT * FROM q1), # q1 (in the query) resolves to my_query
q4 AS (SELECT * FROM q1) # q1 resolves to the WITH subquery
# on the previous line.
SELECT * FROM q1) # q1 resolves to the third inner WITH subquery.
BigQuery does not support WITH RECURSIVE.
Aliases
An alias is a temporary name given to a table, column, or expression present in
a query. You can introduce explicit aliases in the SELECT list or FROM
clause, or BigQuery will infer an implicit alias for some expressions.
Expressions with neither an explicit nor implicit alias are anonymous and the
query cannot reference them by name.
Explicit alias syntax
You can introduce explicit aliases in either the FROM clause or the SELECT
list.
In a FROM clause, you can introduce explicit aliases for any item, including
tables, arrays, subqueries, and UNNEST clauses, using [AS] alias. The AS
keyword is optional.
Example:
SELECT s.FirstName, s2.SongName
FROM Singers AS s, (SELECT * FROM Songs) AS s2;
You can introduce explicit aliases for any expression in the SELECT list using
[AS] alias. The AS keyword is optional.
Example:
SELECT s.FirstName AS name, LOWER(s.FirstName) AS lname
FROM Singers s;
Explicit alias visibility
After you introduce an explicit alias in a query, there are restrictions on where else in the query you can reference that alias. These restrictions on alias visibility are the result of BigQuery's name scoping rules.
FROM clause aliases
BigQuery processes aliases in a FROM clause from left to right,
and aliases are visible only to subsequent path expressions in a FROM
clause.
Example:
Assume the Singers table had a Concerts column of ARRAY type.
SELECT FirstName
FROM Singers AS s, s.Concerts;
Invalid:
SELECT FirstName
FROM s.Concerts, Singers AS s; // INVALID.
FROM clause aliases are not visible to subqueries in the same FROM
clause. Subqueries in a FROM clause cannot contain correlated references to
other tables in the same FROM clause.
Invalid:
SELECT FirstName
FROM Singers AS s, (SELECT (2020 - ReleaseDate) FROM s) // INVALID.
You can use any column name from a table in the FROM as an alias anywhere in
the query, with or without qualification with the table name.
Example:
SELECT FirstName, s.ReleaseDate
FROM Singers s WHERE ReleaseDate = 1975;
If the FROM clause contains an explicit alias, you must use the explicit alias
instead of the implicit alias for the remainder of the query (see
Implicit Aliases). A table alias is useful for brevity or
to eliminate ambiguity in cases such as self-joins, where the same table is
scanned multiple times during query processing.
Example:
SELECT * FROM Singers as s, Songs as s2
ORDER BY s.LastName
Invalid — ORDER BY does not use the table alias:
SELECT * FROM Singers as s, Songs as s2
ORDER BY Singers.LastName; // INVALID.
SELECT list aliases
Aliases in the SELECT list are visible only to the following clauses:
GROUP BYclauseORDER BYclauseHAVINGclause
Example:
SELECT LastName AS last, SingerID
FROM Singers
ORDER BY last;
Explicit aliases in GROUP BY, ORDER BY, and HAVING clauses
These three clauses, GROUP BY, ORDER BY, and HAVING, can refer to only the
following values:
- Tables in the
FROMclause and any of their columns. - Aliases from the
SELECTlist.
GROUP BY and ORDER BY can also refer to a third group:
- Integer literals, which refer to items in the
SELECTlist. The integer1refers to the first item in theSELECTlist,2refers to the second item, etc.
Example:
SELECT SingerID AS sid, COUNT(Songid) AS s2id
FROM Songs
GROUP BY 1
ORDER BY 2 DESC;
The query above is equivalent to:
SELECT SingerID AS sid, COUNT(Songid) AS s2id
FROM Songs
GROUP BY sid
ORDER BY s2id DESC;
Ambiguous aliases
BigQuery provides an error if a name is ambiguous, meaning it can resolve to more than one unique object.
Examples:
This query contains column names that conflict between tables, since both
Singers and Songs have a column named SingerID:
SELECT SingerID
FROM Singers, Songs;
This query contains aliases that are ambiguous in the GROUP BY clause because
they are duplicated in the SELECT list:
SELECT FirstName AS name, LastName AS name,
FROM Singers
GROUP BY name;
Ambiguity between a FROM clause column name and a SELECT list alias in
GROUP BY:
SELECT UPPER(LastName) AS LastName
FROM Singers
GROUP BY LastName;
The query above is ambiguous and will produce an error because LastName in the
GROUP BY clause could refer to the original column LastName in Singers, or
it could refer to the alias AS LastName, whose value is UPPER(LastName).
The same rules for ambiguity apply to path expressions. Consider the following
query where table has columns x and y, and column z is of type STRUCT
and has fields v, w, and x.
Example:
SELECT x, z AS T
FROM table T
GROUP BY T.x;
The alias T is ambiguous and will produce an error because T.x in the GROUP
BY clause could refer to either table.x or table.z.x.
A name is not ambiguous in GROUP BY, ORDER BY or HAVING if it is both
a column name and a SELECT list alias, as long as the name resolves to the
same underlying object.
Example:
SELECT LastName, BirthYear AS BirthYear
FROM Singers
GROUP BY BirthYear;
The alias BirthYear is not ambiguous because it resolves to the same
underlying column, Singers.BirthYear.
Implicit aliases
In the SELECT list, if there is an expression that does not have an explicit
alias, BigQuery assigns an implicit alias according to the following
rules. There can be multiple columns with the same alias in the SELECT list.
- For identifiers, the alias is the identifier. For example,
SELECT abcimpliesAS abc. - For path expressions, the alias is the last identifier in the path. For
example,
SELECT abc.def.ghiimpliesAS ghi. - For field access using the "dot" member field access operator, the alias is
the field name. For example,
SELECT (struct_function()).fnameimpliesAS fname.
In all other cases, there is no implicit alias, so the column is anonymous and cannot be referenced by name. The data from that column will still be returned and the displayed query results may have a generated label for that column, but the label cannot be used like an alias.
In a FROM clause, from_items are not required to have an alias. The
following rules apply:
- If there is an expression that does not have an explicit alias, BigQuery assigns an implicit alias in these cases:
- For identifiers, the alias is the identifier. For example,
FROM abcimpliesAS abc. - For path expressions, the alias is the last identifier in the path. For
example,
FROM abc.def.ghiimpliesAS ghi - The column produced using
WITH OFFSEThas the implicit aliasoffset. - Table subqueries do not have implicit aliases.
FROM UNNEST(x)does not have an implicit alias.
Appendix A: examples with sample data
Sample tables
The following three tables contain sample data about athletes, their schools, and the points they score during the season. These tables will be used to illustrate the behavior of different query clauses.
Table Roster:
| LastName | SchoolID |
|---|---|
| Adams | 50 |
| Buchanan | 52 |
| Coolidge | 52 |
| Davis | 51 |
| Eisenhower | 77 |
The Roster table includes a list of player names (LastName) and the unique ID assigned to their school (SchoolID).
Table PlayerStats:
| LastName | OpponentID | PointsScored |
|---|---|---|
| Adams | 51 | 3 |
| Buchanan | 77 | 0 |
| Coolidge | 77 | 1 |
| Adams | 52 | 4 |
| Buchanan | 50 | 13 |
The PlayerStats table includes a list of player names (LastName) and the unique ID assigned to the opponent they played in a given game (OpponentID) and the number of points scored by the athlete in that game (PointsScored).
Table TeamMascot:
| SchoolId | Mascot |
|---|---|
| 50 | Jaguars |
| 51 | Knights |
| 52 | Lakers |
| 53 | Mustangs |
The TeamMascot table includes a list of unique school IDs (SchoolID) and the mascot for that school (Mascot).
JOIN types
1) [INNER] JOIN
Example:
SELECT * FROM Roster JOIN TeamMascot
ON Roster.SchoolID = TeamMascot.SchoolID;
Results:
| LastName | Roster.SchoolId | TeamMascot.SchoolId | Mascot |
|---|---|---|---|
| Adams | 50 | 50 | Jaguars |
| Buchanan | 52 | 52 | Lakers |
| Coolidge | 52 | 52 | Lakers |
| Davis | 51 | 51 | Knights |
2) CROSS JOIN
Example:
SELECT * FROM Roster CROSS JOIN TeamMascot
ON Roster.SchoolID = TeamMascot.SchoolID;
Results:
| LastName | Roster.SchoolId | TeamMascot.SchoolId | Mascot |
|---|---|---|---|
| Adams | 50 | 50 | Jaguars |
| Adams | 50 | 51 | Knights |
| Adams | 50 | 52 | Lakers |
| Adams | 50 | 53 | Mustangs |
| Buchanan | 52 | 50 | Jaguars |
| Buchanan | 52 | 51 | Knights |
| Buchanan | 52 | 52 | Lakers |
| Buchanan | 52 | 53 | Mustangs |
| Coolidge | 52 | 50 | Jaguars |
| Coolidge | 52 | 51 | Knights |
| Coolidge | 52 | 52 | Lakers |
| Coolidge | 52 | 53 | Mustangs |
| Davis | 51 | 50 | Jaguars |
| Davis | 51 | 51 | Knights |
| Davis | 51 | 52 | Lakers |
| Davis | 51 | 53 | Mustangs |
| Eisenhower | 77 | 50 | Jaguars |
| Eisenhower | 77 | 51 | Knights |
| Eisenhower | 77 | 52 | Lakers |
| Eisenhower | 77 | 53 | Mustangs |
3) FULL [OUTER] JOIN
Example:
SELECT * FROM Roster FULL JOIN TeamMascot
ON Roster.SchoolID = TeamMascot.SchoolID;
| LastName | Roster.SchoolId | TeamMascot.SchoolId | Mascot |
|---|---|---|---|
| Adams | 50 | 50 | Jaguars |
| Buchanan | 52 | 52 | Lakers |
| Coolidge | 52 | 52 | Lakers |
| Davis | 51 | 51 | Knights |
| Eisenhower | 77 | NULL | NULL |
| NULL | NULL | 53 | Mustangs |
4) LEFT [OUTER] JOIN
Example:
SELECT * FROM Roster LEFT JOIN TeamMascot
ON Roster.SchoolID = TeamMascot.SchoolID;
Results:
| LastName | Roster.SchoolId | TeamMascot.SchoolId | Mascot |
|---|---|---|---|
| Adams | 50 | 50 | Jaguars |
| Buchanan | 52 | 52 | Lakers |
| Coolidge | 52 | 52 | Lakers |
| Davis | 51 | 51 | Knights |
| Eisenhower | 77 | NULL | NULL |
5) RIGHT [OUTER] JOIN
Example:
SELECT * FROM Roster RIGHT JOIN TeamMascot
ON Roster.SchoolID = TeamMascot.SchoolID;
Results:
| LastName | Roster.SchoolId | TeamMascot.SchoolId | Mascot |
|---|---|---|---|
| Adams | 50 | 50 | Jaguars |
| Davis | 51 | 51 | Knights |
| Coolidge | 52 | 52 | Lakers |
| Buchanan | 52 | 52 | Lakers |
| NULL | NULL | 53 | Mustangs |
GROUP BY clause
Example:
SELECT LastName, SUM(PointsScored)
FROM PlayerStats
GROUP BY LastName;
| LastName | SUM |
|---|---|
| Adams | 7 |
| Buchanan | 13 |
| Coolidge | 1 |
UNION
The UNION operator combines the result sets of two or more SELECT statements
by pairing columns from the result set of each SELECT statement and vertically
concatenating them.
Example:
SELECT Mascot AS X, SchoolID AS Y
FROM TeamMascot
UNION ALL
SELECT LastName, PointsScored
FROM PlayerStats;
Results:
| X | Y |
|---|---|
| Mustangs | 50 |
| Knights | 51 |
| Lakers | 52 |
| Mustangs | 53 |
| Adams | 3 |
| Buchanan | 0 |
| Coolidge | 1 |
| Adams | 4 |
| Buchanan | 13 |