BigQuery supports two SQL dialects: standard SQL and legacy SQL. This topic explains the differences between the two dialects, including syntax, functions, and semantics, and gives examples of some of the highlights of standard SQL.
Comparison of legacy and standard SQL
Previously, BigQuery executed queries using a non-standard SQL dialect known as BigQuery SQL. With the launch of BigQuery 2.0, BigQuery released support for standard SQL, and renamed BigQuery SQL to legacy SQL. Standard SQL is the preferred SQL dialect for querying data stored in BigQuery.
Do I have to migrate to standard SQL?
Migration from legacy SQL to standard SQL is recommended but not required. For example, suppose that you execute many queries that use legacy SQL, but you want to take advantage of a standard SQL feature for a new query. You can create new queries using standard SQL that run alongside queries using legacy SQL.
Enabling standard SQL
You have a choice of whether to use legacy or standard SQL when you execute a query through BigQuery. See enabling standard SQL for steps to enable standard SQL in the BigQuery UI, CLI, API, or whichever interface you are using.
Advantages of standard SQL
Standard SQL is compliant with the SQL 2011 standard, and has extensions that support querying nested and repeated data. It has several advantages over legacy SQL, including:
- Composability using
WITHclauses and SQL functions - Subqueries in the
SELECTlist andWHEREclause - Correlated subqueries
ARRAYandSTRUCTdata types- Inserts, updates, and deletes
COUNT(DISTINCT <expr>)is exact and scalable, providing the accuracy ofEXACT_COUNT_DISTINCTwithout its limitations- Automatic predicate push-down through
JOINs - Complex
JOINpredicates, including arbitrary expressions
For examples that demonstrate some of these features, see Standard SQL highlights.
Type differences
Legacy SQL types have an equivalent in standard SQL and vice versa. In some cases, the type has a different name. The following table lists each legacy SQL data type and its standard SQL equivalent.
| Legacy SQL | Standard SQL | Notes |
|---|---|---|
BOOL |
BOOL |
|
INTEGER |
INT64 |
|
FLOAT |
FLOAT64 |
|
STRING |
STRING |
|
BYTES |
BYTES |
|
RECORD |
STRUCT |
|
REPEATED |
ARRAY |
|
TIMESTAMP |
TIMESTAMP |
Standard SQL has a stricter TIMESTAMP range |
DATE |
DATE |
Legacy SQL has limited support for DATE |
TIME |
TIME |
Legacy SQL has limited support for TIME |
DATETIME |
DATETIME |
Legacy SQL has limited support for DATETIME |
For more information on the standard SQL type system, see the standard SQL data types reference. For more information on data types in BigQuery, see the BigQuery data types reference.
Label differences
Legacy SQL has the concept of "labels" for fields, which are:
REQUIRED- A value stored in this field cannot beNULLNULLABLE- A value stored in this field can beNULLREPEATED- A value of this field comprises any number of elements. Elements cannot beNULL
The following table describes the availability of these concepts in standard SQL.
| Legacy SQL label | Supported in standard SQL? | Notes |
|---|---|---|
REQUIRED |
No | Convert REQUIRED fields to NULLABLE to append to tables that include them |
NULLABLE |
Yes | All standard SQL types are NULLABLE |
REPEATED |
Yes; equivalent to ARRAY |
None needed |
For more information on REPEATED field handling, see
Differences in repeated field handling.
Converting REQUIRED fields to NULLABLE
The legacy SQL concept of REQUIRED is not present in standard SQL, and a
value of any type can be NULL. To append to a table that has REQUIRED fields
using standard SQL, you must first change those fields to be NULLABLE instead.
There are multiple ways to change REQUIRED fields to NULLABLE:
BigQuery UI
Navigate to the table to which you want to append. Under the "Schema" tab,
click the arrow next to REQUIRED, then select "All REQUIRED to NULLABLE".
BigQuery API
Using the jobs API, pass the ALLOW_FIELD_RELAXATION option as part of
the configuration.query.schemaUpdateOptions list. See the
jobs API reference
for more details.
After changing all REQUIRED fields in your table to NULLABLE, you can now
append to it using standard SQL.
REPEATED fields
The legacy SQL concept of REPEATED is equivalent to the ARRAY type in
standard SQL. For example, REPEATED INTEGER is equivalent to ARRAY<INT64> in
standard SQL.
Standard SQL supports NULL array elements, but raises an error if there is a
NULL array element in the query result. If there is a NULL array column in
the query result, standard SQL stores it as an empty array.
For more information on the standard SQL type system, see the standard SQL data types reference. For more information on data types in BigQuery, see the BigQuery data types reference.
Syntax differences
Escaping reserved keywords and invalid identifiers
In legacy SQL, you escape reserved keywords and identifiers that contain
invalid characters such as a space or hyphen - using square brackets [].
In standard SQL, you escape such keywords and identifiers using backticks
`. For example:
SELECT
word,
SUM(word_count) AS word_count
FROM
`bigquery-public-data.samples.shakespeare`
WHERE word IN ('me', 'I', 'you')
GROUP BY word;
For a list of reserved keywords and what constitutes valid identifiers, see Lexical Structure.
Project-qualified table names
In legacy SQL, to query a table with a project-qualified name, you use a
semi-colon, :, as a separator. For example:
SELECT
word
FROM
[bigquery-public-data:samples.shakespeare]
LIMIT 1;
In standard SQL, you use a period, ., instead. For example:
SELECT
word
FROM
`bigquery-public-data.samples.shakespeare`
LIMIT 1;
If your project name includes a domain, such as example.com:myproject, you use
example.com:myproject as the project name, including the :.
Table decorators and wildcard functions
Standard SQL does not support the TABLE_DATE_RANGE, TABLE_DATE_RANGE_STRICT,
or TABLE_QUERY functions.
You can achieve the same semantics of TABLE_DATE_RANGE and TABLE_QUERY using
a filter on the _TABLE_SUFFIX pseudocolumn. For example, consider the
following legacy SQL query, which counts the number of rows across 2010 and 2011
in the National Oceanic and Atmospheric Administration GSOD (global summary of
the day) tables:
SELECT COUNT(*)
FROM TABLE_QUERY([bigquery-public-data:noaa_gsod],
'table_id IN ("gsod2010", "gsod2011")');
An equivalent query using standard SQL is:
SELECT COUNT(*)
FROM `bigquery-public-data.noaa_gsod.*`
WHERE _TABLE_SUFFIX IN ("gsod2010", "gsod2011");
For more information, including examples that use TABLE_DATE_RANGE, see
Migrating legacy SQL wildcard functions
Trailing commas in the SELECT list
Unlike legacy SQL, standard SQL does not permit trailing commas prior to the
FROM clause. For example, the following query is invalid:
SELECT
word,
corpus, -- Error due to trailing comma
FROM
`bigquery-public-data.samples.shakespeare`
LIMIT 1;
To correct the error, remove the comma after corpus:
SELECT
word,
corpus
FROM
`bigquery-public-data.samples.shakespeare`
LIMIT 1;
Comma operator with tables
In legacy SQL, the comma operator , has the non-standard meaning of
UNION ALL when applied to tables. In standard SQL, the comma operator has the
standard meaning of JOIN. For example, consider the following legacy SQL
query:
SELECT
x,
y
FROM
(SELECT 1 AS x, "foo" AS y),
(SELECT 2 AS x, "bar" AS y);
This is equivalent to the standard SQL query:
SELECT
x,
y
FROM
(SELECT 1 AS x, "foo" AS y UNION ALL
SELECT 2 AS x, "bar" AS y);
Note also that in standard SQL, UNION ALL associates columns by position
rather than by name. The above query is equivalent to:
SELECT
x,
y
FROM
(SELECT 1 AS x, "foo" AS y UNION ALL
SELECT 2, "bar");
A common usage of the comma operator in standard SQL is to JOIN with an array.
For example:
WITH T AS (
SELECT 0 AS x, [1, 2, 3] AS arr UNION ALL
SELECT 1, [4, 5])
SELECT
x,
y
FROM
T,
UNNEST(arr) AS y;
This returns the cross product of the table T with the elements of arr. You
can also express the query in standard SQL as:
WITH T AS (
SELECT 0 AS x, [1, 2, 3] AS arr UNION ALL
SELECT 1, [4, 5])
SELECT
x,
y
FROM
T
JOIN
UNNEST(arr) AS y;
In this query, JOIN has the same meaning as the , comma operator separating
T and UNNEST(arr) AS y in the example above it.
Logical views
You cannot query a logical view defined with legacy SQL using standard SQL and vice versa due to differences in syntax and semantics between the dialects. Instead, you would need to create a new view that uses standard SQL--possibly under a different name--to replace a view that uses legacy SQL.
As an example, suppose that view V is defined using legacy SQL as:
SELECT *, UTC_USEC_TO_DAY(timestamp_col) AS day
FROM MyTable;
Suppose that view W is defined using legacy SQL as:
SELECT user, action, day
FROM V;
Suppose that you execute the following legacy SQL query daily, but you want to migrate it to use standard SQL instead:
SELECT EXACT_COUNT_DISTINCT(user), action, day
FROM W
GROUP BY action, day;
One possible migration path is to create new views using different names. The steps involved are:
Create a view named V2 using standard SQL with the following contents:
SELET *, EXTRACT(DAY FROM timestamp_col) AS day
FROM MyTable;
Create a view named W2 using standard SQL with the following contents:
SELECT user, action, day
FROM V2;
Change your query that executes daily to use standard SQL and refer to W2
instead:
SELECT COUNT(DISTINCT user), action, day
FROM W2
GROUP BY action, day;
Another option is to delete views V and W, then recreate them using standard
SQL under the same names. With this option, you would need to migrate all of
your queries that reference V or W to use standard SQL at the same time,
however.
Function comparison
The following is a partial list of legacy SQL functions and their standard SQL equivalents.
For more information on standard SQL functions, see the Functions & Operators topic.
COUNT function comparison
Both legacy SQL and standard SQL contain a COUNT function. However, each function behaves differently, depending on the SQL dialect you use.
In legacy SQL, COUNT(DISTINCT x) returns an approximate count. In standard
SQL, it returns an exact count. For an approximate count of distinct values
that runs faster and requires fewer resources, use
APPROX_COUNT_DISTINCT.
URL function comparison
Both legacy SQL and standard SQL contain functions for parsing URLs. In legacy
SQL, these functions are HOST(url), TLD(url), and DOMAIN(url). In standard
SQL, these functions are
NET.HOST(url),
NET.PUBLIC_SUFFIX(url),
and NET.REG_DOMAIN(url).
Improvements over legacy SQL functions
- Standard SQL URL functions can parse URLs starting with "//".
- When the input is not compliant with RFC 3986 or is not a URL (for example, "mailto:?to=&subject=&body="), different rules are applied to parse the input. In particular, Standard SQL URL functions can parse non-standard inputs without "//", such as "www.google.com". For best results, it is recommended that you ensure that inputs are URLs and comply with RFC 3986.
NET.PUBLIC_SUFFIXreturns results without leading dots. For example, it returns "com" instead of ".com". This complies with the format in the public suffix list.NET.PUBLIC_SUFFIXandNET.REG_DOMAINsupport uppercase letters and internationalized domain names.TLDandDOMAINdo not support them (might return unexpected results).
Minor differences on edge cases
- If the input does not contain any suffix in the
public suffix list,
NET.PUBLIC_SUFFIXandNET.REG_DOMAINreturn NULL, whileTLDandDOMAINreturn non-NULL values as best effort guesses. - If the input contains only a public suffix without a preceding label (for
example, "http://com"),
NET.PUBLIC_SUFFIXreturns the public suffix, whileTLDreturns an empty string. Similarly,NET.REG_DOMAINreturns NULL, whileDOMAINreturns the public suffix. - For inputs with IPv6 hosts,
NET.HOSTdoes not remove brackets from the result, as specified by RFC 3986. - For inputs with IPv4 hosts,
NET.REG_DOMAINreturns NULL, whileDOMAINreturns the first 3 octets.
Examples
In the table below, gray text color indicates results that are the same between legacy and standard SQL.
| URL (description) | HOST | NET.HOST | TLD | NET.PUBLIC _SUFFIX | DOMAIN | NET.REG_DOMAIN |
|---|---|---|---|---|---|---|
| "//google.com" (starting with "//") |
NULL | "google.com" | NULL | "com" | NULL | "google.com" |
| "google.com" (non-standard; no "//") |
NULL | "google.com" | NULL | "com" | NULL | "google.com" |
| "http://user:pass@[email protected]" (non-standard with multiple "@") |
"[email protected]" | "x.com" | ".com" | "com" | "[email protected]" | "x.com" |
| "http://foo.com:1:2" (non-standard with multiple ":") |
"foo.com:1" | "foo.com" | ".com:1" | "com" | "foo.com" | "foo.com" |
| "http://x.Co.uk" (upper case letters) |
"x.Co.uk" | "x.Co.uk" | ".uk" | "Co.uk" | "Co.uk" | "x.Co.uk" |
| "http://a.b" (public suffix not found) |
"a.b" | "a.b" | ".b" | NULL | "a.b" | NULL |
| "http://com" (host contains only a public suffix) |
"com" | "com" | "" | "com" | "com" | NULL |
| "http://[::1]" (IPv6 host; no public suffix) |
"::1" | "[::1]" | "" | NULL | "::1" | NULL |
| "http://1.2.3.4" (IPv4 host; no public suffix) |
"1.2.3.4" | "1.2.3.4" | "" | NULL | "1.2.3" | NULL |
Differences in repeated field handling
A REPEATED type in legacy SQL is equivalent to an ARRAY of that type in
standard SQL. The following section discusses some of the differences in
operations on repeated fields between legacy and standard SQL.
Selecting nested repeated leaf fields
Using legacy SQL, you can "dot" into a nested repeated field without needing to consider where the repetition occurs. In standard SQL, attempting to "dot" into a nested repeated field results in an error. For example:
SELECT
repository.url,
payload.pages.page_name
FROM
`bigquery-public-data.samples.github_nested`
LIMIT 5;
Attempting to execute this query returns:
Cannot access field page_name on a value with type
ARRAY<STRUCT<action STRING, html_url STRING, page_name STRING, ...>>
To correct the error and return an array of page_names in the result, use an
ARRAY subquery instead. For example:
SELECT
repository.url,
ARRAY(SELECT page_name FROM UNNEST(payload.pages)) AS page_names
FROM
`bigquery-public-data.samples.github_nested`
LIMIT 5;
For more information on arrays and ARRAY subqueries, see the
Working with Arrays topic.
Filtering repeated fields
Using legacy SQL, you can filter repeated fields directly using a WHERE
clause. In standard SQL, you can express similar logic with a JOIN comma
operator followed by a filter. For example, consider the following legacy SQL
query:
SELECT
payload.pages.title
FROM
[bigquery-public-data:samples.github_nested]
WHERE payload.pages.page_name IN ('db_jobskill', 'Profession');
This query returns all titles of pages for which the page_name is either
db_jobskill or Profession. You can express a similar query in standard SQL
as:
SELECT
page.title
FROM
`bigquery-public-data.samples.github_nested`,
UNNEST(payload.pages) AS page
WHERE page.page_name IN ('db_jobskill', 'Profession');
One difference between the preceding legacy SQL and standard SQL queries is that
if you unset the "Flatten Results" option and execute the legacy SQL query,
payload.pages.title is REPEATED in the query result. To achieve the
same semantics in standard SQL and return an array for the title column, use
an ARRAY subquery instead:
SELECT
title
FROM (
SELECT
ARRAY(SELECT title FROM UNNEST(payload.pages)
WHERE page_name IN ('db_jobskill', 'Profession')) AS title
FROM
`bigquery-public-data.samples.github_nested`)
WHERE ARRAY_LENGTH(title) > 0;
This query creates an array of titles where the page_name is either
'db_jobskill' or 'Profession', then filters any rows where the array did not
match that condition using ARRAY_LENGTH(title) > 0.
For more information on arrays, see the Working with Arrays topic.
Structure of selected nested leaf fields
Legacy SQL preserves the structure of nested leaf fields in the SELECT list
when the "Flatten Results" option is unset, whereas standard SQL does not. For
example, consider the following legacy SQL query:
SELECT
repository.url,
repository.has_downloads
FROM
[bigquery-public-data.samples.github_nested]
LIMIT 5;
This query returns url and has_downloads within a record named repository
when "Flatten Results" is unset. Now consider the following standard SQL query:
SELECT
repository.url,
repository.has_downloads
FROM
`bigquery-public-data.samples.github_nested`
LIMIT 5;
This query returns url and has_downloads as top-level columns; they are not
part of a repository record or struct. To return them as part of a struct, use
the STRUCT operator:
SELECT
STRUCT(
repository.url,
repository.has_downloads) AS repository
FROM
`bigquery-public-data.samples.github_nested`
LIMIT 5;
Removing repetition with FLATTEN
Standard SQL does not have a FLATTEN function as in legacy SQL, but you can
achieve similar semantics using the JOIN (comma) operator. For example,
consider the following legacy SQL query:
SELECT
repository.url,
payload.pages.page_name
FROM
FLATTEN([bigquery-public-data:samples.github_nested], payload.pages.page_name)
LIMIT 5;
You can express a similar query in standard SQL as follows:
SELECT
repository.url,
page.page_name
FROM
`bigquery-public-data.samples.github_nested`,
UNNEST(payload.pages) AS page
LIMIT 5;
Or, equivalently, use JOIN rather than the comma , operator:
SELECT
repository.url,
page.page_name
FROM
`bigquery-public-data.samples.github_nested`
JOIN
UNNEST(payload.pages) AS page
LIMIT 5;
One important difference is that the legacy SQL query returns a row where
payload.pages.page_name is NULL if payload.pages is empty. The standard
SQL query, however, does not return a row if payload.pages is empty. To
achieve exactly the same semantics, use a LEFT JOIN or LEFT OUTER JOIN. For
example:
SELECT
repository.url,
page.page_name
FROM
`bigquery-public-data.samples.github_nested`
LEFT JOIN
UNNEST(payload.pages) AS page
LIMIT 5;
For more information on arrays, see the
Working with Arrays topic. For more
information on UNNEST, see the
UNNEST topic.
Filtering rows with OMIT RECORD IF
The OMIT IF clause from legacy SQL allows you to filter rows based on a
condition that can apply to repeated fields. In standard SQL, you can model
an OMIT IF clause with an EXISTS clause, IN clause, or simple filter.
For example, consider the following legacy SQL query:
SELECT
repository.url,
FROM
[bigquery-public-data:samples.github_nested]
OMIT RECORD IF
EVERY(payload.pages.page_name != 'db_jobskill'
AND payload.pages.page_name != 'Profession');
The analogous standard SQL query is:
SELECT
repository.url
FROM
`bigquery-public-data.samples.github_nested`
WHERE EXISTS (
SELECT 1 FROM UNNEST(payload.pages)
WHERE page_name = 'db_jobskill'
OR page_name = 'Profession');
Here the EXISTS clause evaluates to true if there is at least one element of
payload.pages where the page name is 'db_jobskill' or 'Profession'.
Alternatively, suppose that the legacy SQL query uses IN:
SELECT
repository.url,
FROM
[bigquery-public-data:samples.github_nested]
OMIT RECORD IF NOT
SOME(payload.pages.page_name IN ('db_jobskill', 'Profession'));
In standard SQL, you can express the query using an EXISTS clause with IN:
SELECT
repository.url
FROM
`bigquery-public-data.samples.github_nested`
WHERE EXISTS (
SELECT 1 FROM UNNEST(payload.pages)
WHERE page_name IN ('db_jobskill', 'Profession'));
Consider the following legacy SQL query that filters records with 80 or fewer pages:
SELECT
repository.url,
FROM
[bigquery-public-data:samples.github_nested]
OMIT RECORD IF
COUNT(payload.pages.page_name) <= 80;
In this case, you can use a filter with ARRAY_LENGTH in standard SQL:
SELECT
repository.url
FROM
`bigquery-public-data.samples.github_nested`
WHERE
ARRAY_LENGTH(payload.pages) > 80;
Note that the ARRAY_LENGTH function applies to the repeated payload.pages
field directly rather than the nested field payload.pages.page_name as in the
legacy SQL query.
For more information on arrays and ARRAY subqueries, see the
Working with Arrays topic.
Semantic differences
The semantics of some operations differ between legacy and standard SQL.
Automatic data type coercions
Both legacy and standard SQL support coercions (automatic conversions) between
certain data types. For example, BigQuery coerces a value of type INT64 to
FLOAT64 if the query passes it to a function that requires FLOAT64 as input.
Standard SQL does not support the following coercions that legacy SQL supports.
Instead, you must use an explicit CAST.
INT64literal toTIMESTAMP. Instead, useTIMESTAMP_MICROS(micros_value).STRINGliteral toINT64,FLOAT64, orBOOL. Instead, useCAST(str AS INT64),CAST(str AS FLOAT64), orCAST(str AS BOOL).STRINGtoBYTES. Instead, useCAST(str AS BYTES).
Runtime errors
Some functions in legacy SQL return NULL for invalid input, potentially
masking problems in queries or in data. Standard SQL is generally more strict,
and raises an error if an input is invalid.
- For all mathematical functions and operators, legacy SQL does not check for
overflows. Standard SQL adds overflow checks, and raises an error if a
computation overflows. This includes the
+,-,*operators, theSUM,AVG, andSTDDEVaggregate functions, and others. - Standard SQL raises an error upon division by zero, whereas legacy SQL returns
NULL. To returnNULLfor division by zero in standard SQL, useSAFE_DIVIDE. - Standard SQL raises an error for
CASTs where the input format is invalid or out of range for the target type, whereas legacy SQL returnsNULL. To avoid raising an error for an invalid cast in standard SQL, useSAFE_CAST.
Nested repeated results
Queries executed using standard SQL preserve any nesting and repetition of the
columns in the result, and the "Flatten Results" option has no effect. To
return top-level columns for nested fields, use the .* operator on struct
columns. For example:
SELECT
repository.*
FROM
`bigquery-public-data.samples.github_nested`
LIMIT 5;
To return top-level columns for repeated nested fields (ARRAYs of STRUCTs),
use a JOIN to take the cross product of the table's rows and the elements of
the repeated nested field. For example:
SELECT
repository.url,
page.*
FROM
`bigquery-public-data.samples.github_nested`
JOIN
UNNEST(payload.pages) AS page
LIMIT 5;
For more information on arrays and ARRAY subqueries, see the
Working with Arrays topic.
Differences in user-defined JavaScript functions
The User-Defined Functions topic documents how to use JavaScript user-defined functions with standard SQL. This section explains some of the key differences between user-defined functions in legacy and standard SQL.
Functions in the query text
With standard SQL, you use CREATE TEMPORARY FUNCTION as part of the query body
rather than specifying user-defined functions separately. Consider the following
standard SQL query:
-- Computes the harmonic mean of the elements in 'arr'.
-- The harmonic mean of x_1, x_2, ..., x_n can be expressed as:
-- n / ((1 / x_1) + (1 / x_2) + ... + (1 / x_n))
CREATE TEMPORARY FUNCTION HarmonicMean(arr ARRAY<FLOAT64>)
RETURNS FLOAT64 LANGUAGE js AS """
var sum_of_reciprocals = 0;
for (var i = 0; i < arr.length; ++i) {
sum_of_reciprocals += 1 / arr[i];
}
return arr.length / sum_of_reciprocals;
""";
WITH T AS (
SELECT GENERATE_ARRAY(1.0, x * 4, x) AS arr
FROM UNNEST([1, 2, 3, 4, 5]) AS x
)
SELECT arr, HarmonicMean(arr) AS h_mean
FROM T;
This query defines a JavaScript function named HarmonicMean and then applies
it to the array column arr from T.
For more information on user-defined functions, see the User-Defined Functions topic.
Functions operate on values rather than rows
In legacy SQL, JavaScript functions operate on rows from a table. In standard SQL, as in the example above, JavaScript functions operate on values. To pass a row value to a JavaScript function using standard SQL, define a function that takes a struct of the same row type as the table. For example:
-- Takes a struct of x, y, and z and returns a struct with a new field foo.
CREATE TEMPORARY FUNCTION AddField(s STRUCT<x FLOAT64, y BOOL, z STRING>)
RETURNS STRUCT<x FLOAT64, y BOOL, z STRING, foo STRING> LANGUAGE js AS """
var new_struct = new Object();
new_struct.x = s.x;
new_struct.y = s.y;
new_struct.z = s.z;
if (s.y) {
new_struct.foo = 'bar';
} else {
new_struct.foo = 'baz';
}
return new_struct;
""";
WITH T AS (
SELECT x, MOD(off, 2) = 0 AS y, CAST(x AS STRING) AS z
FROM UNNEST([5.0, 4.0, 3.0, 2.0, 1.0]) AS x WITH OFFSET off
)
SELECT AddField(t).*
FROM T AS t;
This query defines a JavaScript function that takes a struct with the same row
type as T and creates a new struct with an additional field named foo. The
SELECT statement passes the row t as input to the function and uses .* to
return the fields of the resulting struct in the output.
Standard SQL highlights
This section discusses some of the highlights of standard SQL compared to legacy SQL.
Composability using WITH clauses
Some of the standard SQL examples on this page make use of a
WITH clause, which enables
extraction or reuse of named subqueries. For example:
WITH T AS (
SELECT x FROM UNNEST([1, 2, 3, 4]) AS x
)
SELECT x / (SELECT SUM(x) FROM T) AS weighted_x
FROM T;
This query defines a names subquery T that contains x values of 1, 2, 3,
and 4. It selects x values from T and divides them by the sum of all x
values in T. This query is equivalent to a query where the contents of T
are inline:
SELECT
x / (SELECT SUM(x)
FROM (SELECT x FROM UNNEST([1, 2, 3, 4]) AS x)) AS weighted_x
FROM (SELECT x FROM UNNEST([1, 2, 3, 4]) AS x);
As another example, consider this query, which uses multiple names subqueries:
WITH T AS (
SELECT x FROM UNNEST([1, 2, 3, 4]) AS x
),
TPlusOne AS (
SELECT x + 1 AS y
FROM T
),
TPlusOneTimesTwo AS (
SELECT y * 2 AS z
FROM TPlusOne
)
SELECT z
FROM TPlusOneTimesTwo;
This query defines a sequence of transformations of the original data, followed
by a SELECT statement over TPlusOneTimesTwo. This query is equivalent to the
following query, which inlines the computations:
SELECT (x + 1) * 2 AS z
FROM (SELECT x FROM UNNEST([1, 2, 3, 4]) AS x);
For more information, see the
WITH clause topic in the
documentation.
Composability using SQL functions
Standard SQL supports user-defined SQL functions. You can use user-defined SQL functions to define common expressions and then reference them from the query. For example:
-- Computes the harmonic mean of the elements in 'arr'.
-- The harmonic mean of x_1, x_2, ..., x_n can be expressed as:
-- n / ((1 / x_1) + (1 / x_2) + ... + (1 / x_n))
CREATE TEMPORARY FUNCTION HarmonicMean(arr ARRAY<FLOAT64>) AS
(
ARRAY_LENGTH(arr) / (SELECT SUM(1 / x) FROM UNNEST(arr) AS x)
);
WITH T AS (
SELECT GENERATE_ARRAY(1.0, x * 4, x) AS arr
FROM UNNEST([1, 2, 3, 4, 5]) AS x
)
SELECT arr, HarmonicMean(arr) AS h_mean
FROM T;
This query defines a SQL function named HarmonicMean and then applies it to
the array column arr from T.
Subqueries in more places
Standard SQL supports subqueries in the SELECT list, WHERE clause, and
anywhere else in the query that expects an expression. For example, consider the
following standard SQL query that computes the fraction of warm days in Seattle
in 2015:
WITH SeattleWeather AS (
SELECT *
FROM `bigquery-public-data.noaa_gsod.gsod2015`
WHERE stn = '994014'
)
SELECT
COUNTIF(max >= 70) /
(SELECT COUNT(*) FROM SeattleWeather) AS warm_days_fraction
FROM SeattleWeather;
The Seattle weather station has an ID of '994014'. The query computes the
number of warm days based on those where the temperature reached 70 degrees
Fahrenheit, or approximately 21 degrees Celsius, divided by the total number of
recorded days for that station in 2015.
Correlated subqueries
In standard SQL, subqueries can reference correlated columns; that is, columns that originate from the outer query. For example, consider the following standard SQL query:
WITH WashingtonStations AS (
SELECT weather.stn AS station_id, ANY_VALUE(station.name) AS name
FROM `bigquery-public-data.noaa_gsod.stations` AS station
INNER JOIN `bigquery-public-data.noaa_gsod.gsod2015` AS weather
ON station.usaf = weather.stn
WHERE station.state = 'WA' AND station.usaf != '999999'
GROUP BY station_id
)
SELECT washington_stations.name,
(SELECT COUNT(*)
FROM `bigquery-public-data.noaa_gsod.gsod2015` AS weather
WHERE washington_stations.station_id = weather.stn
AND max >= 70) AS warm_days
FROM WashingtonStations AS washington_stations
ORDER BY warm_days DESC;
This query computes the names of weather stations in Washington state and the
number of days in 2015 that the temperature reached 70 degrees Fahrenheit, or
approximately 21 degrees Celsius. Notice that there is a subquery in the
SELECT list, and that the subquery references washington_stations.station_id
from the outer scope, namely FROM WashingtonStations AS washington_stations.
Arrays and structs
ARRAY and STRUCT are powerful concepts in standard SQL. As an example that
uses both, consider the following query, which computes the top two articles
for each day in the HackerNews dataset:
WITH TitlesAndScores AS (
SELECT
ARRAY_AGG(STRUCT(title, score)) AS titles,
EXTRACT(DATE FROM time_ts) AS date
FROM `bigquery-public-data.hacker_news.stories`
WHERE score IS NOT NULL AND title IS NOT NULL
GROUP BY date)
SELECT date,
ARRAY(SELECT AS STRUCT title, score
FROM UNNEST(titles)
ORDER BY score DESC
LIMIT 2)
AS top_articles
FROM TitlesAndScores
ORDER BY date DESC;
The WITH clause defines TitlesAndScores, which contains two columns. The
first is an array of structs, where one field is an article title and the second
is a score. The ARRAY_AGG expression returns an array of these structs for
each day.
The SELECT statement following the WITH clause uses an ARRAY subquery to
sort and return the top two articles within each array in accordance with the
score, then returns the results in descending order by date.
For more information on arrays and ARRAY subqueries, see the
Working with Arrays topic. See also the
references for arrays and
structs.