Before you begin
- Ensure that you are using BigQuery standard SQL. For more information, see Enabling Standard SQL.
- If you are using legacy SQL, see Table Wildcard Functions.
- Many of the examples on this page use a public dataset from the National Oceanic and Atmospheric Administration (NOAA). For more information about the data, see NOAA Global Surface Summary of the Day Weather Data.
Querying sets of tables using wildcard tables
Wildcard tables enable you to query several tables concisely.
For example, the National Oceanic and Atmospheric Administration (NOAA)
GSOD weather dataset
contains annual tables that all share the common prefix gsod followed by the
four-digit year. The tables are named gsod1929, gsod1930, gsod1931,
and so on.
To query a group of tables that share a common prefix, use the table wildcard
symbol (*) after the table prefix in your FROM statement. For example,
the following query finds the maximum temperature reported during the 1940s:
SELECT
max,
ROUND((max-32)*5/9,1) celsius,
mo,
da,
year
FROM
`bigquery-public-data.noaa_gsod.gsod194*`
WHERE
max != 9999.9 # code for missing data
ORDER BY
max DESC
Filtering selected tables using _TABLE_SUFFIX
To restrict the query so that it scans an arbitrary set of tables, use the
_TABLE_SUFFIX pseudo column in the WHERE clause. The _TABLE_SUFFIX pseudo
column contains the values matched by the table wildcard. For example, the
previous sample query that scans all tables from the 1940s uses a table
wildcard to represent the last digit of the year:
FROM
`bigquery-public-data.noaa_gsod.gsod194*`
The corresponding _TABLE_SUFFIX pseudo column contains values in the range
0 through 9, representing the tables gsod1940 through gsod1949. These
_TABLE_SUFFIX values can be used in the WHERE clause to filter for specific
tables.
For example, to filter for the maximum temperature in the years 1940 and 1944,
use the values 0 and 4 for _TABLE_SUFFIX:
SELECT
max,
ROUND((max-32)*5/9,1) celsius,
mo,
da,
year
FROM
`bigquery-public-data.noaa_gsod.gsod194*`
WHERE
max != 9999.9 # code for missing data
AND ( _TABLE_SUFFIX = '0'
OR _TABLE_SUFFIX = '4' )
ORDER BY
max DESC
Using _TABLE_SUFFIX can greatly reduce the number of bytes scanned, which
reduces the cost of running your queries.
Scanning a range of tables using _TABLE_SUFFIX
To scan a range of tables, use the _TABLE_SUFFIX pseudo column along with
the BETWEEN clause. For example, to find the maximum temperature reported in
the years between 1929 and 1935 inclusive, use the table wildcard to represent
the last two digits of the year:
SELECT
max,
ROUND((max-32)*5/9,1) celsius,
mo,
da,
year
FROM
`bigquery-public-data.noaa_gsod.gsod19*`
WHERE
max != 9999.9 # code for missing data
AND _TABLE_SUFFIX BETWEEN '29' and '35'
ORDER BY
max DESC
Querying all tables in a dataset
To scan all tables in a dataset, you can use an empty prefix and the table
wildcard, which means that the _TABLE_SUFFIX pseudo column contains
full table names. For example, the following FROM clause scans all tables in
the GSOD dataset:
FROM
`bigquery-public-data.noaa_gsod.*`
With an empty prefix, the _TABLE_SUFFIX pseudo column contains full table
names. For example, the following query is equivalent to the previous example
that finds the maximum temperature between the years 1929 and 1935, but uses
full table names in the WHERE clause:
SELECT
max,
ROUND((max-32)*5/9,1) celsius,
mo,
da,
year
FROM
`bigquery-public-data.noaa_gsod.*`
WHERE
max != 9999.9 # code for missing data
AND _TABLE_SUFFIX BETWEEN 'gsod1929' and 'gsod1935'
ORDER BY
max DESC
Note, however, that longer prefixes generally perform better. For more information, see Best practices.
Migrating legacy SQL table wildcard functions
In legacy SQL, you can use the following table wildcard functions to query multiple tables.
TABLE_DATE_RANGE()andTABLE_DATE_RANGE_STRICT()TABLE_QUERY()
The TABLE_DATE_RANGE() functions
The legacy SQL TABLE_DATE_RANGE() functions work on tables that conform to a
specific naming scheme: <prefix>YYYYMMDD, where the <prefix> represents the
first part of a table name and YYYYMMDD represents the date associated with
that table's data.
For example, the following legacy SQL query finds the average temperature from a set of daily tables that contain Seattle area weather data:
# Legacy SQL query
SELECT
ROUND(AVG(TemperatureF),1) AS AVG_TEMP_F
FROM
TABLE_DATE_RANGE([mydataset.sea_weather_],
TIMESTAMP("2016-05-01"),
TIMESTAMP("2016-05-09"))
In standard SQL, an equivalent query uses a table wildcard and the BETWEEN
clause.
# Standard SQL query
SELECT
ROUND(AVG(TemperatureF),1) AS AVG_TEMP_F
FROM
`mydataset.sea_weather_*`
WHERE
_TABLE_SUFFIX BETWEEN '20160501' AND '20160509'
The TABLE_QUERY() function
The legacy SQL TABLE_QUERY() function enables you to find table names based
on patterns. When migrating a TABLE_QUERY() function to standard SQL, which
does not support the TABLE_QUERY() function, you can instead filter using
the _TABLE_SUFFIX pseudo column. Keep the following differences in mind when
migrating:
-
In legacy SQL, you place the
TABLE_QUERY()function in theFROMclause, whereas in standard SQL, you filter using the_TABLE_SUFFIXpseudo column in theWHEREclause. -
In legacy SQL, the
TABLE_QUERY()function operates on the entire table name (ortable_id), whereas in standard SQL, the_TABLE_SUFFIXpseudo column contains part or all of the table name, depending on how you use the wildcard character.
Filter in the WHERE clause
When migrating from legacy SQL to standard SQL, move the filter to the WHERE
clause. For example, the following query finds the maximum temperatures across
all years that end in the number 0:
# Legacy SQL query
SELECT
max,
ROUND((max-32)*5/9,1) celsius,
year
FROM
TABLE_QUERY([bigquery-public-data:noaa_gsod],
'REGEXP_MATCH(table_id, r"0$")')
WHERE
max != 9999.9 # code for missing data
AND max > 100 # to improve ORDER BY performance
ORDER BY
max DESC
In standard SQL, an equivalent query uses a table wildcard and places the
regular expression function, REGEXP_CONTAINS(), in the WHERE clause:
# Standard SQL query
SELECT
max,
ROUND((max-32)*5/9,1) celsius,
year
FROM
`bigquery-public-data.noaa_gsod.gsod*`
WHERE
max != 9999.9 # code for missing data
AND max > 100 # to improve ORDER BY performance
AND REGEXP_CONTAINS(_TABLE_SUFFIX, r"0$")
ORDER BY
max DESC
Differences between table_id and _TABLE_SUFFIX
In the legacy SQL TABLE_QUERY(dataset, expr) function, the second parameter
is an expression that operates over the entire table name, using the value
table_id. When migrating to standard SQL, the filter that you create in the
WHERE clause operates on the value of _TABLE_SUFFIX, which can include part
or all of the table name, depending on your use of the wildcard character.
For example, the following legacy SQL query uses the entire table name in a
regular expression to find the maximum temperatures across all years that end
in the number 0:
# Legacy SQL query
SELECT
max,
ROUND((max-32)*5/9,1) celsius,
year
FROM
TABLE_QUERY([bigquery-public-data:noaa_gsod],
'REGEXP_MATCH(table_id, r"gsod\d{3}0")')
WHERE
max != 9999.9 # code for missing data
AND max > 100 # to improve ORDER BY performance
ORDER BY
max DESC
In standard SQL, an equivalent query can use the entire table name or only a part of the table name. You can use an empty prefix in standard SQL so that your filter operates over the entire table name:
# Standard SQL empty prefix
FROM
`bigquery-public-data.noaa_gsod.*`
However, longer prefixes perform better than empty prefixes, so the following
example uses a longer prefix, which means that the value of _TABLE_SUFFIX
is only part of the table name.
# Standard SQL query
SELECT
max,
ROUND((max-32)*5/9,1) celsius,
year
FROM
`bigquery-public-data.noaa_gsod.gsod*`
WHERE
max != 9999.9 # code for missing data
AND max > 100 # to improve ORDER BY performance
AND REGEXP_CONTAINS(_TABLE_SUFFIX, r"\d{3}0")
ORDER BY
max DESC
Query execution details
Schema used for query evaluation
In order to execute a standard SQL query that uses a wildcard table, BigQuery automatically infers the schema for that table. BigQuery uses the schema for the most recently created table that matches the wildcard as the schema for the wildcard table. If the schema is inconsistent across tables matched by the wildcard table, BigQuery returns an error.
Wildcard tables do not support views
The wildcard table does not support views. The query returns an error if the wildcard table matches any views in the dataset.
Next steps
- For conceptual information about standard SQL wildcard tables, see Wildcard Tables.
- For more information about standard SQL, see the standard SQL Query Reference.