Introduction
Wildcard tables enable you to query multiple tables using concise SQL statements. Wildcard tables are available only in standard SQL. For equivalent functionality in legacy SQL, see Table Wildcard Functions.
A wildcard table represents a union of all the tables that match the wildcard
expression. For example, the following FROM clause uses the wildcard
expression gsod* to match all tables in the noaa_gsod dataset that begin
with the string gsod.
FROM
`bigquery-public-data.noaa_gsod.gsod*`
Each row in the wildcard table contains a special column that contains the value matched by the wildcard character.
For instructions and sample queries that use wildcards tables, see Querying Multiple Tables Using a Wildcard Table.
When to use wildcard tables
Wildcard tables are useful when a dataset contains multiple, similarly named tables that have compatible schemas. Typically, such datasets contain tables that each represent data from a single day, month, or year. For example, a public dataset hosted by BigQuery, the NOAA Global Surface Summary of the Day Weather Data, contains a table for each year from 1929 through the present.
A query that scans all the table IDs from 1929 through 1940 would be very long
if you have to name all 12 tables in the FROM clause (most of the tables are
omitted in this sample):
#standardSQL
SELECT
max,
ROUND((max-32)*5/9,1) celsius,
mo,
da,
year
FROM (
SELECT
*
FROM
`bigquery-public-data.noaa_gsod.gsod1929` UNION ALL
SELECT
*
FROM
`bigquery-public-data.noaa_gsod.gsod1930` UNION ALL
SELECT
*
FROM
`bigquery-public-data.noaa_gsod.gsod1931` UNION ALL
# ... Tables omitted for brevity
SELECT
*
FROM
`bigquery-public-data.noaa_gsod.gsod1940` )
WHERE
max != 9999.9 # code for missing data
ORDER BY
max DESC
The same query using a wildcard table is much more concise:
#standardSQL SELECT max, ROUND((max-32)*5/9,1) celsius, mo, da, year FROM `bigquery-public-data.noaa_gsod.gsod19*` WHERE max != 9999.9 # code for missing data AND _TABLE_SUFFIX BETWEEN '29' AND '40' ORDER BY max DESC
Wildcard table syntax
Wildcard table syntax:
SELECT FROM `<project-id>.<dataset-id>.<table-prefix>*` WHERE bool_expression
- <project-id>
- Cloud Platform project ID. Optional if you use your default project ID.
- <dataset-id>
- BigQuery dataset ID.
- <table-prefix>
- A string that is common across all tables that are matched by the wildcard character. The table prefix is optional. Omitting the table prefix matches all tables in the dataset. See Best practices for information about omitting the table prefix.
- * (wildcard character)
- The wildcard character, "*", represents one more characters of a table name. The wildcard character can appear only as the final character of a wildcard table name.
The wildcard table WHERE clause also supports the _TABLE_SUFFIX pseudo
column, which contains values matched by the wildcard character. You can use
_TABLE_SUFFIX to filter your query using a
comparison operator.
For example, the following WHERE clauses use comparison operators:
WHERE
_TABLE_SUFFIX BETWEEN '29' AND '40'
WHERE
_TABLE_SUFFIX = '1929'
WHERE
_TABLE_SUFFIX < '1941'
For more information about the _TABLE_SUFFIX pseudo column, see
Filtering selected tables using _TABLE_SUFFIX.
Enclose table names with wildcards in backticks
The wildcard table name contains the special character (*), which means that you must enclose the wildcard table name in backtick (`) characters. For example, the following query is valid because it uses backticks:
#standardSQL /* Valid standard SQL query */ SELECT max FROM `bigquery-public-data.noaa_gsod.gsod*` WHERE max != 9999.9 # code for missing data AND _TABLE_SUFFIX = '1929' ORDER BY max DESC
The following query is NOT valid because it isn't properly quoted with backticks:
#standardSQL /* Syntax error: Expected end of statement but got "-" at [4:11] */ SELECT max FROM # missing backticks bigquery-public-data.noaa_gsod.gsod* WHERE max != 9999.9 # code for missing data AND _TABLE_SUFFIX = '1929' ORDER BY max DESC
Quotation marks do not work:
#standardSQL /* Syntax error: Unexpected string literal: 'bigquery-public-data.noaa_gsod.gsod*' at [4:3] */ SELECT max FROM # quotes are not backticks 'bigquery-public-data.noaa_gsod.gsod*' WHERE max != 9999.9 # code for missing data AND _TABLE_SUFFIX = '1929' ORDER BY max DESC
Best practices
Longer prefixes generally perform better than shorter prefixes. For example,
the following query uses a long prefix (gsod200):
#standardSQL SELECT max FROM `bigquery-public-data.noaa_gsod.gsod200*` WHERE max != 9999.9 # code for missing data AND _TABLE_SUFFIX BETWEEN '0' AND '1' ORDER BY max DESC
The following query generally performs worse because it uses an empty prefix:
#standardSQL SELECT max FROM `bigquery-public-data.noaa_gsod.*` WHERE max != 9999.9 # code for missing data AND _TABLE_SUFFIX BETWEEN 'gsod2000' AND 'gsod2001' ORDER BY max DESC