bq is a Python-based tool that accesses BigQuery from the command line.
Installation
General usage
Flag positioning
bq supports two kinds of flags—common flags and command flags—and they must be used in the order shown here:
bq --common_flags <bq_command> --command-specific_flags <command_arguments>
- Common flags are flags that apply to all commands, such as --project_id or --apilog. Call bq --helpshort for a list of often-used common flags.
- Command-specific flags are any flags that apply to a specific command: for example, -p, -j, and -d for the ls command (to list projects, jobs, and datasets respectively).
Placing common flags after the command will usually cause the command to fail.
Example: bq --format=json ls -d
Help and debugging
- For a full list of commands, run bq help
- For help on a specific command, run bq help <command>
- To see the actual requests sent and received, which is useful for debugging your applications, add the flag --apilog=<file> to save a log of operations to <file>. Using - instead of a file path will print the log to the console. bq works by making standard REST-based API calls, which can be useful to see. It is also useful to attach this log when reporting issues.
- To help troubleshoot errors, use the --format=prettyjson flag when getting the job status. Using this flag outputs the error object, including the reason property, which you can use to look up troubleshooting steps at troubleshooting errors. For example, bq --format=prettyjson show -j <job id>
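For example, to send the API log for a single listing command to the console instead of a file (the ls -d command here is just an illustration):
bq --apilog=- ls -d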
Setting default values for common flags
The first time you run bq, you will be prompted for a default project ID. This is stored in your .bigqueryrc file.
You can update this value, or add default values for other common flags, by modifying your
.bigqueryrc file. Add new default values, one per line,
as flagname=flagvalue. You can only set common flag defaults
in this file; adding command-specific flags will cause an error when you
run bq. This file is read every time you run bq in single
usage mode, so changes take effect immediately. However, when you run
bq in interactive mode (bq shell),
you must restart the shell before changes are picked up.
cat ~/.bigqueryrc
Returns:
project_id = 1092187650
dataset_id = mydataset
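You can append another common-flag default in the same way. For example, to make prettyjson the default output format (shown here only as an illustration of the flagname=flagvalue convention):
echo format=prettyjson >> ~/.bigqueryrc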
Running asynchronous operations
The bq tool runs asynchronous operations by displaying an updating status counter on the command line while it waits. If a command is going to take a very long time, you can instead ask for the job ID when starting the job and get control back immediately, and then later ask bq to hook into that running job. This frees up your command line for other work. Note that this is useful only for potentially long-running jobs such as queries or loads.
To start an asynchronous job and ask for the ID, call your bq command with the --nosync flag, which will return the job ID. To get the job results (or print out job status), pass this job ID into the command bq wait <job_id>.
The following example starts an asynchronous load job, lists the datasets in the current project, and later makes a call to get the job results.
$ bq --nosync load mydataset.names2010 gs://bigquerybucket/yob2010.txt name:string,gender:string,count:integer
Successfully started load job job_f0afc92f7a9043849e5cae7b2e7ea2f9

$ bq ls
   datasetId
 -------------
  olddataset
  mydataset

$ bq wait job_f0afc92f7a9043849e5cae7b2e7ea2f9
Running bq in interactive mode
You can run bq in an interactive mode, where you don't need to prefix the commands with "bq". To start interactive mode, call bq shell. The prompt is the ID of the default project. To exit interactive mode, type "exit".
bq shell
Welcome to BigQuery! (Type help for more information.)
10000000021> ls
   datasetId
 -------------
  mydataset
10000000021> exit
Goodbye.
Useful common flags
Common flags are used between bq and the command. For a full list of flags, call bq --help.
Here are some of the most useful flags:
- --apilog - Turn on logging of all server requests and responses. If no string is provided (--apilog=), log to stdout; if a string is provided, log to that file instead (--apilog=filename).
- --format [none|json|prettyjson|csv|sparse|pretty] - The output format.
Listing projects, datasets, and tables
To list objects, use the bq ls command with the following syntax:
bq ls [<project_id>:][<dataset_id>]
<project_id> and <dataset_id> are optional if you have those values defined in your .bigqueryrc file; however, values that you pass to bq ls override any values defined in .bigqueryrc. A project ID is not the friendly name; a project ID is usually just a string of digits.
Working with projects
List projects
To list all projects:
bq ls -p
Set a default project
When you first run bq it will ask you for a default project ID and save that in the .bigqueryrc file. You can edit this file to change the value of your default project.
See your default project
Run cat ~/.bigqueryrc to see your default values. Alternatively, when you run in interactive mode, the prompt lists the default project ID.
Working with datasets
Create a dataset
Use the bq mk command to create a dataset:
bq mk [DATASET_ID]
Where [DATASET_ID] is a valid dataset ID. A valid dataset ID:
- Contains only letters, numbers, or underscores [a-zA-Z0-9_]
- Is unique within its project
Dataset IDs are case-sensitive. For example, my_dataset and
My_Dataset are different IDs that can coexist in the same project.
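For example, to create a dataset with the hypothetical ID babynames_data:
bq mk babynames_data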
Set a default dataset
You can specify a default dataset, so that you don't have to qualify your tables or other actions with a dataset ID when using your default dataset. To do so, add the following line to your .bigqueryrc file:
dataset_id=[DATASET_ID]
A quick way to do this (replace [DATASET_ID] with your dataset ID):
echo dataset_id=[DATASET_ID] >> ~/.bigqueryrc
List datasets
The syntax differs, depending on whether you have a default project or dataset defined:
# List datasets in the default project:
bq ls

# List datasets in another project:
bq ls [PROJECT_ID]:

# List datasets when you have a default dataset defined:
bq ls -d
# or:
bq ls :
Working with tables
Get table information
bq show <project_id>:<dataset_id>.<table_id>
Example
bq show publicdata:samples.shakespeare
tableId Last modified Schema
------------- ----------------- ------------------------------------
shakespeare 01 Sep 13:46:28 |- word: string (required)
|- word_count: integer (required)
|- corpus: string (required)
|- corpus_date: integer (required)
Previewing table data
bq head [-n <rows>] <project_id>:<dataset_id>.<table_id>
Example
bq head -n 10 publicdata:samples.shakespeare
+--------------+------------+--------------+-------------+
|     word     | word_count |    corpus    | corpus_date |
+--------------+------------+--------------+-------------+
| brave        |          6 | 1kinghenryiv |        1597 |
| profession   |          1 | 1kinghenryiv |        1597 |
| treason      |          2 | 1kinghenryiv |        1597 |
| Ned          |          9 | 1kinghenryiv |        1597 |
| answered     |          1 | 1kinghenryiv |        1597 |
| Perceived    |          1 | 1kinghenryiv |        1597 |
| 'You         |          1 | 1kinghenryiv |        1597 |
| degenerate   |          1 | 1kinghenryiv |        1597 |
| neighbouring |          1 | 1kinghenryiv |        1597 |
| grandam      |          1 | 1kinghenryiv |        1597 |
+--------------+------------+--------------+-------------+
Note that this operation is only intended for previewing the contents of a table and is not an efficient way to extract a large portion of a table. If not explicitly specified, the command defaults to returning 100 rows.
List tables
The syntax differs, depending on whether you have a default project or dataset defined:
# In the default project:
bq ls dataset_id

# In the default project and dataset:
bq ls

# In another project or dataset:
bq ls [project_id:][dataset_id]
Create a table from a file
You can load the following file formats to create a table or append data to an existing table:
- Uncompressed CSV, JSON, or Avro files from Google Cloud Storage
- Compressed (gzipped) CSV or JSON files from Google Cloud Storage
- Cloud Datastore Backup files from Google Cloud Storage
- Uncompressed CSV, JSON, or Avro files from your disk
Schema files cannot be loaded from Google Cloud Storage. If you are creating a table using both a data file and a schema file, the schema file must be local.
You can create a new table by creating a new schema and then loading the files in separate calls, or you can combine both actions into a single call with the following syntax:
bq load <destination_table> <data_source_uri> <table_schema>
destination_table - The fully-qualified name of the table to create, or to append to if the table already exists.
data_source_uri - The source data file used to populate the table. Note that this can be an uncompressed local file or, alternatively, a fully-qualified Google Cloud Storage URI referring to an uncompressed or gzipped file, in the format gs://bucket/file. For example, the following are all valid: my_file.csv, gs://[BUCKET_NAME]/my_file.csv.gz, or my_file.csv.gz. You can load multiple files into a table by specifying a comma-separated list of files or by using a single wildcard character (*). The wildcard character cannot be used as part of the bucket name.
table_schema - A description of the table schema to use. This can be either a local file name or a comma-separated list of column_name:datatype pairs. We will use the comma-separated list for this example. Try using the following schema descriptor for your table:
name:string,gender:string,count:integer
where "name", "gender", and "count" are labels that are assigned to the columns in the new table. If you would like, you can also specify your schema in a separate file and provide that file as the table schema. Your schema file must contain a single array object with entries that provide the following properties:
"name": Name of the column"type": Type of data, e.g.string. For a full list of allowed data types, see BigQuery data types."mode"(optional): Whether this field can be null
A sample schema file might look like the following:
[ {"name": "name", "type": "string", "mode": "required"}, {"name": "gender", "type": "string", "mode": "nullable"}, {"name": "count", "type": "integer", "mode": "required"} ]
See the bq command-line tool quickstart for a detailed walkthrough of creating and populating a table using bq.
When using the bq load command, you can specify the following optional flags:
--source_format
- Type: string
- Description: The source file type. This can be either JSON, CSV, Avro, or Cloud Datastore Backup files. The default value is CSV.
- Valid Values: CSV, NEWLINE_DELIMITED_JSON, AVRO, DATASTORE_BACKUP
- Usage:
bq load [--source_format=NEWLINE_DELIMITED_JSON|CSV|AVRO|DATASTORE_BACKUP] <destination_table> <data_source_uri> [<table_schema>]
--field_delimiter, -F
- Type: string
- Description: The character that indicates the boundary between columns in the input file. By default, this is a comma.
- Valid Values: BigQuery converts the delimiter string to ISO-8859-1 encoding, and then uses the first byte of the encoded string to split the data in its raw, binary state. BigQuery also supports the escape sequence "\t" to specify a tab separator. The default value is a comma (',').
- Usage:
bq load -F '|' <destination_table> <data_source_uri> [<table_schema>]
bq load --field_delimiter='|' <destination_table> <data_source_uri> [<table_schema>]
--encoding, -E
- Type: string
- Description: The character encoding used by the input file.
- Valid Values: UTF-8, ISO-8859-1
- Usage:
bq load -E ISO-8859-1 <destination_table> <data_source_uri> [<table_schema>]
bq load --encoding=UTF-8 <destination_table> <data_source_uri> [<table_schema>]
--max_bad_records
- Type: integer
- Description: The maximum number of bad rows to skip before the load job is aborted and no updates are performed. If this value is larger than 0, the job will succeed as long as the number of bad records does not exceed this value. This is useful if you would like to load files that may have bad records. The default value for this parameter is 0 (all rows are required to be valid).
- Valid Values: Any integer
- Usage:
bq load --max_bad_records=3 <destination_table> <data_source_uri> [<table_schema>]
--skip_leading_rows
- Type: integer
- Description: Skips the specified number of rows at the top of the source file. This is useful for skipping header rows in your source CSV file. The default value for this parameter is 0 (all rows are considered data rows).
- Valid Values: Any integer
- Usage:
bq load --skip_leading_rows=1 <destination_table> <data_source_uri> [<table_schema>]
--[no]autodetect
- Type: boolean
- Description: Enable autodetection of schema and options for formats, such as CSV and JSON, that are not self-describing. The default value is --noautodetect.
- Valid Values: --autodetect, --noautodetect
- Usage:
bq load --autodetect <destination_table> <data_source_uri>
Character encodings
By default, the BigQuery service expects all source data to be UTF-8 encoded. If you have CSV files with data encoded in ISO-8859-1 format, you should explicitly specify the encoding when you import your data so that BigQuery can properly convert your data to UTF-8 during the import process. Currently, it is only possible to import data that is ISO-8859-1 or UTF-8 encoded. Keep in mind the following when you specify the character encoding of your data:
- If you don't specify an encoding, or explicitly specify that your data is UTF-8 but then provide a CSV file that is not UTF-8 encoded, BigQuery attempts to convert your CSV file to UTF-8.
Generally, your data will be imported successfully but may not match byte-for-byte what you expect. To avoid this, specify the correct encoding and try your import again.
- Delimiters must be encoded as ISO-8859-1.
Generally, it is best practice to use a standard delimiter, such as a tab, pipe, or comma.
- If BigQuery cannot convert a character, it is converted to the standard Unicode replacement character: �.
- JSON files must always be encoded in UTF-8.
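As an illustration, the following hypothetical command combines the encoding and delimiter flags described above to load a pipe-delimited, ISO-8859-1 encoded CSV file (table, bucket, and file names are placeholders):
bq load -E ISO-8859-1 -F '|' mydataset.mytable gs://[BUCKET_NAME]/latin1_data.csv name:string,value:integer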
Copy an existing table
To copy a table, run the bq cp command:
bq cp <source_table> <destination_table>
For example, a sample copy command might look like the following:
bq cp dataset.mytable dataset2.mynewtable
You can also copy tables across different projects by specifying the project id in the source and destination path:
bq cp 123456789123:dataset.mytable 0987654321098:dataset2.mynewtable
Note that when you perform a table copy, you must specify a unique destination table. For example, if you already have a table called mynewtable in dataset2, the above command will fail and the command-line tool will throw an exception. If you would like to append to or overwrite an existing table, you can programmatically start a copy job that appends to or overwrites the existing table using the API.
Create a table from a query
Run a query and specify the --destination_table flag:
bq query --destination_table=mydataset.happyhalloween "SELECT name,count FROM mydataset.babynames WHERE gender = 'M' ORDER BY count DESC LIMIT 6"
Create an empty table
Create an empty table by using the bq mk command with the
-t flag.
bq mk --schema name:string,value:integer -t mydataset.newtable
Where mydataset is the name of an existing dataset, and
newtable is the name of the new table. This example provides
a comma-separated list of fields for the --schema value.
Alternatively, you can store your schema in a file and supply that file's name
as the --schema value.
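For example, assuming a local schema file named schema.json in the format shown earlier (a hypothetical file name):
bq mk -t --schema ./schema.json mydataset.newtable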
Working with queries
Run a query
Run the following command to run a query:
bq query <query_string>
- The query string must be contained within double quote characters.
- A trailing semicolon (;) is not required.
- You do not need to specify the project ID or dataset ID if you have defined default values.
- You can use the common --format flag to specify other output formats, such as JSON or CSV (see the CSV example following the query walkthrough below).
- If a query will take a long time to run, you can run it asynchronously in the command line.
- To save the results to a table, use the --destination_table flag.
The following examples use a public dataset of baby names that is described in the Create a new table section of the command line tool quickstart page.
bq ls -d
   datasetId
 -------------
  mydataset
bq ls :mydataset
tableId
---------------
babynames
bq query "SELECT name,count FROM mydataset.babynames WHERE gender = 'M' ORDER BY count DESC LIMIT 6"
Waiting on job_a4e77f793e7b4d5bbc1fd69244d9e792 ... (0s) Current status: DONE
+----------+-------+
|   name   | COUNT |
+----------+-------+
| Zachary  | 22731 |
| Alfred   | 20477 |
| Gregory  | 17179 |
| Ned      | 16860 |
| Ulrich   | 15300 |
| Thomas   | 14995 |
+----------+-------+
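The common --format flag mentioned above can also be combined with query. For example, to return the same result as CSV (illustrative only):
bq --format=csv query "SELECT name,count FROM mydataset.babynames WHERE gender = 'M' ORDER BY count DESC LIMIT 6"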
You can also run batch queries by using the --batch flag.
The following example shows how to start an asynchronous batch query:
bq --nosync query --batch "SELECT name,count FROM mydataset.babynames WHERE gender = 'M' ORDER BY count DESC LIMIT 6"
Enabling standard SQL
To enable standard SQL for a query, set the --use_legacy_sql
flag to false. For example, the following query runs using
standard SQL:
bq query --use_legacy_sql=false "SELECT word FROM publicdata.samples.shakespeare"
Working with query results
Allow large results
If your query results exceed the maximum response size, you can allow large results and write the result set to a destination table. Use the --allow_large_results flag with the --destination_table flag to create a destination table to hold the large results set:
bq query --destination_table '[DATASET].[TABLE_NAME]' --allow_large_results "[QUERY]"
For more information on allowing large results, see: Returning large query results.
Flatten results
If you need to flatten the results set when you query nested and repeated data, use the
--flatten_results flag:
bq query --flatten_results "[QUERY]"
For more information on flattening results, see: FLATTEN.