This page explains how to use the bq command-line tool to run queries, load load data, and export data.
Before you begin
Before you begin this quickstart, use the Google Cloud Platform Console to create or select a project and enable billing. You also need to install the Google Cloud SDK.
-
Sign in to your Google account.
If you don't already have one, sign up for a new account.
- Select or create a Cloud Platform project.
- Enable billing for your project.
- Install and initialize the Cloud SDK.
- BigQuery is automatically enabled in new projects. To activate BigQuery in a pre-existing project, go to Enable the BigQuery API.
Examine a table
BigQuery offers a number of sample tables that you can run queries against. In this quickstart, you'll run some queries against the Shakespeare table, which contains an entry for every word in every play.
To examine the schema of a specific table, run
bq show projectId:datasetId.tableId
where the project and dataset IDs can be omitted if they are the default values
for your bq tool. The following example examines the shakespeare table in the
samples dataset:
bq show publicdata:samples.shakespeare
Output:
Table publicdata:samples.shakespeare
Last modified Schema Total Rows Total Bytes Expiration
----------------- ------------------------------------ ------------ ------------- ------------
26 Aug 14:43:49 |- word: string (required) 164656 6432064
|- word_count: integer (required)
|- corpus: string (required)
|- corpus_date: integer (required)
Run the help command
Use bq help to get detailed information about the bq command-line tool.
bq help
Include a command name to get information about a specific command. For example,
the following call to bq help retrieves information about the query
command.
bq help query
Run a query
Run a query to see how many times the substring "raisin" appears in Shakespeare's works.
To run a query, run the command bq query "sql_statement".
-
Escape any quotation marks inside the
sql_statementwith a\mark, or -
Use a different quotation mark type than the surrounding marks (
"versus').
The following example does a word count for the number of times that the
substring "raisin" appears in all of Shakespeare's works. The query shown is
case-sensitive; string comparisons are case-sensitive, unless you use the SQL
feature IGNORE CASE.
bq query "SELECT word, COUNT(word) as count FROM publicdata:samples.shakespeare WHERE word CONTAINS 'raisin' GROUP BY word"
Output:
Waiting on job_dcda37c0bbed4c669b04dfd567859b90 ... (0s) Current status: DONE +---------------+-------+ | word | count | +---------------+-------+ | Praising | 4 | | raising | 5 | | raisins | 1 | | praising | 7 | | dispraising | 2 | | dispraisingly | 1 | +---------------+-------+
If you search for a word that isn't in Shakespeare's works, no results are returned. For example, the following search for "huzzah" returns no matches.
bq query "SELECT word FROM publicdata:samples.shakespeare WHERE word = 'huzzah' IGNORE CASE"
Output:
Waiting on job_e19 ... (4s) Current status: DONE $
Create a new table
Now create your own table. Every table must live inside a dataset, which is simply a group of tables. A dataset is assigned to a single project.
Step 1: Download custom data
The custom data contains approximately 7 MB of data about popular baby names, and is provided by the US Social Security Administration.
-
Unzip the file onto your hard drive.
The zip file contains a read me file that describes the dataset schema. Learn more about the dataset.
-
Open the file named
yob2010.txtto see what it looks like. The file is a comma-separated value (CSV) file with the following three columns: name, sex (MorF), and number of children with that name. The file has no header row. -
Copy or move the
yob2010.txtfile into the directory you are using to run bq commands.
Step 2: Create a new dataset
-
Use the
bq lscommand to see whether your default project has any existing datasets.bq ls
Sample output:
datasetId ------------- olddataset
-
Run
bq lsagain to list the datasets in a specific project by including the project ID followed by a colon (:). The following example lists the datasets in thepublicdataproject.bq ls publicdata:
Output:
datasetId ----------- samples
-
Use the
bq mkcommand to create a new dataset namedbabynamesin your default project. A dataset name can be up to 1,024 characters long, and consist of A-Z, a-z, 0-9, and the underscore, but it cannot start with a number or underscore, or have spaces.bq mk babynames
Sample output:
Dataset 'myprojectid:babynames' successfully created.
-
Run
bq lsto confirm that the dataset now appears as part of the default project:bq ls
Sample output:
datasetId ------------- olddataset babynames
Step 3: Upload the table
The bq load command creates or updates a table and loads data in a single step.
-
Run the
bq loadcommand to load your source file into a new table callednames2010in thebabynamesdataset you created above. By default, this runs synchronously, and will take a few seconds to complete.bq load babynames.names2010 yob2010.txt name:string,gender:string,count:integer
The
bq loadcommand arguments:- datasetID:
babynames - tableID:
names2010 - source:
yob2010.txt - schema:
name:string,gender:string,count:integer
Sample output:
Waiting on job_4f0c0878f6184119abfdae05f5194e65 ... (35s) Current status: DONE
- datasetID:
-
Run
bq lsto confirm that the table now appears in the dataset:bq ls babynames
Output:
tableId Type ----------- ------- names2010 TABLE
-
Run
bq showto see the schema:bq show babynames.names2010
Output:
Table myprojectid:babynames.names2010 Last modified Schema Total Rows Total Bytes Expiration ----------------- ------------------- ------------ ------------- ------------ 13 Mar 15:31:00 |- name: string 34041 653855 |- gender: string |- count: integer
By default, when you load data, BigQuery expects UTF-8 encoded data. If you have
data that is in ISO-8859-1 (or Latin-1) encoding and are having problems with
your loaded data, you can tell BigQuery to treat your data as Latin-1
explicitly, using the -E flag.
For more information, see Character Encodings.
Step 4: Run queries
-
Run the following command to return the most popular girls' names:
bq query "SELECT name,count FROM babynames.names2010 WHERE gender = 'F' ORDER BY count DESC LIMIT 5"
Output:
Waiting on job_58c0f5ca52764ef1902eba611b71c651 ... (0s) Current status: DONE +----------+-------+ | name | COUNT | +----------+-------+ | Isabella | 22731 | | Sophia | 20477 | | Emma | 17179 | | Olivia | 16860 | | Ava | 15300 | +----------+-------+
-
Run the following command to see the most unusual boys' names.
The minimum count is 5 because the source data omits names
with fewer than 5 occurrences.
bq query "SELECT name,count FROM babynames.names2010 WHERE gender = 'M' ORDER BY count ASC LIMIT 5"
Output:
Waiting on job_556ba2e5aad340a7b2818c3e3280b7a3 ... (1s) Current status: DONE +----------+-------+ | name | COUNT | +----------+-------+ | Aarian | 5 | | Aaidan | 5 | | Aamarion | 5 | | Aadhavan | 5 | | Aaqib | 5 | +----------+-------+
Clean up
To avoid incurring charges to your Google Cloud Platform account for the resources used in this quickstart:
-
Run the
bq rmcommand to remove thebabynamesdataset. Use the-rflag to delete all tables in the dataset, include thenames2010table.bq rm -r babynames -
Confirm the delete command by typing
y.
What's next
-
To learn more about the bq command-line tool, see bq Command-Line Tool.
-
To learn more about loading data, see Loading Data Into BigQuery.
-
To learn more about querying data, see Querying Data.
-
To learn how to export data out of BigQuery, see Exporting Data From BigQuery.
-
To learn how to create an application using the Google BigQuery API, see Create A Simple Application With the API.