This page describes the data formats that Google BigQuery supports.
Introduction
BigQuery supports these data formats:
- CSV
- JSON (newline-delimited)
- Avro
- Cloud Datastore backups
Choose a format based upon the following factors:
-
Your data's schema.
CSV, JSON, and Avro all support flat data. JSON and Avro also support data with nested and repeated fields, which are useful for expressing hierarchical data. Nested and repeated fields also reduce duplication when denormalizing the data.
-
Embedded newlines.
If your data contains embedded newlines, BigQuery can load the data much faster in JSON or Avro format.
-
External limitations.
For example, your data might come from a document store database that natively stores data in JSON format. Or, your data might come from a source that only exports in CSV format.
Data format limits
For information about quotas and limits, see Load Jobs on the Quota Policy page.
CSV format
To change how BigQuery parses CSV data, specify additional CSV properties in the configuration.load object.
| CSV data type | BigQuery property |
|---|---|
| Header rows | skipLeadingRows |
| Newline characters | allowQuotedNewlines |
| Custom field delimiters | fieldDelimiter |
| Custom null values | nullMarker |
| Trailing optional columns | allowJaggedRows |
JSON format
One JSON object, including any nested/repeated fields, must appear on each line.
The following example shows sample nested/repeated data:
{"kind": "person", "fullName": "John Doe", "age": 22, "gender": "Male", "citiesLived": [{ "place": "Seattle", "numberOfYears": 5}, {"place": "Stockholm", "numberOfYears": 6}]}
{"kind": "person", "fullName": "Jane Austen", "age": 24, "gender": "Female", "citiesLived": [{"place": "Los Angeles", "numberOfYears": 2}, {"place": "Tokyo", "numberOfYears": 2}]}
For instructions on loading a JSON file, see Loading Nested and Repeated JSON Data.
Avro format
Avro is an open source data format that bundles serialized data with the data's schema in the same file.
Compressed Avro files are not supported, but compressed data blocks are. BigQuery supports the DEFLATE codec.
BigQuery converts Avro data types to the following BigQuery data types:
| BigQuery data type | Notes | |
|---|---|---|
| null | BigQuery ignores these values | |
| boolean | BOOLEAN | |
| int | INTEGER | |
| long | INTEGER | |
| float | FLOAT | |
| double | FLOAT | |
| bytes | BYTES | |
| string | STRING | UTF-8 only |
| record | RECORD |
|
| enum | STRING |
|
| array | repeated fields | Arrays of arrays are not supported. |
| map<T> | RECORD | BigQuery converts an Avro map<T> field to a repeated RECORD that contains two fields: a key and a value. BigQuery stores the key as a STRING, and converts the value to its corresponding data type in BigQuery. |
| union |
|
|
| fixed | BYTES |
|
Cloud Datastore backups
For information about loading Cloud Datastore backup files, see Loading Data From Cloud Datastore.
What's next
- To set the source format for loading data, see Specifying the source format.
- To learn about exporting data, see Exporting Data From BigQuery.