BigQuery supports loading data from Google Cloud Datastore backups. In Cloud Datastore, you can back up each entity type, also known as a kind, into a set of backup files. You can then load the information into BigQuery as a table. You can control which properties BigQuery should load by setting the projectionFields property.
If you prefer to skip the loading process, you can query the backup directly by setting it up as an external data source. For more information, see External Data Sources.
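As a sketch of where projectionFields fits, it is a list of entity property names inside the load configuration; the bucket, backup file, and property names below are hypothetical examples:

```python
# Minimal sketch of a load configuration using projectionFields.
# Bucket, backup file, and property names are hypothetical examples.
load_config = {
    "sourceFormat": "DATASTORE_BACKUP",
    "sourceUris": ["gs://my_bucket/xxxxxxxxx.Customer.backup_info"],
    # Only these entity properties are loaded into the table:
    "projectionFields": ["name", "email", "signup_date"],
}

# BigQuery loads only the listed properties; anything else is skipped.
assert "phone" not in load_config["projectionFields"]
```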
Access control
Loading data into BigQuery requires the following access levels.

| Product | Access |
|---|---|
| BigQuery | WRITER access to the destination dataset |
| Google Cloud Datastore | Read access to the Cloud Storage bucket that contains the backup files |
Data type conversion
BigQuery converts data from each entity in Cloud Datastore backup files to BigQuery's data types. The following table describes the conversion between data types.
| Cloud Datastore data type | BigQuery data type |
|---|---|
| Blob | BigQuery discards these values when loading the data. |
| Blobstore key | STRING |
| Boolean | BOOLEAN |
| Category | STRING |
| Datastore key | RECORD |
| Date and time | TIMESTAMP |
| Email | STRING |
| Embedded entity | RECORD |
| Floating-point number | DOUBLE |
| Geographical point | RECORD [{"lat","DOUBLE"}, {"long","DOUBLE"}] |
| IM handle | STRING |
| Integer | INTEGER |
| Link | STRING |
| Phone number | STRING |
| Postal address | STRING |
| Rating | INTEGER |
| Short blob | BigQuery discards these values when loading the data. |
| String | STRING (truncated to 64 KB) |
| User | RECORD [{"email","STRING"}, {"userid","STRING"}] |
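To make the RECORD conversions above concrete, a Geographical point and a User value arrive in BigQuery as nested records shaped like the following sketch (the values themselves are hypothetical):

```python
# Hypothetical examples of the RECORD shapes produced by the conversion
# table: a Geographical point and a User value become nested fields.
geo_point = {"lat": 40.7411, "long": -73.9897}                # DOUBLE sub-fields
user = {"email": "jane@example.com", "userid": "1234567890"}  # STRING sub-fields

assert set(geo_point) == {"lat", "long"}
assert set(user) == {"email", "userid"}
```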
Datastore key properties
Each entity in Cloud Datastore has a unique key that contains information such as the namespace and the path. BigQuery creates a RECORD data type for the key, with nested fields for each piece of information, as described in the following table.
| Key property | Description | BigQuery data type |
|---|---|---|
| __key__.app | The Cloud Datastore app name. | STRING |
| __key__.id | The entity's ID, or null if __key__.name is set. | INTEGER |
| __key__.kind | The entity's kind. | STRING |
| __key__.name | The entity's name, or null if __key__.id is set. | STRING |
| __key__.namespace | If the Cloud Datastore app uses a custom namespace, the entity's namespace; otherwise, the default namespace is represented by an empty string. | STRING |
| __key__.path | The flattened ancestral path of the entity, consisting of the sequence of kind-identifier pairs from the root entity to the entity itself. For example: "Country", "USA", "PostalCode", 10011, "Route", 1234. | STRING |
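Using the table's own example path ("Country", "USA", "PostalCode", 10011, "Route", 1234), the loaded key record for the last entity might look like this sketch (the app name is hypothetical):

```python
# Sketch of a __key__ RECORD for the entity at the end of the example
# path above; "my-app" is a hypothetical app name.
entity_key = {
    "app": "my-app",
    "kind": "Route",
    "id": 1234,        # INTEGER; null when __key__.name is set
    "name": None,      # STRING; null here because the entity has a numeric ID
    "namespace": "",   # empty string = default namespace
    "path": '"Country", "USA", "PostalCode", 10011, "Route", 1234',
}

# Exactly one of id/name is set for a given entity.
assert (entity_key["id"] is None) != (entity_key["name"] is None)
```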
Creating a Cloud Datastore backup
- Create a Cloud Storage bucket:
  - In the Cloud Platform Console, go to the Cloud Storage browser.
  - Click Create bucket.
  - In the Create bucket dialog, specify the following attributes:
    - A unique bucket name, subject to the bucket name requirements.
    - A storage class.
    - A location where bucket data will be stored.
  - Click Create.
- Export Datastore entities to the Cloud Storage bucket:
  - Go to the Datastore Admin page in the Cloud Platform Console.
  - Click Enable Datastore Admin if it is not currently enabled.
  - Click Open Datastore Admin.
  - Select one or more of the entity kinds that you want to export, then click Backup Entities.
  - Select Google Cloud Storage as the backup storage destination.
  - Enter the required bucket name in the format /gs/my_bucket.
  - Click Backup Entities.

Cloud Datastore creates multiple objects in Google Cloud Storage for a backup of a single kind. The object you'll need for the next steps ends with <kind_name>.backup_info.
Loading data using the BigQuery web UI
- Create a Cloud Datastore backup.
- If needed, create a new dataset.
- In the navigation, hover over the dataset ID that you wish to use, click the down arrow icon next to the ID, and then click Create new table.
- Select Google Cloud Storage from the Location choices and enter your bucket name in the format: YOUR_BUCKET_NAME/xxxxxxxxx.<kind_name>.backup_info
- Select Cloud Datastore Backup from the File format choices.
- Provide a table name. Click the question mark icon to see naming limitations.
- Click Create Table.

BigQuery now creates a table and loads the exported entity data into it. While BigQuery loads the data, a (loading) string displays after your table name in the navigation. The string disappears after the data has been fully loaded.
Loading data using the BigQuery API or command-line tool
Set the following properties to load data from the API or the command-line tool.
API
- Set sourceFormat to DATASTORE_BACKUP.
- Set sourceUris to the full path of the Cloud Datastore backup file that ends with <kind_name>.backup_info. The full bucket name format is gs://bucket_name/<Cloud Datastore backup file>.
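Put together, a jobs.insert request body for a Datastore backup load might look like this sketch (the project, dataset, table, and bucket names are hypothetical):

```python
# Hypothetical jobs.insert request body for loading a Datastore backup.
# All resource names here are made-up examples.
job = {
    "configuration": {
        "load": {
            "sourceFormat": "DATASTORE_BACKUP",
            "sourceUris": ["gs://my_bucket/xxxxxxxxx.Route.backup_info"],
            "destinationTable": {
                "projectId": "my-project",
                "datasetId": "my_dataset",
                "tableId": "routes",
            },
        }
    }
}

load = job["configuration"]["load"]
# The source URI must point at the <kind_name>.backup_info object.
assert load["sourceUris"][0].endswith(".backup_info")
```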
Command-line tool
- Set source_format to DATASTORE_BACKUP.
- Set uri to the full path of the Cloud Datastore backup file that ends with <kind_name>.backup_info. The full bucket name format is gs://bucket_name/<Cloud Datastore backup file>.
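The equivalent bq invocation can be sketched as an argument list; the dataset, table, and bucket names are hypothetical:

```python
# Sketch of the equivalent bq command line, built as an argv list.
# Dataset, table, and bucket names are hypothetical.
cmd = [
    "bq", "load",
    "--source_format=DATASTORE_BACKUP",
    "my_dataset.routes",                           # destination table
    "gs://my_bucket/xxxxxxxxx.Route.backup_info",  # <kind_name>.backup_info object
]

# e.g. subprocess.run(cmd) would execute it where the bq tool is installed.
assert "--source_format=DATASTORE_BACKUP" in cmd
```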