BigQuery supports loading data from Google Cloud Datastore backups. In Cloud Datastore, you can back up each entity type, also known as a kind, into a set of backup files. You can then load the information into BigQuery as a table. You can control which properties BigQuery should load by setting the projectionFields property.
If you prefer to skip the loading process, you can query the backup directly by setting it up as an external data source. For more information, see External Data Sources.
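As a sketch of where projectionFields fits, it is a list of entity property names inside the load configuration; the bucket, backup file, and property names below are hypothetical examples:

```python
# Minimal sketch of a load configuration using projectionFields.
# Bucket, backup file, and property names are hypothetical examples.
load_config = {
    "sourceFormat": "DATASTORE_BACKUP",
    "sourceUris": ["gs://my_bucket/xxxxxxxxx.Customer.backup_info"],
    # Only these entity properties are loaded into the table:
    "projectionFields": ["name", "email", "signup_date"],
}

# BigQuery loads only the listed properties; anything else is skipped.
assert "phone" not in load_config["projectionFields"]
```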
Access control
Loading data into BigQuery requires the following access levels.

| Product | Access |
|---|---|
| BigQuery | WRITER access to the destination dataset |
| Google Cloud Datastore | Read access to the Cloud Storage bucket that contains the backup files |
Data type conversion
BigQuery converts data from each entity in Cloud Datastore backup files to BigQuery's data types. The following table describes the conversion between data types.
| Cloud Datastore data type | BigQuery data type |
|---|---|
| Blob | BigQuery discards these values when loading the data. |
| Blobstore key | STRING |
| Boolean | BOOLEAN |
| Category | STRING |
| Datastore key | RECORD |
| Date and time | TIMESTAMP |
| Email | STRING |
| Embedded entity | RECORD |
| Floating-point number | DOUBLE |
| Geographical point | RECORD [{"lat","DOUBLE"}, {"long","DOUBLE"}] |
| IM handle | STRING |
| Integer | INTEGER |
| Link | STRING |
| Phone number | STRING |
| Postal address | STRING |
| Rating | INTEGER |
| Short blob | BigQuery discards these values when loading the data. |
| String | STRING (truncated to 64 KB) |
| User | RECORD [{"email","STRING"}, {"userid","STRING"}] |
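To make the RECORD conversions above concrete, a Geographical point and a User value arrive in BigQuery as nested records shaped like the following sketch (the values themselves are hypothetical):

```python
# Hypothetical examples of the RECORD shapes produced by the conversion
# table: a Geographical point and a User value become nested fields.
geo_point = {"lat": 40.7411, "long": -73.9897}                # DOUBLE sub-fields
user = {"email": "jane@example.com", "userid": "1234567890"}  # STRING sub-fields

assert set(geo_point) == {"lat", "long"}
assert set(user) == {"email", "userid"}
```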
Datastore key properties
Each entity in Cloud Datastore has a unique key that contains information such as the namespace and the path. BigQuery creates a RECORD data type for the key, with nested fields for each piece of information, as described in the following table.
| Key property | Description | BigQuery data type |
|---|---|---|
| __key__.app | The Cloud Datastore app name. | STRING |
| __key__.id | The entity's ID, or null if __key__.name is set. | INTEGER |
| __key__.kind | The entity's kind. | STRING |
| __key__.name | The entity's name, or null if __key__.id is set. | STRING |
| __key__.namespace | If the Cloud Datastore app uses a custom namespace, the entity's namespace; otherwise, the default namespace is represented by an empty string. | STRING |
| __key__.path | The flattened ancestral path of the entity, consisting of the sequence of kind-identifier pairs from the root entity to the entity itself. For example: "Country", "USA", "PostalCode", 10011, "Route", 1234. | STRING |
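Using the table's own example path ("Country", "USA", "PostalCode", 10011, "Route", 1234), the loaded key record for the last entity might look like this sketch (the app name is hypothetical):

```python
# Sketch of a __key__ RECORD for the entity at the end of the example
# path above; "my-app" is a hypothetical app name.
entity_key = {
    "app": "my-app",
    "kind": "Route",
    "id": 1234,        # INTEGER; null when __key__.name is set
    "name": None,      # STRING; null here because the entity has a numeric ID
    "namespace": "",   # empty string = default namespace
    "path": '"Country", "USA", "PostalCode", 10011, "Route", 1234',
}

# Exactly one of id/name is set for a given entity.
assert (entity_key["id"] is None) != (entity_key["name"] is None)
```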
Creating a Cloud Datastore backup
- Create a Cloud Storage bucket:
  - In the Cloud Platform Console, go to the Cloud Storage browser.
  - Click Create bucket.
  - In the Create bucket dialog, specify the following attributes:
    - A unique bucket name, subject to the bucket name requirements.
    - A storage class.
    - A location where bucket data will be stored.
  - Click Create.
- Export Datastore entities to the Cloud Storage bucket:
  - Go to the Datastore Admin page in the Cloud Platform Console.
  - Click Enable Datastore Admin if it is not currently enabled.
  - Click Open Datastore Admin.
  - Select one or more of the entity kinds that you want to export, then click Backup Entities.
  - Select Google Cloud Storage as the backup storage destination.
  - Enter the required bucket name in the format /gs/my_bucket.
  - Click Backup Entities.

Cloud Datastore creates multiple objects in Google Cloud Storage for a backup of a single kind. The object you'll need for the next steps ends with <kind_name>.backup_info.
Loading data using the BigQuery web UI
- Create a Cloud Datastore backup.
- If needed, create a new dataset.
- In the navigation, hover over the dataset ID that you wish to use, click the down arrow icon next to the ID, and then click Create new table.
- Select Google Cloud Storage from the Location choices and enter your bucket name in the format: YOUR_BUCKET_NAME/xxxxxxxxx.<kind_name>.backup_info
- Select Cloud Datastore Backup from the File format choices.
- Provide a table name. Click the question mark icon to see naming limitations.
- Click Create Table.

BigQuery now creates a table and loads the exported entity data into it. While BigQuery loads the data, a (loading) string displays after your table name in the navigation. The string disappears after the data has been fully loaded.
Loading data using the BigQuery API or command-line tool
Set the following properties to load data from the API or the command-line tool.
API
- Set sourceFormat to DATASTORE_BACKUP.
- Set sourceUris to the full path of the Cloud Datastore backup file that ends with <kind_name>.backup_info. The full bucket name format is gs://bucket_name/<Cloud Datastore backup file>.
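Put together, a jobs.insert request body for a Datastore backup load might look like this sketch (the project, dataset, table, and bucket names are hypothetical):

```python
# Hypothetical jobs.insert request body for loading a Datastore backup.
# All resource names here are made-up examples.
job = {
    "configuration": {
        "load": {
            "sourceFormat": "DATASTORE_BACKUP",
            "sourceUris": ["gs://my_bucket/xxxxxxxxx.Route.backup_info"],
            "destinationTable": {
                "projectId": "my-project",
                "datasetId": "my_dataset",
                "tableId": "routes",
            },
        }
    }
}

load = job["configuration"]["load"]
# The source URI must point at the <kind_name>.backup_info object.
assert load["sourceUris"][0].endswith(".backup_info")
```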
Command-line tool
- Set source_format to DATASTORE_BACKUP.
- Set uri to the full path of the Cloud Datastore backup file that ends with <kind_name>.backup_info. The full bucket name format is gs://bucket_name/<Cloud Datastore backup file>.
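The equivalent bq invocation can be sketched as an argument list; the dataset, table, and bucket names are hypothetical:

```python
# Sketch of the equivalent bq command line, built as an argv list.
# Dataset, table, and bucket names are hypothetical.
cmd = [
    "bq", "load",
    "--source_format=DATASTORE_BACKUP",
    "my_dataset.routes",                           # destination table
    "gs://my_bucket/xxxxxxxxx.Route.backup_info",  # <kind_name>.backup_info object
]

# e.g. subprocess.run(cmd) would execute it where the bq tool is installed.
assert "--source_format=DATASTORE_BACKUP" in cmd
```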