Introduction
This Launch Checklist for BigQuery recommends activities that you should complete for launching a commercial application that uses Google BigQuery. This checklist focuses on BigQuery-specific activities. You should also use the general checklist, Launch Checklist for Google Cloud Platform, to understand the activities you should also complete that apply to all services.
This Launch Checklist for BigQuery is designed for developers who are proficient with BigQuery. If you are just starting out with BigQuery, these instructions will not teach you how to use BigQuery; instead, new users should start with the BigQuery Browser Tool Quickstart or bq Command-Line Tool Quickstart.
This checklist is broken down into the following sections:
- Architecture Design and Development
- Soft Launch
- Final Launch
The sections are presented in the order we recommend that you use them as you prepare to launch your application. For example, you should start with the Architecture Design and Development Checklist first; it contains activities that we recommend you do early in your app's development lifecycle. Similarly, the Soft Launch Checklist contains activities we recommend when you are close to launch. However, the exact timeline of the checklist activities and the time required for them depends on your app development time frame.
Architecture Design and Development Checklist
We recommend that you use this checklist in the early stages of the development of your application. You can work on the checklist activities from the groups in parallel; however, we recommend that you start the software architecture- related activities as early as possible as they require more time to complete.
| Activity |
| Community/Groups/Forums |
|
❑
Consult
Google BigQuery community support on Stack Overflow. It's a great
source of information and practical advice.
|
|
❑
Subscribe to the
Google BigQuery - Announce group.
|
| Schema Design |
|
❑
Design the schema based on the need to query it. BigQuery
supports nested and repeated (type:RECORD) fields, so leverage it where
appropriate, keeping an eye on how you would query it. Estimate querying
cost.
|
|
❑
Repeated fields often become difficult to work with and query after they get
beyond the first or second nesting level, so use them only when absolutely
necessary.
|
|
❑
De-normalize the schema as much as possible to avoid multiple joins.
|
|
❑
BigQuery does not support row-level updates, but tables can be
appended.
|
|
❑
Unlike a traditional RDBMS, there is no notion of primary/secondary or row-id
keys, so if really required, identify a column in the table schema for
that purpose.
|
| Estimate Data Volumes |
|
❑
Calculate how much data will be uploaded in the initial upload to get to
some base level.
|
|
❑
Calculate how much data will be uploaded incrementally and at what velocity
(hourly/daily/weekly).
|
|
❑
Calculate how much data (if any) expires and at what frequency.
|
| Estimate Query Processing |
|
❑
Identify the types of queries that will be executed on BigQuery
datasets every day.
|
|
❑
Decide on a table partitioning/sharding strategy. For example, if queries
issued during the day are relevant for the data collected during the day,
create one table per day. To aggregate data across multiple days, run
queries using the table wildcards.
|
|
❑
Calculate how much data (the number of queries x data processed per query)
will be processed by BigQuery every day. Note that
BigQuery charges for processing individual columns (not the
whole row), so account for it in your calculations. Also note that the
column referenced in queries invokes a full column scan.
|
| Quota Management |
|
❑
Understand default
quotas on BigQuery.
|
|
❑
Complete a quota analysis in the following areas wherever applicable:
|
|
❑
If quota is insufficient, file quota adjustments through
GWSC.
|
| Cost Analysis |
|
❑
Based on the estimated data volume and query processing, calculate the cost
of BigQuery storage and processing using the
BigQuery pricing and/or using
the pricing calculator.
|
|
❑
Consider the following cost optimization recommendations:
|
| Security |
|
❑
Keep the following in mind:
|
| Pre-processing data prior to BigQuery |
|
❑
Consider pre-processing data before loading it into BigQuery to
optimize for cost and performance. For example, you can:
|
|
❑
Pre-process (transform, de-normalize, etc.) data using Hadoop or ETL tools
or by loading raw data to BigQuery and using
BigQuery itself.
|
|
❑
Consider having multiple versions of the same dataset structured differently
for different types of queries.
|
|
❑
Since row-level updates are not supported, plan for updates/deletes to
tables, and construct a batch-oriented process to rebuild tables when updates
are needed.
|
| Query Tuning & Testing |
|
❑
Test your queries on the expected volume of data and tune them according to
the following principles:
|
| Data Visualization |
|
❑
BigQuery is a tool for querying data on large datasets. For
data visualization it has strong connectivity to third-party tools:
|
Soft Launch Checklist
Prior to your application's commercial launch, we recommend using the Soft Launch Checklist activities to test your launch readiness.
| Activity |
|
❑
If you are using a streaming API to load data, simulate a load test to
ensure data is loaded without any quota violations.
|
|
❑
If you are using batch jobs to load data, simulate a load test to ensure data
is loaded to BigQuery in a reasonable time and without quota
violations.
|
|
❑
Review your cost model against actual costs. Verify that operational costs
will be within budget. Revise your cost model. Although the Google Cloud
Platform pricing calculator enables
decent estimates, it won’t be clear until load-testing exactly how much
data is processed during a day.
|
Final Launch Checklist
Use the Final Launch Checklist shortly before and during your launch.
| Activity |
|
❑
If you have followed the checklist to this point, your application is ready
for a successful launch on Google BigQuery. Congratulations!
We recommend that you also review the Final Launch Checklist in the
Launch
Checklist for Google Cloud Platform.
|