Real-Time Data Analysis with Kubernetes, Redis, and BigQuery

In this tutorial, you'll perform real-time data analysis of Twitter data using a pipeline built on Google Compute Engine, Kubernetes, Redis, and BigQuery. This type of analysis has a number of useful applications, including:

Performing sentiment analysis
Identifying general patterns and trends in data
Monitoring the performance and reach of campaigns
Crafting responses in real time

The following diagram describes the architecture of the example application.

Figure: The architecture of the example application.

The example application uses a replication controller with one replicated pod to support a Redis master that caches incoming tweets. A service fronts the Redis master.

A replicated pod, defined by using a Deployment, reads Twitter's public sample stream using the Twitter streaming API and pushes tweets that match its predefined set of filters into the Redis cache. Two additional replicated pods do blocking reads on the Redis cache. When tweet data is available, these pods add the data to BigQuery in small batches using the BigQuery API.
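
The flow described above relies on a Redis list acting as a queue: the producer pushes onto one end and the consumers pop from the other. The following minimal sketch mimics that pattern with the Python standard library only. The `FakeRedisList` class and the `drain_in_batches` function are hypothetical stand-ins that are not part of the example repository, and the sketch makes no real Redis or BigQuery calls.

```python
from collections import deque


class FakeRedisList:
    """In-memory stand-in for a Redis list (hypothetical, for illustration)."""

    def __init__(self):
        self._items = deque()

    def lpush(self, value):
        # Like Redis LPUSH: add the value at the head (left) of the list.
        self._items.appendleft(value)

    def rpop(self):
        # Like Redis RPOP: remove from the tail (right); None when empty.
        return self._items.pop() if self._items else None


def drain_in_batches(queue, batch_size):
    """Pop everything from the queue and group it into small batches."""
    batches, current = [], []
    while True:
        item = queue.rpop()
        if item is None:
            break
        current.append(item)
        if len(current) == batch_size:
            batches.append(current)
            current = []
    if current:
        batches.append(current)  # Final, partially filled batch.
    return batches


queue = FakeRedisList()
for n in range(5):
    queue.lpush("tweet-%d" % n)

batches = drain_in_batches(queue, batch_size=2)
print(batches)
```

Pushing on the left and popping on the right yields first-in, first-out order, so the oldest cached tweet is written first. Unlike this sketch, the pipeline's consumers use a blocking pop and wait for new data rather than stopping when the list is empty.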

Objectives

Download the source code.
Configure and start a Kubernetes cluster.
Create a Twitter application.
Create a BigQuery table.
Start up the Kubernetes app.
Use BigQuery to query for Twitter data.

Costs

This tutorial uses several billable components of Google Cloud Platform, including:

5 Compute Engine n1-standard-1 virtual machine instances.
5 Compute Engine 10GB persistent disks.
BigQuery.

The cost of running this tutorial varies depending on run time. Use the pricing calculator to generate a cost estimate based on your projected usage.

New Cloud Platform users might be eligible for a free trial.

Before you begin

Use Linux or Mac OS X.
Install Docker.
Install the Google Cloud SDK.
Create a Twitter account.

Create and configure a new project

To work through this example, you must have a project with the required APIs enabled. In the Google Cloud Platform Console, create a new project, and then enable Google Compute Engine API. You will be prompted to enable billing if you have not previously done so.

This tutorial also uses the following APIs that were enabled by default when you created your new project:

BigQuery
Google Cloud Storage JSON API

Set up the Google Cloud SDK

Authenticate using your Google account:
```
gcloud auth login
```
Set the default project for the Cloud SDK to the project you selected in the previous section of this tutorial. Replace [PROJECT_ID] with your project ID:
```
gcloud config set project [PROJECT_ID]
```
Update the components:
```
gcloud components update
```
Install the kubectl binary:
```
gcloud components install kubectl
```

Add the directory that contains kubectl to your path:

export PATH=$PATH:/usr/local/share/google/google-cloud-sdk/bin/

Download the example code

You can get the example code in either of two ways:

Download the zip archive. Unzip the code into a directory named kube-redis-bq.

Clone the GitHub repository. In a terminal window, run the following command:

git clone https://github.com/GoogleCloudPlatform/kubernetes-bigquery-python.git kube-redis-bq

Download Kubernetes

Kubernetes is an open source orchestration system for Docker containers. It schedules containers onto the instances in your Compute Engine cluster, manages workloads to ensure that their state matches your declared intentions, and organizes containers by label and by task type for easy management and discovery.

Download the latest Kubernetes binary release.
Unpack the file into the same parent directory that contains the example code directory. The tar file unpacks into a directory named kubernetes, so you don't need to create a new directory. For example, enter:
```
tar -zxvf kubernetes.tar.gz
```

Creating your BigQuery table

Create a BigQuery table to store your tweets. BigQuery groups tables into abstraction layers called datasets. Use the bq command-line tool, which is included with the Cloud SDK, to create a new BigQuery dataset named rtda.

In a terminal window, enter the following command:
```
bq mk rtda
```
Now that you have a dataset in place, create a new table named tweets to contain the incoming tweets. Each BigQuery table must be defined by a schema. To save you time, this example provides a predefined schema, schema.json, that you can use to define your table. To create the table by using the predefined schema, enter the following command:
```
bq mk -t rtda.tweets kube-redis-bq/bigquery-setup/schema.json
```
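
To give a feel for how a BigQuery schema file is structured, the following sketch builds a small schema in Python and lists the column paths it defines. The field names are taken from the queries later in this tutorial; the types and modes are assumptions for illustration, not a copy of schema.json, and `column_paths` is a hypothetical helper.

```python
import json

# Hypothetical excerpt: the field names appear in this tutorial's queries,
# but the types and modes are assumptions, not copied from schema.json.
schema = [
    {"name": "id", "type": "INTEGER", "mode": "NULLABLE"},
    {"name": "text", "type": "STRING", "mode": "NULLABLE"},
    {"name": "lang", "type": "STRING", "mode": "NULLABLE"},
    {
        "name": "user",
        "type": "RECORD",
        "mode": "NULLABLE",
        "fields": [
            {"name": "screen_name", "type": "STRING", "mode": "NULLABLE"},
        ],
    },
]


def column_paths(fields, prefix=""):
    """List dotted column paths, descending into RECORD fields."""
    paths = []
    for field in fields:
        path = prefix + field["name"]
        if field["type"] == "RECORD":
            paths.extend(column_paths(field.get("fields", []), path + "."))
        else:
            paths.append(path)
    return paths


print(json.dumps(schema, indent=2))
print(column_paths(schema))
```

Nested RECORD fields are what make dotted column names such as user.screen_name available in queries.
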
To verify that your new dataset and table have been created, use the BigQuery web UI. You should see the dataset name in the left sidebar. If it isn't there, make sure you're looking at the correct project. When you click the arrow next to the dataset name, you should see your new table. Alternatively, you can view all datasets in a project or all tables in a given dataset by running the following commands:
```
bq ls [PROJECT_ID]:
```
Note: Project IDs must be followed by a trailing colon.
```
bq ls [DATASET_ID]
```
In this case, the dataset ID is rtda.
Finally, edit kube-redis-bq/redis/bigquery-controller.yaml. Update the value for the following fields to reflect your BigQuery configuration:
- PROJECT_ID: Your project ID.
- BQ_DATASET: The BigQuery dataset containing your table (rtda).
- BQ_TABLE: The BigQuery table you just created (tweets).

redis/bigquery-controller.yaml

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: bigquery-controller
  labels:
    name: bigquery-controller
spec:
  replicas: 2
  template:
    metadata:
      labels:
        name: bigquery-controller
    spec:
      containers:
      - name: bigquery
        image: gcr.io/google-samples/redis-bq-pipe:v3
        env:
        - name: PROCESSINGSCRIPT
          value: redis-to-bigquery
        - name: REDISLIST
          value: twitter-stream
        # Change this to your project ID.
        - name: PROJECT_ID
          value: xxxx
        # Change the following two settings to your dataset and table.
        - name: BQ_DATASET
          value: xxxx
        - name: BQ_TABLE
          value: xxxx
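
Before starting the Deployment, it can help to confirm that no placeholder values remain in the manifest. The following is a minimal sketch that scans manifest text line by line instead of using a YAML parser; `placeholder_fields` is a hypothetical helper, and the sample text is a shortened, partly filled-in copy of the env section shown above.

```python
def placeholder_fields(manifest_text, placeholder="xxxx"):
    """Return the env variable names whose value is still the placeholder."""
    names = []
    current_name = None
    for line in manifest_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("- name:"):
            # Remember the most recent env variable name.
            current_name = stripped.split(":", 1)[1].strip()
        elif stripped.startswith("value:") and current_name is not None:
            value = stripped.split(":", 1)[1].strip()
            if value == placeholder:
                names.append(current_name)
            current_name = None
    return names


sample = """\
        env:
        - name: REDISLIST
          value: twitter-stream
        # Change this to your project ID.
        - name: PROJECT_ID
          value: xxxx
        - name: BQ_DATASET
          value: rtda
        - name: BQ_TABLE
          value: xxxx
"""

print(placeholder_fields(sample))
```

In this sample, PROJECT_ID and BQ_TABLE are reported because they still hold the placeholder value.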

Configuring and starting a Kubernetes cluster

Kubernetes is portable: it uses a set of shell scripts for starting up, shutting down, and managing configuration, so no installation is necessary.

Edit kubernetes/cluster/gce/config-common.sh.
Add the bigquery scope to the file's NODE_SCOPES definition.

This setting allows your nodes to write to your BigQuery table. Save the configuration and close your file.

(Optional) Update the startup scripts for long-term Redis reliability

You can improve the long-term reliability of Redis on Kubernetes by adding some suggested optimizations to the nodes' startup scripts. This step is helpful if you want your Redis master to run for a long period of time. If you don't intend to have your cluster up for very long, you can skip this step.

Edit kubernetes/cluster/gce/configure-vm.sh and scroll to the bottom of the file. After the row of # symbols, find the following if statement:
```
if [[ -z "${is_push}" ]]; then
```

Insert the following commands immediately above that if statement:

/sbin/sysctl vm.overcommit_memory=1
echo never > /sys/kernel/mm/transparent_hugepage/enabled

These commands will set your Linux kernel to always overcommit memory and will disable transparent huge pages in the kernel. Save and close the file.
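
To check whether the two settings took effect on a node, you can read /proc/sys/vm/overcommit_memory and /sys/kernel/mm/transparent_hugepage/enabled. The following sketch only parses the text those files typically contain; the function names are hypothetical, and the sample strings stand in for the real file contents.

```python
import re


def active_thp_mode(contents):
    """Return the bracketed (active) transparent huge pages mode, or None."""
    match = re.search(r"\[(\w+)\]", contents)
    return match.group(1) if match else None


def overcommit_always(contents):
    """Return True when vm.overcommit_memory is 1 (always overcommit)."""
    return contents.strip() == "1"


# Sample contents, as the files might read after the commands have run.
print(active_thp_mode("always madvise [never]\n"))
print(overcommit_always("1\n"))
```

The transparent huge pages file lists every available mode and marks the active one with square brackets, which is why the sketch looks for the bracketed word.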

Start up your cluster

Now you are ready to start up your cluster.

Enter the following command:
```
kubernetes/cluster/kube-up.sh
```
Starting the cluster can take some time. During the startup process, you might be prompted to create a new SSH key for Compute Engine or, if you've already done so previously, to enter your SSH key passphrase.
After your cluster has started, enter the following commands to see your running instances:
```
kubectl get nodes
```
```
kubectl cluster-info
```

You should see one Kubernetes master instance and four Kubernetes nodes.

Creating a Twitter application

To receive tweets from Twitter, you need to create a Twitter application and add its key/secret and token/secret values to the specification for your Twitter-to-Redis Kubernetes pod. In all, you will copy four values. Follow these steps:

Create a new Twitter application.
In the Twitter Application Management page, navigate to the Keys and Access Tokens tab.
Click the Create my access token button to create a new access token.
Edit kube-redis-bq/redis/twitter-stream.yaml.
Replace the following values with your consumer key and consumer secret:
- CONSUMERKEY
- CONSUMERSECRET
Replace the following values with your access token and access token secret:
- ACCESSTOKEN
- ACCESSTOKENSEC
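
Because the Twitter-to-Redis script reads these four values from environment variables, a missing one causes a failure at startup. As a minimal sketch, a check like the following could report which values are absent; `missing_credentials` is a hypothetical helper that is not part of the example code, and the sample values are made up.

```python
# The four environment variable names used by twitter-to-redis.py.
REQUIRED = ("CONSUMERKEY", "CONSUMERSECRET", "ACCESSTOKEN", "ACCESSTOKENSEC")


def missing_credentials(env):
    """Return the required credential names that are unset or empty."""
    return [name for name in REQUIRED if not env.get(name)]


# Made-up values for illustration; pass os.environ to check a real process.
example_env = {"CONSUMERKEY": "abc", "CONSUMERSECRET": "", "ACCESSTOKEN": "xyz"}
print(missing_credentials(example_env))
```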

The example application reads from a random sample of Twitter's public stream by default. To filter on a set of keywords instead:

Edit kube-redis-bq/redis/twitter-stream.yaml and change the value of TWSTREAMMODE to filter.
Edit kube-redis-bq/redis/redis-pipe-image/twitter-to-redis.py and replace the keywords defined in the track variable with the keywords you want to filter on.
Rebuild the container image. For instructions about how to build the container image, see the appendix.

Understanding the Docker image for your Kubernetes pods

This example uses three different Kubernetes pod templates to construct the analysis pipeline. One is a Redis pod, and the other two types of pods run the example application's scripts.

The specification files point to a pre-built Docker image; you don't need to do anything special to use it. If you'd like to build the image yourself, follow the steps in the appendix.

The Docker image contains two main scripts that perform the work for this solution.

The code in twitter-to-redis.py streams incoming tweets from Twitter to Redis.

redis/redis-pipe-image/twitter-to-redis.py

#!/usr/bin/env python
# Copyright 2014 Google Inc. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#      http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

"""This script uses the Twitter Streaming API, via the tweepy library,
to pull in tweets and store them in a Redis server.
"""

import datetime
import os

import redis
from tweepy import OAuthHandler
from tweepy import Stream
from tweepy.streaming import StreamListener

# Get your twitter credentials from the environment variables.
# These are set in the 'twitter-stream.yaml' manifest file.
consumer_key = os.environ['CONSUMERKEY']
consumer_secret = os.environ['CONSUMERSECRET']
access_token = os.environ['ACCESSTOKEN']
access_token_secret = os.environ['ACCESSTOKENSEC']

# Get info on the Redis host and port from the environment variables.
# The name of this variable comes from the redis service id, 'redismaster'.
REDIS_HOST = os.environ['REDISMASTER_SERVICE_HOST']
REDIS_PORT = os.environ['REDISMASTER_SERVICE_PORT']
REDIS_LIST = os.environ['REDISLIST']


class StdOutListener(StreamListener):
    """A listener handles tweets that are received from the stream.
    This listener dumps the tweets into Redis.
    """

    count = 0
    redis_errors = 0
    allowed_redis_errors = 3
    twstring = ''
    tweets = []
    r = redis.StrictRedis(host=REDIS_HOST, port=REDIS_PORT, db=0)
    total_tweets = 10000000

    def write_to_redis(self, tw):
        try:
            self.r.lpush(REDIS_LIST, tw)
        except:
            print 'Problem adding data to Redis.'
            self.redis_errors += 1

    def on_data(self, data):
        """What to do when tweet data is received."""
        self.write_to_redis(data)
        self.count += 1
        # if we've grabbed more than total_tweets tweets, exit the script.
        # If this script is being run in the context of a kubernetes
        # replicationController, the pod will be restarted fresh when
        # that happens.
        if self.count > self.total_tweets:
            return False
        if self.redis_errors > self.allowed_redis_errors:
            print 'too many redis errors.'
            return False
        if (self.count % 1000) == 0:
            print 'count is: %s at %s' % (self.count, datetime.datetime.now())
        return True

    def on_error(self, status):
        print status


if __name__ == '__main__':
    print '....'
    listener = StdOutListener()
    auth = OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_token, access_token_secret)
    print 'stream mode is: %s' % os.environ['TWSTREAMMODE']

    stream = Stream(auth, listener)
    # set up the streaming depending upon whether our mode is 'sample', which
    # will sample the twitter public stream. If not 'sample', instead track
    # the given set of keywords.
    # This environment var is set in the 'twitter-stream.yaml' file.
    if os.environ['TWSTREAMMODE'] == 'sample':
        stream.sample()
    else:
        stream.filter(
                track=['bigdata', 'kubernetes', 'bigquery', 'docker', 'google',
                       'googlecloud', 'golang', 'dataflow',
                       'containers', 'appengine', 'gcp', 'compute',
                       'scalability', 'gigaom', 'news', 'tech', 'apple',
                       'amazon', 'cluster', 'distributed', 'computing',
                       'cloud', 'android', 'mobile', 'ios', 'iphone',
                       'python', 'recode', 'techcrunch', 'timoreilly']
                )

The code in redis-to-bigquery.py streams cached data from Redis to BigQuery.

redis/redis-pipe-image/redis-to-bigquery.py

#!/usr/bin/env python
# Copyright 2014 Google Inc. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#      http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

"""This script grabs tweets from a redis server, and stores them in BigQuery
using the BigQuery Streaming API.
"""
import datetime
import json
import os

import redis

import utils

# Get info on the Redis host and port from the environment variables.
# The name of this variable comes from the redis service id, 'redismaster'.
REDIS_HOST = os.environ['REDISMASTER_SERVICE_HOST']
REDIS_PORT = os.environ['REDISMASTER_SERVICE_PORT']
REDIS_LIST = os.environ['REDISLIST']

r = redis.StrictRedis(host=REDIS_HOST, port=REDIS_PORT, db=0)

# Get the project ID from the environment variable set in
# the 'bigquery-controller.yaml' manifest.
PROJECT_ID = os.environ['PROJECT_ID']


def write_to_bq(bigquery):
    """Write the data to BigQuery in small chunks."""
    tweets = []
    CHUNK = 50  # The size of the BigQuery insertion batch.
    tweet = None
    mtweet = None
    count = 0
    count_max = 50000
    redis_errors = 0
    allowed_redis_errors = 3
    while count < count_max:
        while len(tweets) < CHUNK:
            # We'll use a blocking list pop -- it returns when there is
            # new data.
            res = None
            try:
                res = r.brpop(REDIS_LIST)
            except:
                print 'Problem getting data from Redis.'
                redis_errors += 1
                if redis_errors > allowed_redis_errors:
                    print "Too many redis errors: exiting."
                    return
                continue
            try:
                tweet = json.loads(res[1])
            except Exception, e:
                print e
                if redis_errors > allowed_redis_errors:
                    print "Too many redis errors: exiting."
                    return
                continue
            # First do some massaging of the raw data
            mtweet = utils.cleanup(tweet)
            # We only want to write tweets to BigQuery; we'll skip 'delete' and
            # 'limit' information.
            if 'delete' in mtweet:
                continue
            if 'limit' in mtweet:
                continue
            tweets.append(mtweet)
        # try to insert the tweets into bigquery
        response = utils.bq_data_insert(bigquery, PROJECT_ID, os.environ['BQ_DATASET'],
                             os.environ['BQ_TABLE'], tweets)
        tweets = []
        count += 1
        if count % 25 == 0:
            print ("processing count: %s of %s at %s: %s" %
                   (count, count_max, datetime.datetime.now(), response))


if __name__ == '__main__':
    print "starting write to BigQuery...."
    bigquery = utils.create_bigquery_client()
    write_to_bq(bigquery)
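
The script hands each chunk to utils.bq_data_insert, which is not shown in this tutorial. As a rough sketch of the idea only, and assuming the helper wraps rows in the request body format of the BigQuery tabledata.insertAll method, the filtering and batching steps might look like the following. The functions `select_tweets`, `chunked`, and `insert_all_body` are hypothetical and are not taken from utils.py.

```python
def select_tweets(messages):
    """Keep real tweets; skip 'delete' and 'limit' notices from the stream."""
    return [m for m in messages if "delete" not in m and "limit" not in m]


def chunked(items, size):
    """Split a list into consecutive chunks of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def insert_all_body(rows):
    """Wrap rows in the body shape that tabledata.insertAll expects."""
    return {"rows": [{"json": row} for row in rows]}


messages = [
    {"id": 1, "text": "hello"},
    {"delete": {"status": {"id": 7}}},
    {"id": 2, "text": "world"},
    {"limit": {"track": 3}},
    {"id": 3, "text": "again"},
]

tweets = select_tweets(messages)
bodies = [insert_all_body(chunk) for chunk in chunked(tweets, 2)]
print(bodies)
```

The script shown above waits until a full chunk of 50 tweets is available before inserting, whereas this sketch also emits a final partial chunk so that the example stays short.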

Starting up your Kubernetes app

After optionally building and pushing your Docker image, you can start up your Kubernetes app.

In Kubernetes, pods, rather than individual application containers, are the smallest deployable units that can be created, scheduled, and managed.

A replica set ensures that a specified number of pod replicas are running at any one time. If there are too many replicas, it shuts some down. If there are too few, it starts more. Unlike singleton pods or pods created in bulk, pods managed by a replica set are replaced when they are deleted or terminated for any reason, such as node failure.
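
The behavior described above amounts to comparing a desired count with the pods that are actually running. The following is a conceptual sketch only, not Kubernetes code; `reconcile` is a hypothetical function, and the pod names are made up.

```python
def reconcile(desired, running):
    """Return the action needed to reach `desired` replicas.

    `running` is a list of the names of the pods currently running.
    """
    difference = desired - len(running)
    if difference > 0:
        return ("start", difference)
    if difference < 0:
        return ("stop", -difference)
    return ("none", 0)


# One of two expected pods was lost, for example after a node failure.
print(reconcile(2, ["bigquery-controller-a"]))
# Three pods are running but only one is wanted.
print(reconcile(1, ["pod-a", "pod-b", "pod-c"]))
```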

A Deployment provides declarative updates for pods and replica sets. You only need to describe the desired state in a Deployment object, and the Deployment controller changes the actual state to the desired state at a controlled rate for you.

You will use Deployments for the Redis-to-BigQuery and Twitter-to-Redis pods.

Start the Redis master pod

Start the Redis master pod first. If you're curious, you can see the specification file for the Redis master pod in kube-redis-bq/redis/redis-master.yaml:

redis/redis-master.yaml

apiVersion: v1
kind: ReplicationController
metadata:
  name: redis-master
  labels:
    name: redis-master
spec:
  replicas: 1
  template:
    metadata:
      labels:
        name: redis-master
    spec:
      containers:
      - name: master
        image: redis
        ports:
        - containerPort: 6379

To start the pod, run the following command:

kubectl create -f kube-redis-bq/redis/redis-master.yaml

Run the following command to verify that the Redis master pod is running:
```
kubectl get pods
```

Look at the STATUS field to verify that the master is running. In addition, you might see some other pods already running. These extra pods are added and used by the Kubernetes framework.

It can take around 30 seconds for the pod to be placed on an instance. During this period, the Redis master pod will list its host as <unassigned>. However, the final value of Host will be the name of the instance that the pod is running on. When the process completes, the pod's status will change from ContainerCreating to Running.

Start the Redis master's service

In Kubernetes, a service defines a logical set of pods and a policy by which to access them. The service is specified in kube-redis-bq/redis/redis-master-service.yaml as follows:

redis/redis-master-service.yaml

apiVersion: v1
kind: Service
metadata:
  name: redismaster
  labels:
    name: redis-master
spec:
  ports:
    # The port that this service should serve on.
    # You don't need to specify the targetPort if it is the same as the port,
    # though here we include it anyway, to show the syntax.
  - port: 6379
    targetPort: 6379
  selector:
    name: redis-master

The Redis pod that you created above has the label name: redis-master. The selector field of the service determines which pods will receive the traffic sent to the service.
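
Two mechanisms are at work here: the selector decides which pods back the service, and the service name determines the environment variables that the pipeline scripts read (REDISMASTER_SERVICE_HOST and REDISMASTER_SERVICE_PORT). The following sketch illustrates both in plain Python. The functions `selects` and `service_env_names` are hypothetical, and the naming rule shown (upper-case the name and turn dashes into underscores) is a simplified model of how Kubernetes names these variables.

```python
def selects(selector, pod_labels):
    """Return True when every selector key/value pair matches the pod labels."""
    return all(pod_labels.get(key) == value for key, value in selector.items())


def service_env_names(service_name):
    """Return the host and port environment variable names for a service."""
    prefix = service_name.upper().replace("-", "_")
    return (prefix + "_SERVICE_HOST", prefix + "_SERVICE_PORT")


selector = {"name": "redis-master"}
print(selects(selector, {"name": "redis-master"}))
print(selects(selector, {"name": "bigquery-controller"}))
print(service_env_names("redismaster"))
```

This is why the service is named redismaster, without a dash, while the pods carry the label redis-master: the scripts look up variables derived from the service name, and the service finds its pods by label.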

Enter the following command to start the service:

kubectl create -f kube-redis-bq/redis/redis-master-service.yaml

To view the running services, enter the following command:
```
kubectl get services
```

Start the Redis-to-BigQuery Deployment

Next, start the Deployment for the pods that pull tweets from the Redis cache and stream them to your BigQuery table.

Before you start the Deployment, you might find it useful to take a closer look at kube-redis-bq/redis/bigquery-controller.yaml. This file defines the Deployment, and specifies a replica set with two pod replicas. The characteristics of these replicated pods are defined in the file by using a pod template.

Run the following command to start up the replicated pods:

kubectl create -f kube-redis-bq/redis/bigquery-controller.yaml

To verify that the pods are running, run the following:
```
kubectl get pods
```
It can take about 30 seconds for your new pods to move from ContainerCreating to Running. You should see one Redis pod labeled redis-master, and two Redis-to-BigQuery pods labeled bigquery-controller.
You can run:

kubectl get deployments

to see the system's defined Deployments, and how many replicas each is specified to have.

Start the Twitter-to-Redis Deployment

After your Redis master and Redis-to-BigQuery pipeline pods are running, start the Deployment that pulls in tweets and adds them to your Redis cache. Like kube-redis-bq/redis/bigquery-controller.yaml, the Twitter-to-Redis specification file, kube-redis-bq/redis/twitter-stream.yaml, defines a Deployment. However, this time the Deployment only asks for one replicated pod, as Twitter only allows one streaming API connection at a time.

To start up the Deployment, enter the following command:

kubectl create -f kube-redis-bq/redis/twitter-stream.yaml

Verify that all of your pods are now running by running the following command. Again, it can take about 30 seconds for your new pod to move from ContainerCreating to Running.
```
kubectl get pods
```

In addition to the pods you saw in the previous step, you should see a new pod labeled twitter-stream. Congratulations! Your analysis pipeline is now up and running.

You can again run:

kubectl get deployments

to see the system's defined Deployments, and how many replicas each is specified to have. You should now see something like this:

  NAME                   DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
  bigquery-controller    2         2         2            2           2m
  twitter-stream         1         1         1            1           2m

Querying your BigQuery table

Open the BigQuery web UI and click Compose Query to begin writing a new query. The queries in this section use BigQuery legacy SQL syntax. To verify that the pipeline is working, run a simple query, as follows:

SELECT
  text
FROM
  [rtda.tweets]
LIMIT
  1000 IGNORE CASE;

Let your pipeline collect tweets for a while. A few hours should do, but the longer you let it run, the richer your data set will be. After you have some more data in your BigQuery table, you can try running some interesting sample queries.

This sample query demonstrates how to find the most retweeted tweets in your table, filtering on a specific term (in this case, "android"):

SELECT
  text,
  MAX(retweeted_status.retweet_count) AS max_retweets,
  retweeted_status.user.screen_name
FROM
  [rtda.tweets]
WHERE
  text CONTAINS 'android'
GROUP BY
  text,
  retweeted_status.user.screen_name
ORDER BY
  max_retweets DESC
LIMIT
  1000 IGNORE CASE;

You might also find it interesting to filter your collected tweets by a set of terms. The following query filters by the words "Kubernetes", "BigQuery", "Redis", or "Twitter":

SELECT
  created_at,
  text,
  id,
  retweeted_status.retweet_count,
  user.screen_name
FROM
  [rtda.tweets]
WHERE
  text CONTAINS 'kubernetes'
  OR text CONTAINS 'BigQuery'
  OR text CONTAINS 'redis'
  OR text CONTAINS 'twitter'
ORDER BY
  created_at DESC
LIMIT
  1000 IGNORE CASE;
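
If you want to try different sets of terms, the WHERE clause above can be generated instead of typed by hand. The following is a minimal sketch; `contains_any` is a hypothetical helper, and it performs no escaping, so it is only suitable for trusted, simple keywords.

```python
def contains_any(column, terms):
    """Build a legacy SQL condition that matches any of the given terms.

    No escaping is performed; use trusted, simple keywords only.
    """
    return "\n  OR ".join("%s CONTAINS '%s'" % (column, term) for term in terms)


where_clause = contains_any("text", ["kubernetes", "BigQuery", "redis", "twitter"])
print("WHERE\n  " + where_clause)
```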

The following query looks for a correlation between the number of favorites and the number of retweets in your set of tweets:

SELECT
  CORR(retweeted_status.retweet_count, retweeted_status.favorite_count),
  lang,
COUNT(*) c
FROM [rtda.tweets]
GROUP BY lang
HAVING c > 2000000
ORDER BY 1

Going even deeper, you could also investigate whether the speakers of a specific language prefer favoriting to retweeting, or vice versa:

SELECT
  CORR(retweeted_status.retweet_count, retweeted_status.favorite_count),
  lang,
COUNT(*) c,
AVG(retweeted_status.retweet_count) avg_rt,
AVG(retweeted_status.favorite_count) avg_fv,
AVG(retweeted_status.retweet_count)/AVG(retweeted_status.favorite_count) ratio_rt_fv
FROM [rtda.tweets]
WHERE retweeted_status.retweet_count > 1 AND retweeted_status.favorite_count > 1
GROUP BY lang
HAVING c > 1000000
ORDER BY 1;
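
To make the statistics in the two preceding queries concrete, the following sketch computes a Pearson correlation coefficient, which is what CORR returns, and a ratio of averages on a tiny made-up sample. The numbers are invented for illustration and are not results from the tutorial's data; `pearson` is a hypothetical helper.

```python
import math


def pearson(xs, ys):
    """Compute the Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / math.sqrt(spread_x * spread_y)


# Invented sample: favorites are always exactly twice the retweets.
retweets = [2, 4, 6, 8]
favorites = [4, 8, 12, 16]

correlation = pearson(retweets, favorites)
avg_rt = sum(retweets) / len(retweets)
avg_fv = sum(favorites) / len(favorites)
ratio_rt_fv = avg_rt / avg_fv

print(correlation, avg_rt, avg_fv, ratio_rt_fv)
```

A correlation near 1 means that retweets and favorites rise together, while a ratio below 1 means that, on average, tweets in that group are favorited more often than they are retweeted.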

Cleaning up

Labels make it easy to select the resources you want to stop or delete. For example:

kubectl delete deployment -l "name in (twitter-stream, bigquery-controller)"

You can also delete a resource using its specification filename. For example, delete the Redis service and replication controller like this:

kubectl delete -f kube-redis-bq/redis/redis-master.yaml

kubectl delete -f kube-redis-bq/redis/redis-master-service.yaml

If you'd like to shut down your cluster instances altogether, run the following command:

kubernetes/cluster/kube-down.sh

This deletes all of the instances in your cluster.

Appendix: Building and pushing the Docker image

Your Kubernetes pods require a Docker image that includes the app scripts and their supporting libraries. This image is used to start up the Twitter-to-Redis and Redis-to-BigQuery pods that are part of your analysis pipeline. The Dockerfile located in the kube-redis-bq/redis/redis-pipe-image directory specifies this image as follows:

redis/redis-pipe-image/Dockerfile

FROM google/python

RUN pip install --upgrade pip
RUN pip install tweepy
RUN pip install --upgrade google-api-python-client
RUN pip install redis
RUN pip install python-dateutil

ADD twitter-to-redis.py /twitter-to-redis.py
ADD redis-to-bigquery.py /redis-to-bigquery.py
ADD controller.py /controller.py
ADD utils.py /utils.py

CMD python controller.py

This Dockerfile first instructs the Docker daemon to install some required Python libraries to the new image:

tweepy, a Python wrapper for the Twitter API.
redis-py, a Python wrapper for Redis.
The Google API Python libraries.
python-dateutil, an extension to Python's standard datetime module.

Then, four scripts in redis-pipe-image are added:

controller.py: Provides a common execution point for the other two Python scripts in the pipeline image.
utils.py: Contains some helper functions.
redis-to-bigquery.py: Streams cached data from Redis to BigQuery.
twitter-to-redis.py: Streams incoming tweets from Twitter to Redis.
To build the pipeline image using the associated Dockerfile, run the following command. Replace [PROJECT_ID] with your project ID:
```
sudo docker build -t gcr.io/[PROJECT_ID]/pipeline_image kube-redis-bq/redis/redis-pipe-image
```
After you build your image, push it to the Google Container Registry so that Kubernetes can access it. Replace [PROJECT_ID] with the ID of your project when you run the following command:
```
sudo gcloud docker push gcr.io/[PROJECT_ID]/pipeline_image
```

Update the Kubernetes specification files to use your custom image

Next, update the image values in the specification files to use your new image. You need to update the following two files:

kube-redis-bq/redis/twitter-stream.yaml

kube-redis-bq/redis/bigquery-controller.yaml

Edit each file and look for this line:
```
gcr.io/google-samples/redis-bq-pipe:v3
```
Replace google-samples with your project ID, and replace the image name redis-bq-pipe:v3 with pipeline_image.
Save and close the files.
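
If you prefer to script this edit, the substitution is a plain string replacement. The following is a minimal sketch; `use_custom_image` and `custom_image` are hypothetical helpers, and my-project is a placeholder project ID.

```python
SAMPLE_IMAGE = "gcr.io/google-samples/redis-bq-pipe:v3"


def custom_image(project_id, image_name="pipeline_image"):
    """Return the Container Registry path for your own image."""
    return "gcr.io/%s/%s" % (project_id, image_name)


def use_custom_image(manifest_text, project_id):
    """Swap the pre-built sample image for your own in manifest text."""
    return manifest_text.replace(SAMPLE_IMAGE, custom_image(project_id))


line = "        image: gcr.io/google-samples/redis-bq-pipe:v3"
updated = use_custom_image(line, "my-project")  # my-project is a placeholder.
print(updated)
```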

What's next

Try out other Google Cloud Platform features for yourself. Have a look at our tutorials.
