Posted:

Cross-posted on the Google for Education blog and Google Research blog

Modern mathematics research is distinguished by its openness. The notion of "mathematical truth" depends on theorems being published with proof, letting the reader understand how new results build on the old, all the way down to basic mathematical axioms and definitions. These new results become tools to aid further progress.

Nowadays, many of these tools come either in the form of software or theorems whose proofs are supported by software. If new tools produce unexpected results, researchers must be able to collaborate and investigate how those results came about. Trusting software tools means being able to inspect and modify their source code. Moreover, open source tools can be modified and extended when research veers in new directions.

In an attempt to create an open source tool to satisfy these requirements, University of Washington Professor William Stein built SageMathCloud (or SMC). SMC is a robust, low-latency web application for collaboratively editing mathematical documents and code. This makes SMC a viable platform for mathematics research, as well as a powerful tool for teaching any mathematically-oriented course. SMC is built on top of standard open-source tools, including Python, LaTeX, and R. In 2013, William received a Google Research Award, which provided Google Cloud Platform credits for SMC development. This allowed William to extend SMC to use Google Compute Engine as a hosting platform, achieving better scalability and global availability.
SMC allows users to interactively explore 3D graphics with only a browser
SMC has its roots in 2005, when William started the Sage project in an attempt to create a viable free and open source alternative to existing closed-source mathematical software. Rather than starting from scratch, Sage was built by making the best existing open-source mathematical software work together transparently and filling in any gaps in functionality.

During the first few years, Sage grew to have about 75K active users, while the developer community matured with well over 100 contributors to each new Sage release and about 500 developers contributing peer-reviewed code.

Inspired by Google Docs, William and his students built the first web-based interface to Sage in 2006, called The Sage Notebook. However, The Sage Notebook was designed for a small group of users (such as a single class) and soon became difficult to maintain for larger groups, let alone the whole web.

As the growth of new users for Sage began to stall in 2010, due largely to installation complexity, William turned his attention to finding ways to expand Sage's availability to a broader audience. Based on his experience teaching his own courses with Sage, and feedback from others doing the same, William began building a new web-hosted version of Sage that could scale to the next generation of users.

The result is SageMathCloud, a highly distributed multi-datacenter application that creates a viable way to do computational mathematics collaboratively online. SMC uses a wide variety of open source tools, from languages (CoffeeScript, node.js, and Python) to infrastructure-level components (especially Cassandra, ZFS, and bup) and a number of in-browser toolkits (such as CodeMirror and three.js).

Latency is critical for collaborative tools: like an online video game, everything in SMC is interactive. The initial versions of SMC were hosted at UW, where the distance between Seattle and faraway continents was a significant issue, even for the fastest networks. The global coverage of Google Cloud Platform provides a low-latency connection to SMC users around the world that is both fast and stable. It's not uncommon for long-running research computations to last days or even weeks, and here the robustness of Google Compute Engine, with machines live-migrating during maintenance, is crucial. Without it, researchers would often face multiple restarts and delays, or would invest in engineering around the problem, taking time away from the core research.

SMC sees use across a number of areas, especially:

  • Teaching: any course with a programming or math software component, where you want all your students to be able to use that component without dealing with the installation pain. Also, SMC allows students to easily share files, and even work together in realtime. There are dozens of courses using SMC right now.
  • Collaborative Research: all co-authors of a paper can work together in an SMC project, both writing the paper there and doing research-level computations.

Since SMC launched in May 2013, more than 20,000 monthly active users have started using Sage via SMC. We look forward to seeing whether SMC has an impact on the number of active users of Sage, and are excited to learn about the collaborative research and teaching that it makes possible.

-Posted by Craig Citro, Software Engineer

Posted:
Today we are announcing a new category of client libraries that has been built specifically for Google Cloud Platform. The very first library, gcloud-node, is idiomatic and intuitive for Node.js developers. With today’s release, you can begin integrating Cloud Datastore and Cloud Storage into your Node.js applications, with more Cloud Platform APIs and programming languages planned.

The easiest way to get started is by installing the gcloud package using npm:

$ npm install gcloud

With gcloud installed, your Node.js code is simpler to write, easier to read, and cleaner to integrate with your existing Node.js codebase. Take a look at the code required to retrieve entities from Datastore:

var gcloud = require('gcloud');

var dataset = new gcloud.datastore.Dataset({
  projectId: 'my-project',
  // Details at https://github.com/googlecloudplatform/gcloud-node#README
  keyFilename: '/path/to/keyfile.json'
});

dataset.get(dataset.key('Product', 123), function(err, entity) {
  console.log(err, entity);
});

gcloud is open-sourced on GitHub; check out the code, file issues and contribute a PR - contributors are welcome. Got questions? Post them on Stack Overflow with the [gcloud-node] tag.

Learn more about the Client Library for Node.js at http://googlecloudplatform.github.io/gcloud-node/ and try gcloud-node today.

-Posted by JJ Geewax, Software Engineer

Node.js is a trademark of Joyent, Inc. and npm is a trademark of npm, Inc.

Posted:
Out on holiday in August? We weren’t. Here’s what we were up to:

Google Cloud Platform Live date announced
This month, we announced the second-ever Google Cloud Platform Live, coming to you from San Francisco on November 4. Earlier this year, thousands of developers from around the world joined us in person and online to hear our vision for the future of cloud computing. The March event filled up fast, so register now to join us - either in person or online.

Security that matters
To help companies and developers use cloud services, it is essential that cloud providers are transparent about their security and privacy practices. We published our updated ISO 27001 certificate and SOC 2 and SOC 3 Type II audit report, which are the most widely recognized and accepted independent security compliance reports.

The most simple explanation of cloud ever
As Business Insider put it, our updated security certs gave us the opportunity to provide “the most simple explanation of cloud computing ever.”

VMware joins the Kubernetes family
We are very excited to welcome VMware to the Kubernetes family. With the addition of VMware in the community, we’re highlighting the infrastructure side of cluster management. VMware’s technical expertise in this area will contribute to making Kubernetes a capable, powerful and secure platform.

Google Compute Engine updates and improvements
The Compute Engine team has been busy with many new improvements. We’ve added new zones in the U.S. and Asia, and we’ve made SSD persistent disks generally available in all Compute Engine zones. We've also made it easier for developers to create custom images right from their root persistent disks.

Click-to-Deploy with Google Compute Engine
In July, we released a number of easy deployments on Google Compute Engine (check out our posts about one-click Apache Cassandra and one-click RabbitMQ). This past month, we kept the momentum going and introduced the first click-to-deploy open source development stack. Users can now deploy a MongoDB-Express Web Framework-AngularJS-NodeJS (MEAN) stack with a single click. We also announced GitLab Community Edition for Compute Engine. GitLab Community Server is a great way to get the benefits of collaborative development for your team wherever you want it.

SSH from the Developers Console
No one likes abandoning a task to manage a VM. Now, you can SSH directly to your VM without leaving the Developers Console in your browser. Learn about the simplest way to access your Compute Engine VMs on the blog.

The GDELT Event Database comes to Google BigQuery
In May, we announced the availability of the entire quarter-billion-record GDELT Event Database in Google BigQuery. This month, we revisited the GDELT Event Database to compute correlations between historical events.

Welcome Zync Render
Zync Render, the visual effects cloud rendering technology behind Star Trek Into Darkness, Transformers and Looper, has joined Google. Rendering visual effects requires huge amounts of compute capacity, and we’re excited to help. We’ll have more on our rendering offerings in the coming months.

-Posted by Benjamin Bechtolsheim, Product Marketing Manager

Posted:
Every software company today needs a place to store their code and collaborate with teammates. Today we are announcing a solution that can scale with your business. GitLab Community Server is a great way to get the benefits of collaborative development for your team wherever you want it. While GitLab already provides simple application installers, we wanted to take it one step further.

Today, we’re announcing Click to Deploy for the GitLab Community Server built on the following open source stack:
  • Nginx, a fast, minimal web server
  • Unicorn, Ruby on Rails hosting server
  • Redis, scalable caching service
  • PostgreSQL, popular SQL database

Get your own, dedicated code collaboration server today!

Learn more about running the GitLab Community Server on Google Compute Engine at https://developers.google.com/cloud/gitlab.

-Posted by Brian Lynch, Solutions Architect

GitLab is a registered trademark of GitLab B.V. All other trademarks cited here are the property of their respective owners.

Posted:


Two months ago, we announced Kubernetes, an open source cluster manager for Docker containers. Since then we’ve seen an impressive community develop around Kubernetes, and today we’re thrilled to welcome VMware to the Kubernetes community.

We’ve spent a lot of time talking about how we’re building Kubernetes to provide a unique infrastructure for easily building scalable, reliable systems like we do at Google. With the addition of VMware in the community, we thought we’d take the time to discuss the infrastructure side of cluster management and how VMware’s deep technical expertise in this area will make Kubernetes a more capable, powerful and secure platform beyond Google Cloud Platform.

One of the fundamental tenets of Kubernetes is the decoupling of application containers from the details of the systems on which they run. Google Cloud Platform provides a homogeneous set of raw resources via virtual machines (VMs) to Kubernetes, and in turn, Kubernetes schedules containers to use those resources. This decoupling simplifies application development, since users only ask for abstract resources like cores and memory, and it also simplifies data center operations, since every machine is identical and isolated from the details of the applications that run on it.

VMware will provide enhanced capabilities for running a reliable Kubernetes cluster, much like Google Cloud Platform. The core resources here are:

  • Machines: virtual machines on which containers run
  • Network: the physical or virtualized connectivity between containers in the cluster
  • Storage: reliable, cluster level distributed storage outside of a container’s lifecycle

Providing machines for Kubernetes is not only necessary as a pool of raw cycles and bytes but also can provide a critical extra layer of security. Security is a continuum on which you pick solutions based on threats and risk tolerance. While container security is an evolving area, VMs have a longer track record and a smaller attack surface. Fundamentally, even in Kubernetes, the machine is a strong security domain. Linux containers can provide strong resource isolation, ensuring, for example, that one container has dedicated access to a specific core in the processor. For semi-trusted workloads, containers may be sufficient. However, because containers share the same kernel, there’s an expanded surface area that may make them insufficient as your only line of defense. For untrusted workloads or users, we highly suggest defense in depth with virtual machine technology as a second layer of security. Indeed, this is how two different users’ Kubernetes clusters can safely co-exist on the same physical infrastructure in a Google data center. VMware will help Kubernetes implement this same pattern of using virtualization to secure physical machines when those machines are outside of Google’s data centers.

While running individual containers is sufficient for some use cases, the real power of containers comes from implementing distributed systems, and to do this you need a network. However, you don’t just need any network. Containers provide end users with an abstraction that makes each container a self-contained unit of computation. Traditionally, one place where this has broken down is networking, where containers are exposed on the network via the shared host machine’s address. In Kubernetes, we’ve taken an alternative approach: each group of containers (called a Pod) gets its own, unique IP address that’s reachable from any other Pod in the cluster, whether they’re co-located on the same physical machine or not. To achieve this in the Google data center, we’ve taken advantage of the advanced routing features that are available via Google Compute Engine’s Andromeda network virtualization. VMware, with their deep knowledge in network virtualization, specifically Open Virtual Switch (OVS), will simplify network configuration in Kubernetes clusters running outside of Google’s data centers.

Finally, nearly every application that you run needs some sort of storage, but storing that data on specific machines in your datacenter makes it difficult to schedule containers in the cluster to maximize efficiency and reliability, since pods are forced to co-locate with their data. When Kubernetes runs on Google Cloud Platform, you’ll soon be able to pair your container up with a Persistent Disk (PD) volume, so that regardless of where your container is scheduled in the cluster, its storage follows it to the physical machine. VMware will work with Kubernetes to include integration points to distributed storage systems such as their Virtual-SAN scalable virtual storage solution, enabling similar capabilities for users not running on Google Cloud Platform, in addition to simpler, less robust shared storage solutions for users that don't have access to a reliable network storage system.

We developed and open sourced Kubernetes to provide application developers and operations teams with the ability to build and scale their applications like Google. The addition of VMware’s technical expertise in cluster infrastructure will enable people to compute like Google, regardless of where they physically do that computation.

-Posted by Craig McLuckie, Product Manager

Posted:
Today’s guest post is by Florian Leibert, Mesosphere Co-Founder & CEO. Prior to Mesosphere, he was an engineering lead at Twitter, where he helped introduce Mesos; it now runs every new service there. He then went on to help build the analytics stack at Airbnb on Mesos. He is the main author of Chronos, an Apache Mesos framework for managing and scheduling ETL systems.

Mesosphere enables users to manage their datacenter or cloud as if it were one large machine. It does this by creating a single, highly-elastic pool of resources from which all applications can draw, creating sophisticated clusters out of raw compute nodes (whether physical machines or virtual machines). These Mesosphere clusters are highly available and support scheduling of diverse workloads on the same cluster, such as those from Marathon, Chronos, Hadoop, and Spark. Mesosphere is based on the open source Apache Mesos distributed systems kernel used by customers like Twitter, Airbnb, and Hubspot to power internet scale applications. Mesosphere makes it possible to develop and deploy applications faster with less friction, operate them at massive scale with lower overhead, and enjoy higher levels of resiliency and resource efficiency with no code changes.

We’re collaborating with Google to bring together Mesosphere, Kubernetes and Google Cloud Platform to make it even easier for our customers to run applications and containers at scale. Today, we are excited to announce that we’re bringing Mesosphere to the Google Cloud Platform with a web app that enables customers to deploy Mesosphere clusters in minutes. In addition, we are also incorporating Kubernetes into Mesos to manage the deployment of Docker workloads. Together, we provide customers with a commercial-grade, highly-available and production-ready compute fabric.

With our new web app, developers can literally spin up a Mesosphere cluster on Cloud Platform in just a few clicks, using either standard or custom configurations. The app automatically installs and configures everything you need to run a Mesosphere cluster, including the Mesos kernel, Zookeeper and Marathon, as well as OpenVPN so you can log into your cluster. Also, we’re excited that this functionality will soon be incorporated into the Google Cloud Platform dashboard via the click-to-deploy feature. There is no cost for using this service beyond the charges for running the configured instances on your Google Cloud Platform account. To get started with our web app, simply log in with your Google credentials and spin up a Mesos cluster.




We are also incorporating Kubernetes into Mesos and our Mesosphere ecosystem to manage the deployment of Docker workloads. Our combined compute fabric can run anywhere, whether on Google Cloud Platform, your own datacenter, or another cloud provider. You can schedule Docker containers side by side on the same Mesosphere cluster as other Linux workloads such as data analytics tasks like Spark and Hadoop and more traditional tasks like shell scripts and jar files.



Whether you are running massive, internet scale workloads like many of our customers, or you are just getting started, we think the combination of Mesos, Kubernetes, and Google Cloud Platform will help you build your apps faster, deploy them more efficiently, and run them with less overhead. We look forward to working with Google to make Cloud Platform the best place to run traditional Mesosphere workloads, such as Marathon, Chronos, Hadoop, or Spark—or newer Kubernetes workloads. And they can all be run together while sharing resources on the same cluster using Mesos. Please take Mesosphere for Google Cloud Platform for a test drive and let us know what you think.


- Contributed by Florian Leibert, Mesosphere Co-Founder & CEO

Posted:
If you’re starting out today, there are a number of development stacks to choose from. From the original LAMP (Linux, Apache, MySQL, PHP) to the myriad of other choices, there is a development stack to match your language and experience. For the NodeJS fans out there, the MEAN stack is a great option. Wouldn’t it be awesome if you could launch your favorite development stack with the click of a button?

Today, we’re announcing the first Click to Deploy development stack on Google Compute Engine. MEAN provides you with the best of open source software today:

  • MongoDB, a leading NoSQL database
  • Express Web Framework, a minimal and flexible node.js web application framework
  • AngularJS, an extensible JavaScript framework for responsive applications
  • NodeJS, a platform built on Chrome’s JavaScript runtime for server-side JavaScript

With a single button click, you can launch a complete MEAN development stack ready for development! Click to Deploy for MEAN handles all software installs and setting up a sample app for you to get started.

So, get out and click to deploy your MEAN development stack today!

Learn more about running the MEAN development stack on Google Compute Engine at https://developers.google.com/cloud/mean.

-Posted by Brian Lynch,  Solutions Architect

MEAN.io is a registered trademark of Linnovate Technologies Ltd. All other trademarks cited here are the property of their respective owners.

Posted:
In case you happened to miss some of the Cloud Platform news in July, we’ve got a round-up for you:

Expanding the Kubernetes community
This month, we announced that Microsoft, Red Hat, IBM, Docker, Mesosphere, CoreOS and SaltStack are joining the Kubernetes community. Kubernetes is our open source container management solution. These companies are going to work with us to ensure that Kubernetes is a strong container management framework for any application and in any environment - whether in a private, public or hybrid cloud.

Cloud Platform predicts the World Cup
We kicked off the month with a focus on the World Cup. We used Google Cloud Dataflow to ingest touch-by-touch gameplay data from World Cup matches going back to 2006, as well as three years of the English Barclays Premier League, two seasons of Spanish La Liga, and two seasons of U.S. MLS. We then polished the raw data into predictive statistics using Google BigQuery. At the end of the day, we correctly predicted the final outcome as well as 11 of the 12 games leading up to it. You can read our posts after the round of 16, after the quarterfinals, and before the final.

A great new way to learn about App Engine
We launched a new course on Udacity: Developing Scalable Apps with Google App Engine. We’ve already gotten great feedback from developers, and a few of our favorite sections are Urs talking about what makes App Engine unique as well as a brief history of the data center (pizza boxes included).

More container news: Red Hat Enterprise Atomic Host comes to Compute Engine
Jim Totton, Vice President and General Manager at Red Hat, wrote on our blog about Red Hat Enterprise Linux Atomic Host coming to Google Compute Engine. This provides a secure, lightweight and minimal footprint operating system optimized to run Linux Containers on Google’s infrastructure.

More great customers
We featured lots of great customers who are using Google Cloud Platform to power their business. Webydo, a B2B solution for professional web design, cut costs by 37% when they moved to Google Cloud Platform. And US Cellular is using BigQuery for “highly flexible analysis of large datasets.” This has allowed them to better measure the effectiveness of marketing campaigns.

David LaBine, Director of education software for SMART Technologies, wrote on our blog that using App Engine means “developers [at SMART Technologies] are more productive because they’re able to focus on writing new features rather than worrying about infrastructure…” Rafael Sanches, co-founder of Allthecooks, wrote on our blog that, “Google Cloud Platform played a key role in helping us grow... Since launching, we’ve grown to over 12 million users with a million monthly active users. Our application now sees millions of interactions daily that run through Google App Engine and Google Cloud Datastore.”

Finally, Brightcove and Fastly wrote on our blog that “because Google Cloud Platform launches instances in less than half the time of the rest of the industry, Fastly is able to launch new customers through Brightcove in a turnkey way.”

More product news
We introduced the Google Cloud Monitoring Read API, giving developers programmatic access to over 30 different metrics about their services, including CPU usage, disk IO and much more. Cloud Monitoring Read API allows you to query current and historical metric data for up to the past 30 days.

Also, click-to-deploy Apache Cassandra makes it easy to launch a dedicated Apache Cassandra cluster on Google Compute Engine. All it takes is one click after some basic information. In a matter of minutes, you can get a complete Cassandra cluster deployed and configured.

The roadshows kicked off
The Google Cloud Platform developer roadshow visited Los Angeles, San Francisco and Seattle in July. But, we’ve still got much of the tour coming up, so join us on the road to speak with the Cloud Platform team. You can still catch us in New York City (August 5), Cambridge (August 7), Boulder (August 12), Toronto (August 12), Austin (August 14), Atlanta (August 19), and Chicago (August 22). Click here to register.

-Posted by Benjamin Bechtolsheim, Product Marketing Manager

Posted:
We recently published a case study, Fast and Reliable Ranking in Datastore, that describes how we helped one of our Google App Engine customers shorten their ranking latency from one hour to five seconds. They applied unique design patterns such as job aggregation to achieve over 300 updates per second with strong consistency on Cloud Datastore. The following are highlights from the article.

The problem of ranking
Tomoaki Suzuki, an App Engine lead engineer at Applibot, a major game studio in Japan, has been trying to solve the common, yet difficult problem faced by every large gaming service: ranking.
Tomoaki Suzuki, App Engine lead engineer at Applibot, Inc. and their game Legend of Criptids (#1 ranked game in the Apple App Store North America gaming category in October 2012)

The requirements are simple:

  • Your game has hundreds of thousands (or more!) players.
  • Whenever a player fights enemies (or performs other activities), their score changes.
  • You want to show the latest ranking for the player on a web portal page.
Getting a rank is easy, if it's not expected to also be scalable and fast. For example, you could execute the following query:
SELECT count(key) FROM Players WHERE Score > YourScore

This query counts all the players who have a higher score than yours. But do you want to execute this query for every request from the portal page? How long would it take when you have a million players?
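
To make that cost concrete, here is a sketch (with hypothetical names) of what the query above effectively does: a full scan whose work grows linearly with the number of players.

```javascript
// Sketch of the naive O(n) ranking approach behind the query above: every
// request scans the whole player list to count higher scores. Fine for a
// thousand players; too slow and too expensive for a million.
function naiveRank(players, yourScore) {
  var higher = 0;
  for (var i = 0; i < players.length; i++) {
    if (players[i].score > yourScore) {
      higher += 1;
    }
  }
  return higher + 1; // rank 1 means the top score
}
```

With a million players, each page view triggers a million comparisons, which is exactly the behavior Tomoaki observed.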

Tomoaki initially implemented this approach, but it took a few seconds to get each response. This was too slow, too expensive, and performed progressively worse as scale increased.
Caching rankings in Memcache
Next, Tomoaki tried to maintain ranking data in Memcache. This was fast, but not reliable, because Memcache entries are just caches and could be evicted at any time. With a ranking service that depended solely on an in-memory key-value store, it was difficult to maintain consistency and availability.

Looking for an O(log n) Algorithm
I was assigned to Applibot under a platinum support contract. I knew that ranking was a classic and yet hard-to-solve problem for any scalable distributed service. The simple query solution requires scanning all players with a higher score to count the rank of one player. The time complexity of this algorithm is O(n); that is, the time required for query execution increases proportionally to the number of players. In practice, this means that the algorithm is not scalable. Instead, we need an O(log n) or faster algorithm, where the time will only increase logarithmically as the number of players grows.

If you ever took a computer science course, you may remember that tree algorithms, such as binary trees, red-black trees, or B-Trees, can perform at O(log n) time complexity for finding an element. Tree algorithms can also be used to calculate an aggregate value of a range of elements, such as count, max/min, and average by holding the aggregated values on each branch node. Using this technique, it is possible to implement a ranking algorithm with O(log n) performance.
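
To illustrate the aggregated-counts idea, here is a minimal sketch using a Fenwick (binary indexed) tree over a bounded score range. This is an illustration only, not the Code Jam library's actual implementation, which stores an N-ary tree in Datastore entities; but it answers "how many players scored higher?" in O(log n) the same way.

```javascript
// Sketch: O(log n) ranking via a Fenwick tree over scores 0..maxScore.
// Each internal node holds an aggregated count of players, so rank queries
// never scan individual players.
function FenwickTree(maxScore) {
  this.size = maxScore + 1;
  this.tree = new Array(this.size + 1).fill(0); // 1-based internal array
  this.total = 0;
}

// Record one player at the given score: O(log maxScore) node updates.
FenwickTree.prototype.addScore = function (score) {
  for (var i = score + 1; i <= this.size; i += i & -i) {
    this.tree[i] += 1;
  }
  this.total += 1;
};

// Number of players with score <= the given score: O(log maxScore).
FenwickTree.prototype.countUpTo = function (score) {
  var count = 0;
  for (var i = score + 1; i > 0; i -= i & -i) {
    count += this.tree[i];
  }
  return count;
};

// Rank = 1 + number of players with a strictly higher score.
FenwickTree.prototype.rank = function (score) {
  return 1 + (this.total - this.countUpTo(score));
};
```

Every update and every rank query touches only a logarithmic number of counters, which is the property that makes a tree-based ranking service scalable in the number of players.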

I found an open source implementation of a tree-based ranking algorithm for Datastore, written by a Google engineer: the Google Code Jam Ranking Library.
Getting the rank of a score in a tertiary tree with the Google Code Jam ranking library
Concurrent Updates Limit Scalability
However, during load testing, I found a critical limitation with the Code Jam ranking library. Its scalability in terms of update throughput was quite low. When I increased the load to three updates per second, the library started to return transaction retry errors. It was obvious that the library could not satisfy Applibot's requirement for 300 updates per second. It could handle only about 1% of that throughput.

Why is that? The reason is the cost of maintaining the consistency of the tree. In Datastore, you must use an entity group to assure strong consistency when updating multiple entities in a transaction—see "Balancing Strong and Eventual Consistency with Google Cloud Datastore". The Code Jam ranking library uses a single entity group to hold the entire tree to ensure consistency of the counts in the tree elements.

However, an entity group in Datastore has a performance limitation. Datastore only supports about one transaction per second on an entity group. Furthermore, if the same entity group is modified in concurrent transactions, they are likely to fail and must be retried. The Code Jam ranking library is strongly consistent, transactional, and fairly fast, but it does not support a high volume of concurrent updates.

Datastore Team's Solution: Job Aggregation
I remembered that a software engineer on the Datastore team had mentioned a technique for obtaining much higher throughput than one update per second on an entity group: aggregating a batch of updates into one transaction, rather than executing each update as a separate transaction. So I asked the Datastore team for a solution to this problem.

In response to my request, the Datastore team started discussing this issue and advised us to consider using Job Aggregation, one of the design patterns used with Megastore, the underlying storage layer of Datastore that manages the consistency and transactionality of entity groups. The basic idea of Job Aggregation is to use a single thread to process a batch of updates. Because there is only one thread and only one transaction open on the entity group, there are no transaction failures due to concurrent updates. You can find similar ideas in other storage products such as VoltDB and Redis.

Proposed Solution Runs at 300 Updates per Second Sustained
Based on the advice from the Datastore team, I wrote Proof of Concept (PoC) code that combines the Job Aggregation pattern with the Code Jam ranking library. The PoC creates a pull queue, which is a kind of Task Queue in App Engine that allows developers to implement one or multiple workers that consume the tasks added to the queue. The backend instance has a single thread in an infinite loop that keeps pulling as many tasks as possible (up to 1000) from the queue. The thread passes each update request to the Code Jam ranking library, which executes them as a batch in a single transaction. The transaction may be open for a second or more, but because there is a single thread driving the library and Datastore, there is no contention and no concurrent modification problem.
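
The single-worker batching loop can be sketched as follows. This is a minimal in-memory illustration with hypothetical names: the plain array and the commit callback stand in for the App Engine pull queue and the single Datastore transaction that the Code Jam library runs per batch.

```javascript
// Minimal in-memory sketch of the Job Aggregation pattern. One worker
// leases a batch of pending update tasks and applies them in a single
// commit, so the entity group sees one transaction per batch instead of
// one transaction per update (and thus no concurrent-update contention).
function drainBatch(queue, batchSize, commit) {
  // Lease up to batchSize pending update tasks from the front of the queue.
  var batch = queue.splice(0, batchSize);
  if (batch.length > 0) {
    // Apply the whole batch in one transaction (stand-in callback here).
    commit(batch);
  }
  return batch.length;
}
```

With batches of up to 1000 tasks and roughly one transaction per second on the entity group, a single worker in this style can absorb hundreds of updates per second without transaction retries, which is the effect described above.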

The following figure shows the load testing result of the final PoC implementation. I also incorporated another design pattern, Queue Sharding, to effectively minimize the performance fluctuations in each task queue. The final proposed solution can sustain 300 updates per second over several hours. Under usual load, each update is applied to Datastore within a few seconds of receiving the request.
Performance graph of the solution
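The Queue Sharding idea is to spread incoming tasks across several pull queues and have the consumer lease from the shards in round-robin order, so a latency spike on any single queue does not stall the whole batch. A minimal sketch of that draining strategy, using in-memory queues as hypothetical stand-ins for App Engine pull queues:

```python
import itertools
import queue

def drain_round_robin(shards, max_batch=1000):
    """Lease up to max_batch tasks, visiting the sharded queues in
    round-robin order; stop early once every shard comes up empty."""
    batch = []
    consecutive_empty = 0
    for shard in itertools.cycle(shards):
        if len(batch) >= max_batch or consecutive_empty >= len(shards):
            break
        try:
            batch.append(shard.get_nowait())
            consecutive_empty = 0          # found work, reset the counter
        except queue.Empty:
            consecutive_empty += 1
    return batch
```

The aggregator thread can then apply each drained batch in a single transaction, exactly as in the unsharded case.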
With the load testing results and the PoC code, I presented the solution to Tomoaki and other Applibot engineers. Tomoaki plans to incorporate the solution into their production system; he expects to reduce the latency of updating the ranking info from one hour to five seconds, dramatically improving the user experience.

-Posted by Kazunori Sato, Solutions Architect

Notes
Any performance figures described in this article are sampled values for reference and do not guarantee any absolute performance of App Engine, Datastore, or other services.

Posted:
If you saw our post about Cassandra hitting 1 million writes per second on Google Compute Engine, then you know we’re getting serious about open source NoSQL. We’re making it easier to run the software you love at the scale you need with the reliability of Google Cloud Platform. With over a dozen different virtual machine types, and the great price-performance of persistent disks, we think Google Compute Engine is a fantastic place for Apache Cassandra.

Today, we’re making it even easier to launch a dedicated Apache Cassandra cluster on Google Compute Engine. All it takes is one click after providing some basic information, such as the size of the cluster. In a matter of minutes, you get a complete Cassandra cluster deployed and configured.

Each node is automatically configured for the cloud, including:
  • GoogleCloudSnitch enabled for Google Cloud Platform awareness
  • Writes tuned for Google Persistent Disk
  • JVM tuned to perform on Google Compute Engine instances

The complete set of tuning parameters can be found on the Click to Deploy help page.

So, get out and click to deploy your Cassandra cluster today!

Learn more about running Apache Cassandra on Google Compute Engine at https://developers.google.com/cloud/cassandra.

-Posted by Brian Lynch, Solutions Architect

Cassandra is a registered trademark of the Apache Software Foundation. All other trademarks cited here are the property of their respective owners.

Posted:
Today’s guest blog comes from Jim Totton, Vice President and General Manager, Platform Business Unit at Red Hat

Red Hat Enterprise Linux Atomic Host is now available as a technology preview on Google Compute Engine for customers participating in the Red Hat Enterprise Linux 7 Atomic special interest group (SIG). The inaugural SIG is focused on application containers and encompasses the technologies that are required to create, deploy and manage application containers.

Google and Red Hat actively collaborate on container technologies, approaches and best practices. Both companies are committed to standards for container management, interoperability and orchestration. As a gateway to the open hybrid cloud, application containers enable new possibilities for customers and software providers, including application portability, deployment choice and hyperscale and resilient architectures - whether on-premise or in the cloud.

At Red Hat Summit in April, Red Hat announced our vision for Linux Containers and expanded the Red Hat Enterprise Linux 7 High Touch Beta program to include Red Hat Enterprise Linux Atomic Host – a secure, lightweight and minimal footprint operating system optimized to run Linux Containers. Moving forward, Red Hat will work closely with these participants, with assistance from Google, to support them as they explore application containers. This will help us both gather important requirements and feedback on use cases for these technologies and enable the hybrid cloud for our joint customers.

We also announced today that Red Hat and Google are collaborating to tackle the challenge of how to manage application containers at scale, across hundreds or thousands of hosts. Red Hat will be joining the Kubernetes community and actively contributing code. Earlier today on our blog, we wrote:

Red Hat is embracing the Google Kubernetes project and plans to work to enable it with container management capabilities in our products and offerings. This will enable Red Hat customers to take advantage of cluster management capabilities in Kubernetes, to orchestrate Docker containers across multiple hosts, running on-premise, on Google Cloud Platform or in other public or private clouds. As part of this collaboration, Red Hat will become core committers to the Kubernetes project. This supports Red Hat’s open hybrid cloud strategy that uses open source to enable application portability across on-premise datacenters, private clouds and public cloud environments.

Both Google and Red Hat recognize the importance of delivering containerized applications that are secure, supported and exhibit a chain of trust. Red Hat's Container Certification program, launched in March 2014, supports this commitment and is designed to help deliver containerized applications that “work as intended” to trusted destinations within the hybrid cloud for software partners and end-customers.

Follow the Red Hat Enterprise Linux Blog to stay informed about Red Hat’s work on technologies required to create, deploy, and manage application containers.

-Contributed by Jim Totton, Vice President and General Manager, Platform Business Unit at Red Hat

Posted:
Kubernetes is an open source manager for Docker containers, based on Google’s years of experience using containers at Internet scale. Today, Microsoft, Red Hat, IBM, Docker, Mesosphere, CoreOS and SaltStack are joining the Kubernetes community and will actively contribute to the project. Each company brings unique strengths, and together we will ensure that Kubernetes is a strong and open container management framework for any application and in any environment - whether in a private, public or hybrid cloud.

Our shared goal is to allow a broad range of developers to take advantage of container technologies. Kubernetes was built from the ground up as a lean, extensible and portable framework for managing Docker workloads. It lets customers manage their applications the way that Google manages hyper-scale applications like Search and Gmail.

Containers offer tremendous advantages for developers. Predictable deployments and simple scalability are possible because Docker packages all of a workload’s dependencies with the application. This allows for ultimate portability; you can avoid vendor lock-in and run containers in the cloud of your choice. It is just as important that the management framework has the same properties of portability and scalability, and that is what the community will bring to Kubernetes.

We look forward to the contributions of the expanded Kubernetes community:

  • Microsoft is working to ensure that Kubernetes works great in Linux environments in Azure VMs. Scott Guthrie, Executive Vice President of the Cloud and Enterprise group at Microsoft told us, “Microsoft will help contribute code to Kubernetes to enable customers to easily manage containers that can run anywhere. This will make it easier to build multi-cloud solutions including targeting Microsoft Azure.”
  • Red Hat is working to bring Kubernetes to the open hybrid cloud. Paul Cormier, President, Products and Technologies at Red Hat, told us, “Red Hat has a rich history of contributing to and maturing innovative, open source projects. Through this collaboration with Google on Kubernetes, we are contributing to the evolution of cloud computing and helping deliver the promises that container technologies offer to the open hybrid cloud.”
  • IBM is contributing code to Kubernetes and the broader Docker ecosystem to ensure that containers are enterprise-grade, and is working with the community to create an open governance model around the project.
  • Docker is delivering the full container stack that Kubernetes schedules into, and is looking to move critical capabilities upstream and align the Kubernetes framework with Libswarm.
  • CoreOS is working to ensure that Kubernetes can work seamlessly with the suite of CoreOS technologies that support cloud-native application development on any cloud.
  • Mesosphere is actively integrating Kubernetes with Mesos, making the advanced scheduling and management capabilities available to Kubernetes customers.
  • SaltStack is working to make Kubernetes a portable container automation framework that is designed for the reality of the platform-agnostic, multi-cloud world.


You can view the Go source and documentation for Kubernetes on GitHub. We look forward to the contributions of these companies alongside the already vibrant open source community.

- Posted by Urs Hölzle, Senior Vice President

Posted:
If you thought May was busy, June probably left your head spinning.

Docker Containers in App Engine and Kubernetes
We started the month at DockerCon, where we took the lid off our efforts to make containers first-class citizens in Google Cloud Platform. Now, you can deploy Docker images in Managed VMs or use our newly-released, open-source container manager Kubernetes to deploy containers across a fleet of VMs. The source and documentation for Kubernetes are hosted on GitHub - and we were excited to see 55,000 views, 917 stars, and 89 forks in the first day alone.

SSD Persistent Disks and global load balancing
The week after DockerCon, we released SSD persistent disks as well as global HTTP load balancing, an example of how we can enable new networking capabilities through software-defined networking.

Discussing the evolution of computing at Google
That same week, Senior Vice President Urs Hölzle spoke at GigaOm’s Structure conference. You can hear him talk about the evolution of computing at Google and how we are externalizing our technology to offer a great public cloud.

One-Click RabbitMQ and MongoDB
We also continued to make it easy for you to run open source solutions on Cloud Platform. You can now deploy RabbitMQ and MongoDB with just one click. We have a new command-line tool in the Google Cloud SDK with great Windows support, improved scripting, and much more.

Importing data to Google Cloud Storage is faster than ever
And, it’s faster than ever to import data into Google Cloud Platform thanks to Cloud Storage Online Cloud Import.

Cloud Platform documentation is better than ever
Our developer documentation page got a facelift. Check out developers.google.com/cloud to see for yourself.

Google I/O
And, to round up a month of big launches, we gave you a sneak peek at several new products & features at Google I/O. Urs led Cloud Platform’s section of the Google I/O keynote, which you can watch here. Also, we launched a number of new services & products:

  • Cloud Trace and Cloud Debugger for Diagnosing Systems in Production: Cloud Trace and Cloud Debugger allow you to understand, diagnose, and improve systems in production. If you want to see Cloud Debugger in action, you can check out our demo during the I/O Keynote beginning at the 2 hour mark, or read more about it on our blog.
  • Analyze Huge Datasets with Cloud Dataflow: We demonstrated Cloud Dataflow, a managed service for creating data pipelines that ingest, transform and analyze data in both batch and streaming modes. You can see us analyze World Cup results using Dataflow. We also released a preview of Cloud Pub/Sub, helping you connect apps and devices that send data into processing pipelines.
  • Mobile Developers: Save User Data to the Cloud without Backend Code: We introduced a number of new features that make it easier to build mobile applications, including a new version of Google Cloud Save, which gives you a simple API for saving, retrieving, and synchronizing user data to the cloud and across devices without the need for backend code. We also put new tooling in Android Studio that simplifies the process of adding an App Engine backend to your mobile app. You can read more about these new features here.

Rising Star on ABC, Secret, and Coca-Cola
But it wasn’t just product launches that kept us busy. We’re always happy to see people using Cloud Platform to build great applications. This month, Rising Star premiered on ABC, with real-time, interactive voting powered by Cloud Platform. We’ve also been excited by the success of Secret, which is built on Cloud Platform and has experienced 1000-fold growth in just two months - all with only one backend developer. We featured their story during the I/O keynote. And, finally, CI&T launched the Google Cloud Platform-powered Coca-Cola “Happiness Flag,” unveiled on the pitch at the opening match of the World Cup.

Connect with our team in a city near you
If you missed us at DockerCon, GigaOM Structure, or Google I/O there’s still a chance to connect with our engineering team at the North American Cloud Platform Developers roadshow.

-Posted by Benjamin Bechtolsheim, Product Marketing Manager

Posted:
Many mobile apps today suffer from “app-nesia” — the affliction that causes an app to forget who you are. Have you ever re-installed an app only to discover you have to re-create all your carefully crafted preferences? This is typically because the user’s app data lives only on the device.

By connecting your apps to a backend platform, you can solve this issue, but it can be challenging. Whether it’s building basic plumbing, or just trying to load and save data in a network- and battery-efficient way, spending time dealing with the backend can take precious time away from building an awesome app. So, we’re introducing two new features to help make your life easier.

Google Cloud Save
Google Cloud Save allows you to easily load and save user data to the cloud without needing to code up the backend. This is handy for situations where you want to save user state and have that state synchronized to multiple devices, or survive an app reinstall.

We handle all the backend logic as well as the synchronization services on the client. The synchronization services work in the background, providing offline support for the data, and minimizing impact on the battery. All you need to do is tell us when and what to save, and you do this with just 4 simple methods:
  • .save(client, List<Entity>)
  • .delete(client, Query)
  • .query(client, Query)
  • .requestSync(client)
All data is written locally first, then automatically synchronized in the background. The save, delete and query methods provide your basic CRUD operations while the requestSync method allows you to force a synchronization at any time.

On the backend the data is stored in Google Cloud Datastore which means you can access the raw data directly from a Google App Engine or Google Compute Engine instance using the existing Datastore API. Changes on the server will even be automatically synced back to client devices.
Importantly, this per-user data belongs to you, the developer, and is stored in your own Google Cloud Datastore database.
Google Cloud Save is currently in private beta and will be available for general use soon. If you’re interested in participating in the private beta, you can sign up here!

Cloud Tools for Android Studio
To simplify the process of adding an App Engine backend to your app, Android Studio now provides three App Engine backend module templates which you can add to your app:

  • App Engine Java Servlet Module - Minimal Backend
  • App Engine Java Endpoints Module - Basic Endpoint scaffolding 
  • App Engine with Google Cloud Messaging - Push notification wireup 
When you choose one of these template types, your project is updated with a new Gradle module containing your new App Engine backend. All of the required dependencies and permissions are set up for you automatically.

Built-in rich editing support for Google Cloud Endpoints
Once you have added the backend module to your Android application, you can use Google Cloud Endpoints to streamline the communication between your backend and your Android app. Cloud Endpoints automatically generates strongly-typed, mobile optimized client libraries from simple Java server-side API annotations, automates Java object marshalling to and from JSON, and provides built-in OAuth 2.0 support.


On deployment, your annotated Endpoints API definition class generates a RESTful API. You can explore this generated API (and even make calls to it) by navigating to the APIs Explorer, as shown in the image below:
[Screenshot: the generated Endpoints API in the APIs Explorer]
To simplify calling this generated API from your Android app, Android Studio will automatically set up your project to include all compile dependencies and permissions required to consume Cloud Endpoints, and will re-generate strongly-typed client libraries if your backend changes. This means that you can start calling the client libraries from your Android app immediately after defining the server-side Endpoints API.


The underlying workhorses: Gradle and the Gradle plug-in for App Engine
Under the hood, Gradle is used to build both your app and your App Engine backend. In fact, when you add an App Engine backend to your Android app, the open-source App Engine plug-in for Gradle is automatically downloaded by Android Studio, and common App Engine tasks become available as Gradle targets. This allows you to use the same build system across your IDE, command-line or continuous integration environments.


Check out more details on the new Cloud Endpoints features in Android Studio on the Android Developer Blog.

-Posted by Jason Polites, Product Manager

Posted:
Today at Google I/O, we are introducing new services that help developers build and optimize data pipelines, create mobile applications, and debug, trace, and monitor their cloud applications in production.

Introducing Google Cloud Dataflow
A decade ago, Google invented MapReduce to process massive datasets using distributed computing. Since then, the growth of devices and data has demanded ever more capable analytics pipelines, which remain difficult to create and maintain.

Today at Google I/O, we are demonstrating Google Cloud Dataflow for the first time. Cloud Dataflow is a fully managed service for creating data pipelines that ingest, transform and analyze data in both batch and streaming modes. Cloud Dataflow is a successor to MapReduce, and is based on our internal technologies like Flume and MillWheel.

Cloud Dataflow makes it easy for you to get actionable insights from your data while lowering operational costs without the hassles of deploying, maintaining or scaling infrastructure. You can use Cloud Dataflow for use cases like ETL, batch data processing and streaming analytics, and it will automatically optimize, deploy and manage the code and resources required.

Debug, trace and monitor your application in production
We are also introducing several new Cloud Platform tools that let developers understand, diagnose and improve systems in production.

Google Cloud Monitoring is designed to help you find and fix unusual behavior across your application stack. Based on technology from our recent acquisition of Stackdriver, Cloud Monitoring provides rich metrics, dashboards and alerting for Cloud Platform, as well as more than a dozen popular open source apps, including Apache, Nginx, MongoDB, MySQL, Tomcat, IIS, Redis, Elasticsearch and more. For example, you can use Cloud Monitoring to identify and troubleshoot cases where users are experiencing increased error rates connecting from an App Engine module or slow query times from a Cassandra database with minimal configuration.
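The error-rate condition described above boils down to a simple windowed check. A sketch of the idea only, with made-up parameter names, not the Cloud Monitoring API:

```python
def should_alert(errors, requests, threshold=0.05, min_requests=100):
    """Fire an alert when the error rate over a monitoring window
    exceeds the threshold; skip windows with too few requests to be
    statistically meaningful."""
    if requests < min_requests:
        return False
    return errors / requests > threshold
```

A real alerting policy would add sustained-duration requirements and notification routing on top of a check like this.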

We know that it can be difficult to isolate the root cause of performance bottlenecks. Cloud Trace helps you visualize and understand time spent by your application for request processing. In addition, you can compare performance between various releases of your application using latency distributions.
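Comparing latency distributions between releases amounts to lining up their percentiles. A rough sketch of the idea, using simple nearest-rank percentiles rather than Cloud Trace's actual methodology:

```python
def percentile(samples, p):
    """Nearest-rank p-th percentile of a list of request latencies (ms)."""
    ordered = sorted(samples)
    rank = max(1, round(p / 100 * len(ordered)))
    return ordered[rank - 1]

def compare_releases(old_ms, new_ms, points=(50, 95, 99)):
    """Return {percentile: (old latency, new latency)} so a regression
    in the tail shows up even when the median is unchanged."""
    return {p: (percentile(old_ms, p), percentile(new_ms, p)) for p in points}
```

Tail percentiles matter here: a release can leave the median untouched while making the slowest 5% of requests much worse.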

Finally, we’re introducing Cloud Debugger, a new tool to help you debug your applications in production with effectively no performance overhead. Cloud Debugger gives you a full stack trace and snapshots of all local variables for any watchpoint that you set in your code while your application continues to run undisturbed in production. This brings modern debugging to cloud-based applications.

New features for mobile development
With rapid autoscaling, caching and other mobile-friendly capabilities, many apps like Snapchat and Rising Star have been built and run on Cloud Platform. We’re adding new features that make building a mobile app using Cloud Platform even better.

Today, we’re demonstrating a new version of Google Cloud Save, which gives you a simple API for saving, retrieving, and synchronizing user data to the cloud and across devices without needing to code up the backend. Data is stored in Google Cloud Datastore, making the data accessible from Google App Engine or Google Compute Engine using the existing Datastore API. Google Cloud Save is currently in private beta and will be available for general use soon.

We’ve also added tooling to Android Studio, which simplifies the process of adding an App Engine backend to your mobile app. In particular, Android Studio now has three built-in App Engine backend module templates, including Java Servlet, Java Endpoints and an App Engine backend with Google Cloud Messaging. Since this functionality is powered by the open-source App Engine plug-in for Gradle, you can use the same build configuration for both your app and your backend across IDE, CLI and Continuous Integration environments.

We’ll be doing more detailed follow-up posts about these announcements in the coming days, so stay tuned.

-Posted by Greg DeMichillie, Director of Product Management




*Apache, Nginx, MongoDB, MySQL, Tomcat, IIS, Redis, Elasticsearch and Cassandra are trademarks of their respective owners.

Posted:
If you develop scalable applications, you often want to use a messaging system, but you may be concerned that setting up such a system is time consuming and its throughput limits are not sufficient for your growing needs. But what if the whole setup required just a few minutes and benchmarks showed that the system was capable of processing over one million messages per second? This is exactly what we are announcing today with Pivotal.

When we talked about high-throughput, low-latency messaging scenarios with our customers, they were particularly interested in RabbitMQ, a popular open source messaging system that serves a variety of use cases.

We recently made it super easy for our users to deploy a dedicated RabbitMQ cluster on Google Compute Engine. All it takes is one click after providing some basic information, such as the size of the cluster and RabbitMQ username / password.

In a matter of minutes, you get a RabbitMQ cluster deployed and configured, along with two load balancers set up for your cluster. As a next step, you can securely access the RabbitMQ web management console and start developing your app that uses RabbitMQ. Even if you need a very large 64-node cluster, the deployment usually takes less than 10 minutes.

In fact, the RabbitMQ cluster is so easy to deploy and configure that we’ve received feedback that for some developers the experience may be similar to having their own "Rabbit service" consisting of multiple nodes.

We also wanted to demonstrate that a RabbitMQ cluster running on Google Compute Engine is a good fit for high-throughput and low-latency scenarios. To do so, we ran a series of benchmarks in various configurations. These test clusters were able to sustain a throughput of over 1 million messages published and consumed per second (a sustained combined ingress/egress of over two million messages per second).

To put this throughput in context, one million messages per second translates to 86 billion messages per day. U.S. text messages reached 6 billion per day in 2012, Apple processes about 40 billion iMessages per day, and WhatsApp hit a new record in December by handling 20 billion messages in a single day.
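The back-of-the-envelope conversion behind that figure is straightforward:

```python
MESSAGES_PER_SECOND = 1_000_000
SECONDS_PER_DAY = 24 * 60 * 60               # 86,400 seconds in a day

messages_per_day = MESSAGES_PER_SECOND * SECONDS_PER_DAY
print(messages_per_day)                      # 86400000000, i.e. ~86 billion
```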

Google and Pivotal Benchmarks
We then invited engineers from Pivotal, the company behind RabbitMQ, to validate and endorse our results.

For our joint tests on Google Compute Engine we provisioned a cluster of 32 virtual machines with 8 vCPUs and 30GB of RAM each, and deployed RabbitMQ on Debian.

To generate load, we used the RabbitMQ PerfTest client tool running on a set of GCE virtual machines separate from the ones housing the RabbitMQ cluster nodes. As per RabbitMQ clustering recommendations, the clients were configured to target the IP address of a GCE load balancer that was automatically created through click to deploy. This is a more flexible approach than having the clients know the IP address or host name of every node in the Rabbit cluster.

After the traffic generating clients warmed up, we reached a steady state as shown in RabbitMQ’s web management console: over 1.3 million messages published and consumed per second.
[Screenshot: RabbitMQ web management console showing over 1.3 million messages per second]
Over multiple hours under this load, the RabbitMQ cluster and its underlying GCE VMs remained stable, with average throughput holding at approximately the levels shown in the screenshot.

Conclusion
We are excited to offer our customers an easy way to deploy RabbitMQ. We are proud that both Google and Pivotal benchmarks validate that Compute Engine is an excellent choice for running RabbitMQ in a public cloud.

To deploy your own dedicated RabbitMQ cluster, go to Google Developer Console and navigate to an existing project, or create a new one, and you will have an option to deploy RabbitMQ from the project dashboard:
[Screenshot: deploying RabbitMQ from the project dashboard]
You can also find a link at the bottom of the project landing page:
[Screenshot: deployment link on the project landing page]
There is no extra charge for using click to deploy - you will only be billed for the underlying Compute Engine resources. So why not deploy your cluster today and tell us what you think?

-Posted by Grzegorz Gogolowicz, Cloud Solutions Architect

P.S. If you don’t want to have your own dedicated RabbitMQ cluster deployed as described in this post, and would rather use a managed RabbitMQ, you may want to check out the CloudAMQP offering that runs on Google Compute Engine.

Posted:
Editor’s note: Today's guest post is from Daniel Viveiros, Head of Technology at CI&T, a Google Cloud Platform Partner of the Year LATAM 2013. In this post, Daniel describes how CI&T in partnership with Coca-Cola built the ‘Happiness Flag’ for the Coca-Cola 2014 FIFA World Cup™ campaign in Brazil. To learn more about the Happiness flag visit this website.

As part of the ‘The World’s Cup’ campaign, Coca-Cola wanted to do something that would visually illustrate soccer’s global reach. Coca-Cola invited fans around the world to share their photos to create the Happiness Flag -- the world’s largest mosaic flag crafted from thousands of crowdsourced images submitted by people in more than 200 countries. The flag, 3,015 square meters in size, was unveiled during the opening ceremony of the 2014 FIFA World Cup™.
[Photo: The Coca-Cola Happiness Flag]

A project of this scale calls for high performing and reliable technology, so when we started working with Coca-Cola to build the infrastructure for the Happiness Flag campaign, we knew we had to use Google Cloud Platform. By using Google Cloud Platform, we turned a big, innovative idea into reality on a global scale.

To create the Happiness Flag, we leveraged the whole Google Cloud Platform stack as shown below:
[Diagram: Happiness Flag architecture on Google Cloud Platform]
Google App Engine handled the computing workload, from ingesting millions of images via Twitter, Facebook, Instagram and email to serving image searches and view requests. The architecture scaled to meet this transaction demand and the fluctuations in traffic. We stored all the images in Google Cloud Storage, where integrated edge caching support and image services made it an ideal choice for serving the images. Meanwhile, Google Compute Engine gave us the capability for long-running processes, such as the Twitter integration and advanced image transformations. We were able to show how powerful the creation of hybrid environments can be, using both Platform-as-a-Service (Google App Engine) and raw virtual machines (Google Compute Engine) in the cloud.

We used other out-of-the-box Google Cloud Platform technologies like Memcache, Datastore and Task Queues to ensure outstanding levels of performance and scalability. We know that many fans will be viewing the Happiness Flag on their mobile devices, so we needed a platform that would offer different capacities of computational power. The system provides amazing user experience with high performance and low latency, regardless of the device and its location. Using Google Cloud Platform, the campaign runs smoothly 24/7 and includes redundancy, failover techniques, backups and state-of-the-art monitoring. Plus, it’s affordable.

After the physical flag was unveiled before the opening match, the digital mosaic was made available with a Google map-like zoom in and out with eleven levels of detail. Anyone who submitted an image can now search for themselves on the virtual flag and the search results will show up as pins in the mosaic, like locations found in a Google map. By clicking on the pin, their photos open up in an overlay and they are taken to the maximum level of zoom in to see the "neighborhood" around their image in the flag. After the match, a link to the Happiness Flag site was sent to each participant as a souvenir.

Our goal was to help Coca-Cola create a project that would celebrate the 2014 FIFA World Cup™ by enabling fans from all over the world to express their creativity in a show of unity and art. What better way to open the games than by displaying the Happiness Flag, which is a symbol of the spirit of the game and its fans.

-Posted by Daniel Viveiros, Head of Technology at CI&T

Posted:
Today’s guest blog is by Autism Speaks Chief Science Officer Robert Ring. As the world’s largest autism science and advocacy organization, Autism Speaks has committed more than $500 million to its mission, the majority in science and medical research.

An estimated 1 in 68 children in the U.S. is on the autism spectrum. Caused by a combination of genetic and environmental influences, autism is characterized, in varying degrees, by deficits in social communication and interaction, along with the presence of repetitive patterns of behavior, interests or activities. Many individuals with autism also face a lifetime of associated medical conditions (e.g. anxiety, sleep problems, seizures and/or GI symptoms) that frequently contribute to poor outcomes.

With the participation of our amazing autism community, Autism Speaks has worked for 15 years to assemble the largest open-access collection of DNA samples from families affected by autism. Our Autism Genetic Resource Exchange (AGRE) holds the DNA of 12,000 individuals affected by autism and their parents and siblings, as well as information on the autism symptoms and autism-related medical conditions of these individuals.

Building on AGRE, Autism Speaks launched the AUT10K program in collaboration with the University of Toronto’s Hospital for Sick Children’s Centre for Applied Genomics. AUT10K has already completed the sequencing of 1,000 cases, and currently has close to 2,000 additional samples nearing completion.

From the beginning, we realized that the amount of data collected by AUT10K would create many challenges. We needed to find a way to store and analyze massive data sets, while allowing remote access to this unprecedented resource for autism researchers around the world.

In the beginning, we shared genomic information by shipping hard drives around the world. Downloading even one individual’s whole genome in a conventional manner can take hours – the equivalent of downloading a hundred feature films. And we knew that by the time AUT10K achieved its milestone of 10,000 genomes, we’d have a database on the petabyte scale.

Now, Autism Speaks is using Google Cloud Platform to store its data and enable real-time, collaborative access among researchers around the world. We are in the process of uploading 100 terabytes of data to Google Cloud Storage, and from there, we can import it into Google Genomics. Google Genomics will allow scientists to access the data via the Genomics API, explore it interactively using Google BigQuery, and perform custom analysis using Google Compute Engine.

Researchers will spend less time moving data around and more time analyzing data and collaborating with colleagues. We hope this will enable us to make discoveries and drive innovation faster than ever.

The insight and expertise the Google team has already brought to the table has been unmatched, and our work with them has been a game-changer for AUT10K. Together, we can accelerate breakthroughs in understanding the causes and subtypes of autism in ways that advance diagnosis and treatment as never before.

- Contributed by Robert Ring, Chief Science Officer, Autism Speaks

Posted:
Everything at Google, from Search to Gmail, is packaged and run in a Linux container. Each week we launch more than 2 billion container instances across our global data centers, and the power of containers has enabled both more reliable services and more efficient scaling. Now we’re taking another step toward making those capabilities available to developers everywhere.

Support for Docker images in Google App Engine
Last month we released improved Docker image support in Compute Engine. Today, we’re building on that work and adding a set of extensions that allow App Engine developers to build and deploy Docker images in Managed VMs. Developers can use these extensions to easily access the large and growing library of Docker images, and the Docker community can easily deploy containers into a completely managed environment with access to services such as Cloud Datastore. If you want to try it, sign up via this form.
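A deployment in this model pairs a Dockerfile with an `app.yaml` that opts the module into Managed VMs with a custom runtime. The sketch below is illustrative only; the exact field names reflect the Managed VMs beta and may differ from what shipped.

```yaml
# app.yaml: run this App Engine module on a Managed VM using the
# Dockerfile in the same directory. Field names reflect the Managed VMs
# beta and should be treated as assumptions.
runtime: custom   # build from the local Dockerfile
vm: true          # Managed VM rather than the classic sandbox
api_version: 1
```

With a configuration along these lines, `gcloud` tooling builds the Docker image and deploys it, while App Engine continues to handle scaling, health checking, and access to services like Cloud Datastore.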

Kubernetes—an open source container manager
Based on our experience running Linux containers within Google, we know how important it is to be able to efficiently schedule containers at Internet scale. We use Omega within Google, but many developers have more modest needs. To that end, we’re announcing Kubernetes, a lean yet powerful open-source container manager that deploys containers into a fleet of machines, provides health management and replication capabilities, and makes it easy for containers to connect to one another and the outside world. (For the curious, Kubernetes (koo-ber-nay'-tace) is Greek for “helmsman” of a ship.)

Kubernetes was developed from the outset to be an extensible, community-supported project. Take a look at the source and documentation on GitHub and let us know what you think via our mailing list. We’ll continue to build out the feature set, while collaborating with the Docker community to incorporate the best ideas from Kubernetes into Docker.

Container stack improvements
We’ve released an open-source tool called cAdvisor that provides fine-grained statistics on resource usage for containers. It tracks both instantaneous and historical stats for a wide variety of resources, handles nested containers, and supports both LMCTFY and Docker’s libcontainer. It’s written in Go with the hope that we can move some of these tools into libcontainer directly if people find them useful (as we have).
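To make "instantaneous and historical stats" concrete, here is a small sketch that derives average CPU utilization from two cumulative samples. The field names are simplified stand-ins patterned on cAdvisor's JSON output, not its exact schema.

```python
# Sketch: derive CPU utilization from two cumulative usage samples.
# cAdvisor reports CPU usage as a cumulative counter in nanoseconds, so
# the rate of change between two samples gives average utilization
# (in cores). Field names here are simplified stand-ins, not the exact
# cAdvisor schema.

samples = [
    {"timestamp_ns": 1_000_000_000, "cpu_usage_total_ns": 4_000_000_000},
    {"timestamp_ns": 2_000_000_000, "cpu_usage_total_ns": 4_500_000_000},
]

def cpu_cores_used(older, newer):
    """Average number of CPU cores used between two samples."""
    dt = newer["timestamp_ns"] - older["timestamp_ns"]
    du = newer["cpu_usage_total_ns"] - older["cpu_usage_total_ns"]
    return du / dt

print(cpu_cores_used(samples[0], samples[1]))  # 0.5 cores on average
```

Keeping a short ring buffer of such samples per container is what lets a monitor report both the current rate and the recent history.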

A commitment to open container standards
Finally, I'm happy that I've been nominated to Docker's Governance Committee to continue working with the Docker community toward better open container standards. Containers have been a great building block for Google and by working together we can make them the key building block for “cloud native” applications.

- Posted by Eric Brewer, VP of Infrastructure

Posted:
Yesterday, Cloud Foundry demonstrated how you can use Cloud Foundry and the BOSH CPI with Google Compute Engine. Since Compute Engine became generally available in December 2013, we've seen an ever-increasing number of open source projects, partners, and other software vendors build support for our platform.

Cloud Foundry's post covers using BOSH to deploy a Hadoop cluster on Compute Engine and manage it with Cloud Foundry. With Compute Engine's fast and consistent provisioning, Cloud Foundry was able to deploy a working Hadoop cluster in less than 3 minutes, so you can start your Hadoop processing within minutes. When combined with Compute Engine's sub-hour billing and sustained-use discounts, you have multiple options for keeping costs low.
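To illustrate how sustained-use discounts work, the sketch below applies the tier structure Google published when the discounts launched in 2014: each successive quarter of a month's usage is billed at 100%, 80%, 60%, and 40% of the base rate respectively. Treat the numbers as illustrative; current pricing should always be checked against the official documentation.

```python
# Sketch of 2014-era sustained-use discount arithmetic: each successive
# 25% slice of a month's usage is billed at a lower fraction of the
# base rate. Rates here are from the 2014 announcement (illustrative).
TIER_RATES = [1.0, 0.8, 0.6, 0.4]  # per 25% slice of the month

def effective_rate(fraction_of_month_used):
    """Blended price (as a fraction of the base rate) for a given usage."""
    billed = 0.0
    remaining = fraction_of_month_used
    for rate in TIER_RATES:
        slice_used = min(remaining, 0.25)
        billed += slice_used * rate
        remaining -= slice_used
        if remaining <= 0:
            break
    return billed / fraction_of_month_used

print(effective_rate(1.0))   # ~0.70: full-month use earns ~30% off
print(effective_rate(0.25))  # ~1.00: no discount below 25% usage
```

A short-lived cluster like the 3-minute Hadoop deployment above pays only sub-hour rates, while long-running workloads earn the blended discount automatically.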

- Posted by Eric Johnson, Program Manager