Google Open Source Blog: database

Lovefield: a powerful Javascript SQL-like database query engine for the web

Posted: Monday, November 17, 2014

Today we are announcing the release of a powerful library to add to every web developer's toolbox. Since WebSQL standardization efforts ceased in 2010, there has been no cross-browser relational database solution for web clients. Existing persistence solutions such as IndexedDB and LocalStorage fall under the category of object-oriented storage and therefore lack traditional relational database features.

Lovefield closes that gap by providing a feature-rich database query engine built using IndexedDB as a backend. It provides an intuitive SQL-like declarative syntax that developers can pick up with minimal effort. Because queries are built declaratively and no query parsing is involved, Lovefield is immune to SQL injection attacks. The feature list includes:

• select, insert, update, and delete queries
• atomicity with intuitive transaction semantics (unlike IndexedDB’s surprising auto-commit behavior)
• integrity constraint checks (primary key, unique, nullable/not-nullable)
• aggregators (count, min, max, sum, avg, stddev, distinct)
• "group by" for select queries
• multi-table join
• easier schema upgrade mechanism than IndexedDB
• cross-browser support (Chrome, Firefox, IE10)
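
The injection-immunity point can be illustrated with a small sketch. The code below is not Lovefield's API: `select`, `eq`, and the in-memory `items` rows are hypothetical names invented for this illustration. It only shows that when a query is assembled from function calls and values, user input stays a value and is never parsed as query text.

```javascript
// Hypothetical in-memory sketch of a declarative query builder.
// None of these names come from Lovefield; they only illustrate the idea.

// A predicate is plain data plus a test function; the value is never parsed.
function eq(column, value) {
  return { column, value, test: (row) => row[column] === value };
}

// Build a query object whose methods can be chained, then run it with exec().
function select(...columns) {
  const query = {
    rows: [],
    predicate: null,
    from(rows) { query.rows = rows; return query; },
    where(predicate) { query.predicate = predicate; return query; },
    exec() {
      return query.rows
        .filter((row) => (query.predicate ? query.predicate.test(row) : true))
        .map((row) => {
          // With no column list, return a copy of the whole row.
          if (columns.length === 0) return { ...row };
          const out = {};
          for (const c of columns) out[c] = row[c];
          return out;
        });
    },
  };
  return query;
}

const items = [
  { id: 1, description: "Buy milk", done: false },
  { id: 2, description: "Write report", done: true },
];

// Hostile input is compared as an ordinary string and matches nothing.
const hostile = "x' OR '1'='1";
const injected = select("id").from(items).where(eq("description", hostile)).exec();

// An ordinary query: the pending items, projected to two columns.
const pending = select("id", "description").from(items).where(eq("done", false)).exec();
```

Here `injected` is an empty array and `pending` holds only the "Buy milk" row. A string-concatenated SQL statement given the same hostile input would have had its WHERE clause rewritten instead.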

On the performance front, Lovefield includes a query optimizer that evaluates different execution plans and picks the most promising one. We are confident that current performance will satisfy the majority of use cases (fewer than 50k rows) and we plan to further improve the performance for larger datasets in the near future.

Lovefield’s vision is captured in this specification document, and we are working to provide more exciting features in the near future, such as foreign keys, cascaded delete/update, self-table join, and observers/data-binding.

Lovefield is already successfully powering a few Google services, including the Google Play Movies Chrome app. With this open source release we are hoping to enable the development of data-rich applications and to attract interest and feedback from developers, which will allow us to better understand how to move forward.

By Demetrios Papadopoulos, Chrome team

Cayley: graphs in Go

Posted: Wednesday, June 25, 2014

Four years ago this July, Google acquired Metaweb, bringing Freebase and linked open data to Google. It’s been astounding to watch the growth of the Knowledge Graph and how it has improved Google search to delight users every day.

When I moved to New York last year, I saw just how far the concepts of Freebase and its data had spread through Google’s worldwide offices. I began to wonder how the concepts would advance if developers everywhere could work with similar tools. However, there wasn’t a graph available that was fast, free, and easy to get started working with.

With the Freebase data already public and universally accessible, it was time to make it useful, and that meant writing some code as a side project.

So today we are excited to release Cayley, an open source graph database.

Cayley is a spiritual successor to graphd; it shares a similar query strategy for speed. While not an exact replica of its predecessor, it brings its own features to the table:
• RESTful API
• Multiple (modular) backend stores, such as LevelDB and MongoDB
• Multiple (modular) query languages
• Easy to get started
• Simple to build on top of as a library
and of course
• Open Source

Cayley is written in Go, which was a natural choice for a backend service that depends on speed and concurrent access. Go did not disappoint: with a fantastic standard library and easy access to open source libraries from the community, the necessary building blocks were already there. Combined with Go’s concurrency patterns, which are more effective than those available in C, this made a performance-competitive successor to graphd a reality.

To get a sense of Cayley, check out the I/O Bytes video we created where we “Build A Small Knowledge Graph”. The video includes a quick introduction to graph stores as well as an example of processing Freebase and Schema.org linked data.

You can also check out the demo dataset in a live instance running on Google App Engine. It’s running with the sample dataset in the repository: 30,000 movies and their actors, roles, and directors, using the Freebase film schema. For a more-than-trivial query, try running the following code, both as a query and as a visualization; what you’ll see is the neighborhood of the given actor and how the actors who co-star with that actor interact with each other:

// Morphism: from an actor, step to their performances, then to the films.
costar = g.M().In("/film/performance/actor").In("/film/film/starring")

// Pair the named actor(s) (source) with everyone they share a film with (target).
function getCostars(x) {
  return g.V(x).As("source").In("name")
          .Follow(costar).FollowR(costar)
          .Out("name").As("target")
}

function getActorNeighborhood(primary_actor) {
  actors = getCostars(primary_actor).TagArray()
  seen = {}
  for (a in actors) {
    g.Emit(actors[a])
    seen[actors[a].target] = true
  }
  // Exclude the primary actor from the co-star list.
  seen[primary_actor] = false
  actor_list = []
  for (actor in seen) {
    if (seen[actor]) {
      actor_list.push(actor)
    }
  }
  // Emit each pair of co-stars once, not once per direction.
  getCostars(actor_list).Intersect(g.V(actor_list)).ForEach(function(d) {
    if (d.source < d.target) {
      g.Emit(d)
    }
  })
}

getActorNeighborhood("Humphrey Bogart")
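
For readers without a Cayley instance at hand, the shape of that result can be approximated on plain in-memory data. This is only a sketch of the same idea, not how Cayley evaluates the query: the `films` array, `buildCostarIndex`, and `actorNeighborhood` are hypothetical names, and the three-film sample stands in for the Freebase dataset.

```javascript
// Hypothetical sample data; the live demo uses the Freebase film schema instead.
const films = [
  { title: "Casablanca", cast: ["Humphrey Bogart", "Ingrid Bergman", "Claude Rains"] },
  { title: "The Big Sleep", cast: ["Humphrey Bogart", "Lauren Bacall"] },
  { title: "Notorious", cast: ["Ingrid Bergman", "Claude Rains", "Cary Grant"] },
];

// Map each actor to the set of actors they share at least one film with.
function buildCostarIndex(filmList) {
  const index = new Map();
  for (const film of filmList) {
    for (const a of film.cast) {
      if (!index.has(a)) index.set(a, new Set());
      for (const b of film.cast) {
        if (a !== b) index.get(a).add(b);
      }
    }
  }
  return index;
}

// Edges from the primary actor to each co-star, plus edges among the co-stars.
function actorNeighborhood(filmList, primary) {
  const index = buildCostarIndex(filmList);
  const costars = [...(index.get(primary) || [])].sort();
  const result = costars.map((target) => ({ source: primary, target }));
  const inNeighborhood = new Set(costars);
  for (const source of costars) {
    for (const target of index.get(source)) {
      // Keep each undirected pair once, like the d.source < d.target test.
      if (inNeighborhood.has(target) && source < target) {
        result.push({ source, target });
      }
    }
  }
  return result;
}

const neighborhood = actorNeighborhood(films, "Humphrey Bogart");
```

With this sample, `neighborhood` holds four edges: three from Humphrey Bogart to his co-stars, and one between Claude Rains and Ingrid Bergman. Cary Grant is left out because he never shares a film with the primary actor.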

To get involved, check out the project on GitHub and join the mailing list. But most importantly, have fun building your own graphs!

By Barak Michener, Software Engineer, Knowledge NYC

Welcoming MariaDB 10.0.5

Posted: Thursday, November 7, 2013

MariaDB is a community-developed fork of MySQL, a relational database management system for developers looking for a robust, scalable, and reliable SQL server. Its current version is based on MySQL 5.5 and has the capability to provide powerful multi-source replication for data warehouses, to support subqueries that maximize performance, and to make replication more reliable with global transaction IDs.

Today, the MariaDB team is releasing MariaDB 10.0.5, which includes parallel slave replication threads, a feature sponsored by Google. Parallel replication can remove bottlenecks in replicated configurations, which becomes crucial for keeping systems moving quickly as storage speeds increase.

Internally at Google, we’ve already deployed MariaDB 10.0 to our non-production MySQL instances to help report bugs and work with the MariaDB team to test their fixes. This release takes the MariaDB 10.0 branch from alpha to beta status, where the team will shift focus from stabilization to bug fixes.

Google’s move to MariaDB and our support of it don’t affect our Google Cloud Platform’s Cloud SQL offering for developers.

Congratulations and thank you to everyone who has worked hard to get here!

By Ian Gulliver, Site Reliability Manager

LevelDB: A Fast Persistent Key-Value Store

Posted: Wednesday, July 27, 2011

LevelDB is a fast key-value storage engine written at Google that provides an ordered mapping from string keys to string values. We are pleased to announce that we are open sourcing LevelDB under a BSD-style license.

LevelDB is a C++ library that can be used in many contexts. For example, LevelDB may be used by a web browser to store a cache of recently accessed web pages, or by an operating system to store the list of installed packages and package dependencies, or by an application to store user preference settings. We designed LevelDB to also be useful as a building block for higher-level storage systems. Upcoming versions of the Chrome browser include an implementation of the IndexedDB HTML5 API that is built on top of LevelDB. Google's Bigtable manages millions of tablets where the contents of a particular tablet are represented by a precursor to LevelDB. The Riak distributed database has added support for using LevelDB for its per-node storage.

We structured LevelDB to have very few dependencies so that it can be easily ported to new systems; it has already been ported to a variety of Unix-based systems, Mac OS X, Windows, and Android.

LevelDB has good performance across a wide variety of workloads; we have put together a benchmark comparing its performance to SQLite and Kyoto Cabinet. The Riak team has compared LevelDB’s performance to InnoDB. A significant difference from similar systems like SQLite and Kyoto Cabinet is that LevelDB is optimized for batch updates that modify many keys scattered across a large key space. This is an important requirement for efficiently updating an inverted index that does not fit in memory.
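
To make "an ordered mapping from string keys to string values" and the batch-update point concrete, here is a minimal in-memory sketch. It is not LevelDB's C++ API and it is not persistent; `OrderedStore`, `writeBatch`, `range`, and the `term:<word>:<doc>` key layout are hypothetical names chosen for the example.

```javascript
// Hypothetical in-memory sketch; not LevelDB's C++ API and not persistent.
class OrderedStore {
  constructor() {
    this.keys = [];          // string keys, kept sorted
    this.values = new Map(); // key -> string value
  }

  // Binary search for the first index whose key is >= the given key.
  lowerBound(key) {
    let lo = 0;
    let hi = this.keys.length;
    while (lo < hi) {
      const mid = (lo + hi) >>> 1;
      if (this.keys[mid] < key) lo = mid + 1;
      else hi = mid;
    }
    return lo;
  }

  put(key, value) {
    if (!this.values.has(key)) this.keys.splice(this.lowerBound(key), 0, key);
    this.values.set(key, value);
  }

  delete(key) {
    if (this.values.delete(key)) this.keys.splice(this.lowerBound(key), 1);
  }

  get(key) {
    return this.values.get(key);
  }

  // Apply many operations as one unit: validate everything, then mutate.
  writeBatch(ops) {
    for (const op of ops) {
      if (op.type !== "put" && op.type !== "delete") {
        throw new Error("unknown operation: " + op.type);
      }
    }
    for (const op of ops) {
      if (op.type === "put") this.put(op.key, op.value);
      else this.delete(op.key);
    }
  }

  // Return [key, value] pairs with start <= key < end, in key order.
  range(start, end) {
    const out = [];
    for (let i = this.lowerBound(start); i < this.keys.length && this.keys[i] < end; i++) {
      out.push([this.keys[i], this.values.get(this.keys[i])]);
    }
    return out;
  }
}

// A batch that touches keys scattered across the key space, as an
// inverted-index update would.
const store = new OrderedStore();
store.writeBatch([
  { type: "put", key: "term:go:doc2", value: "1" },
  { type: "put", key: "term:graph:doc1", value: "3" },
  { type: "put", key: "term:go:doc1", value: "2" },
  { type: "delete", key: "term:missing:doc9" },
]);

// ";" sorts immediately after ":", so this range covers every "term:go:" key.
const goPostings = store.range("term:go:", "term:go;");
```

Because keys are kept in sorted order, all postings for one term sit next to each other and a single range scan retrieves them; `goPostings` here contains the entries for `doc1` and `doc2` in that order.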

LevelDB is available on Google Code. We hope you’ll find it useful for your projects.

By Jeff Dean and Sanjay Ghemawat, Google Fellows

Announcing Google Refine 2.0, a power tool for data wranglers

Posted: Wednesday, November 10, 2010

Our acquisition of Metaweb back in July also brought along Freebase Gridworks, an open source software project for cleaning and enhancing entire data sets. Today we’re announcing that the project has been renamed to Google Refine and version 2.0 is now available.

Google Refine is a power tool for working with messy data sets, including cleaning up inconsistencies, transforming them from one format into another, and extending them with new data from external web services or other databases. Version 2.0 introduces a new extensions architecture, a reconciliation framework for linking records to other databases (like Freebase), and a ton of new transformation commands and expressions.
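
As a rough illustration of what "cleaning up inconsistencies" can involve, the sketch below groups values that differ only in case, punctuation, spacing, or word order. It is an assumption about one common approach, written for this example; the post does not describe how Google Refine does this internally, and `fingerprint` and `clusterValues` are hypothetical names.

```javascript
// Hypothetical helper: normalize a value into a key so near-duplicates collide.
function fingerprint(value) {
  const tokens = value
    .trim()
    .toLowerCase()
    .replace(/[^\p{L}\p{N}\s]/gu, "") // drop punctuation, keep letters and digits
    .split(/\s+/)
    .filter((t) => t.length > 0);
  // Sort and de-duplicate tokens so word order does not matter.
  return [...new Set(tokens)].sort().join(" ");
}

// Group raw values by fingerprint and return the groups that need review.
function clusterValues(values) {
  const clusters = new Map();
  for (const v of values) {
    const key = fingerprint(v);
    if (!clusters.has(key)) clusters.set(key, []);
    clusters.get(key).push(v);
  }
  // Only groups with more than one distinct spelling are inconsistent.
  return [...clusters.values()].filter((group) => new Set(group).size > 1);
}

const messy = ["Chicago Tribune", "chicago tribune ", "Tribune, Chicago", "ProPublica"];
const groups = clusterValues(messy);
```

Here `groups` contains a single cluster with the three spellings of the Chicago Tribune, which a person can then merge into one canonical value. "ProPublica" has only one spelling and is not flagged.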

Freebase Gridworks 1.0 has already been well received by the data journalism and open government data communities (you can read how the Chicago Tribune, ProPublica and data.gov.uk have used it) and we are very excited by what they and others will be able to do with this new release. To learn more about what you can do with Google Refine 2.0, watch the following screencasts:

http://www.youtube.com/watch?v=yNccGtn3Wb0 (7 min)

http://www.youtube.com/watch?v=45EnWK-fE9k (9 min)

http://www.youtube.com/watch?v=m5ER2qRH1OQ (6 min)

The project is open source and its code and downloads are available here. Changes from version 1.1 to 2.0 are listed here.

By David Huynh, Google Search Infrastructure

Acre, an open source platform for building Freebase apps

Posted: Wednesday, August 25, 2010

Freebase is an open, Creative Commons licensed repository of structured data that contains information about 12 million real-world entities including people, places, films, books, events, businesses, and almost any other thing you can imagine. Our graph database has about 400 million facts and connections between entities, and all of it is accessible via our REST API. Freebase was acquired by Google last month, and one thing we knew would happen was that Freebase would become “even more open.”

We first launched Acre, the hosted, server-side JavaScript platform behind Freebase Apps, just over a year ago. Since then it's become more and more important to us and to the Freebase community. Not only are all kinds of individual developers and businesses using Acre to build apps and integrate Freebase data into their own platforms, but we've also recently announced our intention to develop the Freebase.com site on the platform, too.

Until now, Acre development has always been tied to Freebase.com, meaning that you need to develop your Acre apps on our server, using our app editor. But we know that most software developers prefer to use their own native development environments -- their favourite text editor, version control system, and so on -- so lately we've been working on ways to make Acre work with source code that's not stored in Freebase.

Last week we announced that we're releasing the Acre platform as open source software. This means that you can run Acre on your own machine, pulling templates and other files from your local disk and using your own development environment. While Acre still has close ties to Freebase (such as API hooks for easily making Freebase queries), this also means that you'll be able to develop standalone, non-Freebase apps using the platform if you want. And, by running Acre on your own platform, you can avoid the resource limitations that are necessary in a shared environment.

If you're interested in server-side JavaScript platforms, you may also be interested in some of the technical details of Acre.

• Acre is based on Rhino, Mozilla's implementation of JavaScript in Java. (In fact, "Acre" stands for "A Crash of Rhinos Evaluating.") Acre, by default, uses the Jetty servlet engine as its HTTP server, but can be run in any servlet container.
• Acre includes a module system that supports high-latency source retrieval using extensive caching. Although Acre was originally designed to fetch data only from Freebase itself, it can also fetch data from disk and will support a wider range of require() options such as WebDAV.
• Acre is capable of running on Google AppEngine, with support for the Keystore and for synchronous and asynchronous HTTP requests. Soon, Freebase's own Acre installation will run on AppEngine.
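
The idea of a module system that hides slow source retrieval behind a cache can be sketched in a few lines. This is not Acre's implementation: `createLoader`, `slowFetch`, and the two sample modules are hypothetical, and the real system fetches from Freebase or disk where this sketch reads from an in-memory object.

```javascript
// Hypothetical sketch of a caching module loader; not Acre's implementation.
function createLoader(fetchSource) {
  const cache = new Map(); // module path -> exports object

  function requireModule(path) {
    if (cache.has(path)) return cache.get(path);
    const source = fetchSource(path); // the expensive, high-latency step
    const mod = { exports: {} };
    // Register before evaluating so circular requires see partial exports.
    cache.set(path, mod.exports);
    const run = new Function("require", "module", "exports", source);
    run(requireModule, mod, mod.exports);
    // Store again in case the module replaced module.exports wholesale.
    cache.set(path, mod.exports);
    return mod.exports;
  }

  return requireModule;
}

// Stand-in for a slow remote store; counts how often each path is fetched.
const fetchCounts = {};
const sources = {
  "lib/greeting": "exports.greet = function (n) { return 'Hello, ' + n; };",
  "app/main":
    "var g = require('lib/greeting');" +
    "exports.run = function () { return g.greet('Acre'); };",
};
function slowFetch(path) {
  fetchCounts[path] = (fetchCounts[path] || 0) + 1;
  if (!(path in sources)) throw new Error("no such module: " + path);
  return sources[path];
}

const load = createLoader(slowFetch);
const first = load("app/main").run();
const second = load("app/main").run(); // served from the cache
load("lib/greeting");                  // already cached as a dependency
```

Each module source is fetched once no matter how often it is required, so the cost of a slow backing store is paid only on first use. A production loader would also need cache invalidation when the stored source changes, which this sketch leaves out.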

Please download Acre and try it out, and let us know what you think! You might also like to look at some of our other open source releases, like freebase-python (a Python library for working with the Freebase API) or freebase-suggest (a jQuery plugin that makes it easy to have your users select Freebase topics based on any criteria). For more information about Freebase and our open source efforts, see the Freebase wiki or post to the freebase-discuss mailing list.

By Kirrily Robert, Freebase Team

Google Releases More Patches for MySQL

Posted: Monday, September 8, 2008

By Mark Callaghan, Software Engineering Team

Did you know that Google uses MySQL as part of its Ads system? As you can imagine, we demand a lot from this Open Source code base and so we have spent a fair amount of time enhancing it to work better in our massively scaled environment. In the past, we have published several patches and today we have a few more to offer. We expect several of these features to be merged into a future official MySQL release, and one of them, semi-synchronous replication, is already available as a MySQL feature preview.

All of the features in the patch are described on our project wiki. The features include:

• enhancements and bug fixes for features from the previous patch
• changes to make InnoDB run faster on multi-core servers
• changes to display mutex contention statistics
• changes to monitor and rate-limit activity by database account and client IP

We are publishing several patches:

• a patch for MySQL 5.0.37 with all of our changes
• a patch for MySQL 5.1.26 with the changes for mutex contention statistics
• a patch for MySQL 5.0.67 to make InnoDB run faster on multi-core servers

We hope these features we've Open Sourced will be useful to other developers. Check out the code and let us know what you think. We'd love to hear from you and answer any questions you might have in our Google MySQL Tools Discussion Group.
