Google Research Blog
The latest news from Research at Google
TensorFlow - Google’s latest machine learning system, open sourced for everyone
Monday, November 09, 2015
Posted by Jeff Dean, Senior Google Fellow, and Rajat Monga, Technical Lead
Deep Learning has had a huge impact on computer science, making it possible to explore new frontiers of research and to develop amazingly useful products that millions of people use every day. Our internal deep learning infrastructure DistBelief, developed in 2011, has allowed Googlers to build ever larger neural networks and scale training to thousands of cores in our datacenters. We’ve used it to demonstrate that concepts like “cat” can be learned from unlabeled YouTube images, to improve speech recognition in the Google app by 25%, and to build image search in Google Photos. DistBelief also trained the Inception model that won ImageNet’s Large Scale Visual Recognition Challenge in 2014, and drove our experiments in automated image captioning as well as DeepDream.
While DistBelief was very successful, it had some limitations. It was narrowly targeted to neural networks, it was difficult to configure, and it was tightly coupled to Google’s internal infrastructure - making it nearly impossible to share research code externally.
Today we’re proud to announce the open source release of TensorFlow -- our second-generation machine learning system, specifically designed to correct these shortcomings. TensorFlow is general, flexible, portable, easy-to-use, and completely open source. We added all this while improving upon DistBelief’s speed, scalability, and production readiness -- in fact, on some benchmarks, TensorFlow is twice as fast as DistBelief (see the whitepaper for details of TensorFlow’s programming model and implementation).
TensorFlow has extensive built-in support for deep learning, but is far more general than that -- any computation that you can express as a computational flow graph, you can compute with TensorFlow (see some examples). Any gradient-based machine learning algorithm will benefit from TensorFlow’s auto-differentiation and suite of first-rate optimizers. And it’s easy to express your new ideas in TensorFlow via the flexible Python interface.
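To give a flavor of what that looks like in code, here is a minimal sketch using the Python API roughly as it looked around the initial release: a tiny least-squares model is built as a flow graph, and TensorFlow’s auto-differentiation drives gradient descent. The toy data, learning rate, and step count are illustrative, not taken from any Google example.

```python
# Minimal sketch (illustrative): build a small flow graph, let TensorFlow
# differentiate it, and fit y ~= 3x + 2 by gradient descent.
import numpy as np
import tensorflow as tf

xs = np.linspace(-1.0, 1.0, 100).astype(np.float32)
ys = 3.0 * xs + 2.0 + 0.1 * np.random.randn(100).astype(np.float32)

x = tf.placeholder(tf.float32, shape=[None])
y = tf.placeholder(tf.float32, shape=[None])
w = tf.Variable(0.0)
b = tf.Variable(0.0)

loss = tf.reduce_mean(tf.square(w * x + b - y))               # nodes in the graph
train_op = tf.train.GradientDescentOptimizer(0.2).minimize(loss)  # auto-differentiation

with tf.Session() as sess:
    sess.run(tf.initialize_all_variables())
    for _ in range(200):
        sess.run(train_op, feed_dict={x: xs, y: ys})
    print(sess.run([w, b]))   # should approach [3.0, 2.0]
```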
Inspecting a model with TensorBoard, the visualization tool
TensorFlow is great for research, but it’s ready for use in real products too. TensorFlow was built from the ground up to be fast, portable, and ready for production service. You can move your idea seamlessly from training on your desktop GPU to running on your mobile phone. And you can get started quickly with powerful machine learning tech by using our state-of-the-art example model architectures. For example, we plan to release our complete, top shelf ImageNet computer vision model on TensorFlow soon.
But the most important thing about TensorFlow is that it’s yours. We’ve open-sourced TensorFlow as a standalone library and associated tools, tutorials, and examples with the Apache 2.0 license so you’re free to use TensorFlow at your institution (no matter where you work).
Our deep learning researchers all use TensorFlow in their experiments. Our engineers use it to infuse Google Search with signals derived from deep neural networks, and to power the magic features of tomorrow. We’ll continue to use TensorFlow to serve machine learning in products, and our research team is committed to sharing TensorFlow implementations of our published ideas. We hope you’ll join us at www.tensorflow.org.
DeepDream - a code example for visualizing Neural Networks
Wednesday, July 01, 2015
Posted by Alexander Mordvintsev, Software Engineer, Christopher Olah, Software Engineering Intern and Mike Tyka, Software Engineer
Two weeks ago we blogged about a visualization tool designed to help us understand how neural networks work and what each layer has learned. In addition to gaining some insight on how these networks carry out classification tasks, we found that this process also generated some beautiful art.
Top: Input image. Bottom: output image made using a network trained on places by MIT Computer Science and AI Laboratory.
We have seen a lot of interest and received some great questions, from programmers and artists alike, about the details of how these visualizations are made. We have decided to open source the code we used to generate these images in an IPython notebook, so now you can make neural network inspired images yourself!
The code is based on Caffe and uses available open source packages, and is designed to have as few dependencies as possible. To get started, you will need the following (full details in the notebook):
NumPy, SciPy, PIL, IPython, or a scientific python distribution such as Anaconda or Canopy
Caffe deep learning framework (Installation instructions)
Once you’re set up, you can supply an image and choose which layers in the network to enhance, how many iterations to apply and how far to zoom in. Alternatively, different pre-trained networks can be plugged in.
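The heart of the technique is a gradient-ascent loop: run the image forward through the network, then nudge the pixels so a chosen layer’s activations grow. Below is a rough, simplified sketch of that idea, assuming a pycaffe model `net` has already been loaded; the layer name and step size are illustrative, and the actual notebook adds jitter, multiple octaves, and other refinements.

```python
import numpy as np

def make_step(net, layer='inception_4c/output', step_size=1.5):
    """One gradient-ascent step: adjust the input image so the chosen
    layer's activations grow (a simplified sketch, not the full notebook)."""
    src = net.blobs['data']          # the input image lives in the 'data' blob
    dst = net.blobs[layer]
    net.forward(end=layer)           # run the network up to the chosen layer
    dst.diff[:] = dst.data           # objective: boost whatever the layer already sees
    net.backward(start=layer)        # backpropagate that objective to the image
    g = src.diff[0]
    src.data[0] += step_size / np.abs(g).mean() * g   # normalized ascent step
```

Repeating steps like this, at several image scales, is what produces the dream-like amplification of whatever patterns the chosen layer responds to.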
It'll be interesting to see what imagery people are able to generate. If you post images to Google+, Facebook, or Twitter, be sure to tag them with #deepdream so other researchers can check them out too.
Learning Statistics with Privacy, aided by the Flip of a Coin
Thursday, October 30, 2014
Posted by Úlfar Erlingsson, Tech Lead Manager, Security Research
(Cross-posted on the Chromium Blog and the Google Online Security Blog)
At Google, we are constantly trying to improve the techniques we use to protect our users' security and privacy. One such project, RAPPOR (Randomized Aggregatable Privacy-Preserving Ordinal Response), provides a new state-of-the-art, privacy-preserving way to learn software statistics that we can use to better safeguard our users’ security, find bugs, and improve the overall user experience.
Building on the concept of randomized response, RAPPOR enables learning statistics about the behavior of users’ software while guaranteeing client privacy. The guarantees of differential privacy, which are widely accepted as being the strongest form of privacy, have almost never been used in practice despite intense research in academia. RAPPOR introduces a practical method to achieve those guarantees.
To understand RAPPOR, consider the following example. Let’s say you wanted to count how many of your online friends were dogs, while respecting the maxim that, on the Internet, nobody should know you’re a dog. To do this, you could ask each friend to answer the question “Are you a dog?” in the following way. Each friend should flip a coin in secret, and answer the question truthfully if the coin came up heads; but, if the coin came up tails, that friend should always say “Yes” regardless. Then you could get a good estimate of the true count from the greater-than-half fraction of your friends that answered “Yes”. However, you still wouldn’t know which of your friends was a dog: each answer “Yes” would most likely be due to that friend’s coin flip coming up tails.
RAPPOR builds on the above concept, allowing software to send reports that are effectively indistinguishable from the results of random coin flips and are free of any unique identifiers. However, by aggregating the reports we can learn the common statistics that are shared by many users. We’re currently testing the use of RAPPOR in Chrome, to learn statistics about how unwanted software is hijacking users’ settings.
We believe that RAPPOR has the potential to be applied for a number of different purposes, so we're making it freely available for all to use. We'll continue development of RAPPOR as a standalone open-source project so that anybody can inspect and test its reporting and analysis mechanisms, and help develop the technology. We’ve written up the technical details of RAPPOR in a report that will be published next week at the ACM Conference on Computer and Communications Security.
We’re encouraged by the feedback we’ve received so far from academics and other stakeholders, and we’re looking forward to additional comments from the community. We hope that everybody interested in preserving user privacy will review the technology and share their feedback at rappor-discuss@googlegroups.com.
Sudoku, Linear Optimization, and the Ten Cent Diet
Tuesday, September 30, 2014
Posted by Jon Orwant, Engineering Manager
(cross-posted on the Google Apps Developer blog and the Google Developers blog)
In 1945, future Nobel laureate George Stigler wrote an essay in the Journal of Farm Economics titled The Cost of Subsistence about a seemingly simple problem: how could a soldier be fed for as little money as possible?
The “Stigler Diet” became a classic problem in the then-new field of linear optimization, which is used today in many areas of science and engineering. Any time you have a set of linear constraints such as “at least 50 square meters of solar panels” or “the amount of paint should equal the amount of primer” along with a linear goal (e.g., “minimize cost” or “maximize customers served”), that’s a linear optimization problem.
At Google, our engineers work on plenty of optimization problems. One example is our YouTube video stabilization system, which uses linear optimization to eliminate the shakiness of handheld cameras. A more lighthearted example is the Google Docs Sudoku add-on, which instantaneously generates and solves Sudoku puzzles inside a Google Sheet, using the SCIP mixed integer programming solver to compute the solution.
Today we’re proud to announce two new ways for everyone to solve linear optimization problems. First, you can now solve linear optimization problems in Google Sheets with the Linear Optimization add-on written by Google Software Engineer Mihai Amarandei-Stavila. The add-on uses Google Apps Script to send optimization problems to Google servers. The solutions are displayed inside the spreadsheet. For developers who want to create their own applications on top of Google Apps, we also provide an API to let you call our linear solver directly.
Second, we’re open-sourcing the linear solver underlying the add-on: Glop (the Google Linear Optimization Package), created by Bruno de Backer with other members of the Google Optimization team. It’s available as part of the or-tools suite and we provide a few examples to get you started. On that page, you’ll find the Glop solution to the Stigler diet problem. (A Google Sheets file that uses Glop and the Linear Optimization add-on to solve the Stigler diet problem is available here. You’ll need to install the add-on first.)
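For a sense of what calling Glop from code looks like, here is a small diet-style linear program solved through the or-tools Python wrapper (using the current wrapper API rather than the 2014 one). The two foods, their prices, and the nutrient requirements are invented for illustration and are not Stigler’s data.

```python
from ortools.linear_solver import pywraplp

solver = pywraplp.Solver.CreateSolver('GLOP')   # Glop, the LP solver open-sourced here

# Decision variables: how much of each (made-up) food to buy, in arbitrary units.
flour = solver.NumVar(0.0, solver.infinity(), 'flour')
liver = solver.NumVar(0.0, solver.infinity(), 'liver')

# Linear constraints: meet minimum calories and protein (illustrative numbers).
solver.Add(360 * flour + 140 * liver >= 2000)   # calories
solver.Add(5 * flour + 20 * liver >= 70)        # grams of protein

# Linear goal: minimize cost (illustrative cents per unit).
solver.Minimize(4 * flour + 9 * liver)

if solver.Solve() == pywraplp.Solver.OPTIMAL:
    print('cost:', solver.Objective().Value())
    print('flour:', flour.solution_value(), 'liver:', liver.solution_value())
```

The real Stigler problem has the same shape, just with 77 food variables and nine nutrient constraints.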
Stigler posed his problem as follows: given nine nutrients (calories, protein, Vitamin C, and so on) and 77 candidate foods, find the foods that could sustain soldiers at minimum cost.
The Simplex algorithm for linear optimization was two years away from being invented, so Stigler had to do his best, arriving at a diet that cost $39.93 per year (in 1939 dollars), or just over ten cents per day. Even that wasn’t the cheapest diet. In 1947, Jack Laderman used Simplex, nine calculator-wielding clerks, and 120 person-days to arrive at the optimal solution.
Glop’s Simplex implementation solves the problem in 300 milliseconds. Unfortunately, Stigler didn’t include taste as a constraint, and so the poor hypothetical soldiers will eat nothing but the following, ever:
Enriched wheat flour
Liver
Cabbage
Spinach
Navy beans
Is it possible to create an appealing dish out of these five ingredients? Google Chef Anthony Marco took it as a challenge, and we’re calling the result Foie Linéaire à la Stigler:
This optimal meal consists of seared calf liver dredged in flour, atop a navy bean purée with marinated cabbage and a spinach pesto.
Chef Marco reported that the most difficult constraint was making the dish tasty without butter or cream. That said, I had the opportunity to taste our linear optimization solution, and it was delicious.
Collaborative Mathematics with SageMathCloud and Google Cloud Platform
Monday, September 29, 2014
Posted by Craig Citro, Software Engineer
(cross-posted on the Google for Education blog and Google Cloud Platform blog)
Modern mathematics research is distinguished by its openness. The notion of "mathematical truth" depends on theorems being published with proof, letting the reader understand how new results build on the old, all the way down to basic mathematical axioms and definitions. These new results become tools to aid further progress.
Nowadays, many of these tools come either in the form of software or theorems whose proofs are supported by software. If new tools produce unexpected results, researchers must be able to collaborate and investigate how those results came about. Trusting software tools means being able to inspect and modify their source code. Moreover, open source tools can be modified and extended when research veers in new directions.
In an attempt to create an open source tool to satisfy these requirements, University of Washington Professor William Stein built SageMathCloud (or SMC). SMC is a robust, low-latency web application for collaboratively editing mathematical documents and code. This makes SMC a viable platform for mathematics research, as well as a powerful tool for teaching any mathematically-oriented course. SMC is built on top of standard open-source tools, including Python, LaTeX, and R. In 2013, William received a Google Research Award which provided Google Cloud Platform credits for SMC development. This allowed William to extend SMC to use Google Compute Engine as a hosting platform, achieving better scalability and global availability.
SMC allows users to interactively explore 3D graphics with only a browser
SMC has its roots in 2005, when William started the Sage project in an attempt to create a viable free and open source alternative to existing closed-source mathematical software. Rather than starting from scratch, Sage was built by making the best existing open-source mathematical software work together transparently and filling in any gaps in functionality.
During the first few years, Sage grew to have about 75K active users, while the developer community matured with well over 100 contributors to each new Sage release and about 500 developers contributing peer-reviewed code.
Inspired by Google Docs, William and his students built the first web-based interface to Sage in 2006, called The Sage Notebook. However, The Sage Notebook was designed for a small number of users: it worked well for a small group (such as a single class), but soon became difficult to maintain for larger groups, let alone the whole web.
As the growth of new users for Sage began to stall in 2010, due largely to installation complexity, William turned his attention to finding ways to expand Sage's availability to a broader audience. Based on his experience teaching his own courses with Sage, and feedback from others doing the same, William began building a new Web-hosted version of Sage that could scale to the next generation of users.
The result is SageMathCloud, a highly distributed multi-datacenter application that creates a viable way to do computational mathematics collaboratively online. SMC uses a wide variety of open source tools, from languages (CoffeeScript, node.js, and Python) to infrastructure-level components (especially Cassandra, ZFS, and bup) and a number of in-browser toolkits (such as CodeMirror and three.js).
Latency is critical for collaborative tools: like an online video game, everything in SMC is interactive. The initial versions of SMC were hosted at UW, at which point the distance between Seattle and far away continents was a significant issue, even for the fastest networks. The global coverage of Google Cloud Platform provides a low-latency connection to SMC users around the world that is both fast and stable. It's not uncommon for long-running research computations to last days, or even weeks -- and here the robustness of Google Compute Engine, with machines live-migrating during maintenance, is crucial. Without it, researchers would often face multiple restarts and delays, or would invest in engineering around the problem, taking time away from the core research.
SMC sees use across a number of areas, especially:
Teaching: any course with a programming or math software component, where you want all your students to be able to use that component without dealing with the installation pain. Also, SMC allows students to easily share files, and even work together in realtime. There are dozens of courses using SMC right now.
Collaborative Research: all co-authors of a paper can work together in an SMC project, both writing the paper there and doing research-level computations.
Since SMC launched in May 2013, more than 20,000 monthly active users have started using Sage through it. We look forward to seeing if SMC has an impact on the number of active users of Sage, and are excited to learn about the collaborative research and teaching that it makes possible.
Course Builder now supports the Learning Tools Interoperability (LTI) Specification
Thursday, September 11, 2014
Posted by John Cox, Software Engineer
Since the release of Course Builder two years ago, it has been used by individuals, companies, and universities worldwide to create and deliver online courses on a variety of subjects, helping to show the potential for making education more accessible through open source technology.
Today, we’re excited to announce that Course Builder now supports the Learning Tools Interoperability (LTI) specification. Course Builder can now interoperate with other LTI-compliant systems and online learning platforms, allowing users to interact with high-quality educational content no matter where it lives. This is an important step toward our goal of making educational content available to everyone.
If you have LTI-compliant software and would like to serve its content inside Course Builder, you can do so by using Course Builder as an LTI consumer. If you want to serve Course Builder content inside another LTI-compliant system, you can use Course Builder as an LTI provider. You can use either of these features, both, or none—the choice is entirely up to you.
The Course Builder LTI extension module, now available on Github, supports LTI version 1.0, and its LTI provider is certified by IMS Global, the nonprofit member organization that created the LTI specification. Like Course Builder itself, this module is open source and available under the Apache 2.0 license.
As part of our continued commitment to online education, we are also happy to announce we have become an affiliate member of IMS Global. IMS Global shares our desire to provide education online at scale, and we look forward to working with the IMS community on LTI and other online education technologies.
Doing Data Science with coLaboratory
Friday, August 08, 2014
Posted by Kayur Patel, Kester Tong, Mark Sandler, and Corinna Cortes, Google Research
Building products and making decisions based on data is at the core of what we do at Google. Increasingly common among fields such as journalism and government, this data-driven mindset is changing the way traditionally non-technical organizations do work. In order to bring this approach to even more fields, Google Research is excited to be a partner in the coLaboratory project, a new tool for data science and analysis, designed to make collaborating on data easier.
Created by Google Research, Matthew Turk (creator of the yt visualization package), and the IPython/Jupyter development team, coLaboratory merges successful open source products with Google technologies, enabling multiple people to collaborate directly through simultaneous access and analysis of data. This provides a big improvement over ad-hoc workflows involving emailing documents back and forth.
Setting up an environment for collaborative data analysis can be a hurdle, as requirements vary among different machines and operating systems, and installation errors can be cryptic. The coLaboratory Chrome App addresses this hurdle: a single click installs coLaboratory, IPython, and a large set of popular scientific python libraries (with more on the way). Furthermore, because we use Portable Native Client (PNaCl), coLaboratory runs at native speeds and is secure, allowing new users to start working with IPython faster than ever.
In addition to ease of installation, coLaboratory enables collaboration between people with different skill sets. One example of this would be interactions between programmers who write complex logic in code and non-programmers who are more familiar with GUIs. As shown below, a programmer writes code (step 1) and then annotates that code with simple markup to create an interactive form (step 2). The programmer can then hide the complexity of code to show only the form (step 3), which allows a non-programmer to re-run the code by changing the slider and dropdowns in the form (step 4). This interaction allows programmers to write complex logic in code and allows non-programmers to manipulate that logic through simple GUI hooks.
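coLaboratory’s own form markup isn’t reproduced here, but the general code-then-form pattern it describes can be sketched with the standard ipywidgets library in an IPython/Jupyter notebook; the function, parameters, and options below are invented for illustration and are not part of coLaboratory.

```python
# Sketch of the code-to-form workflow using ipywidgets (not coLaboratory's markup).
from ipywidgets import interact, IntSlider, Dropdown

def summarize(window=7, column='sales'):
    """Stand-in for the 'complex logic' a programmer would write (step 1)."""
    print("computing a %d-day rolling summary of '%s'" % (window, column))

# Steps 2-4: expose the parameters as a slider and a dropdown so a
# non-programmer can re-run the logic without touching the code.
interact(summarize,
         window=IntSlider(min=1, max=30, value=7),
         column=Dropdown(options=['sales', 'visits', 'signups']))
```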
For more information about this project, please see our talks on collaborative data science and zero dependency python. In addition to our external partners in the coLaboratory project, we would like to thank everyone at Google who contributed: the Chromium Native Client team, the Google Drive team, the Open Source team, and the Security team.
Facilitating Genomics Research with Google Cloud Platform
Wednesday, July 30, 2014
Posted by Paul C. Boutros, Ontario Institute for Cancer Research, Josh Stuart, UC Santa Cruz, Adam Margolin, Oregon Health & Science University; Nicole Deflaux and Jonathan Bingham, Google Cloud Platform and Google Genomics
The understanding of the origin and progression of cancer remains in its infancy. However, due to rapid advances in the ability to accurately read and identify (i.e. sequence) the DNA of cancerous cells, the knowledge in this field is growing rapidly. Several comprehensive sequencing studies have shown that alterations of single base pairs within the DNA, known as Single Nucleotide Variants (SNVs), or duplications, deletions and rearrangements of larger segments of the genome, known as Structural Variations (SVs), are the primary causes of cancer and can influence what drugs will be effective against an individual tumor.
However, one of the major roadblocks hampering progress is the availability of accurate methods for interpreting genome sequence data. Due to the sheer volume of genomics data (the entire genome of just one person produces more than 100 gigabytes of raw data!), the ability to precisely localize a genomic alteration (SNV or SV) and resolve its association with cancer remains a considerable research challenge. Furthermore, preliminary benchmark studies conducted by the International Cancer Genome Consortium (ICGC) and The Cancer Genome Atlas (TCGA) have discovered that different mutation calling software run on the same data can result in detection of different sets of mutations. Clearly, optimization and standardization of mutation detection methods is a prerequisite for realizing personalized medicine applications based on a patient’s own genome.
The ICGC and TCGA are working to address this issue through an open community-based collaborative competition, run in conjunction with leading research institutions: the Ontario Institute for Cancer Research, University of California Santa Cruz, Sage Bionetworks, IBM-DREAM, and Oregon Health & Science University. Together, they are running the DREAM Somatic Mutation Calling Challenge, in which researchers from across the world “compete” to find the most accurate SNV and SV detection algorithms. By creating a living benchmark for mutation detection, the DREAM Challenge aims to improve standard methods for identifying cancer-associated mutations and rearrangements in tumor and normal samples from whole-genome sequencing data.
Given Google’s recent partnership with the Global Alliance for Genomics and Health, we are excited to provide cloud computing resources on Google Cloud Platform for competitors in the DREAM Challenge, enabling scientists who do not have ready access to large local computer clusters to participate with open access to contest data as well as credits that can be used for Google Compute Engine virtual machines. By leveraging the power of cloud technologies for genomics computing, contestants have access to powerful computational resources and a platform that allows the sharing of data. We hope to democratize research, foster the open access of data, and spur collaboration.
In addition to the core Google Cloud Platform infrastructure, the Google Genomics team has implemented a simple web-based API to store, process, explore, and share genomic data at scale. We have made the Challenge datasets available through the Google Genomics API. The challenge includes both simulated tumor data for which the correct answers are known and real tumor data for which the correct answers are not known.
Genomics API Browser showing a particular cancer variant position (highlighted) in dataset in silico #1 that was missed by many challenge participants.
Although submissions for the simulated data can be scored immediately, the winners on the real tumor data will not be known immediately when the challenge closes. This is a consequence of the fact that current DNA sequencing technology does not provide 100% accurate data, which adds to the complexity of the problem these algorithms are attempting to tackle. Therefore, to identify the winners, researchers must turn to alternative laboratory technologies to verify whether a particular mutation found in the sequencing data is actually (or at least likely to be) real. As such, additional data will be collected after the Challenge is complete in order to determine the winner. The organizers will re-sequence DNA from the cells of the real tumor using an independent sequencing technology (Ion Torrent), specifically examining regions overlapping the positions of the cancer mutations submitted by the contest participants.
As an analogy, a "scratched magnifying glass" is used to examine the genome the first time around. The second time around, a "stronger magnifying glass with scratches in different places" is used to look at the specific locations in the genome reported by the challenge participants. By combining the data collected by those two different "magnifying glasses", and then comparing that against the cancer mutations submitted by the contest participants, the winner will then be determined.
We believe we are at the beginning of a transformation in medicine and basic research, driven by advances in genome sequencing and computing at scale. With the DREAM Challenge, we are excited to help bring researchers from around the world together to focus on this particular cancer research problem. To learn more about how to participate in the challenge, register here.
Opening up Course Builder data
Wednesday, October 09, 2013
Posted by John Cox and Pavel Simakov, Course Builder Team, Google Research
Course Builder is an experimental, open source platform for delivering massive online open courses. When you run Course Builder, you own everything from the production instance to the student data that builds up while your course is running.
Part of being open is making it easy for you to access and work with your data. Earlier this year we shipped a tool called ETL (short for extract-transform-load) that you can use to pull your data out of Course Builder, run arbitrary computations on it, and load it back. We wrote a post that goes into detail on how you can use ETL to get copies of your data in an open, easy-to-read format, as well as write custom jobs for processing that data offline.
Now we’ve taken the next step and added richer data processing tools to ETL. With them, you can build data processing pipelines that analyze large datasets with MapReduce. Inside Google we’ve used these tools to learn from the courses we’ve run. We provide example pipelines ranging from the simple to the complex, along with formatters to convert your data into open formats (CSV, JSON, plain text, and XML) that play nice with third-party data analysis tools.
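To show the shape of such a pipeline, here is a toy map/reduce pass written in plain Python (not the actual Course Builder pipeline API) that averages scores per course unit; the record fields are invented for illustration.

```python
from collections import defaultdict

# Toy exported records, of the kind ETL might dump as JSON (fields invented).
records = [
    {'student': 'a@example.com', 'unit': 1, 'score': 0.8},
    {'student': 'b@example.com', 'unit': 1, 'score': 0.5},
    {'student': 'a@example.com', 'unit': 2, 'score': 0.9},
]

def map_phase(record):
    yield record['unit'], record['score']        # emit (key, value) pairs

def reduce_phase(key, values):
    return key, sum(values) / len(values)        # average score per unit

grouped = defaultdict(list)
for rec in records:
    for key, value in map_phase(rec):
        grouped[key].append(value)

print(dict(reduce_phase(k, v) for k, v in grouped.items()))
# {1: 0.65, 2: 0.9}
```

The real pipelines distribute the map and reduce phases across machines, but the key/value structure is the same.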
We hope that adding robust data processing features to Course Builder will not only provide direct utility to organizations that need to process data to meet their internal business goals, but also make it easier for educators and researchers to gauge the efficacy of the massive online open courses run on the Course Builder platform.
Applauding the White House Memorandum on Open Access
Monday, February 25, 2013
Posted by Alfred Spector, Vice President of Research and Special Initiatives
Last week the Obama Administration issued a Memorandum that could vastly increase the impact of federally funded research on innovation and the economy. Entrepreneurs, businesses, students, patients, researchers, and the public will soon have digital access to the wealth of research publications and data funded by Federal agencies. We're excited that this important work will be made more broadly accessible.
This memorandum directs federal agencies with annual research and development budgets of $100 million or more to open up access to the crucial results of publicly funded research (including both unclassified articles and data). These agencies will need to provide the public with free and unlimited online access to the results of that research after a guideline embargo period of 12 months. Before last week, only one agency, the National Institutes of Health, had a public research access policy.
The federal government funds tens of billions of dollars in research each year through agencies like the National Science Foundation, National Institutes of Health, and the Department of Energy. These investments are intended to advance science, accelerate innovation, grow our economy, and improve the lives of all Americans and members of the public. Opening this research up to the public will accelerate these goals.
Federal investment in research and development only pays off if it has an impact. Researchers, businesses, policymakers, entrepreneurs, and the public need to be able to access and use the knowledge contained in the articles and data generated by those funds. Making the results of scholarly research accessible and reusable in digital form is one important way to increase the impact of existing taxpayer investments.
ReFr: A New Open-Source Framework for Building Reranking Models
Thursday, October 04, 2012
Posted by Dan Bikel and Keith Hall, Research Scientists at Google
We are pleased to announce the release of an open source, general-purpose framework designed for reranking problems, ReFr (Reranker Framework), now available at: http://code.google.com/p/refr/.
Many types of systems capable of processing speech and human language text produce multiple hypothesized outputs for a given input, each with a score. In the case of machine translation systems, these hypotheses correspond to possible translations from some sentence in a source language to a target language. In the case of speech recognition, the hypotheses are possible word sequences of what was said derived from the input audio. The goal of such systems is usually to produce a single output for a given input, and so they almost always just pick the highest-scoring hypothesis.
A reranker is a system that uses a trained model to rerank these scored hypotheses, possibly inducing a different ranked order. The goal is that by employing a second model after the fact, one can make use of additional information not available to the original model, and produce better overall results. This approach has been shown to be useful for a wide variety of speech and natural language processing problems, and was the subject of one of the groups at the 2011 summer workshop at Johns Hopkins’ Center for Language and Speech Processing. At that workshop, led by Professor Brian Roark of Oregon Health & Science University, we began building a general-purpose framework for training and using reranking models. The result of all this work is ReFr.
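To make the idea concrete, here is a toy reranker in Python (ReFr itself is written in C++, and its actual API differs): each hypothesis keeps its first-pass score plus an extra feature the original system did not use, and a small linear model re-scores them. The hypotheses, feature values, and weights are all invented for illustration.

```python
# Toy reranker sketch: re-score hypotheses with a linear model over
# features unavailable to the first-pass system (weights would be learned).
hypotheses = [
    {'text': 'recognize speech',   'first_pass': 0.61, 'lm_score': -4.2},
    {'text': 'wreck a nice beach', 'first_pass': 0.63, 'lm_score': -9.7},
]

weights = {'first_pass': 1.0, 'lm_score': 0.1}   # made-up weights

def rerank_score(hyp):
    return sum(weights[f] * hyp[f] for f in weights)

best = max(hypotheses, key=rerank_score)
print(best['text'])   # the extra language-model feature flips the ranking
```

The first-pass system alone would have picked the higher-scoring but less plausible hypothesis; the second model’s extra evidence reverses that choice, which is exactly the effect a reranker is after.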
From the outset, we designed ReFr with both speed and flexibility in mind. The core implementation is entirely in C++, with a flexible architecture allowing rich experimentation with both features and learning methods. The framework also employs a powerful runtime configuration mechanism to make experimentation even easier. Finally, ReFr leverages the parallel processing power of Hadoop to train and use large-scale reranking models in a distributed computing environment.
Keeping an “OER mind” about shared resources for education
Monday, March 05, 2012
Posted by Maggie Johnson, Director of Education and University Relations
With ever-increasing demands being placed on our education system, including new skill sets that need to be taught to create a pipeline that can fill 21st century jobs, we must figure out how to make high-quality education more accessible to more people without overburdening our existing educational institutions. The Internet, and the platforms, tools and programs it enables, will surely be a part of the answer to this challenge.
Open Educational Resources (OER) are one piece of the solution. OER are teaching and learning resources that anyone can share, reuse and remix. As part of our ongoing commitment to increasing access to a cost-effective, high-quality education, we’re supporting the OpenCourseWare Consortium — a collaboration of higher education institutions and associated organizations from around the world creating OER — in organizing Open Education Week 2012, which begins today.
An example of OER in action is OpenStax, a recent non-profit initiative of Rice University and Connexions to offer students free, professional quality textbooks that meet scope and sequence requirements for several courses. They believe that these books could save students over $90 million in the next five years. Non-profit isn’t the only model for open education. Flat World Knowledge has built a business around OER by providing free online access to open textbooks, then selling print-on-demand copies and supplemental materials.
We’ll be acknowledging Open Education Week through a panel event in Washington, DC, and over on our +Google in Education page, where we’ll be posting articles, sharing stories and interviews about the benefits of open education resources. Opening these resources to everyone can improve the quality of education while getting more out of our investments in educational resources. We hope you’ll join us in celebrating Open Education Week. Go to openeducationweek.org to learn more and get involved.
Data and code open sourced from Google's Renewable Energy Cheaper than Coal project
Monday, January 30, 2012
Posted by Ross Koningstein, Engineer, Google RE<C team
(Cross-posted with the Open Source at Google Blog)
Google’s RE<C renewable energy research project has recently open sourced a new tool and a significant amount of data to support future CSP (concentrating solar power) heliostat development.
HOpS Open Source Site
HOpS (heliostat optical simulation) is an open source software tool for accurately and efficiently performing optical simulations of fields of heliostats, the actuated mirror assemblies that direct sunlight onto a target in CSP applications.
Google used this tool to help evaluate heliostat field layouts and calculate heat input into a CSP receiver for power production. HOpS works by passing "packets" of light between optical elements (the sun, heliostats, and elements of the target surface), tracking shadowing and blocking masks along the way. For our analysis goals, this approach gave our researchers more flexibility and accuracy than analytic tools (such as DELSOL or HFLCAL), and it was easier to set up for thousands of runs than using ray tracers. Output from the simulation includes heliostat efficiency, target irradiance, and more, while an included shell script facilitates plotting heat maps of the output data using gnuplot.
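HOpS itself is not reproduced here, but the basic reflection geometry any such simulation must track can be sketched in a few lines: a heliostat’s mirror normal bisects the sun and target directions, and its cosine efficiency falls off as the incidence angle grows. The direction vectors below are made up and this is only an illustration, not HOpS’s packet-passing method.

```python
import numpy as np

def unit(v):
    return v / np.linalg.norm(v)

# Illustrative directions (site frame): from the heliostat toward the sun
# and toward the receiver tower.
to_sun = unit(np.array([0.3, -0.5, 0.8]))
to_target = unit(np.array([-0.6, 0.0, 0.8]))

# The mirror normal must bisect the two directions so sunlight reflects
# onto the target.
normal = unit(to_sun + to_target)

# Cosine efficiency: the mirror presents a smaller effective area as the
# incidence angle grows.
cosine_efficiency = float(np.dot(to_sun, normal))
print("cosine efficiency: %.3f" % cosine_efficiency)
```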
REC-CSP Open Source Site
The REC_CSP open source project contains data sets and software useful for designing cheaper heliostats.
Available on the project site are:
1. Thirty days of three-dimensional wind measurement data taken with ultrasonic anemometers (sampled at ~7 Hz), recorded at several near-surface elevations. The data is presented in the RE<C wind data collection document and is available for download on the open source site here.
2. A collection of heliostat aerodynamic load data obtained in a NASA wind tunnel and graphically represented in the appendix. This data is available for download on the open source site here.
3. Matlab software for high-precision, on-target heliostat control with built-in simulation for testing. This is essentially the same software used in the RE<C heliostat control demonstrations and described in the accelerometer sensing and control system design documents. The source code is available for download here.
Video: Demonstrating single and multiple heliostat control