Off-Policy Classification - A New Reinforcement Learning Model Selection Method
Wednesday, June 19, 2019
Posted by Alex Irpan, Software Engineer, Robotics at Google
Reinforcement learning (RL) is a framework that lets agents learn decision making from experience. One of the many variants of RL is off-policy RL, where an agent is trained using a combination of data collected by other agents (off-policy data) and data it collects itself to learn generalizable skills like robotic walking and grasping. In contrast, fully off-policy RL is a variant in which an agent learns entirely from older data, which is appealing because it enables model iteration without requiring a physical robot. With fully off-policy RL, one can train several models on the same fixed dataset collected by previous agents, then select the best one. However, fully off-policy RL comes with a catch: while training can occur without a real robot, evaluation of the models cannot. Furthermore, ground-truth evaluation with a physical robot is too inefficient to test promising approaches that require evaluating a large number of models, such as automated architecture search with AutoML.
This challenge motivates off-policy evaluation (OPE), techniques for studying the quality of new agents using data from other agents. With rankings from OPE, we can selectively test only the most promising models on real-world robots, significantly scaling experimentation with the same fixed real robot budget.
A diagram for real-world model development. Assuming we can evaluate 10 models per day, without off-policy evaluation, we would need 100x as many days to evaluate our models.
Though the OPE framework shows promise, it assumes one has an off-policy evaluation method that accurately ranks performance from old data. However, agents that collected past experience may act very differently from newer learned agents, which makes it hard to get good estimates of performance.
In “Off-Policy Evaluation via Off-Policy Classification”, we propose a new off-policy evaluation method, called off-policy classification (OPC), that evaluates the performance of agents from past data by treating evaluation as a classification problem, in which actions are labeled as either potentially leading to success or guaranteed to result in failure. Our method works for image (camera) inputs, and doesn’t require reweighting data with importance sampling or using accurate models of the target environment, two approaches commonly used in prior work. We show that OPC scales to larger tasks, including a vision-based robotic grasping task in the real world.
How OPC Works
OPC relies on two assumptions: 1) that the final task has deterministic dynamics, i.e. no randomness is involved in how states change, and 2) that the agent either succeeds or fails at the end of each trial. This second “success or failure” assumption is natural for many tasks, such as picking up an object, solving a maze, winning a game, and so on. Because each trial will either succeed or fail in a deterministic way, we can assign binary classification labels to each action. We say an action is effective if it could lead to success, and catastrophic if it is guaranteed to lead to failure.
OPC utilizes a Q-function, learned with a Q-learning algorithm, that estimates the future total reward if the agent chooses to take some action from its current state. The agent will then choose the action with the largest total reward estimate. In our paper, we prove that the performance of an agent is measured by how often its chosen action is an effective action, which depends on how well the Q-function correctly classifies actions as effective vs. catastrophic. This classification accuracy acts as an off-policy evaluation score.
However, the labeling of data from previous trials is only partial. For example, if a previous trial was a failure, we do not get negative labels because we do not know which action was the catastrophic one. To overcome this, we leverage techniques from semi-supervised learning, positive-unlabeled learning in particular, to get an estimate of classification accuracy from partially labeled data. This accuracy is the OPC score.
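As a concrete illustration, the sketch below computes a SoftOPC-style score from logged Q-values, assuming the simple form "mean Q-value over transitions from successful trials minus mean Q-value over all logged transitions"; the paper defines the exact estimators, and the data here is synthetic.

```python
import numpy as np

def soft_opc(q_values, from_success):
    """SoftOPC-style score (hedged form): mean Q over transitions from
    successful trials minus mean Q over all logged transitions."""
    q = np.asarray(q_values, dtype=np.float64)
    pos = np.asarray(from_success, dtype=bool)
    return q[pos].mean() - q.mean()

# Toy usage: rank two hypothetical Q-functions on the same logged data.
rng = np.random.default_rng(0)
success = rng.random(1000) < 0.3                     # transitions from successful trials
q_good = success * 1.0 + rng.normal(0, 0.3, 1000)    # Q correlated with success
q_bad = rng.normal(0, 1.0, 1000)                     # uninformative Q
print(soft_opc(q_good, success), soft_opc(q_bad, success))
```

A higher score suggests a Q-function whose value estimates better separate effective from catastrophic behavior, so only the top-ranked models would then be tested on the real robot.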
Off-Policy Evaluation for Sim-to-Real Learning
In robotics, it’s common to use simulated data and transfer learning techniques to reduce the sample complexity of learning robotics skills. This can be very useful, but tuning these sim-to-real techniques for real-world robotics is challenging. Much like fully off-policy RL, training doesn’t use the real robot, because the policy is trained in simulation, but evaluating that policy still needs a real robot. Here, off-policy evaluation can come to the rescue again: we can take a policy trained only in simulation, then evaluate it using previous real-world data to measure its transfer to the real robot. We examine OPC across both fully off-policy RL and sim-to-real RL.
An example of how simulated experience can differ from real-world experience. Here, simulated images (left) have much less visual complexity than real-world images (right).
Results
First, we set up a simulated version of our robot grasping task, where we could easily train and evaluate several models to benchmark off-policy evaluation. These models were trained with fully off-policy RL, then evaluated with off-policy evaluation. We found that in our robotics tasks, a variant of the OPC called the SoftOPC performed best at predicting final success rate.
An experiment in the simulated grasping task. The red curve is the dimensionless SoftOPC score over the course of training, evaluated from old data. The blue curve is the grasp success rate in simulation. We see the SoftOPC on old data correlates well with grasp success of the model within our simulator.
After success in simulation, we then tried SoftOPC in the real-world task. We took 15 models, trained to have varying degrees of robustness to the gap between simulation and reality. Seven of these models were trained purely in simulation, and the rest were trained on mixes of simulated and real-world data. For each model, we evaluated the SoftOPC on off-policy real-world data, then the real-world grasp success, to see how well SoftOPC predicted performance of that model. We found that on real data, the SoftOPC does produce scores that correlate with true grasp success, letting us rank sim-to-real techniques using past real experience.
SoftOPC score and true performance for 3 different sim-to-real methods: a baseline simulation, a simulation with random textures and lighting, and a model trained with RCAN. All three models are trained with no real data, then evaluated with off-policy evaluation on a validation set of real data. The ordering of the SoftOPC score matches the order of real grasp success.
Below is a scatterplot of the full results from all 15 models. Each point represents the off-policy evaluation score and real-world grasp success of each model. We compare different scoring functions by their correlation to final grasp success. The SoftOPC does not correlate perfectly with true grasp success, but its scores are significantly more reliable than baseline approaches like the temporal-difference error (the standard Q-learning loss).
Results from our sim-to-real evaluation experiment. On the left is a baseline, the temporal difference error of the model. On the right is one of our proposed methods, the SoftOPC. The shaded region is a 95% confidence interval. The correlation is significantly better with SoftOPC.
Future Work
One promising direction for future work is to relax our assumptions about the task, to support tasks with noisier dynamics, or ones where we get partial credit for almost succeeding. However, even under our current assumptions, we think the results are promising enough to be applied to many real-world RL problems.
Acknowledgements
This research was conducted by Alex Irpan, Kanishka Rao, Konstantinos Bousmalis, Chris Harris, Julian Ibarz and Sergey Levine. We’d like to thank Razvan Pascanu, Dale Schuurmans, George Tucker and Paul Wohlhart for valuable discussions. A preprint is available on arXiv.
Google at CVPR 2019
Monday, June 17, 2019
Posted by Andrew Helton, Editor, Google AI Communications
This week, Long Beach, CA hosts the 2019 Conference on Computer Vision and Pattern Recognition (CVPR 2019), the premier annual computer vision event comprising the main conference and several co-located workshops and tutorials. As a leader in computer vision research and a Platinum Sponsor, Google will have a strong presence at CVPR 2019, with over 250 Googlers in attendance to present papers and invited talks at the conference, and to organize and participate in multiple workshops.
If you are attending CVPR this year, please stop by our booth and chat with our researchers who are actively pursuing the next generation of intelligent systems that utilize the latest machine learning techniques applied to various areas of machine perception. Our researchers will also be available to talk about and demo several recent efforts, including the technology behind predicting pedestrian motion, the Open Images V5 dataset and much more.
You can learn more about our research being presented at CVPR 2019 in the list below (Google affiliations highlighted in blue).
Area Chairs include: Jonathan T. Barron, William T. Freeman, Ce Liu, Michael Ryoo, Noah Snavely
Oral Presentations

Relational Action Forecasting
Chen Sun, Abhinav Shrivastava, Carl Vondrick, Rahul Sukthankar, Kevin Murphy, Cordelia Schmid

Pushing the Boundaries of View Extrapolation With Multiplane Images
Pratul P. Srinivasan, Richard Tucker, Jonathan T. Barron, Ravi Ramamoorthi, Ren Ng, Noah Snavely

Auto-DeepLab: Hierarchical Neural Architecture Search for Semantic Image Segmentation
Chenxi Liu, Liang-Chieh Chen, Florian Schroff, Hartwig Adam, Wei Hua, Alan L. Yuille, Li Fei-Fei

AutoAugment: Learning Augmentation Strategies From Data
Ekin D. Cubuk, Barret Zoph, Dandelion Mane, Vijay Vasudevan, Quoc V. Le

DeepView: View Synthesis With Learned Gradient Descent
John Flynn, Michael Broxton, Paul Debevec, Matthew DuVall, Graham Fyffe, Ryan Overbeck, Noah Snavely, Richard Tucker

Normalized Object Coordinate Space for Category-Level 6D Object Pose and Size Estimation
He Wang, Srinath Sridhar, Jingwei Huang, Julien Valentin, Shuran Song, Leonidas J. Guibas

Do Better ImageNet Models Transfer Better?
Simon Kornblith, Jonathon Shlens, Quoc V. Le

TextureNet: Consistent Local Parametrizations for Learning From High-Resolution Signals on Meshes
Jingwei Huang, Haotian Zhang, Li Yi, Thomas Funkhouser, Matthias Niessner, Leonidas J. Guibas

Diverse Generation for Multi-Agent Sports Games
Raymond A. Yeh, Alexander G. Schwing, Jonathan Huang, Kevin Murphy

Occupancy Networks: Learning 3D Reconstruction in Function Space
Lars Mescheder, Michael Oechsle, Michael Niemeyer, Sebastian Nowozin, Andreas Geiger

A General and Adaptive Robust Loss Function
Jonathan T. Barron

Learning the Depths of Moving People by Watching Frozen People
Zhengqi Li, Tali Dekel, Forrester Cole, Richard Tucker, Noah Snavely, Ce Liu, William T. Freeman
(CVPR 2019 Best Paper Honorable Mention)

Composing Text and Image for Image Retrieval - an Empirical Odyssey
Nam Vo, Lu Jiang, Chen Sun, Kevin Murphy, Li-Jia Li, Li Fei-Fei, James Hays

Learning to Synthesize Motion Blur
Tim Brooks, Jonathan T. Barron

Neural Rerendering in the Wild
Moustafa Meshry, Dan B. Goldman, Sameh Khamis, Hugues Hoppe, Rohit Pandey, Noah Snavely, Ricardo Martin-Brualla

Neural Illumination: Lighting Prediction for Indoor Environments
Shuran Song, Thomas Funkhouser

Unprocessing Images for Learned Raw Denoising
Tim Brooks, Ben Mildenhall, Tianfan Xue, Jiawen Chen, Dillon Sharlet, Jonathan T. Barron
Posters

Co-Occurrent Features in Semantic Segmentation
Hang Zhang, Han Zhang, Chenguang Wang, Junyuan Xie

CrDoCo: Pixel-Level Domain Transfer With Cross-Domain Consistency
Yun-Chun Chen, Yen-Yu Lin, Ming-Hsuan Yang, Jia-Bin Huang

Im2Pencil: Controllable Pencil Illustration From Photographs
Yijun Li, Chen Fang, Aaron Hertzmann, Eli Shechtman, Ming-Hsuan Yang

Mode Seeking Generative Adversarial Networks for Diverse Image Synthesis
Qi Mao, Hsin-Ying Lee, Hung-Yu Tseng, Siwei Ma, Ming-Hsuan Yang

Revisiting Self-Supervised Visual Representation Learning
Alexander Kolesnikov, Xiaohua Zhai, Lucas Beyer

Scene Graph Generation With External Knowledge and Image Reconstruction
Jiuxiang Gu, Handong Zhao, Zhe Lin, Sheng Li, Jianfei Cai, Mingyang Ling

Scene Memory Transformer for Embodied Agents in Long-Horizon Tasks
Kuan Fang, Alexander Toshev, Li Fei-Fei, Silvio Savarese

Spatially Variant Linear Representation Models for Joint Filtering
Jinshan Pan, Jiangxin Dong, Jimmy S. Ren, Liang Lin, Jinhui Tang, Ming-Hsuan Yang

Target-Aware Deep Tracking
Xin Li, Chao Ma, Baoyuan Wu, Zhenyu He, Ming-Hsuan Yang

Temporal Cycle-Consistency Learning
Debidatta Dwibedi, Yusuf Aytar, Jonathan Tompson, Pierre Sermanet, Andrew Zisserman

Depth-Aware Video Frame Interpolation
Wenbo Bao, Wei-Sheng Lai, Chao Ma, Xiaoyun Zhang, Zhiyong Gao, Ming-Hsuan Yang

MnasNet: Platform-Aware Neural Architecture Search for Mobile
Mingxing Tan, Bo Chen, Ruoming Pang, Vijay Vasudevan, Mark Sandler, Andrew Howard, Quoc V. Le

A Compact Embedding for Facial Expression Similarity
Raviteja Vemulapalli, Aseem Agarwala

Contrastive Adaptation Network for Unsupervised Domain Adaptation
Guoliang Kang, Lu Jiang, Yi Yang, Alexander G. Hauptmann

DeepLight: Learning Illumination for Unconstrained Mobile Mixed Reality
Chloe LeGendre, Wan-Chun Ma, Graham Fyffe, John Flynn, Laurent Charbonnel, Jay Busch, Paul Debevec

Detect-To-Retrieve: Efficient Regional Aggregation for Image Search
Marvin Teichmann, Andre Araujo, Menglong Zhu, Jack Sim

Fast Object Class Labelling via Speech
Michael Gygli, Vittorio Ferrari

Learning Independent Object Motion From Unlabelled Stereoscopic Videos
Zhe Cao, Abhishek Kar, Christian Hane, Jitendra Malik

Peeking Into the Future: Predicting Future Person Activities and Locations in Videos
Junwei Liang, Lu Jiang, Juan Carlos Niebles, Alexander G. Hauptmann, Li Fei-Fei

SpotTune: Transfer Learning Through Adaptive Fine-Tuning
Yunhui Guo, Honghui Shi, Abhishek Kumar, Kristen Grauman, Tajana Rosing, Rogerio Feris

NAS-FPN: Learning Scalable Feature Pyramid Architecture for Object Detection
Golnaz Ghiasi, Tsung-Yi Lin, Quoc V. Le

Class-Balanced Loss Based on Effective Number of Samples
Yin Cui, Menglin Jia, Tsung-Yi Lin, Yang Song, Serge Belongie

FEELVOS: Fast End-To-End Embedding Learning for Video Object Segmentation
Paul Voigtlaender, Yuning Chai, Florian Schroff, Hartwig Adam, Bastian Leibe, Liang-Chieh Chen

Inserting Videos Into Videos
Donghoon Lee, Tomas Pfister, Ming-Hsuan Yang

Volumetric Capture of Humans With a Single RGBD Camera via Semi-Parametric Learning
Rohit Pandey, Anastasia Tkach, Shuoran Yang, Pavel Pidlypenskyi, Jonathan Taylor, Ricardo Martin-Brualla, Andrea Tagliasacchi, George Papandreou, Philip Davidson, Cem Keskin, Shahram Izadi, Sean Fanello

You Look Twice: GaterNet for Dynamic Filter Selection in CNNs
Zhourong Chen, Yang Li, Samy Bengio, Si Si

Interactive Full Image Segmentation by Considering All Regions Jointly
Eirikur Agustsson, Jasper R. R. Uijlings, Vittorio Ferrari

Large-Scale Interactive Object Segmentation With Human Annotators
Rodrigo Benenson, Stefan Popov, Vittorio Ferrari

Self-Supervised GANs via Auxiliary Rotation Loss
Ting Chen, Xiaohua Zhai, Marvin Ritter, Mario Lučić, Neil Houlsby

Sim-To-Real via Sim-To-Sim: Data-Efficient Robotic Grasping via Randomized-To-Canonical Adaptation Networks
Stephen James, Paul Wohlhart, Mrinal Kalakrishnan, Dmitry Kalashnikov, Alex Irpan, Julian Ibarz, Sergey Levine, Raia Hadsell, Konstantinos Bousmalis

Using Unknown Occluders to Recover Hidden Scenes
Adam B. Yedidia, Manel Baradad, Christos Thrampoulidis, William T. Freeman, Gregory W. Wornell
Workshops

Computer Vision for Global Challenges
Organizers include: Timnit Gebru, Ernest Mwebaze, John Quinn

Deep Vision 2019
Invited speakers include: Pierre Sermanet, Chris Bregler

Landmark Recognition
Organizers include: Andre Araujo, Bingyi Cao, Jack Sim, Tobias Weyand

Image Matching: Local Features and Beyond
Organizers include: Eduard Trulls

3D-WiDGET: Deep GEneraTive Models for 3D Understanding
Invited speakers include: Julien Valentin

Fine-Grained Visual Categorization
Organizers include: Christine Kaeser-Chen
Advisory panel includes: Hartwig Adam

Low-Power Image Recognition Challenge (LPIRC)
Organizers include: Aakanksha Chowdhery, Achille Brighton, Alec Go, Andrew Howard, Bo Chen, Jaeyoun Kim, Jeff Gilbert

New Trends in Image Restoration and Enhancement Workshop and Associated Challenges
Program chairs include: Vivek Kwatra, Peyman Milanfar, Sebastian Nowozin, George Toderici, Ming-Hsuan Yang

Spatio-temporal Action Recognition (AVA) @ ActivityNet Challenge
Organizers include: David Ross, Sourish Chaudhuri, Radhika Marvin, Arkadiusz Stopczynski, Joseph Roth, Caroline Pantofaru, Chen Sun, Cordelia Schmid

Third Workshop on Computer Vision for AR/VR
Organizers include: Sofien Bouaziz, Serge Belongie

DAVIS Challenge on Video Object Segmentation
Organizers include: Jordi Pont-Tuset, Alberto Montes

Efficient Deep Learning for Computer Vision
Invited speakers include: Andrew Howard

Fairness Accountability Transparency and Ethics in Computer Vision
Organizers include: Timnit Gebru, Margaret Mitchell

Precognition Seeing through the Future
Organizers include: Utsav Prabhu

Workshop and Challenge on Learned Image Compression
Organizers include: George Toderici, Michele Covell, Johannes Ballé, Eirikur Agustsson, Nick Johnston

When Blockchain Meets Computer Vision & AI
Invited speakers include: Chris Bregler

Applications of Computer Vision and Pattern Recognition to Media Forensics
Organizers include: Paul Natsev, Christoph Bregler

Tutorials

Towards Relightable Volumetric Performance Capture of Humans
Organizers include: Sean Fanello, Christoph Rhemann, Graham Fyffe, Jonathan Taylor, Sofien Bouaziz, Paul Debevec, Shahram Izadi

Learning Representations via Graph-structured Networks
Organizers include: Ming-Hsuan Yang
Applying AutoML to Transformer Architectures
Friday, June 14, 2019
Posted by David So, Software Engineer, Google AI
Since it was introduced a few years ago, Google’s Transformer architecture has been applied to challenges ranging from generating fantasy fiction to writing musical harmonies. Importantly, the Transformer’s high performance has demonstrated that feed forward neural networks can be as effective as recurrent neural networks when applied to sequence tasks, such as language modeling and translation. While the Transformer and other feed forward models used for sequence problems are rising in popularity, their architectures are almost exclusively manually designed, in contrast to the computer vision domain where AutoML approaches have found state-of-the-art models that outperform those that are designed by hand. Naturally, we wondered if the application of AutoML in the sequence domain could be equally successful.
After conducting an evolution-based neural architecture search (NAS), using translation as a proxy for sequence tasks in general, we found the Evolved Transformer, a new Transformer architecture that demonstrates promising improvements on a variety of natural language processing (NLP) tasks. Not only does the Evolved Transformer achieve state-of-the-art translation results, but it also demonstrates improved performance on language modeling when compared to the original Transformer. We are releasing this new model as part of Tensor2Tensor, where it can be used for any sequence problem.
Developing the Techniques
To begin the evolutionary NAS, we first had to develop new techniques, because the task used to evaluate the “fitness” of each architecture, WMT’14 English-German translation, is computationally expensive. This makes the searches more expensive than similar searches executed in the vision domain, which can leverage smaller datasets, like CIFAR-10. The first of these techniques is warm starting: seeding the initial evolution population with the Transformer architecture instead of random models. This helps ground the search in an area of the search space we know is strong, thereby allowing it to find better models faster.
The second technique is a new method we developed called Progressive Dynamic Hurdles (PDH), an algorithm that augments the evolutionary search to allocate more resources to the strongest candidates, in contrast to previous works, where each candidate model of the NAS is allocated the same amount of resources when it is being evaluated. PDH allows us to terminate the evaluation of a model early if it is flagrantly bad, allowing promising architectures to be awarded more resources.
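Here is a minimal sketch of the idea behind PDH, not the exact algorithm from the paper: candidates are trained in stages, and after each stage any model whose fitness falls below a hurdle (here taken to be the mean fitness of the models that reached that stage) stops receiving resources. The `train_one_stage` callback and the choice of the mean as the hurdle are illustrative assumptions.

```python
import statistics

def progressive_dynamic_hurdles(population, train_one_stage, num_stages):
    """Train candidates in stages, dropping those below the hurdle.

    `population` maps a model id to its mutable training state, and
    `train_one_stage(state)` (assumed helper) trains one more stage and
    returns the model's current fitness, e.g. negative validation loss.
    """
    survivors = dict(population)
    fitness = {}
    for _ in range(num_stages):
        if not survivors:
            break
        for model_id, state in survivors.items():
            fitness[model_id] = train_one_stage(state)
        # The hurdle rises as weak models drop out, so surviving models
        # are awarded progressively more training resources.
        hurdle = statistics.mean(fitness[m] for m in survivors)
        survivors = {m: s for m, s in survivors.items()
                     if fitness[m] >= hurdle}
    return fitness

# Warm starting (described above) would seed `population` with the
# Transformer architecture rather than purely random models.
```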
The Evolved Transformer
Using these methods, we conducted a large-scale NAS on our translation task and discovered the Evolved Transformer (ET). Like most sequence to sequence (seq2seq) neural network architectures, it has an encoder that encodes the input sequence into embeddings and a decoder that uses those embeddings to construct an output sequence; in the case of translation, the input sequence is the sentence to be translated and the output sequence is the translation.
The most interesting feature of the Evolved Transformer is the convolutional layers at the bottom of both its encoder and decoder modules that were added in a similar branching pattern in both places (i.e. the inputs run through two separate convolutional layers before being added together).
A comparison between the Evolved Transformer and the original Transformer encoder architectures. Notice the branched convolution structure at the bottom of the module, which formed in both the encoder and decoder independently. See our paper for a description of the decoder.
This is particularly interesting because the encoder and decoder architectures are not shared during the NAS, so this architecture was independently discovered as being useful in both the encoder and decoder, speaking to the strength of this design. Whereas the original Transformer relied solely on self-attention, the Evolved Transformer is a hybrid, leveraging the strengths of both self-attention and wide convolution.
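To make the branching motif concrete, here is a hedged sketch in TensorFlow of an input passing through two separate convolutions whose outputs are summed; the kernel widths and layer types are illustrative assumptions, not the exact searched Evolved Transformer cell.

```python
import tensorflow as tf

def branched_conv_block(x, filters):
    # Left branch: narrow convolution with a nonlinearity (width assumed).
    left = tf.keras.layers.Conv1D(filters, kernel_size=3,
                                  padding="same", activation="relu")(x)
    # Right branch: wider convolution, no activation (width assumed).
    right = tf.keras.layers.Conv1D(filters, kernel_size=9,
                                   padding="same")(x)
    # The two branches are added together, as in the motif described above.
    return tf.keras.layers.Add()([left, right])

# Usage inside a Keras functional model over token embeddings.
inputs = tf.keras.Input(shape=(None, 256))  # (sequence length, embedding dim)
outputs = branched_conv_block(inputs, filters=256)
model = tf.keras.Model(inputs, outputs)
```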
Evaluation of the Evolved Transformer
To test the effectiveness of this new architecture, we first compared it to the original Transformer on the English-German translation task we used during the search. We found that the Evolved Transformer had better BLEU and perplexity performance at all parameter sizes, with the biggest gain at the size compatible with mobile devices (~7 million parameters), demonstrating an efficient use of parameters. At a larger size, the Evolved Transformer reaches state-of-the-art performance on WMT’14 En-De with a BLEU score of 29.8 and a SacreBLEU score of 29.2.
Comparison between the Evolved Transformer and the original Transformer on WMT’14 En-De at varying sizes. The biggest gains in performance occur at smaller sizes, while ET also shows strength at larger sizes, outperforming the largest Transformer with 37.6% fewer parameters (models to compare are circled in green). See Table 3 in our paper for the exact numbers.
To test generalizability, we also compared ET to the Transformer on additional NLP tasks. First, we looked at translation using different language pairs, and found ET demonstrated improved performance, with margins similar to those seen on English-German; again, due to its efficient use of parameters, the biggest improvements were observed for medium sized models. We also compared the decoders of both models on language modeling using LM1B, and saw a performance improvement of nearly 2 perplexity.
Future Work
These results are the first step in exploring the application of architecture search to feed forward sequence models. The Evolved Transformer is being open sourced as part of Tensor2Tensor, where it can be used for any sequence problem. To promote reproducibility, we are also open sourcing the search space we used for our search and a Colab with an implementation of Progressive Dynamic Hurdles. We look forward to seeing what the research community does with the new model and hope that others are able to build off of these new search techniques!
Google at ICML 2019
Monday, June 10, 2019
Posted by Andrew Helton, Editor, Google AI Communications
Machine learning is a key strategic focus at Google, with highly active groups pursuing research in virtually all aspects of the field, including deep learning and more classical algorithms, exploring theory as well as application. We utilize scalable tools and architectures to build machine learning systems that enable us to solve deep scientific and engineering challenges in areas of language, speech, translation, music, visual processing and more.
As a leader in machine learning research, Google is proud to be a Sapphire Sponsor of the thirty-sixth International Conference on Machine Learning (ICML 2019), a premier annual event supported by the International Machine Learning Society taking place this week in Long Beach, CA. With nearly 200 Googlers attending the conference to present publications and host workshops, we look forward to our continued collaboration with the larger machine learning research community.
If you're attending ICML 2019, we hope you'll visit the Google booth to learn more about the exciting work, creativity and fun that goes into solving some of the field's most interesting challenges, with researchers on hand to talk about the Google Research Football Environment, AdaNet, Robotics at Google and much more. You can also learn more about the Google research being presented at ICML 2019 in the list below (Google affiliations highlighted in blue).
ICML 2019 Committees

Board Members include: Andrew McCallum, Corinna Cortes, Hugo Larochelle, William Cohen (Emeritus)

Senior Area Chairs include: Charles Sutton, Claudio Gentile, Corinna Cortes, Kevin Murphy, Mehryar Mohri, Nati Srebro, Samy Bengio, Surya Ganguli

Area Chairs include: Jacob Abernethy, William Cohen, Dumitru Erhan, Cho-Jui Hsieh, Chelsea Finn, Sergey Levine, Manzil Zaheer, Sergei Vassilvitskii, Boqing Gong, Been Kim, Dale Schuurmans, Danny Tarlow, Dustin Tran, Hanie Sedghi, Honglak Lee, Jasper Snoek, Lihong Li, Minmin Chen, Mohammad Norouzi, Nicolas Le Roux, Phil Long, Sanmi Koyejo, Timnit Gebru, Vitaly Feldman, Satyen Kale, Katherine Heller, Hossein Mobahi, Amir Globerson, Ilya Tolstikhin, Marco Cuturi, Sebastian Nowozin, Amin Karbasi, Ohad Shamir, Graham Taylor
Accepted Publications

Challenging Common Assumptions in the Unsupervised Learning of Disentangled Representations
Francesco Locatello, Stefan Bauer, Mario Lučić, Gunnar Rätsch, Sylvain Gelly, Bernhard Schölkopf, Olivier Bachem
(Recipient of an ICML 2019 Best Paper Award)

Learning to Groove with Inverse Sequence Transformations
Jon Gillick, Adam Roberts, Jesse Engel, Douglas Eck, David Bamman

Metric-Optimized Example Weights
Sen Zhao, Mahdi Milani Fard, Harikrishna Narasimhan, Maya Gupta

HOList: An Environment for Machine Learning of Higher Order Logic Theorem Proving
Kshitij Bansal, Sarah Loos, Markus Rabe, Christian Szegedy, Stewart Wilcox

Learning to Clear the Market
Weiran Shen, Sebastien Lahaie, Renato Paes Leme

Shape Constraints for Set Functions
Andrew Cotter, Maya Gupta, Heinrich Jiang, Erez Louidor, James Muller, Tamann Narayan, Serena Wang, Tao Zhu

Self-Attention Generative Adversarial Networks
Han Zhang, Ian Goodfellow, Dimitris Metaxas, Augustus Odena

High-Fidelity Image Generation With Fewer Labels
Mario Lučić, Michael Tschannen, Marvin Ritter, Xiaohua Zhai, Olivier Bachem, Sylvain Gelly

Learning Optimal Linear Regularizers
Matthew Streeter

DeepMDP: Learning Continuous Latent Space Models for Representation Learning
Carles Gelada, Saurabh Kumar, Jacob Buckman, Ofir Nachum, Marc G. Bellemare

kernelPSI: a Post-Selection Inference Framework for Nonlinear Variable Selection
Lotfi Slim, Clément Chatelain, Chloe-Agathe Azencott, Jean-Philippe Vert

Learning from a Learner
Alexis Jacq, Matthieu Geist, Ana Paiva, Olivier Pietquin
Rate Distortion For Model Compression: From Theory To Practice
Weihao Gao, Yu-Han Liu, Chong Wang, Sewoong Oh
An Investigation into Neural Net Optimization via Hessian Eigenvalue Density
Behrooz Ghorbani, Shankar Krishnan, Ying Xiao

Graph Matching Networks for Learning the Similarity of Graph Structured Objects
Yujia Li, Chenjie Gu, Thomas Dullien, Oriol Vinyals, Pushmeet Kohli

Subspace Robust Wasserstein Distances
François-Pierre Paty, Marco Cuturi

Training Well-Generalizing Classifiers for Fairness Metrics and Other Data-Dependent Constraints
Andrew Cotter, Maya Gupta, Heinrich Jiang, Nathan Srebro, Karthik Sridharan, Serena Wang, Blake Woodworth, Seungil You

The Effect of Network Width on Stochastic Gradient Descent and Generalization: an Empirical Study
Daniel Park, Jascha Sohl-Dickstein, Quoc Le, Samuel Smith

A Theory of Regularized Markov Decision Processes
Matthieu Geist, Bruno Scherrer, Olivier Pietquin

Area Attention
Yang Li, Łukasz Kaiser, Samy Bengio, Si Si

EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks
Mingxing Tan, Quoc Le

Static Automatic Batching In TensorFlow
Ashish Agarwal

The Evolved Transformer
David So, Quoc Le, Chen Liang

Policy Certificates: Towards Accountable Reinforcement Learning
Christoph Dann, Lihong Li, Wei Wei, Emma Brunskill

Self-similar Epochs: Value in Arrangement
Eliav Buchnik, Edith Cohen, Avinatan Hasidim, Yossi Matias

The Value Function Polytope in Reinforcement Learning
Robert Dadashi, Marc G. Bellemare, Adrien Ali Taiga, Nicolas Le Roux, Dale Schuurmans

Adversarial Examples Are a Natural Consequence of Test Error in Noise
Justin Gilmer, Nicolas Ford, Nicholas Carlini, Ekin Cubuk

SOLAR: Deep Structured Representations for Model-Based Reinforcement Learning
Marvin Zhang, Sharad Vikram, Laura Smith, Pieter Abbeel, Matthew Johnson, Sergey Levine

Garbage In, Reward Out: Bootstrapping Exploration in Multi-Armed Bandits
Branislav Kveton, Csaba Szepesvari, Sharan Vaswani, Zheng Wen, Tor Lattimore, Mohammad Ghavamzadeh

Imperceptible, Robust, and Targeted Adversarial Examples for Automatic Speech Recognition
Yao Qin, Nicholas Carlini, Garrison Cottrell, Ian Goodfellow, Colin Raffel

Direct Uncertainty Prediction for Medical Second Opinions
Maithra Raghu, Katy Blumer, Rory Sayres, Ziad Obermeyer, Bobby Kleinberg, Sendhil Mullainathan, Jon Kleinberg

A Large-Scale Study on Regularization and Normalization in GANs
Karol Kurach, Mario Lučić, Xiaohua Zhai, Marcin Michalski, Sylvain Gelly

Learning a Compressed Sensing Measurement Matrix via Gradient Unrolling
Shanshan Wu, Alex Dimakis, Sujay Sanghavi, Felix Yu, Daniel Holtmann-Rice, Dmitry Storcheus, Afshin Rostamizadeh, Sanjiv Kumar

NATTACK: Learning the Distributions of Adversarial Examples for an Improved Black-Box Attack on Deep Neural Networks
Yandong Li, Lijun Li, Liqiang Wang, Tong Zhang, Boqing Gong

Distributed Weighted Matching via Randomized Composable Coresets
Sepehr Assadi, Mohammad Hossein Bateni, Vahab Mirrokni

Monge blunts Bayes: Hardness Results for Adversarial Training
Zac Cranko, Aditya Menon, Richard Nock, Cheng Soon Ong, Zhan Shi, Christian Walder

Generalized Majorization-Minimization
Sobhan Naderi Parizi, Kun He, Reza Aghajani, Stan Sclaroff, Pedro Felzenszwalb

NAS-Bench-101: Towards Reproducible Neural Architecture Search
Chris Ying, Aaron Klein, Eric Christiansen, Esteban Real, Kevin Murphy, Frank Hutter

Variational Russian Roulette for Deep Bayesian Nonparametrics
Kai Xu, Akash Srivastava, Charles Sutton

Surrogate Losses for Online Learning of Stepsizes in Stochastic Non-Convex Optimization
Zhenxun Zhuang, Ashok Cutkosky, Francesco Orabona

Improved Parallel Algorithms for Density-Based Network Clustering
Mohsen Ghaffari, Silvio Lattanzi, Slobodan Mitrović

The Advantages of Multiple Classes for Reducing Overfitting from Test Set Reuse
Vitaly Feldman, Roy Frostig, Moritz Hardt

Submodular Streaming in All Its Glory: Tight Approximation, Minimum Memory and Low Adaptive Complexity
Ehsan Kazemi, Marko Mitrovic, Morteza Zadimoghaddam, Silvio Lattanzi, Amin Karbasi

Hiring Under Uncertainty
Manish Purohit, Sreenivas Gollapudi, Manish Raghavan
A Tree-Based Method for Fast Repeated Sampling of Determinantal Point Processes
Jennifer Gillenwater, Alex Kulesza, Zelda Mariet, Sergei Vassilvitskii
Statistics and Samples in Distributional Reinforcement Learning
Mark Rowland, Robert Dadashi, Saurabh Kumar, Remi Munos, Marc G. Bellemare, Will Dabney

Provably Efficient Maximum Entropy Exploration
Elad Hazan, Sham Kakade, Karan Singh, Abby Van Soest
Active Learning with Disagreement Graphs
Corinna Cortes, Giulia DeSalvo, Mehryar Mohri, Ningshan Zhang, Claudio Gentile
MixHop: Higher-Order Graph Convolutional Architectures via Sparsified Neighborhood Mixing
Sami Abu-El-Haija, Bryan Perozzi, Amol Kapoor, Nazanin Alipourfard, Kristina Lerman, Hrayr Harutyunyan, Greg Ver Steeg, Aram Galstyan

Understanding the Impact of Entropy on Policy Optimization
Zafarali Ahmed, Nicolas Le Roux, Mohammad Norouzi, Dale Schuurmans

Matrix-Free Preconditioning in Online Learning
Ashok Cutkosky, Tamas Sarlos

State-Reification Networks: Improving Generalization by Modeling the Distribution of Hidden Representations
Alex Lamb, Jonathan Binas, Anirudh Goyal, Sandeep Subramanian, Ioannis Mitliagkas, Yoshua Bengio, Michael Mozer

Online Convex Optimization in Adversarial Markov Decision Processes
Aviv Rosenberg, Yishay Mansour
Bounding User Contributions: A Bias-Variance Trade-off in Differential Privacy
Kareem Amin, Alex Kulesza, Andres Munoz Medina, Sergei Vassilvitskii
Complementary-Label Learning for Arbitrary Losses and Models
Takashi Ishida, Gang Niu, Aditya Menon, Masashi Sugiyama

Learning Latent Dynamics for Planning from Pixels
Danijar Hafner, Timothy Lillicrap, Ian Fischer, Ruben Villegas, David Ha, Honglak Lee, James Davidson

Unifying Orthogonal Monte Carlo Methods
Krzysztof Choromanski, Mark Rowland, Wenyu Chen, Adrian Weller

Differentially Private Learning of Geometric Concepts
Haim Kaplan, Yishay Mansour, Yossi Matias, Uri Stemmer

Online Learning with Sleeping Experts and Feedback Graphs
Corinna Cortes, Giulia DeSalvo, Claudio Gentile, Mehryar Mohri, Scott Yang

Adaptive Scale-Invariant Online Algorithms for Learning Linear Models
Michal Kempka, Wojciech Kotlowski, Manfred K. Warmuth

TensorFuzz: Debugging Neural Networks with Coverage-Guided Fuzzing
Augustus Odena, Catherine Olsson, David Andersen, Ian Goodfellow

Online Control with Adversarial Disturbances
Naman Agarwal, Brian Bullins, Elad Hazan, Sham Kakade, Karan Singh

Adversarial Online Learning with Noise
Alon Resler, Yishay Mansour

Escaping Saddle Points with Adaptive Gradient Methods
Matthew Staib, Sashank Reddi, Satyen Kale, Sanjiv Kumar, Suvrit Sra

Fairness Risk Measures
Robert Williamson, Aditya Menon

DBSCAN++: Towards Fast and Scalable Density Clustering
Jennifer Jang, Heinrich Jiang

Learning Linear-Quadratic Regulators Efficiently with only √T Regret
Alon Cohen, Tomer Koren, Yishay Mansour

Understanding and correcting pathologies in the training of learned optimizers
Luke Metz, Niru Maheswaranathan, Jeremy Nixon, Daniel Freeman, Jascha Sohl-Dickstein

Parameter-Efficient Transfer Learning for NLP
Neil Houlsby, Andrei Giurgiu, Stanislaw Jastrzebski, Bruna Morrone, Quentin De Laroussilhe, Andrea Gesmundo, Mona Attariyan, Sylvain Gelly

Efficient Full-Matrix Adaptive Regularization
Naman Agarwal, Brian Bullins, Xinyi Chen, Elad Hazan, Karan Singh, Cyril Zhang, Yi Zhang

Efficient On-Device Models Using Neural Projections
Sujith Ravi

Flexibly Fair Representation Learning by Disentanglement
Elliot Creager, David Madras, Joern-Henrik Jacobsen, Marissa Weis, Kevin Swersky, Toniann Pitassi, Richard Zemel

Recursive Sketches for Modular Deep Learning
Badih Ghazi, Rina Panigrahy, Joshua Wang

POLITEX: Regret Bounds for Policy Iteration Using Expert Prediction
Yasin Abbasi-Yadkori, Peter L. Bartlett, Kush Bhatia, Nevena Lazić, Csaba Szepesvári, Gellért Weisz

Anytime Online-to-Batch, Optimism and Acceleration
Ashok Cutkosky

Insertion Transformer: Flexible Sequence Generation via Insertion Operations
Mitchell Stern, William Chan, Jamie Kiros, Jakob Uszkoreit

Robust Inference via Generative Classifiers for Handling Noisy Labels
Kimin Lee, Sukmin Yun, Kibok Lee, Honglak Lee, Bo Li, Jinwoo Shin

A Better k-means++ Algorithm via Local Search
Silvio Lattanzi, Christian Sohler

Analyzing and Improving Representations with the Soft Nearest Neighbor Loss
Nicholas Frosst, Nicolas Papernot, Geoffrey Hinton

Learning to Generalize from Sparse and Underspecified Rewards
Rishabh Agarwal, Chen Liang, Dale Schuurmans, Mohammad Norouzi

MeanSum: A Neural Model for Unsupervised Multi-Document Abstractive Summarization
Eric Chu, Peter Liu

CHiVE: Varying Prosody in Speech Synthesis with a Linguistically Driven Dynamic Hierarchical Conditional Variational Network
Tom Kenter, Vincent Wan, Chun-An Chan, Rob Clark, Jakub Vit

Similarity of Neural Network Representations Revisited
Simon Kornblith, Mohammad Norouzi, Honglak Lee, Geoffrey Hinton

Online Algorithms for Rent-Or-Buy with Expert Advice
Sreenivas Gollapudi, Debmalya Panigrahi

Breaking the Softmax Bottleneck via Learnable Monotonic Pointwise Non-linearities
Octavian Ganea, Sylvain Gelly, Gary Becigneul, Aliaksei Severyn

Non-monotone Submodular Maximization with Nearly Optimal Adaptivity and Query Complexity
Matthew Fahrbach, Vahab Mirrokni, Morteza Zadimoghaddam

Agnostic Federated Learning
Mehryar Mohri, Gary Sivek, Ananda Theertha Suresh

Categorical Feature Compression via Submodular Optimization
Mohammad Hossein Bateni, Lin Chen, Hossein Esfandiari, Thomas Fu, Vahab Mirrokni, Afshin Rostamizadeh

Cross-Domain 3D Equivariant Image Embeddings
Carlos Esteves, Avneesh Sud, Zhengyi Luo, Kostas Daniilidis, Ameesh Makadia

Faster Algorithms for Binary Matrix Factorization
Ravi Kumar, Rina Panigrahy, Ali Rahimi, David Woodruff

On Variational Bounds of Mutual Information
Ben Poole, Sherjil Ozair, Aaron Van Den Oord, Alex Alemi, George Tucker

Guided Evolutionary Strategies: Augmenting Random Search with Surrogate Gradients
Niru Maheswaranathan, Luke Metz, George Tucker, Dami Choi, Jascha Sohl-Dickstein

Semi-Cyclic Stochastic Gradient Descent
Hubert Eichner, Tomer Koren, Brendan McMahan, Nathan Srebro, Kunal Talwar

Stochastic Deep Networks
Gwendoline de Bie, Gabriel Peyré, Marco Cuturi
Workshops

1st Workshop on Understanding and Improving Generalization in Deep Learning
Organizers Include: Dilip Krishnan, Hossein Mobahi
Invited Speaker: Chelsea Finn

Climate Change: How Can AI Help?
Invited Speaker: John Platt

Generative Modeling and Model-Based Reasoning for Robotics and AI
Organizers Include: Dumitru Erhan, Sergey Levine, Kimberly Stachenfeld
Invited Speaker: Chelsea Finn

Human In the Loop Learning (HILL)
Organizers Include: Been Kim

ICML 2019 Time Series Workshop
Organizers Include: Vitaly Kuznetsov

Joint Workshop on On-Device Machine Learning & Compact Deep Neural Network Representations (ODML-CDNNR)
Organizers Include: Sujith Ravi, Zornitsa Kozareva

Negative Dependence: Theory and Applications in Machine Learning
Organizers Include: Jennifer Gillenwater, Alex Kulesza

Reinforcement Learning for Real Life
Organizers Include: Lihong Li
Invited Speaker: Craig Boutilier

Uncertainty and Robustness in Deep Learning
Organizers Include: Justin Gilmer

Theoretical Physics for Deep Learning
Organizers Include: Jaehoon Lee, Jeffrey Pennington, Yasaman Bahri

Workshop on the Security and Privacy of Machine Learning
Organizers Include: Nicolas Papernot
Invited Speaker: Been Kim

Exploration in Reinforcement Learning Workshop
Organizers Include: Benjamin Eysenbach, Surya Bhupatiraju, Shixiang Gu

ICML Workshop on Imitation, Intent, and Interaction (I3)
Organizers Include: Sergey Levine, Chelsea Finn
Invited Speaker: Pierre Sermanet

Identifying and Understanding Deep Learning Phenomena
Organizers Include: Hanie Sedghi, Samy Bengio, Kenji Hata, Maithra Raghu, Ali Rahimi, Ying Xiao

Workshop on Multi-Task and Lifelong Reinforcement Learning
Organizers Include: Sarath Chandar, Chelsea Finn
Invited Speakers: Karol Hausman, Sergey Levine

Workshop on Self-Supervised Learning
Organizers Include: Pierre Sermanet

Invertible Neural Networks and Normalizing Flows
Organizers Include: Rianne Van den Berg, Danilo J. Rezende
Invited Speakers: Eric Jang, Laurent Dinh
Introducing Google Research Football: A Novel Reinforcement Learning Environment
Friday, June 7, 2019
Posted by Karol Kurach, Research Lead and Olivier Bachem, Research Scientist, Google Research, Zürich
The goal of reinforcement learning (RL) is to train smart agents that can interact with their environment and solve complex tasks, with real-world applications towards robotics, self-driving cars, and more. The rapid progress in this field has been fueled by making agents play games such as the iconic Atari console games, the ancient game of Go, or professionally played video games like Dota 2 or Starcraft 2, all of which provide challenging environments where new algorithms and ideas can be quickly tested in a safe and reproducible manner. The game of football is particularly challenging for RL, as it requires a natural balance between short-term control, learned concepts such as passing, and high-level strategy.
Today we are happy to announce the release of the Google Research Football Environment, a novel RL environment where agents aim to master the world’s most popular sport: football. Modeled after popular football video games, the Football Environment provides a physics-based 3D football simulation where agents control either one or all football players on their team, learn how to pass between them, and manage to overcome their opponent’s defense in order to score goals. The Football Environment provides several crucial components: a highly-optimized game engine, a demanding set of research problems called the Football Benchmarks, as well as the Football Academy, a set of progressively harder RL scenarios. In order to facilitate research, we have released a beta version of the underlying open-source code on GitHub.
Football Engine
The core of the Football Environment is an advanced football simulation, called the Football Engine, which is based on a heavily modified version of Gameplay Football. Based on input actions for the two opposing teams, it simulates a match of football including goals, fouls, corner and penalty kicks, and offsides. The Football Engine is written in highly optimized C++ code, allowing it to be run on off-the-shelf machines, both with and without GPU-based rendering enabled. This allows it to reach a performance of approximately 25 million steps per day on a single hexa-core machine.
The Football Engine is an advanced football simulation that supports all the major football rules such as kickoffs (top left), goals (top right), fouls, cards (bottom left), corner and penalty kicks (bottom right), and offside.
The Football Engine has additional features geared specifically towards RL. First, it allows learning both from different state representations, which contain semantic information such as the players' locations, and from raw pixels. Second, to investigate the impact of randomness, it can be run in both a stochastic mode (enabled by default), in which there is randomness in both the environment and opponent AI actions, and in a deterministic mode, where there is no randomness. Third, the Football Engine is compatible out of the box with the widely used OpenAI Gym API. Finally, researchers can get a feeling for the game by playing against each other or their agents, using either keyboards or gamepads.
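For instance, here is a minimal sketch of interacting with the engine through its Gym-style interface, assuming the open-sourced gfootball package is installed; the scenario name and representation below are illustrative choices.

```python
import gfootball.env as football_env

# Build a Football Academy scenario using semantic state features
# (a flat feature vector) rather than raw pixels.
env = football_env.create_environment(
    env_name='academy_empty_goal',
    representation='simple115',
    render=False)

obs = env.reset()
done = False
while not done:
    # A random policy stands in for a learned agent here.
    action = env.action_space.sample()
    obs, reward, done, info = env.step(action)
```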
Football Benchmarks
With the Football Benchmarks, we propose a set of benchmark problems for RL research based on the Football Engine. The goal in these benchmarks is to play a “standard” game of football against a fixed rule-based opponent that was hand-engineered for this purpose. We provide three versions: the Football Easy Benchmark, the Football Medium Benchmark, and the Football Hard Benchmark, which only differ in the strength of the opponent.
As a reference, we provide benchmark results for two state-of-the-art reinforcement learning algorithms: DQN and IMPALA, which both can be run in multiple processes on a single machine or concurrently on many machines. We investigate both the setting where the only rewards provided to the algorithm are the goals scored and the setting where we provide additional rewards for moving the ball closer to the goal.
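As a sketch of these two reward settings, the environment builder exposes a rewards flag; the benchmark scenario name here is an assumption based on the open-sourced code.

```python
import gfootball.env as football_env

# Goals-only reward versus goals plus "checkpoint" rewards for
# advancing the ball toward the opponent's goal (names assumed).
env_sparse = football_env.create_environment(
    env_name='11_vs_11_easy_stochastic', rewards='scoring')
env_shaped = football_env.create_environment(
    env_name='11_vs_11_easy_stochastic', rewards='scoring,checkpoints')
```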
Our results indicate that the Football Benchmarks are interesting research problems of varying difficulties. In particular, the Football Easy Benchmark appears to be suitable for research on single-machine algorithms, while the Football Hard Benchmark proves to be challenging even for massively distributed RL algorithms. Based on the nature of the environment and the difficulty of the benchmarks, we expect them to be useful for investigating current scientific challenges such as sample-efficient RL, sparse rewards, or model-based RL.
The average goal difference of agent versus opponent at different difficulty levels for different baselines. The Easy opponent can be beaten by a DQN agent trained for 20 million steps, while the Medium and Hard opponents require a distributed algorithm such as IMPALA that is trained for 200 million steps.
Football Academy & Future Directions
As training agents for the full Football Benchmarks can be challenging, we also provide Football Academy, a diverse set of scenarios of varying difficulty. This allows researchers to get the ball rolling on new research ideas, allows testing of high-level concepts (such as passing), and provides a foundation to investigate
curriculum learning
research ideas, where agents learn from progressively harder scenarios. Examples of the Football Academy scenarios include settings where agents have to learn how to score against the empty goal, where they have to learn how to quickly pass between players, and where they have to learn how to execute a counter-attack. Using a simple API, researchers can further define their own scenarios and train agents to solve them.
Top: A successful policy that runs towards the goal (as required, since a number of opponents chase our player) and scores against the goal-keeper. Second: A beautiful way to drive and finish a counter-attack. Third: A simple way to solve a 2-vs-1 play. Bottom: The agent scores after a corner kick.
The Football Benchmarks and the Football Academy consider the standard RL setup, in which agents compete against a fixed opponent, i.e., where the opponent can be considered a part of the environment. Yet, in reality, football is a two-player game where two different teams compete and where one has to adapt to the actions and strategy of the opposing team. The Football Engine provides a unique opportunity for research into this setting and, once we complete our ongoing effort to implement self-play, even more interesting research settings can be investigated.
Acknowledgments
This project was undertaken together with Anton Raichuk, Piotr Stańczyk, Michał Zając, Lasse Espeholt, Carlos Riquelme, Damien Vincent, Marcin Michalski, Olivier Bousquet and Sylvain Gelly at Google Research, Zürich. We also wish to thank Lucas Beyer, Nal Kalchbrenner, Tim Salimans and the rest of the Google Brain team for helpful discussions, comments, technical help and code contributions. Finally, we would like to thank Bastiaan Konings Schuiling, who authored and open-sourced the original version of this game.