Off-Policy Classification - A New Reinforcement Learning Model Selection Method
Wednesday, June 19, 2019
Posted by Alex Irpan, Software Engineer, Robotics at Google
Reinforcement learning (RL) is a framework that lets agents learn decision making from experience. One of the many variants of RL is off-policy RL, where an agent is trained using a combination of data collected by other agents (off-policy data) and data it collects itself to learn generalizable skills like robotic walking and grasping. In contrast, fully off-policy RL is a variant in which an agent learns entirely from older data, which is appealing because it enables model iteration without requiring a physical robot. With fully off-policy RL, one can train several models on the same fixed dataset collected by previous agents, then select the best one. However, fully off-policy RL comes with a catch: while training can occur without a real robot, evaluation of the models cannot. Furthermore, ground-truth evaluation with a physical robot is too inefficient to test promising approaches that require evaluating a large number of models, such as automated architecture search with AutoML.
This challenge motivates off-policy evaluation (OPE), techniques for studying the quality of new agents using data from other agents. With rankings from OPE, we can selectively test only the most promising models on real-world robots, significantly scaling experimentation with the same fixed real robot budget.
A diagram for real-world model development. Assuming we can evaluate 10 models per day, without off-policy evaluation, we would need 100x as many days to evaluate our models.
Though the OPE framework shows promise, it assumes one has an off-policy evaluation method that accurately ranks performance from old data. However, agents that collected past experience may act very differently from newer learned agents, which makes it hard to get good estimates of performance.
In “Off-Policy Evaluation via Off-Policy Classification”, we propose a new off-policy evaluation method, called off-policy classification (OPC), that evaluates the performance of agents from past data by treating evaluation as a classification problem, in which actions are labeled as either potentially leading to success or guaranteed to result in failure. Our method works for image (camera) inputs, and doesn’t require reweighting data with importance sampling or using accurate models of the target environment, two approaches commonly used in prior work. We show that OPC scales to larger tasks, including a vision-based robotic grasping task in the real world.
How OPC Works
OPC relies on two assumptions: 1) that the final task has deterministic dynamics, i.e. no randomness is involved in how states change, and 2) that the agent either succeeds or fails at the end of each trial. This second “success or failure” assumption is natural for many tasks, such as picking up an object, solving a maze, winning a game, and so on. Because each trial will either succeed or fail in a deterministic way, we can assign binary classification labels to each action. We say an action is effective if it could lead to success, and catastrophic if it is guaranteed to lead to failure.
OPC utilizes a Q-function, learned with a Q-learning algorithm, that estimates the future total reward if the agent chooses to take some action from its current state. The agent will then choose the action with the largest total reward estimate. In our paper, we prove that the performance of an agent is measured by how often its chosen action is an effective action, which depends on how well the Q-function correctly classifies actions as effective vs. catastrophic. This classification accuracy acts as an off-policy evaluation score.
However, the labeling of data from previous trials is only partial. For example, if a previous trial was a failure, we do not get negative labels because we do not know which action was the catastrophic one. To overcome this, we leverage techniques from semi-supervised learning, positive-unlabeled learning in particular, to get an estimate of classification accuracy from partially labeled data. This accuracy is the OPC score.
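As a concrete illustration, the sketch below computes a SoftOPC-style score from logged Q-values, assuming the simple form "mean Q-value over transitions from successful trials minus mean Q-value over all logged transitions"; the paper defines the exact estimators, and the data here is synthetic.

```python
import numpy as np

def soft_opc(q_values, from_success):
    """SoftOPC-style score (hedged form): mean Q over transitions from
    successful trials minus mean Q over all logged transitions."""
    q = np.asarray(q_values, dtype=np.float64)
    pos = np.asarray(from_success, dtype=bool)
    return q[pos].mean() - q.mean()

# Toy usage: rank two hypothetical Q-functions on the same logged data.
rng = np.random.default_rng(0)
success = rng.random(1000) < 0.3                     # transitions from successful trials
q_good = success * 1.0 + rng.normal(0, 0.3, 1000)    # Q correlated with success
q_bad = rng.normal(0, 1.0, 1000)                     # uninformative Q
print(soft_opc(q_good, success), soft_opc(q_bad, success))
```

A higher score suggests a Q-function whose value estimates better separate effective from catastrophic behavior, so only the top-ranked models would then be tested on the real robot.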
Off-Policy Evaluation for Sim-to-Real Learning
In robotics, it’s common to use simulated data and transfer learning techniques to reduce the sample complexity of learning robotics skills. This can be very useful, but tuning these sim-to-real techniques for real-world robotics is challenging. Much like fully off-policy RL, training doesn’t use the real robot, because the policy is trained in simulation, but evaluating that policy still needs a real robot. Here, off-policy evaluation can come to the rescue again: we can take a policy trained only in simulation, then evaluate it using previous real-world data to measure its transfer to the real robot. We examine OPC across both fully off-policy RL and sim-to-real RL.
An example of how simulated experience can differ from real-world experience. Here, simulated images (left) have much less visual complexity than real-world images (right).
Results
First, we set up a simulated version of our robot grasping task, where we could easily train and evaluate several models to benchmark off-policy evaluation. These models were trained with fully off-policy RL, then evaluated with off-policy evaluation. We found that in our robotics tasks, a variant of the OPC called the SoftOPC performed best at predicting final success rate.
An experiment in the simulated grasping task. The red curve is the dimensionless SoftOPC score over the course of training, evaluated from old data. The blue curve is the grasp success rate in simulation. We see the SoftOPC on old data correlates well with grasp success of the model within our simulator.
After success in simulation, we then tried SoftOPC in the real-world task. We took 15 models, trained to have varying degrees of robustness to the gap between simulation and reality. Seven of these models were trained purely in simulation, and the rest were trained on mixes of simulated and real-world data. For each model, we evaluated the SoftOPC on off-policy real-world data, then the real-world grasp success, to see how well SoftOPC predicted performance of that model. We found that on real data, the SoftOPC does produce scores that correlate with true grasp success, letting us rank sim-to-real techniques using past real experience.
SoftOPC score and true performance for 3 different sim-to-real methods: a baseline simulation, a simulation with random textures and lighting, and a model trained with RCAN. All three models are trained with no real data, then evaluated with off-policy evaluation on a validation set of real data. The ordering of the SoftOPC score matches the order of real grasp success.
Below is a scatterplot of the full results from all 15 models. Each point represents the off-policy evaluation score and real-world grasp success of each model. We compare different scoring functions by their correlation to final grasp success. The SoftOPC does not correlate perfectly with true grasp success, but its scores are significantly more reliable than baseline approaches like the temporal-difference error (the standard Q-learning loss).
Results from our sim-to-real evaluation experiment. On the left is a baseline, the temporal difference error of the model. On the right is one of our proposed methods, the SoftOPC. The shaded region is a 95% confidence interval. The correlation is significantly better with SoftOPC.
Future Work
One promising direction for future work is to relax our assumptions about the task, to support tasks with noisier dynamics, or ones where we get partial credit for almost succeeding. However, even under our current assumptions, we think the results are promising enough to be applied to many real-world RL problems.
Acknowledgements
This research was conducted by Alex Irpan, Kanishka Rao, Konstantinos Bousmalis, Chris Harris, Julian Ibarz and Sergey Levine. We’d like to thank Razvan Pascanu, Dale Schuurmans, George Tucker and Paul Wohlhart for valuable discussions. A preprint is available on arXiv.
Google at CVPR 2019
Monday, June 17, 2019
Posted by Andrew Helton, Editor, Google AI Communications
This week, Long Beach, CA hosts the 2019 Conference on Computer Vision and Pattern Recognition (CVPR 2019), the premier annual computer vision event comprising the main conference and several co-located workshops and tutorials. As a leader in computer vision research and a Platinum Sponsor, Google will have a strong presence at CVPR 2019, with over 250 Googlers in attendance to present papers and invited talks at the conference, and to organize and participate in multiple workshops.
If you are attending CVPR this year, please stop by our booth and chat with our researchers who are actively pursuing the next generation of intelligent systems that utilize the latest machine learning techniques applied to various areas of machine perception. Our researchers will also be available to talk about and demo several recent efforts, including the technology behind predicting pedestrian motion, the Open Images V5 dataset and much more.
You can learn more about our research being presented at CVPR 2019 in the list below (Google affiliations highlighted in blue).
Area Chairs include: Jonathan T. Barron, William T. Freeman, Ce Liu, Michael Ryoo, Noah Snavely
Oral Presentations

Relational Action Forecasting
Chen Sun, Abhinav Shrivastava, Carl Vondrick, Rahul Sukthankar, Kevin Murphy, Cordelia Schmid

Pushing the Boundaries of View Extrapolation With Multiplane Images
Pratul P. Srinivasan, Richard Tucker, Jonathan T. Barron, Ravi Ramamoorthi, Ren Ng, Noah Snavely

Auto-DeepLab: Hierarchical Neural Architecture Search for Semantic Image Segmentation
Chenxi Liu, Liang-Chieh Chen, Florian Schroff, Hartwig Adam, Wei Hua, Alan L. Yuille, Li Fei-Fei

AutoAugment: Learning Augmentation Strategies From Data
Ekin D. Cubuk, Barret Zoph, Dandelion Mane, Vijay Vasudevan, Quoc V. Le

DeepView: View Synthesis With Learned Gradient Descent
John Flynn, Michael Broxton, Paul Debevec, Matthew DuVall, Graham Fyffe, Ryan Overbeck, Noah Snavely, Richard Tucker

Normalized Object Coordinate Space for Category-Level 6D Object Pose and Size Estimation
He Wang, Srinath Sridhar, Jingwei Huang, Julien Valentin, Shuran Song, Leonidas J. Guibas

Do Better ImageNet Models Transfer Better?
Simon Kornblith, Jonathon Shlens, Quoc V. Le

TextureNet: Consistent Local Parametrizations for Learning From High-Resolution Signals on Meshes
Jingwei Huang, Haotian Zhang, Li Yi, Thomas Funkhouser, Matthias Niessner, Leonidas J. Guibas

Diverse Generation for Multi-Agent Sports Games
Raymond A. Yeh, Alexander G. Schwing, Jonathan Huang, Kevin Murphy

Occupancy Networks: Learning 3D Reconstruction in Function Space
Lars Mescheder, Michael Oechsle, Michael Niemeyer, Sebastian Nowozin, Andreas Geiger

A General and Adaptive Robust Loss Function
Jonathan T. Barron

Learning the Depths of Moving People by Watching Frozen People
Zhengqi Li, Tali Dekel, Forrester Cole, Richard Tucker, Noah Snavely, Ce Liu, William T. Freeman
(CVPR 2019 Best Paper Honorable Mention)

Composing Text and Image for Image Retrieval - an Empirical Odyssey
Nam Vo, Lu Jiang, Chen Sun, Kevin Murphy, Li-Jia Li, Li Fei-Fei, James Hays

Learning to Synthesize Motion Blur
Tim Brooks, Jonathan T. Barron

Neural Rerendering in the Wild
Moustafa Meshry, Dan B. Goldman, Sameh Khamis, Hugues Hoppe, Rohit Pandey, Noah Snavely, Ricardo Martin-Brualla

Neural Illumination: Lighting Prediction for Indoor Environments
Shuran Song, Thomas Funkhouser

Unprocessing Images for Learned Raw Denoising
Tim Brooks, Ben Mildenhall, Tianfan Xue, Jiawen Chen, Dillon Sharlet, Jonathan T. Barron
Posters

Co-Occurrent Features in Semantic Segmentation
Hang Zhang, Han Zhang, Chenguang Wang, Junyuan Xie

CrDoCo: Pixel-Level Domain Transfer With Cross-Domain Consistency
Yun-Chun Chen, Yen-Yu Lin, Ming-Hsuan Yang, Jia-Bin Huang

Im2Pencil: Controllable Pencil Illustration From Photographs
Yijun Li, Chen Fang, Aaron Hertzmann, Eli Shechtman, Ming-Hsuan Yang

Mode Seeking Generative Adversarial Networks for Diverse Image Synthesis
Qi Mao, Hsin-Ying Lee, Hung-Yu Tseng, Siwei Ma, Ming-Hsuan Yang

Revisiting Self-Supervised Visual Representation Learning
Alexander Kolesnikov, Xiaohua Zhai, Lucas Beyer

Scene Graph Generation With External Knowledge and Image Reconstruction
Jiuxiang Gu, Handong Zhao, Zhe Lin, Sheng Li, Jianfei Cai, Mingyang Ling

Scene Memory Transformer for Embodied Agents in Long-Horizon Tasks
Kuan Fang, Alexander Toshev, Li Fei-Fei, Silvio Savarese

Spatially Variant Linear Representation Models for Joint Filtering
Jinshan Pan, Jiangxin Dong, Jimmy S. Ren, Liang Lin, Jinhui Tang, Ming-Hsuan Yang

Target-Aware Deep Tracking
Xin Li, Chao Ma, Baoyuan Wu, Zhenyu He, Ming-Hsuan Yang

Temporal Cycle-Consistency Learning
Debidatta Dwibedi, Yusuf Aytar, Jonathan Tompson, Pierre Sermanet, Andrew Zisserman

Depth-Aware Video Frame Interpolation
Wenbo Bao, Wei-Sheng Lai, Chao Ma, Xiaoyun Zhang, Zhiyong Gao, Ming-Hsuan Yang

MnasNet: Platform-Aware Neural Architecture Search for Mobile
Mingxing Tan, Bo Chen, Ruoming Pang, Vijay Vasudevan, Mark Sandler, Andrew Howard, Quoc V. Le

A Compact Embedding for Facial Expression Similarity
Raviteja Vemulapalli, Aseem Agarwala

Contrastive Adaptation Network for Unsupervised Domain Adaptation
Guoliang Kang, Lu Jiang, Yi Yang, Alexander G. Hauptmann

DeepLight: Learning Illumination for Unconstrained Mobile Mixed Reality
Chloe LeGendre, Wan-Chun Ma, Graham Fyffe, John Flynn, Laurent Charbonnel, Jay Busch, Paul Debevec

Detect-To-Retrieve: Efficient Regional Aggregation for Image Search
Marvin Teichmann, Andre Araujo, Menglong Zhu, Jack Sim

Fast Object Class Labelling via Speech
Michael Gygli, Vittorio Ferrari

Learning Independent Object Motion From Unlabelled Stereoscopic Videos
Zhe Cao, Abhishek Kar, Christian Hane, Jitendra Malik

Peeking Into the Future: Predicting Future Person Activities and Locations in Videos
Junwei Liang, Lu Jiang, Juan Carlos Niebles, Alexander G. Hauptmann, Li Fei-Fei

SpotTune: Transfer Learning Through Adaptive Fine-Tuning
Yunhui Guo, Honghui Shi, Abhishek Kumar, Kristen Grauman, Tajana Rosing, Rogerio Feris

NAS-FPN: Learning Scalable Feature Pyramid Architecture for Object Detection
Golnaz Ghiasi, Tsung-Yi Lin, Quoc V. Le

Class-Balanced Loss Based on Effective Number of Samples
Yin Cui, Menglin Jia, Tsung-Yi Lin, Yang Song, Serge Belongie

FEELVOS: Fast End-To-End Embedding Learning for Video Object Segmentation
Paul Voigtlaender, Yuning Chai, Florian Schroff, Hartwig Adam, Bastian Leibe, Liang-Chieh Chen

Inserting Videos Into Videos
Donghoon Lee, Tomas Pfister, Ming-Hsuan Yang

Volumetric Capture of Humans With a Single RGBD Camera via Semi-Parametric Learning
Rohit Pandey, Anastasia Tkach, Shuoran Yang, Pavel Pidlypenskyi, Jonathan Taylor, Ricardo Martin-Brualla, Andrea Tagliasacchi, George Papandreou, Philip Davidson, Cem Keskin, Shahram Izadi, Sean Fanello

You Look Twice: GaterNet for Dynamic Filter Selection in CNNs
Zhourong Chen, Yang Li, Samy Bengio, Si Si

Interactive Full Image Segmentation by Considering All Regions Jointly
Eirikur Agustsson, Jasper R. R. Uijlings, Vittorio Ferrari

Large-Scale Interactive Object Segmentation With Human Annotators
Rodrigo Benenson, Stefan Popov, Vittorio Ferrari

Self-Supervised GANs via Auxiliary Rotation Loss
Ting Chen, Xiaohua Zhai, Marvin Ritter, Mario Lučić, Neil Houlsby

Sim-To-Real via Sim-To-Sim: Data-Efficient Robotic Grasping via Randomized-To-Canonical Adaptation Networks
Stephen James, Paul Wohlhart, Mrinal Kalakrishnan, Dmitry Kalashnikov, Alex Irpan, Julian Ibarz, Sergey Levine, Raia Hadsell, Konstantinos Bousmalis

Using Unknown Occluders to Recover Hidden Scenes
Adam B. Yedidia, Manel Baradad, Christos Thrampoulidis, William T. Freeman, Gregory W. Wornell
Workshops

Computer Vision for Global Challenges
Organizers include: Timnit Gebru, Ernest Mwebaze, John Quinn

Deep Vision 2019
Invited speakers include: Pierre Sermanet, Chris Bregler

Landmark Recognition
Organizers include: Andre Araujo, Bingyi Cao, Jack Sim, Tobias Weyand

Image Matching: Local Features and Beyond
Organizers include: Eduard Trulls

3D-WiDGET: Deep GEneraTive Models for 3D Understanding
Invited speakers include: Julien Valentin

Fine-Grained Visual Categorization
Organizers include: Christine Kaeser-Chen
Advisory panel includes: Hartwig Adam

Low-Power Image Recognition Challenge (LPIRC)
Organizers include: Aakanksha Chowdhery, Achille Brighton, Alec Go, Andrew Howard, Bo Chen, Jaeyoun Kim, Jeff Gilbert

New Trends in Image Restoration and Enhancement Workshop and Associated Challenges
Program chairs include: Vivek Kwatra, Peyman Milanfar, Sebastian Nowozin, George Toderici, Ming-Hsuan Yang

Spatio-temporal Action Recognition (AVA) @ ActivityNet Challenge
Organizers include: David Ross, Sourish Chaudhuri, Radhika Marvin, Arkadiusz Stopczynski, Joseph Roth, Caroline Pantofaru, Chen Sun, Cordelia Schmid

Third Workshop on Computer Vision for AR/VR
Organizers include: Sofien Bouaziz, Serge Belongie

DAVIS Challenge on Video Object Segmentation
Organizers include: Jordi Pont-Tuset, Alberto Montes

Efficient Deep Learning for Computer Vision
Invited speakers include: Andrew Howard

Fairness Accountability Transparency and Ethics in Computer Vision
Organizers include: Timnit Gebru, Margaret Mitchell

Precognition Seeing through the Future
Organizers include: Utsav Prabhu

Workshop and Challenge on Learned Image Compression
Organizers include: George Toderici, Michele Covell, Johannes Ballé, Eirikur Agustsson, Nick Johnston

When Blockchain Meets Computer Vision & AI
Invited speakers include: Chris Bregler

Applications of Computer Vision and Pattern Recognition to Media Forensics
Organizers include: Paul Natsev, Christoph Bregler

Tutorials

Towards Relightable Volumetric Performance Capture of Humans
Organizers include: Sean Fanello, Christoph Rhemann, Graham Fyffe, Jonathan Taylor, Sofien Bouaziz, Paul Debevec, Shahram Izadi

Learning Representations via Graph-structured Networks
Organizers include: Ming-Hsuan Yang
Applying AutoML to Transformer Architectures
Friday, June 14, 2019
Posted by David So, Software Engineer, Google AI
Since it was introduced a few years ago, Google’s Transformer architecture has been applied to challenges ranging from generating fantasy fiction to writing musical harmonies. Importantly, the Transformer’s high performance has demonstrated that feed forward neural networks can be as effective as recurrent neural networks when applied to sequence tasks, such as language modeling and translation. While the Transformer and other feed forward models used for sequence problems are rising in popularity, their architectures are almost exclusively manually designed, in contrast to the computer vision domain where AutoML approaches have found state-of-the-art models that outperform those that are designed by hand. Naturally, we wondered if the application of AutoML in the sequence domain could be equally successful.
After conducting an evolution-based neural architecture search (NAS), using translation as a proxy for sequence tasks in general, we found the Evolved Transformer, a new Transformer architecture that demonstrates promising improvements on a variety of natural language processing (NLP) tasks. Not only does the Evolved Transformer achieve state-of-the-art translation results, but it also demonstrates improved performance on language modeling when compared to the original Transformer. We are releasing this new model as part of Tensor2Tensor, where it can be used for any sequence problem.
Developing the Techniques
To begin the evolutionary NAS, we first had to develop new techniques, because the task used to evaluate the “fitness” of each architecture, WMT’14 English-German translation, is computationally expensive. This makes the searches more expensive than similar searches executed in the vision domain, which can leverage smaller datasets, like CIFAR-10. The first of these techniques is warm starting: seeding the initial evolution population with the Transformer architecture instead of random models. This helps ground the search in an area of the search space we know is strong, thereby allowing it to find better models faster.
The second technique is a new method we developed called Progressive Dynamic Hurdles (PDH), an algorithm that augments the evolutionary search to allocate more resources to the strongest candidates, in contrast to previous works, where each candidate model of the NAS is allocated the same amount of resources when it is being evaluated. PDH allows us to terminate the evaluation of a model early if it is flagrantly bad, allowing promising architectures to be awarded more resources.
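Here is a minimal sketch of the idea behind PDH, not the exact algorithm from the paper: candidates are trained in stages, and after each stage any model whose fitness falls below a hurdle (here taken to be the mean fitness of the models that reached that stage) stops receiving resources. The `train_one_stage` callback and the choice of the mean as the hurdle are illustrative assumptions.

```python
import statistics

def progressive_dynamic_hurdles(population, train_one_stage, num_stages):
    """Train candidates in stages, dropping those below the hurdle.

    `population` maps a model id to its mutable training state, and
    `train_one_stage(state)` (assumed helper) trains one more stage and
    returns the model's current fitness, e.g. negative validation loss.
    """
    survivors = dict(population)
    fitness = {}
    for _ in range(num_stages):
        if not survivors:
            break
        for model_id, state in survivors.items():
            fitness[model_id] = train_one_stage(state)
        # The hurdle rises as weak models drop out, so surviving models
        # are awarded progressively more training resources.
        hurdle = statistics.mean(fitness[m] for m in survivors)
        survivors = {m: s for m, s in survivors.items()
                     if fitness[m] >= hurdle}
    return fitness

# Warm starting (described above) would seed `population` with the
# Transformer architecture rather than purely random models.
```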
The Evolved Transformer
Using these methods, we conducted a large-scale NAS on our translation task and discovered the Evolved Transformer (ET). Like most sequence to sequence (seq2seq) neural network architectures, it has an encoder that encodes the input sequence into embeddings and a decoder that uses those embeddings to construct an output sequence; in the case of translation, the input sequence is the sentence to be translated and the output sequence is the translation.
The most interesting feature of the Evolved Transformer is the convolutional layers at the bottom of both its encoder and decoder modules that were added in a similar branching pattern in both places (i.e. the inputs run through two separate convolutional layers before being added together).
A comparison between the Evolved Transformer and the original Transformer encoder architectures. Notice the branched convolution structure at the bottom of the module, which formed in both the encoder and decoder independently. See our paper for a description of the decoder.
This is particularly interesting because the encoder and decoder architectures are not shared during the NAS, so this architecture was independently discovered as being useful in both the encoder and decoder, speaking to the strength of this design. Whereas the original Transformer relied solely on self-attention, the Evolved Transformer is a hybrid, leveraging the strengths of both self-attention and wide convolution.
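To make the branching motif concrete, here is a hedged sketch in TensorFlow of an input passing through two separate convolutions whose outputs are summed; the kernel widths and layer types are illustrative assumptions, not the exact searched Evolved Transformer cell.

```python
import tensorflow as tf

def branched_conv_block(x, filters):
    # Left branch: narrow convolution with a nonlinearity (width assumed).
    left = tf.keras.layers.Conv1D(filters, kernel_size=3,
                                  padding="same", activation="relu")(x)
    # Right branch: wider convolution, no activation (width assumed).
    right = tf.keras.layers.Conv1D(filters, kernel_size=9,
                                   padding="same")(x)
    # The two branches are added together, as in the motif described above.
    return tf.keras.layers.Add()([left, right])

# Usage inside a Keras functional model over token embeddings.
inputs = tf.keras.Input(shape=(None, 256))  # (sequence length, embedding dim)
outputs = branched_conv_block(inputs, filters=256)
model = tf.keras.Model(inputs, outputs)
```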
Evaluation of the Evolved Transformer
To test the effectiveness of this new architecture, we first compared it to the original Transformer on the English-German translation task we used during the search. We found that the Evolved Transformer had better BLEU and perplexity performance at all parameter sizes, with the biggest gain at the size compatible with mobile devices (~7 million parameters), demonstrating an efficient use of parameters. At a larger size, the Evolved Transformer reaches state-of-the-art performance on WMT’14 En-De with a BLEU score of 29.8 and a SacreBLEU score of 29.2.
Comparison between the Evolved Transformer and the original Transformer on WMT’14 En-De at varying sizes. The biggest gains in performance occur at smaller sizes, while ET also shows strength at larger sizes, outperforming the largest Transformer with 37.6% fewer parameters (models to compare are circled in green). See Table 3 in our paper for the exact numbers.
To test generalizability, we also compared ET to the Transformer on additional NLP tasks. First, we looked at translation using different language pairs, and found ET demonstrated improved performance, with margins similar to those seen on English-German; again, due to its efficient use of parameters, the biggest improvements were observed for medium sized models. We also compared the decoders of both models on language modeling using LM1B, and saw a performance improvement of nearly 2 perplexity.
Future Work
These results are the first step in exploring the application of architecture search to feed forward sequence models. The Evolved Transformer is being open sourced as part of Tensor2Tensor, where it can be used for any sequence problem. To promote reproducibility, we are also open sourcing the search space we used for our search and a Colab with an implementation of Progressive Dynamic Hurdles. We look forward to seeing what the research community does with the new model and hope that others are able to build off of these new search techniques!
Google at ICML 2019
Monday, June 10, 2019
Posted by Andrew Helton, Editor, Google AI Communications
Machine learning is a key strategic focus at Google, with highly active groups pursuing research in virtually all aspects of the field, including deep learning and more classical algorithms, exploring theory as well as application. We utilize scalable tools and architectures to build machine learning systems that enable us to solve deep scientific and engineering challenges in areas of language, speech, translation, music, visual processing and more.
As a leader in machine learning research, Google is proud to be a Sapphire Sponsor of the thirty-sixth International Conference on Machine Learning (ICML 2019), a premier annual event supported by the International Machine Learning Society taking place this week in Long Beach, CA. With nearly 200 Googlers attending the conference to present publications and host workshops, we look forward to our continued collaboration with the larger machine learning research community.
If you're attending ICML 2019, we hope you'll visit the Google booth to learn more about the exciting work, creativity and fun that goes into solving some of the field's most interesting challenges, with researchers on hand to talk about the Google Research Football Environment, AdaNet, Robotics at Google and much more. You can also learn more about the Google research being presented at ICML 2019 in the list below (Google affiliations highlighted in blue).
ICML 2019 Committees

Board Members include: Andrew McCallum, Corinna Cortes, Hugo Larochelle, William Cohen (Emeritus)

Senior Area Chairs include: Charles Sutton, Claudio Gentile, Corinna Cortes, Kevin Murphy, Mehryar Mohri, Nati Srebro, Samy Bengio, Surya Ganguli

Area Chairs include: Jacob Abernethy, William Cohen, Dumitru Erhan, Cho-Jui Hsieh, Chelsea Finn, Sergey Levine, Manzil Zaheer, Sergei Vassilvitskii, Boqing Gong, Been Kim, Dale Schuurmans, Danny Tarlow, Dustin Tran, Hanie Sedghi, Honglak Lee, Jasper Snoek, Lihong Li, Minmin Chen, Mohammad Norouzi, Nicolas Le Roux, Phil Long, Sanmi Koyejo, Timnit Gebru, Vitaly Feldman, Satyen Kale, Katherine Heller, Hossein Mobahi, Amir Globerson, Ilya Tolstikhin, Marco Cuturi, Sebastian Nowozin, Amin Karbasi, Ohad Shamir, Graham Taylor
Accepted Publications

Challenging Common Assumptions in the Unsupervised Learning of Disentangled Representations
Francesco Locatello, Stefan Bauer, Mario Lučić, Gunnar Rätsch, Sylvain Gelly, Bernhard Schölkopf, Olivier Bachem
(Recipient of an ICML 2019 Best Paper Award)

Learning to Groove with Inverse Sequence Transformations
Jon Gillick, Adam Roberts, Jesse Engel, Douglas Eck, David Bamman

Metric-Optimized Example Weights
Sen Zhao, Mahdi Milani Fard, Harikrishna Narasimhan, Maya Gupta

HOList: An Environment for Machine Learning of Higher Order Logic Theorem Proving
Kshitij Bansal, Sarah Loos, Markus Rabe, Christian Szegedy, Stewart Wilcox

Learning to Clear the Market
Weiran Shen, Sebastien Lahaie, Renato Paes Leme

Shape Constraints for Set Functions
Andrew Cotter, Maya Gupta, Heinrich Jiang, Erez Louidor, James Muller, Tamann Narayan, Serena Wang, Tao Zhu

Self-Attention Generative Adversarial Networks
Han Zhang, Ian Goodfellow, Dimitris Metaxas, Augustus Odena

High-Fidelity Image Generation With Fewer Labels
Mario Lučić, Michael Tschannen, Marvin Ritter, Xiaohua Zhai, Olivier Bachem, Sylvain Gelly

Learning Optimal Linear Regularizers
Matthew Streeter

DeepMDP: Learning Continuous Latent Space Models for Representation Learning
Carles Gelada, Saurabh Kumar, Jacob Buckman, Ofir Nachum, Marc G. Bellemare

kernelPSI: a Post-Selection Inference Framework for Nonlinear Variable Selection
Lotfi Slim, Clément Chatelain, Chloe-Agathe Azencott, Jean-Philippe Vert

Learning from a Learner
Alexis Jacq, Matthieu Geist, Ana Paiva, Olivier Pietquin
Rate Distortion For Model Compression: From Theory To Practice
Weihao Gao, Yu-Han Liu, Chong Wang, Sewoong Oh
An Investigation into Neural Net Optimization via Hessian Eigenvalue Density
Behrooz Ghorbani, Shankar Krishnan, Ying Xiao

Graph Matching Networks for Learning the Similarity of Graph Structured Objects
Yujia Li, Chenjie Gu, Thomas Dullien, Oriol Vinyals, Pushmeet Kohli

Subspace Robust Wasserstein Distances
François-Pierre Paty, Marco Cuturi

Training Well-Generalizing Classifiers for Fairness Metrics and Other Data-Dependent Constraints
Andrew Cotter, Maya Gupta, Heinrich Jiang, Nathan Srebro, Karthik Sridharan, Serena Wang, Blake Woodworth, Seungil You

The Effect of Network Width on Stochastic Gradient Descent and Generalization: an Empirical Study
Daniel Park, Jascha Sohl-Dickstein, Quoc Le, Samuel Smith

A Theory of Regularized Markov Decision Processes
Matthieu Geist, Bruno Scherrer, Olivier Pietquin

Area Attention
Yang Li, Łukasz Kaiser, Samy Bengio, Si Si

EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks
Mingxing Tan, Quoc Le

Static Automatic Batching In TensorFlow
Ashish Agarwal

The Evolved Transformer
David So, Quoc Le, Chen Liang

Policy Certificates: Towards Accountable Reinforcement Learning
Christoph Dann, Lihong Li, Wei Wei, Emma Brunskill

Self-similar Epochs: Value in Arrangement
Eliav Buchnik, Edith Cohen, Avinatan Hasidim, Yossi Matias

The Value Function Polytope in Reinforcement Learning
Robert Dadashi, Marc G. Bellemare, Adrien Ali Taiga, Nicolas Le Roux, Dale Schuurmans

Adversarial Examples Are a Natural Consequence of Test Error in Noise
Justin Gilmer, Nicolas Ford, Nicholas Carlini, Ekin Cubuk

SOLAR: Deep Structured Representations for Model-Based Reinforcement Learning
Marvin Zhang, Sharad Vikram, Laura Smith, Pieter Abbeel, Matthew Johnson, Sergey Levine

Garbage In, Reward Out: Bootstrapping Exploration in Multi-Armed Bandits
Branislav Kveton, Csaba Szepesvari, Sharan Vaswani, Zheng Wen, Tor Lattimore, Mohammad Ghavamzadeh

Imperceptible, Robust, and Targeted Adversarial Examples for Automatic Speech Recognition
Yao Qin, Nicholas Carlini, Garrison Cottrell, Ian Goodfellow, Colin Raffel

Direct Uncertainty Prediction for Medical Second Opinions
Maithra Raghu, Katy Blumer, Rory Sayres, Ziad Obermeyer, Bobby Kleinberg, Sendhil Mullainathan, Jon Kleinberg

A Large-Scale Study on Regularization and Normalization in GANs
Karol Kurach, Mario Lučić, Xiaohua Zhai, Marcin Michalski, Sylvain Gelly

Learning a Compressed Sensing Measurement Matrix via Gradient Unrolling
Shanshan Wu, Alex Dimakis, Sujay Sanghavi, Felix Yu, Daniel Holtmann-Rice, Dmitry Storcheus, Afshin Rostamizadeh, Sanjiv Kumar

NATTACK: Learning the Distributions of Adversarial Examples for an Improved Black-Box Attack on Deep Neural Networks
Yandong Li, Lijun Li, Liqiang Wang, Tong Zhang, Boqing Gong

Distributed Weighted Matching via Randomized Composable Coresets
Sepehr Assadi, Mohammad Hossein Bateni, Vahab Mirrokni

Monge blunts Bayes: Hardness Results for Adversarial Training
Zac Cranko, Aditya Menon, Richard Nock, Cheng Soon Ong, Zhan Shi, Christian Walder

Generalized Majorization-Minimization
Sobhan Naderi Parizi, Kun He, Reza Aghajani, Stan Sclaroff, Pedro Felzenszwalb

NAS-Bench-101: Towards Reproducible Neural Architecture Search
Chris Ying, Aaron Klein, Eric Christiansen, Esteban Real, Kevin Murphy, Frank Hutter

Variational Russian Roulette for Deep Bayesian Nonparametrics
Kai Xu, Akash Srivastava, Charles Sutton

Surrogate Losses for Online Learning of Stepsizes in Stochastic Non-Convex Optimization
Zhenxun Zhuang, Ashok Cutkosky, Francesco Orabona

Improved Parallel Algorithms for Density-Based Network Clustering
Mohsen Ghaffari, Silvio Lattanzi, Slobodan Mitrović

The Advantages of Multiple Classes for Reducing Overfitting from Test Set Reuse
Vitaly Feldman, Roy Frostig, Moritz Hardt

Submodular Streaming in All Its Glory: Tight Approximation, Minimum Memory and Low Adaptive Complexity
Ehsan Kazemi, Marko Mitrovic, Morteza Zadimoghaddam, Silvio Lattanzi, Amin Karbasi

Hiring Under Uncertainty
Manish Purohit, Sreenivas Gollapudi, Manish Raghavan
A Tree-Based Method for Fast Repeated Sampling of Determinantal Point Processes
Jennifer Gillenwater, Alex Kulesza, Zelda Mariet, Sergei Vassilvitskii
Statistics and Samples in Distributional Reinforcement Learning
Mark Rowland, Robert Dadashi, Saurabh Kumar, Remi Munos, Marc G. Bellemare, Will Dabney

Provably Efficient Maximum Entropy Exploration
Elad Hazan, Sham Kakade, Karan Singh, Abby Van Soest
Active Learning with Disagreement Graphs
Corinna Cortes, Giulia DeSalvo, Mehryar Mohri, Ningshan Zhang, Claudio Gentile
MixHop: Higher-Order Graph Convolutional Architectures via Sparsified Neighborhood Mixing
Sami Abu-El-Haija, Bryan Perozzi, Amol Kapoor, Nazanin Alipourfard, Kristina Lerman, Hrayr Harutyunyan, Greg Ver Steeg, Aram Galstyan

Understanding the Impact of Entropy on Policy Optimization
Zafarali Ahmed, Nicolas Le Roux, Mohammad Norouzi, Dale Schuurmans

Matrix-Free Preconditioning in Online Learning
Ashok Cutkosky, Tamas Sarlos

State-Reification Networks: Improving Generalization by Modeling the Distribution of Hidden Representations
Alex Lamb, Jonathan Binas, Anirudh Goyal, Sandeep Subramanian, Ioannis Mitliagkas, Yoshua Bengio, Michael Mozer

Online Convex Optimization in Adversarial Markov Decision Processes
Aviv Rosenberg, Yishay Mansour
Bounding User Contributions: A Bias-Variance Trade-off in Differential Privacy
Kareem Amin, Alex Kulesza, Andres Munoz Medina, Sergei Vassilvitskii
Complementary-Label Learning for Arbitrary Losses and Models
Takashi Ishida, Gang Niu, Aditya Menon, Masashi Sugiyama

Learning Latent Dynamics for Planning from Pixels
Danijar Hafner, Timothy Lillicrap, Ian Fischer, Ruben Villegas, David Ha, Honglak Lee, James Davidson

Unifying Orthogonal Monte Carlo Methods
Krzysztof Choromanski, Mark Rowland, Wenyu Chen, Adrian Weller

Differentially Private Learning of Geometric Concepts
Haim Kaplan, Yishay Mansour, Yossi Matias, Uri Stemmer

Online Learning with Sleeping Experts and Feedback Graphs
Corinna Cortes, Giulia DeSalvo, Claudio Gentile, Mehryar Mohri, Scott Yang

Adaptive Scale-Invariant Online Algorithms for Learning Linear Models
Michal Kempka, Wojciech Kotlowski, Manfred K. Warmuth

TensorFuzz: Debugging Neural Networks with Coverage-Guided Fuzzing
Augustus Odena, Catherine Olsson, David Andersen, Ian Goodfellow

Online Control with Adversarial Disturbances
Naman Agarwal, Brian Bullins, Elad Hazan, Sham Kakade, Karan Singh

Adversarial Online Learning with Noise
Alon Resler, Yishay Mansour

Escaping Saddle Points with Adaptive Gradient Methods
Matthew Staib, Sashank Reddi, Satyen Kale, Sanjiv Kumar, Suvrit Sra

Fairness Risk Measures
Robert Williamson, Aditya Menon

DBSCAN++: Towards Fast and Scalable Density Clustering
Jennifer Jang, Heinrich Jiang

Learning Linear-Quadratic Regulators Efficiently with only √T Regret
Alon Cohen, Tomer Koren, Yishay Mansour

Understanding and correcting pathologies in the training of learned optimizers
Luke Metz, Niru Maheswaranathan, Jeremy Nixon, Daniel Freeman, Jascha Sohl-Dickstein

Parameter-Efficient Transfer Learning for NLP
Neil Houlsby, Andrei Giurgiu, Stanislaw Jastrzebski, Bruna Morrone, Quentin De Laroussilhe, Andrea Gesmundo, Mona Attariyan, Sylvain Gelly

Efficient Full-Matrix Adaptive Regularization
Naman Agarwal, Brian Bullins, Xinyi Chen, Elad Hazan, Karan Singh, Cyril Zhang, Yi Zhang

Efficient On-Device Models Using Neural Projections
Sujith Ravi

Flexibly Fair Representation Learning by Disentanglement
Elliot Creager, David Madras, Joern-Henrik Jacobsen, Marissa Weis, Kevin Swersky, Toniann Pitassi, Richard Zemel

Recursive Sketches for Modular Deep Learning
Badih Ghazi, Rina Panigrahy, Joshua Wang

POLITEX: Regret Bounds for Policy Iteration Using Expert Prediction
Yasin Abbasi-Yadkori, Peter L. Bartlett, Kush Bhatia, Nevena Lazić, Csaba Szepesvári, Gellért Weisz

Anytime Online-to-Batch, Optimism and Acceleration
Ashok Cutkosky

Insertion Transformer: Flexible Sequence Generation via Insertion Operations
Mitchell Stern, William Chan, Jamie Kiros, Jakob Uszkoreit

Robust Inference via Generative Classifiers for Handling Noisy Labels
Kimin Lee, Sukmin Yun, Kibok Lee, Honglak Lee, Bo Li, Jinwoo Shin

A Better k-means++ Algorithm via Local Search
Silvio Lattanzi, Christian Sohler

Analyzing and Improving Representations with the Soft Nearest Neighbor Loss
Nicholas Frosst, Nicolas Papernot, Geoffrey Hinton

Learning to Generalize from Sparse and Underspecified Rewards
Rishabh Agarwal, Chen Liang, Dale Schuurmans, Mohammad Norouzi

MeanSum: A Neural Model for Unsupervised Multi-Document Abstractive Summarization
Eric Chu, Peter Liu

CHiVE: Varying Prosody in Speech Synthesis with a Linguistically Driven Dynamic Hierarchical Conditional Variational Network
Tom Kenter, Vincent Wan, Chun-An Chan, Rob Clark, Jakub Vit

Similarity of Neural Network Representations Revisited
Simon Kornblith, Mohammad Norouzi, Honglak Lee, Geoffrey Hinton

Online Algorithms for Rent-Or-Buy with Expert Advice
Sreenivas Gollapudi, Debmalya Panigrahi

Breaking the Softmax Bottleneck via Learnable Monotonic Pointwise Non-linearities
Octavian Ganea, Sylvain Gelly, Gary Becigneul, Aliaksei Severyn

Non-monotone Submodular Maximization with Nearly Optimal Adaptivity and Query Complexity
Matthew Fahrbach, Vahab Mirrokni, Morteza Zadimoghaddam

Agnostic Federated Learning
Mehryar Mohri, Gary Sivek, Ananda Theertha Suresh

Categorical Feature Compression via Submodular Optimization
Mohammad Hossein Bateni, Lin Chen, Hossein Esfandiari, Thomas Fu, Vahab Mirrokni, Afshin Rostamizadeh

Cross-Domain 3D Equivariant Image Embeddings
Carlos Esteves, Avneesh Sud, Zhengyi Luo, Kostas Daniilidis, Ameesh Makadia

Faster Algorithms for Binary Matrix Factorization
Ravi Kumar, Rina Panigrahy, Ali Rahimi, David Woodruff

On Variational Bounds of Mutual Information
Ben Poole, Sherjil Ozair, Aaron Van Den Oord, Alex Alemi, George Tucker

Guided Evolutionary Strategies: Augmenting Random Search with Surrogate Gradients
Niru Maheswaranathan, Luke Metz, George Tucker, Dami Choi, Jascha Sohl-Dickstein

Semi-Cyclic Stochastic Gradient Descent
Hubert Eichner, Tomer Koren, Brendan McMahan, Nathan Srebro, Kunal Talwar

Stochastic Deep Networks
Gwendoline de Bie, Gabriel Peyré, Marco Cuturi
Workshops

1st Workshop on Understanding and Improving Generalization in Deep Learning
Organizers Include: Dilip Krishnan, Hossein Mobahi
Invited Speaker: Chelsea Finn

Climate Change: How Can AI Help?
Invited Speaker: John Platt

Generative Modeling and Model-Based Reasoning for Robotics and AI
Organizers Include: Dumitru Erhan, Sergey Levine, Kimberly Stachenfeld
Invited Speaker: Chelsea Finn

Human In the Loop Learning (HILL)
Organizers Include: Been Kim

ICML 2019 Time Series Workshop
Organizers Include: Vitaly Kuznetsov

Joint Workshop on On-Device Machine Learning & Compact Deep Neural Network Representations (ODML-CDNNR)
Organizers Include: Sujith Ravi, Zornitsa Kozareva

Negative Dependence: Theory and Applications in Machine Learning
Organizers Include: Jennifer Gillenwater, Alex Kulesza

Reinforcement Learning for Real Life
Organizers Include: Lihong Li
Invited Speaker: Craig Boutilier

Uncertainty and Robustness in Deep Learning
Organizers Include: Justin Gilmer

Theoretical Physics for Deep Learning
Organizers Include: Jaehoon Lee, Jeffrey Pennington, Yasaman Bahri

Workshop on the Security and Privacy of Machine Learning
Organizers Include: Nicolas Papernot
Invited Speaker: Been Kim

Exploration in Reinforcement Learning Workshop
Organizers Include: Benjamin Eysenbach, Surya Bhupatiraju, Shixiang Gu

ICML Workshop on Imitation, Intent, and Interaction (I3)
Organizers Include: Sergey Levine, Chelsea Finn
Invited Speaker: Pierre Sermanet

Identifying and Understanding Deep Learning Phenomena
Organizers Include: Hanie Sedghi, Samy Bengio, Kenji Hata, Maithra Raghu, Ali Rahimi, Ying Xiao

Workshop on Multi-Task and Lifelong Reinforcement Learning
Organizers Include: Sarath Chandar, Chelsea Finn
Invited Speakers: Karol Hausman, Sergey Levine

Workshop on Self-Supervised Learning
Organizers Include: Pierre Sermanet

Invertible Neural Networks and Normalizing Flows
Organizers Include: Rianne Van den Berg, Danilo J. Rezende
Invited Speakers: Eric Jang, Laurent Dinh
Introducing Google Research Football: A Novel Reinforcement Learning Environment
Friday, June 7, 2019
Posted by Karol Kurach, Research Lead and Olivier Bachem, Research Scientist, Google Research, Zürich
The goal of reinforcement learning (RL) is to train smart agents that can interact with their environment and solve complex tasks, with real-world applications towards robotics, self-driving cars, and more. The rapid progress in this field has been fueled by making agents play games such as the iconic Atari console games, the ancient game of Go, or professionally played video games like Dota 2 or Starcraft 2, all of which provide challenging environments where new algorithms and ideas can be quickly tested in a safe and reproducible manner. The game of football is particularly challenging for RL, as it requires a natural balance between short-term control, learned concepts such as passing, and high-level strategy.
Today we are happy to announce the release of the Google Research Football Environment, a novel RL environment where agents aim to master the world’s most popular sport: football. Modeled after popular football video games, the Football Environment provides a physics-based 3D football simulation where agents control either one or all football players on their team, learn how to pass between them, and manage to overcome their opponent’s defense in order to score goals. The Football Environment provides several crucial components: a highly-optimized game engine, a demanding set of research problems called the Football Benchmarks, as well as the Football Academy, a set of progressively harder RL scenarios. In order to facilitate research, we have released a beta version of the underlying open-source code on GitHub.
Football Engine
The core of the Football Environment is an advanced football simulation, called the Football Engine, which is based on a heavily modified version of Gameplay Football. Based on input actions for the two opposing teams, it simulates a match of football including goals, fouls, corner and penalty kicks, and offsides. The Football Engine is written in highly optimized C++ code, allowing it to be run on off-the-shelf machines, both with and without GPU-based rendering enabled. This allows it to reach a performance of approximately 25 million steps per day on a single hexa-core machine.
The Football Engine is an advanced football simulation that supports all the major football rules such as kickoffs (top left), goals (top right), fouls, cards (bottom left), corner and penalty kicks (bottom right), and offside.
The Football Engine has additional features geared specifically towards RL. First, it allows learning both from different state representations, which contain semantic information such as the players' locations, and from raw pixels. Second, to investigate the impact of randomness, it can be run in both a stochastic mode (enabled by default), in which there is randomness in both the environment and opponent AI actions, and in a deterministic mode, where there is no randomness. Third, the Football Engine is compatible out of the box with the widely used OpenAI Gym API. Finally, researchers can get a feeling for the game by playing against each other or their agents, using either keyboards or gamepads.
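For instance, here is a minimal sketch of interacting with the engine through its Gym-style interface, assuming the open-sourced gfootball package is installed; the scenario name and representation below are illustrative choices.

```python
import gfootball.env as football_env

# Build a Football Academy scenario using semantic state features
# (a flat feature vector) rather than raw pixels.
env = football_env.create_environment(
    env_name='academy_empty_goal',
    representation='simple115',
    render=False)

obs = env.reset()
done = False
while not done:
    # A random policy stands in for a learned agent here.
    action = env.action_space.sample()
    obs, reward, done, info = env.step(action)
```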
Football Benchmarks
With the Football Benchmarks, we propose a set of benchmark problems for RL research based on the Football Engine. The goal in these benchmarks is to play a “standard” game of football against a fixed rule-based opponent that was hand-engineered for this purpose. We provide three versions: the Football Easy Benchmark, the Football Medium Benchmark, and the Football Hard Benchmark, which only differ in the strength of the opponent.
As a reference, we provide benchmark results for two state-of-the-art reinforcement learning algorithms: DQN and IMPALA, which both can be run in multiple processes on a single machine or concurrently on many machines. We investigate both the setting where the only rewards provided to the algorithm are the goals scored and the setting where we provide additional rewards for moving the ball closer to the goal.
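As a sketch of these two reward settings, the environment builder exposes a rewards flag; the benchmark scenario name here is an assumption based on the open-sourced code.

```python
import gfootball.env as football_env

# Goals-only reward versus goals plus "checkpoint" rewards for
# advancing the ball toward the opponent's goal (names assumed).
env_sparse = football_env.create_environment(
    env_name='11_vs_11_easy_stochastic', rewards='scoring')
env_shaped = football_env.create_environment(
    env_name='11_vs_11_easy_stochastic', rewards='scoring,checkpoints')
```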
Our results indicate that the Football Benchmarks are interesting research problems of varying difficulties. In particular, the Football Easy Benchmark appears to be suitable for research on single-machine algorithms, while the Football Hard Benchmark proves to be challenging even for massively distributed RL algorithms. Based on the nature of the environment and the difficulty of the benchmarks, we expect them to be useful for investigating current scientific challenges such as sample-efficient RL, sparse rewards, or model-based RL.
The average goal difference of agent versus opponent at different difficulty levels for different baselines. The Easy opponent can be beaten by a DQN agent trained for 20 million steps, while the Medium and Hard opponents require a distributed algorithm such as IMPALA that is trained for 200 million steps.
Football Academy & Future Directions
As training agents for the full Football Benchmarks can be challenging, we also provide Football Academy, a diverse set of scenarios of varying difficulty. This allows researchers to get the ball rolling on new research ideas, allows testing of high-level concepts (such as passing), and provides a foundation to investigate
curriculum learning
research ideas, where agents learn from progressively harder scenarios. Examples of the Football Academy scenarios include settings where agents have to learn how to score against the empty goal, where they have to learn how to quickly pass between players, and where they have to learn how to execute a counter-attack. Using a simple API, researchers can further define their own scenarios and train agents to solve them.
Top: A successful policy that runs towards the goal (as required, since a number of opponents chase our player) and scores against the goal-keeper. Second: A beautiful way to drive and finish a counter-attack. Third: A simple way to solve a 2-vs-1 play. Bottom: The agent scores after a corner kick.
The Football Benchmarks and the Football Academy consider the standard RL setup, in which agents compete against a fixed opponent, i.e., where the opponent can be considered a part of the environment. Yet, in reality, football is a two-player game where two different teams compete and where one has to adapt to the actions and strategy of the opposing team. The Football Engine provides a unique opportunity for research into this setting and, once we complete our ongoing effort to implement self-play, even more interesting research settings can be investigated.
Acknowledgments
This project was undertaken together with Anton Raichuk, Piotr Stańczyk, Michał Zając, Lasse Espeholt, Carlos Riquelme, Damien Vincent, Marcin Michalski, Olivier Bousquet and Sylvain Gelly at Google Research, Zürich. We also wish to thank Lucas Beyer, Nal Kalchbrenner, Tim Salimans and the rest of the Google Brain team for helpful discussions, comments, technical help and code contributions. Finally, we would like to thank Bastiaan Konings Schuiling, who authored and open-sourced the original version of this game.