Learning to Refine Object Segments

ECCV
October 10, 2016

Abstract

Object segmentation requires both object-level information and low-level pixel data. This presents a challenge for feedforward networks: lower layers in convolutional nets capture rich spatial information, while upper layers encode object-level knowledge but are invariant to factors such as pose and appearance. In this work we propose to augment feedforward nets for object segmentation with a novel top-down refinement approach. The resulting bottom-up/top-down architecture is capable of efficiently generating high-fidelity object masks. Like skip connections, our approach leverages features at all layers of the net. Unlike skip connections, our approach does not attempt to output independent predictions at each layer. Instead, we first output a coarse 'mask encoding' in a feedforward pass, then refine this mask encoding in a top-down pass utilizing features at successively lower layers. The approach is simple, fast, and effective. Building on the recent DeepMask network for generating object proposals, we show accuracy improvements of 10-20% in average recall for various setups. Additionally, by optimizing the overall network architecture, our approach, which we call SharpMask, is 50% faster than the original DeepMask network (under 0.8s per image).
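The top-down pass described in the abstract can be illustrated with a toy sketch. This is not the SharpMask implementation: the single-channel maps, the weighted-sum merge, the ReLU, the 2x nearest-neighbour upsampling per step, and all function names (`upsample2x`, `refine`, `top_down_pass`, `random_map`) are assumptions chosen only to show the flow of data, with random numbers standing in for real network features.

```python
# Illustrative sketch only. Shapes, weights, and function names are
# assumptions for this example, not the SharpMask implementation.
import random


def upsample2x(m):
    """Nearest-neighbour 2x upsampling of a 2-D map stored as a list of rows."""
    out = []
    for row in m:
        wide = [v for v in row for _ in range(2)]  # duplicate each column
        out.append(wide)
        out.append(list(wide))                     # duplicate each row
    return out


def refine(mask_enc, skip_feat, w_mask=1.0, w_skip=1.0):
    """One hypothetical refinement step: merge the top-down mask encoding
    with same-resolution bottom-up features, apply a ReLU, then upsample."""
    merged = [
        [max(0.0, w_mask * a + w_skip * b) for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(mask_enc, skip_feat)
    ]
    return upsample2x(merged)


def top_down_pass(coarse, skips):
    """Refine a coarse encoding with features from successively lower layers."""
    enc = coarse
    for feat in skips:  # ordered from the highest (coarsest) layer down
        assert len(feat) == len(enc), "skip features must match the resolution"
        enc = refine(enc, feat)
    return enc


def random_map(size, rng):
    """Random square map standing in for real network activations."""
    return [[rng.random() for _ in range(size)] for _ in range(size)]


rng = random.Random(0)
# Stand-ins for bottom-up features at three layers: 4x4 (top), 8x8, 16x16.
skips = [random_map(4, rng), random_map(8, rng), random_map(16, rng)]
coarse = random_map(4, rng)  # stand-in for the coarse mask encoding
mask = top_down_pass(coarse, skips)
print(len(mask), len(mask[0]))
```

Note that the sketch emits a single output at the end of the top-down pass rather than an independent prediction per layer, which is the contrast with skip connections that the abstract draws. A real network would presumably use learned multi-channel convolutions in place of the fixed weighted sum.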

Authors

Pedro O. Pinheiro, Tsung-Yi Lin, Ronan Collobert, Piotr Dollar
