Computer Vision and Pattern Recognition

New submissions

Submissions received from Wed 20 Mar 19 to Thu 21 Mar 19, announced Fri, 22 Mar 19

New submissions for Fri, 22 Mar 19

[1] arXiv:1903.08649 [pdf, other]: Title: Face Detection in Repeated Settings

Authors: Mohammad Nayeem Teli, Bruce A. Draper, J. Ross Beveridge

Comments: 14 pages, 21 figures

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Face detection is an important first step before face verification and recognition. In unconstrained settings it is still an open challenge because of the variation in pose, lighting, scale, background and location. However, for the purposes of verification we can control the background and location. Images are primarily captured in places such as the entrance to a sensitive building, in front of a door or some location where the background does not change. We present a correlation-based face detection algorithm to detect faces in such settings, where we control the location, and leave lighting, pose, and scale uncontrolled. In these scenarios the results indicate that our algorithm is easy and fast to train, outperforms Viola and Jones face detection in accuracy, and is faster to test.
[2] arXiv:1903.08682 [pdf, other]: Title: Im2Pencil: Controllable Pencil Illustration from Photographs

Authors: Yijun Li, Chen Fang, Aaron Hertzmann, Eli Shechtman, Ming-Hsuan Yang

Comments: Accepted by CVPR 2019

Subjects: Computer Vision and Pattern Recognition (cs.CV)

We propose a high-quality photo-to-pencil translation method with fine-grained control over the drawing style. This is a challenging task due to multiple stroke types (e.g., outline and shading), structural complexity of pencil shading (e.g., hatching), and the lack of aligned training data pairs. To address these challenges, we develop a two-branch model that learns separate filters for generating sketchy outlines and tonal shading from a collection of pencil drawings. We create training data pairs by extracting clean outlines and tonal illustrations from original pencil drawings using image filtering techniques, and we manually label the drawing styles. In addition, our model creates different pencil styles (e.g., line sketchiness and shading style) in a user-controllable manner. Experimental results on different types of pencil drawings show that the proposed algorithm performs favorably against existing methods in terms of quality, diversity and user evaluations.
[3] arXiv:1903.08701 [pdf, other]: Title: LaserNet: An Efficient Probabilistic 3D Object Detector for Autonomous Driving

Authors: Gregory P. Meyer, Ankit Laddha, Eric Kee, Carlos Vallespi-Gonzalez, Carl K. Wellington

Comments: Accepted for publication at CVPR 2019

Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Robotics (cs.RO)

In this paper, we present LaserNet, a computationally efficient method for 3D object detection from LiDAR data for autonomous driving. The efficiency results from processing LiDAR data in the native range view of the sensor, where the input data is naturally compact. Operating in the range view involves well known challenges for learning, including occlusion and scale variation, but it also provides contextual information based on how the sensor data was captured. Our approach uses a fully convolutional network to predict a multimodal distribution over 3D boxes for each point and then it efficiently fuses these distributions to generate a prediction for each object. Experiments show that modeling each detection as a distribution rather than a single deterministic box leads to better overall detection performance. Benchmark results show that this approach has significantly lower runtime than other recent detectors and that it achieves state-of-the-art performance when compared on a large dataset that has enough data to overcome the challenges of training on the range view.
[4] arXiv:1903.08746 [pdf, other]: Title: Affordance Learning In Direct Perception for Autonomous Driving

Authors: Chen Sun, Jean M. Uwabeza Vianney, Dongpu Cao

Comments: 9 pages, 13 figures

Subjects: Computer Vision and Pattern Recognition (cs.CV); Information Retrieval (cs.IR)

Recent developments in autonomous driving involve high-level computer vision and detailed road scene understanding. Today, most autonomous vehicles use a mediated perception approach for path planning and control, which relies heavily on high-definition 3D maps and real-time sensors. Recent research efforts aim to substitute the massive HD maps with coarse road attributes. In this paper, we follow the direct perception based method to train a deep neural network for affordance learning in autonomous driving. Our goal in this work is to develop the affordance learning model based on freely available Google Street View panoramas and Open Street Map road vector attributes. Driving scene understanding can be achieved by learning affordances from the images captured by car-mounted cameras. Such scene understanding by learning affordances may be useful for corroborating base maps such as HD maps so that the required data storage space is minimized and available for processing in real time. We experimentally compare the road attribute identification capability of human volunteers and our model. Our results indicate that this method could serve as a cheaper way to collect training data in autonomous driving. The cross validation results also indicate the effectiveness of our model.
[5] arXiv:1903.08773 [pdf, other]: Title: Robust Image Segmentation Quality Assessment without Ground Truth

Authors: Leixin Zhou, Wenxiang Deng, Xiaodong Wu

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Deep learning based image segmentation methods have achieved great success, even reaching human-level accuracy in some applications. However, due to the black box nature of deep learning, even the best method may fail in some situations. Thus, predicting segmentation quality without ground truth is crucial, especially in clinical practice. Recently, researchers have proposed training neural networks to estimate the quality score by regression. Although this can achieve promising prediction accuracy, the network suffers from robustness problems, e.g., it is vulnerable to adversarial attacks. In this paper, we propose to alleviate this problem by utilizing the difference between the input image and the reconstructed image, which is reconstructed from the segmentation to be assessed. The deep learning based reconstruction network (REC-Net) is trained with the input image masked by the ground truth segmentation against the original input image as the target. The rationale is that the trained REC-Net can best reconstruct the input image masked by an accurate segmentation. The quality score regression network (REG-Net) is then trained with difference images and the corresponding segmentations as input. In this way, the regression network may have a lower chance of overfitting to the undesired image features from the original input image, and thus is more robust. Results on the ACDC17 dataset demonstrate that our method is promising.
[6] arXiv:1903.08811 [pdf, other]: Title: Networks for Joint Affine and Non-parametric Image Registration

Authors: Zhengyang Shen, Xu Han, Zhenlin Xu, Marc Niethammer

Comments: Accepted to CVPR 2019

Subjects: Computer Vision and Pattern Recognition (cs.CV)

We introduce an end-to-end deep-learning framework for 3D medical image registration. In contrast to existing approaches, our framework combines two registration methods: an affine registration and a vector momentum-parameterized stationary velocity field (vSVF) model. Specifically, it consists of three stages. In the first stage, a multi-step affine network predicts affine transform parameters. In the second stage, we use a Unet-like network to generate a momentum, from which a velocity field can be computed via smoothing. Finally, in the third stage, we employ a self-iterable map-based vSVF component to provide a non-parametric refinement based on the current estimate of the transformation map. Once the model is trained, a registration is completed in one forward pass. To evaluate the performance, we conducted longitudinal and cross-subject experiments on 3D magnetic resonance images (MRI) of the knee of the Osteoarthritis Initiative (OAI) dataset. Results show that our framework achieves comparable performance to state-of-the-art medical image registration approaches, but it is much faster, with a better control of transformation regularity including the ability to produce approximately symmetric transformations, and combining affine and non-parametric registration.
[7] arXiv:1903.08814 [pdf]: Title: Prostate Segmentation from Ultrasound Images using Residual Fully Convolutional Network

Authors: M. S. Hossain, A. P. Paplinski, J. M. Betts

Comments: 6 pages, 4 figures

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Medical-imaging-based prostate cancer diagnosis uses intra-operative transrectal ultrasound (TRUS) imaging to visualize the prostate shape and location in order to collect tissue samples. Correct tissue sampling from the prostate requires accurate prostate segmentation in TRUS images. To achieve this, this study uses a novel residual connection based fully convolutional network. The advantage of this segmentation technique is that it requires no pre-processing of TRUS images to perform the segmentation. Thus, it offers faster and more straightforward prostate segmentation from TRUS images. Results show that the proposed technique can achieve around 86% Dice Similarity accuracy using only a few TRUS datasets.
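
Editor's note: the Dice Similarity figure quoted in this abstract is an overlap score between a predicted mask and a reference mask. The sketch below is only an illustration of the standard Dice formula, 2|A ∩ B| / (|A| + |B|), and is not the authors' evaluation code. The function name `dice_coefficient`, the flat 0/1 mask layout, and the convention that two empty masks score 1.0 are assumptions made for this example.

```python
def dice_coefficient(pred, truth):
    """Dice similarity between two binary masks given as flat sequences of 0/1."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of pixels")
    # Pixels marked foreground in both masks.
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    # Total foreground pixels across both masks.
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    # Assumed convention: two empty masks agree perfectly.
    if total == 0:
        return 1.0
    return 2.0 * intersection / total


if __name__ == "__main__":
    predicted = [0, 1, 1, 1, 0, 0]   # hypothetical predicted mask
    reference = [0, 1, 1, 0, 0, 1]   # hypothetical reference mask
    print(dice_coefficient(predicted, reference))
```

In this toy case the two masks share two foreground pixels out of three each, so the score is 2·2/6, or roughly 0.67.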
[8] arXiv:1903.08817 [pdf, other]: Title: Dual Residual Networks Leveraging the Potential of Paired Operations for Image Restoration

Authors: Xing Liu, Masanori Suganuma, Zhun Sun, Takayuki Okatani

Comments: Accepted to CVPR 2019

Subjects: Computer Vision and Pattern Recognition (cs.CV)

In this paper, we study the design of deep neural networks for image restoration tasks. We propose a novel style of residual connections dubbed "dual residual connection", which exploits the potential of paired operations, e.g., up- and down-sampling or convolution with large- and small-size kernels. We design a modular block implementing this connection style; it is equipped with two containers into which arbitrary paired operations are inserted. Adopting the "unraveled" view of the residual networks proposed by Veit et al., we point out that a stack of the proposed modular blocks allows the first operation in a block to interact with the second operation in any subsequent block. Specifying the two operations in each of the stacked blocks, we build a complete network for each individual task of image restoration. We experimentally evaluate the proposed approach on five image restoration tasks using nine datasets. The results show that the proposed networks with properly chosen paired operations outperform previous methods on almost all of the tasks and datasets.
[9] arXiv:1903.08831 [pdf]: Title: Non-target Structural Displacement Measurement Using Reference Frame Based Deepflow

Authors: Jongbin Won, Jong-Woong Park, Do-Soo Moon

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Structural displacement is crucial for structural health monitoring, although it is very challenging to measure in field conditions. Most existing displacement measurement methods are costly, labor intensive, and insufficiently accurate for measuring small dynamic displacements. Computer vision (CV) based methods incorporate optical devices with advanced image processing algorithms to accurately, cost-effectively, and remotely measure structural displacement with easy installation. However, non-target based CV methods are still limited by insufficient feature points, incorrect feature point detection, occlusion, and drift induced by tracking error accumulation. This paper presents a reference frame based Deepflow algorithm integrated with masking and signal filtering for non-target based displacement measurements. The proposed method allows the user to select points of interest for images with a low gradient for displacement tracking and directly calculate displacement without drift accumulated by measurement error. The proposed method is experimentally validated on a cantilevered beam under ambient and occluded test conditions. The accuracy of the proposed method is compared with that of a reference laser displacement sensor for validation. The significant advantage of the proposed method is its flexibility in extracting structural displacement in any region on structures that do not have distinct natural features.
[10] arXiv:1903.08836 [pdf, other]: Title: Towards Robust Curve Text Detection with Conditional Spatial Expansion

Authors: Zichuan Liu, Guosheng Lin, Sheng Yang, Fayao Liu, Weisi Lin, Wang Ling Goh

Comments: This paper has been accepted by IEEE International Conference on Computer Vision and Pattern Recognition (CVPR 2019)

Subjects: Computer Vision and Pattern Recognition (cs.CV)

It is challenging to detect curve texts due to their irregular shapes and varying sizes. In this paper, we first investigate the deficiency of the existing curve detection methods and then propose a novel Conditional Spatial Expansion (CSE) mechanism to improve the performance of curve text detection. Instead of regarding the curve text detection as a polygon regression or a segmentation problem, we treat it as a region expansion process. Our CSE starts with a seed arbitrarily initialized within a text region and progressively merges neighborhood regions based on the extracted local features by a CNN and contextual information of merged regions. The CSE is highly parameterized and can be seamlessly integrated into existing object detection frameworks. Enhanced by the data-dependent CSE mechanism, our curve text detection system provides robust instance-level text region extraction with minimal post-processing. The analysis experiment shows that our CSE can handle texts with various shapes, sizes, and orientations, and can effectively suppress the false positives coming from text-like textures or unexpected texts included in the same RoI. Compared with the existing curve text detection algorithms, our method is more robust and enjoys a simpler processing flow. It also sets a new state-of-the-art performance on curve text benchmarks with an F-score of up to 78.4%.
[11] arXiv:1903.08839 [pdf, other]: Title: Weakly-Supervised Discovery of Geometry-Aware Representation for 3D Human Pose Estimation

Authors: Xipeng Chen, Kwan-Yee Lin, Wentao Liu, Chen Qian, Liang Lin

Comments: Accepted as a CVPR 2019 oral paper

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Recent studies have shown remarkable advances in 3D human pose estimation from monocular images, with the help of large-scale indoor 3D datasets and sophisticated network architectures. However, the generalizability to different environments remains an elusive goal. In this work, we propose a geometry-aware 3D representation for the human pose to address this limitation by using multiple views in a simple auto-encoder model at the training stage and only 2D keypoint information as supervision. A view synthesis framework is proposed to learn the shared 3D representation between viewpoints by synthesizing the human pose from one viewpoint to another. Instead of performing a direct transfer at the raw image level, we propose a skeleton-based encoder-decoder mechanism to distil only pose-related representation in the latent space. A learning-based representation consistency constraint is further introduced to improve the robustness of the latent 3D representation. Since the learnt representation encodes 3D geometry information, mapping it to 3D pose will be much easier than in conventional frameworks that use an image or 2D coordinates as the input of the 3D pose estimator. We demonstrate our approach on the task of 3D human pose estimation. Comprehensive experiments on three popular benchmarks show that our model can significantly improve the performance of state-of-the-art methods by simply injecting the representation as a robust 3D prior.
[12] arXiv:1903.08847 [pdf]: Title: Parametic Classification of Handvein Patterns Based on Texture Features

Authors: Harbi AlMahafzah, Mohammad Imranand, Supreetha Gowda H.D.

Comments: 8 pages, International Conference on Electrical, Electronics, Materials and Applied Science (ICEEMAS). AIP: Proceedings International Conference on Electrical, Electronics, Materials and Applied Science (ICEEMAS), 22nd and 23rd December 2017

Subjects: Computer Vision and Pattern Recognition (cs.CV)

In this paper, we develop a biometric recognition system based on the hand-based modality Handvein, which has a unique pattern for each individual and is impossible to counterfeit or fabricate because it is an internal feature. We use feature extraction algorithms such as LBP (a visual descriptor), LPQ (a blur-insensitive texture operator), and Log-Gabor (a texture descriptor). We use well-known classifiers such as KNN and SVM for classification. We report single-algorithm recognition rates for Handvein under different distance measures and kernel options. Feature-level fusion is then carried out, which increases the performance.
[13] arXiv:1903.08863 [pdf, other]: Title: Learning Disentangled Representations of Satellite Image Time Series

Authors: Eduardo Sanchez (IRIT), Mathieu Serrurier (IRIT), Mathias Ortner

Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

In this paper, we investigate how to learn a suitable representation of satellite image time series in an unsupervised manner by leveraging large amounts of unlabeled data. Additionally, we aim to disentangle the representation of time series into two representations: a shared representation that captures the common information between the images of a time series and an exclusive representation that contains the specific information of each image of the time series. To address these issues, we propose a model that combines a novel component called cross-domain autoencoders with the variational autoencoder (VAE) and generative adversarial network (GAN) methods. In order to learn disentangled representations of time series, our model learns the multimodal image-to-image translation task. We train our model using satellite image time series from the Sentinel-2 mission. Several experiments are carried out to evaluate the obtained representations. We show that these disentangled representations can be very useful to perform multiple tasks such as image classification, image retrieval, image segmentation and change detection.
[14] arXiv:1903.08888 [pdf, other]: Title: Tensor-Ring Nuclear Norm Minimization and Application for Visual Data Completion

Authors: Jinshi Yu, Chao Li, Qibin Zhao, Guoxu Zhou

Comments: This paper has been accepted by ICASSP 2019

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Tensor ring (TR) decomposition has been successfully used to obtain state-of-the-art performance in the visual data completion problem. However, the existing TR-based completion methods are severely non-convex and computationally demanding. In addition, determining the optimal TR rank is difficult in practice. To overcome these drawbacks, we first introduce a class of new tensor nuclear norms by using tensor circular unfolding. Then we theoretically establish a connection between the rank of the circularly-unfolded matrices and the TR ranks. We also develop an efficient tensor completion algorithm by minimizing the proposed tensor nuclear norm. Extensive experimental results demonstrate that our proposed tensor completion method outperforms the conventional tensor completion methods in the image/video in-painting problem with striped missing values.
[15] arXiv:1903.08890 [pdf, other]: Title: Context-Constrained Accurate Contour Extraction for Occlusion Edge Detection

Authors: Rui Lu, Menghan Zhou, Anlong Ming, Yu Zhou

Comments: To appear in ICME 2019

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Occlusion edge detection requires both accurate locations and context constraints of the contour. Existing CNN-based pipelines do not utilize adaptive methods to filter the noise introduced by low-level features. To address this dilemma, we propose a novel Context-constrained accurate Contour Extraction Network (CCENet). Spatial details are retained and contour-sensitive context is augmented through two extraction blocks, respectively. Then, an elaborately designed fusion module integrates the features, playing a complementary role in restoring details and removing clutter. Finally, the weight response of an attention mechanism is used to enhance occluded contours and suppress noise. The proposed CCENet significantly surpasses state-of-the-art methods on the PIOD and BSDS ownership datasets for object edge detection and occlusion orientation detection.
[16] arXiv:1903.08923 [pdf, other]: Title: Learning with Batch-wise Optimal Transport Loss for 3D Shape Recognition

Authors: Lin Xu, Han Sun, Yuai Liu

Comments: 10 pages, 4 figures. Accepted by CVPR 2019

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Deep metric learning is essential for visual recognition. The widely used pair-wise (or triplet) based loss objectives cannot make full use of semantical information in training samples or give enough attention to those hard samples during optimization. Thus, they often suffer from a slow convergence rate and inferior performance. In this paper, we show how to learn an importance-driven distance metric via optimal transport programming from batches of samples. It can automatically emphasize hard examples and lead to significant improvements in convergence. We propose a new batch-wise optimal transport loss and combine it in an end-to-end deep metric learning manner. We use it to learn the distance metric and deep feature representation jointly for recognition. Empirical results on visual retrieval and classification tasks with six benchmark datasets, i.e., MNIST, CIFAR10, SHREC13, SHREC14, ModelNet10, and ModelNet40, demonstrate the superiority of the proposed method. It can accelerate the convergence rate significantly while achieving a state-of-the-art recognition performance. For example, in 3D shape recognition experiments, we show that our method can achieve better recognition performance within only 5 epochs than what can be obtained by mainstream 3D shape recognition approaches after 200 epochs.
[17] arXiv:1903.08943 [pdf, other]: Title: The CASE Dataset of Candidate Spaces for Advert Implantation

Authors: Soumyabrata Dev, Murhaf Hossari, Matthew Nicholson, Killian McCabe, Atul Nautiyal, Clare Conran, Jian Tang, Wei Xu, François Pitié

Journal-ref: Published in Proc. International Conference on Machine Vision Applications (MVA), 2019

Subjects: Computer Vision and Pattern Recognition (cs.CV)

With the advent of faster internet services and growth of multimedia content, we observe a massive growth in the number of online videos. The users generate these video contents at an unprecedented rate, owing to the use of smart-phones and other hand-held video capturing devices. This creates immense potential for the advertising and marketing agencies to create personalized content for the users. In this paper, we attempt to assist the video editors to generate augmented video content, by proposing candidate spaces in video frames. We propose and release a large-scale dataset of outdoor scenes, along with manually annotated maps for candidate spaces. We also benchmark several deep-learning based semantic segmentation algorithms on this proposed dataset.
[18] arXiv:1903.08960 [pdf, other]: Title: Short-Term Prediction and Multi-Camera Fusion on Semantic Grids

Authors: Lukas Hoyer, Patrick Kesper, Volker Fischer

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Robotics (cs.RO)

An environment representation (ER) is a substantial part of every autonomous system. It introduces a common interface between perception and other system components, such as decision making, and allows downstream algorithms to deal with abstracted data without knowledge of the used sensor. In this work, we propose and evaluate a novel architecture that generates an egocentric, grid-based, predictive, and semantically-interpretable ER. In particular, we provide a proof of concept for the spatio-temporal fusion of multiple camera sequences and short-term prediction in such an ER. Our design utilizes a strong semantic segmentation network together with depth and egomotion estimates to first extract semantic information from multiple camera streams and then transform these separately into egocentric temporally-aligned bird's-eye view grids. A deep encoder-decoder network is trained to fuse a stack of these grids into a unified semantic grid representation and to predict the dynamics of its surrounding. We evaluate this representation on real-world sequences of the Cityscapes dataset and show that our architecture can make accurate predictions in complex sensor fusion scenarios and significantly outperforms a model-driven baseline in a category-based evaluation.
[19] arXiv:1903.09021 [pdf, ps, other]: Title: Localization of Unmanned Aerial Vehicles in Corridor Environments using Deep Learning

Authors: Ram Prasad Padhy, Shahzad Ahmad, Sachin Verma, Pankaj Kumar Sa, Sambit Bakshi

Comments: 9 pages, 7 figures

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Vision-based pose estimation of Unmanned Aerial Vehicles (UAV) in unknown environments is a rapidly growing research area in the field of robot vision. The task becomes more complex when the only available sensor is a static single camera (monocular vision). In this regard, we propose a monocular vision assisted localization algorithm that helps a UAV navigate safely in indoor corridor environments. The aim is always to navigate the UAV through a corridor in the forward direction, keeping it at the center with no orientation toward either the left or right side. The algorithm makes use of the RGB image, captured from the UAV front camera, and passes it through a trained deep neural network (DNN) to predict the position of the UAV as being on the left, center, or right side of the corridor. Depending upon the divergence of the UAV with respect to the central bisector line (CBL) of the corridor, a suitable command is generated to bring the UAV to the center. When the UAV is at the center of the corridor, a new image is passed through another trained DNN to predict the orientation of the UAV with respect to the CBL of the corridor. If the UAV is tilted left or right, an appropriate command is generated to rectify the orientation. We also propose a new corridor dataset, named NITRCorrV1, which contains images as captured by the UAV front camera when the UAV is at all possible locations of a variety of corridors. An exhaustive set of experiments in different corridors reveals the efficacy of the proposed algorithm.
[20] arXiv:1903.09036 [pdf, other]: Title: Megapixel Photon-Counting Color Imaging using Quanta Image Sensor

Authors: Abhiram Gnanasambandam, Omar Elgendy, Jiaju Ma, and Stanley H. Chan

Subjects: Computer Vision and Pattern Recognition (cs.CV)

The Quanta Image Sensor (QIS) is a single-photon detector designed for extremely low-light imaging conditions. The majority of existing QIS prototypes are monochrome, based on single-photon avalanche diodes (SPAD). Color imaging has not been demonstrated with single-photon detectors due to the intrinsic difficulty of shrinking the pixel size and increasing the spatial resolution while maintaining acceptable intra-pixel cross-talk. In this paper, we present image reconstruction of the first color QIS with a resolution of $1024 \times 1024$ pixels, supporting both single-bit and multi-bit photon counting capability. Our color image reconstruction is enabled by a customized joint demosaicing-denoising algorithm, leveraging truncated Poisson statistics and variance stabilizing transforms. Experimental results of the new sensor and algorithm demonstrate superior color imaging performance for very low-light conditions with a mean exposure of as low as a few photons per pixel.
[21] arXiv:1903.09067 [pdf, other]: Title: An Efficient Solution to Non-Minimal Case Essential Matrix Estimation

Authors: Ji Zhao

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Finding the relative pose between two calibrated views is a fundamental task in computer vision. Given the minimal number of $5$ required point correspondences, the classical five-point method can be used to calculate the essential matrix. For the non-minimal case, when $N$ ($N > 5$) correct point correspondences are given, which is called the $N$-point problem, methods are relatively less mature. In this paper, we solve the $N$-point problem by minimizing the algebraic error and formulate it as a quadratically constrained quadratic program (QCQP). The formulation is based on a simpler parameterization of the feasible region -- the normalized essential matrix manifold -- than previous approaches. Then a globally optimal solution to this problem is obtained by semidefinite relaxation. This allows us to obtain certifiably global solutions to an important non-convex problem in polynomial time. We provide the condition to recover the optimal essential matrix from the relaxed problems. The theoretical guarantees of the semidefinite relaxation are investigated, including the tightness and local stability. Experiments demonstrate that our approach always finds and certifies (a-posteriori) the global optimum of the cost function, and it is dozens of times faster than state-of-the-art globally optimal solutions.
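
Editor's note: the algebraic error mentioned in this abstract is commonly taken to be the sum of squared epipolar residuals, sum_i (x2_i^T E x1_i)^2, over the point correspondences. The sketch below only evaluates that objective for a given essential matrix; it does not implement the QCQP formulation or the semidefinite relaxation, and it assumes this common definition rather than quoting the paper. The function name `algebraic_error`, the normalized homogeneous coordinates, and the convention x2 ~ R x1 + t are assumptions of this example.

```python
def algebraic_error(E, correspondences):
    """Sum of squared epipolar residuals (x2^T E x1)^2.

    E is a 3x3 matrix as nested lists; each correspondence is a pair
    (x1, x2) of normalized homogeneous image points (3-vectors).
    """
    total = 0.0
    for x1, x2 in correspondences:
        # Epipolar line in the second view: E @ x1.
        Ex1 = [sum(E[r][c] * x1[c] for c in range(3)) for r in range(3)]
        # Scalar residual x2^T (E x1); zero for a perfect correspondence.
        residual = sum(x2[r] * Ex1[r] for r in range(3))
        total += residual ** 2
    return total


if __name__ == "__main__":
    # Hypothetical setup: identity rotation, translation t = (1, 0, 0),
    # so E = [t]_x is the skew-symmetric matrix of t.
    E = [[0.0, 0.0, 0.0],
         [0.0, 0.0, -1.0],
         [0.0, 1.0, 0.0]]
    exact = [((0.0, 0.0, 1.0), (0.2, 0.0, 1.0)),
             ((0.25, 0.5, 1.0), (0.5, 0.5, 1.0))]
    noisy = [((0.25, 0.5, 1.0), (0.5, 0.6, 1.0))]
    print(algebraic_error(E, exact))
    print(algebraic_error(E, noisy))
```

The exact correspondences come from 3D points (0, 0, 5) and (1, 2, 4) under the assumed motion, so their residuals vanish; shifting one observation by 0.1 in y gives a residual of -0.1.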
[22] arXiv:1903.09073 [pdf, other]: Title: Quotienting Impertinent Camera Kinematics for 3D Video Stabilization

Authors: Thomas W. Mitchel, Christian Wuelker, Jin Seob Kim, Sipu Ruan, Gregory S. Chirikjian

Subjects: Computer Vision and Pattern Recognition (cs.CV)

With the recent advent of methods that allow for real-time computation, dense 3D flows have become a viable basis for fast camera motion estimation. Most importantly, dense flows are more robust than the sparse feature matching techniques used by existing 3D stabilization methods, able to better handle large camera displacements and occlusions similar to those often found in consumer videos. Here we introduce a framework for 3D video stabilization that relies on dense scene flow alone. The foundation of this approach is a novel camera motion model that allows for real-world camera poses to be recovered directly from 3D motion fields. Moreover, this model can be extended to describe certain types of non-rigid artifacts that are commonly found in videos, such as those resulting from zooms. This framework gives rise to several robust regimes that produce high-quality stabilization of the kind achieved by prior full 3D methods while avoiding the fragility typically present in feature-based approaches. As an added benefit, our framework is fast: the simplicity of our motion model and efficient flow calculations combine to enable stabilization at a high frame rate.
[23] arXiv:1903.09107 [pdf, other]: Title: Levelling the Playing Field: A Comprehensive Comparison of Visual Place Recognition Approaches under Changing Conditions

Authors: Mubariz Zaffar, Ahmad Khaliq, Shoaib Ehsan, Michael Milford, Klaus McDonald-Maier

Comments: 8 pages, 11 figures

Subjects: Computer Vision and Pattern Recognition (cs.CV)

In recent years there has been significant improvement in the capability of Visual Place Recognition (VPR) methods, building on the success of both hand-crafted and learnt visual features, temporal filtering and usage of semantic scene information. The wide range of approaches and the relatively recent growth in interest in the field have meant that a wide range of datasets and assessment methodologies have been proposed, often with a focus only on precision-recall type metrics, making comparison difficult. In this paper we present a comprehensive approach to evaluating the performance of 10 state-of-the-art recently-developed VPR techniques, which utilizes three standardized metrics: (a) Matching Performance, (b) Matching Time and (c) Memory Footprint. Together this analysis provides an up-to-date and widely encompassing snapshot of the various strengths and weaknesses of contemporary approaches to the VPR problem. The aim of this work is to help move this particular research field towards a more mature and unified approach to the problem, enabling better comparison and hence more progress to be made in future research.
[24] arXiv:1903.09115 [pdf, other]: Title: Closed-Form Optimal Triangulation Based on Angular Errors

Authors: Seong Hun Lee, Javier Civera

Subjects: Computer Vision and Pattern Recognition (cs.CV)

In this paper, we study closed-form optimal solutions to two-view triangulation with known internal calibration and pose. By formulating the triangulation problem as $L_1$ and $L_\infty$ minimization of angular reprojection errors, we derive the exact closed-form solutions that guarantee global optimality under respective cost functions. To the best of our knowledge, we are the first to present such solutions. Since the angular error is rotationally invariant, our solutions can be applied for any type of central cameras, be it perspective, fisheye or omnidirectional. Our methods also require significantly less computation than the existing optimal methods. Experimental results on synthetic and real datasets validate our theoretical derivations.
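As a hedged illustration of the cost functions named in the abstract above, the sketch below computes the angular reprojection error of a candidate 3D point in each view and aggregates it as an $L_1$ (sum) and an $L_\infty$ (max) cost. It does not implement the paper's closed-form solutions. The function names are hypothetical, and bearing vectors are assumed to be already rotated into the world frame.

```python
import math

def angle_between(u, v):
    """Angle in radians between two 3D vectors, via atan2 for stability."""
    cross = (u[1] * v[2] - u[2] * v[1],
             u[2] * v[0] - u[0] * v[2],
             u[0] * v[1] - u[1] * v[0])
    cross_norm = math.sqrt(sum(c * c for c in cross))
    dot = sum(a * b for a, b in zip(u, v))
    return math.atan2(cross_norm, dot)

def angular_costs(point, centers, bearings):
    """Return (L1 cost, L-infinity cost) of angular reprojection errors.

    point    -- candidate 3D point in world coordinates
    centers  -- camera centers in world coordinates
    bearings -- observed ray directions, assumed expressed in the world frame
    """
    errors = []
    for center, bearing in zip(centers, bearings):
        ray_to_point = [p - c for p, c in zip(point, center)]
        errors.append(angle_between(bearing, ray_to_point))
    return sum(errors), max(errors)

# Two-view example (assumed data): the second bearing is deliberately off.
centers = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
point = (0.0, 0.0, 5.0)
exact = angular_costs(point, centers, [(0.0, 0.0, 1.0), (-1.0, 0.0, 5.0)])
noisy = angular_costs(point, centers, [(0.0, 0.0, 1.0), (0.0, 0.0, 1.0)])
```

Because the error depends only on the angle between rays, the same evaluation applies to perspective, fisheye or omnidirectional cameras, provided each pixel can be mapped to a bearing vector.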
[25] arXiv:1903.09123 [pdf, other]: Title: PProCRC: Probabilistic Collaboration of Image Patches

Authors: Tapabrata Chakraborti, Brendan McCane, Steven Mills, Umapada Pal

Comments: Submitted to IEEE Trans. Pattern Analysis and Machine Intelligence

Subjects: Computer Vision and Pattern Recognition (cs.CV)

We present a conditional probabilistic framework for collaborative representation of image patches. It incorporates background compensation and outlier patch suppression into the main formulation itself, thus doing away with the need for pre-processing steps to handle the same. A closed form non-iterative solution of the cost function is derived. The proposed method (PProCRC) outperforms earlier related patch based (PCRC, GP-CRC) as well as the state-of-the-art probabilistic (ProCRC and EProCRC) models on several fine-grained benchmark image datasets for face recognition (AR and LFW) and species recognition (Oxford Flowers and Pets) tasks. We also expand our recent endemic Indian birds (IndBirds) dataset and report results on it. The demo code and IndBirds dataset are available through lead author.
[26] arXiv:1903.09126 [pdf, other]: Title: Progressive Sparse Local Attention for Video object detection

Authors: Chaoxu Guo, Bin Fan, Jie Gu, Qian Zhang, Shiming Xiang, Veronique Prinet, Chunhong Pan

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Transferring image-based object detectors to the video domain remains a challenging problem. Previous efforts mostly exploit optical flow to propagate features across frames, aiming to achieve a good trade-off between performance and computational complexity. However, introducing an extra model to estimate optical flow significantly increases the overall model size. Moreover, the gap between optical flow and high-level features can prevent the spatial correspondence from being established accurately. Instead of relying on optical flow, this paper proposes a novel module called Progressive Sparse Local Attention (PSLA), which establishes the spatial correspondence between features across frames in a local region with progressive sparse strides and uses the correspondence to propagate features. Based on PSLA, Recursive Feature Updating (RFU) and Dense feature Transforming (DFT) are introduced to model temporal appearance and enrich feature representation respectively. Finally, a novel framework for video object detection is proposed. Experiments on ImageNet VID are conducted. Our framework achieves a state-of-the-art speed-accuracy trade-off with significantly reduced model capacity.

Cross-lists for Fri, 22 Mar 19

[27] arXiv:1903.08671 (cross-list from cs.LG) [pdf, other]: Title: Online continual learning with no task boundaries

Authors: Rahaf Aljundi, Min Lin, Baptiste Goujaud, Yoshua Bengio

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)

Continual learning is the ability of an agent to learn online with a non-stationary and never-ending stream of data. A key component for such never-ending learning process is to overcome the catastrophic forgetting of previously seen data, a problem that neural networks are well known to suffer from. The solutions developed so far often relax the problem of continual learning to the easier task-incremental setting, where the stream of data is divided into tasks with clear boundaries. In this paper, we break the limits and move to the more challenging online setting where we assume no information of tasks in the data stream. We start from the idea that each learning step should not increase the losses of the previously learned examples through constraining the optimization process. This means that the number of constraints grows linearly with the number of examples, which is a serious limitation. We develop a solution to select a fixed number of constraints that we use to approximate the feasible region defined by the original constraints. We compare our approach against the methods that rely on task boundaries to select a fixed set of examples, and show comparable or even better results, especially when the boundaries are blurry or when the data distributions are imbalanced.
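The constraint idea in the abstract above (an update should not increase the loss on previously seen examples) can be illustrated with a single-constraint gradient projection. This is a simplified stand-in in the spirit of averaged gradient episodic memory, not the constraint-selection method the paper proposes. The function name and the toy gradients are assumptions.

```python
def project_gradient(grad, ref_grad):
    """Project grad so that it does not conflict with ref_grad.

    grad     -- gradient of the loss on the current example(s)
    ref_grad -- gradient of the loss on stored past examples (assumed given)

    If the two gradients have a negative inner product, a step along -grad
    would, to first order, increase the loss on the past examples. In that
    case the conflicting component is removed; otherwise grad is unchanged.
    """
    dot = sum(g * r for g, r in zip(grad, ref_grad))
    if dot >= 0.0:
        return list(grad)
    ref_sq = sum(r * r for r in ref_grad)
    scale = dot / ref_sq
    return [g - scale * r for g, r in zip(grad, ref_grad)]

# Toy gradients (assumed): the first pair conflicts, the second does not.
conflicting = project_gradient([1.0, -1.0], [0.0, 1.0])
compatible = project_gradient([1.0, 1.0], [0.0, 1.0])
```

With one constraint per stored example, the number of such projections would grow with the stream, which is the limitation the abstract describes. Selecting a fixed subset of constraints keeps the cost bounded.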
[28] arXiv:1903.08689 (cross-list from cs.LG) [pdf, other]: Title: Implicit Generation and Generalization in Energy-Based Models

Authors: Yilun Du, Igor Mordatch

Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)

Energy based models (EBMs) are appealing due to their generality and simplicity in likelihood modeling, but have been traditionally difficult to train. We present techniques to scale MCMC based EBM training, on continuous neural networks, and show its success on the high-dimensional data domains of ImageNet32x32, ImageNet128x128, CIFAR-10, and robotic hand trajectories, achieving significantly better samples than other likelihood models and on par with contemporary GAN approaches, while covering all modes of the data. We highlight unique capabilities of implicit generation, such as energy compositionality and corrupt image reconstruction and completion. Finally, we show that EBMs generalize well and are able to achieve state-of-the-art out-of-distribution classification, exhibit adversarially robust classification, coherent long term predicted trajectory roll-outs, and generate zero-shot compositions of models.
[29] arXiv:1903.08858 (cross-list from cs.LG) [pdf, other]: Title: Classification of EEG-Based Brain Connectivity Networks in Schizophrenia Using a Multi-Domain Connectome Convolutional Neural Network

Authors: Chun-Ren Phang, Chee-Ming Ting, Fuad Noman, Hernando Ombao

Comments: 15 pages, 9 figures

Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Neurons and Cognition (q-bio.NC); Machine Learning (stat.ML)

We exploit altered patterns in brain functional connectivity as features for automatic discriminative analysis of neuropsychiatric patients. Deep learning methods have been introduced to functional network classification only very recently for fMRI, and the proposed architectures essentially focused on a single type of connectivity measure. We propose a deep convolutional neural network (CNN) framework for classification of electroencephalogram (EEG)-derived brain connectome in schizophrenia (SZ). To capture complementary aspects of disrupted connectivity in SZ, we explore combination of various connectivity features consisting of time and frequency-domain metrics of effective connectivity based on vector autoregressive model and partial directed coherence, and complex network measures of network topology. We design a novel multi-domain connectome CNN (MDC-CNN) based on a parallel ensemble of 1D and 2D CNNs to integrate the features from various domains and dimensions using different fusion strategies. Hierarchical latent representations learned by the multiple convolutional layers from EEG connectivity reveal apparent group differences between SZ and healthy controls (HC). Results on a large resting-state EEG dataset show that the proposed CNNs significantly outperform traditional support vector machine classifiers. The MDC-CNN with combined connectivity features further improves performance over single-domain CNNs using individual features, achieving remarkable accuracy of $93.06\%$ with a decision-level fusion. The proposed MDC-CNN by integrating information from diverse brain connectivity descriptors is able to accurately discriminate SZ from HC. The new framework is potentially useful for developing diagnostic tools for SZ and other disorders.
[30] arXiv:1903.08871 (cross-list from stat.ML) [pdf, other]: Title: Individualized Multilayer Tensor Learning with An Application in Imaging Analysis

Authors: Xiwei Tang, Xuan Bi, Annie Qu

Subjects: Machine Learning (stat.ML); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Image and Video Processing (eess.IV)

This work is motivated by multimodality breast cancer imaging data, which is quite challenging in that the signals of discrete tumor-associated microvesicles (TMVs) are randomly distributed with heterogeneous patterns. This imposes a significant challenge for conventional imaging regression and dimension reduction models that assume a homogeneous feature structure. We develop an innovative multilayer tensor learning method that incorporates heterogeneity into a higher-order tensor decomposition and predicts disease status effectively by utilizing subject-wise imaging features and multimodality information. Specifically, we construct a multilayer decomposition which leverages an individualized imaging layer in addition to a modality-specific tensor structure. One major advantage of our approach is that we are able to efficiently capture the heterogeneous spatial features of signals that are not characterized by a population structure while simultaneously integrating multimodality information. To achieve scalable computing, we develop a new bi-level block improvement algorithm. In theory, we investigate the algorithm convergence property, the tensor signal recovery error bound and the asymptotic consistency for prediction model estimation. We also apply the proposed method to simulated and human breast cancer imaging data. Numerical results demonstrate that the proposed method outperforms other existing competing methods.

Replacements for Fri, 22 Mar 19

[31] arXiv:1802.03518 (replaced) [pdf, other]: Title: Hydra: an Ensemble of Convolutional Neural Networks for Geospatial Land Classification

Authors: Rodrigo Minetto, Mauricio Pamplona Segundo, Sudeep Sarkar

Comments: 12 pages, 14 figures, 5 tables

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[32] arXiv:1803.09420 (replaced) [pdf, other]: Title: Multi-scale Processing of Noisy Images using Edge Preservation Losses

Authors: Nati Ofir, Yosi Keller

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[33] arXiv:1808.00739 (replaced) [pdf, other]: Title: Deeply Self-Supervised Contour Embedded Neural Network Applied to Liver Segmentation

Authors: Minyoung Chung, Jingyu Lee, Minkyung Lee, Jeongjin Lee, Yeong-Gil Shin

Comments: 10 pages, 8 figures

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[34] arXiv:1810.05680 (replaced) [pdf, other]: Title: Bottom-up Attention, Models of

Authors: Ali Borji, Hamed R. Tavakoli, Zoya Bylinskii

Comments: arXiv admin note: substantial text overlap with arXiv:1810.03716

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[35] arXiv:1811.03529 (replaced) [pdf, other]: Title: Memorable Maps: A Framework for Re-defining Places in Visual Place Recognition

Authors: Mubariz Zaffar, Shoaib Ehsan, Michael Milford, Klaus McDonald-Maier

Comments: 13 pages, 25 figures, 1 table

Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
[36] arXiv:1811.11286 (replaced) [pdf, other]: Title: Patch-based Progressive 3D Point Set Upsampling

Authors: Wang Yifan, Shihao Wu, Hui Huang, Daniel Cohen-Or, Olga Sorkine-Hornung

Comments: accepted to cvpr2019, code available at this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Machine Learning (cs.LG)
[37] arXiv:1812.00235 (replaced) [pdf, other]: Title: Learning to Caption Images through a Lifetime by Asking Questions

Authors: Kevin Shen, Amlan Kar, Sanja Fidler

Comments: Fixed typos and added contribution list in intro, results remain the same

Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)
[38] arXiv:1901.09886 (replaced) [pdf, other]: Title: CoCoNet: A Collaborative Convolutional Network

Authors: Tapabrata Chakraborti, Brendan McCane, Steven Mills, Umapada Pal

Comments: Submitted to Machine Vision and Applications

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[39] arXiv:1902.09878 (replaced) [pdf]: Title: MC-ISTA-Net: Adaptive Measurement and Initialization and Channel Attention Optimization inspired Neural Network for Compressive Sensing

Authors: Nanyu Li, Cuiyin Liu

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[40] arXiv:1903.01212 (replaced) [pdf]: Title: Unsupervised Domain Adaptation Learning Algorithm for RGB-D Staircase Recognition

Authors: Jing Wang, Kuangen Zhang

Comments: 7 pages, 5 figures, 17 references

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[41] arXiv:1903.03691 (replaced) [pdf, other]: Title: RoPAD: Robust Presentation Attack Detection through Unsupervised Adversarial Invariance

Authors: Ayush Jaiswal, Shuai Xia, Iacopo Masi, Wael AbdAlmageed

Comments: To appear in Proceedings of International Conference on Biometrics (ICB), 2019

Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[42] arXiv:1903.05761 (replaced) [pdf]: Title: LPM: Learnable Pooling Module for Efficient Full-Face Gaze Estimation

Authors: Reo Ogusu, Takao Yamanaka

Comments: FG2019

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[43] arXiv:1903.06846 (replaced) [pdf]: Title: Directional PointNet: 3D Environmental Classification for Wearable Robotics

Authors: Kuangen Zhang, Jing Wang, Chenglong Fu

Subjects: Computer Vision and Pattern Recognition (cs.CV)
[44] arXiv:1811.00401 (replaced) [pdf, other]: Title: Excessive Invariance Causes Adversarial Vulnerability

Authors: Jörn-Henrik Jacobsen, Jens Behrmann, Richard Zemel, Matthias Bethge

Journal-ref: Proceedings of the 7th International Conference on Learning Representations (ICLR), 2019

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
