1 Biometrics Based on Hand Synergies and Their Neural Representations
Biometric systems can identify individuals based on their unique characteristics. A new biometric based on hand synergies and their neural representations is proposed here. In this paper, ten subjects were asked to perform six hand grasps that are shared by most common activities of daily living. Their scalp electroencephalographic (EEG) signals were recorded using 32 scalp electrodes, of which 18 task-relevant electrodes were used in feature extraction. In our previous work, we found that hand kinematic synergies, or movement primitives, can be a potential biometric. In this paper, we combined the hand kinematic synergies and their neural representations to provide a unique signature for an individual as a biometric. Neural representations of hand synergies were encoded in spectral coherence of optimal EEG electrodes in the motor and parietal areas. An equal error rate of 7.5% was obtained at the system’s best configuration. Also, it was observed that the best performance was obtained when movement-specific EEG signals in gamma frequencies (30–50 Hz) were used as features. The implications of these first results, improvements, and their applications in the near future are discussed.
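The equal error rate quoted above is the operating point at which the false accept rate and the false reject rate coincide. The following is a minimal sketch of how such a figure can be computed from matcher scores. The function name `equal_error_rate` and the toy scores are hypothetical and are not taken from the paper; similarity scores are assumed, so higher means more likely genuine.

```python
def equal_error_rate(genuine, impostor):
    """Return (eer, threshold) for similarity scores (higher = more genuine).

    Hypothetical helper for illustration: scans every observed score as a
    candidate threshold and keeps the one where FAR and FRR are closest.
    """
    best = None
    for t in sorted(set(genuine) | set(impostor)):
        far = sum(s >= t for s in impostor) / len(impostor)  # impostors accepted
        frr = sum(s < t for s in genuine) / len(genuine)     # genuine users rejected
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


if __name__ == "__main__":
    # Toy scores, invented purely for the example.
    genuine_scores = [0.9, 0.8, 0.7, 0.4]
    impostor_scores = [0.1, 0.2, 0.3, 0.5]
    eer, threshold = equal_error_rate(genuine_scores, impostor_scores)
    print(f"EER = {eer:.2%} at threshold {threshold}")
```

With finite score sets the two error rates rarely match exactly, so the sketch reports their average at the closest threshold; interpolating between thresholds is a common alternative.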
2 An Anomaly Detection Approach to Face Spoofing Detection: A New Formulation and Evaluation Protocol
Face spoofing detection is commonly formulated as a two-class recognition problem where relevant features of both positive (real access) and negative samples (spoofing attempts) are utilized to train the system. However, the diversity of spoofing attacks, any new means of spoofing that attackers may invent (previously unseen by the system), the problem of imaging sensor interoperability, and other environmental factors, in addition to the small sample size, make the problem quite challenging. Considering these observations, in this paper, a number of proposals concerning the evaluation scenario, problem formulation, and solution are presented. First of all, a new evaluation protocol to study the effect of occurrence of unseen attack types, where the train and test data are produced by different means, is proposed. The new evaluation protocol better reflects the realistic conditions in spoofing attempts where an attacker may come up with new means for spoofing. Inter-database and intra-database experiments are incorporated into the evaluation scheme to account for the sensor interoperability problem. Second, a new and more realistic formulation of the spoofing detection problem based on the anomaly detection concept is proposed where the training data come from the positive class only. The test data, of course, may come from the positive or negative class. Such a one-class formulation circumvents the need for the availability of negative training samples, which, in an ideal case, should be representative of all possible spoofing types. Finally, a thorough evaluation and comparison of 20 different one-class and two-class systems on the video sequences of three widely employed databases is performed to investigate the merits of the one-class anomaly detection approaches compared with the common two-class formulations. It is demonstrated that the anomaly-based formulation is not inferior as compared with the conventional two-class approach.
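The one-class formulation described above can be illustrated with a deliberately simple detector that is fitted on real-access samples only. This is a sketch under stated assumptions, not one of the 20 systems evaluated in the paper: the per-feature Gaussian model, the function names `fit_one_class` and `is_spoof`, the 0.95 quantile, and the toy feature vectors are all hypothetical.

```python
from statistics import mean, pstdev


def fit_one_class(real_samples, quantile=0.95):
    """Fit a per-feature Gaussian model on real-access samples only.

    Returns (score, threshold): score(x) is the mean squared z-score of x,
    and threshold is taken from the training scores at the given quantile.
    No spoofing samples are needed at training time.
    """
    dims = list(zip(*real_samples))
    mu = [mean(d) for d in dims]
    sigma = [pstdev(d) or 1.0 for d in dims]  # guard against zero spread

    def score(x):
        return sum(((v - m) / s) ** 2 for v, m, s in zip(x, mu, sigma)) / len(mu)

    train_scores = sorted(score(x) for x in real_samples)
    idx = min(len(train_scores) - 1, int(quantile * len(train_scores)))
    return score, train_scores[idx]


def is_spoof(x, score, threshold):
    """Flag a test sample as anomalous (possible spoof) if it scores above the threshold."""
    return score(x) > threshold


if __name__ == "__main__":
    # Toy two-dimensional features, invented for the example.
    real = [[1.0, 2.0], [1.1, 1.9], [0.9, 2.1], [1.0, 2.0]]
    score, threshold = fit_one_class(real)
    print(is_spoof([1.0, 2.0], score, threshold))  # typical real access
    print(is_spoof([3.0, 0.0], score, threshold))  # far from the training data
```

Because the threshold is derived from positive samples alone, an attack type never seen during training is still rejected as long as its features fall outside the region occupied by real accesses.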
3 Evaluation on Step Counting Performance of Wristband Activity Monitors in Daily Living Environment
Wristband-placed physical activity monitors, as a convenient means for counting walking steps, assessing movement, and estimating energy expenditure, are widely used in daily life. There are many consumer-based wristband monitors on the market, but there is no unified method to compare their performance. In this paper, we designed a series of experiments testing step counting performance under different walking conditions to evaluate these wristband activity monitors. Seven popular brands, including Huawei B1, Mi Band, Fitbit Charge, Polar Loop, Garmin Vivofit2, Misfit Shine, and Jawbone Up, were selected and evaluated with the proposed experiment method. These experiments include four parts: walking in a field at different walking speeds with and without arm swing, walking along a specified complex path, walking on a treadmill, and walking up and down stairs. Experimental results and analysis with nine healthy subjects are reported to show the step counting performance of these seven monitors.
4 A Code-Level Approach to Heterogeneous Iris Recognition
Matching heterogeneous iris images in less constrained applications of iris biometrics is becoming a challenging task. The existing solutions try to reduce the difference between heterogeneous iris images in pixel intensities or filtered features. In contrast, this paper proposes a code-level approach to heterogeneous iris recognition. The non-linear relationship between binary feature codes of heterogeneous iris images is modeled by an adapted Markov network. This model transforms a number of iris templates in the probe into a homogeneous iris template corresponding to the gallery sample. In addition, a weight map on the reliability of binary codes in the iris template can be derived from the model. The learnt iris template and weight map are jointly used in building a robust iris matcher against the variations of imaging sensors, capturing distance, and subject conditions. Extensive experimental results of matching cross-sensor, high-resolution versus low-resolution, and clear versus blurred iris images demonstrate that the code-level approach can achieve the highest accuracy compared with the existing pixel-level, feature-level, and score-level solutions.
5 Low-Rank and Joint Sparse Representations for Multi-Modal Recognition
We propose multi-task and multivariate methods for multi-modal recognition based on low-rank and joint sparse representations. Our formulations can be viewed as generalized versions of multivariate low-rank and sparse regression, where sparse and low-rank representations across all modalities are imposed. One of our methods simultaneously couples information within different modalities by enforcing the common low-rank and joint sparse constraints among multi-modal observations. We also modify our formulations by including an occlusion term that is assumed to be sparse. The alternating direction method of multipliers is proposed to efficiently solve the resulting optimization problems. Extensive experiments on three publicly available multi-modal biometrics and object recognition data sets show that our methods compare favourably with other feature-level fusion methods.
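In ADMM solvers for sparse models, the sparsity-inducing terms are typically handled by closed-form shrinkage steps. The sketch below shows two standard ones: elementwise soft-thresholding (the proximal operator of the l1 norm, suitable for a sparse occlusion term) and row-wise shrinkage (the proximal operator of the sum of row l2 norms, which zeroes entire rows and so yields a support shared across modalities). It does not reproduce the paper's update equations; the function names and the layout assumption that each row holds one dictionary atom's coefficients across modalities are hypothetical.

```python
import math


def soft_threshold(v, t):
    """Elementwise soft-thresholding: shrink each entry toward zero by t."""
    return [math.copysign(max(abs(x) - t, 0.0), x) for x in v]


def row_shrink(rows, t):
    """Row-wise shrinkage: scale each row by max(1 - t / ||row||_2, 0).

    Rows whose l2 norm is at most t are set to zero as a whole, so every
    modality (column) ends up using the same set of active rows.
    """
    out = []
    for row in rows:
        norm = math.sqrt(sum(x * x for x in row))
        scale = max(1.0 - t / norm, 0.0) if norm > 0 else 0.0
        out.append([scale * x for x in row])
    return out


if __name__ == "__main__":
    print(soft_threshold([3.0, -1.0, 0.5], 1.0))
    # First row is strong in both modalities and survives; second row is removed.
    print(row_shrink([[3.0, 4.0], [0.3, 0.4]], 1.0))
```

A low-rank constraint is usually handled analogously by applying soft-thresholding to the singular values of a matrix, which requires an SVD routine and is omitted here.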
6 Heart ID: A Multiresolution Convolutional Neural Network for ECG-Based Biometric Human Identification in Smart Health Applications
Body area networks, including smart sensors, are widely reshaping health applications in the new era of smart cities. To meet increasing security and privacy requirements, physiological signal-based biometric human identification is gaining tremendous attention. This paper focuses on two major impediments: the signal processing technique is usually both complicated and data-dependent, and the feature engineering is time-consuming and can fit only specific datasets. To enable a data-independent and highly generalizable signal processing and feature learning process, a novel wavelet domain multiresolution convolutional neural network is proposed. Specifically, it allows for blindly selecting a physiological signal segment for identification purposes, avoiding the complicated signal fiducial characteristics extraction process. To enrich the data representation, the randomly chosen signal segment is then transformed to the wavelet domain, where multiresolution time-frequency representation is achieved. An auto-correlation operation is applied to the transformed data to remove the phase difference that results from the blind segmentation operation. Afterward, a multiresolution 1-D-convolutional neural network (1-D-CNN) is introduced to automatically learn the intrinsic hierarchical features from the wavelet domain raw data without data-dependent and heavy feature engineering, and perform the user identification task. The effectiveness of the proposed algorithm is thoroughly evaluated on eight electrocardiogram datasets with diverse behaviors, such as with or without severe heart diseases, and with different sensor placement methods. Our evaluation is much more extensive than the state-of-the-art works, and an average identification rate of 93.5% is achieved. The proposed multiresolution 1-D-CNN algorithm can effectively identify human subjects, even from randomly selected signal segments and without heavy feature engineering. This paper is expected to demonstrate the feasibility and effectiveness of applying the blind signal processing and deep learning techniques to biometric human identification, to enable a low algorithm engineering effort and also a high generalization ability.
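The reason auto-correlation helps after blind segmentation is that it discards where a pattern starts within the segment. The sketch below demonstrates this for the idealized case of a circular shift: two segments that differ only by a shift have identical circular auto-correlations. Treating the offset as a circular shift, the function name, and the toy samples are assumptions made for illustration; the paper applies the operation to wavelet-domain data.

```python
def circular_autocorrelation(x):
    """Circular auto-correlation: r[lag] = sum_i x[i] * x[(i + lag) mod n]."""
    n = len(x)
    return [sum(x[i] * x[(i + lag) % n] for i in range(n)) for lag in range(n)]


if __name__ == "__main__":
    # Toy "beat" and the same beat cut two samples later (circularly shifted).
    segment = [1, 2, 3, 4, 0, 0]
    shifted = segment[2:] + segment[:2]
    print(circular_autocorrelation(segment))
    print(circular_autocorrelation(shifted))  # same values: the offset is gone
```

For real, non-periodic segments the invariance is only approximate, which is one reason a learned model is still needed on top of the auto-correlated representation.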
7 Palmprint Recognition Based on Complete Direction Representation
Direction information serves as one of the most important features for palmprint recognition. In the past decade, many effective direction representation (DR)-based methods have been proposed and have achieved promising recognition performance. However, due to an incomplete understanding of DR, these methods only extract DR in one direction level and one scale. Hence, they do not fully exploit the potential of DR. In addition, most researchers only focused on DR extraction in the spatial coding domain, and rarely considered methods in the frequency domain. In this paper, we propose a general framework for DR-based methods named complete DR (CDR), which reveals DR in a comprehensive and complete way. Different from traditional methods, CDR emphasizes the use of direction information with strategies of multi-scale, multi-direction level, multi-region, as well as feature selection or learning. This way, CDR subsumes previous methods as special cases. Moreover, thanks to its new insight, CDR can guide the design of new DR-based methods toward better performance. Motivated by this, we propose a novel palmprint recognition algorithm in the frequency domain. First, we extract CDR using a multi-scale modified finite Radon transform. Then, an effective correlation filter, namely, band-limited phase-only correlation, is explored for pattern matching. To remove feature redundancy, the sequential forward selection method is used to select a small number of CDR images. Finally, the matching scores obtained from different selected features are integrated using score-level fusion. Experiments demonstrate that our method can achieve better recognition accuracy than the other state-of-the-art methods. More importantly, it has fast matching speed, making it quite suitable for large-scale identification applications.
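Band-limited phase-only correlation, the matcher named above, keeps only the phase of the cross spectrum within a chosen frequency band and looks for a sharp peak after the inverse transform. The following is a one-dimensional sketch with a naive DFT so that it stays dependency-free; the paper works on two-dimensional CDR images, and the function names, the `band` parameter, and the toy signal are assumptions made for illustration.

```python
import cmath


def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform (adequate for a short example)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n) for t in range(n))
           for k in range(n)]
    return [v / n for v in out] if inverse else out


def phase_only_correlation(f, g, band=None, eps=1e-12):
    """Return the (band-limited) phase-only correlation of f and g.

    The cross spectrum G * conj(F) is normalized to unit magnitude so that
    only phase remains; bins farther than `band` from DC are zeroed.
    """
    F, G = dft(f), dft(g)
    n = len(f)
    R = []
    for k in range(n):
        cross = G[k] * F[k].conjugate()
        in_band = band is None or min(k, n - k) <= band
        R.append(cross / abs(cross) if in_band and abs(cross) > eps else 0)
    return [v.real for v in dft(R, inverse=True)]


if __name__ == "__main__":
    f = [1, 3, 2, 5, 4, 0, 7, 6]
    g = [f[(i - 3) % len(f)] for i in range(len(f))]  # f circularly shifted by 3
    poc = phase_only_correlation(f, g, band=2)
    peak = max(range(len(poc)), key=lambda i: poc[i])
    print(peak, round(poc[peak], 3))  # the peak index recovers the shift
```

For matching signals the peak is sharp and its height serves as the similarity score, while unrelated signals give a flat response. Limiting the band discards high-frequency bins that tend to carry little reliable information.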
8 Authentication of Swipe Fingerprint Scanners
Swipe fingerprint scanners (sensors) can be distinguished based on their scanner pattern, an intrinsic characteristic that is sufficiently unique, persistent, and unalterable even among scanners of the same technology, manufacturer, and model. We propose a method to extract the scanner pattern from a single image acquired by a widely-used capacitive swipe fingerprint scanner and compare it with a similarly extracted pattern from another image acquired by the same or by another scanner. The method is extremely simple and computationally efficient as it is based on moving-average filtering, yet it is very accurate and achieves an equal error rate below 0.1% for 27 swipe fingerprint scanners of exactly the same model. We also show the receiver operating characteristic for different decision thresholds of two modes of the method. The method can enhance the security of a biometric system by detecting an attack on the scanner in which an image containing the fingerprint pattern of the legitimate user and acquired by the authentic fingerprint scanner has been replaced by another image that may still contain the fingerprint pattern of the legitimate user but has been acquired by another, unauthentic fingerprint scanner, i.e., for scanner authentication.
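One way to picture how moving-average filtering can expose a device-specific pattern is to subtract a smoothed copy of a signal from the signal itself and compare the residuals of two acquisitions. The abstract does not give the actual extraction or comparison steps, so everything below is a hypothetical illustration: one-dimensional signals stand in for image lines, and the window size, function names, and toy values are assumptions.

```python
def moving_average(x, w=3):
    """Centered moving average; windows are truncated at the borders."""
    half = w // 2
    out = []
    for i in range(len(x)):
        window = x[max(0, i - half): i + half + 1]
        out.append(sum(window) / len(window))
    return out


def residual(x, w=3):
    """Signal minus its moving average: keeps the fine, fast-varying component."""
    return [a - b for a, b in zip(x, moving_average(x, w))]


def correlation(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    num = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    den = (sum((p - ma) ** 2 for p in a) * sum((q - mb) ** 2 for q in b)) ** 0.5
    return num / den if den else 0.0


if __name__ == "__main__":
    # Invented fixed pattern of one device, added to two different brightness levels.
    pattern = [0.5, -0.5, 0.5, -0.5, 0.5, -0.5, 0.5, -0.5]
    line_a = [10 + p for p in pattern]
    line_b = [50 + p for p in pattern]
    print(correlation(residual(line_a), residual(line_b)))  # close to 1
```

A decision would then compare the correlation against a threshold, and sweeping that threshold traces out the kind of receiver operating characteristic mentioned above.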
9 Soft Biometrics: Globally Coherent Solutions for Hair Segmentation and Style Recognition Based on Hierarchical MRFs
Markov Random Fields (MRFs) are a popular tool in many computer vision problems and faithfully model a broad range of local dependencies. However, rooted in the Hammersley-Clifford theorem, they face serious difficulties in enforcing the global coherence of the solutions without using too high order cliques that reduce the computational effectiveness of the inference phase. Having this problem in mind, we describe a multi-layered (hierarchical) architecture for MRFs that is based exclusively on pairwise connections and typically produces globally coherent solutions, with 1) one layer working at the local (pixel) level, modeling the interactions between adjacent image patches; and 2) a complementary layer working at the object (hypothesis) level pushing toward globally consistent solutions. During optimization, both layers interact until they reach an equilibrium state that not only segments the data, but also classifies it. The proposed MRF architecture is particularly suitable for problems that deal with biological data (e.g., biometrics), where the reasonability of the solutions can be objectively measured. As a test case, we considered the problem of hair / facial hair segmentation and labeling, which are soft biometric labels useful for human recognition in-the-wild. We observed performance levels close to the state-of-the-art at a much lower computational cost, both in the segmentation and classification (labeling) tasks.
10 Facial biometrics and applications
Faces carry a lot of information to distinguish different individuals. In this context, biometrics-based verification systems play a major role in terms of recognizing (or confirming) an individual's identity, relying on physiological and/or behavioral characteristics among a set of individual biometric traits. In particular, facial recognition is important because it has a relatively low cost (i.e., it can be carried out using standard cameras) and is one of the least intrusive biometric modalities available, since it does not require physical contact like fingerprint recognition or retina scanning.
11 Robust Face Recognition With Kernelized Locality-Sensitive Group Sparsity Representation
In this paper, a novel joint sparse representation method is proposed for robust face recognition. We embed both group sparsity and kernelized locality-sensitive constraints into the framework of sparse representation. The group sparsity constraint is designed to utilize the grouped structure information in the training data. The local similarity between test and training data is measured in the kernel space instead of the Euclidean space. As a result, the embedded nonlinear information can be effectively captured, leading to a more discriminative representation. We show that, by integrating the kernelized locality-sensitive constraint and the group sparsity constraint, the embedded structure information can be better explored, and significant performance improvement can be achieved. On the one hand, experiments on the ORL, AR, extended Yale B, and LFW data sets verify the superiority of our method. On the other hand, experiments on two unconstrained data sets, the LFW and the IJB-A, show that the utilization of sparsity can improve recognition performance, especially on the data sets with large pose variation.
12 Toward End-to-End Face Recognition Through Alignment Learning
A common practice in modern face recognition methods is to specifically align the face area based on the prior knowledge of human face structure before recognition feature extraction. The face alignment is usually implemented independently, causing difficulties in the designing of end-to-end face recognition models. We study the possibility of end-to-end face recognition through alignment learning in which neither prior knowledge on facial landmarks nor artificially defined geometric transformations are required. Only human identity clues are used for driving the automatic learning of appropriate geometric transformations for the face recognition task. Trained purely on publicly available datasets, our model achieves a verification accuracy of 99.33% on the LFW dataset, which is on par with state-of-the-art single model methods.
13 Simultaneous Feature and Dictionary Learning for Image Set Based Face Recognition
In this paper, we propose a simultaneous feature and dictionary learning (SFDL) method for image set-based face recognition, where each training and testing example contains a set of face images, which were captured under variations of pose, illumination, expression, resolution, and motion. While a variety of feature learning and dictionary learning methods have been proposed in recent years and some of them have been successfully applied to image set-based face recognition, most of them learn features and dictionaries for facial image sets individually, which may not be powerful enough because some discriminative information for dictionary learning may be compromised in the feature learning stage if they are applied sequentially, and vice versa. To address this, we propose an SFDL method to learn discriminative features and dictionaries simultaneously from raw face pixels so that discriminative information from facial image sets can be jointly exploited by a one-stage learning procedure. To better exploit the nonlinearity of face samples from different image sets, we propose a deep SFDL (D-SFDL) method by jointly learning hierarchical non-linear transformations and class-specific dictionaries to further improve the recognition performance. Extensive experimental results on five widely used face data sets clearly show that our SFDL and D-SFDL achieve very competitive or even better
14 Semi-Supervised Sparse Representation Based Classification for Face Recognition With Insufficient Labeled Samples
This paper addresses the problem of face recognition when there are only a few labeled examples, or even only a single one, of the face that we wish to recognize. Moreover, these examples are typically corrupted by nuisance variables, both linear (i.e., additive nuisance variables such as bad lighting or the wearing of glasses) and non-linear (i.e., non-additive pixel-wise nuisance variables such as expression changes). The small number of labeled examples means that it is hard to remove these nuisance variables between the training and testing faces to obtain good recognition performance. To address the problem, we propose a method called Semi-Supervised Sparse Representation based Classification (S3RC). This is based on recent work on sparsity where faces are represented in terms of two dictionaries: a gallery dictionary consisting of one or more examples of each person, and a variation dictionary representing linear nuisance variables (e.g., different lighting conditions, different glasses). The main idea is that (i) we use the variation dictionary to characterize the linear nuisance variables via the sparsity framework, then (ii) prototype face images are estimated as a gallery dictionary via a Gaussian Mixture Model (GMM), with mixed labeled and unlabeled samples in a semi-supervised manner, to deal with the non-linear nuisance variations between labeled and unlabeled samples. We have done experiments with insufficient labeled samples, even when there is only a single labeled sample per person. Our results on the AR, Multi-PIE, CAS-PEAL, and LFW databases demonstrate that the proposed method is able to deliver significantly improved performance over existing methods.
15 Robust Nuclear Norm-Based Matrix Regression With Applications to Robust Face Recognition
Face recognition (FR) via regression analysis-based classification has been widely studied in the past several years. Most existing regression analysis methods characterize the pixelwise representation error via the l1-norm or l2-norm, which overlooks the 2D structure of the error image. Recently, the nuclear norm-based matrix regression model was proposed to characterize the low-rank structure of the error image. However, the nuclear norm cannot accurately describe the low-rank structural noise when the incoherence assumptions on the singular values do not hold, since it overpenalizes several much larger singular values. To address this problem, this paper presents the robust nuclear norm to characterize the structural error image and then extends it to deal with the mixed noise. The majorization-minimization (MM) method is applied to derive an iterative scheme for minimization of the robust nuclear norm optimization problem. Then, an efficient alternating direction method of multipliers (ADMM) is used to solve the proposed models. We use the weighted nuclear norm as the classification criterion to obtain the final recognition results. Experiments on several public face databases demonstrate the effectiveness of our models in handling variations of structural noise (occlusion, illumination, and so on) and mixed noise.
16 Heterogeneous Face Recognition: A Common Encoding Feature Discriminant Approach
Heterogeneous face recognition is an important, yet challenging problem in the face recognition community. It refers to matching a probe face image to a gallery of face images taken from an alternate imaging modality. The major challenge of heterogeneous face recognition lies in the great discrepancies between different image modalities. Conventional face feature descriptors, e.g., local binary patterns, histogram of oriented gradients, and scale-invariant feature transform, are mostly designed in a handcrafted way and thus generally fail to extract the common discriminant information from the heterogeneous face images. In this paper, we propose a new feature descriptor called common encoding model for heterogeneous face recognition, which is able to capture common discriminant information, such that the large modality gap can be significantly reduced at the feature extraction stage. Specifically, we turn a face image into an encoded one with the encoding model learned from the training data, where the difference of the encoded heterogeneous face images of the same person can be minimized. Based on the encoded face images, we further develop a discriminant matching method to infer the hidden identity information of the cross-modality face images for enhanced recognition performance. The effectiveness of the proposed approach is demonstrated (on several public-domain face datasets) in two typical heterogeneous face recognition scenarios: matching NIR faces to VIS faces and matching sketches to photographs.
17 Learning Bases of Activity for Facial Expression Recognition
The extraction of descriptive features from the sequences of faces is a fundamental problem in facial expression analysis. Facial expressions are represented by psychologists as a combination of elementary movements known as action units: each movement is localised and its intensity is specified with a score that is small when the movement is subtle and large when the movement is pronounced. Inspired by this approach, we propose a novel data-driven feature extraction framework that represents facial expression variations as a linear combination of localised basis functions, whose coefficients are proportional to movement intensity. We show that the linear basis functions of the proposed framework can be obtained by training a sparse linear model with Gabor phase shifts computed from facial videos. The proposed framework addresses generalisation issues that are not tackled by existing learnt representations, and achieves, with the same learning parameters, state-of-the-art results in recognising both posed expressions and spontaneous micro-expressions. This performance is confirmed even when the data used to train the model differ from test data in terms of the intensity of facial movements and frame rate.
18 Learning Deep Sharable and Structural Detectors for Face Alignment
Face alignment aims at localizing multiple facial landmarks for a given facial image, which usually suffers from large variances of diverse facial expressions, aspect ratios and partial occlusions, especially when face images are captured in wild conditions. Conventional face alignment methods extract local features and then directly concatenate these features for global shape regression. Unlike these methods, which cannot explicitly model the correlation of neighbouring landmarks, and motivated by the fact that individual landmarks are usually correlated, we propose a deep sharable and structural detectors (DSSD) method for face alignment. To achieve this, we firstly develop a structural feature learning method to explicitly exploit the correlation of neighbouring landmarks, which learns to cover semantic information to disambiguate the neighbouring landmarks. Moreover, our model selectively learns a subset of sharable latent tasks across neighbouring landmarks under the paradigm of the multi-task learning framework, so that the redundant information of the overlapped patches can be efficiently removed. To further improve the performance, we extend our DSSD to a recurrent DSSD (R-DSSD) architecture by integrating complementary information from multi-scale perspectives. Experimental results on the widely used benchmark datasets show that our methods achieve very competitive performance compared to the state of the art.
19 TMAGIC: A Model-free 3D Tracker
Significant effort has been devoted within the visual tracking community to rapid learning of object properties on the fly. However, state-of-the-art approaches still often fail in cases such as rapid out-of-plane rotation, when the appearance changes suddenly. One of the major contributions of this work is a radical rethinking of the traditional wisdom of modelling 3D motion as appearance change during tracking. Instead, 3D motion is modelled as 3D motion. This intuitive but previously unexplored approach provides new possibilities in visual tracking research. Firstly, 3D tracking is more general, as large out-of-plane motion is often fatal for 2D trackers, but helps 3D trackers to build better models. Secondly, the tracker’s internal model of the object can be used in many different applications and it could even become the main motivation, with tracking supporting reconstruction rather than vice versa. This effectively bridges the gap between visual tracking and Structure from Motion. A new benchmark dataset of sequences with extreme out-of-plane rotation is presented and an online leaderboard is offered to stimulate new research in the relatively underdeveloped area of 3D tracking. The proposed method, provided as a baseline, is capable of successfully tracking these sequences, all of which pose a considerable challenge to 2D trackers (error reduced by 46%).
20 Analysis of Disparity Error for Autofocus
As more and more stereo cameras are installed on electronic devices, we are motivated to investigate how to leverage disparity information for autofocus. The main challenge is that stereo images captured for disparity estimation are subject to defocus blur unless the lenses of the stereo cameras are at the in-focus position. Therefore, it is important to investigate how the presence of defocus blur would affect stereo matching and, in turn, the performance of disparity estimation. In this paper, we give an analytical treatment of this fundamental issue of disparity-based autofocus by investigating the relation between image sharpness and disparity error. A statistical approach that treats the disparity estimate as a random variable is developed. Our analysis provides a theoretical backbone for the empirical observation that, regardless of the initial lens position, disparity-based autofocus can bring the lens to the hill zone of the focus profile in one movement. The insight gained from the analysis is useful for the implementation of an autofocus system.
21 Hierarchical Contour Closure-Based Holistic Salient Object Detection
Most existing salient object detection methods compute the saliency for pixels, patches, or superpixels by contrast. Such fine-grained contrast-based salient object detection methods suffer from saliency attenuation of the salient object and saliency overestimation of the background when the image is complicated. To better compute the saliency for complicated images, we propose a hierarchical contour closure-based holistic salient object detection method, in which two saliency cues, i.e., closure completeness and closure reliability, are thoroughly exploited. The former pops out the holistic homogeneous regions bounded by completely closed outer contours, and the latter highlights the holistic homogeneous regions bounded by outer contours that are highly reliable on average. Accordingly, we propose two computational schemes to compute the corresponding saliency maps in a hierarchical segmentation space. Finally, we propose a framework to combine the two saliency maps, obtaining the final saliency map. Experimental results on three publicly available datasets show that each saliency map on its own is able to reach state-of-the-art performance. Furthermore, our framework, which combines the two saliency maps, outperforms the state of the art. Additionally, we show that the proposed framework can be easily used to extend existing methods and further improve their performance substantially.
22 Unsupervised Word Spotting in Historical Handwritten Document Images using Document-oriented Local Features
Word spotting strategies employed in historical handwritten documents face many challenges due to variation in the writing style and intense degradation. In this paper, a new method that permits effective word spotting in handwritten documents is presented. It relies upon document-oriented local features, which take into account information around representative keypoints, as well as a matching process that incorporates spatial context in a local proximity search without using any training data. Experimental results on four historical handwritten data sets for two different scenarios (segmentation-based and segmentation-free) using standard evaluation measures show the improved performance achieved by the proposed methodology.