Computer Vision

Humans build much of their perception of the surrounding environment through vision. Computer vision aims to bring that capability to machines. It also extends to the analysis of multi-spectral data, e.g. in remote sensing, and spatio-temporal data, such as videos and volumetric medical images.

Optical Character Recognition

Optical Character Recognition (OCR) is a key technology for scanning books, signs, documents, and other real-world texts into digital form for historical purposes, policy purposes (e.g., census documents), and enterprise intelligence/efficiencies.

Building on recent neural network advances in computer vision and speech recognition, VIMAL researchers are developing a new OCR system from scratch. The goal is to shift from statistical hidden Markov models (HMMs) to a more efficient neural network-based system.

The team’s combination of convolutional neural networks (CNNs) and long short-term memory (LSTM) recurrent networks demonstrated top performance in a pilot United States Census program, achieving 79 percent accuracy on last-name recognition for handwritten names from the 1990 census, and on the challenging MADCAT Arabic handwriting recognition dataset. Both efforts are described in papers presented at the International Conference on Document Analysis and Recognition (ICDAR), 2017.

VIMAL researchers have also developed a novel text detection algorithm that models text detection as a three-class problem rather than as a binary classification problem. By expanding current capabilities, the team aims to create systems capable of recognizing complex document layouts.

Related projects