Paper-based documents have been in use ever since paper was invented. Now that documents can be maintained electronically, keeping and maintaining paper-based documents is cumbersome and costly. Unfortunately, paper-based records still exist, either as relics of legacy software or because they are maintained for statutory purposes. The existence of these paper-based systems gives us an opportunity to build an automated system that extracts data from printed documents or images.
OCR (Optical Character Recognition) is a technology that has existed for more than three decades. It converts image-based documents containing typed, handwritten, or printed text into machine-readable text. Because OCR solutions have low accuracy on handwriting recognition, a new branch has emerged named ICR (Intelligent Character Extraction). ICR combines technologies that pre-process images for better content extraction, applies domain-specific business rules, and uses Machine Learning techniques to increase the accuracy of the extracted content. Coforge has an internally built ICR framework named SLICE (Self Learning Intelligent Character Extractor). SLICE provides configuration capabilities to define the areas of interest to read, along with template classification of images, to extract content.
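To make the idea of configurable areas of interest concrete, here is a minimal sketch in Python. It is not SLICE's actual configuration format or API, which the text does not describe. The `Region` class, the `TEMPLATES` mapping, the template id `invoice_v1`, and the field names are all hypothetical. The sketch assumes that template classification has already produced a template id, and that each cropped region would then be handed to a recognition engine (not shown).

```python
from dataclasses import dataclass
from typing import Dict, List

# A grayscale image as rows of pixel values (assumption: row-major lists).
Image = List[List[int]]


@dataclass(frozen=True)
class Region:
    """A named rectangular area of interest, in pixel coordinates."""
    name: str
    left: int
    top: int
    width: int
    height: int


# Hypothetical configuration: template id -> areas of interest to read.
TEMPLATES: Dict[str, List[Region]] = {
    "invoice_v1": [
        Region("invoice_number", left=4, top=0, width=4, height=2),
        Region("total", left=0, top=2, width=3, height=2),
    ],
}


def crop(image: Image, region: Region) -> Image:
    """Return the pixels inside `region`, or raise if it falls outside the image."""
    rows = len(image)
    cols = len(image[0]) if rows else 0
    if (
        region.left < 0
        or region.top < 0
        or region.width <= 0
        or region.height <= 0
        or region.left + region.width > cols
        or region.top + region.height > rows
    ):
        raise ValueError(f"region {region.name!r} does not fit the image")
    return [
        row[region.left : region.left + region.width]
        for row in image[region.top : region.top + region.height]
    ]


def extract_regions(image: Image, template_id: str) -> Dict[str, Image]:
    """Crop every configured area of interest for an already classified template."""
    return {r.name: crop(image, r) for r in TEMPLATES[template_id]}
```

For example, calling `extract_regions(image, "invoice_v1")` on an image at least 8 pixels wide and 4 pixels high returns a dictionary with one cropped pixel block per configured field. Keeping the regions in configuration rather than in code means a new document layout can be supported by adding a template entry, which is the kind of flexibility the text attributes to a configurable framework.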