Recognizing handwritten data from a document involves several steps. First, the document is scanned using a scanner [1], and the scan is converted into an image. The image is then preprocessed in a series of steps and converted into characters of the appropriate script. These preprocessing steps are intended to increase the recognition rate for the handwritten document. The general steps of handwritten character recognition are image acquisition, preprocessing, feature extraction, classification and recognition [3].
Preprocessing consists of various operations performed on the image. It enhances the image, making it suitable for the subsequent segmentation stage, and removes noise. All work has been done in MATLAB [21]. Preprocessing of compound characters involves the following steps:
Binarization: Binarization is the conversion of a grey-scale image into a binary image.
Noise Elimination: Noise can be introduced at any stage, such as image capture, transmission or compression, and it degrades the quality of the image. Various filters and morphological operations are available for removing it.
Size Normalization: Normalization is applied to obtain characters of uniform size. It reduces the size of the image without altering its structure.
Thinning: Thinning removes selected foreground pixels from the image. It extracts a skeleton of the image without loss of its topological properties.
Fig: Character Image after Preprocessing
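The four preprocessing steps can be illustrated with a minimal sketch. It is written in plain Python rather than MATLAB and is not the authors' code. The function names, the fixed threshold of 128, the 3x3 median filter, nearest-neighbour resampling and the Zhang-Suen thinning scheme are all assumptions chosen for illustration; the text does not say which filter or thinning algorithm was used.

```python
from statistics import median

def binarize(gray, threshold=128):
    """Grey-scale rows (0-255) -> binary rows, 1 = ink. Assumes dark ink on
    light paper and a hand-picked threshold (an illustrative assumption)."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]

def median_filter(img):
    """3x3 median filter; border pixels are copied unchanged."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            window = [img[r + dr][c + dc] for dr in (-1, 0, 1) for dc in (-1, 0, 1)]
            out[r][c] = int(median(window))
    return out

def normalize_size(img, out_h, out_w):
    """Nearest-neighbour resampling to a fixed out_h x out_w grid."""
    h, w = len(img), len(img[0])
    return [[img[r * h // out_h][c * w // out_w] for c in range(out_w)]
            for r in range(out_h)]

def thin(img):
    """Zhang-Suen thinning of a binary image (1 = ink). Only interior pixels
    are examined, so the image is assumed to have a background border."""
    img = [row[:] for row in img]
    h, w = len(img), len(img[0])
    changed = True
    while changed:
        changed = False
        for step in (0, 1):
            to_clear = []
            for r in range(1, h - 1):
                for c in range(1, w - 1):
                    if img[r][c] == 0:
                        continue
                    # Neighbours P2..P9, clockwise starting from north.
                    p = [img[r-1][c], img[r-1][c+1], img[r][c+1], img[r+1][c+1],
                         img[r+1][c], img[r+1][c-1], img[r][c-1], img[r-1][c-1]]
                    b = sum(p)  # number of ink neighbours
                    # Number of 0 -> 1 transitions around the pixel.
                    a = sum(p[i] == 0 and p[(i + 1) % 8] == 1 for i in range(8))
                    if step == 0:
                        ok = p[0] * p[2] * p[4] == 0 and p[2] * p[4] * p[6] == 0
                    else:
                        ok = p[0] * p[2] * p[6] == 0 and p[0] * p[4] * p[6] == 0
                    if 2 <= b <= 6 and a == 1 and ok:
                        to_clear.append((r, c))
            # Delete only after the whole pass, as the algorithm requires.
            for r, c in to_clear:
                img[r][c] = 0
            changed = changed or bool(to_clear)
    return img
```

A typical call chain under these assumptions would be `thin(normalize_size(median_filter(binarize(gray)), 90, 60))`, where the 90x60 target size is likewise only an example.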
The large compound character set, together with the wide range of variation in writing style, demands a pre-classification of the characters before final recognition. Structural features commonly found in the characters include vertical lines, horizontal lines, end points and junction points. The first stage classifies characters using global features: the presence of a vertical line in the character, its position in the character, and the presence of enclosed regions. The detection of global features is followed by the detection of local features such as end points and their positions in the character. On the basis of the vertical line, characters are divided into three classes: characters with a vertical bar at the end, characters with no end bar, and characters with a vertical bar in the middle. Local features are then obtained by partitioning the character into four quadrants and detecting the end points.
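The vertical-bar pre-classification can be sketched as follows, in Python and purely for illustration. The sketch treats a column as a bar when it contains a long unbroken run of ink, and assumes that "at the end" means the right-hand side of the character. The function names and both thresholds (`min_height_ratio`, `end_zone`) are hypothetical and are not taken from the text. The class the text calls "no end bar" is returned here as `"no_bar"`, meaning that no bar-like column was found at all.

```python
def longest_run(column):
    """Length of the longest unbroken run of ink pixels (1s) in a column."""
    best = cur = 0
    for px in column:
        cur = cur + 1 if px else 0
        best = max(best, cur)
    return best

def classify_by_vertical_bar(img, min_height_ratio=0.8, end_zone=0.75):
    """Return 'end_bar', 'middle_bar' or 'no_bar' for a binary character image.
    Both thresholds are illustrative assumptions, not values from the text."""
    h, w = len(img), len(img[0])
    bar_cols = [c for c in range(w)
                if longest_run([img[r][c] for r in range(h)]) >= min_height_ratio * h]
    if not bar_cols:
        return "no_bar"
    # A bar in the right-most part of the image counts as an end bar
    # (assumes 'end' means the right side of the character).
    return "end_bar" if max(bar_cols) >= end_zone * w else "middle_bar"
```

Detection of enclosed regions, the other global feature mentioned above, is not covered by this sketch.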
Fig: Character Partitioning
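The quadrant partitioning and end-point detection can be illustrated with a short Python sketch. It assumes a thinned (skeleton) binary image, defines an end point as an ink pixel with exactly one 8-connected ink neighbour, and splits the image at its geometric centre. These definitions and the function names are assumptions made for illustration; the text does not state how the quadrants or end points are computed.

```python
def end_points(skel):
    """(row, col) of skeleton pixels with exactly one 8-connected ink neighbour."""
    h, w = len(skel), len(skel[0])
    pts = []
    for r in range(h):
        for c in range(w):
            if not skel[r][c]:
                continue
            # Sum the 3x3 neighbourhood (clipped at the border), minus the pixel itself.
            n = sum(skel[rr][cc]
                    for rr in range(max(r - 1, 0), min(r + 2, h))
                    for cc in range(max(c - 1, 0), min(c + 2, w))) - 1
            if n == 1:
                pts.append((r, c))
    return pts

def end_points_per_quadrant(skel):
    """Count end points in each quadrant, splitting at the image centre."""
    h, w = len(skel), len(skel[0])
    counts = {"top_left": 0, "top_right": 0, "bottom_left": 0, "bottom_right": 0}
    for r, c in end_points(skel):
        key = ("top" if r < h / 2 else "bottom") + "_" + ("left" if c < w / 2 else "right")
        counts[key] += 1
    return counts
```

For an L-shaped stroke, for example, this yields one end point in the top-left quadrant and one in the bottom-right quadrant, which is the kind of positional cue the local-feature stage relies on.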
1.3 Feature Extraction:
Feature extraction can be regarded as finding a set of parameters (features) that define the shape of the underlying character as precisely and uniquely as possible [24]. The term feature selection refers to algorithms that select the best subset of the input feature set, whereas methods that create new features from transformations or combinations of the original features are called feature extraction algorithms [9, 10]. We use the diagonal feature extraction technique in the proposed work.
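The text does not spell out the diagonal scheme, so the following Python sketch assumes a commonly described zone-based variant: the normalized character image is divided into square zones, the ink pixels along each of the 2n - 1 diagonals of an n x n zone are summed, and the diagonal sums are averaged to give one feature per zone. The zone size of 10, the diagonal direction and the function name are assumptions, not details confirmed by the paper.

```python
def diagonal_features(img, zone=10):
    """One feature per zone: the mean ink count over the zone's 2*zone - 1 diagonals.
    Zone size and diagonal direction are illustrative assumptions."""
    h, w = len(img), len(img[0])
    if h % zone or w % zone:
        raise ValueError("image sides must be multiples of the zone size")
    feats = []
    for r0 in range(0, h, zone):
        for c0 in range(0, w, zone):
            # Pixels with the same dr + dc lie on the same diagonal.
            sums = [0] * (2 * zone - 1)
            for dr in range(zone):
                for dc in range(zone):
                    sums[dr + dc] += img[r0 + dr][c0 + dc]
            feats.append(sum(sums) / len(sums))
    return feats
```

Under these assumptions, a 90x60 image yields 9 x 6 = 54 zone features, which can then be passed to the classifier as a fixed-length feature vector.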