Abstract—The output of a scanner is a non-editable image of the scanned text. Although the text is visible, one can neither edit it nor change it if required. This provides the basis for optical character recognition (OCR). After image acquisition, OCR generally consists of three major phases: preprocessing, segmentation and recognition. Segmentation is the most crucial phase, because its output decides the outcome of the recognition phase: if the segmentation output is correct, the recognition phase can give the correct output; otherwise it cannot. In this paper, we provide an algorithm that segments a scanned document image into lines, words and characters. The coordinates of each detected line are used to find the positions of the words present in that line. Finally, the word position coordinates are used to find the characters present in each word. A single module is proposed to detect both lines and words. For character detection, reverse engineering is used, i.e. one part is extracted from a word present in the line. This extracted part is checked for whether it forms a meaningful symbol of the Gurmukhi script. If it does, the extracted part is marked and written to a file; otherwise, the extracted part is readjusted to find the symbol. The overall approach was implemented and gave encouraging results.
Index Terms—OCR, Segmentation, Gurmukhi, Handwritten, Feature, Water Reservoir, Line, Word.
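The abstract says one module finds both lines and words, with line coordinates narrowing the search for words. A minimal sketch of that idea follows. It assumes a binarized image given as a list of rows (1 = ink, 0 = background) and uses projection profiles, a common segmentation technique that the abstract does not name; it is not the authors' published module. The names `find_runs`, `detect_lines`, `detect_words` and `segment` and the gap thresholds are hypothetical. The Gurmukhi-specific character step (extract, check for a meaningful symbol, readjust) is not modeled.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical input format: binary image, 1 = ink pixel, 0 = background.
Image = Sequence[Sequence[int]]
Span = Tuple[int, int]  # inclusive (start, end) indices


def find_runs(profile: Sequence[int], min_gap: int = 1) -> List[Span]:
    """Shared module (assumed): return runs where profile > 0.

    Runs separated by fewer than `min_gap` empty positions are merged, so the
    same routine serves for lines (rows) and for words (columns).
    """
    raw: List[Span] = []
    start: Optional[int] = None
    for i, value in enumerate(profile):
        if value > 0 and start is None:
            start = i
        elif value == 0 and start is not None:
            raw.append((start, i - 1))
            start = None
    if start is not None:
        raw.append((start, len(profile) - 1))

    merged: List[Span] = []
    for s, e in raw:
        # Gap = number of empty positions between the previous run and this one.
        if merged and s - merged[-1][1] - 1 < min_gap:
            merged[-1] = (merged[-1][0], e)
        else:
            merged.append((s, e))
    return merged


def detect_lines(img: Image, min_gap: int = 1) -> List[Span]:
    """Horizontal projection: ink count per row, then runs of non-empty rows."""
    return find_runs([sum(row) for row in img], min_gap)


def detect_words(img: Image, line: Span, min_gap: int = 2) -> List[Span]:
    """Vertical projection restricted to the rows of one detected line."""
    top, bottom = line
    width = len(img[0])
    profile = [sum(img[r][c] for r in range(top, bottom + 1)) for c in range(width)]
    return find_runs(profile, min_gap)


def segment(img: Image, line_gap: int = 1, word_gap: int = 2) -> List[Tuple[Span, List[Span]]]:
    """Line coordinates are reused to locate the words inside each line."""
    return [(line, detect_words(img, line, word_gap)) for line in detect_lines(img, line_gap)]


if __name__ == "__main__":
    page = [
        [1, 1, 0, 1, 0, 0, 0, 1, 1],
        [1, 0, 0, 1, 0, 0, 0, 0, 1],
        [0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 1, 1, 0, 0, 0, 1, 0, 0],
    ]
    for line, words in segment(page):
        print("line rows", line, "word columns", words)
```

On the toy page this reports two lines, rows (0, 1) and (3, 3). The one-column gap at column 2 in the first line is merged because it is below `word_gap`, so that line yields the word spans (0, 3) and (7, 8). Reusing one run-finding routine for rows and columns mirrors the abstract's single line/word module. The merge threshold is the assumed knob that keeps inter-character gaps from splitting a word.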
Rajiv Kumar, Thapar University (email: rajiv.patiala@gmail.com)
Amardeep Singh, Pbi University (email: amardeep_dhiman@yahoo.com)
Cite: Rajiv Kumar and Amardeep Singh, "Algorithm to Detect and Segment Gurmukhi Handwritten Text into Lines, Words and Characters," International Journal of Engineering and Technology, vol. 3, no. 4, pp. 392-395, 2011.