Integration of Visual Inter-word Constraints and Linguistic Knowledge in Degraded Text Recognition

Tao Hong
Center of Excellence for Document Analysis and Recognition
Department of Computer Science
State University of New York at Buffalo
Buffalo, NY 14260

Abstract

Degraded text recognition is a difficult task. Given a noisy text image, a word recognizer can be applied to generate several candidates for each word image. High-level knowledge sources can then be used to select a decision from the candidate set for each word image. In this paper, we propose that visual inter-word constraints can be used to facilitate candidate selection. Visual inter-word constraints provide a way to link word images inside the text page and to interpret them systematically.

Introduction

The objective of visual text recognition is to correctly transform an arbitrary image of text into its symbolic equivalent. Recent technical advances in document recognition have made automatic text recognition a viable alternative to manual key entry. Given a high-quality text page, a commercial document recognition system can recognize the words on the page with high accuracy. However, given a degraded text page, such as a multiple-generation photocopy or facsimile, performance usually drops abruptly [1].
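The candidate-selection idea outlined in the abstract can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration and is not taken from the paper: the `CandidateSet` type, the function names `top_choice` and `apply_equivalence`, the example words and scores, and the specific rule of intersecting the candidate sets of two word images that some image matcher (not shown) has judged to be the same word. It is one plausible way a visual link between word images could narrow the candidates, not the author's method.

```python
from typing import Dict, List, Tuple

# A candidate set maps each candidate word to a recognizer confidence score.
CandidateSet = Dict[str, float]


def top_choice(candidates: CandidateSet) -> str:
    """Baseline: pick the candidate with the highest confidence score."""
    return max(candidates, key=candidates.get)


def apply_equivalence(
    sets: List[CandidateSet], pairs: List[Tuple[int, int]]
) -> List[CandidateSet]:
    """Keep only the candidates shared by word images judged visually equivalent.

    Assumption: `pairs` holds index pairs of word images that a hypothetical
    image matcher considers to be the same word. Scores of shared candidates
    are summed; if two sets share nothing, both are left unchanged.
    """
    result = [dict(s) for s in sets]  # copy so the input is not modified
    for i, j in pairs:
        shared = set(result[i]) & set(result[j])
        if not shared:
            continue
        merged = {w: result[i][w] + result[j][w] for w in shared}
        result[i] = dict(merged)
        result[j] = dict(merged)
    return result


# Made-up candidate sets for two images of the same word on one page.
sets = [
    {"farm": 0.55, "form": 0.45},  # noisy image: wrong word ranked first
    {"form": 0.60, "foam": 0.40},
]
constrained = apply_equivalence(sets, [(0, 1)])

print(top_choice(sets[0]))         # farm
print(top_choice(constrained[0]))  # form
```

In this toy case the top-scoring candidate for the first image is wrong when the image is considered alone, and the link to the second image removes it. Summing the scores is an arbitrary choice; any other way of combining evidence from linked images would fit the same structure.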
Given a degraded text image, word images can be extracted after layout analysis. A word image from a degraded text page may contain touching, broken, distorted, or blurred characters, which can make the word image difficult to recognize accurately. After character recognition and correction based on dictionary look-up, a word recognizer provides one or more word candidates for each word image. Figure 1 lists the word candidate sets for the sentence "Please fill in the application form." Each word candidate has a confidence score, but the score may not be reliable because of noise in the image. The correct word candidate is usually in the candidate set but