Publication date:
2004
Abstract:
This paper proposes an integrated system for
the processing and analysis of highly degraded printed
documents for the purpose of recognizing text characters.
As a case study, ancient printed texts are considered. The
system comprises several blocks that operate in sequence.
Starting with a single page of the document, the
background noise is reduced by wavelet-based decomposition
and filtering, the text lines are detected, extracted,
and segmented by a simple and fast adaptive thresholding
into blobs corresponding to characters, and the various
blobs are analyzed by a feedforward multilayer neural
network trained with a back-propagation algorithm.
For each character, the probability associated with the
recognition is then used as a discriminating parameter
that determines the automatic activation of a feedback
process, leading the system back to a block for refining
segmentation. This block acts only on the small portions
of the text where the recognition cannot be relied on and
makes use of blind deconvolution and MRF-based segmentation
techniques whose high complexity is greatly
reduced when applied to a few subimages of small size.
The experimental results highlight that the proposed system
performs a very precise segmentation of the characters
and then a highly effective recognition of even
strongly degraded texts.
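
The feedback step described in the abstract can be illustrated with a minimal sketch. It assumes a classifier that returns a label together with a recognition probability, and a costly refinement routine that re-segments a single blob. The names `recognize_with_feedback`, `classify`, `refine`, and `min_probability`, as well as the 0.9 threshold, are hypothetical and do not come from the paper; the callables stand in for the neural network and for the blind deconvolution and MRF-based segmentation block.

```python
from typing import Callable, List, Tuple, TypeVar

Blob = TypeVar("Blob")             # any representation of a character sub-image
Prediction = Tuple[str, float]     # (character label, recognition probability)


def recognize_with_feedback(
    blobs: List[Blob],
    classify: Callable[[Blob], Prediction],
    refine: Callable[[Blob], List[Blob]],
    min_probability: float = 0.9,   # assumed threshold, not taken from the paper
) -> List[Prediction]:
    """Classify each blob; send only low-confidence blobs back for refinement."""
    results: List[Prediction] = []
    for blob in blobs:
        label, probability = classify(blob)
        if probability >= min_probability:
            # Recognition is trusted: keep it and skip the costly block.
            results.append((label, probability))
            continue
        # Recognition is unreliable: refine the segmentation of this blob only,
        # then classify whatever sub-blobs the refinement returns.
        refined = refine(blob) or [blob]
        results.extend(classify(sub_blob) for sub_blob in refined)
    return results


if __name__ == "__main__":
    # Toy stand-ins: a blob is a string, and a two-letter string models
    # two touching characters that the first segmentation failed to split.
    def toy_classify(blob: str) -> Prediction:
        return (blob, 0.95) if len(blob) == 1 else ("?", 0.30)

    def toy_refine(blob: str) -> List[str]:
        return list(blob)

    print(recognize_with_feedback(["a", "bc", "d"], toy_classify, toy_refine))
```

Because `refine` is called only for blobs below the threshold, the expensive step runs on a few small subimages rather than on the whole page, which is the cost argument the abstract makes.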
CRIS type:
01.01 Journal article
Keywords:
Degraded texts; image restoration; wavelet denoising; neural networks
Authors:
Bedini, Luigi; Tonazzini, Anna