Training a shallow NN to erase ink seepage in historical manuscripts based on a degradation model
Article
Publication date:
2024
Abstract:
In historical recto-verso manuscripts, the text written on the reverse side of a folio often penetrates through the paper fibers, so
that the texts on the two sides appear mixed. This severe damage
cannot be physically removed, and it hinders both the
work of philologists and palaeographers and the automatic analysis of
linguistic content. We propose a procedure based on neural networks (NN)
to clean the complex background of such manuscripts
of this interference. We adopt a very simple shallow NN whose learning
phase employs a training set generated from the data itself using
a theoretical blending model that accounts for ink diffusion and
saturation. Because the model is parametric, various
levels of damage can be simulated in the training set, which favors the generalization
capability of the NN. More explicitly, the network can be
trained without a large collection of similar manuscripts,
yet it is still able, at least to some extent, to classify manuscripts
with varying degrees of corruption. We compare the performance of
this NN with that of other methods, both qualitatively and quantitatively,
on a reference dataset and on heavily damaged historical manuscripts.
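
As a rough illustration of how a parametric blending model could produce training data at several damage levels, the sketch below mixes a clean recto with a mirrored, blurred verso in the optical-density domain and caps the result to mimic saturation. This is not the authors' model, whose equations are not given in this record: the function names, the Gaussian blur standing in for ink diffusion, the density cap `d_max`, and all default values are assumptions made only for illustration.

```python
import numpy as np

def gaussian_kernel(sigma):
    # 1-D normalized Gaussian kernel; radius of 3*sigma is an assumed choice.
    radius = max(int(3 * sigma), 1)
    x = np.arange(-radius, radius + 1, dtype=float)
    k = np.exp(-0.5 * (x / sigma) ** 2)
    return k / k.sum()

def blur(img, sigma):
    # Separable Gaussian blur with edge padding (hypothetical stand-in for ink diffusion).
    k = gaussian_kernel(sigma)
    pad = len(k) // 2
    padded = np.pad(img, pad, mode="edge")
    rows = np.apply_along_axis(lambda r: np.convolve(r, k, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="valid"), 0, rows)

def simulate_seepage(recto, verso, strength=0.4, sigma=1.5, d_max=3.0, eps=1e-3):
    """Blend a clean recto with seepage from the verso (illustrative only).

    recto, verso: 2-D arrays in [0, 1], where 1 is blank paper and 0 is full ink.
    strength: assumed seepage level; sweeping it simulates different damage levels.
    d_max: assumed saturation cap on optical density.
    """
    # Work in optical density, where superimposed inks add up.
    d_recto = -np.log(np.clip(recto, eps, 1.0))
    # Mirror the verso so that it is aligned with the recto.
    d_verso = -np.log(np.clip(np.fliplr(verso), eps, 1.0))
    d_mixed = d_recto + strength * blur(d_verso, sigma)
    # Saturation: density cannot grow without bound.
    d_mixed = np.minimum(d_mixed, d_max)
    return np.exp(-d_mixed)

def make_training_pairs(recto, verso, strengths):
    # One (degraded input, clean target) pair per simulated damage level.
    return [(simulate_seepage(recto, verso, s), recto) for s in strengths]
```

Calling `make_training_pairs(recto, verso, [0.2, 0.4, 0.8])` would yield three degraded versions of the same page paired with the clean target, which is the sense in which a parametric model lets a single manuscript supply training data of varying severity.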
CRIS type:
01.01 Journal article
Keywords:
Ancient manuscript virtual restoration; Degraded document binarization; Registration of recto-verso documents; Shallow multilayer neural networks
Authors:
Tonazzini, Anna; Savino, Pasquale
Published in: