Training a shallow NN to erase ink seepage in historical manuscripts based on a degradation model
Academic Article
Publication Date:
2024
Abstract:
In historical recto-verso manuscripts, the text written on the reverse
side of a folio very often seeps through the paper fibers, so
that the texts on the two sides appear mixed. This damage is severely
impairing and cannot be physically removed; it hinders both the
work of philologists and palaeographers and the automatic analysis of
linguistic content. We propose a procedure based on neural networks (NN)
that removes this interference from the complex background of the
manuscripts. We adopt a very simple shallow NN whose learning
phase employs a training set generated from the data itself using
a theoretical blending model that takes into account ink diffusion and
saturation. Because the model is parametric, various
levels of damage can be simulated in the training set, which favors the
generalization capability of the NN. More explicitly, the network can be
trained without a large collection of similar manuscripts,
yet it is still able, at least to some extent, to classify manuscripts
with varying degrees of corruption. We compare the performance of
this NN with that of other methods, both qualitatively and quantitatively,
on a reference dataset and on heavily damaged historical manuscripts.
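
The abstract does not state the blending model itself, so the following is only a minimal sketch of the general idea of building a training set from the data with a parametric degradation model. Everything in it is an assumption made for illustration, not the authors' formulation: the additive mixing in the optical density domain, the box blur standing in for ink diffusion, the density cap standing in for saturation, and the hypothetical names `box_blur`, `simulate_seepage`, and `make_training_set`. It assumes two registered grayscale sides with values in [0, 1] (1 = clean paper), with the verso already mirrored.

```python
import numpy as np

def box_blur(img, radius):
    """Separable box blur; a crude, assumed stand-in for ink diffusion."""
    if radius == 0:
        return img.copy()
    k = 2 * radius + 1
    kernel = np.ones(k) / k
    padded = np.pad(img, radius, mode="edge")
    # Blur along rows, then along columns; "valid" undoes the padding.
    rows = np.apply_along_axis(
        lambda r: np.convolve(r, kernel, mode="valid"), 1, padded)
    return np.apply_along_axis(
        lambda c: np.convolve(c, kernel, mode="valid"), 0, rows)

def simulate_seepage(front, back, strength, radius, eps=1e-3):
    """Blend `back` into `front` (assumed model, not the paper's).

    Intensities are mapped to optical densities, the blurred back-side
    density is added with weight `strength`, and the sum is capped so
    that overlapping ink saturates instead of growing without bound.
    """
    d_front = -np.log(np.clip(front, eps, 1.0))
    d_back = -np.log(np.clip(back, eps, 1.0))
    d_mixed = d_front + strength * box_blur(d_back, radius)
    d_mixed = np.minimum(d_mixed, -np.log(eps))  # saturation cap
    return np.exp(-d_mixed)

def make_training_set(recto, verso, strengths, radius=1, threshold=0.5):
    """Per-pixel samples: (degraded recto, degraded verso) -> ink label.

    Labels come from thresholding the clean recto; each value in
    `strengths` simulates one damage level.
    """
    labels = (recto < threshold).astype(np.uint8).ravel()
    features, targets = [], []
    for q in strengths:
        mixed_r = simulate_seepage(recto, verso, q, radius)
        mixed_v = simulate_seepage(verso, recto, q, radius)
        features.append(np.stack([mixed_r.ravel(), mixed_v.ravel()], axis=1))
        targets.append(labels)
    return np.concatenate(features), np.concatenate(targets)
```

Sweeping `strengths` over several values is what exposes a classifier to different damage levels from a single pair of pages, which is the role the abstract assigns to the parametric model; the resulting arrays could then be fed to any small per-pixel classifier.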
Iris type:
01.01 Articolo in rivista
Keywords:
Ancient manuscript virtual restoration; Degraded document binarization; Registration of recto-verso documents; Shallow multilayer neural networks
List of contributors:
Tonazzini, Anna; Savino, Pasquale
Published in: