Publication date:
2017
Abstract:
This paper documents our campaign of large-scale optical character recognition (OCR) of ancient, or polytonic, Greek. Building upon the Gamera OCR engine and developing a suite of post-processing tools, including automatic spellcheck, we processed 1,200 volumes comprising 329,002,271 Greek words. We study a sample of 10 pages in detail; it shows how much each post-processing step improved the results, and for which source documents. These pages attain an average character accuracy of about 96%. These results will provide a basis for further improvements, including the training of other open-source OCR engines.
CRIS type:
01.01 Journal article
Keywords:
OCR; Ancient Greek
Authors:
Boschetti, Federico