Wednesday, April 4, 2007

Paper: Computational Manuscript Indexing

The 2006 Family History Technology Workshop archives are online. One presentation ("Towards Searchable Indexes for Handwritten Documents") dealt with the difficulties of applying automated OCR to handwritten documents. The conclusion: digitizing manuscripts for the purpose of searching is feasible in practice. Because search terms can be matched partially against the recognized manuscript letters, the manuscripts need not be fully transcribed in order to be indexed, so long as the user can tolerate imperfect search results. Even this, however, requires extensive training and consistent handwriting in the source texts.
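To make the partial-matching idea concrete, here is a rough sketch in Python. It is not the method from the paper, which I have not reproduced here; it only assumes that a recognizer has already produced noisy word guesses, and it uses the standard library's `difflib.SequenceMatcher` as a stand-in similarity measure. The function names, the sample words, and the 0.7 threshold are all made up for illustration.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a case-insensitive similarity score between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def fuzzy_search(term, recognized_words, threshold=0.7):
    """Return (position, word, score) for words that roughly match the term.

    The threshold is an arbitrary illustrative value: lowering it finds
    more candidates but also lets in more false hits.
    """
    hits = []
    for position, word in enumerate(recognized_words):
        score = similarity(term, word)
        if score >= threshold:
            hits.append((position, word, score))
    # Best matches first.
    return sorted(hits, key=lambda hit: hit[2], reverse=True)


# Hypothetical recognizer output with typical letter confusions
# ("m" read as "rn", "o" read as "c", "i" read as "l").
recognized = ["Jchn", "Srnith", "born", "Aprll", "1842"]

results = fuzzy_search("Smith", recognized)
for position, word, score in results:
    print(position, word, round(score, 2))
```

Searching for "Smith" still finds the misread "Srnith" even though no exact transcription exists, which is the trade-off described above: the index is usable, but the user has to accept approximate hits.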

Here are links to the paper and the slides.
