Wednesday, April 11, 2012

Crowdsourced Transcription Tool List

When I first started this blog, I spent a lot of time writing detailed reviews of different transcription projects.  This has become difficult as my available time shrinks and the number of crowdsourcing projects grows.  So when Kate Bowers posted to the Society of American Archivists mailing list asking for a directory of transcription tools, I figured it was time to take a different approach.
The link above is a Google Documents spreadsheet listing different tools and the features I thought were relevant. It's been updated several times over the last few weeks, and I'm pleased to see that it's expanded to include a score of technologies. I hope it's useful.

Wednesday, April 4, 2012

French Departmental Archive on Wikisource

While the transcription world buzzes with news of the release of the 1940 US census and the crowdsourced transcription projects that surround it, I'd like to draw your attention to a blog post published last week on La Tribune des Archives: "Edition collaborative de manuscrits sur Wikisource : 1er retour d'expérience".  The post covers the efforts of the archives of the department of Alpes-Maritimes to transcribe 17th- and 18th-century records of episcopal visits to the communes in the diocese.  These records are rich sources on local history, but "readers struggle over the chicken-scratch, and the collection is too large to be edited by a single person."  The archive has used Wikisource to transcribe these manuscripts with great success, so I'd like to quote and translate extensive portions of their post.

Why Wikisource?

It's already there! (No software to create, maintain, administer, no specs -- just a strong will and a core of 2-6 people).
It offers features designed for manuscript editions requiring more than one editor.
Particularly useful functions (aside from the collaborative aspect):
  • Side-by-side display of facsimile and transcription
  • Workflow indicating whether a page is transcribed, corrected, or validated by two administrators.
  • The visualization is very practical for motivating the community of transcribers.
  • Version history control and the ability to comment or discuss difficult issues.
  • Wikisource's high Google page rank.

The article goes on to describe the factors they weighed when choosing material for the project (accessibility of the script and local interest, among others) and how they got started (the standard GLAMWiki approach), then turns to the community management aspects I find so fascinating:

How do you motivate your paleographers?

In our experience, transcribers are essentially former university students and internally-trained archivists who want to extend their education (either by making further progress or by avoiding becoming rusty).
Work times and rest times clearly defined in advance.
A regular, fixed-date schedule defined in advance (for example, one month: upload on the 15th and correction every last day of the month) helps the group to make progress and to break up its efforts with relaxation periods (for the eyes, the editors, and the correctors) and lets everyone have rapid feedback (new pages are in fact corrected practically every night).

Findings on the behavior of "students" on Wikisource

The first exercises attracted the kind help and support of Wikisource regulars and administrators (Adrienne Alix, SereinWMfr, Pyb, Hsarrazin), a few new registered paleographers (Cavalié, LINCK, Braxmeyer, Gustave) and some anonymous IPs.  One or two correctors can suffice easily to keep track of the work of 5-10 "students".  Contrary to homework done in class, the "students" apply themselves regularly to the task, and the size and number of contributors does not increase on the night before the deadline.
Writings dating from before 1660 receive fewer volunteers but could very well serve as university exercises graded online (at the rate of one page per student).
For more on the archive's efforts (including their similar outreach on Flickr), take a look at the departmental archive news page.