Comments on Collaborative Manuscript Transcription: 2010: The Year of Crowdsourcing Transcription

Dickens Journals Online was a very successful p...

2012-01-22T03:10:13.582-06:00

Dickens Journals Online was a very successful project:
http://www.djo.org.uk

Dear Mr. Brumfield, I am a researcher at the Ins...

2011-05-23T00:49:47.542-05:00

Dear Mr. Brumfield,

I am a researcher at the Institute of Applied Economic Research in Brazil.
I read your post on transcription crowdsourcing. It was an excellent source of organized information on transcription projects, as well as a good analysis of the topic.
We are working on the Brazilian Statistical History, which aims to extract economic, demographic, epidemiological and social statistics from official statistics books (among other sources). So far we have only been scanning volumes (222,000 pages with 55,000 tables to date) and posting the images online with (at times rudimentary) metadata. We now want to extract the information in the text and especially in the tables.

Although most of the documents are printed, OCR performs terribly on the tables. So we are interested in crowdsourcing the table transcription.
The tables are not standardized; almost every table is unique. So we are looking to develop a solution that will:
1) identify pages that have tables (easy)
2) ask volunteers to describe the "table structure" in a way the computer can understand
3) ask volunteers to transcribe the table

Do you know of any project that does that?

Regards,
Lucas Mation
please contact me at lucasmation (at) gmail (dot) com
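
As a rough illustration of steps 2 and 3 in the comment above, here is a minimal Python sketch of how a volunteer-declared "table structure" might be recorded and then used to check a transcription. This is not the commenter's design or any existing project's schema: the class names, field names, and the idea of keying cells by row and column labels are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class TableStructure:
    """Step 2 (hypothetical): the layout a volunteer declares for one table."""
    page_id: str              # assumed identifier of the scanned page
    row_labels: List[str]
    column_labels: List[str]


@dataclass
class TableTranscription:
    """Step 3 (hypothetical): cell values keyed by (row label, column label)."""
    structure: TableStructure
    cells: Dict[Tuple[str, str], str] = field(default_factory=dict)

    def set_cell(self, row: str, column: str, value: str) -> None:
        # Reject cells that fall outside the declared structure.
        if row not in self.structure.row_labels:
            raise KeyError(f"unknown row label: {row}")
        if column not in self.structure.column_labels:
            raise KeyError(f"unknown column label: {column}")
        self.cells[(row, column)] = value

    def missing_cells(self) -> List[Tuple[str, str]]:
        # Cells not yet transcribed, in row-major order.
        return [
            (row, column)
            for row in self.structure.row_labels
            for column in self.structure.column_labels
            if (row, column) not in self.cells
        ]
```

Declaring the structure first means the computer knows which cells to expect, so `missing_cells()` can show volunteers what is left to transcribe and `set_cell()` can refuse entries that do not fit the declared layout.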

One project worthy of note is the National Library ...

2011-04-26T02:51:46.632-05:00

One project worthy of note is the National Library of Australia's (NLA) newspaper digitization project.

What is really interesting is that members of the public can improve the OCR output through online text correction. Across the more than 4 million pages currently accessible, the NLA has more than 2 million lines of print corrected each month! The cumulative total is somewhere in excess of 32 million lines corrected. That's an amazing achievement by several thousand volunteers.
For more details, have a look at trove.nla.gov.au

This is a great list, thank you! I can't resist a...

2011-02-22T03:08:52.651-06:00

This is a great list, thank you! I can't resist adding a current German contribution: http://de.guttenplag.wikia.com/ - a highly effective, crowd-sourced effort to find plagiarized passages in the Ph.D. dissertation of the German Minister of Defense, Karl-Theodor zu Guttenberg. Results thus far? Over 20% of the content and counting.

Hello! Wiktenauer's transcriptions, at this p...

2011-02-09T09:50:16.787-06:00

Hello!

Wiktenauer's transcriptions, at this point, are mostly taken from freely available, previously completed transcriptions. Although I'm not against automation, I don't believe any of the transcriptions currently hosted are automated.

Great reviews, Ben! Very comprehensive! Here'...

2011-02-05T23:56:37.608-06:00

Great reviews, Ben! Very comprehensive!

Here's another article on crowdsourced translation: http://www.crowdsourcing.org/l/255

Great overview, Ben! Very useful. About Militiere...

2011-02-02T15:45:28.680-06:00

Great overview, Ben! Very useful.

About Militieregisters.nl: in June our Dutch Society of Archivists will hold its annual conference. Together with the team in Amsterdam who are spearheading the crowdsourcing project, I submitted a proposal to present on crowdsourcing and on the project. Part of the session (two sessions, actually) will be a sneak preview of the software, which should be developed by then. Fingers crossed!

I might have more comments later, once I find time to go through your extensive blogpost again. Thanks!

(And by the way, if you need translations of information on Dutch websites, you can always ask me.)

This is a very useful overview -- thank you. You m...

2011-02-02T15:10:12.771-06:00

This is a very useful overview -- thank you. You might also want to check out Columbia University's "Leveraging 'The Wisdom of the Crowds' for Efficient Tagging and Retrieval of Documents from the Historic Newspaper Archive." You can see a short video about the project here: http://is.gd/AxA7Cl