Friday, May 8, 2015
Best Practices at Engaging the Public at CCLA
This is the text of my talk at the best practices panel at the Crowd Consortium for Libraries and Archives meeting, Engaging the Public, on May 8, 2015.
One caveat: most of my background is in crowdsourced manuscript transcription, though with the development of FromThePage 2 I've become involved in the related fields of collaborative document translation and crowdsourced OCR correction. I hope that this is useful to non-textual projects as well.
The best practice I'd like to talk about is returning the product of crowdsourcing to the volunteers who produced it.
What do I mean by product?
I'm not talking about what project
managers consider the final product, whether that be item-level
finding aids or peer-reviewed papers in the scholarly press. I'm
talking about the raw product – the actual work that comes out of a
volunteer's direct effort, or the efforts of their fellow volunteers
– the transcript of a letter, the corrected text of a newspaper
article, the translated photo captions, the carefully researched
footnotes and often personal comments left on pages.
Why?
First, it's the right thing to do.
Yesterday we talked about reciprocity and social justice. An older
text says “Thou shalt not muzzle the oxen that tread out the corn.”
Crowdsourced transcription projects vary a lot on this. For wiki-like systems, displaying volunteer transcripts is built into the system – I know that's the case for FromThePage, Transcribe Bentham, and Wikisource, and suspect the same applies to Scripto and DIYHistory. For others, users can't even see their own contributions after they have submitted them. The Smithsonian Institution Transcription Center, by contrast, added this feature deliberately: the team implementing the center added the ability for users to download PDFs of transcribed documents specifically because they felt it was the Right Thing to Do.
Now that I've quoted the Bible, let's
talk about purely instrumental reasons crowdsourcing projects should
return volunteers' labor to them.
Incentives
For one thing, exposing the raw data early can better align our projects with the incentives that motivate many volunteers. Most volunteers are not participating because of their affiliation with an institution, nor because they treasure clean library metadata – at least not primarily! What keeps them coming back and contributing is their connection to the material – the intrinsic motivation of experiencing life as a bird-watcher in the 1920s, or of marching alongside a Civil War soldier, as they transcribe observation cards or diaries.
We should expose the texts volunteers have worked on in ways that are immediately usable to them – PDFs they can print out, texts they can email, URLs they can post on Facebook – to show their friends and families just what they've been up to, and why they're so excited to volunteer.
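To make that concrete, here is a minimal sketch – purely illustrative, written in Flask, and not drawn from FromThePage or any other system mentioned here – of what giving each transcript a stable, shareable URL plus a plain-text download might look like. The route paths and the TRANSCRIPTS store are hypothetical stand-ins for a real application's data layer.

    # Minimal sketch: serve each volunteer-produced transcript at a
    # stable permalink, plus a downloadable plain-text version.
    # TRANSCRIPTS is a hypothetical stand-in for a real datastore.
    from flask import Flask, Response, abort

    app = Flask(__name__)

    TRANSCRIPTS = {
        "example-page-1": "Example transcript text for one page.",
    }

    @app.route("/transcripts/<page_id>")
    def show_transcript(page_id):
        # A permalink volunteers can post to Facebook or email around.
        text = TRANSCRIPTS.get(page_id)
        if text is None:
            abort(404)
        return f"<h1>{page_id}</h1><pre>{text}</pre>"

    @app.route("/transcripts/<page_id>.txt")
    def download_transcript(page_id):
        # Raw text download: the volunteer's own work product.
        text = TRANSCRIPTS.get(page_id)
        if text is None:
            abort(404)
        return Response(
            text,
            mimetype="text/plain",
            headers={"Content-Disposition": f"attachment; filename={page_id}.txt"},
        )

    if __name__ == "__main__":
        app.run()

The framework doesn't matter; the point is that every contribution gets a public, durable address the moment it's produced, rather than waiting on expert review or a separate publication pipeline.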
In some cases this may provide extrinsic rewards project managers can't envision. One of the first projects I worked on – the Zenas Matthews diary of the Mexican-American War – attracted a super-volunteer early on who transcribed the entire diary in two weeks. When I interviewed Scott Patrick, I learned that the biggest reward we could provide – the thing he'd treasure above badges or leaderboards – would be the text itself in a printable and publishable format. You see, Mr. Patrick's heritage organization formally recognizes members who have written books, including editions of primary sources. His contribution to the project certainly matched his fellows' for quality, but the lack of a usable form of the text – the text he'd transcribed himself – was the thing that stood in his way.
Recruitment
Exposing raw transcripts online during the crowdsourcing process can actually enhance recruitment to crowdsourcing projects. I've seen this in a personal project I worked on, in which one super-volunteer found the project by Googling his own name. You see, a previous volunteer had transcribed a lot of material that mentioned a letter carrier named Nat Wooding. So when Nat Wooding did a vanity search, he found the transcribed diaries, recognized the letter carrier as his great-uncle, and became a major contributor to the project. Had the user-generated transcripts been locked away for expert review, or even published online somewhere outside of the crowdsourcing tool, we would have missed the contributions of a new super-volunteer.
Engagement
For the past three years, I've been involved with a non-profit called Free UK Genealogy. They have volunteers around the world transcribe genealogical records using offline, spreadsheet-like tools so that the records can be searched on a freely accessible website.
I spent several months building a new system for crowdsourced transcription of parish registers, but encountered very little enthusiasm – actually some outright opposition – from the most active volunteers. They were used to their spreadsheets, and saw no value at all in changing what they were doing.
Eventually, we switched from improving the transcription tool-chain to improving the delivery system. We rewrote the public-facing search engine from scratch, focusing on the product visible to the volunteers and their communities. When we launched the site in April, it received the most positive reviews of any software redesign I've been involved with in two decades in the industry. Best of all – although the time frame is too short to have hard numbers – the volunteer community seems to have been reinvigorated, as the FreeREG2 database passed 32 million records at the beginning of the month.
So that's my best practice: expose
volunteer contributions online, within your crowdsourcing system, as
they are produced. It will
improve the quality and productivity of the project, and it's the
right thing to do.