Thursday, July 12, 2007

Feature: Transcription Versions

Page/Subject Article Versions
Last week I added versioning to articles and pages. The goal was to allow access to previous edits via a version page akin to the MediaWiki history tab.

Gavin Robinson suggested a system of review and approval before transcription changes go live, but I don't think it fits my user model. For one thing, I don't expect Wikipedia-style vandalism to affect FromThePage works much, since the editors are specifically authorized by the work owner. For another, I can't imagine a solo owner/scribe tolerating having to submit and approve each of their own edits for long. Finally, since this is designed for a loosely coupled, non-institutional user community, I simply can't assume that the work owner will check the site each day to review and approve changes. Projects must be able to keep their momentum without intervention by the work owner for months at a time.

His concerns are quite valid, however. Perhaps an alternative approach to transcription quality is to develop a few more owner tools, like a bulk review/revert feature for contributions made since a certain date or by a certain user.
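A bulk review/revert tool like this doesn't exist yet, but here is one way it might work, as a minimal Python sketch. `PageVersion` and `bulk_revert` are hypothetical names, not FromThePage code. The sketch assumes a simple rule: each affected page is rolled back to the version saved just before its earliest flagged edit. "Since a date" and "by a user" are both expressed as a filter on versions.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, Dict, List


@dataclass
class PageVersion:
    """One saved state of a page (hypothetical model, not FromThePage's schema)."""
    page_id: int
    user: str
    created_at: datetime
    text: str


def bulk_revert(
    history: List[PageVersion],
    flagged: Callable[[PageVersion], bool],
) -> Dict[int, str]:
    """Return the restored text for every page touched by a flagged edit.

    Each affected page rolls back to the version saved just before its
    earliest flagged edit. Later edits on that page, flagged or not, are
    discarded too: a simplification that avoids merging text.
    """
    # Group versions by page, oldest first.
    by_page: Dict[int, List[PageVersion]] = {}
    for version in sorted(history, key=lambda v: v.created_at):
        by_page.setdefault(version.page_id, []).append(version)

    restored: Dict[int, str] = {}
    for page_id, versions in by_page.items():
        for i, version in enumerate(versions):
            if flagged(version):
                # Empty text if the page's very first version was flagged.
                restored[page_id] = versions[i - 1].text if i > 0 else ""
                break
    return restored
```

An owner could call `bulk_revert(history, lambda v: v.created_at >= cutoff)` to undo everything since a date, or `bulk_revert(history, lambda v: v.user == "someuser")` to undo one contributor's work. Rolling back whole pages rather than merging is the crude but predictable choice; a real tool would probably show the owner each change before reverting it.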

Work Versions
Later, I'll put up a technical post on how I accomplished this with Rails after_save callbacks, but for now I'd like to talk about "versions" of a perpetually editable work. What exactly does that mean? If a user prints out or downloads a transcription between one change and the next, how do you indicate which state of the text they have?

To address this, I decided to add the concept of a work's "transcription version". This is an attribute of the work itself, and every time an edit is made to any one of the work's pages, the work's transcription version is incremented. By also recording the work's transcription version in each page version record, I should be able to reconstruct the exact state of the digital work from the version number stamped on an offline copy.
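The scheme can be illustrated with a minimal Python sketch rather than Rails: a plain `save_page` method stands in for the after_save callback, and the `Work`/`PageVersion` classes and the `work_transcription_version` field are hypothetical. Each page edit bumps the work's counter and stores the new value with the page version, so the text behind any printed version number can be rebuilt by replaying page versions up to that number.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class PageVersion:
    page_id: int
    text: str
    work_transcription_version: int  # the work's counter right after this edit


@dataclass
class Work:
    transcription_version: int = 0
    page_versions: List[PageVersion] = field(default_factory=list)

    def save_page(self, page_id: int, text: str) -> int:
        # Stand-in for an after_save callback: any page edit bumps the
        # work-wide counter and records it alongside the page's new text.
        self.transcription_version += 1
        self.page_versions.append(
            PageVersion(page_id, text, self.transcription_version))
        return self.transcription_version

    def state_at(self, version: int) -> Dict[int, str]:
        """Rebuild every page's text as it stood at a given transcription version."""
        state: Dict[int, str] = {}
        # page_versions is appended in increasing version order.
        for pv in self.page_versions:
            if pv.work_transcription_version > version:
                break
            state[pv.page_id] = pv.text  # later edits overwrite earlier ones
        return state
```

A printout made after two edits would carry the number 2; even after a third edit changes page 1, `state_at(2)` still returns the two pages exactly as printed.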

I chose transcription_version as the attribute name because comments, and perhaps subject articles, may change independently of the work's transcribed text. A printout that includes commentary needs a comment_version as well as a transcription_version. The two attributes seem orthogonal: two transcription-only prints of the same work shouldn't appear different just because a user has made an annotation that isn't printed.
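To make the orthogonality concrete, here is a small Python sketch. transcription_version and comment_version come from the post. The `WorkVersions` class, its methods, and the "t1-c1" stamp format are hypothetical, chosen only for illustration. A transcription-only print is stamped with the transcription counter alone, so adding a comment never changes it.

```python
from dataclasses import dataclass


@dataclass
class WorkVersions:
    """Two independent counters on a work (class and method names are hypothetical)."""
    transcription_version: int = 0
    comment_version: int = 0

    def record_transcription_edit(self) -> None:
        self.transcription_version += 1

    def record_comment(self) -> None:
        self.comment_version += 1

    def print_stamp(self, include_comments: bool) -> str:
        # Transcription-only prints ignore comment_version entirely, so
        # annotations alone never make two such prints look different.
        if include_comments:
            return f"t{self.transcription_version}-c{self.comment_version}"
        return f"t{self.transcription_version}"
```

After one transcription edit and one comment, a transcription-only print is still stamped "t1", while a print with commentary is stamped "t1-c1".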


Gavin Robinson said...

You're right that that kind of approval mechanism is unnecessary for people who are working as scribes on the project. I was thinking more about ways for readers to suggest corrections after the "finished" version is published.

Ben W. Brumfield said...

Which was the context of the conversation you posted that suggestion to, come to think of it.

The notion of a finished version, or even of publishing, is still worth further investigation. My initial intent was to allow owners to publish a work for viewing once every page was completely transcribed. This didn't seem likely to encourage publishing, however, so I toyed with the idea of automatically publishing works that had around 80% of their pages transcribed.
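The automatic-publishing idea amounts to a simple threshold rule. Here is a minimal Python sketch; the function name and its `threshold_percent` parameter are hypothetical, with 80 as the default to match the figure above.

```python
def should_auto_publish(transcribed_pages: int, total_pages: int,
                        threshold_percent: int = 80) -> bool:
    """Return True once enough of a work's pages are transcribed to publish it."""
    if total_pages <= 0:
        return False  # an empty work has nothing to show
    # Integer arithmetic avoids floating-point edge cases right at the threshold.
    return transcribed_pages * 100 >= threshold_percent * total_pages
```

Under this rule, a 365-page diary transcribed one page a day would not cross the threshold until its 292nd page.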

Even that, however, doesn't work for some models. A page-a-day model, like the one followed by the successful PapasDiary and PepysDiary projects, involves careful attention to a single page at a time, without regard to the completeness of the work itself. So I'm still not sure when works should be displayed.