Sunday, March 15, 2009

Feature: Editorial Toolkit

I'm pleased to report that my cousin Linda Tucker has finished transcribing the 1919 diary. I've been trying my best to keep up with her speed, but she's able to transcribe two pages in the amount of time it takes me to edit and annotate a single, simple page. If the editing work requires more extensive research, or (worse) reveals the need to re-do several previous pages, there is really no contest. In the course of this intensive editing, I've come up with a few ideas for new features, as well as a few observations on existing features.

Show All Pages Mentioning a Subject

Currently, the article page for each subject shows a list of the pages on which the subject is mentioned. This is pretty useful, but it really doesn't serve the purposes of the reader or editor who wants to read every mention of that subject, in context. In particular, after adding links to 300 diary pages, I realized that "Paul" might be either Paul Bennett, Julia's 20-year-old grandson who is making a crop on the farm, or Paul Smith, Julia's 7-year-old grandson who lives a mile away from her and visits frequently. Determining which Paul was which was pretty easy from the context, but navigating the application to each of those 100-odd pages took several hours.

Based on this experience, I intend to add a new way of filtering the multi-page view, which would display the transcriptions of all pages that mention a subject. I've already partially developed this as a way to filter the pages within a work, but I really need to 1) see mentions across works, and 2) make this accessible from the subject article page. I am embarrassed to admit that the existing work-filtering feature is so hard to find that I'd forgotten it even existed.
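To make the idea concrete, here is a minimal sketch in Python of a cross-work subject filter. It is not the application's code: the page dictionaries, the `pages_mentioning` name, and the sample entries are all invented for illustration, and it assumes links are stored in the transcription text in the [[Subject|display]] form shown elsewhere in this post.

```python
import re

# Matches [[Subject]] and [[Subject|display text]]; group 1 is the subject title.
LINK = re.compile(r"\[\[([^\]|]+)(?:\|[^\]]*)?\]\]")

def pages_mentioning(pages, subject):
    """Return (work, page title, text) for every page that links to `subject`."""
    hits = []
    for page in pages:
        subjects = {m.group(1).strip() for m in LINK.finditer(page["text"])}
        if subject in subjects:
            hits.append((page["work"], page["title"], page["text"]))
    return hits

# Hypothetical sample data spanning two works.
pages = [
    {"work": "Work A", "title": "Jan 3", "text": "[[Paul Smith|Paul]] came to visit."},
    {"work": "Work B", "title": "Mar 9", "text": "[[Paul Bennett|Paul]] plowed."},
    {"work": "Work B", "title": "Mar 10", "text": "[[Paul Smith|Paul]] stayed all night."},
]

for work, title, text in pages_mentioning(pages, "Paul Smith"):
    print(f"{work} / {title}: {text}")
```

Because the filter keys on the subject title rather than the display text, both Pauls can appear as plain "Paul" in the diary and still be separated cleanly.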


Autolink

The Autolink feature has proven invaluable. I originally developed it to save myself the bother of typing [[Benjamin Franklin Brumfield, Sr.|Ben]] every time Julia mentioned "Ben". However, it's proven especially useful as a way of maintaining editorial consistency. If I decide that "bathing babies" is worth an index entry on one page, I may not remember that decision 100 pages later. However, if Autolink suggests [[bathing babies]] when it sees the string "bathed the baby", I'll be reminded of that. It doesn't catch every instance, but for subjects that tend to cluster (like occurrences of newborns), it really helps out.
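For readers curious how this kind of suggestion can work, here is a rough Python sketch. It is an illustration under my own assumptions, not the actual Autolink implementation: `learn_links` and `autolink` are made-up names, and the approach (remember every display string already linked, then look for the same strings in new text) is just one plausible way to do it.

```python
import re

# Group 1 is the subject title, group 2 the optional display text.
LINK = re.compile(r"\[\[([^\]|]+)(?:\|([^\]]*))?\]\]")

def learn_links(linked_texts):
    """Map each display string seen in existing links to its subject title."""
    known = {}
    for text in linked_texts:
        for m in LINK.finditer(text):
            subject = m.group(1).strip()
            display = (m.group(2) or subject).strip()
            known[display.lower()] = subject
    return known

def autolink(text, known):
    """Wrap known display strings in link markup, leaving existing links alone."""
    if not known:
        return text
    # Longest strings first, so longer phrases win over shorter ones.
    alternatives = "|".join(
        re.escape(d) for d in sorted(known, key=len, reverse=True)
    )
    # The first alternative consumes existing links so they are never rewritten.
    pattern = re.compile(
        r"\[\[.*?\]\]|\b(" + alternatives + r")\b", re.IGNORECASE
    )

    def replace(m):
        if m.group(1) is None:  # matched an existing link
            return m.group(0)
        return f"[[{known[m.group(1).lower()]}|{m.group(1)}]]"

    return pattern.sub(replace, text)

known = learn_links([
    "[[Benjamin Franklin Brumfield, Sr.|Ben]] went to town.",
    "I [[bathing babies|bathed the baby]].",
])
print(autolink("Ben came and I bathed the baby.", known))
```

An exact-string match like this explains why some instances slip through: "bathing the baby" would not be suggested until that wording had been linked at least once.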

Full Text Search

Currently there is no text search feature. Implementing one would be pretty straightforward, but in addition to that I'd like to hook in the Autolink suggester. In particular, I'd like to scan through pages I've already edited to see if I missed mentions of indexed subjects. This would be especially helpful when I decide that a subject is noteworthy halfway through editing a work.
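A sketch of that audit, again in Python and again hypothetical: `missed_mentions`, the page dictionaries, and the `known` phrase table are names I've made up for the example. The idea is to blank out the existing links and then search what remains for phrases already tied to indexed subjects.

```python
import re

# Any existing link, with or without display text.
LINK = re.compile(r"\[\[[^\]]*\]\]")

def missed_mentions(pages, known):
    """Return (page title, phrase, subject) for unlinked occurrences of known phrases."""
    results = []
    for page in pages:
        # Blank out existing links so only unlinked text is searched.
        unlinked = LINK.sub(" ", page["text"]).lower()
        for phrase, subject in known.items():
            if re.search(r"\b" + re.escape(phrase.lower()) + r"\b", unlinked):
                results.append((page["title"], phrase, subject))
    return results

# Phrases that have been linked before, mapped to their subjects.
known = {
    "bathed the baby": "bathing babies",
    "Ben": "Benjamin Franklin Brumfield, Sr.",
}

# Hypothetical pages that have already been through editing.
edited_pages = [
    {"title": "page 1",
     "text": "[[Benjamin Franklin Brumfield, Sr.|Ben]] went to town. I bathed the baby."},
    {"title": "page 2", "text": "Ben plowed."},
]

for title, phrase, subject in missed_mentions(edited_pages, known):
    print(f"{title}: '{phrase}' may refer to [[{subject}]]")
```

Run after promoting a subject to the index halfway through a work, a report like this would point straight at the earlier pages that need a second pass.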

Unannotated Page List

This is more a matter of workflow management, but I really don't have a good way to find out which pages have been transcribed but not edited or linked. It's really hard to figure out where to resume my editing.

[Update: While this blog post was in draft, I added a status indicator to the table of contents screen to flag pages with transcriptions but no subject links.]
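The rule behind such an indicator is simple enough to sketch in a few lines of Python. This is an illustration only: the status names and the `page_status` function are my own inventions for the example, and the check assumes that a page counts as linked once its text contains any [[...]] markup.

```python
def page_status(text):
    """Classify a page as 'blank', 'transcribed' (no links yet), or 'linked'."""
    if not text.strip():
        return "blank"
    if "[[" in text:
        return "linked"
    return "transcribed"

def unannotated(pages):
    """Titles of pages that have a transcription but no subject links."""
    return [p["title"] for p in pages if page_status(p["text"]) == "transcribed"]

# Hypothetical pages in three different states.
sample_pages = [
    {"title": "page 1", "text": ""},
    {"title": "page 2", "text": "Rained all day."},
    {"title": "page 3", "text": "[[Ben]] plowed."},
]

print(unannotated(sample_pages))
```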

Dual Subject Graphs/Searches

Identifying names is especially difficult when the only evidence is the text itself. In some cases I've been able to use subject graphs to search for relationships between unknown and identified people. This might be much easier if I could filter either my subject graphs or the page display to see all occurrences of subjects X and Y on the same page.
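As a sketch of the page-display half of this idea, the Python below finds every page on which two subjects are both linked. The function name `pages_with_both` and the sample pages are hypothetical, and "Unknown Visitor" is a placeholder subject, not anyone from the diaries.

```python
import re

# Group 1 is the subject title of [[Subject]] or [[Subject|display text]].
LINK = re.compile(r"\[\[([^\]|]+)(?:\|[^\]]*)?\]\]")

def subjects_on(text):
    """Set of subject titles linked in a page's text."""
    return {m.group(1).strip() for m in LINK.finditer(text)}

def pages_with_both(pages, x, y):
    """Titles of pages that link to both subject x and subject y."""
    return [p["title"] for p in pages if {x, y} <= subjects_on(p["text"])]

# Hypothetical pages; "Unknown Visitor" stands in for an unidentified person.
diary_pages = [
    {"title": "page 1",
     "text": "[[Unknown Visitor|Mrs. J.]] came with [[Paul Smith|Paul]]."},
    {"title": "page 2", "text": "[[Paul Smith|Paul]] stayed all night."},
    {"title": "page 3",
     "text": "[[Paul Smith|Paul]] and [[Unknown Visitor|Mrs. J.]] left."},
]

print(pages_with_both(diary_pages, "Unknown Visitor", "Paul Smith"))
```

Reading just those pages side by side is often enough to work out how the unknown person is connected to the known one.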

Research Credits

Now that the Julia Brumfield Diaries are public, suggestions, corrections, and research are pouring in. My aunt has telephoned old-timers to ask what "rebulking tobacco" refers to. A great-uncle has emailed with definitions of more terms, and I've had other conversations via email and telephone identifying some of the people mentioned in the text. To my horror, I find that I've got no way to attribute any of this information to those sources. At minimum, I need a large HTML acknowledgments field at the collection level. Ideally, I'd figure out an easy-to-use way to attribute article comments to individual sources.
