Monday, February 9, 2009

GoogleFight Resolves Unclear Handwriting

I've spent the last couple of weeks as a FromThePage user working seriously on annotation. This mainly involves identifying the people and events mentioned in Julia Brumfield's 1918 diary and writing short articles to appear as hyperlinked pages within the website, or to be printed as footnotes following the first mention of the subject. Although my primary resource is a descendant chart in a book of family history, I've also found Google to be surprisingly helpful for identifying people who were neighbors or acquaintances rather than relatives.

Here's a problem I ran into in the entry for June 30, 1918:

In this case, I was trying to identify the name in the middle of the photo: Bo__d Dews. The surname is a bit irregular for Julia's hand, but Dews is a common surname and occurs on the line above. In fact, this name is in the same list as another Mr. Dews, so I felt certain about the surname.

But what to make of the first name? The first two and final letters are clear and consistent: BO and D. The third letter is either an A or a U, and the fourth is either N or R. We can eliminate "Bourd" and "Boand" as unlikely phonetic spellings of any English name, leaving "Bound" and "Board". Neither of these is a very likely name... or is it?
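The four readings above come from choosing one letter at each uncertain position. Here is a minimal Python sketch of that enumeration; the `positions` list and the `candidate_readings` helper are names I made up for illustration, and nothing like this exists in FromThePage.

```python
from itertools import product

# One string per letter position. Clear letters have a single option;
# ambiguous letters list every reading the handwriting allows.
positions = ["B", "o", "au", "nr", "d"]


def candidate_readings(positions):
    """Return every spelling formed by picking one letter per position."""
    return ["".join(letters) for letters in product(*positions)]


candidates = candidate_readings(positions)
print(candidates)
```

With only two uncertain letters this is easy to do by eye, but the same approach would list every candidate for a word with more doubtful letters.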

I thought I might have some luck by comparing the number and quality of Google search results for each of "Board Dews" and "Bound Dews". This is a pretty common practice used by Wikipedia editors to determine the most common title of a subject, and is sometimes known as a "Google fight". Let's look at the results:

"Bound Dews" yields four search results. The first two are archived titles from FromThePage itself, in which I'd retained a transcription of "Bound(?) Dews" in the text. The next two are randomly-generated strings on a spammer's site. We can't really rule out "Bound Dews" as a name based on this, however.

"Board Dews" yields 104 search results. The first page of results contains one person named Board Dews, who is listed on a genealogist's site as living from 1901 to 1957, and residing in nearby Campbell County. Perhaps more intriguing are the other surnames on the site, all from the area 10 miles east of Julia's home. The second page of results contains three links to a Board Dews, born in 1901 in Pittsylvania County.

At this point, I'm certain that the Bo__d Dews in the diary must be the Board Dews who would have been a seventeen-year-old neighbor. But I'm still astonished that I can resolve a legibility problem in a local diary with a Google search.
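The comparison itself amounts to ranking phrases by result count. The sketch below assumes the counts are typed in by hand from the two searches described above; it calls no search API, and `google_fight` is a hypothetical helper name.

```python
# Result counts copied by hand from the two searches described above.
hit_counts = {"Bound Dews": 4, "Board Dews": 104}


def google_fight(hit_counts):
    """Return (phrase, count) pairs sorted by count, highest first."""
    return sorted(hit_counts.items(), key=lambda item: item[1], reverse=True)


ranking = google_fight(hit_counts)
winner, hits = ranking[0]
print(f"{winner}: {hits} results")
```

The raw count is only a hint. Two of the four "Bound Dews" hits were my own transcriptions, so the results still have to be read before trusting the winner.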

Thursday, February 5, 2009

Progress Report: Eight Months after THATCamp

It's been more than half a year since I last updated this blog. During that period, due to some events in my personal life, I was only able to spend a month or so on sustained development, but nevertheless made some real progress.

The big news is that I announced the project to some interested family members and have acquired one serious user. My cousin-in-law, Linda Tucker, has transcribed more than 60 pages of Julia Brumfield's 1919 diary since Christmas. In addition to her amazing productivity transcribing, she's explored a number of features of the software, reading most of the previously-transcribed 1918 diary, making notes and asking questions, and fighting with my zoom feature. Her enthusiasm is contagious, and her feedback -- not to mention her actual contributions -- has been invaluable.

During this period of little development, I spent a lot of time as a user. Fewer than 50 pages remain to transcribe in the 1918 diary, and I've started seriously researching the people mentioned in the diary for elaboration in footnotes. It's much easier to sustain work as a user than as a developer, since I don't need an hour or so of uninterrupted concentration to add a few links to a page.

I've also made some strides on printing. I jettisoned DocBook after too many problems and switched over to using Bruce Williams's RTeX plugin. After some limited success, I will almost certainly craft my own set of ERb templates that generate LaTeX source for PDF generation. RTeX excels in serving up inline PDF files, which is somewhat antithetical to my versioned approach. Nevertheless, without RTeX, I might never have ventured away from DocBook. Thanks go to THATCamper Adam Solove for his willingness to share some of his hard-won LaTeX expertise in this matter.

Although I'm new to LaTeX, I've got footnotes working better than they were in DocBook. I still have to deal with many of the logical issues I raised in the printing post, but I'm pretty confident I've found the right technology for printing.
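To show what footnoting only the first mention of a subject looks like in generated LaTeX, here is a rough Python sketch. It is not the FromThePage code, which would use ERb templates. The segment format, the `render_page` and `latex_escape` helpers, and the sample sentence and article text are all invented for illustration.

```python
def latex_escape(text):
    """Escape the characters LaTeX treats specially in ordinary text."""
    replacements = {
        "\\": r"\textbackslash{}",
        "&": r"\&", "%": r"\%", "$": r"\$", "#": r"\#",
        "_": r"\_", "{": r"\{", "}": r"\}",
        "~": r"\textasciitilde{}", "^": r"\textasciicircum{}",
    }
    # Escaping one character at a time avoids re-escaping earlier output.
    return "".join(replacements.get(ch, ch) for ch in text)


def render_page(segments, articles):
    """Build LaTeX source, adding a footnote at each subject's first mention.

    segments: list of (text, subject) pairs; subject is None for plain text.
    articles: dict mapping a subject name to its short article text.
    """
    seen = set()
    out = []
    for text, subject in segments:
        out.append(latex_escape(text))
        if subject is not None and subject in articles and subject not in seen:
            seen.add(subject)
            out.append(r"\footnote{" + latex_escape(articles[subject]) + "}")
    return "".join(out)


# Invented sample data, not a real diary entry.
segments = [
    ("Mr. ", None),
    ("Board Dews", "Board Dews"),
    (" came by. ", None),
    ("Board Dews", "Board Dews"),
    (" left early.", None),
]
articles = {"Board Dews": "A neighbor, born in 1901."}
latex_source = render_page(segments, articles)
print(latex_source)
```

Here the `seen` set is reset on every call. A printed volume would presumably need to carry it across pages so that a subject is footnoted once per book rather than once per page.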

I'm also working on re-implementing zoom in GSIV to replace my cobbled-together solution. The ability to pan a zoomed image has been consistently requested by all of my alpha testers and by the participants at THATCamp, and its lack is a real pain point for Linda, my first Real User. I really like the static-server approach GSIV takes, and will post when the first mock-up is done.