Carolyn Sheffield chaired a panel (video recording) on crowdsourcing which included Rob Guralnik discussing Notes From Nature, Christina Fidler talking about the Grinnell field notes on FromThePage, my talk, and a long, valuable discussion among all participants. My presentation covered the data model and uses of wiki links as I'm using them in FromThePage.
Video, slides, and transcript are below:
"From The Page" - Ben Brumfield from iDigBio on Vimeo.
I say "amateur" editions because we're not dealing with the kinds of things that textual scholars in the humanities are dealing with, where they're trying to compare different variant manuscript versions of Chaucer. [By contrast, we] have something that's very straightforward, and we're interested in some fairly simple annotations.
It's purpose-built -- free-standing on MySQL and Ruby on Rails, so it's not integrated with MediaWiki or anything like that.
[FromThePage] was built originally for a set of my great-great grandmother's diaries.
Since then it's been used for military diaries by libraries and history departments.
Here's an example: This is an 1859 journal from an expedition in which someone went out and made a number of observations and collected some things to bring back with them. There are scholars interested in mining those.
But it's not a naturalist expedition. This is Viscountess Emily Anne Smyth Strangford, who in this case is touring the Mediterranean and visiting a lot of classical monuments. The folks at the Duke Computational Classics Collaboratory are interested in finding all the places in which she recorded Latin and Greek inscriptions, coming up with her itinerary, and figuring out how [that data] connects to the objects her father-in-law had collected for the British Museum twenty years earlier.
So there's a lot of correspondence, I tend to think, with field notes.
- They've identified ten thousand subjects worth classifying in their system.
- Individual pages have been edited twenty-four thousand times. And this goes back to the wiki-like approach -- people transcribe a page, and then they revisit it. They make a number of edits to a page as they get comfortable with the handwriting.
- And then they've linked individual observations, species mentioned, and people in the field notes to those subjects forty-two thousand times.
Any of us who've edited Wikipedia may be used to this. I followed the same syntax [in FromThePage].
What we have here is a set of double square brackets with the canonical name of the subject--this could be a formatted date, this could be a full name that's spelled out--followed by a pipe and then the text that's actually used within the verbatim transcript.
So our example here -- this is when Grinnell meets Klauber. The field note actually says "L. M. Klauber", so the person transcribing has expanded this out to "Laurence M. Klauber". So we have the ability to handle variation in references to Klauber, but still identify them as Klauber.
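To make the syntax concrete, here is a minimal Python sketch of how such links could be pulled out of a transcript. It is not FromThePage's own code (which is Ruby on Rails). The function names, the sample sentence, and the support for a short form without a pipe (borrowed from Wikipedia's convention) are all assumptions made for illustration.

```python
import re

# Matches [[Canonical Name|verbatim text]]. The short form [[Canonical Name]]
# is accepted too, which is an assumption based on Wikipedia's syntax.
WIKI_LINK = re.compile(r"\[\[([^\[\]|]+)(?:\|([^\[\]]+))?\]\]")


def extract_links(transcript):
    """Return (canonical_name, verbatim_text) pairs found in a transcript."""
    links = []
    for match in WIKI_LINK.finditer(transcript):
        canonical = match.group(1).strip()
        # Without a pipe, the canonical name doubles as the verbatim text.
        verbatim = (match.group(2) or canonical).strip()
        links.append((canonical, verbatim))
    return links


def plain_text(transcript):
    """Strip the markup, leaving only the verbatim reading of the page."""
    return WIKI_LINK.sub(lambda m: (m.group(2) or m.group(1)).strip(), transcript)


# Hypothetical sample line, not a quotation from the Grinnell field notes.
line = "Met [[Laurence M. Klauber|L. M. Klauber]] this morning."
print(extract_links(line))
print(plain_text(line))
```

Keeping the verbatim text separate from the canonical name is what lets the transcript stay faithful to the page while the subject index stays consistent.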
There are a lot of tables in this database.
- We know that there's this page that Klauber is mentioned on. It's S1 Page 3 in the Grinnell field notes that MVZ has online.
- We've got a subject which is Laurence M. Klauber.
- The subject is categorized as a person, which can be used for analysis and filtering, like Christina showed you.
- And then the individual link between the page and the subject, that contains the variation, is also stored.
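The four pieces listed above can be sketched as plain Python records. This is an in-memory illustration only: the class and field names are hypothetical and are not taken from the actual FromThePage schema, which lives in MySQL tables.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Subject:
    canonical_name: str  # e.g. "Laurence M. Klauber"
    category: str        # e.g. "person"


@dataclass(frozen=True)
class Page:
    work: str   # the document the page belongs to
    title: str  # e.g. "S1 Page 3"


@dataclass(frozen=True)
class Link:
    page: Page
    subject: Subject
    display_text: str  # the variation actually written on the page


klauber = Subject("Laurence M. Klauber", "person")
page = Page("Grinnell field notes", "S1 Page 3")
links = [Link(page, klauber, "L. M. Klauber")]


def pages_mentioning(subject, links):
    """List every page that links to the given subject."""
    return [link.page for link in links if link.subject == subject]


def subjects_in_category(category, links):
    """Collect the distinct linked subjects that fall in one category."""
    return {link.subject for link in links if link.subject.category == category}


print(pages_mentioning(klauber, links))
print(subjects_in_category("person", links))
```

Storing the display text on the link rather than on the subject is the design choice that matters here: one subject can be reached through many spellings, and each spelling stays attached to the page where it occurs.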
- You can show all the pages that mention Laurence M. Klauber, and read the pages in context or just get a listing of them.
- More helpfully, as you're transcribing we can mine those links to automatically suggest mark-up. So the next time we encounter "L. M. Klauber", we can push a button and that will automatically expand the mark-up of "L. M. Klauber" to "[[Laurence M. Klauber|L. M. Klauber]]".
- You can also feed this to full-text searches. So if you've got a lot of plain-text transcripts which contain Laurence M. Klauber, we can automatically populate the search with those variations, creating an OR query with "Klauber", "L. M. Klauber".
- And then we can mine the mark-up for correspondences [between subjects] as Christina showed.
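Two of the uses above, suggesting mark-up and expanding a search, can be sketched from nothing more than the stored (canonical name, verbatim text) pairs. The Python below is a rough illustration under stated assumptions: the function names, the sample sentence, and the quoted `OR` query format are all hypothetical, and the matching is naive string matching with no word-boundary checks.

```python
import re
from collections import defaultdict

# (canonical name, verbatim text) pairs, as they might be harvested from
# links that transcribers have already made.
known_links = [
    ("Laurence M. Klauber", "L. M. Klauber"),
    ("Laurence M. Klauber", "Klauber"),
]


def suggest_markup(text, links):
    """Wrap unlinked occurrences of known verbatim forms in wiki-link markup."""
    lookup = {verbatim: canonical for canonical, verbatim in links}
    if not lookup:
        return text
    # Longest forms first, so "L. M. Klauber" wins over the bare "Klauber".
    forms = sorted(lookup, key=len, reverse=True)
    # The first alternative consumes existing links so they are left alone.
    pattern = re.compile(r"\[\[.*?\]\]|" + "|".join(re.escape(f) for f in forms))

    def replace(match):
        found = match.group(0)
        if found.startswith("[["):
            return found  # already linked
        return f"[[{lookup[found]}|{found}]]"

    return pattern.sub(replace, text)


def or_query(canonical, links):
    """Build a quoted OR query from a subject's name and its known variations."""
    variants = defaultdict(set)
    for name, verbatim in links:
        variants[name].add(verbatim)
    terms = {canonical} | variants[canonical]
    return " OR ".join(f'"{term}"' for term in sorted(terms))


# Hypothetical sample line, not a quotation from the field notes.
print(suggest_markup("Called on L. M. Klauber today.", known_links))
print(or_query("Laurence M. Klauber", known_links))
```

Running the suggestion a second time over already-linked text leaves it unchanged, which matters in a wiki-like workflow where pages are revisited and re-edited.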
We're going to be doing more CMS integrations. We're working on Omeka. The Internet Archive is done. There are a couple of grant applications that involve hooking FromThePage up to Fedora Commons.
We also really want to contextualize links in time and place. We want the ability for people to define where the person writing the journal is when they're writing, and then to apply those geotags and chronotags to the references. So you could map when species were mentioned. You could extract a visual itinerary.
We need more formatting options. One of our volunteers has found all kinds of crazy editorial issues for handling strike-outs and things like that.
And the last thing that we're looking for is more projects.