Collaborative Manuscript Transcription: 2015

On October 22-24, 2015, I was fortunate to attend the NEH/DFG-sponsored MEDEA workshop in Regensburg, Germany. The workshop gathered American and European scholars, editors, and technicians working with digital editions of financial records, an often-overlooked type of textual source. I presented along with Anna Agbe-Davies, a faculty member at the University of North Carolina-Chapel Hill, with whom I am collaborating to extend FromThePage to support tabular data within texts. You can read background on the project in our abstract on the MEDEA website.

This document is a composite of the prepared text delivered by Anna Agbe-Davies and a transcript of the ex tempore talk by Ben Brumfield. Each section will be preceded by the name of the speaker in boldface, with editorial interventions in [brackets].

Agbe-Davies:

Neither of us is an historian. I am an archaeologist and, as is usual in the US, also an anthropologist. I came to texts such as the “Slave Ledger” discussed below with a straightforward question: what were enslaved people buying in 19th-century North Carolina? In this sense, the store records complement the archaeological record, which is my primary interest. Clearly, however, these texts have additional meanings and potential for addressing much more than material culture and consumption. This is exciting for the anthropologist in me. I have experience with the methods of historical analysis, but the technological advances of the last few years mean that I have much to learn about the best techniques for harnessing the potential of such documents.

Ben and I are collaborating to extend the capabilities of his online transcription tool FromThePage, to unleash the full analytical possibilities embodied in such texts, including the archive he will now describe.

Brumfield:
I'd like to introduce the papers of Jeremiah White Graves. These are three volumes that were bound posthumously from approximately thirty notebooks with roughly 1600 pages worth of diaries, formal accounts, and informal accounts that are held at the Alderman Library at the University of Virginia and which may be accessed online at http://tinyurl.com/JWGravesPapers in facsimile edition.

Jeremiah White Graves moved from Louisa County, Virginia to Pittsylvania County, Virginia when he was fifteen years old. In 1823, at the age of 22, using the skills that he learned as a store clerk, he began keeping accounts on his own. These accounts cover his activities trading with his neighbors, but primarily [cover his activities as] a plantation owner. He acquired the plantation of Aspen Grove, as well as inheriting other plantations. Aspen Grove is 120 kilometers north of Stagville, which is the plantation that Anna will discuss, and--like Stagville Plantation--it primarily produced tobacco crops for cash through the work of enslaved laborers.

Some of his accounts are formal. These may look very familiar to many of you. These are how he started his accounts in 1822, but he soon found that a formal accounting system did not serve his needs very well.

He started keeping informal accounts to track other activities, such as (in this case) visits by a doctor to treat members of his household, both slave and free. These informal accounts also cover shipments of logs, corn, or cotton to mills. They cover days his children attended school. They also cover articles of clothing his children took with them to boarding schools.

One of the most interesting things about these accounts is the light they shed on the relationship between Graves and his enslaved laborers, and the relationships among them and the rest of the community. One of the challenges of the accounts is that they have a very complex topology. Because the accounts are informal, accounts will be written in separate, unrelated [inaudible].

In this case, we have a two-entry account between Graves and "my Henry"--who is one of his primary enslaved laborers--who he loans money to. Henry then pays him back. So we have two entries in this account.

This account is stuck between shipments of cotton and logs to mills in a previous year, sticks of tobacco [stripped], a later account of tobacco cut in fields, and then a much earlier account of tobacco [stripped] in prize barns.

You see a similar challenge over here [points to second page], where--over the intriguing entries on meat sent to laborers at Aspen Grove Plantation from a different plantation--you find this fascinating account with entries between "my Frederic" (another one of Graves's laborers) and Graves. One of the fascinating things about this account is that Frederic dies, and--in one of the only instances in which Graves records women in his informal accounts--Graves settles his account with Malissa, Frederic's enslaved widow.

Another challenge of the accounts is that they have a complex order. Graves began his notebooks with diary entries from front to back. He would write his accounts from back to front. Then when they met in the middle, he would start a new book, [though] sometimes returning to the older books.

As you see here, we have a four-year-long account that starts on the second-to-last page of the book, continues on page 18, then on page 17, and finally finishes up on page 5 of the volume.

While these accounts are complex, they are not unique, so I will hand this over to Anna.

Agbe-Davies:
Stagville was the founding farm for a vast plantation complex assembled by several generations of Bennehans and Camerons¹. A local historian estimates that at their most powerful, the family owned around 900 men, women, and children (Anderson 1985:95). Some of the people at Stagville stayed on after Emancipation, allowing for a fascinating glimpse of the transition from slavery to tenancy and wage labor.

1. The Bennehan/Cameron holdings included nearly 20,000 acres in Durham, Wake, and Granville Counties in 1890 (McDuffie 1890). Anderson estimated a peak 30,000 acres along the Flat, Eno, and Neuse Rivers, not to mention thousands more in western NC, plantations in Alabama and Mississippi, as well as residences in the county seat and the state capital.

Daybooks and ledgers from plantation stores owned by the Bennehan-Cameron family cover the years 1773 to 1895. Many of the men and women whose purchases are recorded therein were the family’s chattel property and, in later years, their tenants or employees. There are forty-five daybooks and twenty ledgers in the family papers, which are collected in the University of North Carolina’s Southern Historical Collection². Eleven volumes are flagged by the finding aid as including purchases by “slaves” or “farm laborers,” though many volumes have no summary and may contain as-yet unidentified African American consumers.

2. In addition to the daybooks and ledgers, there are also cash books, books of ready money sales, and personal/household account books, numbering 142 “financial volumes.” http://www2.lib.unc.edu/mss/inv/c/Cameron_Family.html#

My aim is to digitize and analyze a selection of these daybooks and ledgers. This project augments the Southern Historical Collection’s effort to make important manuscripts available via the Internet. My project not only increases the number of volumes online and in a format that enables analysis by users with varying levels of expertise, but makes the contents of these documents available as data, not merely images.

One of the questions guiding my research is this: What did it mean to shop in a store, if you yourself can be bought and sold? I am interested in both the financial and social aspects of accounting in the plantation context. Daybooks and ledgers offer an important complement to the archaeological record at Historic Stagville, in Durham, North Carolina.

[omitted from the oral presentation:] Archaeologists can speculate about, but seldom demonstrate, the paths by which goods reached the quarter. Artifacts may reflect the actions of the owner who issued clothing or tools and passed along hand-me-downs. Conversely, finds may speak to the agency of the owned, as when they hunted or grew food for their own consumption or purchased items of personal adornment with cash earned on the side. However, neither interpretation is evident in the artifacts themselves. Archaeologists need additional sources of information because these distinctions have implications for how we view material aspects of the relationship between owner and owned—how power was wielded, how demands were negotiated. The daybooks and ledgers are one way in which to capture how African American consumers at Stagville—pre-Emancipation and during the years of Jim Crow—fashioned lives with the things that they bought.

Brumfield: What we plan to do is to use the open-source digital edition tool FromThePage--which I run, though I welcome contributions from anyone else--to digitize these documents -- to transcribe them.

FromThePage already handles transcription and presentation online. The core functionality of FromThePage is the wiki-link. FromThePage handles mark-up using a wiki syntax that is backed by a relational database that suggests mark-up. So if a user sees the phrase "Renan" and transcribes it, it is then expanded to the canonical name "Renan, Virginia".
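The wiki-link idea can be sketched in a few lines of Python. This is only an illustration, not FromThePage's own code: it assumes a MediaWiki-style syntax of the form `[[Canonical Subject|verbatim text]]`, and the function names are invented for the example.

```python
import re

# Assumed link syntax: [[Canonical Subject|verbatim text]] or [[Subject]].
WIKI_LINK = re.compile(r"\[\[([^\[\]|]+)(?:\|([^\[\]]+))?\]\]")


def parse_links(text):
    """Return (canonical, verbatim) pairs for every wiki-link in text."""
    pairs = []
    for match in WIKI_LINK.finditer(text):
        canonical = match.group(1).strip()
        # A link with no verbatim part displays its canonical name.
        verbatim = (match.group(2) or canonical).strip()
        pairs.append((canonical, verbatim))
    return pairs


def verbatim_text(text):
    """Strip the mark-up, leaving only what appears on the manuscript page."""
    return WIKI_LINK.sub(lambda m: (m.group(2) or m.group(1)).strip(), text)


def build_index(pages):
    """Map each canonical subject to the sorted page labels that mention it."""
    index = {}
    for label, text in pages.items():
        for canonical, _ in parse_links(text):
            index.setdefault(canonical, set()).add(label)
    return {subject: sorted(labels) for subject, labels in index.items()}
```

Given a transcribed line such as `Sent logs to the mill at [[Renan, Virginia|Renan]]`, `parse_links` pairs the canonical subject with the verbatim spelling, and `build_index` produces the kind of subject index described above.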

This is used for presentation: users who see Renan can see the explanation. If they explore the subject, they can see an automatically-generated index.

What we plan to do--now we're moving to the draft design--is to add new wiki mark-up to handle sections that will define different blocks within the text. Continuing in that vein, we plan to use Markdown wiki mark-up to describe tables. This addresses data entry. (We're not big fans of hand-coded XML as a user interface; hand-coded wiki? We'll see how that works.)
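As a rough sketch of what reading such mark-up might involve, the Python below parses pipe-delimited Markdown tables grouped under section headings. Everything specific here is an assumption made for illustration: the `## ` section marker, the `[[canonical|verbatim]]` link form, and the function names. One wrinkle it handles is that a pipe inside a wiki-link must not be treated as a column break.

```python
import re

# Split on pipes that are NOT inside a [[canonical|verbatim]] wiki-link.
CELL_BREAK = re.compile(r"\|(?![^\[]*\]\])")
# A Markdown separator cell looks like ---, :---, or ---:
SEPARATOR_CELL = re.compile(r":?-+:?")


def split_row(line):
    """Split one table row into trimmed cell strings."""
    body = line.strip()
    if body.startswith("|"):
        body = body[1:]
    if body.endswith("|"):
        body = body[:-1]
    return [cell.strip() for cell in CELL_BREAK.split(body)]


def parse_tables(text):
    """Return a list of tables, each with its section, headers, and rows."""
    tables, section, current = [], None, None
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("## "):  # hypothetical section marker
            section, current = stripped[3:].strip(), None
        elif stripped.startswith("|"):
            cells = split_row(stripped)
            if all(SEPARATOR_CELL.fullmatch(cell) for cell in cells):
                continue  # skip the |---|---| line under the header
            if current is None:
                # The first row of a table supplies the column headers.
                current = {"section": section, "headers": cells, "rows": []}
                tables.append(current)
            else:
                current["rows"].append(dict(zip(current["headers"], cells)))
        else:
            current = None  # any non-table line ends the current table
    return tables
```

Each parsed row is a dictionary keyed by column header, which is a convenient shape for the display and export steps that follow.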

But what's important and relevant here is that this [mark-up] is interpreted by the software and then displayed: for HTML, we display simple HTML tables. For TEI, we'll expand to TEI tables, with the wiki-links expanded using A tags for HTML or reference strings to elements within the TEI header.
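The HTML half of that display step might look something like the following Python sketch. It is not the FromThePage renderer: the `/subjects/` URL path, the `[[canonical|verbatim]]` link form, and the function names are all assumptions, and the TEI output is left out.

```python
import html
import re
from urllib.parse import quote

# Assumed link syntax: [[Canonical Subject|verbatim text]] or [[Subject]].
WIKI_LINK = re.compile(r"\[\[([^\[\]|]+)(?:\|([^\[\]]+))?\]\]")


def cell_to_html(cell, base_url="/subjects/"):  # base_url is a made-up path
    """Escape a cell's text and turn each wiki-link into an A tag."""
    out, pos = [], 0
    for m in WIKI_LINK.finditer(cell):
        out.append(html.escape(cell[pos:m.start()]))
        canonical = m.group(1).strip()
        verbatim = (m.group(2) or canonical).strip()
        href = base_url + quote(canonical)
        out.append(
            f'<a href="{href}" title="{html.escape(canonical)}">'
            f"{html.escape(verbatim)}</a>"
        )
        pos = m.end()
    out.append(html.escape(cell[pos:]))
    return "".join(out)


def render_html_table(headers, rows):
    """Render headers and rows (lists of cell strings) as a plain HTML table."""
    head = "".join(f"<th>{html.escape(h)}</th>" for h in headers)
    body = "".join(
        "<tr>" + "".join(f"<td>{cell_to_html(c)}</td>" for c in row) + "</tr>"
        for row in rows
    )
    return f"<table><tr>{head}</tr>{body}</table>"
```

The reader sees the verbatim text in the table cell, while the link target and title carry the canonical subject.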

We have further ideas for exports -- I'm very interested to see other presentations for ideas for those.

However, to serve Anna's analytical needs, we need to export these tables in CSV format. So what we have designed is the ability to export all records from the collection in a single spreadsheet. The spreadsheet will be sparse, so that entries from different tables that contained the same column header when they were encoded will appear in the same column on the spreadsheet. If one table contains an extra column that other tables did not, that will appear in the final spreadsheet, but tables that did not contain that column will [have blank cells] in the spreadsheet. We also plan to expand the data columns to handle the wiki text, so that both canonical subjects and verbatim text will be included.
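A minimal Python sketch of this sparse export follows, under some assumptions: each table arrives as a list of rows keyed by column header, wiki-links use the `[[canonical|verbatim]]` form, and the extra column holding canonical subjects is named with a `" (subject)"` suffix invented for this example.

```python
import csv
import io
import re

# Assumed link syntax: [[Canonical Subject|verbatim text]] or [[Subject]].
WIKI_LINK = re.compile(r"\[\[([^\[\]|]+)(?:\|([^\[\]]+))?\]\]")


def expand_links(row):
    """Keep verbatim text in each column; add a '<column> (subject)' column
    holding the canonical subjects wherever the cell contained wiki-links."""
    expanded = {}
    for header, cell in row.items():
        links = WIKI_LINK.findall(cell)
        expanded[header] = WIKI_LINK.sub(lambda m: m.group(2) or m.group(1), cell)
        if links:
            expanded[header + " (subject)"] = "; ".join(c.strip() for c, _ in links)
    return expanded


def export_sparse_csv(tables):
    """tables: list of tables, each a list of dict rows keyed by column header.
    Columns with the same header are merged; missing cells are left blank."""
    all_rows = [expand_links(row) for rows in tables for row in rows]
    columns = []
    for row in all_rows:
        for header in row:
            if header not in columns:
                columns.append(header)  # preserve first-seen column order
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, restval="", lineterminator="\n")
    writer.writeheader()
    writer.writerows(all_rows)
    return buffer.getvalue()
```

Two tables that share "Date" and "Item" columns but differ in a third column end up in one spreadsheet, with blank cells wherever a table lacked a column.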

Agbe-Davies: I have transcribed one document called the “Slave Ledger,” but have found the result to be inadequate for the analyses I would like to perform. The combination of qualitative and quantitative research goals means that neither a transcription nor a spreadsheet can handle the range of analyses necessary.

The many goods listed in the document (spelled variously) need to be categorized in several ways. Sometimes they are purchases; other times, sources of credit. I would like to be able to find not only instances of “shoes” but also other instances of “footwear” and “clothing” and “goods made by other members of the plantation community.” Not to mention being able, in various circumstances, either to merge or to separate “shoes” from “repair of shoes.”

Another form of analysis enabled by tags is pulling out purchases by a single canonical individual, even when different names are used. Using my transcription of the Slave Ledger, I still had to pick out individuals for this chart by hand because no text search would pull out all and only references to Frank Kinnon, when there are multiple “Frank”s and his second name appears with several different spellings and grammatical constructions³.

As this slide also shows, the ability to pull together records by categories—with those categories being multiscalar—is important for the quantitative analyses that I perform. In order to examine both trends and change over time, I will be performing analyses within, across, and among manuscripts. Thus, these tags should live somewhere outside any single document.
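One way to picture multiscalar categories that live outside any single document is a small registry mapping each subject to its broader categories. The Python sketch below is hypothetical throughout: the registry contents, the "services" category, and the record shape are invented for illustration.

```python
# Hypothetical subject registry kept outside any single manuscript:
# each subject names the broader categories it belongs to.
PARENTS = {
    "shoes": ["footwear"],
    "repair of shoes": ["footwear", "services"],  # "services" is invented here
    "footwear": ["clothing"],
}


def ancestors(subject, parents=PARENTS):
    """Return every broader category reachable from a subject."""
    seen, stack = set(), [subject]
    while stack:
        for parent in parents.get(stack.pop(), []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def in_category(records, category, parents=PARENTS):
    """Select records whose canonical subject is the category or falls under it."""
    return [
        record
        for record in records
        if record["subject"] == category
        or category in ancestors(record["subject"], parents)
    ]
```

Querying for "clothing" would then gather both "shoes" and "repair of shoes", while querying for "shoes" alone keeps them separate, which is the merge-or-separate choice described above.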

I will be examining how people spent precious cash or credit to determine whether gaps were left by the provisioning system during slavery times. If the Bennehans’ and Camerons’ human property regularly purchased basic staples it would offer an interesting contrast to the paternalistic, “enlightened” slaveowner of their own imaginations (Anderson 1985:96). In addition, I want to know whether people on the Bennehan-Cameron farms were making similar purchases to folks elsewhere in the plantation South (Heath 2004; Martin 1993). Also what (dis)continuities exist between the pre- and post-Emancipation eras, as households assumed greater responsibility for their own sustenance?

3. For example, Frank Kinnon, Kennon Frank, and Frank Kennon, not to be confused with Old Frank/Old Frank Eno.

Because I am not an expert on account books, I don’t know how unusual this is, but in the Stagville accounts I am finding many instances of debtors trading credits among themselves, using them as a kind of currency unconnected to store purchases. There are also instances of someone buying an item for another debtor, and even instances of cooperative purchases or credits. Again, these don’t fit neatly into a standardized recording structure, hence the need for something that is more flexible than a database or spreadsheet, but which nevertheless retains some of the qualities of those kinds of documents. I am as interested in Solomon’s relations with Britain, Mark, Sam, and Ben, as I am in his relationship to R. Bennehan & Son.

At the moment, I have to choose between capturing the qualities of this text as a physical document, or capturing the information that the text contains. It is doubtless significant that Ned’s and Miller George’s entries are off-set here. I don’t want to lose this information in an effort to fit these transactions into a one-size-fits-all structure, such as a database. Likewise, some accounts (like Davy’s, here) are reconciled frequently, others run for long periods of time without a full accounting of what is owed or credited. It will be important to be able to record interim calculations as well as individual debits and credits.
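To illustrate why recording interim calculations alongside individual entries could be useful, here is a small Python sketch that recomputes a running balance and flags any recorded interim total that disagrees with it. The entry format is an assumption made for this example, and it supposes decimal currency amounts, which will not hold for every volume.

```python
from decimal import Decimal


def check_interim_totals(entries):
    """entries: list of (kind, amount) pairs, where kind is 'debit', 'credit',
    or 'balance' (an interim total written in the account itself).
    Returns the final running balance and a list of
    (index, recorded, computed) tuples for interim totals that disagree."""
    running = Decimal("0")
    mismatches = []
    for index, (kind, amount) in enumerate(entries):
        value = Decimal(amount)
        if kind == "debit":
            running += value  # the account holder owes more
        elif kind == "credit":
            running -= value  # the account holder has paid or earned credit
        elif kind == "balance":
            if value != running:
                mismatches.append((index, value, running))
        else:
            raise ValueError(f"unknown entry kind: {kind!r}")
    return running, mismatches
```

Keeping the recorded totals as their own entries, rather than overwriting them with computed ones, preserves what the clerk wrote while still exposing arithmetic discrepancies.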

Once digitized, the resulting product will allow users easily to identify seasonal patterns in purchasing, follow individual shoppers, or discover the popularity of store-bought clothing over time, for example. Such resources can reach audiences with different levels of expertise or interest and provide them with rich, attractive materials for their own use, or let them explore the end result as a virtual museum to complement the physical museum experience. Users could easily search on characteristics of the transactions, such as individual account holder, item, or date, to independently answer their own questions about plantation life and modern consumerism. This exploration may even take place on-site. Historic Stagville has had great success with their genealogical database and the staff and board are eager to work together to develop more resources to share with their visitors and other stakeholders, such as the Stagville Descendants Council, an African American heritage group.

My aim is to open transcription up to include friends of, and visitors to, Stagville State Historic Site. My time in the museum world largely predates the blossoming of the digital humanities, but I do know how compelling interactive experiences can be, and that audiences understand and appreciate knowledge so much more when they have a hand in its creation (Smith 2014).

There is no conclusion. This project is an ongoing effort and we feel fortunate to engage with a community of like-minded researchers before we finalize the protocols for transcription and before Ben does additional programming for FromThePage. We have come to this meeting to learn from the successes, mistakes, and experience of others and look forward to many fruitful exchanges with you all.

WORKS CITED

Anderson, Jean Bradley

1985 Piedmont Plantation: the Bennehan-Cameron family and lands in North Carolina. Durham, North Carolina: Historic Preservation Society of Durham.

Heath, Barbara J.

2004 Engendering Choice: Slavery and Consumerism in Central Virginia. In Engendering African American Archaeology: A Southern Perspective. J.E. Galle and A.L. Young, eds. Pp. 19-38. Knoxville: The University of Tennessee Press.

Martin, Ann Smart

1993 Buying into the world of goods: Eighteenth-century consumerism and the retail trade from London to the Virginia frontier. Ph.D. dissertation, History, The College of William and Mary.

McDuffie, D. G.

1890 Map of Honorable Paul C. Cameron's Land on Flat, Eno, and Neuse Rivers in Durham, Wake, and Granville Counties, March 1890. http://dc.lib.unc.edu/cdm/singleitem/collection/00133/id/12258: Manuscript map in the Southern Historical Collection, University of North Carolina at Chapel Hill.

Smith, Monica L.

2014 Citizen Science in Archaeology. American Antiquity 79(4):749-762.

This is the text of my talk at the best practices panel at the Crowd Consortium for Libraries and Archives meeting Engaging the Public on May 8, 2015.

One caveat: most of my background is in crowdsourced manuscript transcription, though with the development of FromThePage 2 I've become involved in the related fields of collaborative document translation and crowd-sourced OCR correction. I hope that this is useful to non-textual projects as well.

The best practice I'd like to talk about is returning the product of crowd-sourcing to the volunteers that produced it.

What do I mean by product?

I'm not talking about what project managers consider the final product, whether that be item-level finding aids or peer-reviewed papers in the scholarly press. I'm talking about the raw product – the actual work that comes out of a volunteer's direct effort, or the efforts of their fellow volunteers – the transcript of a letter, the corrected text of a newspaper article, the translated photo captions, the carefully researched footnotes and often personal comments left on pages.

Why?

First, it's the right thing to do. Yesterday we talked about reciprocity and social justice. An older text says “Thou shalt not muzzle the oxen that tread out the corn.”

Crowdsourced transcription projects vary a lot on this. For wiki-like systems, displaying volunteer transcripts is built into the system – I know that's the case for FromThePage, TranscribeBentham and WikiSource, and suspect the same applies to Scripto and DIYHistory. For others, users can't even see their own contributions after they have submitted them. However, the Smithsonian Institution Transcription Center actually added this feature on purpose – the team implementing the center added the ability for users to download PDFs of transcribed documents specifically because they felt it was the Right Thing to Do.

Now that I've quoted the Bible, let's talk about purely instrumental reasons crowdsourcing projects should return volunteers' labor to them.

Incentives

For one thing, exposing the raw data early can better align our projects with the incentives that motivate many volunteers. Most volunteers are not participating because of their affiliation with an institution, nor because they treasure clean library metadata – at least not primarily! What keeps them coming back and contributing is their connection to the material – an intrinsic motivation of experiencing life as a bird-watcher in the 1920s, of marching alongside a Civil War soldier as they transcribe observation cards or diaries.

We should expose the texts volunteers have worked on in ways that are immediately usable to them – PDFs they can print out, texts they can email, URLs they can post on Facebook—to show their friends and families just what they've been up to, and why they're so excited to volunteer.

In some cases this may provide extrinsic rewards project managers can't envision. One of the first projects I worked on, the Zenas Matthews diary of the Mexican-American War, attracted a super-volunteer early on who transcribed the entire diary in two weeks. When I interviewed Scott Patrick, I learned that the biggest reward we could provide – the thing he'd treasure above badges or leader boards – would be the text itself in a printable and publishable format. You see, Mr. Patrick's heritage organization formally recognizes members who have written books, including editions of primary sources. His contribution to the project certainly matched his fellows' for quality, but access to a usable form of the text – the text he'd transcribed himself – was the thing that stood in his way.

Recruitment

Exposing raw transcripts online during the crowdsourcing process can actually enhance recruitment to crowd-sourcing projects. I've seen this in a personal project I worked on, in which one super-volunteer found the project by Googling his own name. You see, a previous volunteer had transcribed a lot of material that mentioned a letter carrier named Nat Wooding. So when Nat Wooding did a vanity search, he found the transcribed diaries, recognized the letter carrier as his great-uncle, and became a major contributor to the project. Had the user-generated transcripts been locked away for expert review, or even published online somewhere outside of the crowdsourcing tool, we would have missed the contributions of a new super-volunteer.

Engagement

For the past three years, I've been involved with a non-profit called Free UK Genealogy. They have volunteers around the world transcribe genealogical records using offline, spreadsheet-like tools so that they can be searched on a freely accessible website.

I spent several months building a new system for crowd-sourced transcription of parish registers, but encountered very little enthusiasm—actually some outright opposition—from the most active volunteers. They were used to their spreadsheets, and saw no value at all to changing what they were doing.

Eventually, we switched from improving the transcription tool-chain to improving the delivery system. We re-wrote the public-facing search engine from scratch, focusing on the product visible to the volunteers and their communities. When we launched the site in April, it received the most positive reviews of any software redesign I've been involved with in two decades in the industry. Best of all – although the time frame is too short to have hard numbers – the volunteer community seems to have been reinvigorated, as the FreeREG2 database passed 32 million records at the beginning of the month.

So that's my best practice: expose volunteer contributions online, within your crowdsourcing system, as they are produced. It will improve the quality and productivity of the project, and it's the right thing to do.

Sunday, October 25, 2015

Encoding Account Books Relating to Slavery in the U.S. South at MEDEA Regensburg

Tuesday, May 19, 2015

Day of DH 2015

Friday, May 8, 2015

Best Practices at Engaging the Public at CCLA
