The open questions I have regard the markup itself and its display. The TEI manuscript transcription guidelines refer to an
I've been persuaded by Papa's Diary that probable readings are tremendously useful, so I should probably rephrase
unclear. I want to encourage scribes to be as free as possible with the tags, increasing the transparency of their transcriptions. In fact, the Papa's Diary usage (in which Hebrew is displayed as an image, but transcribed into Latin characters) makes me think that unclear is not sufficiently generalized. I may need to either come up with other tags for image links, or generalize unclear into something with a different name.
Implementing the image manipulation code would not be difficult, except that a lot of work needs to be done in the UI, which is not my strength.
Scribe hits an 'unclear' icon on the page transcription screen.
- This changes their mouse pointer to something that looks appropriate for cropping.
- They click on the image and start dragging the mouse.
- This displays a dashed rectangle on the image that disappears on mouseUp.
- The mouse pointer changes to a waiting state until a popup appears displaying the cropped image and requesting text to insert.
- A paragraph of explanatory text would say that the text entry field should contain a possible reading for the unclear text.
- The text entry field defaults to "illegible", so users with low-legibility texts will be able to just draw rectangles and hit enter when the popup appears.
- Hitting OK inserts a tag into the transcription. Hitting cancel returns the user to their transcription.
Clicking the 'unclear' icon triggers an 'I am cropping now' flag on the browser and turns off zoom/unzoom.
- onMouseDown sets a begin coordinate variable
- onMouseUp sets an end coordinate variable, launches an AJAX request to the server, and sets the pointer state to waiting.
- The server receives a request of the form 'create an illegible tag for page X with this start XY and this end XY'.
- It loads up the largest resolution image of that manuscript page, transposes the coordinates if the display image was zoomed, then crops that section of the image.
- A new record is inserted into the cropped_image table for the page and the cropped image itself is saved as a file.
- RJS tells the browser to display the tag generation dialog.
- The tag generation dialog inserts the markup <illeg image="image_id.jpg">illegible</illeg> at the end of the transcription.
- When the transcription is saved, we parse the transcription looking for illegible tags, deleting any unused tags from the database and updating the db entries with the user-entered text.
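Two of the server-side steps above are small enough to sketch in plain Ruby: mapping the dragged rectangle from the displayed image onto the full-resolution image, and reconciling the tags in a saved transcription with the stored crops. This is only a rough sketch. The method names crop_box and sync_illeg_tags are hypothetical, and it assumes the zoom is a uniform scale factor and that the tag always has the illeg form shown above.

```ruby
# Hypothetical helpers; none of these names come from the design above.

# Matches <illeg image="file.jpg">reading</illeg> and captures both parts.
ILLEG_TAG = /<illeg image="([^"]+)">(.*?)<\/illeg>/m

# Normalise a drag (which may run in any direction) and scale it from
# display-image pixels to full-resolution pixels.
# Assumes the same scale factor applies to both axes.
def crop_box(start_xy, end_xy, display_width, full_width)
  scale = full_width.to_f / display_width
  xs = [start_xy[0], end_xy[0]].sort
  ys = [start_xy[1], end_xy[1]].sort
  {
    x: (xs[0] * scale).round,
    y: (ys[0] * scale).round,
    width: ((xs[1] - xs[0]) * scale).round,
    height: ((ys[1] - ys[0]) * scale).round
  }
end

# Returns [readings, unused]. readings maps each image filename found in the
# transcription to the text the user entered; unused lists stored filenames
# that no longer appear in the transcription and could be deleted.
def sync_illeg_tags(transcription, stored_filenames)
  readings = transcription.scan(ILLEG_TAG).to_h
  unused = stored_filenames - readings.keys
  [readings, unused]
end
```

The actual cropping and the database updates are left out; the hash returned by crop_box would be handed to whatever image library does the crop.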
I'll need a new cropped_image table with a foreign key to page, a string field for the display text, a filename for the image, and perhaps a sort order. My existing models will change as follows to support the new relationship:
cropped_image belongs_to :page
page has_many :cropped_images
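To make the shape of that relationship concrete, here is a plain-Ruby stand-in that runs without Rails. In the application these would presumably be ActiveRecord models; the Struct, the add_cropped_image method, and the default sort order are assumptions made for illustration only.

```ruby
# Plain-Ruby stand-ins for the two models. Field names follow the table
# described above; everything else is hypothetical.
CroppedImage = Struct.new(:page, :display_text, :filename, :sort_order,
                          keyword_init: true)

class Page
  attr_reader :title

  def initialize(title)
    @title = title
    @cropped_images = []
  end

  # The has_many side: adding a crop also sets its back-reference to the
  # page, which plays the role of belongs_to.
  def add_cropped_image(display_text:, filename:,
                        sort_order: @cropped_images.size + 1)
    image = CroppedImage.new(page: self, display_text: display_text,
                             filename: filename, sort_order: sort_order)
    @cropped_images << image
    image
  end

  # Crops come back in sort order, not insertion order.
  def cropped_images
    @cropped_images.sort_by(&:sort_order)
  end
end
```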
At display time, the illegible tag is transformed into marked-up HTML. The contents of the HTML should be whatever the user entered, but a formatting difference should indicate that the transcription is uncertain -- probably italicizing the text would work. The cropped images need to be accessible to viewers -- editorial transparency is after all the point of image-based transcription. I'm tempted to just display any cropped images at the bottom of the transcription, above the user-annotations. I could then put in-page anchor tags around the unclear text elements, directing the user to the linked image.
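One way the display transformation might look, as a minimal sketch: each illeg tag becomes italicized text wrapped in an in-page link, and the cropped images are collected into a block that can be placed below the transcription. The method name render_transcription, the /images/cropped path, the crop-N anchor ids, and the "unclear" CSS class are all assumptions, and the sketch escapes only the reading text, not the rest of the transcription.

```ruby
ILLEG_MARKUP = /<illeg image="([^"]+)">(.*?)<\/illeg>/m
HTML_ESCAPES = { "&" => "&amp;", "<" => "&lt;", ">" => "&gt;", '"' => "&quot;" }.freeze

# Minimal HTML escaping for text dropped into markup.
def escape_html(text)
  text.gsub(/[&<>"]/, HTML_ESCAPES)
end

# Returns [body_html, images_html]. Uncertain readings are italicized and
# linked to an anchor on the matching cropped image.
def render_transcription(transcription, image_dir = "/images/cropped")
  crops = []
  body = transcription.gsub(ILLEG_MARKUP) do
    filename, reading = $1, $2
    crops << filename
    %(<a href="#crop-#{crops.size}"><em class="unclear">#{escape_html(reading)}</em></a>)
  end
  images = crops.each_with_index.map do |filename, i|
    %(<img id="crop-#{i + 1}" src="#{image_dir}/#{escape_html(filename)}" alt="cropped image #{i + 1}">)
  end
  [body, images.join("\n")]
end
```

Numbering the anchors by order of appearance keeps the links stable even if the same crop is referenced twice, at the cost of showing that image twice.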
The idea here is to display the cropped images along with the transcription as footnotes. The print version of a transcription is unlikely to include entire page images, so these footnotes would expose the scribe's decisions to the reader's evaluation. In this case I'd want to render the unclear text with actual footnote numbers corresponding to the cropped images. Perhaps the printer should also have the option of printing an appendix with all cropped images, their readings, and the pages on which they appear.
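A sketch of how the footnote numbering and the appendix could be produced in one pass. It assumes footnotes are numbered continuously across the whole work rather than restarting on each page, and that a bracketed number is an acceptable stand-in for a real superscript; the name number_footnotes and the shape of its input are hypothetical.

```ruby
ILLEG_FOR_PRINT = /<illeg image="([^"]+)">(.*?)<\/illeg>/m

# pages: array of [page_title, transcription] pairs.
# Returns [printable_pages, appendix]. Each appendix entry records the
# footnote number, the cropped image, the scribe's reading, and the page.
def number_footnotes(pages)
  appendix = []
  printable = pages.map do |title, transcription|
    text = transcription.gsub(ILLEG_FOR_PRINT) do
      filename, reading = $1, $2
      appendix << { number: appendix.size + 1, image: filename,
                    reading: reading, page: title }
      "#{reading}[#{appendix.size}]"
    end
    [title, text]
  end
  [printable, appendix]
end
```

The same appendix array could drive both the per-page footnotes and the optional end-of-book listing.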