April 11, 2022

7 Problems to Expect when You're Transcribing Historical Data and How to Avoid Them

By Dan Howlett and Emily Meyers in workflows tagged howlett, meyers, transcription

So you want to start transcribing data from historical documents? The task seems easy! However, quite a few issues can pop up and cause problems for other parts of the project. Below are some of the errors our transcribers on Death By Numbers frequently run into, along with tips on how to handle them. The job may sound intimidating with all the potential pitfalls, but we suggest solutions drawn from the tips and tricks our team has picked up over the past few months.

Problem Number 1: Forgetting about the Data

Solution: Be Familiar with Your Data

Before getting too far ahead of yourself, familiarize yourself with your transcription project. Our project already has several blog posts on the London Bills of Mortality, like the Bills 101, to assist new transcribers. If you don’t know what you’re transcribing, you might miss key pieces of information. We primarily transcribe two columns of data: the plague deaths in each parish and the non-plague burials. Without that background, you might overlook a column or skip entries. When the project includes thousands of formulaic documents, it’s easy for eyes to glaze over sections or to forget to double-check a once-in-one-hundred entry.

Our project has a failsafe for these cases. Not only is our team made up of transcribers and reviewers who both check each Bill, but at the bottom of each geographic section of the Bills, a separate summary total of non-plague deaths and plague deaths reminds our transcribers we need to add a variable. If the Bill says there are 10 plague deaths, and you didn’t transcribe any, you might have missed something. Not every dataset will have built-in reminders, so it’s important that transcribers understand what they’re copying.

Problem Number 2: Inconsiderate Naming Conventions By Historical Actors

Solution: Anticipate and Keep a Running List of Likely Mixups

If the Bills of Mortality transcription team had a time machine, our first stop wouldn’t be to see a Shakespeare play or to save documents from the 1666 fire. We’d go to rename London parishes. We transcribe data based on the Bills’ list of parishes, but 14 St. Mary-somethings are confusing. Even worse, St. Botolph without Aldersgate, St. Botolph without Aldgate, and St. Botolph without Bishopsgate follow each other on the list. You can learn more about parish naming confusion in our most recent blog post, A Parish By Any Other Name.

It’s easy to mix up names, and it happens to every transcriber. This is why transcribers need to be familiar with the sources and why careful reviewers are essential. Each project will have sneaky mix-ups and subtle nuances. Many of these transcription errors can be anticipated, and while forewarning new transcribers will not stop every mistake, maintaining a running list of potential jumbles can make transcription easier.


Figure 1. Excerpt from a bill of mortality showing all the St Mary parishes within the city walls.
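One way to seed a running list of likely mix-ups is to let a short script suggest which names look alike. The sketch below uses `difflib` from Python’s standard library to flag pairs of parish names whose spelling is very similar. The function name `likely_mixups`, the 0.8 threshold, and the short sample list are assumptions made for illustration; they are not part of the Death By Numbers workflow or of DataScribe.

```python
from difflib import SequenceMatcher
from itertools import combinations


def likely_mixups(names, threshold=0.8):
    """Return pairs of names whose spelling is similar enough to confuse.

    The threshold is an arbitrary starting point (an assumption); tune it
    against the names in your own sources.
    """
    pairs = []
    for a, b in combinations(names, 2):
        score = SequenceMatcher(None, a.lower(), b.lower()).ratio()
        if score >= threshold:
            pairs.append((a, b, round(score, 2)))
    # Most similar pairs first, since those are the easiest to confuse.
    return sorted(pairs, key=lambda p: p[2], reverse=True)


# Sample names only; a real list would come from the project's parish list.
parishes = [
    "Allhallows Barking",
    "St. Botolph without Aldersgate",
    "St. Botolph without Aldgate",
    "St. Botolph without Bishopsgate",
]

for a, b, score in likely_mixups(parishes):
    print(f"{score:.2f}  {a}  <->  {b}")
```

A script like this only proposes candidates. Mix-ups that come from position on the page or from period spellings still have to be added to the list by the people doing the transcribing.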

Problem Number 3: It’s… a number?

Solution: Determine a Policy for Illegible Data

Early modern printing and handwriting are often a challenge to read. You can see our weekly #TranscriptionThursday posts on Twitter where we share examples of problem numbers. While our Bills give sum totals, and a few members of our project team know addition, one of the goals of Death By Numbers is to check early modern arithmetic. Your project might allow deduction, but we prefer not to assume a lack of entry is a 0 or that a smudge is a 2, no matter how much it makes sense. On DataScribe, we can mark data entry fields as illegible and encourage transcribers to be cautious. Your project should set clear guidelines prior to onboarding transcribers so every team member knows how to handle the questionable 7 or the deceitful 4.


Figure 2. Excerpt from bill of mortality showing black smears over text that makes it illegible.
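If your transcriptions end up in a spreadsheet or script, the same policy can be written into the data itself. The sketch below keeps three cases apart: a readable number, a cell the source left blank, and a cell that cannot be read. The `Entry` class, the `parse_cell` function, and the `?` marker are hypothetical names invented for this example. They do not describe how DataScribe stores its illegible flag.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Entry:
    """One transcribed cell. `value` stays None unless a number is readable."""
    value: Optional[int] = None
    illegible: bool = False  # smudge, tear, hidden by binding, blurry photo
    # value=None with illegible=False means the source left the cell blank.


def parse_cell(raw: str) -> Entry:
    """Turn a transcriber's input into an Entry without guessing.

    The "?" marker for illegible cells is a made-up convention for this
    example, not DataScribe's interface.
    """
    text = raw.strip()
    if text == "?":
        return Entry(illegible=True)
    if text == "":
        return Entry()             # blank in the source, not a zero
    return Entry(value=int(text))  # raises ValueError on anything else


print(parse_cell("7"))
print(parse_cell(""))
print(parse_cell("?"))
```

Keeping a blank as `None` rather than `0` means that later arithmetic cannot quietly treat a missing entry as a count of zero.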

Problem Number 4: Damaged Sources: Are They Whole Numbers or Hole Numbers?

Solution: See Above

The early modern period was unkind to paper, and even if you’re transcribing documents from last week, you may find rips, tears, and cutouts. Or the page is fine, but your PI might have taken blurry photographs. Or the page was collected in a bound volume and the binding hides half the data. Prepare your transcribers for these imperfect documents and let them know that your illegible data policy covers more than just smudged writing.


Figure 3. Excerpt from a bill of mortality with a circular splotch where the number should be.

Problem Number 5: Tech Happens

Solution: Get to Know the Tools and Interface

Death By Numbers is transcribed on DataScribe, a new data transcription module for Omeka S developed by RRCHNM. It’s new, so take time to familiarize yourself with it. Is your browser out of date? Are the file names correct? Do you know how to save and submit for review? If you find something wrong, double-check your tech. Search for updates on the software or your device. Read the documentation for your transcription tool. Project supervisors should allow transcribers some time to play around with their tools and gain familiarity with them. Each piece of software will have its own quirks, layout, and intuitive (or not) functions, like DataScribe’s Focus mode or hotkeys, to aid the work. And always remember to save early and often.

Problem Number 6: Know When to Check

Solution: You Shouldn’t be Surprised (Small Level of Math Knowledge Required)

As mentioned earlier, Death By Numbers is not checking transcriptions based on math. However, intuitive math may help with data transcription. We referred to the subsection totals on the Bills already, and those can indicate if transcription went poorly. For instance, when we transcribe the Parishes within the Walls, we write down the number of deaths in each individual parish for that section on the Bill. If you transcribed the number one about twenty times, but the subtotal is over 100, something might be off. The totals should not be completely unexpected, so double check you caught all the data entries.

It may also be a weird document rather than transcriber error. Flag it for a closer review so someone else checks it or sends it up the flagpole to the PI. Those second pairs of eyes can determine where the error came from and what to do about special cases.
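The “you shouldn’t be surprised” check can also be sketched in a few lines for data that has already been exported. The function below compares the parish counts in one section against the subtotal printed on the Bill and reports whether the page deserves a second look. The name `needs_review` and the use of `None` for blank or illegible entries are assumptions for this example. The function only flags a page; it never changes a transcription, since the disagreement may come from the document rather than the transcriber.

```python
from typing import Optional, Sequence


def needs_review(counts: Sequence[Optional[int]], printed_subtotal: int) -> bool:
    """Return True when the entries and the printed subtotal disagree.

    `counts` holds one number per parish; None marks a blank or illegible
    entry. The check only flags; it never edits the transcription.
    """
    readable = [c for c in counts if c is not None]
    unreadable = len(counts) - len(readable)
    total = sum(readable)
    if unreadable:
        # Missing entries can only add to the sum, so only an overshoot
        # is suspicious here.
        return total > printed_subtotal
    return total != printed_subtotal


# Twenty entries of 1 against a printed subtotal over 100: worth a look.
print(needs_review([1] * 20, 104))
# One unreadable entry could account for the difference: not flagged.
print(needs_review([3, None, 5], 10))
```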

Problem Number 7: Not Saving

Solution: Please Remember to Save

Save. Do it. Go save something right now. No one wants to start over due to a misclick out of a transcription. DataScribe even allows you to save and continue transcribing the same page. You want to get credit for the work done, so make sure you do. If you forget to save, you’ll miss information, annoy your supervisor, or maybe die of mortification.

You Did It!

If you have made it this far without being scared off, you’re ready to transcribe. Stay tuned for more from Death By Numbers by following our Twitter account and keeping an eye out for future blog posts!