This is where the rubber meets the road; it’s where hopefully 99.9% of the bugs are worked out; it’s where the message of the author goes from being hidden behind a hundred typographical distractions to being clearly seen because the text itself has gotten out of the way.

Because we’re dealing with a specific set of circumstances with these historic books, there is a specific set of things we have to watch out for that you wouldn’t have to in a book written by someone living today. We have to deal with the difference in printing styles from 150 years ago, the difference in common book layout and typefaces between then and now, the fact that these books have been written in by their owners or stamped on by libraries, and the imperfections of a computer trying to transform images from a camera into text we can edit.

Truth be told, it’s a gift from God that the books are in such good shape by the time we start editing them! And the fact that we have digital copies of the originals side by side is also an amazing bonus - I can’t imagine how painful the process would be if we had to reference physical copies. But there are still challenges that we have to overcome, and I’ll just run through a few of them here.

My Setup

I’ve mentioned this a couple times in the tutorials already, but here it is again for good measure - I find it most convenient to have both the original pages and the editable text open, at about a 40/60 or even 30/70 split respectively, as you can see below. I’ve got the original pages of the book zoomed in just enough that I can see the entire text block of the page without scrolling, and I have the editable text at a pretty big size so I’m not at all straining to see details, like whether something is an apostrophe (') or a backtick (`).

I start at the beginning and go through the book, scrolling through the text with my right hand on the down arrow and through the original pages with my left hand on the trackpad or mouse. I have found that this is the best way for me to work on the books - I can be analyzing the book text but always have the original book pages to compare.
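If squinting at apostrophes versus backticks gets tiring, a small script can point out where to look. This is only a rough sketch of my own, not part of any official tooling: the function name `find_backticks` is made up for illustration, and it assumes the editable text is available as a plain string.

```python
def find_backticks(text):
    """Return (line number, column, line) for every backtick in the text.

    Hypothetical helper: prose almost never needs a backtick, so each hit
    is likely an apostrophe the OCR got wrong and is worth a look.
    """
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ch == "`":
                hits.append((line_no, col, line.strip()))
    return hits


if __name__ == "__main__":
    sample = "It's a fine day.\nIt`s a fine day."
    for line_no, col, line in find_backticks(sample):
        print(f"line {line_no}, column {col}: {line}")
```

It only tells you where to look; you still decide what the character should be by checking the original page.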

Proper Paragraphs

We want to make sure that the paragraphs in the new text match the paragraphs in the old. The easiest way I’ve found to check the paragraphs is to look at the first and last words of the paragraph as I scroll - if they’re not the same, there’s a problem! Other clues would be capitalization and spacing errors.

You might be surprised at the regular occurrence of paragraph issues - it’s a common problem. Sometimes you might go half a book without any, and then in one chapter there might be 5-6 messed up paragraphs.

Also of note: there are almost always paragraph problems when footnotes are present. So if there are footnotes, be sure to check the surrounding paragraphs and make sure they are correct.
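The capitalization clue mentioned above can be turned into a quick pre-check. The sketch below is an illustration only, and it assumes paragraphs in the editable text are separated by blank lines. It flags paragraphs that start with a lowercase letter or that don't end with closing punctuation, since both often mean a paragraph was split or merged in the wrong place. The name `suspicious_paragraphs` is hypothetical, and headings will show up as false positives because they rarely end with a period.

```python
import re

# Sentence-ending punctuation, optionally followed by closing quotes or brackets.
ENDS_CLEANLY = re.compile(r"""[.!?:]["'”’)\]]*$""")


def suspicious_paragraphs(text):
    """Return (paragraph number, reason, snippet) for paragraphs worth checking."""
    # Assumption: a blank line separates one paragraph from the next.
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    findings = []
    for number, para in enumerate(paragraphs, start=1):
        # Look at the first letter, skipping any leading quotes or digits.
        first_letter = next((ch for ch in para if ch.isalpha()), "")
        if first_letter.islower():
            findings.append((number, "starts with lowercase", para[:40]))
        if not ENDS_CLEANLY.search(para):
            findings.append((number, "no closing punctuation", para[-40:]))
    return findings


if __name__ == "__main__":
    sample = "He went to the\n\nstore and came home.\n\nAll was well."
    for number, reason, snippet in suspicious_paragraphs(sample):
        print(f"paragraph {number}: {reason}: {snippet!r}")
```

This doesn't replace comparing the first and last words against the original page; it just gives you a short list of places to check first.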

Problem Text

OCR (Optical Character Recognition) is the process by which the text on the page images becomes text we can edit. Normally, it works extraordinarily well, but it doesn’t do well with anything other than normal text. A few things you’ll notice cause problems for our texts:

  • Gothic Text
  • Drop Caps
  • Small Caps
  • Writing on original page margins
  • Library stamps on the pages
  • Random blots of ink

When you see these kinds of things in the original book, there will be issues to fix in the text, so be on the lookout!
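Stamps, ink blots and margin scribbles tend to come through OCR as runs of stray symbols. As a rough aid, the sketch below flags lines where a large share of the characters are neither letters, digits nor spaces. The function name `noisy_lines` and the 30% threshold are my own assumptions, picked for illustration rather than tuned on real books, so adjust them to taste.

```python
def noisy_lines(text, threshold=0.3, min_length=4):
    """Return (line number, line) for lines that look like OCR noise.

    A line is flagged when the share of characters that are not letters,
    digits or whitespace is at least `threshold`. Very short lines are
    skipped because a single comma would dominate the ratio.
    """
    flagged = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        if len(stripped) < min_length:
            continue
        odd = sum(1 for ch in stripped if not (ch.isalnum() or ch.isspace()))
        if odd / len(stripped) >= threshold:
            flagged.append((line_no, stripped))
    return flagged


if __name__ == "__main__":
    sample = "The Lord is my shepherd.\n~*|.. ^ ;'_\nAmen."
    for line_no, line in noisy_lines(sample):
        print(f"line {line_no}: {line}")
```

A check like this will miss subtle problems such as a Gothic heading read as ordinary but wrong letters, so it supplements reading against the original rather than replacing it.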

Here’s an example, again from Weekday Religion by J.R. Miller. You’ll see that somewhere along the way, one of the book’s readers has highlighted a sentence by surrounding it with parentheses - not a bad idea (unless it was a library book at the time!), but it does cause us some problems in the text, as seen in the screenshot.

These are the kinds of things that we’re looking for in the books - random, non-reproducible things that only a human can fix. Another example would be where one kind of punctuation is mistaken for another, as when this very bold semicolon was mistaken for the letter I:

So that pretty much covers the basics of what editing these books looks like - it’s comparing the text to the originals, and fixing errors. I’ve broken down more specific issues into other tutorials: