Proofreading Process
Public domain works, typically books whose copyright has expired, are scanned by volunteers or sourced from digitization projects, and the page images are run through optical character recognition (OCR) software. Because OCR software is far from perfect, the resulting text often contains many errors. To correct them, pages are made available to volunteers over the Internet, with the original page image and the recognized text displayed side by side. The time-consuming work of error correction is thereby distributed among many volunteers, in a manner akin to distributed computing.
Each page is proofread and formatted several times, and then a post-processor combines the pages and prepares the text for uploading to Project Gutenberg.
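The workflow described above can be illustrated with a minimal sketch. It assumes a simplified model in which each page passes through a fixed number of correction rounds before a post-processing step joins the pages in order. The names `Page`, `proofread`, `post_process`, and the sample corrections are hypothetical and do not reflect the project's actual software.

```python
from dataclasses import dataclass


@dataclass
class Page:
    """One scanned page: its number, raw OCR output, and current text."""
    number: int
    ocr_text: str
    text: str = ""
    rounds_done: int = 0

    def __post_init__(self):
        # Proofreading starts from the raw OCR output.
        if not self.text:
            self.text = self.ocr_text


def proofread(page, correct):
    """Apply one volunteer's correction pass to a single page."""
    page.text = correct(page.text)
    page.rounds_done += 1
    return page


def post_process(pages, rounds_required):
    """Combine fully proofread pages, in page order, into one text."""
    unfinished = [p.number for p in pages if p.rounds_done < rounds_required]
    if unfinished:
        raise ValueError(f"pages not fully proofread: {unfinished}")
    ordered = sorted(pages, key=lambda p: p.number)
    return "\n".join(p.text for p in ordered)


# Hypothetical sample data: two pages with a typical OCR confusion (h -> b).
pages = [Page(2, "tbe end "), Page(1, " Tbe beginning")]

# Each "volunteer" is modeled as a function from text to corrected text.
def fix_tbe(text):
    return text.replace("Tbe", "The").replace("tbe", "the")

def strip_whitespace(text):
    return text.strip()

ROUNDS_REQUIRED = 2  # assumed number of rounds, for illustration only
for page in pages:
    for correct in (fix_tbe, strip_whitespace):
        proofread(page, correct)

book = post_process(pages, ROUNDS_REQUIRED)
print(book)
```

Because each page is corrected independently of the others, the rounds can be carried out by different volunteers at different times; only the final combining step needs all pages to be complete.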
Besides the custom software created to support the project, Distributed Proofreaders (DP) also runs a forum and a wiki for project coordinators and participants.