Document Processing
The scanning or digitization of paper documents for storage makes different requirements of the scanning equipment used than scanning of pictures for reproduction. While documents can be scanned on general-purpose scanners, it is more efficiently performed on dedicated document scanners.
When scanning large quantities of documents, speed and paper-handling is very important, but the resolution of the scan will normally be much lower than for good reproduction of pictures.
Document scanners have document feeders, usually larger than those sometimes found on copiers or all-purpose scanners. Scans are made at high speed, perhaps 20 to 150 pages per minute, often in grayscale, although many scanners support color. Many scanners can scan both sides of double-sided originals (duplex operation). Sophisticated document scanners have firmware or software that cleans up scans of text as they are produced, eliminating accidental marks and sharpening type; this would be unacceptable for photographic work, where marks cannot reliably be distinguished from desired fine detail. Files created are compressed as they are made.
The resolution used is usually from 150 to 300 dpi, although the hardware may be capable of somewhat higher resolution; this produces images of text good enough to read and for optical character recognition (OCR), without the higher demands on storage space required by higher-resolution images.
Document scans are often processed using OCR technology to create editable and searchable files. Most scanners use ISIS or TWAIN device drivers to scan documents into TIFF format so that the scanned pages can be fed into a document management system that will handle the archiving and retrieval of the scanned pages. Lossy JPEG compression, which is very efficient for pictures, is undesirable for text documents, as slanted straight edges take on a jagged appearance, and solid black (or other color) text on a light background compresses well with lossless compression formats.
While paper feeding and scanning can be done automatically and quickly, preparation and indexing are necessary and require much work by humans. Preparation involves manually inspecting the papers to be scanned and making sure that they are in order, unfolded, without staples or anything else that might jam the scanner. Additionally, some industries such as legal and medical may require documents to have Bates Numbering or some other mark giving a document identification number and date/time of the document scan.
Indexing involves associating relevant keywords to files so that they can be retrieved by content. This process can sometimes be automated to some extent, but it often requires manual labour performed by data-entry clerks. One common practice is the use of barcode-recognition technology: during preparation, barcode sheets with folder names or index information are inserted into the document files, folders, and document groups. Using automatic batch scanning, the documents are saved into appropriate folders, and an index is created for integration into document-management systems.
A specialized form of document scanning is book scanning. Technical difficulties arise from the books usually being bound and sometimes fragile and irreplaceable, but some manufacturers have developed specialized machinery to deal with this. Often special robotic mechanisms are used to automate the page turning and scanning process.
Read more about this topic: Image Scanner
Famous quotes containing the word document:
“... research is never completed ... Around the corner lurks another possibility of interview, another book to read, a courthouse to explore, a document to verify.”
—Catherine Drinker Bowen (18971973)