Document Conversion

October 1996

ELECTRONIC IMAGING & STORAGE GUIDE

Scanners are becoming smaller, cheaper, faster and more functional

The first step in most imaging applications is to capture text, graphics and photos so that they can be stored for easy reference. Electronic scanners do this by using light to "read" images off paper documents, microfilm or photographic film. Ordinary scanners take pictures of images for archival applications in which images are stored for reference only. Optical Character Recognition (OCR) devices are used to recognize shapes and characters from predefined fields and convert them into digital computer code, so that text can be manipulated once it is scanned.

Until recently, many government imaging projects involved high-end color scanners capable of processing more than 100 pages a minute and costing hundreds of thousands of dollars. These large flatbed units, from companies such as Sharp and Xerox, are used in sophisticated defense and intelligence applications, or to tackle high-volume paperwork such as tax forms or medical records.

But now agencies are turning to scanners for smaller jobs as well. Inexpensive desktop and handheld models are being used to store correspondence and handle other simple tasks. Sheet-fed units, from companies such as Hewlett-Packard and Microtek, can be as small as a PC mouse and cost as little as $200. Data capture has become so popular that Compaq and Hewlett-Packard recently introduced computers with scanners built into the keyboards.

Increased competition in the scanner market has resulted in less expensive and more sophisticated machines. Time- and money-saving features once considered optional are now standard on many models. Duplex scanning, which enables images on both sides of documents to be captured at the same time, is rapidly replacing simplex scanning. Other features, such as automatic feeders, color dropout options and super-high resolutions, also are turning up on medium-range and low-end machines. Higher resolutions, however, require more scanning time per page and more storage capacity.
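To put rough numbers on the resolution trade-off, the sketch below estimates the uncompressed size of a single scanned page. The letter-size page (8.5 by 11 inches), the sample resolutions and the function name are assumptions chosen for illustration, not figures from any particular scanner; real systems usually compress images, so stored files would be smaller.

```python
def uncompressed_page_bytes(dpi, width_in=8.5, height_in=11.0, bits_per_pixel=1):
    """Approximate raw size of one scanned page, before any compression.

    Assumes a letter-size page by default; bits_per_pixel is 1 for
    black-and-white (bilevel) scans and 24 for full color.
    """
    pixels = round(width_in * dpi) * round(height_in * dpi)
    return pixels * bits_per_pixel // 8


# Illustrative resolutions only; actual scanner settings vary by model.
for dpi in (200, 300, 600):
    size = uncompressed_page_bytes(dpi)
    print(f"{dpi} dpi: {size / 1_000_000:.2f} MB per black-and-white page")
```

Because pixel count grows with the square of the resolution, tripling the resolution from 200 to 600 dots per inch multiplies the raw data per page by nine, which is why higher settings cost both scanning time and storage.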

Scanning speeds on lower-resolution units reach about 150 pages a minute, with an average recognition accuracy of 95 percent. Some models have built-in spell checkers and word-analysis programs to ease the cleanup job when characters are misread. Many OCR devices use "fuzzy logic" to decipher misspelled words.
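The cleanup step can be illustrated with a minimal sketch. It is not how any specific OCR product works: it stands in for "fuzzy" word matching with the approximate string matching in Python's standard difflib module, and the word list, the similarity cutoff, the 2,000-character page and the reading of "95 percent" as per-character accuracy are all assumptions made for the example.

```python
from difflib import get_close_matches

# Hypothetical mini-lexicon; a real product would ship a full dictionary.
LEXICON = ["scanner", "document", "resolution", "storage",
           "character", "recognition"]


def correct_word(word, lexicon=LEXICON, cutoff=0.8):
    """Return the closest lexicon entry, or the word unchanged if none is close."""
    matches = get_close_matches(word.lower(), lexicon, n=1, cutoff=cutoff)
    return matches[0] if matches else word


def expected_misreads(chars_on_page, accuracy=0.95):
    """Estimate misread characters per page, assuming per-character accuracy."""
    return round(chars_on_page * (1 - accuracy))


print(correct_word("scaner"))       # a dropped letter
print(correct_word("rec0gnition"))  # a zero misread in place of the letter o
print(expected_misreads(2000))      # assumed 2,000 characters on a page
```

Under these assumptions, a page of 2,000 characters read at 95 percent accuracy would contain about 100 misread characters, which suggests why built-in correction aids matter.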
