Annotations
The changes or additions made to a document using sticky
notes, a highlighter, or other electronic tools. Document
images or text can be highlighted in different colors,
redacted (blacked-out or whited-out), stamped (e.g.
“FAXED” or “CONFIDENTIAL”), or have electronic
sticky notes attached. Annotations should be overlaid and
not change the original document.
ASCII
American Standard Code for Information Interchange. Used
to define computer text, built on a set of 128
alphanumeric and control characters. ASCII has been a
standard, non-proprietary text format since 1963.
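As a small illustration, assuming Python, each character of ASCII text corresponds to one numeric code. The helper names below are hypothetical and serve only to show the mapping in both directions.

```python
# Convert text to its ASCII code values and back.
# Helper names are illustrative; only built-in str/bytes methods are used.

def to_ascii_codes(text: str) -> list[int]:
    """Return one code per character; raises UnicodeEncodeError for non-ASCII text."""
    return list(text.encode("ascii"))

def from_ascii_codes(codes: list[int]) -> str:
    """Rebuild the text from a list of ASCII code values."""
    return bytes(codes).decode("ascii")

codes = to_ascii_codes("OCR")
print(codes)                    # [79, 67, 82]
print(from_ascii_codes(codes))  # OCR
```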
ASP
Active Server Pages. A technology that simplifies
customization and integration of Web applications. ASPs
reside on a Web server and contain a mixture of HTML code
and server-side scripts. An example of ASP usage includes
having a server accept a request from a client, perform a
query on a database, and then return the results of the
query in HTML format for viewing by a web browser.
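The request, query, and HTML response cycle described above can be sketched in Python rather than in ASP itself. This is only an analogy under stated assumptions: the documents table, its columns, and the handle_request function are hypothetical, and an in-memory SQLite database stands in for the real database.

```python
import sqlite3
from html import escape

def handle_request(connection, customer):
    """Query the database for one customer's documents and return HTML."""
    rows = connection.execute(
        "SELECT doc_name FROM documents WHERE customer = ? ORDER BY doc_name",
        (customer,),
    ).fetchall()
    # Escape each value so that data cannot inject markup into the page.
    items = "".join(f"<li>{escape(name)}</li>" for (name,) in rows)
    return f"<ul>{items}</ul>"

# Hypothetical sample data held in memory.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE documents (doc_name TEXT, customer TEXT)")
connection.executemany(
    "INSERT INTO documents VALUES (?, ?)",
    [("invoice-001", "Acme"), ("memo-17", "Acme"), ("invoice-002", "Globex")],
)
print(handle_request(connection, "Acme"))
```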
Bar Code
A small pattern of vertical lines that is read by a laser
or an optical scanner, and which corresponds to a record
in a database. An add-on component to imaging software,
this feature is designed to increase the speed with which
documents can be archived.
Batch Processing
The name of the technique used to input a large amount of
information in a single step, as opposed to individual
processes.
Bitmap/Bitmapped
See Raster/Rasterized.
BMP
A native file format of Windows for storing images called
“bitmaps.”
Boolean Logic
The use of the terms “AND,” “OR” and “NOT” in
conducting searches. Used to widen or narrow the scope of
a search.
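A minimal sketch of how the three operators widen or narrow a result set, assuming a hypothetical index that maps each search term to the set of document numbers containing it. The index contents and function names are made up for illustration.

```python
# Hypothetical index: search term -> set of document numbers containing it.
INDEX = {
    "invoice": {1, 2, 3},
    "paid": {2, 3, 4},
    "void": {3},
}

def docs(term):
    """Documents containing the term (empty set if the term is unknown)."""
    return INDEX.get(term, set())

def search_and(a, b):
    return docs(a) & docs(b)   # narrows: both terms must appear

def search_or(a, b):
    return docs(a) | docs(b)   # widens: either term may appear

def search_not(a, b):
    return docs(a) - docs(b)   # narrows: a must appear, b must not

print(search_and("invoice", "paid"))   # {2, 3}
print(search_or("invoice", "paid"))    # {1, 2, 3, 4}
print(search_not("invoice", "void"))   # {1, 2}
```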
Briefcase
A method to simplify the transport of a group of documents
from one computer to another.
Burn (CDs or DVDs)
To record or write data on a CD or DVD.
Caching (of Images)
The temporary storage of image files on a hard disk for
later migration to permanent storage, like an optical or
CD jukebox.
CD Publishing
An alternative to photocopying large volumes of paper
documents. This method involves coupling image and text
documents with viewer software on CDs. Sometimes search
software is included on the CDs to enhance search
capabilities.
CD-R
Short for CD-Recordable. This is a CD which can be written
(or recorded) only once. It can be copied to distribute a
large amount of data. CD-Rs can be read on any CD-ROM
drive whether on a standalone computer or network system.
This makes interchange between systems easier.
CD-ROM
Compact Disc Read-Only Memory. An optical disc storage
medium popular for storing computer files as well as
digitally recorded music. CD-ROMs are manufactured on a
large scale rather than written on a standard computer CD
burner (CD writer).
CD-ROM Drive
A computer drive that reads compact discs.
Client-Server Architecture vs. File-Sharing
Two common application software architectures found on
computer networks. With file-sharing applications, all
searches occur on the workstation, while the document
database resides on the server. With client-server
architecture, CPU intensive processes (such as searching
and indexing) are completed on the server, while image
viewing and OCR occur on the client. File-sharing
applications are easier to develop, but they tend to
generate tremendous network data traffic in document
imaging applications. They also expose the database to
corruption through workstation interruptions.
Client-server applications are harder to develop, but
dramatically reduce network data traffic and insulate the
database from workstation interruptions.
COLD
Computer Output to Laser Disk. A computer programming
process that outputs electronic records and printed
reports to laser disk instead of a printer. Can be used to
replace COM (Computer Output to Microfilm) or printed
reports such as green-bar.
COM
Computer Output to Microfilm. A process that outputs
electronic records and computer generated reports to
microfilm.
COM Object
Component Object Model. COM refers to both a specification
and implementation developed by Microsoft Corporation,
which provides a framework for integrating components of a
software application. COM allows developers to build
software by assembling reusable components from different
vendors.
Compression Ratio
The ratio of the size of an uncompressed file to the size
of its compressed version, e.g., with a 20:1 compression
ratio, an uncompressed file of 1 MB is compressed to 50 KB.
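The 20:1 example can be checked with a short calculation. This sketch assumes 1 MB is counted as 1,000 KB, as the example's figures imply; the function names are illustrative.

```python
def compressed_size_kb(uncompressed_kb: float, ratio: float) -> float:
    """Size after compression for a ratio written as ratio:1."""
    return uncompressed_kb / ratio

def compression_ratio(uncompressed_kb: float, compressed_kb: float) -> float:
    """Ratio N such that the compression is N:1."""
    return uncompressed_kb / compressed_kb

print(compressed_size_kb(1000, 20))  # 50.0
print(compression_ratio(1000, 50))   # 20.0
```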
CPU
Central Processing Unit. The “brain” of the computer.
De-shading
Removing shaded areas to render images more easily
recognizable by OCR. De-shading software typically
searches for areas with a regular pattern of tiny dots.
De-skewing
The process of straightening skewed (off-center) images.
De-skewing is one of the image enhancements that can
improve OCR accuracy. Documents often become skewed when
they are scanned or faxed.
De-speckling
Removing isolated speckles from an image file. Speckles
often develop when a document is scanned or faxed.
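One simple way to picture de-speckling, offered as a sketch and not as the method any particular product uses: treat the image as a grid of 0 (white) and 1 (black) pixels and clear every black pixel that has no black neighbor. The despeckle function is hypothetical.

```python
def despeckle(image):
    """Return a copy of a 0/1 pixel grid with isolated black pixels removed."""
    rows, cols = len(image), len(image[0])
    result = [row[:] for row in image]
    for r in range(rows):
        for c in range(cols):
            if image[r][c] != 1:
                continue
            # Count black pixels among the up to eight surrounding pixels.
            neighbors = 0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    if dr == 0 and dc == 0:
                        continue
                    nr, nc = r + dr, c + dc
                    if 0 <= nr < rows and 0 <= nc < cols:
                        neighbors += image[nr][nc]
            if neighbors == 0:
                result[r][c] = 0  # isolated speckle
    return result

page = [
    [1, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
for row in despeckle(page):
    print(row)
```

In the sample grid, the lone pixel in the top-left corner is cleared while the 2 x 2 block is kept.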
Dithering
The process of converting grays to different densities of
black dots, usually for the purposes of printing or
storing color or grayscale images as black and white
images.
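A minimal sketch of one dithering technique, ordered dithering with a 2 x 2 threshold matrix, chosen here only for illustration since the entry does not name a specific method. Input values are assumed to be darkness levels from 0.0 (white) to 1.0 (black); the output is 1 for a black dot and 0 for no dot.

```python
# 2 x 2 Bayer threshold matrix; other dithering methods exist.
BAYER_2X2 = [[0, 2], [3, 1]]

def ordered_dither(darkness):
    """Convert a grid of darkness values (0.0 to 1.0) to black dots (1) and white (0)."""
    out = []
    for r, row in enumerate(darkness):
        out_row = []
        for c, value in enumerate(row):
            # Each position in the repeating 2 x 2 tile has its own threshold.
            threshold = (BAYER_2X2[r % 2][c % 2] + 0.5) / 4
            out_row.append(1 if value > threshold else 0)
        out.append(out_row)
    return out

# A uniform 50% gray area becomes a checkerboard with half the dots black.
print(ordered_dither([[0.5, 0.5], [0.5, 0.5]]))  # [[1, 0], [0, 1]]
```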
Document Imaging
Software used to store, manage, retrieve and distribute
documents quickly and easily on the computer.
Drag-and-Drop
The movement of on-screen objects by dragging them across
the screen with the mouse.
Duplex Scanners vs. Double-Sided Scanning
Duplex scanners automatically scan both sides of a
double-sided page, producing two images at once.
Double-sided scanning uses a single-sided scanner to scan
double-sided pages, scanning one collated stack of paper,
then flipping it over and scanning the other side.
DVD
Digital Video Disc or Digital Versatile Disc. A plastic
disc, like a CD, on which data can be written and read.
DVDs are faster, can hold more information, and can
support more data formats than CDs.
Electronic Document Management
Imaging software that helps manage electronic documents.
Erasable Optical Drive
A type of optical drive that uses erasable optical discs.
Flatbed Scanner
A flat-surface scanner that allows users to input books
and other documents.
Folder Browser
A system of on-screen folders (usually hierarchical or
“stacked”) used to organize documents. For example,
the File Manager program in Microsoft Windows is a type of
folder browser that displays the directories on your disk.
Forms Processing
A specialized imaging application designed for handling
pre-printed forms. Forms processing systems often use
high-end (or multiple) OCR engines and elaborate data
validation routines to extract hand-written or poor
quality print from forms that go into a database. This
type of imaging application faces major challenges, since
many of the documents scanned were never designed for
imaging or OCR.
Full-text Indexing and Search
Enables the retrieval of documents by either their word or
phrase content. Every word in the document is indexed into
a master word list with pointers to the documents and
pages where each occurrence of the word appears.
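The master word list with pointers can be sketched as an inverted index. This is a simplified illustration: the build_index function and the sample documents are hypothetical, and each pointer here records a document name and page number only.

```python
import re
from collections import defaultdict

def build_index(documents):
    """documents: {name: [page text, ...]} -> {word: [(name, page number), ...]}"""
    index = defaultdict(list)
    for name, pages in documents.items():
        for page_number, text in enumerate(pages, start=1):
            for word in re.findall(r"[a-z0-9]+", text.lower()):
                pointer = (name, page_number)
                if pointer not in index[word]:
                    index[word].append(pointer)
    return dict(index)

documents = {
    "memo": ["Budget approved", "Budget revised"],
    "letter": ["Approved today"],
}
index = build_index(documents)
print(index["budget"])    # [('memo', 1), ('memo', 2)]
print(index["approved"])  # [('memo', 1), ('letter', 1)]
```

A search for a word then becomes a single lookup in the index instead of a scan of every page.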
Fuzzy Logic
A full-text search procedure that looks for exact matches
as well as similarities to the search criteria, in order
to compensate for spelling or OCR errors.
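A rough sketch of similarity matching, using the get_close_matches function from Python's standard difflib module as a stand-in for whatever matching a given search engine actually performs. The word list, the fuzzy_search wrapper, and the 0.8 cutoff are assumptions chosen for the example.

```python
from difflib import get_close_matches

def fuzzy_search(term, indexed_words, cutoff=0.8):
    """Return indexed words similar to the term, best match first."""
    return get_close_matches(term, indexed_words, n=5, cutoff=cutoff)

indexed_words = ["invoice", "receipt", "contract"]

# "lnvoice" imitates an OCR error in which "i" was read as "l".
matches = fuzzy_search("lnvoice", indexed_words)
print(matches)
```

Raising the cutoff toward 1.0 makes the search stricter; a cutoff of 1.0 accepts exact matches only.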
GIF
Graphics Interchange Format. CompuServe’s
native file format for storing images.
Gigabyte
One billion bytes. Also expressed as one thousand
megabytes. In terms of image storage capacity, one
gigabyte equals approximately 17,000 8½" x 11"
pages scanned at 300 dpi, stored as TIFF Group IV images.
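The page estimate can be related to raw image size with some back-of-the-envelope arithmetic. This sketch assumes a black and white scan at 1 bit per pixel and a gigabyte of one billion bytes; the function names are illustrative.

```python
# Letter-size page scanned at 300 dpi, 1 bit per pixel (black and white).
WIDTH_IN, HEIGHT_IN = 8.5, 11

def raw_page_bytes(dpi=300):
    """Uncompressed size of one page in bytes."""
    width_px = round(WIDTH_IN * dpi)    # 2,550 pixels at 300 dpi
    height_px = round(HEIGHT_IN * dpi)  # 3,300 pixels at 300 dpi
    return width_px * height_px // 8

def implied_page_bytes(pages_per_gb=17_000):
    """Average stored size per page implied by the pages-per-gigabyte figure."""
    return 1_000_000_000 / pages_per_gb

print(raw_page_bytes())                         # 1051875
print(round(implied_page_bytes()))              # 58824
print(raw_page_bytes() / implied_page_bytes())  # implied compression, roughly 18:1
```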
Grayscale
See “Scale-to-Gray.”
Hierarchical Storage Management (HSM)
Software that automatically migrates files from on-line to
near-line storage media, usually on the basis of the age
or frequency of use of the files.
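An age-based migration rule might look like the following sketch. It only selects candidates and moves nothing; the file names, the 90-day limit, and the select_for_migration function are hypothetical.

```python
from datetime import date, timedelta

def select_for_migration(files, today, max_idle_days):
    """files: {name: date of last access}. Return names idle longer than the limit."""
    cutoff = today - timedelta(days=max_idle_days)
    return sorted(name for name, last_access in files.items() if last_access < cutoff)

files = {
    "contract-2021.tif": date(2023, 1, 5),
    "invoice-current.tif": date(2024, 1, 30),
}
print(select_for_migration(files, today=date(2024, 1, 31), max_idle_days=90))
# ['contract-2021.tif']
```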
ICR
Intelligent Character Recognition. A software process that
recognizes handwritten and printed text as alphanumeric
characters.
Image Enabling
Allows for fast, straightforward manipulation of a client
through third-party applications. In examples with
LaserFiche, image enabling allows for launching the
LaserFiche client, displaying search results in the
client, and bringing up the scan dialogue box, all from
within a third party application.
Image Processing Card (IPC)
A board mounted in either the computer, scanner or printer
that facilitates the acquisition and display of images.
The primary function of most IPCs is the rapid compression
and decompression of image files.
Index Fields
Database fields used to categorize and organize documents.
Often user-defined, these fields can be used for searches.
Internet Publishing
Specialized imaging software that allows large volumes of
paper documents to be published on the Internet or
intranet. These files can be made available to other
departments, offsite colleagues or the public for
searching, viewing and printing.
IPX/SPX
Communications protocol used by Novell networks.
ISIS and TWAIN Scanner Drivers
Specialized applications used for communication between
scanners and computers.
ISO 9660 CD Format
The International Organization for Standardization (ISO)
format for creating CD-ROMs that can be read worldwide.
JPEG
Joint Photographic Experts Group (JPEG or JPG). An image
compression format used for storing color photographs and
images.
Jukebox
A mass storage device that holds optical disks and loads
them into a drive.
Key Field
Database fields used for document searches and retrieval.
Synonymous with “index field.”
Magneto-Optical Drive
A drive that combines laser and magnetic technology to
create high-capacity erasable storage.
MAPI
Messaging Application Programming Interface. This Windows
software standard has become a popular e-mail interface
and is used by MS Exchange, GroupWise, and other e-mail
packages.
MFP
Multifunction Printer or Multifunctional Peripheral. A
device that performs any combination of scanning,
printing, faxing, or copying.
Near-Line
Documents stored on optical disks or compact disks that
are housed in the jukebox or CD changer and can be
retrieved without human intervention.
NetWare Loadable Module (NLM)
An application that runs as part of the network operating
system (NOS) of a Novell NetWare server.
NT
New Technology. Refers to Microsoft Windows NT server
and workstation software.
n-tier architecture
A method of distributed computing, applicable to either
the physical or the logical architecture of a system, in
which the processing of a
specific application occurs over “n” number of
machines across a network. Typical tiers include a data
tier, business logic tier, and a presentation tier,
wherein a given machine will perform the individualized
tasks of a tier. Scalability is a primary advantage of
n-tier architecture.
OCR
Optical Character Recognition. A software process that
recognizes printed text as alphanumeric characters.
Off-Line
Archival documents stored on optical disks or compact
disks that are not connected or installed in the computer,
but instead require human intervention to be accessed.
On-Line
Documents stored on the hard drive or magnetic disk of a
computer that are available immediately.
Optical Disks
Computer media similar to a compact disc that cannot be
rewritten. An optical drive uses a laser to read the
stored data.
Optical Jukebox
See “Jukebox.”
Phase Change
A method of storing information on rewritable optical
disks.
Pixel
Picture Element. A single dot in an image. It can be black
and white, grayscale or color.
Portable Volumes
A feature that facilitates the moving of large volumes of
documents without requiring copying multiple files.
Portable volumes enable individual CDs to be easily
regrouped, detached and reattached to different databases
for a broader information exchange.
RAID
Redundant Array of Independent Disks. A collection of hard
disks that act as a single unit. Files on RAID drives can
be duplicated (“mirrored”) to preserve data. RAID
systems vary in their level of redundancy: level 0 stripes
data across disks with no redundancy, level 1 uses two
disks that mirror each other, and so on up to level 5,
the most common.
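Besides mirroring, redundancy can come from parity, which is the approach level 5 uses. The following is a simplified sketch of the parity idea only, not of a real RAID controller: the parity block is the byte-by-byte XOR of the data blocks, so any one lost block can be rebuilt from the others. The block contents and the xor_blocks helper are made up for illustration.

```python
def xor_blocks(*blocks: bytes) -> bytes:
    """Byte-by-byte XOR of equally sized blocks."""
    result = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            result[i] ^= byte
    return bytes(result)

disk_a = b"scan"
disk_b = b"docs"
parity = xor_blocks(disk_a, disk_b)  # stored on a third disk

# If disk_a fails, its contents are rebuilt from the surviving disk and the parity.
rebuilt = xor_blocks(parity, disk_b)
print(rebuilt)  # b'scan'
```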
Raster/Rasterized (Raster or Bitmap Drawing)
A method of representing an image with a grid (or
“map”) of dots or pixels. Typical raster file formats
are GIF, JPEG, TIFF, PCX, BMP, etc.
Redaction
A type of document annotation that provides word-level
security by concealing from view specific portions of
sensitive documents. Like all annotations in a document
imaging system, redactions should be image overlays that
protect information but do not alter original document
images.
Region (of an image)
An area of an image file that is selected for specialized
processing. Also called a “zone.”
Scale-to-Gray
An option to display a black and white image file in an
enhanced mode, making it easier to view. A scale-to-gray
display uses gray shading to fill in gaps or jumps (known
as aliasing) that occur when displaying an image file on a
computer screen. Also known as grayscale.
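One way to picture a scale-to-gray display, as a sketch that makes no claim about how any particular viewer implements it: when a black and white image is shrunk for the screen, each block of pixels is replaced by its average darkness instead of a single black or white value. The scale_to_gray function and the factor of 2 are assumptions for the example.

```python
def scale_to_gray(image, factor=2):
    """Shrink a 0/1 pixel grid by averaging each factor x factor block.

    Returns darkness values from 0.0 (white) to 1.0 (black).
    Rows or columns that do not fill a whole block are dropped.
    """
    rows, cols = len(image) // factor, len(image[0]) // factor
    out = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            total = sum(
                image[r * factor + dr][c * factor + dc]
                for dr in range(factor)
                for dc in range(factor)
            )
            out_row.append(total / (factor * factor))
        out.append(out_row)
    return out

print(scale_to_gray([[1, 1, 0, 0], [1, 0, 0, 0]]))  # [[0.75, 0.0]]
```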
Scalability
The capacity of a system to expand without requiring major
reconfiguration or re-entry of data. Multiple servers or
additional storage can be easily added.
Scanner
An input device commonly used to convert paper documents
into computer images. Scanner devices are also available
to scan microfilm and microfiche.
SCSI
Small Computer System Interface. Pronounced “skuzzy.”
A standard for attaching peripherals (notably mass storage
devices and scanners) to computers. SCSI allows for up to
7 devices to be attached in a chain via cables. The
current SCSI standard is “SCSI II,” also known as
“Fast SCSI.”
SCSI Scanner Interface
The device used to connect a scanner with a computer.
SQL
Structured Query Language. The popular standard for
running database searches (queries) and reports.
TCP/IP
Network communications protocol. This is the protocol used
by the Internet.
Templates, Document
Sets of index fields for documents.
Thumbnails
Small versions of an image used for quick overviews or to
get a general idea of what an image looks like.
TIFF
Tagged Image File Format. A non-proprietary raster image
format, in wide use since 1986, which allows for several
different types of compression. TIFFs may be either single
or multi-page files. A single-page TIFF is a single image
of one page of a document. A multi-page TIFF is a large
single file consisting of multiple document pages.
Document imaging systems that store documents as
single-page TIFFs offer significant network performance
benefits over multi-page TIFF systems.
TIFF Group III (compression)
A one-dimensional compression format for storing black and
white images that is utilized by most fax machines.
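The one-dimensional idea can be sketched as run-length coding of a single scan line: instead of storing every pixel, store the lengths of the alternating white and black runs. This simplified sketch stops at the run lengths and omits the step in which the real format replaces each length with a short code word. Pixels are assumed to be 0 for white and 1 for black, and the function names are illustrative.

```python
def run_lengths(scan_line):
    """Lengths of alternating runs, starting with white (first run may be 0)."""
    runs = []
    current, count = 0, 0
    for pixel in scan_line:
        if pixel == current:
            count += 1
        else:
            runs.append(count)
            current, count = pixel, 1
    runs.append(count)
    return runs

def expand_runs(runs):
    """Rebuild the scan line from its run lengths."""
    line, color = [], 0
    for length in runs:
        line.extend([color] * length)
        color = 1 - color
    return line

line = [0, 0, 0, 1, 1, 0, 0, 0, 0]
print(run_lengths(line))               # [3, 2, 4]
print(expand_runs([3, 2, 4]) == line)  # True
```

Because each line is coded independently of the lines above and below it, the scheme is called one-dimensional.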
TIFF Group IV (compression)
A two-dimensional compression format for storing black and
white images. Typically compresses at a 20-to-1 ratio for
standard business documents.
Video Scanner Interface
A type of device used to connect scanners with computers.
Scanners with this interface require a scanner control
board designed by Kofax, Xionics or Dunord.
Workflow, Ad Hoc
A simple manual process by which documents can be moved
around a multi-user imaging system on an “as-needed”
basis.
Workflow, Rule-Based
A programmed series of automated steps that route
documents to various users on a multi-user imaging system.
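A rule-based route can be sketched as an ordered list of conditions, each paired with a recipient; the first rule that matches decides where the document goes. The document fields, user names, and the 1,000 threshold below are hypothetical.

```python
# Each rule pairs a condition on the document with the user who receives it.
# Rules are checked in order, so the more specific rule comes first.
RULES = [
    (lambda doc: doc["type"] == "invoice" and doc.get("amount", 0) > 1000, "manager"),
    (lambda doc: doc["type"] == "invoice", "accounts_payable"),
    (lambda doc: doc["type"] == "contract", "legal"),
]

def route(doc, rules=RULES, default="records_clerk"):
    """Return the user who should receive the document next."""
    for condition, user in rules:
        if condition(doc):
            return user
    return default

print(route({"type": "invoice", "amount": 2500}))  # manager
print(route({"type": "invoice", "amount": 80}))    # accounts_payable
print(route({"type": "memo"}))                     # records_clerk
```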
WORM Disks
Write Once Read Many Disks. A popular archival storage
medium during the 1980s. Acknowledged as the first optical
disks, they are primarily used to store archives of data
that cannot be altered. WORM disks are created by
standalone PCs and cannot be used on the network, unlike
CD-Rs.
ZIP
A common file compression format that allows quick and
easy storage for transport.
Zone OCR
An add-on feature of the imaging software that populates
document templates by reading certain regions or zones of
a document, and then placing the text into a document
index field.
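The zone-to-field step can be sketched without a real OCR engine by assuming the recognition pass has already produced words with page coordinates. The word list, zone rectangles, field names, and the populate_template function are all hypothetical.

```python
def populate_template(words, zones):
    """Fill index fields with the words found inside each zone.

    words: [(text, x, y)] as produced by a hypothetical OCR pass.
    zones: {field name: (left, top, right, bottom)} in the same coordinates.
    """
    fields = {}
    for field, (left, top, right, bottom) in zones.items():
        inside = [
            text for text, x, y in words
            if left <= x <= right and top <= y <= bottom
        ]
        fields[field] = " ".join(inside)
    return fields

words = [("INV-1042", 500, 40), ("Acme", 60, 120), ("Corp", 110, 120), ("Total", 60, 700)]
zones = {
    "Invoice Number": (450, 20, 600, 60),
    "Customer": (50, 100, 300, 140),
}
print(populate_template(words, zones))
# {'Invoice Number': 'INV-1042', 'Customer': 'Acme Corp'}
```

Words outside every zone, such as "Total" in the sample, are ignored.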