Tech

Not Your Daddy's Database

May 1, 1996

nce a month, the International Price Program at the Labor Department's Bureau of Labor Statistics performs a little magic. The IPP's 130 clerical employees, computer specialists and economists turn 30,000 forms from 7,000 American importing and exporting companies into the Import Price Index and the Export Price Index, which measure and analyze price changes for U.S. imports and exports. The bureau uses the indices, which are available to the public, in its quarterly calculations of the gross domestic product.

The IPP provides its data with confidence, meets its monthly deadline and can even customize the presentation of its price change information on demand. But the program wasn't always so efficient.

A decade ago, says IPP project manager Mike Share, IPP data were stored in six or seven databases in a mainframe computer and in some PC programs. The multitude of databases made it difficult for the IPP's programs to synchronize processes and for computer staff to develop expertise in administering any one system. The IPP's database was inflexible. "We were at the point five years ago, working in an ADP processing environment that was complex and little understood, that it was hard to respond to any demand for new features," says Share.

Perhaps most troubling, the IPP found it difficult to guarantee the integrity of its data, a major problem for a division whose entire value rests in the accuracy of its statistics. For a time, the IPP used software to ensure data consistency between and within databases. IPP staff also produced reports that pointed out data discrepencies. But the bureau decided that in order to truly improve the reliability of its data "we had to have the central database repository itself ensure those integrities," Share says. So, in 1995, the bureau moved to one consolidated, enterprise-wide database management system (DBMS).

The bureau's Sybase SQL Server11 DBMS has given the agency the quality and efficiency it was seeking. In particular, the system gets high marks from Share for its flexibility: "It is so easy to make a design change on the fly-to add new tables, to change the relationships between tables," he says.

These days, many agencies find themselves in a position similar to that of the Bureau of Labor Statistics: They've been using DBMSs for several years-they even have programs, similar to the IPP, that could not operate in the absence of database management technology-and now they are seeking improvements. For managers, the issues have become, "How do we make the DBMS we have run faster and handle more data?" "How can we incorporate new types of data into our system?" "How do we connect to our field officers?"

DBMS Choices

There are four types of database management systems: hierarchical, relational, object-oriented and object-relational. Hierarchical systems are little more than computerized file cabinets, in which simple data are stored in data groups (think manila folders), which may in turn be stored in larger data groups (equivalent to hanging file folders). Agencies typically use hierarchical databases to handle particular tasks, such as updating a mailing list. As in the old-fashioned paper storage system, data retrieval is easy only if the user knows the name of the file in which the data is stored. Otherwise, there is no alternative but to open every file in the database and check for the information-an impractical process-which is why indices are often used to track data stored in hierarchical systems. Even when a user knows the file name and its location, data retrieval from a hierarchical system can be time-consuming because users must follow an inflexible series of steps to get what they want. Also, data can't be shared between files, so the number of functions the database can perform is limited. With a hierarchical system, a user can find a file of agency employees named Brown and another file of employees who live in Washington, but the DBMS can't come up with a list of employees named Brown who live in Washington.

A relational DBMS, however, can produce such information. Relational systems store data in rows and columns that can be cross-referenced. Depending on the software used, relational systems respond to requests made in structured query language (SQL, a federally approved, somewhat esoteric code for querying databases) or plain English, or to requests made by clicking on icons. Most relational systems use parallel processing techniques-processing different transactions in different ways at the same time-to avoid bottlenecks and speed retrieval times. The top relational DBMS vendors-IBM, Informix, Microsoft, Oracle and Sybase-are billion-dollar companies that pioneered relational database management software in the late 1980s.

The biggest drawback to relational DBMSs may be their inability to support complex data such as photographs, audio, video and multimedia. That's because relational systems store complex data as binary large objects (BLOBs) in order to fit them into tabular format. Users can retrieve BLOBs, but they can't index, search or manipulate them within a server.

Software manufacturers solved this problem by inventing another way to store data-the object-oriented DBMS. Object-oriented DBMSs store complex data in clusters called "objects," which the user locates and manipulates by following pointers in applications written in programming languages such as C++ and SmallTalk. Traditionally, object-oriented DMBSs have been used by the few federal employees who manipulate geo-spatial data, such as meteorological information, sites on maps or CAD (computer-aided design) drawings. Object-oriented database applications are more expensive to develop and maintain than relational DBMSs, and the technology does not accommodate large amounts of data or large numbers of users.

However, this is changing. With the arrival of the Internet, the demand for systems that manage complex data has moved from the margins into the mainstream "at a pace no one anticipated," says Dave Nahmias, federal program marketing manager at Informix. "The paradigm has shifted from a form-based view of the world to a document view of the world," he says. "Computer users now expect images."

Hence, some of the major vendors have just given birth to the youngest generation of DBMSs-object-relational. Object-relational systems are relational databases that regard complex data as objects rather than BLOBs. The systems thus enable large numbers of users to manipulate large stores of complex data. Object-relational DBMSs can also respond to queries such as "find me other images that look like this." So far, two companies have unveiled enterprise-capable object-relational DBMSs. In February, Oracle introduced the Oracle Universal Server, which consists of an Oracle7.3 relational database plus geo-spatial, video, messaging and World Wide Web servers. Informix's Universal Server is due out by the fourth quarter of this year. It will combine Informix's relational database with an object-oriented server from Illustra Information Technologies Inc., a company that Informix bought last year.

Agencies also have the option of cobbling together object-relational DBMSs by integrating objected-oriented applications into their existing relational systems. MapInfo Corp., for example, provides mapping software for HUD's grant application database.This approach requires the large amount of processing power found in the new 64-bit computers.

On the Move

While some agencies are consolidating database management systems and incorporating complex data, others are looking to remote and mobile applications to improve their systems.

For example, this month the IRS's excise tax compliance division is loading Oracle's Mobile Agents product on the laptops of its dyed diesel compliance officers. These IRS agents randomly test vehicles on the highways to see if they are powered by dyed diesel fuel, a nontaxed fuel intended for off-road use only. Drivers found using dyed diesel on roads must pay a penalty of $10 per gallon multiplied by the number of their previous violations.

When IRS agents discover dyed diesel in the tank of a truck, they interview the driver, enter the data into their laptops and transmit the information by cellular phone to the agency database. In minutes, the database tells the agents what the driver's fine should be. If the cellular transmission is disrupted, say, by a thundercloud passing overhead, the application remembers how much information was sent before the connection was broken and begins the next transmission mid-document.

Improving data collection out in the field is also a concern at the Bureau of Labor Statistics. "We want to put data capture in the field as close to the point of data collection as possible," says IPP's Share.

DBMS vendors are busy developing remote and mobile applications. Sybase offers a mass deployment product called SQL Anywhere, and Informix is working with MobileWare Corp. to develop mobile architecture for Informix applications.

The type of database management system that works best for an agency can only be determined by the agency, say DBMS experts. Still, one federal information technology specialist suggests a general rule for acquiring new database management technology.

"Analyze, analyze, analyze," says Wilbert Miles, ADP consultant for the Job Corps, a division of the Labor Department that uses an Informix DBMS to track student enrollment and performance at 140 Job Corps training centers nationwide. "And don't stop analyzing while you're implementing the system or after it's up and running. The more you know about what your needs are and what you expect the technology to do for you, the better you'll do."

NEXT STORY: IRM Roundtable