Search Tools

Automated probes find data lost in cyberspace

January 1997

INTERNET AND INTRANETS GUIDE

Although an exact count is impossible, experts estimate that there are at least several hundred million Web pages out in cyberspace, with thousands more created every month. This abundance of information is useless unless Net surfers can find what they need quickly and easily. Fortunately, several search technologies are available for locating a lot of information in a small amount of time.

One of the best-known technologies in this young market is the search engine, an automated program designed to explore and catalog Web sites and answer simple queries. Users can access search engines on the Net via Web browsers and, once there, type in words or phrases on specified topics. Software "spiders" crawl through the Web ahead of time, cataloging pages so that algorithm-based search logic can retrieve the requested data within a couple of seconds.
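To make the spider idea concrete, here is a minimal sketch in Python of how such a crawler might work. The seed URL, fetching approach and index structure are illustrative assumptions, not the internals of any engine named in this article.

```python
import re
import urllib.request
from collections import deque

def crawl(seed_url, max_pages=10):
    """Breadth-first crawl: fetch pages, index their words, follow links.
    A hypothetical miniature of what commercial spiders do at scale."""
    index = {}                      # word -> set of URLs containing it
    seen, queue = {seed_url}, deque([seed_url])
    while queue and max_pages > 0:
        url = queue.popleft()
        max_pages -= 1
        try:
            html = urllib.request.urlopen(url, timeout=5).read().decode("utf-8", "ignore")
        except OSError:
            continue                # unreachable page: skip it
        for word in re.findall(r"[a-z]+", html.lower()):
            index.setdefault(word, set()).add(url)
        for link in re.findall(r'href="(http[^"]+)"', html):
            if link not in seen:    # queue each new link exactly once
                seen.add(link)
                queue.append(link)
    return index
```

A production spider adds politeness delays, robots.txt handling and far larger indexes, but the crawl-index-follow loop is the same.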

Some engines do exhaustive searches of thousands of Web sites, while others visit only the sites that have the most hypertext links pointing to them. Data content also can vary. Several search engines provide full text from every relevant Web page, while others supply summaries, site titles or URLs (Uniform Resource Locators, which serve as Web addresses). Most engines take simple English queries, but a few require users to express search conditions in the cryptic language of Boolean logic.
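A Boolean query boils down to set operations over an index like the one a spider builds. This sketch assumes a hypothetical word-to-URL-set index with invented entries and shows what AND, OR and NOT conditions actually compute.

```python
# Hypothetical word -> set-of-URLs index, as a spider might build it.
index = {
    "procurement": {"gcn.com/a", "nasa.gov/b"},
    "software":    {"gcn.com/a", "dec.com/c"},
    "hardware":    {"dec.com/c"},
}

def lookup(word):
    return index.get(word, set())

# "procurement AND software": pages containing both words
both = lookup("procurement") & lookup("software")

# "procurement OR hardware": pages containing either word
either = lookup("procurement") | lookup("hardware")

# "software NOT hardware": pages with the first word but not the second
only = lookup("software") - lookup("hardware")

print(both, either, only, sep="\n")
```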

Several search engines, such as Excite and InfoSeek, employ a type of artificial intelligence known as fuzzy logic to find Web pages related to keywords even if those exact words do not appear on the pages. Because search-engine services are so diverse, the best advice is to sample as many as possible to find the most suitable one. Most are subsidized by advertising, so the services are usually free for the surfing, provided users can avoid the increasing number of busy signals caused by Net congestion.
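The article does not spell out how such matching works, and the vendors keep their methods proprietary. One common approximation is to expand a query term to near-matches in the index vocabulary, as in this sketch using Python's standard difflib; it is an illustrative stand-in, not Excite's or InfoSeek's actual technique, and the vocabulary is invented.

```python
import difflib

# Hypothetical index vocabulary drawn from crawled pages.
vocabulary = ["government", "governor", "governance", "procurement", "internet"]

def fuzzy_expand(term, cutoff=0.75):
    """Return index words similar to the query term, so a search for
    'goverment' (misspelled) still reaches pages about 'government'."""
    return difflib.get_close_matches(term, vocabulary, n=5, cutoff=cutoff)

print(fuzzy_expand("goverment"))   # close matches such as 'government'
```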

A more expensive but thorough alternative is to use commercial software packages that run Web searches simultaneously on several engines. The idea behind multiple searches is to pick up sites from one database that another may have missed. WebSeeker from the Forefront Group, for instance, compiles results from 20 search-engine databases, eliminates duplicate listings and indexes the results. Similar programs include Blue Squirrel's Squrl, Iconovex's EchoSearch and Quarterdeck's WebCompass. The software runs between $100 and $400, depending on the level of sophistication.
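At its core, a meta-search program merges ranked lists from several engines and drops duplicates. Here is a minimal sketch of that merge step; the engine result lists are hard-coded stand-ins for live queries, and the URL-normalization rule is a simplifying assumption rather than any vendor's method.

```python
from urllib.parse import urlsplit

# Stand-in results from three engines, best hits first.
results_by_engine = [
    ["http://www.nasa.gov/index.html", "http://gcn.com/story1"],
    ["http://www.nasa.gov/index.html/", "http://dec.com/altavista"],
    ["http://gcn.com/story1", "http://www.four11.com/"],
]

def merged(result_lists):
    """Interleave engine results, deduplicating by normalized URL."""
    seen, out = set(), []
    for rank in range(max(map(len, result_lists))):
        for hits in result_lists:
            if rank < len(hits):
                parts = urlsplit(hits[rank])
                key = (parts.netloc.lower(), parts.path.rstrip("/"))
                if key not in seen:
                    seen.add(key)
                    out.append(hits[rank])
    return out

print(merged(results_by_engine))
```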

Another type of search tool is the Net directory, which is essentially an electronic Yellow Pages of Web sites. Directories such as Yahoo categorize Web sites based on descriptions submitted by organizations when the sites are registered. As with search engines, services at the various Net directories vary widely. Some simply list URLs under categories and subcategories, while others also include some text in their listings. A few even rank sites according to reports on the number of hits they receive. And one, the Four11 Directory (www.four11.com), lists nothing but Internet e-mail addresses.
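A directory is, at bottom, a tree of categories with site listings at the leaves. This sketch shows one plausible representation using nested dictionaries; the categories and entries are invented examples in the style of a Yahoo-like directory, not its actual data.

```python
# Invented category tree in the style of a Net directory.
directory = {
    "Government": {
        "Agencies": [("NASA", "http://www.nasa.gov")],
        "Procurement": [("GCN", "http://gcn.com")],
    },
    "Computers": {
        "Search Engines": [("Alta Vista", "http://altavista.digital.com")],
    },
}

def browse(tree, path):
    """Walk category/subcategory names down to a list of (title, URL) pairs."""
    node = tree
    for name in path:
        node = node[name]
    return node

print(browse(directory, ["Government", "Agencies"]))
```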

Search technology also is being employed by organizations building internal enterprise networks known as intranets. Agencies with bulging Web sites, such as NASA, are using database search software to help surfers find information quickly by typing in keywords. Commercial utility packages from companies such as Architext Software, Fulcrum and Verity can be loaded on Web servers to make intranet explorations as swift as those on the Web. Once text and images are retrieved, they can be stored in electronic filing systems, such as Excalibur's EFS Webfile, for easy access.
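The keyword search these intranet packages provide can be pictured as an inverted index over server documents. The sketch below builds one from in-memory strings standing in for files on a Web server; the document names and contents are invented, and real products add ranking, stemming and much larger collections.

```python
import re
from collections import defaultdict

# Stand-in intranet documents; on a real server these would be files.
documents = {
    "budget.html": "Fiscal 1997 budget request for the space station program",
    "launch.html": "Shuttle launch schedule and program milestones",
}

def build_index(docs):
    """Map each word to the set of documents containing it."""
    index = defaultdict(set)
    for name, text in docs.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(name)
    return index

def search(index, query):
    """Return documents containing every query word (implicit AND)."""
    words = query.lower().split()
    hits = [index.get(w, set()) for w in words]
    return set.intersection(*hits) if hits else set()

index = build_index(documents)
print(search(index, "program budget"))   # {'budget.html'}
```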

[Image: Alta Vista] Digital Equipment Corp.'s Alta Vista search engine features a database of more than 30 million Web pages.
