So where is the rest of the information?

Part II. The Invisible Web

By Sue Eipert
2002

Originally published in: Crime Scene: Newsletter of the Northwest Association of Forensic Scientists, 28(3) 2002. Links verified and updated February 2004.

Without a doubt, the World Wide Web has transformed the way we store and find information, and search engines such as Google can provide excellent access to pertinent information. But, as reported in Part I of this series, there is much information in the world that is either not on the Web, or not directly accessible through a general search engine. The major categories of information beyond this "surface web" are:

The Invisible Web can be defined as information on the World Wide Web that is not available through general-purpose search engines, and thus not easily accessible to most Web searchers. A complete explanation of why the Invisible Web exists is beyond the scope of this article; the technical, economic and social reasons for its existence are covered very well by Sherman and Price (2001) in The Invisible Web: Uncovering Information Sources Search Engines Can’t See. The discussion here is limited to one of the most important types of data on the Web that is invisible to search engines: the information in databases.

Much of the most detailed and authoritative information on the Web is found in the form of databases with Web interfaces. The contents of such a database can be reached only by submitting a query through its Web form. A search engine, which gathers pages by following links rather than by filling in forms, cannot penetrate it. Because information in Invisible Web databases is so voluminous and of such high quality, a serious information seeker cannot afford to overlook it.
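
The difference between following links and submitting a query can be shown with a minimal Python sketch. It is an illustration only, not a description of how any real search engine or database works: the site map, the two records, the `chemical` form field and all function names are hypothetical and do not come from the sites discussed in this article.

```python
from urllib.parse import urlencode

# Hypothetical site: static pages joined by links (page -> pages it links to).
PAGES = {
    "/": ["/about", "/search"],
    "/about": ["/"],
    "/search": [],  # holds only a query form; it links to no records
}

# Hypothetical database: records exist only as answers to a submitted query.
RECORDS = {
    "benzene": "Release report for benzene",
    "toluene": "Release report for toluene",
}


def crawl(start):
    """Collect every page reachable by following links from `start`."""
    seen, stack = set(), [start]
    while stack:
        page = stack.pop()
        if page in seen:
            continue
        seen.add(page)
        stack.extend(PAGES.get(page, []))
    return seen


def query_url(term):
    """Build the address a browser would request when the form is submitted."""
    return "/search?" + urlencode({"chemical": term})


def submit_query(term):
    """Return the matching record, or None if the database has no match."""
    return RECORDS.get(term.lower())


crawled = crawl("/")
print(sorted(crawled))                    # the pages a link-following crawler sees
print(query_url("benzene") in crawled)    # False: no link leads to a query result
print(submit_query("benzene"))            # the record a human searcher retrieves
```

In this sketch the crawler reaches the page that holds the form but none of the records behind it, because no link points to a query result. Only a searcher who types a term into the form receives a record.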

A report by BrightPlanet (Bergman) stated that the 60 largest sites on the deep Web (an alternate term for sites with information not found by search engines) "contain about 750 terabytes of information–sufficient by themselves to exceed the size of the surface Web forty times." As an example, one of the largest sites identified by Bergman is RTK Net (http://www.rtk.net/), the Right-to-Know Network. It provides access to many government databases on the environment, including CERCLIS (Superfund sites and data) and the Toxic Release Inventory (EPA’s database of releases of toxic chemicals from manufacturing facilities). The vast amount of extremely useful data within these databases will not be found through a search engine.

Aside from its sheer volume, by its very nature the information in Invisible Web databases is of great potential value. In general, databases contain very specialized information, are created by experts, or come from authoritative sources. Some are the equivalent of everyday paper reference materials such as the phone book. For finding a phone number or address, a specialized directory such as Anywho.Com (http://www.anywho.com) or Infospace (http://www.infospace.com) can be queried directly.

The following are some examples of Invisible Web databases that could be useful in forensic science:

Real-time databases are also part of the Invisible Web. Two examples are: Flight Tracker (http://www.trip.com/trs/trip/flighttracker/flight_tracker_home.xsl) [Link update: use http://www.cheaptickets.com/trs/cheaptickets/flighttracker/flight_tracker_home.xsl?airlines=all], for tracking commercial airline flights, and AIRNow (http://www.epa.gov/airnow/where/index.html), for finding current local air quality from the EPA.

At this point a searcher with a specific question might wonder how to find a relevant Invisible Web database in order to query it. General search engines can be appropriate for finding a database, even though they cannot find the specific information within that database. One good strategy is to search in very general terms for a type of resource rather than for the precise information needed. For example, a Google search for "federal judges directory" results in finding a link to the Federal Judges Biographical Database at http://air.fjc.gov/history/jabout_frm.html. This database’s own query system can then be used to find specific facts, such as a list of federal judges whose service was terminated by impeachment and conviction.
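
The strategy of searching for a type of resource can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the helper names `resource_query` and `google_search_url` are hypothetical, and the `https://www.google.com/search?q=` address format is an assumption that does not appear in the original article.

```python
from urllib.parse import urlencode


def resource_query(topic, resource_type="directory"):
    """Combine a broad topic with a resource type instead of a precise fact."""
    return f"{topic} {resource_type}"


def google_search_url(query):
    """Build a search address for the query (address format is an assumption)."""
    return "https://www.google.com/search?" + urlencode({"q": query})


# Search for the kind of resource, not for the specific fact wanted.
query = resource_query("federal judges")
print(query)
print(google_search_url(query))
```

The precise question, such as which judges were impeached and convicted, is deliberately left out of the query. It is put to the database itself once the search engine has located it.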

Another way to find useful Invisible Web databases is through the following directories, which are collections of links chosen by information professionals and organized into topics.

The high-quality information in Invisible Web databases is invisible to search engines, but not to searchers who know how to find it.

References:

Bergman, Michael K., The Deep Web: Surfacing Hidden Value, BrightPlanet White Paper. Retrieved from http://www.brightplanet.com/deepcontent/tutorials/DeepWeb/index.asp on August 20, 2002. [Link update: page moved to http://www.brightplanet.com/technology/deepweb.asp.]

Sherman, Chris and Gary Price, 2001, The Invisible Web: Uncovering Information Sources Search Engines Can't See, Medford, NJ: CyberAge Books.

About the author:

Sue Eipert provides business and scientific research services, using professional proprietary databases as well as the visible and invisible Web to fulfill the information needs of clients, including engineering companies, environmental consultants, forensics professionals, expert witnesses, manufacturers and Internet e-commerce companies.

Eipert Information Services
seipert@eipertinfo.com

Copyright © 2000-2008 Sue Eipert