The Internet: Resource or Quagmire?

by Chief Warrant Officer Four Alan D. Tompkins, USAR

Is the Internet a useful and productive resource for an intelligence analyst, or is it merely an interesting world of time wasters and diversions? News stories, magazine articles and conversations about the Internet bombard us daily. Friends tell us about the new version of a Web browser they are using. They pass on interesting uniform resource locator (URL) addresses. Students "surf the net" to find information for school reports. Exciting as all that may seem, should a professional intelligence analyst working under budget constraints and with limited resources devote time and energy to exploring the Internet?
As often happens, the answer to that question contains some good news and some bad news. Beyond that, effective use of the Internet is not a simple matter; as the saying goes, "the devil is in the details."

The Good News

The Internet provides access to a huge and varied collection of open source information (OSI). OSI contributes to the analysis process by providing an understanding of the broad framework of a situation. Dr. Joseph Nye's jigsaw puzzle analogy explains the relationship between OSI and the classified disciplines. In paraphrase, he has noted that
Open source intelligence provides the outer pieces of the jigsaw puzzle, without which we can neither begin nor complete the puzzle. But they are not sufficient of themselves. The precious inner pieces of the puzzle, often the most difficult and most expensive to obtain, come from the traditional intelligence disciplines. Open source intelligence is the critical foundation for the all-source intelligence product, but it can never replace the totality of the all-source effort.1
The Internet, a network of computer networks, has exploded in size in the last two years. Estimating the actual number of people and organizations connected to it is difficult, but at least 30 million individuals and 40,000 networks are probably connected, and these numbers are growing rapidly. Certainly, some of the opinions, observations, and publications of a group this size can be interesting and useful sources of OSI.
The quantity of data available on the Internet is enormous. One estimate puts its total content at 2 to 10 terabytes. A terabyte is a million megabytes, or one million million characters. By comparison, a typical public library with 300,000 books holds about 3 terabytes of data.2
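The arithmetic behind these figures can be checked in a few lines. The sketch below is illustrative only; it assumes one character per byte and decimal units (1 megabyte = 10^6 bytes), and the constant and function names are hypothetical.

```python
# Back-of-the-envelope check of the storage figures quoted in the text.
# Assumptions: one character occupies one byte, and decimal units are used.

MEGABYTE = 10**6                 # bytes
TERABYTE = 10**6 * MEGABYTE      # "a million megabytes" = 10**12 bytes

INTERNET_ESTIMATE_TB = (2, 10)   # low and high estimates quoted in the text
LIBRARY_TB = 3                   # typical 300,000-book public library


def libraries_equivalent(internet_tb: float, library_tb: float = LIBRARY_TB) -> float:
    """How many typical public libraries a given amount of Internet content equals."""
    return internet_tb / library_tb


if __name__ == "__main__":
    print(f"1 TB = {TERABYTE:,} characters")
    for tb in INTERNET_ESTIMATE_TB:
        print(f"{tb} TB is about {libraries_equivalent(tb):.1f} libraries")
```

Run as a script, it reports that 1 terabyte is 1,000,000,000,000 characters and that the 2-to-10-terabyte estimate corresponds to roughly 0.7 to 3.3 typical public libraries.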
The contents of the Internet range from newspaper and journal archives and scholarly papers to messages exchanged between rock star fans. Many companies, governments, and organizations publish information on the Internet. Thousands of newsgroups organized by topic contain messages and discussions about almost every aspect of the human condition, from pickle recipes to heated debates over cultural and political issues. Because publishing information on the Internet is inexpensive compared with traditional print methods, organizations, military units, and governments increasingly use it to provide information to their customers, members, and citizens.
In summary, the Internet can be a useful, readily available, and inexpensive resource that provides a significant part of the total information required to produce all-source intelligence products. For areas and topics not normally targeted by classified means of collection, open sources in general and the Internet in particular can give the all-source analyst a broad, continually updated view.

The Bad News

It is important to note, however, that the Internet is just one source of OSI. Thousands of databases are available from commercial providers such as LEXIS/NEXIS, Reuters, and Dialog. These commercial sources are often better organized and indexed than material available on the Internet, but they are also more expensive. Writing in 1993, Admiral William Studeman, then Deputy Director of Central Intelligence, stated:
We have identified some 8,000 commercial databases and the vast majority have potential intelligence value. The number of worldwide periodicals has grown from 70,000 in 1972 to 116,000 last year. The explosion of open source information is most apparent in the Commonwealth of Independent States [the former Soviet Union], where today, there are some 1,700 newspapers that were not published three years ago.3
It is also important to understand that despite rapid growth in the availability of OSI in digital, on-line format, probably less than 10 percent of all OSI is available in that format. The remainder exists in printed, hard-copy formats: books, newspapers, journals and reports.
As mentioned above, the total estimated content of the Internet is roughly equivalent to one to three typical public libraries of 300,000 volumes each. However, the largest OSI repository in the world is the U.S. Library of Congress, which has more than 107 million books, newspapers, journals, microforms, and other special-format materials in 470 languages, and approximately 200,000 current periodicals, 80,000 of which are in foreign languages. Unlike a traditional library, the Internet's contents are constantly changing. Sites appear and disappear, and the contents of individual sites can change daily. A University of Colorado study published in 1992, when content on the Internet was far less dynamic, found that the average life of a document on the network was only 44 days. Document life today is probably much shorter.4
Also, unlike libraries or commercial databases, the Internet is not maintained by professional librarians. There is no standard subject vocabulary comparable to the Library of Congress Subject Headings, and different sites use different terms to describe similar topics. Searching for information on the Internet requires skill, persistence, and powerful searching tools.5 The challenge in making effective use of the Internet is finding the needle in the haystack. OSI on the Internet is so diverse, scattered, and voluminous that it can easily overwhelm the analyst. Effective use of OSI demands skilled analysts following clearly defined search plans and using increasingly sophisticated computerized tools.
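One common way a search plan copes with inconsistent vocabulary is to expand each concept into a list of alternative terms and accept a document if it uses any of them. The Python sketch below illustrates the idea with naive, case-insensitive substring matching; the concepts, synonym lists, and sample documents are hypothetical and merely stand in for the real search tools the text refers to.

```python
# Hypothetical search-plan sketch: each concept is expanded into synonyms.
# A document matches if, for EVERY concept, it contains ANY of that concept's terms.
# Matching is naive case-insensitive substring search (an illustrative assumption).

SEARCH_PLAN = {
    "armor": ["tank", "armored vehicle", "armour"],
    "exports": ["export", "arms sale", "arms transfer"],
}

SAMPLE_DOCS = {
    "site-a": "Government approves export of main battle tanks.",
    "site-b": "Armoured vehicle deliveries follow a new arms transfer deal.",
    "site-c": "Pickle recipes from around the world.",
    "site-d": "Tank production figures for 1995.",
}


def matches_concept(text: str, terms: list[str]) -> bool:
    """True if any of the terms occurs in the text, ignoring case."""
    lowered = text.lower()
    return any(term.lower() in lowered for term in terms)


def matching_documents(docs: dict[str, str], plan: dict[str, list[str]]) -> list[str]:
    """Ids of documents that satisfy every concept in the plan (AND across, OR within)."""
    return [
        doc_id
        for doc_id, text in docs.items()
        if all(matches_concept(text, [concept, *synonyms])
               for concept, synonyms in plan.items())
    ]


if __name__ == "__main__":
    print("expanded plan:", matching_documents(SAMPLE_DOCS, SEARCH_PLAN))
    print("literal terms:", matching_documents(SAMPLE_DOCS, {"armor": [], "exports": []}))
```

With the expanded plan, site-a and site-b are both found; searching only for the literal words "armor" and "exports" finds neither, which is the problem of different sites using different terms for the same topic.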
Small wonder that a group of senior Marine Corps officers, led by the Assistant Commandant of the Corps, visited the New York Stock Exchange recently to learn how brokers absorb, process and transmit the vast quantities of perishable information that are the lifeblood of the financial markets.6 So, the bad news about the Internet is that it contains only a portion of the total available OSI resources, its contents change very rapidly, and it is not very well organized or indexed.

The Details

Searching the Internet can easily become like a trip to the shopping mall to buy a pair of shoelaces. You start out knowing what you want, but there are distractions along the way. First, you smell pizza, so you buy a slice. Then there are some interesting books on display in the bookstore, so you browse for a while. After that, you buy some frozen yogurt and a T-shirt with a catchy phrase on the back. An hour and a half later, you hurry back to work with a slightly upset stomach, a few little packages, and no shoelaces. The Internet has become something of an information Nirvana, an enormous, multicultural library that is open all day, every day, to ordinary computer users. Taking advantage of this amazing institution, however, requires some savvy. Users need to know not only how to get in the door but also which of thousands of aisles will take them to the document they seek.
Clearly defined and managed collection plans are essential. A stable, well-understood set of searching, storage, and retrieval tools is also necessary to ensure high analyst productivity. Because of the high level of activity and growth on the Internet, new tools are released almost daily, and it is tempting to try each new program as it becomes available. If the analyst's tools change continuously, however, more effort will be spent understanding tools than collecting information. Major Mats Bjoere of the Swedish Army has published an excellent paper that outlines the main elements of a well-managed and organized OSI collection effort.7 It contains a number of lessons learned in running an OSI operation.


The Internet does contain information useful to the all-source analyst. It can provide the critical outer pieces of the jigsaw puzzle described by Dr. Nye. However, it is only one part of the vast world of OSI, and it can be exploited successfully and efficiently only by well-trained analysts using a proven tool set and following clearly defined collection plans.


1. Dr. Joseph Nye, the Chairman of the National Intelligence Council, speaking to members of the Security Affairs Support Association at Fort Meade, Maryland, on 24 April 1993.
2. Rajiv Chandrasekaran, "In California, Creating a Web of the Past," The Washington Post, 22 September 1996, H1.
3. Admiral William O. Studeman, "Teaching the Giant to Dance: Contradictions and Opportunities in Open Source Information within the Intelligence Community," American Intelligence Journal, Spring/Summer 1993, 19.
4. Chandrasekaran, H14.
5. Ed Krol, The Whole Internet User's Guide & Catalog, (Canada: O'Reilly & Associates, Inc.), Second Edition, 1994, 235.
6. Eliot A. Cohen, "A Revolution in Warfare," Foreign Affairs, March/April 1996, 43.
7. Mats Bjoere, Major, Six Years of OSI,
Mr. Tompkins is currently a consultant providing advice on training and custom software development. He is also a reservist with the 434th Military Intelligence Detachment (Strategic) in New Haven, Connecticut. Mr. Tompkins has taught software design methodology at Beijing University and worked as a systems engineer with International Business Machines. He has a bachelor of arts degree in Russian Studies from Yale College and a master of science degree in Computer Science from Rensselaer Polytechnic Institute. Readers can contact him at 76314.2052@ or (802) 862-2240.