The Internet: Resource or Quagmire?
by Chief Warrant Officer Four Alan D. Tompkins, USAR
Is the Internet a useful and productive resource for
an intelligence analyst or is it merely an interesting world of
time wasters and diversions? News stories, magazine articles and
conversations about the Internet bombard us on a daily basis.
Friends tell us about the new version of a Web browser they are
using. They pass on interesting uniform resource locator (URL)
addresses. Students "surf the net" to find information needed for
school reports. Exciting as all that may seem, should a
professional intelligence analyst working in an environment marked
by budget constraints and limited resources devote time and energy
to exploring the Internet?
As often happens, the answer to that question contains some good
news and some bad news. Beyond that, effective use of the Internet
is not a simple matter. As is often said, "the devil is in the
details."
The Good News
The Internet provides access to a huge and varied collection of
open source information (OSI). OSI contributes to the analysis
process by providing an understanding of the broad framework of a
situation. Dr. Joseph Nye's jigsaw puzzle analogy explains the
relationship between OSI and the classified disciplines. In
paraphrase, he has noted that
Open source intelligence provides the outer pieces of the
jigsaw puzzle, without which we can neither begin nor complete the
puzzle. But they are not sufficient of themselves. The precious
inner pieces of the puzzle, often the most difficult and most
expensive to obtain, come from the traditional intelligence
disciplines. Open source intelligence is the critical foundation
for the all-source intelligence product, but it cannot ever replace
the totality of the all-source effort.1
The Internet is a network of computer networks that has exploded in
size over the last two years. Estimating the actual number of people
and organizations connected to the Internet is difficult, but
probably at least 30 million individuals and 40,000 networks are
connected, and these numbers are expanding rapidly. Certainly, some
of the opinions, observations, and
publications of a group this size can be both interesting and
useful sources of OSI.
The quantity of data available on the Internet is enormous. One
estimate puts the total content at 2 to 10 terabytes. A terabyte is
a million megabytes, or one million million characters. By
comparison, a typical public library with 300,000
books has about 3 terabytes of data.2
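The size comparison above is simple arithmetic, and a short sketch may make the units easier to follow. The Python below uses only the figures quoted in the text (2 to 10 terabytes for the Internet, 3 terabytes and 300,000 books for a typical library). The function names and the derived per-book figure are illustrative assumptions, not numbers taken from the cited source.

```python
# Figures quoted in the article. Anything derived from them is illustrative.
MEGABYTES_PER_TERABYTE = 1_000_000   # "a terabyte is a million megabytes"
CHARS_PER_TERABYTE = 10**12          # one million million characters

INTERNET_TB_LOW = 2                  # low estimate of Internet content
INTERNET_TB_HIGH = 10                # high estimate of Internet content
LIBRARY_TB = 3                       # typical public library
LIBRARY_BOOKS = 300_000              # books in that library


def libraries_equivalent(terabytes: float) -> float:
    """Express a data volume as a count of typical public libraries."""
    return terabytes / LIBRARY_TB


def megabytes_per_book() -> float:
    """Average size of one book implied by the library figures (derived)."""
    return LIBRARY_TB * MEGABYTES_PER_TERABYTE / LIBRARY_BOOKS


if __name__ == "__main__":
    low = libraries_equivalent(INTERNET_TB_LOW)
    high = libraries_equivalent(INTERNET_TB_HIGH)
    print(f"Internet content is about {low:.1f} to {high:.1f} libraries")
    print(f"Implied size per book: {megabytes_per_book():.0f} megabytes")
```

On these figures the estimated content of the Internet works out to somewhere between a fraction of one library and a little over three.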
The contents of the Internet range from newspaper and journal
archives and scholarly papers to messages exchanged between rock
star fans. Many companies, governments, and organizations publish
information on the Internet. Thousands of newsgroups organized by
topic contain messages and discussions about almost every aspect of
the human condition, from pickle recipes to heated discussions of
cultural and political issues. Because publishing information on
the Internet is inexpensive when compared with traditional print
methods, organizations, military units, and governments
increasingly use the Internet to provide information to their
customers, members and citizens.
In summary, the Internet can be a useful, readily available and
inexpensive resource that can provide a significant part of the
total information required to produce all-source intelligence
products. For areas and topics not normally targeted by classified
means of collection, open sources in general and the Internet in
particular can provide a broad, continually updated view to the
The Bad News
It is important to note, however, that the Internet is just one
source of OSI. Thousands of databases are available from commercial
providers such as LEXIS/NEXIS, Reuters, and Dialog. These
commercial sources are often better organized and indexed than
material available on the Internet, but they are also more
expensive.
Writing in 1993, Admiral William Studeman, then Deputy Director of
Central Intelligence, stated:
We have identified some 8,000 commercial databases and the
vast majority have potential intelligence value. The number of
worldwide periodicals has grown from 70,000 in 1972 to 116,000 last
year. The explosion of open source information is most apparent in
the Commonwealth of Independent States [the former Soviet Union],
where today, there are some 1,700 newspapers that were not
published three years ago.3
It is also important to understand that despite rapid
growth in the availability of OSI in digital, on-line format,
probably less than 10 percent of all OSI is available in that
format. The remainder exists in printed, hard-copy formats: books,
newspapers, journals and reports.
As mentioned above, the total estimated content of the Internet is
roughly equivalent to two or three typical public libraries each
containing 300,000 volumes. However, the largest OSI repository in
the world is the U.S. Library of Congress, which has more than 107
million books, newspapers, journals, microforms, and other special
format materials in 470 languages, and approximately 200,000
current periodicals, 80,000 of which are in foreign languages.
Unlike a traditional library, the contents of the Internet are
constantly changing. Sites appear and disappear and the contents of
individual sites can change daily. A University of Colorado study
published in 1992, when content on the Internet was far less
dynamic, found that the average life of a document on the network
was only 44 days. Currently, document life is probably much
shorter.
Also, unlike libraries or commercial databases, the Internet is not
maintained by professional librarians. There is no standard subject
list like the Library of Congress subject headings.
Different terms describe similar topics on different sites.
Searching for information on the Internet requires skill,
persistence, and powerful searching tools.5
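The vocabulary problem described above, where different sites use different terms for the same topic, can be illustrated with a small sketch. The Python below expands one search term into a set of variants before matching it against a document. The synonym table, the function names, and the sample sentences are all hypothetical and written only for illustration. They do not describe any particular search tool, and plain substring matching is far cruder than a real index.

```python
from typing import Dict, List

# Hypothetical, hand-built synonym table. A real effort would maintain
# a much larger list tied to its own collection plan.
SYNONYMS: Dict[str, List[str]] = {
    "armor": ["armour", "tanks", "armored vehicles"],
    "osi": ["open source information", "open source intelligence"],
}


def expand_query(term: str, synonyms: Dict[str, List[str]] = SYNONYMS) -> List[str]:
    """Return the term plus any known variants, lowercased, without repeats."""
    key = term.strip().lower()
    variants = [key]
    for alternative in synonyms.get(key, []):
        if alternative not in variants:
            variants.append(alternative)
    return variants


def matches(document: str, term: str) -> bool:
    """True if the document mentions the term or any of its variants."""
    text = document.lower()
    return any(variant in text for variant in expand_query(term))


if __name__ == "__main__":
    print(expand_query("Armor"))
    print(matches("Report on British armour exercises", "armor"))
```

Without the expansion step, a search for "armor" would miss the sample document that spells the word "armour", which is the kind of mismatch a standard subject list would otherwise prevent.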
The challenge in making effective use of the Internet is finding
the needle in the haystack. OSI on the Internet is so diverse,
scattered, and voluminous that it can easily overwhelm the analyst.
Effective use of OSI demands skilled analysts following clearly
defined search plans and using increasingly sophisticated
Small wonder that a group of senior Marine Corps officers, led by
the Assistant Commandant of the Corps, visited the New York Stock
Exchange recently to learn how brokers absorb, process and transmit
the vast quantities of perishable information that are the
lifeblood of the financial markets.6
So, the bad news about the Internet is that it contains only a
portion of the total available OSI, its contents change very
rapidly, and it is not very well organized or indexed.
Searching the Internet can easily be like a trip to the shopping
mall to buy a pair of shoelaces. You start out by knowing what you
want, but there are distractions along the way. First, you smell
pizza, so you buy a slice. Then, there are some interesting books on
display in the book store, so you browse for a while. After that,
you buy some frozen yogurt and a T-shirt with a catchy phrase on the
back. After an hour and a half, you hurry back to work with a
slightly upset stomach, a few little packages, and no shoelaces.
The Internet has become something of an information Nirvana, an
enormous, multicultural library that is open all day, every day, to
ordinary computer users. Taking advantage of this amazing
institution requires some savvy. Users need to know not only how to
get in the door but also which aisle to choose from in a maze of
thousands that will take them to the document they seek.
Clearly defined and managed collection plans are essential. Also, a
stable, well-understood set of searching, storage and retrieval
tools is necessary to ensure high analyst productivity. Because of
the high activity and growth on the Internet, new tools are released
on an almost daily basis. It is tempting to try each new program
that becomes available. If the analyst's tools change continuously,
more effort will be spent in understanding tools than in collecting
information.
Major Mats Bjoere of the Swedish Army has published an excellent
paper that outlines the main elements of a well-managed and
organized OSI collection effort.7 It contains a number of lessons
learned in running
The Internet does contain information useful to the all-source
analyst. It can provide the critical outer pieces of the jigsaw
puzzle described by Dr. Nye. However, it is only one part of the
vast world of OSI, and it can only be exploited successfully and
efficiently by well-trained analysts using a proven tool set and
defined collection plans.
1. Dr. Joseph Nye, the Chairman of the National Intelligence Council, speaking to members of
the Security Affairs Support Association at Fort Meade, Maryland, on 24 April 1993.
2. Rajiv Chandrasekaran, "In California, Creating a Web of the Past," The Washington Post, 22
September 1996, H1.
3. Admiral William O. Studeman, "Teaching the Giant to Dance: Contradictions and
in Open Source Information within the Intelligence Community," American Intelligence Journal,
4. Chandrasekaran, "In California, Creating a Web of the Past," H14.
5. Ed Krol, The Whole Internet User's Guide & Catalog, Second Edition
(Canada: O'Reilly & Associates, Inc., 1994), 235.
6. Eliot A. Cohen, "A Revolution in Warfare," Foreign Affairs, March/April 1996, 43.
7. Mats Bjoere, Major, Six Years of OSI, http://www.eajardines.com/mats2.html.
Mr. Tompkins is currently a consultant providing advice on training and custom
software development. He is also a reservist with the 434th Military Intelligence Detachment
(Strategic) in New Haven, Connecticut. Mr. Tompkins has taught software design methodology
at Beijing University and worked as a systems engineer with International Business Machines.
He has a bachelor of arts degree in Russian Studies from Yale College and a master of science
degree in Computer Science from Rensselaer Polytechnic Institute. Readers can
contact him at 76314.2052@compuserve.com or (802) 862-2240.