
Military Review, March-April 1997

Meeting the Open-Source Acquisition and Exploitation Challenge

by Colonel Edward F. Dandar Jr., US Army Reserve, Retired

THE INTELLIGENCE COMMUNITY (IC) recently published an assessment of information technology and its impact on the intelligence process. This article summarizes the parts of the assessment that deal with the challenges and opportunities the IC faces due to dramatic increases in the availability and volume of open-source information. Expanding partnerships with industry and academia, effectively exploiting commercially developed information technology, re-engineering the IC organization and more outsourcing of selected IC open-source requirements are necessary for coping with the dynamic information environment.1

Changes in today's world affect how people access and use information. This information revolution goes beyond wide-scale personal computer use to encompass a growing use of wide-area, large-scale computer networks that provide the infrastructure for accessing and sharing information internationally. Although this global information infrastructure is in its infancy, it provides new frameworks and approaches for obtaining information. Corporations must use these new tools to compete effectively, and nations must use them to develop the information infrastructure needed to provide health care, education, economic viability and national security.

Although spreading unevenly throughout the world, this infrastructure enables domestic and cross-border information transmission in minutes. Information does not need a "visa" to enter or leave a state. The global information environment (GIE) is being used increasingly in diplomacy and as an educational tool for academia, governments and activists, as well as for groups whose goals and methods are unorthodox. GIE communication and information technologies can promote new forms of international grass roots cooperation and advocacy or increase social fragmentation within state borders or between countries.

Increased information source availability and accompanying technological developments, highlighted by the "Year of the Internet and Web" in 1995, provide the opportunity and challenge to rethink how information-processing tasks are performed. This rethinking requires vision and decisive leadership over the next few years. Our ability to exploit new information-access technologies and information stores is complicated by having to change how organizations and people do business. The associated re-engineering challenges are as significant within the IC as in society.

The IC must be able to exploit large amounts of disparate information. Analysts face greater work loads due to the increase in issues to address and information to manage, particularly from open sources, and due to resource constraints resulting from a reduced work force and competing tasks. The direct availability of essential information in original languages, replete with cultural and societal perspectives and biases, places even more emphasis on the specialized regional and language training analysts need to rapidly and effectively use information. In addition, they work within a fragmented systems environment with uneven connectivity to resources and widely varying practices, methods and tools for managing, accessing and exploiting information.

In this environment, analysts' requirements go beyond simple information access. They need an integrated information environment where they can seamlessly exploit information repositories, expert knowledge and necessary tools and services. This would facilitate collaboration across the IC, industry and academia and provide a basis for sharing information and disseminating it to consumers and decision makers.

Industry and academia can give IC analysts access to a wider range of open sources and experts, thus improving the information exploitation process. However, even the typical nongovernment information broker's open-source acquisition and exploitation business process, as depicted in Figure 1, must be re-engineered and automated to handle increasingly large volumes of disparate and globally distributed multimedia and multilingual open-source information (OSI).

The rapidly expanding GIE depicted in Figure 2 is challenging the government's information exploitation resources. IC analysts cannot be experts in all political, economic, technological, military operations other than war (OOTW) and major regional contingency (MRC) areas, nor can they master all the languages in which high-quality OSI is quickly accessible. They must get information from a wide variety of sources on a continuous basis to respond in a timely manner to decision makers' critical information needs. Given the preponderance of information sources, the IC analyst does not have the time, expertise or training to continuously and exhaustively collect information on multiple targets of interest.

A critical problem facing IC analysts in the GIE is access to OSI and tools to help them deal with large volumes of information. Analysts' needs, today's OSI challenges and shortfalls, and the status of the current OSI environment are aptly described in the Report of the Commission on the Roles and Capabilities of the United States Intelligence Community.2

OSI Acquisition/Exploitation Solutions

Addressing these OSI environment problems depends on effectively monitoring key information technology developments and inserting mature commercial software products into the OSI acquisition/exploitation process.

Government and nongovernment managers can work together to guide the use of the technology and information.3

Commercial software products can help automate current open-source business process functions, such as OSI search management, language translation, analysis and product generation. The exception is automated data base generation. A hybrid man-machine interface still is needed to generate even noncomplex data bases.

The IC is integrating maturing analytical software products from various organizations. Project Pathfinder and its tactical counterpart, Sentinel, are evolutionary, user-driven, Army-sponsored software research and development (R&D) projects pursuing the development of advanced tools for analysts. Pathfinder, a deployed system used by more than 30 IC organizations, enables analysts to translate their requirements into software tools right at their workstations, allowing them to interpret enormous amounts of electronic information. The IC TIPSTER program, an interagency effort begun in 1991, includes at least 15 projects with industry and academia aimed at improving text processing.4 As these R&D investments mature, they will become available in government and commercial off-the-shelf products. In addition, collaborative work and OSI sharing can be enhanced by establishing an IC-wide open-source directory service, which will require adopting IC-wide information access and sharing policies.

Existing commercial solutions offer significant enhancements to operational capabilities through incremental improvements and technology. Where those gains are possible, IC policies must not inhibit improved capabilities.

While the National Foreign Intelligence Program is addressing OSI shortfalls to varying degrees, more IC OSI budget reductions will curtail progress. The IC must continue exploring other OSI acquisition and exploitation alternatives, such as using commercial vendors, military Reservists and universities that can handle the information explosion and can support several IC and military core business areas, as shown in Figure 3.

Outsourcing as a partial solution. Commercial vendors, universities and military Reservists have the background and experience to continuously monitor and receive data to support IC transnational, OOTW and MRC issues. Industry and academic centers have information specialists with expertise on various world regions, cultures and related subjects. These nongovernment analysts can acquire and preprocess OSI to help satisfy many civil, political, law enforcement, economic and military community information requirements.5

US responsiveness to natural and manmade disasters relies heavily on a variety of open sources, especially information from humanitarian relief organizations. OSI from previous or existing IC external research contracts related to a country's or region's national religions, customs, personalities and basic infrastructures (such as food, water and health care availability; communications; transportation; power generation; and distribution systems) is invaluable in obtaining a realistic "picture" of the crisis to guide appropriate action.

Added OSI exploitation capabilities. Two objectives for meeting policy makers' and commanders' needs will be satisfied when incorporating OSI vendors into the intelligence flow:

A thorough understanding of available and useful information sources is essential for meeting OSI research requirements. Many commercial vendors, academics and Reservists maintain their GIE knowledge through memberships in professional organizations dedicated to information research. They network and attend professional, international symposiums, conventions and trade shows to keep abreast of new OSI avenues and to pursue commercial business interests. In addition, these information specialists often build domestic and international networks that can leverage academic and professional contacts.

Industry and university centers maintain contemporary technical libraries with topical reference books, specialized publications and journals from around the world. They rigorously evaluate information sources to minimize bias and unsubstantiated claims that may have been reported or published. These OSI providers also acquire information not readily available through data services by soliciting "nonelectronic" information from sources such as embassies, trade missions and foreign libraries and organizations.

Because focused data acquisition is a fundamental part of their business, these OSI providers know how to acquire gray literature, that is, publicly available information not distributed through normal publishing channels. Examples include academic writings, conference proceedings, trade show literature, video and still imagery reports, marketing research studies, international tender documents and industry-sponsored research. Knowing what information is available and how to obtain it requires a staff experienced in nontraditional research methods with broad commercial contacts.

Foreign Language Challenges

Foreign language open-source documents can be translated by available OSI exploitation vendor language support centers, usually staffed by translators familiar with a myriad of languages and dialects. Some vendor translators have security clearances from other government contracts. Contractors, academics and Reservists represent a large pool of subject-matter expertise and foreign-language capabilities that can be quickly tapped to meet current IC needs.

Processing text from multiple languages is of increasing importance to intelligence analysis. Historically, foreign-language processing required human translators and was limited to languages and domains with high mission priority. Increased access to foreign-language sources, especially on-line open literature, has created new requirements for a whole range of tools. The overall goal is to provide a multilingual text analysis capability for foreign-language information.

Analysts need tools to facilitate handling foreign-language text, especially when the analysts are not language experts. These tools may range from automatic language classification capabilities to identify the source material language, to tailorable information extraction and summarization tools for abstracting foreign-language documents, to presentation tools for handling specialized character sets. Machine translation (MT) capabilities are key to providing wide-ranging language skills and domain expertise to a broad user population.
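As a rough illustration of the simplest tool in that range, automatic language classification, the following Python sketch guesses a document's language by counting common function words. It is a minimal sketch, not a description of any IC or vendor system: the word lists, the `STOPWORDS` table and the `classify_language` function are assumptions made for illustration, and an operational tool would rely on far richer statistical models and proper handling of character sets.

```python
from collections import Counter

# Hypothetical, deliberately tiny function-word profiles (illustration only).
STOPWORDS = {
    "english": {"the", "and", "of", "to", "in", "is", "that"},
    "french": {"le", "la", "les", "et", "de", "des", "est", "que"},
    "german": {"der", "die", "das", "und", "ist", "nicht", "zu"},
    "spanish": {"el", "los", "las", "y", "de", "es", "que", "en"},
}


def classify_language(text):
    """Return the profile whose function words occur most often, or None."""
    # Whitespace tokenization only; punctuation is not stripped in this sketch.
    words = Counter(text.lower().split())
    scores = {
        lang: sum(words[w] for w in stops)
        for lang, stops in STOPWORDS.items()
    }
    best = max(scores, key=scores.get)
    # If no function word matched at all, decline to guess.
    return best if scores[best] > 0 else None
```

A document identified this way could then be routed to the matching MT system or to a translator with the right language skills; text that matches no profile returns `None` and would need human review.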

A number of IC and Department of Defense (DOD) components are researching MT. DOD is doing most of the basic work, and some IC organizations are doing additional research. The IC's Open-Source Information System (OSIS), the Intelink-TS network and, soon, the Intelink-S network all host and maintain MT software that automatically translates text between language pairs, such as Chinese to English. Other DOD agencies are working on developing machine translators for "low-density" languages.

The US Air Force National Air Intelligence Center (NAIC), Wright-Patterson Air Force Base, Ohio, has been using MT for more than 40 years, starting with the world-famous Systran Russian-to-English MT system developed during the Cold War. The system still supports the IC's translation needs. There are now 11 Systran MT systems in use throughout the IC and the US government. They include: Russian, French, German, Spanish, Italian, Portuguese, Japanese, Serbo-Croatian, Chinese and Korean to English and English to Korean. The last three systems are in very early development. In addition, Ukrainian and Cantonese systems will be developed this year, and operational prototypes will be available within two years.

The Systran MT systems no longer require mainframe computers, and the software is available for UNIX computers and computers using DOS/Windows. NAIC owns unlimited rights for free use by US government agencies. Government organizations with appropriate computer systems will soon be able to download certain Windows versions of Systran from the OSIS and Intelink networks. The languages that will be available include Russian, French, German, Spanish, Italian and Portuguese. Shrink-wrapped versions of Systran software are available from the NAIC.6

OSI Strategy

Four levels of effort are necessary to stay abreast of the exponential OSI growth: sustained, intensified, directed and cued.

One way to implement this strategy while leveraging the strengths and availability of IC and non-IC resources is to divide the four levels between IC personnel and contractors. For example, contractors might provide continuous coverage of events in a particular region or topical area (sustained) and, on occasion, produce special-focus studies (intensified). If large, current data bases were built and maintained by this process, IC personnel could exploit these OSI repositories and other classified sources to respond to the ad hoc (directed) and indications and warning (cued) information requirements.

In acquiring more information above the sustaining level in this approach, customers must be aware of the limitations, especially the time allotted for ad hoc requirements and OSI availability. Availability will be partially based on the level of technology development and potential external information gateways in the country or area of interest. In cases where external access to the country is denied for publications, data bases and on-line vehicles, local access to libraries and other open sources must be used.

Greater access to open-source material puts a premium on critical evaluation and elevates the potential for deliberate and highly sophisticated deception. This problem rests more heavily with the analytical, rather than the hardware and software, OSI systems' components and processes. In addition, access to some open-source data bases may be fragile, that is, quickly denied or lost due to conflict, natural disaster or, simply, system failure.

OSI road maps. OSI road maps identify potentially available information sources in a target country or mission area. The road maps are particularly pertinent to OSI acquisition through the Internet, where they identify relevant uniform resource locators (URLs) and details such as the general information content at each source, an assessment of its general accuracy level and the timeliness of data normally found at each site.
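One way to picture a road map entry is as a small record holding the attributes just listed. The Python sketch below is illustrative only and assumes a structure not taken from any actual IC road map: the `RoadMapEntry` fields, the 1-to-5 accuracy scale, the days-between-updates measure of timeliness and the `select_sources` helper are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RoadMapEntry:
    """One source in a hypothetical OSI road map."""
    url: str              # uniform resource locator of the source
    region: str           # target country or mission area
    content_summary: str  # general information content at the source
    accuracy: int         # assumed scale: 1 (poor) to 5 (high)
    update_days: int      # assumed timeliness: typical days between updates


def select_sources(entries, region, min_accuracy=3, max_update_days=30):
    """Return entries for a region that meet accuracy and timeliness floors."""
    hits = [
        e for e in entries
        if e.region == region
        and e.accuracy >= min_accuracy
        and e.update_days <= max_update_days
    ]
    # Most accurate first; among equals, the most frequently updated first.
    return sorted(hits, key=lambda e: (-e.accuracy, e.update_days))
```

With records of this kind, an analyst preparing for a crisis could ask for the most accurate, most current sources on a given region before the crisis develops, rather than searching for them after the fact.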

The need for OSI road maps is obvious in this information revolution era. The road map establishes directions for OSI acquisition, processing and dissemination of evaluated information on specific IC topics of interest. Road maps become particularly important when planning ahead for international political or military crises, which do not always occur overnight and often are known days, weeks or even months in advance. Developing IC/industry/academic/private expert crisis teams well in advance enables us to better prepare for crises because all information sources and expertise are exploited.

The IC Open-Source Program Office began an initial commercial Internet reconnaissance in October 1995. The main purpose of the project, concluded in April 1996, was to investigate the Internet as a resource for OSI pertaining to Africa and Latin America and to research the availability of commercial data bases containing information on Latin America. The primary deliverables were a directory of bibliographic records of relevant sites and data bases and a report detailing the commercial vendor's research methodology and findings.

The study's importance is that it was one of the first known attempts to broadly explore the vast and inadequately charted Internet as a source of valuable intelligence information. Conducted just at the time the Internet was rapidly expanding its effects, especially in less-developed areas of the world, the study reached three conclusions:

As the study states, "The first conclusion is based on the speed and extent of migration of publishing (especially gray literature) to electronic networked media. The second is based on the fact that [the] Internet is still very volatile in its nature and the search technology is not refined enough for the average user to use it in a cost-effective way. The third is based on the fact that a majority of the effort on this project was [on] developing methods and applying existing standards which would not have to be repeated for further directory development. With some additional application of automated tools to OSI processes, the cost per directory or OSI road map should decrease over the next five years."7

The need for more macro and domain-specific road maps should be pragmatically approached by building on lessons learned, methodology and recommendations.

OSI pilot projects. OSI pilots develop macro or micro information road maps; produce domain-specific products; establish directories of government, industry, academia and other private subject-matter experts; and contribute to re-engineering the OSI business process by automating as much of it as possible by inserting and integrating the best available software.

Figure 4 proposes a new and increasingly automated OSI business process that reuses and enhances information captured in various IC domain repositories through electronic profiling, normalizing, tagging and indexing. The IC information specialist's and/or vendor's role is to postprocess more acquired data by filtering it through available visualization and preanalysis tools. This new process provides the all-source analyst with a pertinent OSI working file. The IC end goal is the rapid access, processing and integration of OSI into timely, all-source products or validated open-source products for public or coalition force use.
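The normalizing, tagging and indexing steps named above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the process shown in Figure 4: the `TOPIC_KEYWORDS` table and the `normalize`, `tag` and `build_index` functions are hypothetical, the profiling step is omitted, and only lowercase ASCII words are handled.

```python
import re
from collections import defaultdict

# Hypothetical topic keywords used for tagging (illustration only).
TOPIC_KEYWORDS = {
    "health": {"hospital", "clinic", "disease"},
    "transport": {"road", "rail", "port"},
}


def normalize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tag(tokens):
    """Return the sorted topics whose keywords appear among the tokens."""
    token_set = set(tokens)
    return sorted(
        topic for topic, keywords in TOPIC_KEYWORDS.items()
        if token_set & keywords
    )


def build_index(documents):
    """Build an inverted index (token -> document ids) and per-document tags."""
    index = defaultdict(set)
    tags = {}
    for doc_id, text in documents.items():
        tokens = normalize(text)
        tags[doc_id] = tag(tokens)
        for token in tokens:
            index[token].add(doc_id)
    return index, tags
```

Once documents are indexed and tagged in this fashion, an information specialist or vendor could pull every item touching a topic into the working file handed to the all-source analyst.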

The primary intelligence operations within OSI pilots should focus on enhancing one or more of the following components: acquisition (road map strategies), data preprocessing, data manipulation, preanalysis processing, knowledge base development and transmission of results to the all-source analyst(s).8 Pilot projects also provide a controlled way of testing the use of non-IC resources to satisfy IC OSI requirements.

OSI Acquisition/Exploitation Pilot (OAEP) process criteria. Any OAEP can benefit from the lessons learned and criteria used in the Central Intelligence Agency's (CIA's) Project Overture.9 Applicable parameters follow:

OSI vendor, academia and Reservist selection. A prospective OSI vendor or Reservist supporting the IC should have extensive experience with and understanding of the intelligence arena and various world regions, as well as specific topical expertise pertinent to customer needs. These organizations or individuals should have access to trained information specialists and/or intelligence analysts with a wide range of experience in applications software. Individuals must keep current with new developments in data base applications, information retrieval methods, fusion, information validation and collection. Small, high-quality consultant teams and larger vendors should be able to present a list of experts and specific skills available to fulfill tasks.

The following considerations should be addressed in developing a statement of work before employing a vendor to execute an OSI pilot project:

While technology can offer some help to the IC facing a torrent of OSI, non-IC resources can provide essential support in dealing with the open-source acquisition and exploitation issue. The potentials and limitations of concurrently employing technology and OSI vendor solutions can best be determined by a series of carefully designed, controlled, limited and assessed pilot projects.

In addition to implementing the available solutions cited, more focus is needed in guiding the re-engineering of IC analysis, collection and production processes for the 21st century and the retraining of the IC work force away from the lingering Cold War mold to enhanced exploitation of GIE resources. Many intelligence business skills and trainers needed to accomplish this 21st-century transformation are in the commercial and academic arenas. However, the current IC work force culture provides some serious challenges that must be met if we are to adequately meet our customers' information needs. Working closely with our business, academic and Reserve colleagues on OSI road maps and other projects should facilitate the IC work place transformation.

Forging improved government partnerships with industry and academia also should contribute to developing and identifying the "best of breeds" in emerging information technologies. Their timely insertion into IC OSI programs and subsequent evaluation should contribute to establishing broader-based IC OSI strategies and exploitation capabilities. Government, with business and academia, must seize this window of opportunity for leveraging the dynamic GIE into innovative, real-time knowledge bases to maintain its competitive edge in global economics, technology and information.10 MR

NOTES

1. Joint Technical Office Working Group, Intelligence Systems Support Office, Deputy Assistant Secretary of Defense (DASD) for Intelligence and Security (I&S), "An Intelligence Community Information Technology Assessment: Recommendations for the Future" (McLean, VA: The MITRE Corporation, 31 July 1996).

2. Congressional Bipartisan Executive and Legislative Commission on the Roles and Capabilities of the United States Intelligence Community, Preparing for the 21st Century: An Appraisal of US Intelligence, Report of the Commission on the Roles and Capabilities of the United States Intelligence Community (Washington, DC: US Government Printing Office, 1 March 1996), 88-89.

3. An intelligence community (IC)-developed "Technology Navigator" for tracking "information technology areas of interest" is available in "alpha test" mode on the Internet at: http://www.mews.org/jto. Suggestions for improving this service are solicited.

4. Further application of Pathfinder, IC analysts' tool requirements and TIPSTER can be found within the Technology Navigator described in note 3.

5. PSC Inc., "The Paper-Use of Open-Source Vendors" (Reston, VA: April 1996); International Trade and Technology Directorate, The MITRE Corporation, interviews by author, McLean, Virginia, 12 and 15 April 1996; and Alan D. Tompkins, Computer Consultants, Hinesburg, Vermont, numerous phone interviews by author, April 1996.

6. All the languages noted are available. IC analysts' exploitation of foreign electronic open sources in available languages can be facilitated by pasting an Internet hypertext markup language page into the OSIS MT system or other MT-equipped system. The entire web page will be returned, translated into English for quick content evaluation. The NAIC, the Federal Intelligent Document Understanding Laboratory and other IC elements are working together to develop optical character reader (OCR) technology that can be integrated with Systran and other MT systems. A government-sponsored Easter Computers Inc. Chinese OCR package is integrated with Systran Chinese. The Cuneiform OCR package, which includes seven Germanic and Romance languages, Russian Cyrillic, Serbian Cyrillic and Croatian roman languages, also has been integrated with Systran, as has E-Typist, a commercial off-the-shelf Japanese OCR package. Another IC organization is developing Arabic and Farsi OCRs. These MT and OCR capabilities are major steps in dealing with the GIE that will become more regionally and linguistically focused.

7. Information International Associates Inc., "COSPO [IC Open-Source Program Office] Africa/Latin American Project: Electronic Sources" (Oak Ridge, TN: 22 April 1996).

8. Computer Sciences Corporation, "OSI Acquisition and Exploitation Pilot," White Paper (Falls Church, VA: 15 March 1996). I also used information from my discussions with several vendors.

9. Additional information on the CIA's Project Overture can be found within the Technology Navigator (see note 3).

10. The author acknowledges specific contributions and editing support from Richard Peze of DASD (I&S) Intelligence Systems Support Office, Alan D. Tompkins of Computer Consultants and Graham H. Turbiville Jr. of the US Army Foreign Military Studies Office. A special thanks to Robert D. Steele of Open-Source Solutions Inc. for his valuable contributions on open-source intelligence. Steele's national and international lectures, symposiums and papers continue to expand the knowledge base for open-source intelligence.

Colonel Edward F. Dandar Jr., USAR, Retired, is an Army Intelligence and Security Command intelligence staff officer, office of the Deputy Assistant Secretary of Defense for Intelligence and Security, Washington, D.C. He received a B.S. from Duquesne University and an M.A. from Georgetown University and is a graduate of the National War College, Defense Systems Management College and Armed Forces Staff College. His past civilian positions include deputy director, Policy and Operations Directorate, office of the deputy chief of staff for Intelligence, Department of the Army, Washington, D.C.; senior Army representative, Intelligence Community's Producers' Council Staff, Washington, D.C.; and deputy director, Army Intelligence Agency, Washington, D.C. His active duty positions included deputy commander of civil affairs (CA) units during Operations Just Cause, Desert Shield, Desert Storm and Provide Comfort. He also served as deputy commander, 5th Psychological Operations Group, USAR, Maryland; and commander, 439th Military Intelligence Detachment (Strategic), USAR, Maryland. During Desert Storm, he commanded the 354th CA Brigade, which was attached to VII Corps.