Is Outsourcing the Answer?

Acquiring and exploiting open source information is a challenge partially solved through partnerships and perseverance

By Edward F. Dandar Jr.

     Commercial vendors, universities, and military reservists have the background and experience to perform continuous data monitoring and acquisition tasks responsively, in support of intelligence community transnational, military operations other than war, and major regional conflict issues. Industry and academic centers have information specialists with expertise on various regions of the world and subject areas. These non-government analysts can acquire and pre-process open source information that will help satisfy many civil, political, law enforcement, economic and military community information requirements.
     Using these analysts through an outsourcing program may be a partial solution to acquiring and exploiting open source information. It is one of several avenues studied over the past year by an intelligence community working group charged with producing an information technology assessment. The author studied various commercial businesses, drawing information from several sources, held several information-gathering discussions with defense contractors, and reviewed open source contract efforts by several small businesses. The intelligence community continues to study ways to take advantage of these non-government open source acquisition and exploitation assets.
     U.S. responsiveness to natural and man-made disasters relies heavily on a variety of open sources. Humanitarian relief organizations provide valuable information. Open source information from previous or existing intelligence community external research contracts also helps in obtaining a realistic "picture" of the crisis and taking appropriate action. This "picture" includes information on a country’s or region’s national cultures (religions, customs), personalities, and basic infrastructures (food, water, medical, communications, transportation, critical supplies, power generation, and distribution systems).
     Incorporating open source information vendors into the flow of intelligence will satisfy two objectives in meeting the needs of policy makers and commanders. First, in the near term, vendors can help fulfill short-suspense contingency requirements by accessing, filtering, and maintaining on-call source data. Second, they can provide strategic-level open source research to alert the intelligence community to activities that indicate an abnormal or potentially alarming situation.
     To meet open source information research requirements, one must first understand the availability and utility of that information. Many commercial vendors, academics, and reservists maintain current knowledge of the global information environment by holding membership in professional organizations dedicated to information research. They network and attend professional international symposiums, conventions and trade shows. These activities help them stay abreast of new avenues of open source information and pursue commercial business interests. In addition, these information specialists have created both domestic and international networks (academic and professional) that can be leveraged.
     Industry and university centers maintain contemporary technical libraries with reference books, specialized publications and journals from around the world. They rigorously evaluate information sources to minimize bias and unsubstantiated claims that may have been reported or published. The open source information providers also acquire information not readily available through data services. In addition, they solicit information from non-electronic sources such as embassies, trade missions, foreign libraries and organizations.
     Because focused data acquisition is a fundamental part of their business, the open source information providers are experienced at acquiring gray literature (publicly available information which is not distributed through normal publishing channels). Examples of gray literature are academic writings, conference proceedings and trade show literature, video and still imagery reports, marketing research studies, international tender documents, and industry-sponsored research. Knowing what information is available and obtaining it requires a staff experienced in nontraditional research methods with a broad base of commercial contacts.

Foreign Language Hurdles

     Foreign language open source documents can be translated by the open source information exploitation vendor’s language support centers where available. Translators who cover several languages and dialects normally staff the centers. Contractors, academics, and reservists represent a large pool of both subject matter expertise and foreign language capabilities which can be tapped quickly to meet current intelligence community needs.
     The need to process text from multiple languages is increasingly important to intelligence analysis. Historically, foreign language processing required human translators and was limited to languages and domains with high mission priority. Increased access to foreign language sources, especially on-line open source literature, has created requirements for a range of tools to handle multiple languages. The overall goal is to provide a multilingual text analysis capability for foreign language information.
     Tools must be developed to facilitate analysts’ handling of foreign language text in multilingual environments, especially when analysts may not be language experts. These tools may range from automatic language classification capabilities to identify the source materials’ language, to tailorable information extraction and summarization tools for abstracting foreign language documents. The range can extend to presentation tools for handling specialized character sets as well. Machine translation capabilities are key to supporting a broad user population with wide-ranging language skills and domain expertise.
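     As a rough illustration of the simplest tool in that range, automatic language classification, the sketch below guesses a document's language by counting how many of its words appear in a short list of common words for each candidate language. The word lists and the function name identify_language are hypothetical and chosen only for this example; an operational classifier would rely on much larger statistical models and would also need to handle non-Roman character sets.

```python
import string

# Hypothetical, deliberately tiny lists of very common words per language.
# They are assumptions made for this sketch, not a real language model.
STOPWORDS = {
    "english": {"the", "and", "of", "to", "in", "is", "that", "for"},
    "french": {"le", "la", "les", "et", "de", "des", "est", "pour"},
    "german": {"der", "die", "das", "und", "ist", "nicht", "für", "mit"},
    "spanish": {"el", "los", "las", "y", "de", "es", "para", "que"},
}


def identify_language(text):
    """Return the language whose common words occur most often, or None."""
    # Lowercase each word and trim surrounding punctuation before matching.
    words = [w.strip(string.punctuation) for w in text.lower().split()]
    scores = {
        lang: sum(1 for w in words if w in common)
        for lang, common in STOPWORDS.items()
    }
    best = max(scores, key=scores.get)
    # If no common word matched at all, decline to guess.
    return best if scores[best] > 0 else None


if __name__ == "__main__":
    print(identify_language("Le rapport est dans la bibliothèque de la ville."))
```

     A classifier of this kind would only route a document to the appropriate translation or extraction tool; it says nothing about the document's content.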
     A number of components in the intelligence community and DoD are working on machine translation research. DoD personnel are completing the majority of the basic work while various intelligence community organizations are performing additional work.
     Machine translation software, which automatically translates text for specific language pairs (e.g., Chinese-English), is hosted and maintained on the intelligence community’s Open Source Information System and the Intelink-TS network. It will be available shortly on the Intelink-S network. This machine translation capability has been a major success story for the U.S. Air Force’s National Air Intelligence Center at Wright-Patterson Air Force Base, Ohio. Other DoD agencies are developing machine translators for "low-density" languages.
     Officials at the National Air Intelligence Center have been engaged in machine translation for over 40 years. They began with the world-famous Systran Russian to English machine translation system, which was developed during the Cold War and continues to support today’s intelligence community translation needs. There are 11 Systran machine translation systems in use throughout the U.S. Government. They are: Russian to English, French to English, German to English, Spanish to English, Italian to English, Portuguese to English, Japanese to English, Serbo-Croatian to English, Chinese to English, Korean to English and English to Korean. The last three systems are in very early development stages. Officials will begin developing Ukrainian and Cantonese systems this year and host operational prototypes within two years.
     The Systran machine translation systems no longer require mainframe computers. The software is available for UNIX and DOS/Windows. The National Air Intelligence Center owns unlimited rights for free use by U.S. Government agencies. Soon, U.S. Government organizations with appropriate computer systems will be able to download certain Windows versions of Systran from the Open Source Information System and Intelink networks. Systems which can be downloaded include Russian, French, German, Spanish, Italian and Portuguese.
     Shrink-wrapped versions of Systran software are available from Dale Bostad at the National Air Intelligence Center. Direct questions regarding machine translation capabilities and software access to NAIC/DXLT, Dale Bostad ([email protected]), or call (513) 257-6538 or DSN 787-6538, or FAX: 656-1669. Each request for software is forwarded to Systran Software Inc. and logged on a government database, and the software is sent immediately to the requester. All languages noted above are available.
     Intelligence community analysts can exploit foreign electronic open sources in the languages listed above. Analysts can exploit a source by pasting an Internet hypertext markup language page into the Open Source Information System machine translation system or another machine translation-equipped U.S. system. The entire web page will be translated into English and returned for quick evaluation of its content. Contact Bostad at the address above for the latest procedures for accomplishing electronic machine translation on the fly.
     The National Air Intelligence Center, along with the Federal Intelligent Document Understanding Laboratory and other intelligence community members, is working to develop optical character reader technology which integrates Systran and other machine translation systems. A government-sponsored Eastern Computers Incorporated Chinese optical character reader package is now integrated with Systran Chinese. The Cuneiform optical character reader package, which covers seven Germanic and Romance languages, Russian Cyrillic, Serbian Cyrillic and Croatian (Roman script), has been integrated with Systran. E-Typist, a commercial off-the-shelf Japanese optical character reader package, also has been integrated with Systran Japanese. Direct inquiries on the status of machine translation or optical character reader capabilities to Bostad at the address given previously.
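     The general idea of translating a web page can be sketched in a few lines: separate the readable text from the markup, then pass each piece of text to a translation routine. The sketch below is an assumption-laden illustration, not a description of how the Open Source Information System or Systran actually processes pages. The names TextExtractor, page_segments and translate_page are hypothetical, and the translate argument stands in for whatever machine translation routine is available.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect readable text from a page, skipping script and style content."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0  # greater than zero while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        # Keep only non-blank text that is outside script/style elements.
        if not self._skip and data.strip():
            self.parts.append(data.strip())


def page_segments(html):
    """Return the readable text segments of an HTML page, in document order."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.parts


def translate_page(html, translate):
    """Apply a caller-supplied translate(text) function to each segment.

    The translate callable is a placeholder (an assumption of this sketch)
    for a real machine translation routine.
    """
    return [translate(segment) for segment in page_segments(html)]
```

     Keeping the translation routine as a parameter means the same extraction step could feed any of the language pairs listed above.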
     Another intelligence community organization is developing Arabic and Farsi optical character readers. These machine translation and optical character reader capabilities are major steps forward in dealing with the global information environment which is expected to become more regionally and linguistically focused.
     Editor’s Note: Part III of this series of articles will examine a possible open source information strategy for dealing with the flood of open source material.

     Ed Dandar is a civilian employee of the Army Intelligence and Security Command and is assigned to the Deputy Assistant Secretary of Defense for Intelligence and Security, Intelligence Systems Support Office. Comments can be provided to him at: [email protected]



   Last Updated: May 29, 1997