The intelligence community’s TIPSTER program involves at least 15 projects with industry and academia aimed at improving text processing capabilities. The TIPSTER Text Program is a DARPA-led government effort to advance the state of the art in text handling technologies and to deploy the resulting advanced capability into the workplace rapidly. The TIPSTER Architecture has been designed to enable a variety of different text applications to use a set of common text processing modules, and to meet a large number of text handling requirements for CIA, DIA, and NSA. However, it meets only those requirements having to do with Document Detection and Information Extraction functions. Requirements for other functions, such as Machine Translation or Optical Character Recognition, must be met outside the TIPSTER Architecture.
As of 1996 several TIPSTER applications were either underway or in the planning process. The most significant of these included PRIDES, ADEPT, NDIC Pilot, and Hookah. The TIPSTER User Interface Toolkit, or TUIT, is a toolkit for producing multilingual text.
In its efforts to improve document processing and make it more easily and inexpensively available to a variety of users, TIPSTER initially focused on two underlying technologies. Document Detection is the capability to locate documents containing the type of information the user wants from either a text stream or a store of documents. Information Extraction is the capability to locate specified information within a text; thus, the automatic marking of all personal or organizational names within a text, or the extraction of information to fill a biographical database, are types of information extraction.
During the first phase of TIPSTER research efforts, the participants made major advances in creating the algorithms for document detection and information extraction and in improving the techniques for measuring those advances, through activities such as the Message Understanding Conferences (MUC) and the Text Retrieval Conferences (TREC). Document Detection technologies improved Recall from roughly 30% to as high as 75%, and the processing of natural language queries also improved significantly. Improvements in Information Extraction produced increases in Recall from roughly 49% to 65% and in Precision from 55% to 59%, and dramatic gains were made in the ability to automatically identify a wide range of items such as names (both personal and organizational), dates, locations, times, and phone numbers.
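To make the Recall and Precision figures concrete, the following is a minimal sketch of how the two measures can be computed for a name extraction task. It assumes a simplified, set-based exact-match comparison; the function name `precision_recall` and the sample names are hypothetical, and this is not the official MUC or TREC scoring procedure, which is more detailed.

```python
def precision_recall(extracted, reference):
    """Return (precision, recall) for exact-match sets of extracted items.

    precision = correct items / all items the system extracted
    recall    = correct items / all items in the reference answer key
    """
    extracted, reference = set(extracted), set(reference)
    correct = extracted & reference
    precision = len(correct) / len(extracted) if extracted else 0.0
    recall = len(correct) / len(reference) if reference else 0.0
    return precision, recall


# Hypothetical answer key: five names a human annotator marked in a text.
reference_names = {"DARPA", "NIST", "CIA", "John Smith", "Acme Corp"}
# Hypothetical system output: four names, three of which are correct.
system_names = {"DARPA", "NIST", "CIA", "Washington Post"}

precision, recall = precision_recall(system_names, reference_names)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy case the system is right about three of the four names it proposed (precision 0.75) but finds only three of the five names in the answer key (recall 0.60), which illustrates why the two figures are reported separately.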
During the second phase (April 1994-September 1996), the TIPSTER community turned its attention to the development of a software architecture in order to standardize the technology components, enable "plug and play" capabilities among the various tools being developed, and permit the sharing of software among the various participants. Based on feedback from the researchers, developers, and users of the existing prototype and implementation systems, the architecture continues to evolve.
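The "plug and play" idea can be illustrated with a small sketch in which independent modules share one interface and can be combined in any order. Everything here is an assumption made for illustration: the names `Document`, `TextModule`, `KeywordDetector`, `CapitalizedWordTagger`, and `run_pipeline` are hypothetical and are not taken from the TIPSTER Architecture specification, and the two toy modules are far simpler than real detection or extraction components.

```python
from dataclasses import dataclass, field
from typing import Protocol


@dataclass
class Document:
    """Hypothetical shared document type passed between modules."""
    text: str
    # Each annotation is a (label, start_offset, end_offset) tuple.
    annotations: list = field(default_factory=list)


class TextModule(Protocol):
    """Hypothetical common interface that every pluggable module implements."""
    def process(self, doc: Document) -> Document: ...


class KeywordDetector:
    """Toy stand-in for a detection component: marks the first keyword hit."""
    def __init__(self, keyword: str):
        self.keyword = keyword.lower()

    def process(self, doc: Document) -> Document:
        idx = doc.text.lower().find(self.keyword)
        if idx != -1:
            doc.annotations.append(("KEYWORD", idx, idx + len(self.keyword)))
        return doc


class CapitalizedWordTagger:
    """Toy stand-in for an extraction component: marks capitalized words."""
    def process(self, doc: Document) -> Document:
        pos = 0
        for word in doc.text.split():
            start = doc.text.index(word, pos)
            end = start + len(word)
            if word[0].isupper():
                doc.annotations.append(("CAPITALIZED", start, end))
            pos = end
        return doc


def run_pipeline(doc: Document, modules: list) -> Document:
    """Apply each module in turn; modules only depend on the shared interface."""
    for module in modules:
        doc = module.process(doc)
    return doc


result = run_pipeline(
    Document("DARPA funds text research"),
    [KeywordDetector("text"), CapitalizedWordTagger()],
)
print(result.annotations)
```

Because each module only reads and writes the shared `Document`, a module from one participant could in principle be swapped for another's without changing the rest of the pipeline, which is the kind of sharing the paragraph above describes.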
The overall accuracy of the highest-performing TIPSTER Text Program systems developed so far has been measured on news stories at 96% on the relatively simple task of extracting isolated organization and person names (including acronyms, novel names, multiword names, etc.); at 80% on the task of merging various types of information about each organization or person mentioned in a given text into a single output; and at 56% on the difficult task of identifying events of interest in a given text and merging various pieces of information about each event into a single output.
Since its beginning in 1991, the TIPSTER Program has sponsored multiple efforts to advance text handling technologies and deploy the resulting advanced capabilities into the workplace. The Defense Advanced Research Projects Agency (DARPA), the Department of Defense (DoD), and the Central Intelligence Agency (CIA) have jointly funded and managed the program, in close collaboration with the National Institute of Standards and Technology (NIST) and the Naval Command, Control and Ocean Surveillance Center (NCCOSC). A TIPSTER Advisory Board was recently formed with members representing users from other Government agencies interested in automated text processing, such as the Department of Energy (DOE), the Federal Bureau of Investigation (FBI), the Internal Revenue Service (IRS), the National Science Foundation (NSF), and the Treasury Department.