Information extraction (IE) is the task of automatically extracting structured information from unstructured or semi-structured machine-readable documents. In most cases this involves processing human-language text by means of natural language processing (NLP). Recent work in multimedia document processing, such as automatic annotation and content extraction from images, audio, and video, can also be seen as information extraction.
Because the problem is difficult, current approaches to IE focus on narrowly restricted domains. An example is the extraction of corporate mergers from newswire reports, as denoted by the formal relation:
- MergerBetween(company1, company2, date),
from an online news sentence such as:
- "Yesterday, New-York based Foo Inc. announced their acquisition of Bar Corp."
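To make the idea concrete, the following is a minimal sketch of how a narrowly restricted, pattern-based extractor might turn the example sentence into a structured record. It is an illustration under stated assumptions, not a description of any particular IE system: the record type `MergerBetween`, its field names, the helper `extract_merger`, and the hand-written patterns are all hypothetical, and the date is kept as the raw temporal expression rather than resolved to a calendar date.

```python
import re
from typing import NamedTuple, Optional


class MergerBetween(NamedTuple):
    """Hypothetical structured record for one merger or acquisition event."""
    acquirer: str
    target: str
    date: str  # raw temporal expression, e.g. "Yesterday"; not normalized


# Assumption: a company name is one or more capitalized words followed by
# a corporate suffix. Real systems use far richer named-entity recognition.
COMPANY = r"(?:[A-Z][\w&-]*\s)+?(?:Inc|Corp|Ltd|LLC)\.?"

# Assumption: only the phrasing "<X> announced their/its acquisition of <Y>"
# is covered, which is what keeps the domain narrowly restricted.
ACQUISITION = re.compile(
    rf"(?P<acquirer>{COMPANY})\s+announced\s+(?:their|its)\s+"
    rf"acquisition\s+of\s+(?P<target>{COMPANY})"
)

# Assumption: only two relative date words are recognized.
DATE = re.compile(r"\b(Yesterday|Today)\b", re.IGNORECASE)


def extract_merger(sentence: str) -> Optional[MergerBetween]:
    """Return a MergerBetween record, or None if the pattern does not apply."""
    match = ACQUISITION.search(sentence)
    if match is None:
        return None
    date = DATE.search(sentence)
    return MergerBetween(
        acquirer=match["acquirer"],
        target=match["target"],
        date=date.group(1) if date else "unknown",
    )


sentence = ("Yesterday, New-York based Foo Inc. announced "
            "their acquisition of Bar Corp.")
print(extract_merger(sentence))
```

A pattern this rigid fails on almost any rephrasing of the sentence, which illustrates why practical approaches restrict themselves to narrow domains.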
A broad goal of IE is to allow computation on previously unstructured data. A more specific goal is to allow logical reasoning to draw inferences from the logical content of the input data. Structured data is semantically well-defined data from a chosen target domain, interpreted with respect to category and context.