Information extraction (IE) is the task of automatically extracting structured information from unstructured or semi-structured machine-readable documents. In most cases, this involves processing human-language texts by means of natural language processing (NLP). Recent work in multimedia document processing, such as automatic annotation and content extraction from images, audio, and video, can also be regarded as information extraction.
Because the problem is difficult, current approaches to IE focus on narrowly restricted domains. An example is the extraction of corporate mergers from newswire reports, as denoted by the formal relation:
- $\mathrm{MergerBetween}(company_1, company_2, date)$,
from an online news sentence such as:
- "Yesterday, New York-based Foo Inc. announced their acquisition of Bar Corp."
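To make the merger example concrete, here is a minimal Python sketch of a hand-written, pattern-based extractor that fills the relation from a sentence of this shape. Everything in it is an illustrative assumption: the `MergerBetween` named tuple, the `extract_merger` function, the regular expression (which only recognizes capitalized words followed by Inc., Corp., Ltd. or LLC, plus one fixed "announced ... acquisition of" phrasing), and the rule that resolves a leading "Yesterday" against a caller-supplied publication date. It shows the shape of the task, not a practical system; real extractors generally rely on named-entity recognition and relation classification rather than a single pattern.

```python
import datetime
import re
from typing import NamedTuple, Optional


class MergerBetween(NamedTuple):
    """Hypothetical record for the relation MergerBetween(company_1, company_2, date)."""
    company_1: str
    company_2: str
    date: datetime.date


# Hypothetical pattern: one or more capitalized words followed by a corporate suffix.
_COMPANY = r"[A-Z][a-z]+(?: [A-Z][a-z]+)* (?:Inc|Corp|Ltd|LLC)\."

# Hypothetical trigger phrase: "<acquirer> announced their/its acquisition of <target>".
_MERGER = re.compile(
    rf"(?P<acquirer>{_COMPANY}) announced (?:their|its) acquisition of (?P<target>{_COMPANY})"
)


def extract_merger(sentence: str, published: datetime.date) -> Optional[MergerBetween]:
    """Return a MergerBetween tuple if the sentence matches the pattern, else None."""
    match = _MERGER.search(sentence)
    if match is None:
        return None
    # Only a leading "Yesterday" is resolved; otherwise the publication date is used.
    when = published
    if sentence.lower().startswith("yesterday"):
        when = published - datetime.timedelta(days=1)
    return MergerBetween(match.group("acquirer"), match.group("target"), when)


sentence = "Yesterday, New York-based Foo Inc. announced their acquisition of Bar Corp."
print(extract_merger(sentence, datetime.date(2024, 3, 15)))
# MergerBetween(company_1='Foo Inc.', company_2='Bar Corp.', date=datetime.date(2024, 3, 14))
```

The publication date here is an arbitrary assumed value; the point is that the unstructured sentence becomes a typed tuple that later code can filter, join, or reason over. Sentences that do not match the single hard-coded phrasing return `None`, which illustrates why such rule-based extractors only work in narrowly restricted domains.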
A broad goal of IE is to allow computation on previously unstructured data. A more specific goal is to allow logical reasoning to draw inferences from the logical content of the input data. Structured data is semantically well-defined data from a chosen target domain, interpreted with respect to category and context.