Information Extraction - World Wide Web Applications

World Wide Web Applications

Information extraction (IE) has been the focus of the Message Understanding Conferences (MUC). The proliferation of the Web, however, has intensified the need for IE systems that help people cope with the enormous amount of data available online. Systems that perform IE on online text should be low in cost, flexible to develop, and easy to adapt to new domains. MUC systems fail to meet these criteria. Moreover, linguistic analysis designed for unstructured text does not exploit the HTML/XML tags and layout formats available in online text. As a result, less linguistically intensive approaches have been developed for IE on the Web using wrappers, which are sets of highly accurate rules that extract the content of a particular page. Developing wrappers manually has proved to be a time-consuming task that requires a high level of expertise. Machine learning techniques, either supervised or unsupervised, have been used to induce such rules automatically.
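To make the idea of a wrapper concrete, the following is a minimal sketch of a hand-written one in Python. It assumes a hypothetical catalogue page in which each record is a table row and each field is enclosed by a fixed pair of left and right delimiters. The names `RULES`, `apply_wrapper`, and `SAMPLE_PAGE`, as well as the page layout itself, are illustrative assumptions and are not taken from any particular system.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical wrapper: each field is located by a (left, right) delimiter pair.
# These delimiters are assumptions about one imaginary catalogue page layout.
RULES: Dict[str, Tuple[str, str]] = {
    "name": ('<td class="name">', "</td>"),
    "price": ('<td class="price">', "</td>"),
}


def apply_wrapper(html: str, rules: Dict[str, Tuple[str, str]]) -> List[Dict[str, str]]:
    """Extract one record per table row using the delimiter rules."""
    records = []
    # Treat every <tr>...</tr> span as one candidate record.
    for row in re.findall(r"<tr>(.*?)</tr>", html, flags=re.DOTALL):
        record = {}
        for field, (left, right) in rules.items():
            start = row.find(left)
            if start == -1:
                continue  # left delimiter missing: the rule does not fire
            start += len(left)
            end = row.find(right, start)
            if end == -1:
                continue  # right delimiter missing: the rule does not fire
            record[field] = row[start:end].strip()
        if record:
            records.append(record)
    return records


SAMPLE_PAGE = """
<table>
<tr><td class="name">Desk lamp</td><td class="price">19.90</td></tr>
<tr><td class="name">Bookshelf</td><td class="price">74.00</td></tr>
</table>
"""

if __name__ == "__main__":
    for rec in apply_wrapper(SAMPLE_PAGE, RULES):
        print(rec)
```

The rules are accurate only for pages that follow this exact layout, which illustrates why writing them by hand for every site is costly and why inducing them from example pages is attractive.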

Wrappers typically handle highly structured collections of web pages, such as product catalogues and telephone directories. They fail, however, when the text is less structured, which is also common on the Web. Recent work on adaptive information extraction has motivated the development of IE systems that can handle different types of text, from well-structured to almost free text (where common wrappers fail), including mixed types. Such systems can exploit shallow natural language knowledge and can therefore also be applied to less structured text.
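The contrast between layout-based rules and shallow linguistic cues can be sketched as follows. This is only an illustration under stated assumptions: `LAYOUT_RULE` stands in for a wrapper rule tied to a hypothetical table layout, and `SHALLOW_RULE` stands in for a very simple lexical cue (a number following a price-related trigger word). Real adaptive IE systems use richer knowledge, such as tokenization and part-of-speech information, and this sketch does not reproduce any of them.

```python
import re
from typing import Optional

# Assumed layout-based rule: the price sits in a dedicated table cell.
LAYOUT_RULE = re.compile(r'<td class="price">\s*([\d.]+)\s*</td>')

# Assumed shallow cue: a number that follows a price-related trigger phrase.
SHALLOW_RULE = re.compile(
    r"\b(?:costs?|priced at|price of|for only)\s+(?:EUR\s*|\$)?(\d+(?:\.\d{1,2})?)",
    flags=re.IGNORECASE,
)


def extract_price(text: str) -> Optional[str]:
    """Try the layout rule first, then fall back to the shallow cue."""
    for rule in (LAYOUT_RULE, SHALLOW_RULE):
        match = rule.search(text)
        if match:
            return match.group(1)
    return None


# Hypothetical inputs: one structured row and one free-text sentence.
STRUCTURED = '<tr><td class="name">Desk lamp</td><td class="price">19.90</td></tr>'
FREE_TEXT = "<p>This week the desk lamp costs $19.90 while stocks last.</p>"

if __name__ == "__main__":
    print(extract_price(STRUCTURED))
    print(extract_price(FREE_TEXT))
```

The layout rule alone finds nothing in the free-text sentence, whereas the combined extractor still recovers the price, which is the kind of behaviour expected of a system that covers both structured and less structured text.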
