Dealing With Unstructured Data
Techniques such as data mining, text analytics, and noisy-text analytics provide different methods for finding patterns in, or otherwise interpreting, unstructured information. Common techniques for structuring text usually involve manual tagging with metadata, or part-of-speech tagging as a basis for further text-mining-based structuring. The Unstructured Information Management Architecture (UIMA) provides a common framework for processing this information to extract meaning and create structured data about it.
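As a minimal sketch of what "creating structured data" from raw text can look like, the Python below attaches typed standoff annotations (a type name plus begin/end character offsets) to an unmodified document and then exports them as plain records. It is loosely modeled on the general idea of typed annotations over text spans; it does not use the UIMA API. The names `Annotation`, `AnnotatedDocument`, `annotate_dates`, `annotate_emails`, `run_pipeline`, and `to_records`, and the regular-expression annotators themselves, are hypothetical stand-ins for real analysis components.

```python
import re
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Annotation:
    """A standoff annotation: a typed span over the original, unmodified text."""
    kind: str
    begin: int  # offset of the first character in the span
    end: int    # offset one past the last character in the span


@dataclass
class AnnotatedDocument:
    text: str
    annotations: list = field(default_factory=list)

    def covered_text(self, ann: Annotation) -> str:
        return self.text[ann.begin:ann.end]


# Hypothetical annotator: marks ISO-style dates such as 2024-03-15.
def annotate_dates(doc: AnnotatedDocument) -> None:
    for m in re.finditer(r"\b\d{4}-\d{2}-\d{2}\b", doc.text):
        doc.annotations.append(Annotation("Date", m.start(), m.end()))


# Hypothetical annotator: marks simple e-mail addresses.
def annotate_emails(doc: AnnotatedDocument) -> None:
    for m in re.finditer(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b", doc.text):
        doc.annotations.append(Annotation("EmailAddress", m.start(), m.end()))


def run_pipeline(text, annotators):
    """Run each annotator over the same document, then order annotations by position."""
    doc = AnnotatedDocument(text)
    for annotator in annotators:
        annotator(doc)
    doc.annotations.sort(key=lambda a: (a.begin, a.end))
    return doc


def to_records(doc):
    """Export the annotations as plain structured records."""
    return [
        {"kind": a.kind, "text": doc.covered_text(a), "begin": a.begin, "end": a.end}
        for a in doc.annotations
    ]


doc = run_pipeline(
    "Contact jane.doe@example.com before 2024-03-15 about the report.",
    [annotate_dates, annotate_emails],
)
for record in to_records(doc):
    print(record)
# {'kind': 'EmailAddress', 'text': 'jane.doe@example.com', 'begin': 8, 'end': 28}
# {'kind': 'Date', 'text': '2024-03-15', 'begin': 36, 'end': 46}
```

Because each annotator only appends annotations and never edits the text, further annotators (for example, a part-of-speech tagger) could be added to the list without changing the ones already there.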
Software that creates machine-processable structure exploits the linguistic, auditory, and visual structure inherent in all forms of human communication. Algorithms can infer this inherent structure from text, for instance, by examining word morphology, sentence syntax, and other small- and large-scale patterns. Unstructured information can then be enriched and tagged to address ambiguities, after which relevancy-based techniques can be used to facilitate search and discovery.
Examples of "unstructured data" include books, journals, documents, metadata, health records, audio, video, analog data, files, and unstructured text such as the body of an e-mail message, Web page, or word-processor document. Although the main content being conveyed has no defined structure, it generally comes packaged in objects (e.g., files or documents) that themselves have structure. Such objects are thus a mix of structured and unstructured data, but collectively they are still referred to as "unstructured data". For example, an HTML web page is tagged, but HTML markup typically serves only for rendering: it does not capture the meaning or function of the tagged elements in ways that support automated processing of the page's information content. XHTML tagging does allow machine processing of elements, although it typically does not capture or convey the semantic meaning of the tagged terms.
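To illustrate inferring structure from word morphology alone, the sketch below guesses a coarse word class for each token from a small closed-class lexicon and a list of suffix rules. The tag names, the lexicon, and the suffix list are illustrative assumptions, not a standard tag set; real part-of-speech taggers also draw on sentence syntax and statistical context, which this sketch deliberately ignores.

```python
import re

# Illustrative closed-class lexicon; real taggers use much larger resources.
LEXICON = {
    "the": "DETERMINER",
    "a": "DETERMINER",
    "of": "PREPOSITION",
    "and": "CONJUNCTION",
    "is": "VERB",
}

# Illustrative suffix rules, tried in order; the first match wins, so
# longer, more specific suffixes ("ous") must come before shorter ones ("s").
SUFFIX_RULES = [
    ("ing", "VERB-GERUND"),
    ("ed", "VERB-PAST"),
    ("ly", "ADVERB"),
    ("tion", "NOUN"),
    ("ness", "NOUN"),
    ("ous", "ADJECTIVE"),
    ("s", "NOUN-PLURAL"),
]


def guess_tag(word: str) -> str:
    lower = word.lower()
    if lower in LEXICON:
        return LEXICON[lower]
    if lower[0].isdigit():
        return "NUMBER"
    for suffix, tag in SUFFIX_RULES:
        # Require a few characters before the suffix so short words such as
        # "bus" or "red" are not misread as plurals or past-tense verbs.
        if lower.endswith(suffix) and len(lower) > len(suffix) + 2:
            return tag
    return "UNKNOWN"


def tag_sentence(sentence: str):
    """Split on word characters and guess a tag for each token independently."""
    return [(token, guess_tag(token)) for token in re.findall(r"\w+", sentence)]


print(tag_sentence("The rapid extraction of patterns is surprisingly ambiguous"))
# [('The', 'DETERMINER'), ('rapid', 'UNKNOWN'), ('extraction', 'NOUN'),
#  ('of', 'PREPOSITION'), ('patterns', 'NOUN-PLURAL'), ('is', 'VERB'),
#  ('surprisingly', 'ADVERB'), ('ambiguous', 'ADJECTIVE')]
```

Suffix rules are easily fooled: "processes" is tagged as a plural noun even when it is used as a verb, which is exactly the kind of ambiguity that syntax-aware or context-aware techniques are needed to resolve.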
Because unstructured data commonly occurs in electronic documents, a content or document management system that can categorize entire documents is often preferred over transferring and manipulating data from within the documents. Document management thus provides a means to impose structure on document collections.
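The sketch below illustrates structure applied at the level of whole documents rather than their contents: each document receives a single category by counting hits against hypothetical keyword profiles, and the collection is then indexed by category. The categories, keywords, and `min_hits` threshold are assumptions for illustration only; a real document management system would more likely rely on trained classifiers or curated metadata than on fixed keyword lists.

```python
import re
from collections import Counter

# Hypothetical keyword profiles for each category.
CATEGORY_KEYWORDS = {
    "invoice": {"invoice", "amount", "due", "payment", "total"},
    "medical": {"patient", "diagnosis", "dosage", "clinic"},
    "legal": {"contract", "party", "clause", "agreement"},
}


def categorize(text: str, min_hits: int = 2) -> str:
    """Assign one category to a whole document by counting keyword occurrences."""
    words = Counter(re.findall(r"[a-z]+", text.lower()))
    scores = {
        category: sum(words[keyword] for keyword in keywords)
        for category, keywords in CATEGORY_KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    # Too little evidence: leave the document uncategorized rather than guess.
    return best if scores[best] >= min_hits else "uncategorized"


def build_index(documents: dict) -> dict:
    """Group document IDs by category; the documents' contents are left untouched."""
    index = {}
    for doc_id, text in documents.items():
        index.setdefault(categorize(text), []).append(doc_id)
    return index


docs = {
    "a.txt": "Invoice 1042: total amount due on receipt; payment by transfer.",
    "b.txt": "The patient returned to the clinic; diagnosis unchanged.",
    "c.txt": "Notes from Tuesday's meeting.",
}
print(build_index(docs))
# {'invoice': ['a.txt'], 'medical': ['b.txt'], 'uncategorized': ['c.txt']}
```

The index gives the collection a navigable structure even though nothing inside the individual documents has been parsed or restructured.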