Relevance (Information Retrieval) - Clustering and Relevance

Clustering and Relevance

The cluster hypothesis, proposed by C. J. van Rijsbergen in 1979, asserts that two documents that are similar to each other have a high likelihood of being relevant to the same information need. With respect to the embedding similarity space, the cluster hypothesis can be interpreted globally or locally. The global interpretation assumes that there exists a fixed set of underlying topics derived from inter-document similarity. These global clusters or their representatives can then be used to relate the relevance of two documents (e.g., two documents in the same cluster should both be relevant to the same request). Methods in this spirit include:

  • cluster-based information retrieval
  • cluster-based document expansion, such as latent semantic analysis or its language modeling equivalents

It is important to ensure that clusters, either in isolation or in combination, successfully model the set of possible relevant documents.
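
As a rough illustration of the global interpretation, the sketch below scores a query against cluster centroids and then ranks only the documents in the best-matching cluster. It is a minimal example under stated assumptions, not a reference implementation: the function names, the toy vectors, the pre-assigned clusters, and the choice of cosine similarity are all hypothetical.

```python
import math

def cosine(u, v):
    # Cosine similarity between two equal-length vectors (0.0 if either is all zeros).
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def centroid(vectors):
    # Component-wise mean of a non-empty list of vectors.
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def cluster_based_retrieval(query, docs, clusters):
    # docs: document id -> vector; clusters: cluster id -> list of document ids.
    # Assumption: the clusters were computed beforehand from inter-document similarity.
    centroids = {c: centroid([docs[d] for d in members])
                 for c, members in clusters.items()}
    # Pick the cluster whose representative (centroid) is closest to the query.
    best = max(centroids, key=lambda c: cosine(query, centroids[c]))
    # Rank only the members of that cluster by their own similarity to the query.
    ranked = sorted(clusters[best],
                    key=lambda d: cosine(query, docs[d]),
                    reverse=True)
    return best, ranked

# Toy data (hypothetical): two topics in a three-dimensional embedding space.
docs = {
    "d1": [1.0, 0.0, 0.0],
    "d2": [0.9, 0.1, 0.0],
    "d3": [0.0, 1.0, 0.0],
    "d4": [0.0, 0.9, 0.1],
}
clusters = {"A": ["d1", "d2"], "B": ["d3", "d4"]}
best, ranked = cluster_based_retrieval([0.8, 0.2, 0.0], docs, clusters)
print(best, ranked)
```

If the clusters fail to cover the relevant documents, this kind of retrieval cannot recover them, which is why the quality of the clustering matters so much for global methods.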

A second interpretation, most notably advanced by Ellen Voorhees, focuses on the local relationships between documents. The local interpretation avoids having to model the number or size of clusters in the collection and allows relevance at multiple scales. Methods in this spirit include:

  • multiple cluster retrieval
  • spreading activation and relevance propagation methods
  • local document expansion
  • score regularization

Local methods require an accurate and appropriate document similarity measure.
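
The local interpretation can be illustrated with a simplified form of score smoothing: each document's retrieval score is blended with the similarity-weighted scores of its nearest neighbors, so that similar documents end up with similar scores. This is only a sketch in the spirit of score regularization, not the published formulation; the function names, the parameters `k` and `alpha`, and the toy data are assumptions.

```python
import math

def cosine(u, v):
    # Cosine similarity between two equal-length vectors (0.0 if either is all zeros).
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def smooth_scores(scores, vectors, k=1, alpha=0.5):
    # scores: document id -> initial retrieval score.
    # vectors: document id -> embedding used by the similarity measure.
    # k and alpha are hypothetical tuning parameters: neighborhood size and
    # the weight given to the neighbors' scores.
    smoothed = {}
    for d, s in scores.items():
        # The k most similar other documents, as (similarity, id) pairs.
        neighbors = sorted(
            ((cosine(vectors[d], vectors[o]), o) for o in scores if o != d),
            reverse=True,
        )[:k]
        total = sum(w for w, _ in neighbors)
        if total > 0:
            neighbor_avg = sum(w * scores[o] for w, o in neighbors) / total
        else:
            neighbor_avg = s  # no usable neighbors: keep the original score
        smoothed[d] = (1 - alpha) * s + alpha * neighbor_avg
    return smoothed

# Toy data (hypothetical): d2 is nearly identical to d1 but was scored 0.0.
vectors = {"d1": [1.0, 0.0], "d2": [0.9, 0.1], "d3": [0.0, 1.0]}
scores = {"d1": 1.0, "d2": 0.0, "d3": 0.2}
smoothed = smooth_scores(scores, vectors)
print(smoothed)
```

In this toy run, d2 is pulled up by its close neighbor d1 and overtakes d3. The outcome depends entirely on the similarity measure: a poor measure would propagate scores between unrelated documents.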
