Vector Space Model - Example: Tf-idf Weights


In the classic vector space model proposed by Salton, Wong and Yang, the term-specific weights in the document vectors are products of local and global parameters. The model is known as the term frequency-inverse document frequency (tf-idf) model. The weight vector for document d is \mathbf{v}_d = [w_{1,d}, w_{2,d}, \ldots, w_{N,d}]^T, where


w_{t,d} = \mathrm{tf}_{t,d} \cdot \log{\frac{|D|}{|\{d' \in D \, | \, t \in d'\}|}}

and

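  • \mathrm{tf}_{t,d} is the term frequency of term t in document d (a local parameter)
  • \log{\frac{|D|}{|\{d' \in D \, | \, t \in d'\}|}} is the inverse document frequency (a global parameter). |D| is the total number of documents in the document set; |\{d' \in D \, | \, t \in d'\}| is the number of documents containing the term t.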
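The weight formula can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name tfidf_weights, the toy documents, and the choice of the natural logarithm are assumptions, since the formula does not fix a log base.

```python
import math
from collections import Counter

def tfidf_weights(docs):
    """Return one {term: weight} dict per tokenized document."""
    n_docs = len(docs)  # |D|
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)  # raw term counts in this document
        # w_{t,d} = tf_{t,d} * log(|D| / df_t); natural log is an assumption.
        weights.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return weights

# Hypothetical toy corpus, already split into terms.
docs = [
    ["apple", "banana", "apple"],
    ["banana", "cherry"],
    ["apple", "cherry", "cherry", "date"],
]
w = tfidf_weights(docs)
```

Note that a term occurring in every document gets weight zero, because its inverse document frequency is log(1) = 0.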

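Using the cosine, the similarity between document d_j and query q can be calculated as:

\mathrm{sim}(d_j, q) = \frac{\mathbf{v}_{d_j} \cdot \mathbf{v}_q}{\|\mathbf{v}_{d_j}\| \, \|\mathbf{v}_q\|} = \frac{\sum_{t} w_{t,d_j} \, w_{t,q}}{\sqrt{\sum_{t} w_{t,d_j}^2} \, \sqrt{\sum_{t} w_{t,q}^2}}

where \mathbf{v}_{d_j} and \mathbf{v}_q are the weight vectors of the document and the query, and the sums run over all terms t.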
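As a rough sketch, the cosine similarity of two weight vectors can be computed as below. The vectors are stored as sparse dicts mapping terms to weights; the function name cosine_similarity, the example weights, and the convention of returning 0.0 for an all-zero vector are assumptions made for illustration.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two sparse weight vectors (dicts)."""
    # Only terms present in both vectors contribute to the dot product.
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # assumed convention: an all-zero vector matches nothing
    return dot / (norm_u * norm_v)

# Hypothetical weights for one document and one query.
doc = {"apple": 0.8, "banana": 0.4}
query = {"apple": 1.0}
sim = cosine_similarity(doc, query)
```

Because the result is normalized by both vector lengths, a long document is not favored over a short one merely for containing more term occurrences.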

In a simpler Term Count Model, the term-specific weights do not include the global parameter. Instead, the weights are just the counts of term occurrences: w_{t,d} = \mathrm{tf}_{t,d}.
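For comparison, a Term Count Model weight vector is simply a table of term counts. The sketch below assumes a document that is already split into terms; the name term_count_weights is hypothetical.

```python
from collections import Counter

def term_count_weights(doc):
    """Return {term: count} for one tokenized document (w_{t,d} = tf_{t,d})."""
    return dict(Counter(doc))

weights = term_count_weights(["apple", "banana", "apple"])
```

Unlike the tf-idf weights, these counts do not depend on the rest of the document set, so a term that appears in every document keeps its full weight.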
