A hapax legomenon ( /ˈhæpɨks lɨˈɡɒmənɒn/ also /ˈhæpæks/ or /ˈheɪpæks/; pl. hapax legomena; sometimes abbreviated to hapax, pl. hapaxes) is a word which occurs only once within a context, either in the written record of an entire language, in the works of an author, or in a single text. The term is sometimes incorrectly used to describe a word that occurs in just one of an author's works, even though it occurs more than once in that work. Hapax legomenon is a transliteration of Greek ἅπαξ λεγόμενον, meaning "(something) said (only) once".
The related terms dis legomenon, tris legomenon, and tetrakis legomenon respectively refer to double, triple, or quadruple occurrences, but are far less commonly used.
Hapax legomena are quite common, as predicted by Zipf's law, which states that the frequency of any word in a work (corpus) is inversely related to its rank in the frequency table. For large corpora, about 40% to 60% of the words (counting by type) are hapax legomena, and another 10% to 15% are dis legomena. Thus, in the Brown Corpus of American English, about half of the 50,000 words are hapax legomena within that corpus.
Note that hapax legomenon refers to a word's appearance in a body of text and to neither its origin nor its prevalence in speech. It thus differs from a nonce word, which may never be recorded, or which may find currency and may be widely recorded, or which may appear several times in the work which coins it, and so on.
Read more about Hapax Legomenon: Significance, Computer Science