Moby Project - Pronunciator

Pronunciator

The Moby Pronunciator II contains 177,267 words with corresponding pronunciations. The Project Gutenberg distribution also contains a copy of cmudict v0.3. Each line of the file follows the format word[/part-of-speech] pronunciation. The optional part-of-speech field disambiguates 770 words whose pronunciation depends on their part of speech. For example, for the word spelled close, the verb is pronounced /ˈkloʊz/, whereas the adjective is pronounced /ˈkloʊs/. The parts of speech are assigned the following codes:

Part-of-speech   Code
Noun             n
Verb             v
Adjective        aj
Adverb           av
Interjection     interj
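
As a minimal sketch of how an entry could be read, the Python below splits one line into word, part of speech, and raw pronunciation. It assumes that the part-of-speech code is attached to the word with a slash and that the first space separates the headword from the pronunciation. The names parse_entry and Entry are hypothetical, and the sample lines are illustrative rather than copied from the file.

```python
from typing import NamedTuple, Optional

# Part-of-speech codes from the table above.
POS_CODES = {
    "n": "noun",
    "v": "verb",
    "aj": "adjective",
    "av": "adverb",
    "interj": "interjection",
}


class Entry(NamedTuple):
    word: str
    pos: Optional[str]   # expanded part-of-speech name, or None if absent
    pronunciation: str   # raw, undecoded pronunciation field


def parse_entry(line: str) -> Entry:
    # Assumption: the first space separates the headword from the pronunciation.
    head, _, pron = line.strip().partition(" ")
    # Assumption: an optional "/code" suffix on the headword carries the part of speech.
    word, sep, code = head.rpartition("/")
    if sep and code in POS_CODES:
        return Entry(word, POS_CODES[code], pron)
    return Entry(head, None, pron)


if __name__ == "__main__":
    print(parse_entry("close/v 'kl/oU/z"))
    print(parse_entry("close/aj 'kl/oU/s"))
```

A suffix that is not one of the five known codes is left as part of the word, so a headword that happens to contain a slash is not misread as carrying a part of speech.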

The pronunciation follows the word and its optional part-of-speech code. It uses several special symbols:

Symbol   Meaning
/        Used to separate phonemes
_        Used to separate words
'        Primary stress on the following syllable
,        Secondary stress on the following syllable
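
The four special symbols can be handled before any phoneme lookup. The following Python sketch, under the assumption that the symbols behave exactly as described in the table above, splits a pronunciation into words and phoneme chunks and turns each stress mark into a separate token. The function name split_pronunciation and the sample strings are hypothetical.

```python
# Stress markers from the table above; each applies to the following syllable.
STRESS = {"'": "primary", ",": "secondary"}


def split_pronunciation(pron: str) -> list:
    """Return one token list per word; stress marks become their own tokens."""
    words = []
    for word in pron.split("_"):          # "_" separates words
        tokens = []
        for chunk in word.split("/"):     # "/" separates phonemes
            buf = ""
            for ch in chunk:
                if ch in STRESS:
                    if buf:
                        tokens.append(buf)
                        buf = ""
                    tokens.append(STRESS[ch])
                else:
                    buf += ch
            if buf:
                tokens.append(buf)
        words.append(tokens)
    return words


if __name__ == "__main__":
    print(split_pronunciation("'kl/oU/z"))
```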

The remaining symbols represent IPA characters, according to the following table:

Symbol   IPA
&        æ
-        ə
@        ʌ, ə
@r       ɜr, ər
A        ɑː
aI       aɪ
Ar       ɑr
AU       aʊ
b        b
d        d
D        ð
dZ       dʒ
E        ɛ
eI       eɪ
f        f
g        ɡ
h        h
hw       hw
i        iː
I        ɪ
j        j
k        k
l        l
m        m
n        n
N        ŋ
O        ɔː
Oi       ɔɪ
oU       oʊ
p        p
r        r
s        s
S        ʃ
t        t
T        θ
tS       tʃ
u        uː
U        ʊ
v        v
w        w
z        z
Z        ʒ
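
The symbol table lends itself to a simple lookup-based converter. The Python sketch below is one possible approach, not a reference implementation: it matches the longest symbol first, so that two-character symbols such as oU or tS are not read as two separate phonemes. Several choices are assumptions: the diphthongs, affricates, and long vowels (aI, AU, dZ, eI, i, oU, tS, u) are given their conventional IPA values, the first listed alternative is used where the table gives two (@ and @r), and unknown characters are passed through unchanged. The function name moby_to_ipa is hypothetical.

```python
# Moby symbol -> IPA, following the table above.
MOBY_TO_IPA = {
    "&": "æ", "-": "ə",
    "@": "ʌ",      # table lists "ʌ, ə"; the first alternative is an arbitrary choice
    "@r": "ɜr",    # table lists "ɜr, ər"; the first alternative is an arbitrary choice
    "A": "ɑː", "aI": "aɪ", "Ar": "ɑr", "AU": "aʊ",
    "b": "b", "d": "d", "D": "ð", "dZ": "dʒ", "E": "ɛ", "eI": "eɪ",
    "f": "f", "g": "ɡ", "h": "h", "hw": "hw", "i": "iː", "I": "ɪ",
    "j": "j", "k": "k", "l": "l", "m": "m", "n": "n", "N": "ŋ",
    "O": "ɔː", "Oi": "ɔɪ", "oU": "oʊ", "p": "p", "r": "r", "s": "s",
    "S": "ʃ", "t": "t", "T": "θ", "tS": "tʃ", "u": "uː", "U": "ʊ",
    "v": "v", "w": "w", "z": "z", "Z": "ʒ",
    # Special symbols: stress marks and the word separator.
    "'": "ˈ", ",": "ˌ", "_": " ",
}

# Longest symbols first, so "oU" wins over a bare "o" and "tS" over "t" + "S".
_SYMBOLS = sorted(MOBY_TO_IPA, key=len, reverse=True)


def moby_to_ipa(pron: str) -> str:
    out = []
    for chunk in pron.split("/"):         # "/" only separates phonemes
        i = 0
        while i < len(chunk):
            for sym in _SYMBOLS:
                if chunk.startswith(sym, i):
                    out.append(MOBY_TO_IPA[sym])
                    i += len(sym)
                    break
            else:
                out.append(chunk[i])      # unknown character: pass through unchanged
                i += 1
    return "".join(out)


if __name__ == "__main__":
    print(moby_to_ipa("'kl/oU/z"))  # ˈkloʊz
    print(moby_to_ipa("'kl/oU/s"))  # ˈkloʊs
```

With the illustrative inputs shown, the output matches the two pronunciations of close given earlier, /ˈkloʊz/ for the verb and /ˈkloʊs/ for the adjective.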
