XHTML+Voice
XHTML+Voice (commonly X+V) is an XML language for describing multimodal user interfaces. The two essential modalities are visual and auditory. Visual interaction is defined like most current web pages via XHTML. Auditory components are defined by a subset of Voice XML. Interfacing the voice and visual components of X+V documents is accomplished through a combination of ECMAScript, JavaScript, and XML Events.
Read more about Xhtml+voice.