Perceptual processes mediating recognition, including the recognition of objects and spoken words, are inherently multisensory. This is true even though sensory inputs are segregated in the early stages of neuro-sensory encoding. In face-to-face communication, for example, auditory information is transduced in the cochlea, encoded in the auditory nerve, and processed in lower cortical areas. Eventually, these “sounds” are processed in higher cortical pathways, such as the auditory cortex, where they are perceived as speech. Likewise, visual information obtained from observing a talker’s articulators is encoded in lower visual pathways. This information is subsequently processed in the visual cortex before articulatory gestures are extracted in higher cortical areas associated with speech and language. As language perception unfolds, information garnered from the visual articulators interacts with language processing in multiple brain regions, via visual projections to auditory, language, and multisensory areas. The association of auditory and visual speech signals makes speech a highly “configural” percept.

An important direction for the field is therefore to develop measures of the extent to which visual speech information influences auditory processing and, likewise, to assess how the unisensory components of the signal combine to form a configural, integrated percept. Numerous behavioral measures, such as accuracy (e.g., percent correct, susceptibility to the “McGurk effect”) and reaction time (RT), have been employed to assess multisensory integration ability in speech perception. Neural measures such as fMRI, EEG, and MEG, in turn, have been employed to examine the locus and/or time course of integration.

The purpose of this Research Topic is to find converging behavioral and neural assessments of audiovisual integration in speech perception. A further aim is to investigate speech recognition ability in normal-hearing, hearing-impaired, and aging populations. As such, the purpose is to obtain neural measures from EEG as well as fMRI that shed light on the neural bases of multisensory processes, while connecting them to model-based measures of reaction time and accuracy in the behavioral domain. In doing so, we endeavor to gain a more thorough description of the neural bases and mechanisms underlying integration in higher-order processes such as speech and language recognition.
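To make the model-based RT measures mentioned above concrete, the sketch below illustrates one widely used test of audiovisual integration, the race-model inequality, which asks whether the audiovisual RT distribution exceeds the bound implied by the two unisensory distributions. This is a minimal illustration in Python (assuming NumPy is available); the data, function names, and parameter values are hypothetical and are not taken from any study in this Research Topic.

```python
import numpy as np

def ecdf(rts, t_grid):
    """Empirical cumulative distribution of reaction times evaluated on t_grid."""
    rts = np.asarray(rts, dtype=float)
    return np.array([(rts <= t).mean() for t in t_grid])

def race_model_violation(rt_a, rt_v, rt_av, n_points=50):
    """Test the race-model inequality: F_AV(t) <= F_A(t) + F_V(t).

    Returns the time grid, the audiovisual CDF, the race-model bound,
    and the maximum violation (positive values suggest coactivation
    rather than a parallel race between the auditory and visual channels).
    """
    all_rts = np.concatenate([rt_a, rt_v, rt_av])
    t_grid = np.linspace(all_rts.min(), all_rts.max(), n_points)

    f_a, f_v, f_av = (ecdf(r, t_grid) for r in (rt_a, rt_v, rt_av))
    bound = np.minimum(f_a + f_v, 1.0)   # race-model bound, capped at 1
    violation = f_av - bound             # > 0 means the bound is exceeded
    return t_grid, f_av, bound, violation.max()

# Hypothetical RT samples (ms) for one listener in each condition.
rng = np.random.default_rng(0)
rt_a  = rng.normal(520, 60, 200)   # auditory-only
rt_v  = rng.normal(610, 70, 200)   # visual-only
rt_av = rng.normal(470, 55, 200)   # audiovisual

_, _, _, max_violation = race_model_violation(rt_a, rt_v, rt_av)
print(f"Maximum race-model violation: {max_violation:.3f}")
```

In practice such a bound is evaluated per participant, often at fixed RT quantiles, and violations are then tested statistically across the sample; the example above only shows the core computation on simulated data.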