13 Aug 2015
The 2015 JSALT workshop just wrapped up. I was lucky enough to be invited as a consultant on the probabilistic transcription team. The problem the team set out to solve is simple to state but hard to answer: how can you do automatic speech recognition in a language for which you have no transcribed corpora to train your recognizer?

Our approach was twofold. First, we had multiple English speakers transcribe the foreign speech using English orthography, and from their transcripts we generated possible transcripts as probability mass functions over the set of foreign phones. Second, we tried to use electrophysiological data to help disambiguate pairs of foreign phones that are likely to be mapped to a single English orthographic representation. We did this by training a classifier on EEG responses to English speech sounds, testing it on responses to foreign speech sounds, and using the result to generate weights for the foreign-phone-to-English-phoneme confusion matrix.

I'm proud to say that we saw gains from both the probabilistic transcripts and the integration of physiological signals into the ASR system, even within the confines of the six-week workshop!
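To make the two ideas in the approach concrete, here is a minimal toy sketch, not the team's actual system. It assumes a confusion matrix of P(English label | foreign phone), a prior over foreign phones, and transcribers who err independently. The posterior over foreign phones is then proportional to the prior times the product of each transcriber's label probability. The EEG step is modeled, purely as an assumption, as per-cell multiplicative weights on the confusion matrix followed by row renormalization. The phone names, labels, numbers, and the functions `phone_posterior` and `reweight_confusion` are all hypothetical.

```python
from math import prod

# Floor for labels a confusion row never assigns (avoids an all-zero posterior).
FLOOR = 1e-6

def normalize(dist):
    """Scale a dict of nonnegative scores so its values sum to 1."""
    total = sum(dist.values())
    return {k: v / total for k, v in dist.items()}

def phone_posterior(labels, confusion, prior):
    """PMF over foreign phones given several transcribers' English labels,
    assuming independent errors: P(f | labels) ∝ P(f) * prod_i P(label_i | f)."""
    scores = {
        f: prior[f] * prod(confusion[f].get(e, FLOOR) for e in labels)
        for f in prior
    }
    return normalize(scores)

def reweight_confusion(confusion, eeg_weights):
    """Hypothetical EEG step: multiply row P(e | f) by weights w[f][e]
    (default 1.0), then renormalize each row so it is still a PMF."""
    return {
        f: normalize({e: p * eeg_weights.get(f, {}).get(e, 1.0) for e, p in row.items()})
        for f, row in confusion.items()
    }

# Toy data: two made-up foreign phones that English listeners both tend to write as "d".
confusion = {
    "d_dental":    {"d": 0.6, "t": 0.3, "th": 0.1},
    "d_retroflex": {"d": 0.7, "t": 0.2, "r": 0.1},
}
prior = {"d_dental": 0.5, "d_retroflex": 0.5}
labels = ["d", "d", "t"]  # three transcribers' labels for one segment

print(phone_posterior(labels, confusion, prior))

# Made-up EEG-derived weights: suppose the classifier suggests the dental
# phone is heard as "th" more often than the crowd labels imply.
reweighted = reweight_confusion(confusion, {"d_dental": {"th": 2.0}})
print(phone_posterior(labels, reweighted, prior))
```

With these made-up numbers, the three labels slightly favor the dental phone. Shifting probability mass in the dental row toward "th" makes "d" and "t" less likely under that phone, so the same labels then favor the retroflex one. This illustrates how a reweighted confusion matrix could change which of two easily conflated phones a transcript points to.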