— Presentation at IEEE SLT 2016 —
January 25, 2017
We participated in the IEEE Workshop on Spoken Language Technology (IEEE SLT), held in San Diego, USA, from December 13 to 16, 2016. IEEE SLT is a biennial international workshop where many state-of-the-art research achievements are presented in the areas of speech recognition, natural language understanding, spoken dialog systems, and speech signal processing. One hundred two presentations were accepted at the workshop, and the participants enjoyed discussing their research achievements.
Fig. 1 Presentation poster
We made a presentation titled "Robust utterance classification using multiple classifiers in the presence of speech recognition errors" (Fig. 1). The presentation describes our proposed natural language understanding method, which works robustly even when the input sentence from the speech recognizer contains misrecognized words.
In recent years, speech recognition technology has improved drastically, and speech input interfaces have become popular in many devices such as smartphones, robots, and car navigation systems. In addition, recent natural language understanding technology enables these devices to infer what the user wants to do (e.g., which function the user wants to invoke, or which query words the user wants to search for) from utterances containing a variety of words and grammatical expressions. However, it is difficult to avoid speech recognition errors completely. Therefore, a natural language understanding method is needed that can correctly understand what the user wants to do whether or not the recognized sentence contains misrecognized words.
In our study, we focused on a "speech utterance classifier" for car navigation. Our research objective is to build an utterance classifier that can predict the car navigation function the user wants to use from a recognized sentence that may contain misrecognized words.
Fig. 2 Proposed method
Our utterance classifier is trained from many training sentences using a machine learning technique. First, we trained the utterance classifier on both error-free sentences and recognized sentences containing errors (Fig. 2). Second, we used two speech signals, raw and enhanced, to obtain different speech recognition results; these results are then fed into the utterance classifier independently. Third, we used not only word information but also phoneme information extracted from a recognized sentence to predict a car navigation function. The most confident car navigation function is then chosen from the outputs of these utterance classifiers.
These methods enable us to maintain high prediction accuracy whether or not the recognized sentence contains recognition errors. We evaluated our methods on a car navigation task in which the user controls the navigation system by voice while driving. Experimental results showed that our method reduces errors in predicting the car navigation function from the user's utterances by 55%.
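A figure like "cuts 55% of prediction errors" is a relative error reduction. The snippet below shows how such a number is computed; the baseline and proposed error rates here are made-up examples chosen to yield 55%, not results from the paper.

```python
def relative_error_reduction(baseline_err, proposed_err):
    """Fraction of the baseline's errors eliminated by the proposed method."""
    return (baseline_err - proposed_err) / baseline_err

# Illustrative numbers only: if a baseline misclassifies 20% of utterances
# and the proposed method misclassifies 9%, the relative reduction is 55%.
print(round(relative_error_reduction(0.20, 0.09), 2))  # 0.55
```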
We plan to improve our methods and apply them to car navigation systems and robots.
(By HOMMA Takeshi)