
January 20, 2016

Report from Presenter

Automatic speech recognition is essential for robot interaction systems and call center analytics. The IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU) is an international workshop on speech recognition technologies held every two years. Our team has attended every meeting since 2011 to keep developing state-of-the-art methods. ASRU 2015 was held in Scottsdale, Arizona, USA, on December 13-17, 2015. Sponsored mainly by Microsoft, Amazon, and Google, the workshop brought together researchers from academia and industry to discuss new problems and solutions in automatic speech recognition and understanding.

Fig. 1 Proposed method

At the meeting, we participated in the third CHiME challenge (CHiME3), a speech recognition challenge. The task focuses on English speech recognition in noisy public environments (bus, café, street, and pedestrian area) using a tablet device that has a 6-channel microphone array. The task was so difficult that the recognition accuracy obtained with the standard beamforming-based speech separation method and a state-of-the-art deep-neural-network-based speech recognition technique was only 66.6%.

We proposed a unified system that incorporates three noise-robust speech recognition methods developed at Hitachi, as shown in Fig. 1. The first is local Gaussian model (LGM) based speech source separation. The second is noise-robust feature extraction. The third is word hypothesis selection. The evaluation results show that the proposed system significantly reduced recognition errors and achieved a recognition accuracy of 88.2%.
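To put the two accuracy figures quoted in this report (66.6% and 88.2%) on a common footing, the short sketch below converts them into a relative error reduction. It assumes that the error rate is simply 100% minus the reported recognition accuracy and that both figures were measured on the same evaluation data, neither of which this report states explicitly. The function name relative_error_reduction is hypothetical and used only for illustration.

```python
def relative_error_reduction(baseline_accuracy, proposed_accuracy):
    """Relative reduction in error rate, in percent.

    Both arguments are recognition accuracies in percent.
    Assumption (not stated in the report): error = 100 - accuracy.
    """
    baseline_error = 100.0 - baseline_accuracy
    proposed_error = 100.0 - proposed_accuracy
    if baseline_error <= 0.0:
        raise ValueError("baseline accuracy must be below 100%")
    return 100.0 * (baseline_error - proposed_error) / baseline_error


# Accuracy figures quoted in this report.
reduction = relative_error_reduction(66.6, 88.2)
print(f"Relative error reduction: {reduction:.1f}%")
```

Under these assumptions, the error falls from 33.4% to 11.8%, which corresponds to a relative error reduction of roughly 65%.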

In CHiME3, we used standard approaches for acoustic modeling and language modeling. Other research institutes proposed improvements to both acoustic modeling and language modeling and achieved further improved results. We plan to combine those new methods with our proposed system to develop a state-of-the-art speech recognition system and applications for robots and call centers in a timely manner.

(By FUJITA Yusuke)

Related Papers

  • Y. Fujita, R. Takashima, T. Homma, R. Ikeshita, Y. Kawaguchi, T. Sumiyoshi, T. Endo, and M. Togami, "Unified ASR system using LGM-based source separation, noise-robust feature extraction, and word hypothesis selection," in Proc. IEEE ASRU, 2015.