Peter Vary
Institut für Nachrichtengeräte und Datenverarbeitung (IND), RWTH Aachen
Abstract
As in the 1985 science fiction comedy film "Back to the Future", the participants of the 2014 ITG Fachtagung Sprachkommunikation will be sent 30 years back in time. In a smooth passage through the space-time continuum they will find out how the future of digital speech processing and its applications was shaped in the past.
The feasibility of the early concepts of speech coding, echo cancellation and noise reduction was proven with elaborate but bulky multi-DSP systems. These pioneering exercises laid the groundwork for the forthcoming digital mobile radio system GSM.
Moore's law, describing the exponential growth of processing speed and memory capacity, was expected to remain valid for many years. The engineers were therefore keen to invent ever more complex combinations of sophisticated signal processing and coding algorithms for such incredible appliances as multi-microphone mobile phones, distributed wireless microphone arrays, binaural conferencing equipment, or smartphone-assisted hearing aids. The vision of smart devices such as Kinect or Google Glass, for interacting and gaming with computers by voice and gesture, became reality and opened up further amazing perspectives.
Short Biography
Peter Vary received the Dipl.-Ing. degree in electrical engineering from the Technical University of Darmstadt, Germany, in 1972 and the Dr.-Ing. degree from the University of Erlangen-Nuernberg, Germany, in 1978.
Akihiko K. Sugiyama
NEC Information and Media Processing Labs.
Abstract
This talk presents the importance and potential of phase information in signal enhancement. Phase has received much less attention in signal enhancement than its counterpart, the magnitude. This is partly because Wang and Lim showed experimentally in 1982 that, except under certain conditions, an accurate phase estimate does not help improve the SNR. Three decades later, signal processing applications have expanded significantly, and with these new applications some of the conditions that Wang and Lim did not take into consideration have come into the spotlight. Two examples of such applications covered in this talk are impact noise suppression and mechanical noise suppression. Consumer products that need these functions have a huge market today. Different uses of phase information in these applications are presented with audio and real-time PC demonstrations.
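For readers less familiar with the magnitude/phase split, the sketch below shows the conventional magnitude-only style of enhancement that the talk contrasts with: a basic spectral subtraction on one STFT frame, in which only the magnitude of each bin is modified and the noisy phase is reused unchanged. It is a minimal illustration under stated assumptions, not the phase-aware methods presented in the talk; the function name `spectral_subtraction`, the `floor` parameter, and the assumption that a per-bin noise magnitude estimate is already available are all hypothetical choices made for this example.

```python
import cmath


def spectral_subtraction(noisy_bins, noise_mag, floor=0.05):
    """Magnitude-only enhancement of one STFT frame (basic spectral subtraction).

    noisy_bins: complex STFT coefficients Y[k] of one noisy frame.
    noise_mag:  estimated noise magnitude per bin (assumed to be given;
                how it is estimated is not shown here).
    floor:      spectral floor as a fraction of |Y[k]|, which keeps the
                enhanced magnitude positive.
    """
    enhanced = []
    for y, n in zip(noisy_bins, noise_mag):
        mag = abs(y)
        phase = cmath.phase(y)  # the noisy phase is reused unchanged
        new_mag = max(mag - n, floor * mag)  # only the magnitude is modified
        enhanced.append(cmath.rect(new_mag, phase))
    return enhanced


# Hypothetical toy frame with three frequency bins.
frame = [3 + 4j, 1 + 0j, 0 - 2j]
noise = [1.0, 2.0, 0.5]
out = spectral_subtraction(frame, noise)
```

Every output bin keeps the phase of the corresponding noisy bin. Any improvement therefore comes from the magnitude alone, which is exactly the design assumption that phase-aware enhancement revisits.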
Short Biography
Akihiko Sugiyama (a.k.a. Ken Sugiyama), affiliated with NEC Information and Media Processing Labs, has been engaged in a wide variety of signal processing research projects, such as audio coding and interference/noise control. His team developed the world's first Silicon Audio in 1994, the ancestor of the iPod. He served as Chair of the Audio and Acoustic Signal Processing Technical Committee of the IEEE Signal Processing Society (SPS) [2011-2012], as associate editor for several journals such as IEEE Trans. SP [1994-1996], as Secretary and Member at Large of the SPS Conference Board, and as Chair of the SPS Japan Chapter [2010-2011]. He was a Technical Program Chair for ICASSP 2012. He has contributed 15 book chapters and is the inventor of over 150 registered patents, with more applications pending, in the field of signal processing in Japan and overseas. He has received 13 awards, including the 2002 IEICE Best Paper Award, the 2006 IEICE Achievement Award, and the 2013 Ichimura Industry Award. He is a Fellow of the IEEE and the IEICE and a Distinguished Lecturer of the IEEE SPS. He is also known for having hosted over 70 internship students in total.
Ralf Schlüter
Lehrstuhl für Informatik 6, RWTH Aachen
Abstract
Recently, automatic speech recognition (ASR) has undergone a remarkable change. Building on pioneering work on connectionist speech recognition, deep neural networks have finally emerged as the primary modeling paradigm and now define the state of the art in acoustic modeling. This presentation will give an overview of various aspects of neural network modeling relevant to ASR.
The discussion of the basic modeling approaches will include an attempt to compare using neural network outputs as the input feature set for standard Gaussian mixture HMMs (tandem approach) with explicitly modeling HMM state distributions using neural networks (hybrid approach). Different network topologies will be investigated with respect to ASR, including hierarchical, bottleneck, and recurrent neural networks. Optimization methods that can cope with the large amounts of training data usually encountered in ASR will be discussed. Further, the choice of input features will be considered, down to using the raw input signal in the time domain, as well as feature combination. The discussion of neural network based acoustic modeling will conclude with speaker normalization and adaptation in relation to neural networks.
Finally, language modeling using neural networks will also be discussed, with a specific focus on recurrent neural network language models.
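The difference between the tandem and hybrid approaches can be made concrete with a small sketch. It assumes a network that outputs state posteriors p(s|x) for one frame. In the hybrid case, these posteriors are commonly divided by state priors p(s) to obtain scaled likelihoods for HMM decoding. In the tandem case, log network outputs are appended to the acoustic features for a Gaussian mixture HMM. The function names, the example numbers, and the choice of log posteriors as tandem features are illustrative assumptions, not the specific systems discussed in the talk.

```python
import math


def scaled_log_likelihoods(posteriors, priors, eps=1e-10):
    """Hybrid approach: turn NN state posteriors into HMM emission scores.

    By Bayes' rule, p(x|s) = p(s|x) * p(x) / p(s). Because p(x) is the same
    for every state s at a given frame, it can be dropped during decoding,
    leaving the scaled log-likelihood log p(s|x) - log p(s).
    """
    return [math.log(max(p, eps)) - math.log(max(q, eps))
            for p, q in zip(posteriors, priors)]


def tandem_features(acoustic, posteriors, eps=1e-10):
    """Tandem approach: append log NN outputs to the acoustic feature vector.

    The result would then be modeled by a standard Gaussian mixture HMM.
    Decorrelation (e.g. PCA), which is often applied in practice, is omitted.
    """
    return list(acoustic) + [math.log(max(p, eps)) for p in posteriors]


post = [0.7, 0.2, 0.1]     # hypothetical NN posteriors for one frame
prior = [0.5, 0.25, 0.25]  # hypothetical state priors, e.g. from training alignments
scores = scaled_log_likelihoods(post, prior)
feats = tandem_features([1.2, -0.3], post)
```

In the hybrid sketch the network replaces the Gaussian mixtures entirely. In the tandem sketch it only enlarges the feature vector, and the Gaussian mixture HMM is still trained on top of it.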
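As a rough illustration of what distinguishes a recurrent neural network language model from an n-gram model, the sketch below implements one common Elman-style formulation: a sigmoid hidden layer fed by the current word and the previous hidden state, followed by a softmax over the vocabulary. The hidden state carries the whole word history instead of a fixed-length context. The tiny weight matrices are hypothetical toy values, not trained parameters, and the function names are illustrative only.

```python
import math


def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def softmax(z):
    mx = max(z)  # subtract the maximum for numerical stability
    e = [math.exp(x - mx) for x in z]
    s = sum(e)
    return [x / s for x in e]


def rnnlm_step(word_id, h_prev, U, W, V):
    """One time step of an Elman-style RNN language model.

    word_id: index of the current word (1-of-N input encoding).
    h_prev:  hidden state from the previous step (summarizes the history).
    U, W, V: input-to-hidden, hidden-to-hidden, hidden-to-output weights.
    Returns (distribution over the next word, new hidden state).
    """
    x_contrib = [row[word_id] for row in U]  # U times a 1-of-N vector selects a column
    rec = matvec(W, h_prev)
    h = [1.0 / (1.0 + math.exp(-(a + b))) for a, b in zip(x_contrib, rec)]
    return softmax(matvec(V, h)), h


def sentence_log_prob(word_ids, U, W, V, hidden_size):
    """Log-probability of word_ids[1:] given all preceding words."""
    h = [0.0] * hidden_size
    total = 0.0
    for cur, nxt in zip(word_ids, word_ids[1:]):
        probs, h = rnnlm_step(cur, h, U, W, V)
        total += math.log(probs[nxt])
    return total


# Hypothetical toy model: vocabulary of 3 words, 2 hidden units.
U = [[0.1, -0.2, 0.3], [0.0, 0.5, -0.1]]
W = [[0.2, 0.0], [0.1, 0.3]]
V = [[0.4, -0.3], [0.1, 0.2], [-0.5, 0.6]]
probs, h = rnnlm_step(0, [0.0, 0.0], U, W, V)
logp = sentence_log_prob([0, 1, 2], U, W, V, hidden_size=2)
```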
Short Biography
Ralf Schlüter studied physics at RWTH Aachen University, Germany, and Edinburgh University, UK. He received the Dipl. degree in physics in 1995 and the Dr. rer. nat. degree in computer science in 2000, both from RWTH Aachen University. From November 1995 to April 1996, he was with the Institute for Theoretical Physics B at RWTH Aachen, where he worked on statistical physics and stochastic simulation techniques. Since May 1996, he has been with the Computer Science Department at RWTH Aachen University, where he is currently Academic Director and leads the Automatic Speech Recognition group at the Human Language Technology and Pattern Recognition chair. His research interests cover speech recognition, neural network modeling, discriminative training, decision theory, stochastic modeling, and signal analysis.