My research started in industry back in 1991 and continued in various companies (check my patents...), but apart from a short period in 2000/2001, my only 'open' or 'public' research dates from 2006 onwards. This work has spanned speech communications, enhancement, intelligibility, quality, hearing, joint audio/video quality, ASR, dialogue, whispered speech, and work on other languages, including Mandarin Chinese.

I have had the great pleasure to have worked with some good people, excellent researchers, and wonderful students who have my eternal gratitude. Obviously, most of the work on this page would not be here without the efforts of my many students over the years, so thank you to them (both the many who are named on this page and the few who are unnamed here).


For a general speech/hearing overview, please see the Cambridge University Press book:
McLoughlin I, Applied Speech and Audio Processing, Cambridge University Press, Jan. 2009, ISBN 978-0521519540


Some of my major contributions relating to speech research are summarised below.



Breakdown of my current research, with paper references.


Speech communications-related research

The Bionic Voice Project & Whisper-to-speech conversion

This aims to return the power of speech to laryngectomy patients who have lost the ability to speak normally. Please click on the link above to go to the separate project page for this work.





Whisper-to-speech conversion continues the Bionic Voice work, because laryngectomy speech is a class of whisper. We are working on a number of techniques:

  • CELP-based methods (which we pioneered)

  • GMM-based systems (see Toda et al., and please also see our Oct. 2014 Electronics Letters paper for a refined method based on Toda's statistical voice conversion approach)

  • New parametric methods, including an interesting Sinewave Speech-based system that is showing some promise at the moment: find out more about that, and even download the MATLAB code for it, from http://www.lintech.org/Reconstruction. A rough sketch of the basic idea appears below.
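
To give a flavour of the sinewave idea, here is a minimal Python sketch (my own illustration, not the downloadable MATLAB code): it assumes that formant frequency and amplitude tracks have already been estimated from the whispered input (e.g. by LPC analysis), and simply drives one phase-accumulating oscillator per track. The sample rate, frame rate and three-formant setup are placeholder choices.

```python
import numpy as np

def sinewave_synthesis(formant_freqs, formant_amps, frame_rate=100, fs=16000):
    """Synthesise a sinewave-speech approximation from per-frame formant tracks.

    formant_freqs, formant_amps: arrays of shape (n_frames, n_formants) giving
    the frequency (Hz) and linear amplitude of each tracked 'formant'.  The
    tracks are interpolated to sample rate, each drives one oscillator with
    accumulated phase, and the oscillators are summed to give the output.
    """
    n_frames, n_formants = formant_freqs.shape
    hop = fs // frame_rate                       # samples per analysis frame
    t_frames = np.arange(n_frames) * hop
    t_samples = np.arange(n_frames * hop)

    out = np.zeros(len(t_samples))
    for k in range(n_formants):
        f = np.interp(t_samples, t_frames, formant_freqs[:, k])
        a = np.interp(t_samples, t_frames, formant_amps[:, k])
        phase = 2 * np.pi * np.cumsum(f) / fs    # phase accumulation per track
        out += a * np.sin(phase)
    return out / max(1e-9, np.max(np.abs(out)))  # normalise to +/- 1

# toy usage: three 'formants' gliding slowly over one second of audio
freqs = np.linspace([500, 1500, 2500], [650, 1700, 2400], 100)
amps = np.tile([1.0, 0.5, 0.25], (100, 1))
audio = sinewave_synthesis(freqs, amps)
```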



**EXPECT MORE HERE SOON** Look out for a forthcoming paper which combines all three techniques mentioned above!

  • Li J.-J., McLoughlin I, Dai L.-R., Ling Z.-H., “Whisper-to-speech conversion using restricted Boltzmann machine arrays”, accepted for IET Electronics Letters, Oct. 2014

  • Li Jingjie, McLoughlin I, Song Y, “Reconstruction of pitch for whisper-to-speech conversion of Chinese”, ISCSLP2014, Singapore, Sept. 2014

  • McLoughlin I, Li Jingjie, Song Y, “Reconstruction of continuous voiced speech from whispers”, Interspeech 2013, Lyon, France

  • Sharifzadeh HR, McLoughlin I, Ahmadi F, “Reconstruction of Normal Sounding Speech for Laryngectomy Patients through a Modified CELP Codec”, IEEE Trans. Biomedical Engineering, issue 99, June 2010, DOI=10.1109/TBME.2010.2053369

  • Sharifzadeh HR, McLoughlin I, “Bionic voice for laryngectomees with an insight into whispered vowels”, Cutting Edge Laryngology 2012 Conference, Kuala Lumpur, Malaysia, 01 June 2012.

  • Sharifzadeh HR, McLoughlin I, Ahmadi F, “Regeneration of speech in voice-loss patients”, IFMBE Proceedings, Vol. 23, pp. 1065-1068, ISBN 978-3-540-92840-9

  • Sharifzadeh HR, McLoughlin I, Ahmadi F, “Speech Rehabilitation Methods for Laryngectomised Patients”, Electronic Engineering and Computing Technology, Springer, Netherlands, April 2010, pp. 597-607

  • Sharifzadeh HR, McLoughlin I, “Reconstruction of normal sounding speech for laryngectomy patients through a modified CELP codec”, EPS International Forum on Rehabilitation Medicine, Nanjing, China, July 2011.

  • Sharifzadeh HR, McLoughlin I, Ahmadi F, “Artificial Phonation for Patients Suffering Voice Box Lesions”, International Conference on Bioengineering, Singapore, Jan 2011.

  • Sharifzadeh HR, McLoughlin I, Ahmadi F, “Spectral Enhancement of Whispered Speech Based on Probability Mass Function”, accepted for The Sixth Advanced International Conference on Telecommunications, AICT2010, Barcelona, Spain, May 2010.

  • Sharifzadeh HR, F. Ahmadi, McLoughlin I, “Speech reconstruction in post-laryngectomised patients by formant manipulation and pitch profile generation”, World Congress on Engineering, London, UK, July 2009. Received best paper award

  • Sharifzadeh HR, McLoughlin I, Ahmadi F, “Regeneration of Speech in Voice-Loss Patients”, The 13th International Conference on Biomedical Engineering, Singapore, Dec. 2008.


  • Ahmadi F, McLoughlin I, Sharifzadeh HR, “Analysis-by-Synthesis Method for Whisper-Speech Reconstruction”, 2008 IEEE Asia Pacific Conference on Circuits and Systems, APCCAS 2008, Macau, Nov. 2008.

Mandarin Chinese speech coding, quality and intelligibility

I studied Mandarin Chinese quite intensively from around 1991 to 2003, and then gradually folded this into my ongoing speech research. There are three main focus areas to this work:

  • The intelligibility of Mandarin Chinese, including defining an effective subjective quality evaluation methodology (a toy scoring sketch follows this list)

  • The relationship between speech coders (plus other speech devices) and Chinese

  • Chinese tone – in all its aspects!
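
To illustrate how such forced-choice intelligibility tests are scored, here is a toy Python example of the standard chance-corrected scoring used by two-alternative rhyme tests. The response counts and feature names are hypothetical; the actual CDRT and CDRT-Tone materials are defined in the papers below.

```python
def rhyme_test_score(correct, wrong):
    """Chance-corrected intelligibility score for a two-alternative rhyme test.

    With two response choices per item, a guessing listener gets ~50% right,
    so the raw percentage is corrected as 100 * (R - W) / (R + W), giving
    0% for pure guessing and 100% for perfect identification.
    """
    total = correct + wrong
    if total == 0:
        raise ValueError("no responses")
    return 100.0 * (correct - wrong) / total

# hypothetical per-feature response counts for one coder under test
responses = {"aspiration": (92, 8), "nasality": (85, 15), "tone": (70, 30)}
for feature, (r, w) in responses.items():
    print(f"{feature:>10s}: {rhyme_test_score(r, w):5.1f}%")
```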

**EXPECT MORE HERE SOON**

  • McLoughlin I, Xu Y, Song Y, “Tone confusion in spoken and whispered Mandarin Chinese”, ISCSLP2014, Singapore, Sept. 2014

  • McLoughlin I, Ding ZQ, Tan EC, "Extension of proposal of standard for intelligibility tests of Chinese Speech - CDRT-Tone”, IEE Proceedings - Vision, Image & Signal Processing, Vol.150, Issue. 1, Feb. 2003.

  • McLoughlin I, Ding ZQ, Tan EC, "Intelligibility evaluation of GSM coder for Mandarin speech using CDRT", Speech Communication, Vol.38, issue 1-2, September 2002

  • Li Z, Tan EC, McLoughlin I, Teo T. T, "Proposal of standards for intelligibility tests of Chinese speech", IEE Proceedings - Vision, Image & Signal Processing, Vol. 147, No. 3, Jun 2000, pp. 254-260, UK.

  • McLoughlin I, Ding ZQ, "Evaluation of the GSM speech coder using the proposed Chinese Diagnostic Rhyme test speech intelligibility measure", 19th International Conference on Computer Processing of Oriental Languages, Seoul, Korea, Journal of Computer Processing of Oriental Languages, Vol. 1, pp. 421-424, May 2001.

  • McLoughlin I, Ding ZQ, "Mandarin Speech Coding using a Modified RPE_LTP Technique", 2000 IEEE Asia Pacific Conference on Circuits and Systems, 04 Dec 2000, Tianjin, China, IEEE, No. 245.

  • McLoughlin I, Ding ZQ, "Variable Rate Coding Techniques for Mandarin Speech over Packet Networks", 9th IEEE Digital Signal Processing Workshop, Texas USA, 05 Oct 2000, USA, IEEE, No. 074.

  • Chong FL, McLoughlin I, Pawlikowski K, “A methodology for improving PESQ accuracy for Chinese speech”, IEEE TENCON2005, Melbourne, Australia, Nov. 2005

  • Chong FL, McLoughlin I, Pawlikowski K, “Evaluation of ITU-T G.728 as a voice over IP codec for Chinese speech”, Australian Telecommunications and Networking Applications Conference, Melbourne, December 2003

  • Ding ZQ, Tan EC, McLoughlin I, Shi D, “Evaluation of Mandarin and English speech intelligibility using CDRT and DRT”, International Coordinating Committee on Speech Databases and Speech I/O System Assessment, Singapore, Nov. 2003.

Work on speech quality and intelligibility

This was a collaboration with a top NTU researcher, Prof. Lin Weisi. The original idea was to work on joint audio-video quality evaluation. This work, by him and his student, covers the audio part quite well: it uses support vector regression (SVR) to define a working single-ended (non-intrusive) objective quality measure which correlates well with subjective evaluations.

  • Narwaria M, Lin WS, McLoughlin I, Emmanuel S, Chia LT, “Nonintrusive Quality Assessment of Noise Suppressed Speech with Mel-Filtered Energies and Support Vector Regression”, IEEE Transactions on Audio, Speech and Language Processing (accepted Oct. 2011)
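
As a rough sketch of that approach (the feature set, regressor settings and training targets below are placeholders of my own; the paper gives the real details), a single-ended estimator can be built from mel-filterbank energy statistics fed to support vector regression:

```python
import numpy as np
import librosa                      # assumed available for mel analysis
from sklearn.svm import SVR

def mel_energy_features(wav, fs=8000, n_mels=24):
    """Summarise an utterance as statistics of its log mel-filterbank energies."""
    mel = librosa.feature.melspectrogram(y=wav, sr=fs, n_fft=512,
                                         hop_length=128, n_mels=n_mels)
    logmel = np.log(mel + 1e-10)            # shape (n_mels, n_frames)
    # per-band mean and standard deviation give a fixed-length descriptor
    return np.concatenate([logmel.mean(axis=1), logmel.std(axis=1)])

def train_quality_model(train_wavs, train_mos, fs=8000):
    """train_wavs: noise-suppressed utterances; train_mos: subjective scores."""
    X = np.vstack([mel_energy_features(w, fs) for w in train_wavs])
    model = SVR(kernel="rbf", C=10.0, epsilon=0.1)   # placeholder settings
    model.fit(X, train_mos)
    return model

def predict_quality(model, wav, fs=8000):
    """Single-ended prediction: no clean reference signal is needed."""
    return float(model.predict(mel_energy_features(wav, fs)[None, :])[0])
```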

Computer speech input-related research

Dialogue for human-computer interaction

This is smart-home computer research: basically, the idea is to have natural speech interaction with a computer, which requires much more than simply a working ASR system.

The lack of this kind of consideration is why Apple's Siri and Google Voice have not (yet) taken over the world!

  • McLoughlin I, Xie ZP, “Speech Playback Geometry for Smart Homes”, IEEE Int. Symp. on Consumer Electronics, Jeju, Korea, 21 June 2014

  • McLoughlin I, Sharifzadeh HR, “Speech Recognition for Smart Homes”, in Speech Recognition, ITech Book publishers, Vienna, Austria, ISBN 978-9537619299 (invited chapter)

  • McLoughlin I, Sharifzadeh HR, “Speech Recognition Engine Adaptions for Smart Home Dialogues”, 6th Int. Conf. on Information, Communications and Signal Processing, Singapore, Dec. 2007.

Lip/mouth state detection

This is an extremely promising new area arising out of the low-frequency ultrasonic research. It allows a voice activity detector (VAD) or speech activity detector (SAD) to be defined which is extremely robust to acoustic noise, including the presence of single and multi-speaker babble.
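
The published methods are in the papers below; purely as a sketch of the underlying idea, a toy detector can band-pass the received signal around an assumed ultrasonic probe tone and flag frames where the reflected-band energy deviates from its resting level. All frequencies, sample rates and thresholds here are my placeholder values, not the published algorithm.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def ultrasonic_activity(received, fs=96000, probe_hz=23000, band_hz=2000,
                        frame_len=1024, rel_threshold=0.2):
    """Toy mouth-activity detector from a reflected ultrasonic probe tone.

    Band-pass the microphone signal around the probe frequency, then flag
    frames whose reflected-band energy deviates strongly from the median
    (mouth movement modulates the reflection).  Returns one boolean per frame.
    """
    sos = butter(6, [probe_hz - band_hz, probe_hz + band_hz],
                 btype="bandpass", fs=fs, output="sos")
    narrow = sosfiltfilt(sos, received)

    n_frames = len(narrow) // frame_len
    frames = narrow[:n_frames * frame_len].reshape(n_frames, frame_len)
    energy = (frames ** 2).mean(axis=1)

    baseline = np.median(energy)            # assumed closed-mouth reflection level
    deviation = np.abs(energy - baseline) / (baseline + 1e-12)
    return deviation > rel_threshold
```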

  • McLoughlin I, “Super-audible Voice Activity Detection”, IEEE Trans. Audio, Speech and Language Processing, vol. 22, no. 8, Sept. 2014, pp. 1424-1433

  • McLoughlin I, Song Y, “Mouth State Detection From Low-Frequency Ultrasonic Reflection”, Journal of Circuits, Systems & Signal Processing, (final acceptance Sept. 2014)

  • Ahmadi F, McLoughlin I, “Human Mouth State Detection Using Low Frequency Ultrasound”, Interspeech 2013, Lyon, France

Speech analysis and related research

Whispers

This again stems from the Bionic Voice project (above). In this case, we are analysing whispered speech – something that was not covered well in the existing research literature – as a precursor to methods of converting whispers to speech. In effect, we need to know exactly what we are dealing with in whispers.

Recently, this research has merged with the Chinese speech work shown below:

  • Sharifzadeh HR, McLoughlin I, “Whisper Vowel Diagrams for Singapore English”, 9th International Conference on Communications, COMM 2012, Bucharest, Romania, 12 June 2012.

  • McLoughlin I, Sharifzadeh HR, “Toward a comprehensive vowel space for whispered speech”, The 7th International Symposium on Chinese Spoken Language Processing, Tainan, Taiwan, Dec. 2010.

Chinese speech

I love working on Chinese speech! See the sections above for my work on Chinese speech coding and tone; now I'm concentrating on whispers and Chinese.



**EXPECT MORE HERE SOON** Look out for a forthcoming paper on Chinese tone.

  • McLoughlin I, Xu Y, Song Y, “Tone confusion in spoken and whispered Mandarin Chinese”, ISCSLP2014, Singapore, Sept. 2014

  • McLoughlin I, “Vowel Intelligibility in Chinese”, IEEE Transactions on Audio, Speech and Language Processing, Vol. 18, No. 1, Jan 2010, pp.117-125,

  • McLoughlin I, “Subjective intelligibility testing of Chinese speech”, IEEE Transactions on Audio, Speech and Language Processing, Vol. 16, Issue 1, pp.23-33, 2008.

  • McLoughlin I, “Tone Discrimination in Mandarin Chinese”, 14th International Conference on Systems, Signals and Image Processing IWSSIP 2007 and 6th EURASIP Conference Focused on Speech and Image Processing, Multimedia Communications and Services EC-SIPMCS 2007, June 2007, Maribor, Slovenia

Low-frequency ultrasonic speech

This also stems from the Bionic Voice project. In this case it is mainly the work of Farzaneh Ahmadi (with help from Hamid Reza Sharifzadeh and others... including quite a few students I hired to assist her in the experiments!)

So far, this research has spawned some very robust VAD techniques (see “Lip/mouth state detection”), but more is coming soon!

  • McLoughlin I, “The Use of Low-Frequency Ultrasound for Voice Activity Detection”, Proc. Interspeech 2014, Singapore, Sept. 2014

  • Ahmadi F, McLoughlin I, Sharifzadeh HR, “Ultrasonic propagation through the human vocal tract”, Electronics Letters, Vol. 46, Issue 6, April 2010, pp. 387-388.

Line Spectral Pairs (aka line spectral frequencies)

This was my PhD topic, but it is no longer an active research area for me.
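
For anyone new to the topic, the essence is easy to sketch: the LPC polynomial A(z) is split into a symmetric and an antisymmetric polynomial whose roots interleave on the unit circle, and the root angles are the LSPs/LSFs. A minimal Python illustration of my own (not tied to any of the papers below):

```python
import numpy as np

def lpc_to_lsf(a):
    """Convert LPC coefficients a = [1, a1, ..., ap] to line spectral frequencies.

    The symmetric polynomial P(z) = A(z) + z^-(p+1) A(1/z) and antisymmetric
    Q(z) = A(z) - z^-(p+1) A(1/z) have interleaved roots on the unit circle
    (for a minimum-phase A); their angles in (0, pi) are the LSFs in radians.
    """
    a = np.asarray(a, dtype=float)
    a_ext = np.concatenate([a, [0.0]])            # degree p+1 container
    p_poly = a_ext + a_ext[::-1]                  # A(z) + z^-(p+1) A(1/z)
    q_poly = a_ext - a_ext[::-1]                  # A(z) - z^-(p+1) A(1/z)

    angles = []
    for poly in (p_poly, q_poly):
        ang = np.angle(np.roots(poly))
        # keep one of each conjugate pair, drop the trivial roots at z = +/-1
        ang = ang[(ang > 1e-6) & (ang < np.pi - 1e-6)]
        angles.extend(ang)
    return np.sort(np.array(angles))

# toy usage: build a stable 4th-order LPC polynomial from two resonances
z1 = 0.9 * np.exp(1j * 0.3 * np.pi)     # pole pair near 0.3*pi rad
z2 = 0.8 * np.exp(1j * 0.6 * np.pi)     # pole pair near 0.6*pi rad
a = np.real(np.poly([z1, np.conj(z1), z2, np.conj(z2)]))
print(lpc_to_lsf(a))                     # four interleaved LSFs (radians)
```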

  • McLoughlin I, “A Review: Line Spectral Pairs”, Signal Processing Journal, doi:10.1016/j.sigpro.2007.09.003

  • McLoughlin I, Chance R.J, "LSP analysis and processing for speech coders", IEE Electronics Letters, Vol. 33, No. 12, May 1997, pp. 743-744, IEE, UK.

  • McLoughlin I, Hui F, "Novel Dynamic Bit Allocation Method for LSP Quantization", 2000 TENCON, IEEE Region 10 Conference, 25 Sep 2000, Malaysia, IEEE, No. 079.

  • McLoughlin I, Fang Hui, "Adaptive bit allocation for LSP parameter quantization", 2000 IEEE Asia-Pacific Conference on Circuits and Systems, 04 Dec 2000, Tianjin, China, IEEE, No. 231.

  • McLoughlin I, "Switched-basis LSP quantization", Second International Conference on Information, Communications and Signal Processing, Dec 1999, Singapore.

  • McLoughlin I, "LSP parameter interpretation for speech coders", IEEE International Conference on Electronics, Circuits and Systems, Sep 1999, Pafos, Cyprus, IEEE.

  • McLoughlin I, Chance R.J., "LSP-based speech enhancement", 13th International Conference on DSP, Jul 1999, Santorini, Greece, IEE/IEEE.

Hearing-related research


Back in 1999 I read an IEEE magazine article about image and video steganography (data hiding inside an image). It made me wonder if it could be done in audio too... So I defined some projects to find out. I was funded for this by the Singapore government.

One project was to design and build embedded hardware able to demonstrate the technique. The hardware simply inserted the data into the least significant bits of the audio data stream – the number of bits was adjustable – and was implemented on an FPGA.

The second project, purely computational, did something similar, but used a psychoacoustic model of my own design to 'decide' which bits could be used for insertion. This worked extremely well: an expert listening panel, using various kinds of sound and music, judged the data hiding to be imperceptible in use.
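
A minimal software sketch of the basic fixed-depth LSB substitution follows. The FPGA implementation and the psychoacoustic bit-allocation model are not reproduced here; the function names and the round-trip example are my own illustration.

```python
import numpy as np

def lsb_embed(samples, payload_bits, n_bits=1):
    """Hide payload_bits in the n least significant bits of 16-bit PCM samples.

    This mirrors only the simplest part of the scheme: fixed-depth LSB
    substitution.  The published work additionally used a psychoacoustic
    model to choose how many bits each region of the signal could hide.
    """
    out = samples.astype(np.int16)
    mask = ~np.int16((1 << n_bits) - 1)          # clears the n LSBs
    n_slots = len(out)
    padded = np.zeros(n_slots * n_bits, dtype=np.uint8)   # pad payload with zeros
    padded[:len(payload_bits)] = payload_bits
    chunks = padded.reshape(n_slots, n_bits)              # n_bits per sample
    values = chunks.dot(1 << np.arange(n_bits - 1, -1, -1)).astype(np.int16)
    return (out & mask) | values

def lsb_extract(samples, n_payload_bits, n_bits=1):
    """Recover the first n_payload_bits hidden by lsb_embed."""
    vals = samples.astype(np.int16) & ((1 << n_bits) - 1)
    bits = ((vals[:, None] >> np.arange(n_bits - 1, -1, -1)) & 1).astype(np.uint8)
    return bits.reshape(-1)[:n_payload_bits]

# round-trip check on random audio-like data and a random bit stream
rng = np.random.default_rng(0)
audio = rng.integers(-2000, 2000, size=1000).astype(np.int16)
msg = rng.integers(0, 2, size=500).astype(np.uint8)
stego = lsb_embed(audio, msg, n_bits=2)
assert np.array_equal(lsb_extract(stego, len(msg), n_bits=2), msg)
```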

**EXPECT MORE HERE SOON** Now I'm working on bio-mimetic computer hearing systems; two papers have already been submitted on this.

  • Tio CMM, McLoughlin I, Adi RW, "Perceptual audio data concealment and watermarking scheme using direct frequency domain substitution", IEE Proceedings - Vision, Image & Signal Processing, Vol. 149, Issue 6, Dec. 2002.

  • McLoughlin I, Adi RW, Tio CMM, "Hardware architecture for LSB data concealment using subband filterbank coding", 2000 IEEE Asia-Pacific Conference on Circuits and Systems, 04 Dec 2000, Tianjin, China, IEEE, No. 249.

  • McLoughlin I, Tio CMM, Adi RW, "Data concealment in audio using a nonlinear frequency distribution of PRBS coded data and frequency-domain LSB insertion", 2000 TENCON IEEE Region 10 Conference, 27 Sep 2000, Malaysia, IEEE, No. 085.

  • McLoughlin I, Adi RW, Tio CMM, "Hardware architecture for data concealment using sub-band coding, LSB coding and pseudo-random bit stream generators", 2000 TENCON IEEE Region 10 Conference, 26 Sep 2000, Malaysia, IEEE, No. 084.

Other speech-related things that don't fit above


  • Jiang B, Song Y, Wei S., Liu J.-H., McLoughlin I, Dai L.-R., “Deep Bottleneck Features for Spoken Language Identification”, PLoS-One, 9(7), e100795, 1 July 2014

  • Jiang B, Song Y, Wei S., McLoughlin I and Dai L.-R., “Task-aware Deep Bottleneck Features for Spoken Language Identification”, Proc. Interspeech 2014, Singapore, Sept. 2014

  • Wang M.-G., Song Y, Jiang B, Dai L.-R., McLoughlin I, “Exemplar Based Language Recognition Method For Short-Duration Speech Segments”, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Vancouver, Canada

  • Ahmadi F, McLoughlin I, Chauhan S, Ter-Haar G, “Bio-effects and safety of low intensity, low frequency ultrasonic exposure”, Progress in Biophysics & Molecular Biology (accepted 23 Jan 2012).

  • Ahmadi F, McLoughlin I, “A new mechanical index for gauging the human bio-effects of low frequency ultrasound”, 35th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, Osaka, Japan, June 2013

  • Rajapakse J, Loh V K, McLoughlin I, Wei L, "Performance Comparison of ICA Neural Networks Separating Audio Signals", International Conference on Information, Communication, and Signal Processing, Dec 1999, Singapore.

  • McLoughlin I, Ding ZQ, "Joint time-Frequency Distribution Analysis of Pitch Pulses", 2001 Int. Conf. on Integrated Circuits and Systems, ICICS, Singapore, IEEE, Oct. 2001.

  • McLoughlin I, Ding ZQ, Tan EC, "How to track pitch pulses? - Joint time-frequency distribution approach", Pacific Rim Conference on Communications, Computers and Signal Processing (PACRIM'01), Victoria, B.C., Canada, Sept. 2001.

  • Zeng W, Huang X, Müller Arisona S, McLoughlin I, “Classifying watermelon ripeness by analysing acoustic signals using mobile devices”, accepted 08 Jul 2013 for Personal and Ubiquitous Computing



© 2013, 2014 by Professor Ian McLoughlin of NELSLIP and USTC.