Speech communications-related research

|
The Bionic Voice Project & Whisper-to-speech conversion
This project
aims to return the power of speech to laryngectomy patients who
have lost the ability to speak normally. Please click on the link
above to go to the separate project page for this work.
Whisper-to-speech conversion
continues the Bionic Voice work, because laryngectomy speech is a
class of whisper. We are working on a number of
techniques:
CELP-based methods (which we pioneered)
GMM-based systems (see Toda et al., and please also see our Oct. 2014 Electronics Letters paper for a refined method based on Toda's statistical voice conversion method)
New parametric methods, including an interesting Sinewave Speech-based system that is showing some promise at the moment: find out more about that, and even download the MATLAB code for it, from here: http://www.lintech.org/Reconstruction
**EXPECT
MORE HERE SOON** Look out
for another paper soon which combines all three techniques
mentioned above!
|
Li
J.-J., McLoughlin I, Dai L.-R., Ling Z.-H., “Whisper-to-speech
conversion using restricted Boltzmann machine arrays”, accepted
for IET Electronics Letters, Oct. 2014
Li
Jingjie, McLoughlin I, Song Y, “Reconstruction of pitch for
whisper-to-speech conversion of Chinese”, ISCSLP2014,
Singapore, Sept. 2014
McLoughlin
I, Li Jingjie, Song Yan, “Reconstruction of continuous voiced
speech from whispers”, Interspeech 2013, Lyon, France
Sharifzadeh
HR, McLoughlin I, Ahmadi F, “Reconstruction of Normal Sounding
Speech for Laryngectomy Patients through a Modified CELP Codec”,
IEEE Trans. Biomedical Engineering, issue 99, June 2010,
DOI=10.1109/TBME.2010.2053369
Sharifzadeh
HR, McLoughlin I, “Bionic voice for laryngectomees with an
insight into whispered vowels”, Cutting Edge Laryngology 2012
Conference, Kuala Lumpur, Malaysia, 01 June 2012.
H.R.Sharifzadeh,
McLoughlin I, F. Ahmadi, “Regeneration of speech in voice-loss
patients”, IFMBE Proceedings, Vol. 23, pp.1065-1068. ISBN
978-3-540-92840-9
H.R.Sharifzadeh,
McLoughlin I, F. Ahmadi, “Speech Rehabilitation Methods for
Laryngectomised Patients,” Electronic Engineering and Computing
Technology, Springer, Netherlands, April 2010, pp. 597-607
Sharifzadeh
HR, McLoughlin I, “Reconstruction of normal sounding speech for
laryngectomy patients through a modified CELP codec”, EPS
International Forum on Rehabilitation Medicine, Nanjing, China,
July 2011.
Sharifzadeh
HR, McLoughlin I, Ahmadi F, “Artificial Phonation for Patients
Suffering Voice Box Lesions”, International Conference on
Bioengineering, Singapore, Jan 2011.
Sharifzadeh
HR, McLoughlin I, Ahmadi F, “Spectral Enhancement of
Whispered Speech Based on Probability Mass Function”, accepted
for The Sixth Advanced International Conference on
Telecommunications, AICT2010, Barcelona, Spain, May 2010.
Sharifzadeh
HR, F. Ahmadi, McLoughlin I, “Speech reconstruction in
post-laryngectomised patients by formant manipulation and pitch
profile generation”, World Congress on Engineering, London, UK,
July 2009. Received best paper award
Sharifzadeh
HR, McLoughlin I, Ahmadi F, “Regeneration of Speech in Voice-Loss Patients”,
The 13th International Conference on Biomedical Engineering,
Singapore, Dec. 2008.
Ahmadi
F, McLoughlin I, Sharifzadeh HR, “Analysis-by-Synthesis Method for
Whisper-Speech Reconstruction”, 2008 IEEE Asia Pacific
Conference on Circuits and Systems, APCCAS 2008, Macau, Nov. 2008.
|
Mandarin Chinese speech
coding, quality and intelligibility
I
studied Mandarin Chinese quite intensively from around 1991 to
2003, and then gradually folded this into my ongoing speech
research. There are three main focus areas to this work:
The intelligibility of Mandarin Chinese (including defining an effective subjective quality evaluation methodology)
The relationship between speech coders (plus other speech devices) and Chinese
Chinese tone, in all its aspects!
**EXPECT
MORE HERE SOON**
|
McLoughlin
I, Xu Y, Song Y, “Tone confusion in spoken and whispered
Mandarin Chinese”, ISCSLP2014, Singapore, Sept. 2014
McLoughlin
I, Ding ZQ, Tan EC, “Extension of proposal of standard for
intelligibility tests of Chinese Speech - CDRT-Tone”, IEE
Proceedings - Vision, Image & Signal Processing, Vol. 150,
Issue 1, Feb. 2003.
McLoughlin
I, Ding ZQ, Tan EC, "Intelligibility evaluation of GSM coder
for Mandarin speech using CDRT", Speech Communication,
Vol.38, issue 1-2, September 2002
Li
Z, Tan EC, McLoughlin I, Teo T. T, "Proposal of standards
for intelligibility tests of Chinese speech", IEE
Proceedings - Vision, Image & Signal Processing, Vol. 147,
No. 3, Jun 2000, pp. 254-260, UK.
McLoughlin
I, Ding ZQ, "Evaluation of the GSM speech coder using the
proposed Chinese Diagnostic Rhyme test speech intelligibility
measure", 19th International Conference on Computer
Processing of Oriental Languages, Seoul, Korea, Journal of
Computer Processing of Oriental Languages, Vol. 1, pp. 421-424,
May 2001.
McLoughlin
I, Ding ZQ, "Mandarin Speech Coding using a Modified RPE-LTP
Technique", 2000 IEEE Asia Pacific Conference on Circuits
and Systems, 04 Dec 2000, Tianjin, China, IEEE, No. 245.
McLoughlin
I, Ding ZQ, "Variable Rate Coding Techniques for Mandarin
Speech over Packet Networks", 9th IEEE Digital Signal
Processing Workshop, Texas USA, 05 Oct 2000, USA, IEEE, No. 074.
Chong
FL, McLoughlin I, Pawlikowski K, “A
methodology for improving PESQ accuracy for Chinese speech”,
IEEE TENCON2005, Melbourne, Australia, Nov. 2005
Fong
Chong, McLoughlin I, Pawlikowski Krzysztof, “Evaluation of ITU-T
G.728 as a voice over IP codec for Chinese speech”, Australian
Telecomms. and Networking Apps. Conference, Melbourne, December
2003
Ding
ZQ, Tan EC, McLoughlin I and Shi D, “Evaluation of Mandarin and
English speech intelligibility using CDRT and DRT”,
International Coordinating Committee on Speech Databases and
Speech I/O System Assessment, Singapore, Nov. 2003.
|
Work on speech quality
and intelligibility
This
was a collaboration with a top NTU researcher, Prof. Lin Weisi.
The original idea was to work on joint audio-video quality
evaluation. This work, by him and his student, covers the audio
part quite well: it uses a support vector machine (SVM) to define a working single-ended
(non-intrusive) objective quality test which correlates well with subjective
evaluations.
|
Manish
N, Lin WS, McLoughlin I, Emmanuel S, Chia LT, “Nonintrusive
Quality Assessment of Noise Suppressed Speech with Mel-Filtered
Energies and Support Vector Regression”, IEEE Transactions on
Audio, Speech and Language Processing, (accepted Oct 2011)
|
Computer speech input-related research

|
Dialogue for
human-computer interaction
This
is smart-home computing research: the basic idea is to have
natural speech interaction with a computer, which requires much more than
simply a working ASR system.
Lack
of this kind of consideration is why Apple's Siri or Google Voice
have not (yet) taken over the world!
|
McLoughlin
I, Xie ZP, “Speech Playback Geometry for Smart Homes”, IEEE Int. Symp.
on Consumer Electronics, Jeju, Korea, 21 June 2014
McLoughlin
I, H. R. Sharifzadeh, “Speech Recognition for Smart Homes”,
in Speech Recognition, by ITech Book publishers, Vienna, Austria, ISBN 978-9537619299
(invited chapter)
McLoughlin
I, Sharifzadeh H. R., “Speech Recognition Engine Adaptions for
Smart Home Dialogues”, 6th Int. Conf. on Information, Communications and Signal Processing,
Singapore, Dec. 2007.
|
Lip/mouth state detection
This
is an extremely promising new area arising out of the
low-frequency ultrasonic research. It allows a voice activity
detector (VAD) or speech activity detector (SAD) to be defined
which is extremely robust to acoustic noise, including the
presence of single and multi-speaker babble.
|
McLoughlin
I, “Super-audible Voice Activity Detection”, IEEE Trans.
Audio, Speech and Language Processing, vol. 22, no. 8, Sept.
2014, pp. 1424-1433
McLoughlin
I, Song Y, “Mouth State Detection From Low-Frequency Ultrasonic
Reflection”, Journal of Circuits, Systems & Signal
Processing, (final acceptance Sept. 2014)
Ahmadi
F, McLoughlin I, “Human Mouth State Detection Using Low
Frequency Ultrasound”, Interspeech 2013, Lyon, France
|
Speech analysis and related research

|
Whispers
This
again stems from the Bionic Voice project (above). In this case,
we are analysing whispered speech, something that was not covered
well in the existing research literature, as a precursor to methods
of converting whisper to speech. In effect, we need to know
exactly what we are dealing with in whispers.
Recently,
this research has merged with the Chinese speech work shown below:
|
McLoughlin
I, Xu Y, Song Y, “Tone confusion in spoken and whispered
Mandarin Chinese”, ISCSLP2014, Singapore, Sept. 2014
Sharifzadeh
HR, McLoughlin I, Russell M, “A
comprehensive vowel space for whispered speech”, J. Voice,
2011, (accepted late 2011)
Sharifzadeh
HR, McLoughlin I, “Whisper Vowel Diagrams for Singapore
English”, 9th International Conference on Communications, COMM
2012, Bucharest, Romania, 12 June 2012.
McLoughlin
I, Sharifzadeh HR, “Toward a comprehensive vowel space for
whispered speech”, The 7th International Symposium on Chinese
Spoken Language Processing, Tainan, Taiwan, Dec. 2010.
|
Chinese speech
I love working on Chinese
speech!! Look a few boxes above for my work on Chinese speech
coding and tone! Now I'm concentrating on whispers and Chinese.
**EXPECT
MORE HERE SOON** Look out
for another paper soon on Chinese tone
|
McLoughlin
I, Xu Y, Song Y, “Tone confusion in spoken and whispered Mandarin
Chinese”, ISCSLP2014, Singapore, Sept. 2014
McLoughlin
I, “Vowel Intelligibility in Chinese”, IEEE Transactions on Audio,
Speech and Language Processing, Vol. 18, No. 1, Jan 2010,
pp. 117-125.
McLoughlin
I, “Subjective intelligibility testing of Chinese speech”, IEEE
Transactions on Audio, Speech and Language Processing, Vol. 16,
Issue 1, pp. 23-33, 2008.
McLoughlin
I, “Tone Discrimination in Mandarin Chinese”, 14th
International Conference on Systems, Signals and Image Processing
IWSSIP 2007 and 6th EURASIP Conference Focused on Speech and Image Processing,
Multimedia Communications and Services EC-SIPMCS 2007, June 2007,
Maribor, Slovenia
|
Low-frequency ultrasonic
speech
This
also stems from the Bionic Voice project. In this case it is
mainly the work of Farzaneh Ahmadi (with help from Hamid Reza
Sharifzadeh and others... including quite a few students I hired
to assist her in the experiments!)
So
far, this research has spawned some very robust VAD techniques
(see “Lip/mouth state detection”), but
more is coming soon!
|
McLoughlin
I, “The Use of Low-Frequency Ultrasound for Voice Activity
Detection”, Proc. Interspeech 2014, Singapore, Sept. 2014
Ahmadi
F, McLoughlin I, Sharifzadeh HR, “Ultrasonic propagation
through the human vocal tract”, Electronics Letters, Vol. 46,
Issue 6, April 2010, pp. 387-388.
Dyball
H. (Ed): Ahmadi F, McLoughlin I, Sharifzadeh HR, “Audible
ultrasonic speech”, a feature article for IET Electronics
Letters, 18 March 2010.
Ahmadi
F, McLoughlin I, “Measuring
resonances of the vocal tract using frequency sweeps at the
lips”, Int. Symp. Comms., Control and Signal Processing,
Rome, Italy, May 2012.
Ahmadi
F, McLoughlin I, Sharifzadeh HR, "Autoregressive
modelling for linear prediction of ultrasonic speech",
INTERSPEECH2010, Japan, Sept. 2010.
F.
Ahmadi, McLoughlin I, “Ultrasonic mapping of human vocal tract
for speech synthesis”, chapter in the book Recent Advances in Signal Processing,
ISBN 978-953-7619-41-1
|
Line Spectral Pairs (aka
line spectral frequencies)
My
PhD topic.
No
longer an active research area for me.
|
McLoughlin
I, “A Review: Line Spectral Pairs”, Signal Processing Journal,
doi:10.1016/j.sigpro.2007.09.003
McLoughlin
I, Chance R.J, "LSP analysis and processing for speech
coders", IEE Electronics Letters, Vol. 33, No. 12, May 1997,
pp. 743-744, IEE, UK.
McLoughlin
I, Hui F, "Novel Dynamic Bit Allocation Method for LSP
Quantization", 2000 TENCON, IEEE Region 10 Conference, 25
Sep 2000, Malaysia, IEEE, No. 079.
McLoughlin
I, Fang Hui, "Adaptive bit allocation for LSP parameter
quantization", 2000 IEEE Asia-Pacific Conference on Circuits
and Systems, 04 Dec 2000, Tianjin, China, IEEE, No. 231.
McLoughlin
I, "Switched-basis LSP quantization", Second
International Conference on Information, Communications and
Signal Processing, Dec 1999, Singapore.
McLoughlin
I, "LSP parameter interpretation for speech coders",
IEEE International Conference on Electronics, Circuits and
Systems, Sep 1999, Pafos, Cyprus, IEEE.
McLoughlin
I, Chance R.J., "LSP-based speech enhancement", 13th
International Conference on DSP, Jul 1999, Santorini, Greece,
IEE/IEEE.
|
Hearing-related research

|
Hearing-related research
Back in 1999 I read an IEEE
magazine article about image and video steganography (data hiding
inside an image). It made me wonder if it could be done in audio
too... So I defined some projects to find out. I was funded for
this by the Singapore government.
One project was to design and
build embedded hardware able to demonstrate the technique. The
hardware simply inserted the data into the least significant bits
of the audio data stream – the number of bits was adjustable.
This was done using an FPGA.
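The LSB insertion idea described above can be illustrated in software. The following is only a minimal sketch, not the original FPGA design: it assumes 16-bit signed integer audio samples, and the function names `embed_lsb` and `extract_lsb`, along with the zero-padding of the final chunk, are hypothetical choices made for illustration. The `n_bits` parameter plays the role of the adjustable number of bits.

```python
def embed_lsb(samples, payload_bits, n_bits=1):
    """Hide payload bits in the n_bits least significant bits of each sample.

    Assumes integer samples (e.g. 16-bit signed PCM); this is an illustrative
    sketch, not the hardware implementation described in the text.
    """
    if not 1 <= n_bits <= 15:
        raise ValueError("n_bits must be between 1 and 15")
    mask = (1 << n_bits) - 1
    needed = -(-len(payload_bits) // n_bits)  # ceiling division
    if needed > len(samples):
        raise ValueError("payload too long for this carrier")
    stego = list(samples)
    for idx in range(needed):
        chunk = list(payload_bits[idx * n_bits:(idx + 1) * n_bits])
        chunk += [0] * (n_bits - len(chunk))  # zero-pad the final chunk
        value = 0
        for bit in chunk:
            value = (value << 1) | (bit & 1)
        # Clear the low n_bits of the sample, then write the payload there.
        stego[idx] = (stego[idx] & ~mask) | value
    return stego


def extract_lsb(samples, n_payload_bits, n_bits=1):
    """Recover n_payload_bits from the low n_bits of successive samples."""
    mask = (1 << n_bits) - 1
    bits = []
    for sample in samples:
        value = sample & mask
        for shift in range(n_bits - 1, -1, -1):
            bits.append((value >> shift) & 1)
        if len(bits) >= n_payload_bits:
            break
    return bits[:n_payload_bits]


if __name__ == "__main__":
    carrier = [1000, -2000, 32767, -32768, 0, 5, -1, 123]
    payload = [1, 0, 1, 1, 0, 0, 1]
    stego = embed_lsb(carrier, payload, n_bits=2)
    print(extract_lsb(stego, len(payload), n_bits=2) == payload)
```

Raising `n_bits` increases the hidden data rate, but each modified sample can then differ from the original by up to 2^n_bits - 1, so the change becomes more audible.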
The
second project, purely computational, did something similar, but
used a psychoacoustic model of my own design to 'decide' which
bits could be used for insertion. This worked extremely well: an
expert listening panel, using various kinds of sound and music,
judged the inserted data to be imperceptible.
**EXPECT
MORE HERE SOON** Now I'm
working on bio-mimetic computer hearing systems. Two papers
already submitted on this.
|
Tio
C M M, McLoughlin I, Adi R W, "Perceptual audio data
concealment and watermarking scheme using direct frequency domain
substitution", IEE Proceedings - Vision, Image & Signal
Processing, Vol.149, Issue 6, Dec. 2002.
McLoughlin
I, Adi RW, Tio M.M. Cedric, "Hardware architecture for LSB
data concealment using subband filterbank coding", 2000 IEEE
Asia-Pacific Conference on Circuits and Systems, 04 Dec 2000,
Tianjin, China, IEEE, No. 249.
McLoughlin
I, Tio M M Cedric, Adi RW, "Data concealment in audio using
a nonlinear frequency distribution of PRBS coded data and
frequency-domain LSB insertion", 2000 TENCON IEEE Region 10
Conference, 27 Sep 2000, Malaysia, IEEE, No. 085.
McLoughlin
I, Adi RW, Cedric M M Tio, "Hardware architecture for data
concealment using sub-band coding, LSB coding and pseudo-random
bit stream generators", 2000 TENCON IEEE Region 10
Conference, 26 Sep 2000, Malaysia, IEEE, No. 084.
|
Other speech-related things that don't fit above
|
|
Jiang
B, Song Y, Wei S., Liu J.-H., McLoughlin I, Dai L.-R., “Deep
Bottleneck Features for Spoken Language Identification”,
PLoS-One, 9(7), e100795, 1 July 2014
Jiang
B, Song Y, Wei S., McLoughlin I and Dai L.-R., “Task-aware Deep
Bottleneck Features for Spoken Language Identification”, Proc.
Interspeech 2014, Singapore, Sept. 2014
Meng-Ge
Wang, Song Y, Jiang B, Li-Rong Dai, Ian McLoughlin, “Exemplar
Based Language Recognition Method For Short-Duration Speech
Segments”, 2013 International Conference on Acoustics, Speech
and Signal Processing, ICASSP, Vancouver, Canada
Ahmadi
F, McLoughlin I, Chauhan S, Ter-Haar G., “Bio-effects
and safety of low intensity, low frequency ultrasonic exposure”,
Progress in Biophysics & Molecular Biology, (accepted Jan 23
2012).
Ahmadi
F, McLoughlin I, “A new mechanical index for gauging the human
bio-effects of low frequency ultrasound”, The 35th annual
international conference of the IEEE Engineering in Medicine and
Biology Society, Osaka, Japan, June 2013
Rajapakse
J, Loh V K, McLoughlin I, Wei L, "Performance Comparison of
ICA Neural Networks Separating Audio Signals", International
Conference on Information, Communication, and Signal Processing,
Dec 1999, Singapore.
McLoughlin
I, Ding ZQ, "Joint time-Frequency Distribution Analysis of
Pitch Pulses", 2001 Int. Conf. on Integrated Circuits and
Systems, ICICS, Singapore, IEEE, Oct. 2001.
McLoughlin
I, Ding ZQ, Tan Eng Chong, "How to track pitch pulses? -
Joint time-frequency distribution approach", Pacific Rim
Conference on Communications, Computers and Signal Processing
(PACRIM'01), Victoria, B.C., Canada, Sept. 2001.
Wei
Zeng, Xianfeng Huang, Stefan Müller Arisona, McLoughlin I,
“Classifying watermelon ripeness by analysing acoustic signals
using mobile devices”, Accepted 08 Jul 2013 for Personal and
Ubiquitous Computing Journal
|