Parametric resynthesis of speech from whispers (associated with the wider Bionic Voice Project)
By Professor I. V. McLoughlin (with help from his USTC students, especially Mr Jingjie Li, thank you!)
These are the demo files referred to in our ISCSLP2014 paper:
Sample 1 - original speech recording (Chinese)
Sample 1 - reconstruction using sinewave-speech based system with 'plausible' pitch
Sample 2 - original speech recording (Chinese)
Sample 2 - reconstruction using sinewave-speech based system with 'plausible' pitch
Here is a simplified block diagram of the system (taken from the paper).
Clearly, this is an LPC-based structure similar to a CELP codec. The structure allows us to select the pitch source, which is how we can evaluate different methods of pitch regeneration against one another.
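To illustrate the general idea of an LPC-style structure with a selectable excitation source, here is a minimal Python sketch. It is not the system from the paper: the function names (`pulse_train`, `white_noise`, `resonator`, `lpc_synthesise`), the single 500 Hz resonance and the 100 Hz pitch are all assumptions chosen only for illustration. It shows one all-pole filter being driven by two different sources, so that the outputs can be compared.

```python
import math
import random


def pulse_train(n_samples, f0_hz, fs_hz):
    """Voiced-style excitation: one unit impulse per pitch period."""
    period = max(1, round(fs_hz / f0_hz))
    return [1.0 if n % period == 0 else 0.0 for n in range(n_samples)]


def white_noise(n_samples, seed=0):
    """Whisper-style excitation: Gaussian noise (seeded for repeatability)."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n_samples)]


def resonator(freq_hz, bandwidth_hz, fs_hz):
    """Coefficients [a1, a2] of a two-pole resonance (a single toy 'formant')."""
    r = math.exp(-math.pi * bandwidth_hz / fs_hz)
    theta = 2.0 * math.pi * freq_hz / fs_hz
    return [-2.0 * r * math.cos(theta), r * r]


def lpc_synthesise(excitation, a, gain=1.0):
    """All-pole synthesis filter 1/A(z): y[n] = gain*x[n] - sum_k a[k]*y[n-k]."""
    y = []
    for n, x in enumerate(excitation):
        acc = gain * x
        for k, ak in enumerate(a, start=1):
            if n - k >= 0:
                acc -= ak * y[n - k]
        y.append(acc)
    return y


FS = 8000                         # sample rate in Hz (assumed)
a = resonator(500.0, 100.0, FS)   # hypothetical formant at 500 Hz

# The same filter, driven by whichever excitation source is selected.
sources = {
    "noise": white_noise(800),
    "pulse_100Hz": pulse_train(800, 100.0, FS),
}
outputs = {name: lpc_synthesise(src, a) for name, src in sources.items()}
```

Because only the entry in `sources` changes between runs, any difference between the outputs comes from the excitation alone, which is what makes a like-for-like comparison of pitch sources possible.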
This is the latest release, of data only, as of 30th October 2013. The method details and MATLAB code will be released just as soon as the paper has been published! But as a teaser, listen to these samples:
Original speech recording (in Chinese)
Artificially whisperised version... this is the ONLY input to the reconstruction system
Just as in the original paper, the pitch is harmonically related to the formants, but you will have to wait a while longer to get the full details!
This relates to the following paper... please cite this paper if you use the samples or MATLAB code:
McLoughlin I, Li Jingjie, Song Yan, "Reconstruction of continuous voiced speech from whispers", Proc. Interspeech 2013, Lyon, France, August 2013.
Here are some samples for you to listen to. You can hear that the quality of the reconstruction is poor - although it is MUCH better than electrolarynx speech - but it has many of the vocal characteristics of 'real' speech, and that is what we are aiming for:
Artificially whisperised version... this is the ONLY input to the reconstruction system
The INTERSPEECH 2013 paper (click here for a PDF copy of the draft, not final, version of the paper) provided many of the details about how to perform such a reconstruction. However, we have decided to release the full MATLAB code for download to help other researchers get up to speed on parametric resynthesis. We need more people working in this area to make some big breakthroughs before this type of technology can be made ready for day-to-day use:
Download the MATLAB code from here
How to use this
Everything starts with:
reconstruct_speech.m
It reads an example TIMIT file using the readsph() function. If you don't have that function, or you don't have TIMIT, just comment out that line and instead use wavread() to load speech from a mono (single channel, 16 bit) .wav file in the current directory. (In recent MATLAB releases wavread() has been replaced by audioread().)
The code resamples whatever input file you give it to 8 kHz. Next, it artificially whisperises the speech using my function speech_to_whisper_shift() in speech_to_whisper_shift.m, which I have provided free online in the same archive. Again, if you find this function useful in your own work, please cite the above paper when you publish!
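For readers who want to prepare or inspect input outside MATLAB, the loading and resampling steps can be sketched in Python using only the standard library. This is a rough stand-in, not the code in the archive: `read_mono_wav` and `resample_linear` are hypothetical helper names, and the resampler uses plain linear interpolation with no anti-aliasing filter, so it will be cruder than a proper polyphase resampler.

```python
import array
import sys
import wave


def read_mono_wav(path):
    """Return (samples, fs_hz) from a mono 16-bit PCM .wav file.

    Samples are scaled to the range [-1, 1).
    """
    with wave.open(path, "rb") as wf:
        if wf.getnchannels() != 1 or wf.getsampwidth() != 2:
            raise ValueError("expected a mono, 16-bit PCM .wav file")
        fs_hz = wf.getframerate()
        pcm = array.array("h")
        pcm.frombytes(wf.readframes(wf.getnframes()))
    if sys.byteorder == "big":
        pcm.byteswap()  # WAV sample data is little-endian
    return [s / 32768.0 for s in pcm], fs_hz


def resample_linear(samples, fs_in_hz, fs_out_hz=8000):
    """Crude resampler: linear interpolation, no anti-aliasing filter."""
    if not samples or fs_in_hz == fs_out_hz:
        return list(samples)
    n_out = int(len(samples) * fs_out_hz / fs_in_hz)
    out = []
    for m in range(n_out):
        pos = m * fs_in_hz / fs_out_hz   # position in input samples
        i = int(pos)
        frac = pos - i
        nxt = samples[min(i + 1, len(samples) - 1)]
        out.append((1.0 - frac) * samples[i] + frac * nxt)
    return out


# Example use (the file name is a placeholder):
#   speech, fs = read_mono_wav("my_speech.wav")
#   speech_8k = resample_linear(speech, fs, 8000)
```

When downsampling real speech, low-pass filtering before this step would be needed to avoid aliasing; it is omitted here to keep the sketch short.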
rootformanttracker(), synthtrax() and some other functions have been derived, sometimes with slight modifications, sometimes used as-is, from code that other researchers have kindly made available online. These are not mine: the authors are acknowledged clearly in the files (and please respect the needs of those authors to have their work cited also).
You can see the spectrograms obtained from this system for the three example files (original, whisperised and reconstructed) in the following picture: