Parametric resynthesis of speech from whispers (associated with the wider Bionic Voice Project)
By Professor I. V. McLoughlin
(with help from his USTC students, especially Mr Jingjie Li, thank you!)


Release 3 - 2014.09.14 (ISCSLP paper)

These are the demo files referred to in our ISCSLP2014 paper:


Here is a simplified block diagram of the system (taken from the paper). It is an LPC-based structure similar to a CELP codec. The structure lets us select the pitch source, which is how we evaluate and compare different methods of pitch regeneration.
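To illustrate the idea of a selectable pitch source feeding an LPC-style filter, here is a minimal Python sketch. It is not the code from the paper or from the MATLAB archive: the function names (make_excitation, lpc_synthesise), the 120 Hz pitch and the single 500 Hz resonance are all assumptions chosen for illustration. It only shows that the same all-pole filter can be driven either by noise (whisper-like) or by a pulse train (voiced-like).

```python
import math
import random


def make_excitation(kind, n, fs=8000, f0=120.0, seed=0):
    """Return n excitation samples: 'noise' (whisper-like) or 'pulse' (voiced-like).

    kind, f0 and seed are illustrative parameters, not taken from the paper.
    """
    if kind == "noise":
        rng = random.Random(seed)
        return [rng.gauss(0.0, 1.0) for _ in range(n)]
    if kind == "pulse":
        period = int(round(fs / f0))  # pitch period in samples
        return [1.0 if i % period == 0 else 0.0 for i in range(n)]
    raise ValueError("kind must be 'noise' or 'pulse'")


def lpc_synthesise(excitation, a, gain=1.0):
    """All-pole synthesis: y[n] = gain*x[n] - sum_{k=1..p} a[k-1]*y[n-k]."""
    p = len(a)
    y = []
    for n, x in enumerate(excitation):
        acc = gain * x
        for k in range(1, p + 1):
            if n - k >= 0:
                acc -= a[k - 1] * y[n - k]
        y.append(acc)
    return y


# Assumed toy filter: one resonance near 500 Hz, pole radius 0.95 (stable).
fs = 8000
r, fc = 0.95, 500.0
a = [-2.0 * r * math.cos(2.0 * math.pi * fc / fs), r * r]

# Same filter, two different sources.
voiced = lpc_synthesise(make_excitation("pulse", 400, fs), a)
whisper = lpc_synthesise(make_excitation("noise", 400, fs), a)
```

In a real system the filter coefficients would be estimated frame by frame from the input speech, rather than fixed as they are here.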


Release 2 - 2013.10.30 (journal paper)

This is the latest release, of data only, as of 30th October 2013. The method details and MATLAB code will be released as soon as the paper has been published! But as a teaser, listen to these samples:

Just as in the original paper, the pitch is harmonically related to the formants, but you will have to wait a while longer to get the full details!


Release 1 - 2013.06.22

This relates to the following paper... please cite this paper if you use the samples or MATLAB code:

McLoughlin I, Li Jingjie, Song Yan, "Reconstruction of continuous voiced speech from whispers", Proc. Interspeech 2013, Lyon, France, August 2013.

Here are some samples for you to listen to. You can hear that the quality of the reconstruction is poor - although it is MUCH better than electrolarynx speech - but it has many of the vocal characteristics of 'real' speech, and that is what we are aiming for:

The INTERSPEECH 2013 paper (click here for a PDF copy of the draft, not final, version) provided many of the details about how to perform such a reconstruction. However, we have decided to release the full MATLAB code for download to help other researchers get up to speed on parametric resynthesis. We need more people working in this area to make some big breakthroughs before this type of technology can be made ready for day-to-day use:

How to use this

Everything starts with:

reconstruct_speech.m

It reads an example TIMIT file using the readsph() function. If you don't have that function, or you don't have TIMIT, just comment out that line and instead use wavread() (or audioread() in newer MATLAB releases, where wavread() has been removed) to load speech from a mono (single channel, 16 bit) .wav file in the current directory.
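For readers who want to check an input file outside MATLAB, here is a minimal Python sketch of loading a mono, 16-bit .wav file using only the standard library. It is not part of the released archive; the function name read_mono_wav and the scaling to the range -1 to 1 are assumptions made for illustration.

```python
import struct
import wave


def read_mono_wav(path):
    """Load a mono, 16-bit PCM .wav file.

    Returns (samples, fs), with samples scaled to floats in [-1, 1).
    Hypothetical helper for illustration only.
    """
    with wave.open(path, "rb") as w:
        if w.getnchannels() != 1 or w.getsampwidth() != 2:
            raise ValueError("expected a mono, 16-bit PCM .wav file")
        fs = w.getframerate()
        raw = w.readframes(w.getnframes())
    count = len(raw) // 2  # two bytes per 16-bit sample
    ints = struct.unpack("<%dh" % count, raw)  # little-endian signed 16-bit
    return [s / 32768.0 for s in ints], fs
```

The check on channel count and sample width rejects stereo or 8-bit files early, rather than returning misaligned samples.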

The code resamples whatever input file you give it to 8 kHz. Next, it artificially whisperises the speech using my function speech_to_whisper_shift() in speech_to_whisper_shift.m, which I have provided free online in the same archive. Again, if you find this function useful in your own work, please cite the above paper when you publish!
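As a rough illustration of the resampling step, here is a Python sketch that reduces the sample rate by an integer factor (for example 16 kHz to 8 kHz, factor 2) by low-pass filtering and then discarding samples. This is not the resampling used in reconstruct_speech.m: the function names, the 63-tap Hamming-windowed sinc filter and the cutoff at 90% of the new Nyquist frequency are all assumptions, and the sketch does not handle non-integer rate ratios.

```python
import math


def lowpass_fir(cutoff, num_taps=63):
    """Hamming-windowed sinc low-pass filter.

    cutoff is a fraction of the input sample rate (0 to 0.5).
    """
    mid = (num_taps - 1) / 2
    taps = []
    for i in range(num_taps):
        t = i - mid
        if t == 0:
            sinc = 2.0 * cutoff
        else:
            sinc = math.sin(2.0 * math.pi * cutoff * t) / (math.pi * t)
        window = 0.54 - 0.46 * math.cos(2.0 * math.pi * i / (num_taps - 1))
        taps.append(sinc * window)
    total = sum(taps)
    return [h / total for h in taps]  # normalise to unity gain at DC


def decimate(x, factor, num_taps=63):
    """Low-pass filter x, then keep every factor-th sample."""
    # Cutoff placed slightly below the new Nyquist frequency (assumed 90%).
    h = lowpass_fir(0.9 * 0.5 / factor, num_taps)
    delay = (num_taps - 1) // 2  # centre the filter so output is not delayed
    out = []
    for n in range(0, len(x), factor):
        acc = 0.0
        for k, hk in enumerate(h):
            j = n + delay - k
            if 0 <= j < len(x):
                acc += hk * x[j]
        out.append(acc)
    return out
```

Filtering before discarding samples matters: without it, content above the new Nyquist frequency (4 kHz for an 8 kHz output) would alias back into the speech band.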

rootformanttracker(), synthtrax() and some other functions have been derived, sometimes with slight modifications and sometimes used as-is, from code that other researchers have kindly made available online. These are not mine: the authors are clearly acknowledged in the files (and please respect the need of those authors to have their work cited as well).

You can see a copy of the spectrogram obtained from this system for the three example files (original, whisperised and reconstructed) in the following picture: