Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

It seems like it would be, right? There are 44 tract-diameters you can modify to shape the vocal tract, and these can be used to generated specific vowel formants. I can imagine you can build a system using deep learning that can find the best parameters to match a steady state periodic pitch. It's a bit how some speech codecs work, like LPC10.


Problem is, the output would sound pretty metallic, just like LPC10.


Maybe, maybe not. LPC10 is a 8kHz speech codec optimized for low-bandwith signals. The Kelley-Lochbaum is a full-blown physical model of the tract.

What you put into the filter is important. The LF glottal pulse model used here is a pretty good excitation signal... aspiration noise REALLY makes a difference. It would still sound artificial, but it definitely wouldn't sound metallic.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: