It seems like it would be, right? There are 44 tract diameters you can modify to shape the vocal tract, and these can be used to generate specific vowel formants. I can imagine you could build a system using deep learning that finds the best parameters to match a steady-state periodic pitch. It's a bit like how some speech codecs work, like LPC10.
Maybe, maybe not. LPC10 is an 8 kHz speech codec optimized for low-bandwidth signals. The Kelly-Lochbaum model is a full-blown physical model of the tract.
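For anyone wondering what "physical model" means in practice, here is a minimal sketch of the core idea, assuming the tract is treated as a chain of cylindrical tube sections. Each junction between neighbouring sections gets a reflection coefficient computed from the two cross-sectional areas. The function name is made up for illustration, the sign convention varies between implementations, and this is not taken from any particular codebase.

```python
import math

def reflection_coefficients(diameters):
    """Reflection coefficient at each junction of a chain of tube sections.

    Assumes every diameter is positive (a fully closed section would need
    special handling that this sketch leaves out).
    """
    # Cross-sectional area of each cylindrical section.
    areas = [math.pi * (d / 2.0) ** 2 for d in diameters]
    # One coefficient per junction between section i and section i + 1.
    return [(a0 - a1) / (a0 + a1) for a0, a1 in zip(areas, areas[1:])]

# A uniform tube has no area changes, so nothing is reflected inside it.
uniform = reflection_coefficients([1.5] * 44)
print(len(uniform), max(abs(k) for k in uniform))
```

With 44 diameters you get 43 junctions. Narrowing or widening one section changes the coefficients on both sides of it, which is how the tract shape ends up moving the formants around.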
What you put into the filter is important. The LF glottal pulse model used here is a pretty good excitation signal... aspiration noise REALLY makes a difference. It would still sound artificial, but it definitely wouldn't sound metallic.
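To make the excitation point concrete, here is a rough sketch of mixing aspiration noise into a glottal source. It does not implement the LF model: it swaps in a much simpler Rosenberg-style flow pulse as a stand-in (and it models the flow itself rather than the flow derivative that LF describes). The function names, the `breathiness` parameter, and all default values are assumptions chosen for illustration.

```python
import math
import random

def rosenberg_pulse(phase, open_frac=0.6, rise_frac=0.67):
    """Simple Rosenberg-style glottal flow for phase in [0, 1). Not the LF model."""
    t_open = open_frac              # fraction of the period the glottis is open
    t_peak = open_frac * rise_frac  # phase at which the flow peaks
    if phase < t_peak:
        # Rising half-cosine from 0 up to 1.
        return 0.5 * (1.0 - math.cos(math.pi * phase / t_peak))
    if phase < t_open:
        # Falling quarter-cosine from 1 down to 0.
        return math.cos(0.5 * math.pi * (phase - t_peak) / (t_open - t_peak))
    return 0.0  # closed phase

def excitation(n_samples, f0=110.0, sample_rate=44100, breathiness=0.05, seed=0):
    """Glottal pulse train plus aspiration noise (hypothetical mixing scheme)."""
    rng = random.Random(seed)
    out = []
    for n in range(n_samples):
        phase = (n * f0 / sample_rate) % 1.0
        flow = rosenberg_pulse(phase)
        # White noise, scaled up while the glottis is open.
        noise = rng.uniform(-1.0, 1.0) * breathiness * (0.2 + 0.8 * flow)
        out.append(flow + noise)
    return out

samples = excitation(44100)  # one second at the assumed defaults
```

Setting `breathiness=0.0` gives the bare pulse train, which makes it easy to compare the two sources through the same tract filter.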