11. English Phonetics
English is a complicated language, with dozens of native dialects and just way too many vowels, dude.
Since English spelling rarely reflects how words are pronounced, the only way to use English in UTAU is through a phonetic system. There are three phonetic systems currently in use, and they vary quite a bit from each other. It's impossible to fully describe pronunciation in text, but I've made a chart to hopefully help you figure things out.
It's also worth noting that CZ pronounced all of her phonemes in this video, and Yoichi also made a chart using these phonetics here. The best way to figure out which phoneme suits the situation is to say the word out loud and listen to yourself carefully. Hearing phonetics is something you have to train your ear for, but once you get used to it, it'll be much easier.
Diphthongs
The biggest difference between English and Japanese voicebanks is the vowels, namely the diphthongs. A diphthong is a vowel that's actually two vowel sounds stuck together end-to-end, like the "i" in "bite," which slides from "ah" to "ee." Because the vowel sound changes partway through, it's important to take proper precautions while otoing them to avoid weird stretching/looping glitches.
The otoing process varies significantly between systems, but the concept for diphthongs remains the same: for CV segments, make sure the right blank covers the second half of the vowel, and for VC segments of any kind, make sure the preutterance and overlap have large enough values to include both parts of the vowel.
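If it helps to see the diphthong rule as numbers, here's a rough Python sketch that reads an oto.ini line and checks an entry against it. It assumes the usual oto.ini field order (filename=alias,offset,consonant,cutoff,preutterance,overlap) with everything in milliseconds, where a negative cutoff is measured from the offset and a positive one is trimmed from the end of the file. The glide timestamp (where the second half of the diphthong starts) is something you'd read off the waveform yourself, and the helper names, file names, and the VC check are hypothetical illustrations, not anything built into UTAU. The VC check only looks at offset and preutterance; overlap is left to your judgment.

```python
from dataclasses import dataclass


@dataclass
class OtoEntry:
    filename: str
    alias: str
    offset: float        # left blank: ms from the start of the file
    consonant: float     # fixed (unstretched) region: ms after the offset
    cutoff: float        # right blank: negative = length measured from the offset,
                         # zero/positive = ms trimmed from the end of the file
    preutterance: float  # ms after the offset
    overlap: float       # ms after the offset


def parse_oto_line(line: str) -> OtoEntry:
    """Parse one 'filename=alias,offset,consonant,cutoff,preutterance,overlap' line."""
    filename, rest = line.strip().split("=", 1)
    alias, *numbers = rest.split(",")
    offset, consonant, cutoff, preutterance, overlap = (float(n) for n in numbers)
    return OtoEntry(filename, alias, offset, consonant, cutoff, preutterance, overlap)


def playable_end(entry: OtoEntry, file_length_ms: float) -> float:
    """Absolute time (ms from file start) where the right blank begins."""
    if entry.cutoff < 0:
        return entry.offset - entry.cutoff  # negative cutoff counts from the offset
    return file_length_ms - entry.cutoff    # positive cutoff counts back from the end


def cv_cuts_before_glide(entry: OtoEntry, glide_start_ms: float, file_length_ms: float) -> bool:
    """Hypothetical CV check: the right blank covers the second half of the diphthong."""
    return playable_end(entry, file_length_ms) <= glide_start_ms


def vc_spans_glide(entry: OtoEntry, glide_start_ms: float) -> bool:
    """Hypothetical VC check: the offset sits in the first half of the vowel and the
    preutterance point lands after the glide starts, so both halves get played."""
    return entry.offset < glide_start_ms < entry.offset + entry.preutterance


# Hypothetical entries; the glide times would come from looking at the waveform.
cv = parse_oto_line("kaI.wav=k aI,50,80,-300,60,20")
print(cv_cuts_before_glide(cv, glide_start_ms=400, file_length_ms=1000))

vc = parse_oto_line("aIn.wav=aI n,100,150,-400,250,80")
print(vc_spans_glide(vc, glide_start_ms=300))
```

In the CV example the sample stops at 350 ms, before the glide at 400 ms, so stretching only repeats the "ah" part. In the VC example the offset (100 ms) is before the glide and the preutterance point (350 ms) is after it, so both halves of the vowel are played before the consonant.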