Add an option to turn off autoplaying autogenerated audio for vocab

Hi!
I really appreciate the recent update that gave all vocab example sentences their audio. I’m sure many users will find it incredibly useful.

But I personally find this audio unnatural and thus annoying. The voice is monotonous (and even kinda pompous, like a sorceress casting a spell or a stiff 部長), and it lacks the logical pauses a real human would make in many places. It pauses only where a comma occurs (and that pause sometimes sounds unnecessarily long) or sometimes in completely random, illogical places. When there’s no comma, the voice just rattles through the phrase, where a real human would run out of breath somewhere in the middle of the sentence.
It totally lacks emotion; it reads imperative and declarative sentences the same way.
And there are a lot of pronunciation errors; even 食べる became “shokueru” many times. It pauses in the middle of some katakana loanwords. It often drops dakuten and handakuten, reading the kana as if they weren’t there.

So I’d like to have an option to opt out of autoplaying autogenerated audio when doing reviews, while still autoplaying the audio recorded by real people.
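
To make the request concrete, here is a minimal sketch of the behaviour being asked for. It is only an illustration: `AudioSource`, `AudioSettings`, `autoplay_generated`, and `should_autoplay` are hypothetical names, not Bunpro’s actual code or settings, and it assumes each clip is already tagged as human-recorded or generated.

```python
from dataclasses import dataclass
from enum import Enum


class AudioSource(Enum):
    HUMAN = "human"          # recorded by a real person
    GENERATED = "generated"  # produced by the TTS engine


@dataclass
class AudioSettings:
    autoplay: bool = True            # the existing master autoplay switch
    autoplay_generated: bool = True  # hypothetical new option requested here


def should_autoplay(source: AudioSource, settings: AudioSettings) -> bool:
    """Return True if a clip from this source should start playing on its own."""
    if not settings.autoplay:
        # Master switch off: nothing autoplays, whatever the source.
        return False
    if source is AudioSource.GENERATED:
        # Generated clips additionally need the new option enabled.
        return settings.autoplay_generated
    # Human recordings follow the master switch only.
    return True
```

With `autoplay_generated` set to `False`, human recordings would keep autoplaying during reviews while generated clips would play only on demand.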

7 Likes

Hi @HotAirGun, thanks for the feedback! We’ll definitely look into adding this kind of option in the future, and will continue to improve the system moving forward.

The missing dakuten or handakuten are not something I have experienced myself, so if you could point me to an example, I will add it to the list of fixes for the program’s dictionary that we are improving. I also cannot find any example of ‘shokueru’, but was able to find ‘taberu’, which is a bit strange unless one of the other staff has done a quick fix for that. Again, if you could find a link, we can add it to the program’s dictionary. @Chihiro has already made a sheet of some of the common words that were being mispronounced, and we’ll keep adding to it and fixing those straight away.

Totally understand the concern about the lack of emotion, but for the purpose of reviewing and learning how to recognize words in speech, we feel that this TTS is far better than nothing at all, and in many cases actually even easier to follow than a native speaker, as all syllables are pronounced correctly. My personal gripe with the system is that the actual voice is natural but, like you stated, there is no emotion behind it. I believe that this is a big contributor to the ‘not quite right’ feeling that some sentences can give. We can actually alter emotions ranging from happy to angry to sad using this software though, so we will eventually also go through and fix the sentences that sound way off.

As mentioned, this program itself has a built in dictionary that we can save any common mistakes to and then reparse every single sentence again in just a few minutes, so we will be doing this over and over until we have caught the vast majority of the errors.

Unrelated personal opinion - I have been listening to these for about an hour and a half a day, just on shuffle, while I go for a walk around the neighborhood. They are an excellent way to train your ears, regardless of whether or not it is a real person speaking. TTS will not stop anyone from mastering a language, and the pros far outweigh the cons. (Provided that the sentences are actually 100% correct, and the only thing missing is the human touch.)

Please do not take this as disagreeing with your opinion at all. What you have said is totally valid and I think a lot of people will agree. I just wanted to make its value clear from the perspective of a student who is more interested in blasting their ears with Japanese and polishing their listening than in who is actually speaking.

6 Likes

Hi, and thank you for the answer!
I’d like to provide some examples, but it’s futile to give any links to the mp3 files because the Amazon CDN you’re using disallows hotlinking. All mp3 files return Access Denied if accessed directly. Probably Bunpro’s tech staff can allow hotlinking in the Amazon control panel.

Here, the last two sentences. (Actually, those weren’t the sentences where I encountered “shokueru”; I don’t remember which vocab it was and can’t find the sentence now.)
Also, you can hear right there that this voice engine just loses the dakuten, pronouncing べ like the particle へ, resulting in “e”.
Also there:
ドキュメンタリー becomes “to pause mentari”, losing the dakuten and adding a weird pause
何も becomes “nanmo” (is this engine from Osaka?)

So, should I report errors in auto-generated audio to the feedback system? I’m afraid it will result in a very big pile :grin:

Could you guys please make the voice softer? Right now it sounds like it’s threatening me or whatever. Also, the lack of emotion gives me a strong uncanny valley vibe, which is why I’d prefer to opt out of autoplaying auto-generated audio. Actually, I already turned off autoplay because of the kind of anxiety I feel when listening to this voice engine (at least with the current sound settings).

3 Likes

That’s actually very strange. Considering the programming is the same across all sentences, I’m gonna go out on a limb and say that this may actually be human error. We’ll get to the bottom of it!

This problem is specifically with katakana words. We’re already going through trying to identify all the culprits.

Agree with you 100% here. I also get this vibe, but I just accept it, similar to a video game. No matter how good the graphics get, the human eye, just like the ear, picks up on the most minute of details.

Actually, if it’s okay with you, I might just share our internal staff Excel sheet for fixing this, so that you can see the ones we’ve already identified and are in the process of fixing. I know you’re always very helpful with pointing out this stuff to us, so this way will probably save us both time :blush:.

P.S. Have you tried both the male and female voices? I find the male one way easier to listen to. It also has much more natural pauses.

3 Likes

Yes, feel free to send the link to DM :slight_smile:

Oh, is there a male one? I tried to select “Male” once, but got an error saying the male voice was unavailable for that phrase. I believe that was a phrase with auto-generated audio, but I may be wrong.

2 Likes

I wanted to comment, because I was just about to create this very topic and request. Actually I’m kind of surprised this hasn’t gotten more likes and other users’ feedback.

Apart from all the mistakes (e.g. 人 is read as ひと not じん in all these example sentences), I just cannot with this voice. I don’t know - it’s so good it’s bad. Definitely uncanny valley, and for me currently a real hindrance to adding vocab to my review queue, because I love the grammar audio, and before answering I’ll always say the example sentence out loud to then compare with the audio. So I don’t really wanna turn off autoplay.

7 Likes