AI generated audio 👎

That website is actually really good, thanks for suggesting it! I feel like I saw it a few years ago but completely forgot about it. I definitely recommend anyone use this as well when learning new words… Actually I might look into making it a resource that we recommend in vocab decks as an external reference.

Hi @hoangr87,

Yes, you’re right - it’s an amazing program! One of the things that makes it so amazing is that the creators are very open to suggestions, as you can see from their responses to queries about AI-generated audio content.

Accent, and especially pitch accent, is very important when learning Japanese. For example, hashi 箸 (chopsticks) and hashi 橋 (bridge) have different pitch accents; mix them up and you would be eating your bento with a bridge and crossing the river on chopsticks!

Another amazing thing about Bunpro is that it’s a very supportive community. I hope you will enjoy that aspect of the platform on your journey to learning Japanese. 頑張って!

2 Likes

I agree, I avoid using the audio features in the vocab section because they don’t help me. It feels revolting to me to be honest; I guess such is my aversion to AI voice, uncanny valley style. I don’t know how many people have this problem, but it is a problem for me (so I avoid using them, and of course that takes away quite some value for me). I guess if I am in the minority, who cares.

As an example of a sentence that makes me feel that way, I would give probably all the example sentences for vocab (at least N5 vocab).

Example sentences in the grammar section are great though (again, at least the ones on N5/N4 levels).

Uh, no. It’s a completely legitimate concern when this is an educational tool that we pay money to use. I would rather have no audio than AI audio.

5 Likes

I really like the look of most parts of this tool as a new user, but also as a new user who has no involvement in this community, I have no fear of saying that the audio is definitely a little weird and slightly turned me off, since before seeing this thread I thought it was a non-native speaker with a low-quality microphone. To be clear, I believe this while having no qualms about AI in general or the ‘ethical’ considerations. Rather, I feel there are generally two different reasons to listen to audio:

  1. To work on pronunciation and shadowing, and learn the natural tone. No-one except AI would stress the second syllable in the word “syllable” for example. This is more often tested when talking to a native speaker.
  2. To work on recognizing words and vocabulary in speech. This purpose you seem quite familiar with, but from what I understand it is important for both JLPT and holding a conversation.

For the second purpose, this is definitely a step up, though for the first I find it mildly problematic. I also worry there might be native audio mixed in with the AI-generated stuff, so it would be nice at least to have an option to turn it off for the people who find AI-generated stuff annoying or inappropriate for this task.

I would also like to ask what direction this platform plans to go in: Is it supposed to be a “jack-of-all-trades” for learning Japanese, including vocabulary, grammar, immersion via stories, and listening? If so, wouldn’t it be worthwhile to consider pronunciation as well? If not, then what is to say listening should not be left to other platforms more equipped to handle it?

2 Likes

Hey, is there anything specific backing your suggestion not to use the AI audio for language shadowing? I use it and do find that integrating speaking into my typical flashcard use has been good for helping me read and speak faster, as keeping up with the audio track really pushes me to be better. I suspect that physically voicing out the words would help to build stronger memories, too. I recommend Bunpro to people all the time specifically for the language shadowing opportunity, particularly to new people who show up to my conversation group with very little speaking experience.

How inaccurate would you guess that the audio is? Situations where the voice reads the incorrect reading of the kanji I can easily ignore. A lot of the criticism from other people seems to be centered on the melodic flow of the sentence overall. Do you think that the pitch accent of individual words is largely correct?

Personally I’d rather have 10x more spoken content for shadowing than have it be completely perfect, so I’m on board with the AI if voice actors are cost prohibitive. And the technology is getting better every year!

1 Like

Just to be clear, are you talking about the grammar or vocab audio? All grammar audio is recorded by native speakers.

It is and it is not problematic. I will answer this question more fully a bit lower with @Eroliene’s question at the same time.

There is no AI audio mixed with regular audio on the site. The only time this would happen is if you have both your vocab and grammar reviews mixed together.

Rather than being a jack of all trades, we’re a grammar-focused tool that uses grammar as the fundamental base to tie everything else together. Hence why all of the vocab sentences actively use as much level-appropriate grammar as possible. We may consider pronunciation as well in the future, but it is not something we are actively focusing on at the moment. We do include all pitch accent information for vocab though.

@DeclanF this will probably answer your above question as well about whether shadowing is problematic.

To be honest, I think the biggest difference between simply listening and shadowing is that one requires concentration (listening), while the other requires more active memory use (shadowing). That is to say, shadowing is a lot harder than simply listening, as you need to think about many more things at the same time. For people that are quite high level, shadowing using AI should not really be a problem, as you’d already know what sounds right and wrong in most cases. However, if someone is a lot lower level, they may have no idea what natural Japanese sounds like, so they can’t even really tell what parts sound off compared to native audio.

Basically I think listening to AI audio can and will benefit anyone, but shadowing from AI audio will really only benefit people that have enough experience to make active judgements on what they’re listening to. It’s a really hard question because I have heard people speak Japanese that have been learning for over 10 years and I could barely understand a word that they said, and I have heard people that started studying a month ago and they have fantastic pronunciation. Speaking is a very very individual skill.

I tend to agree. For me really the only super unnatural stuff is the pacing. This is usually when the sentence omits a particle, so the AI doesn’t put in a natural pause where the particle would usually have been. Sentences that have all of their particles usually have great pacing. As for the pitch accent, for the most part it is not bad. In testing, I can’t remember any sentences that I couldn’t understand specifically because of pitch.

Summary - To be honest shadowing from really anything will be better than not speaking at all (personal opinion). It is a lot harder to understand someone that only speaks in stilted sentences because they’re not used to actually using their mouth to produce Japanese, than it is to understand poor pitch. Teach your mouth how to move, and worry about the finer points later.

3 Likes

Thank you for the response! It alleviates some of my concerns. I was talking about the vocabulary audio, though I haven’t listened to the male voice which one of the other responses says is better, since the female one is default. Native audio being mixed with AI-generated stuff is the thing I worry about the most, since that would be sacrificing quality for quantity. I also was not aware there were ways to view pitch accent information; I’ll have to look into that. I still feel like maybe a toggle would be nice to remove the audio from the UI, since I know AI agitates a lot of people, but at the same time a simple user script could likely do something of that nature. I think it might also be worthwhile to add a note somewhere saying that the audio is AI generated, since that would help ensure that people use it primarily for listening, and make sure people like me don’t get concerned when it is the first audio they hear.

3 Likes

Got some unpopular opinion among commenters here:

Again I stan the male voice. Dude got some handsome voice, maybe he needs a synthetic face 2 :clown_face: Also I prefer the male AI voice from the vocabulary decks to the male RL voice on the lower grammar levels (N4). Actually that voice sounds off to me. At least the AI voice has a strong diaphragm. :wink:

But seriously: I didn’t use Bunpro for 1 1/2 - 2 years and I’m really happy about the AI voices. It is such a great benefit to have thousands of vocabulary words in the lower levels voiced. I really dig it, makes it so much easier to remember.

I think the rhythm of the sentences is pretty impressive. And the pronunciation is also quite good. Humbly I say, mine is of course far worse. Again: these are several thousand sentences voiced. Spoken by a virtual Japanese lad from the matrix. I take them, thank you very much.

Yes, Japanese Person X already corrected my pitch for a word which I was probably too stupid for OR the AI mispronounced, but I’d rather learn a word, misapply it and have an IRL situation which burns it into my brain than not have a virtual voice helping me get to that situation. The context was funny; if I mispronounce again, gonna be funny again. It’s not that we all don’t say funny things voluntarily or involuntarily in our own native languages (or English). IME pitch mispronunciation can actually be an S-level bonding opportunity with Japanese people.

6 Likes

There’s a male AI vocab voice?

3 Likes

Thank you so much for the very detailed reply! I think I agree with your impression that the pitch is acceptable, and your instinct that getting in regular speaking practice at speed is much more important than it being 100% correct 100% of the time.

Sometimes literally I come across a new word that like, I’ve never spoken that combination of syllables before or something? I’ll trip on it again and again and again as my mouth tries to snap back to some more common syllabic pattern. But if I keep drilling and don’t give up, eventually I start to get it with more and more consistency. It feels like I had to stretch my brain to accommodate the existence of this heretofore alien sequence, and I walk away from those feeling like I really learned something when it finally comes out clean.

3 Likes

Can you at least make it an option so that we can avoid embarrassing ourselves by training against unnatural-sounding voices? To be honest, I would not have signed up for Bunpro if I had been aware that you were using AI-generated audio.

3 Likes

Here I see everyone discussing the quality of the vocabulary audio and I’m so confused, as I don’t seem to have any audio option at all for any vocabulary (and I’d love to! don’t mind at all if it’s AI generated).

I use Bunpro through the Android app (version 0.4.1, which seems to be the latest one), and I’m only doing reviews of N5 level vocabulary. Is this feature only available for higher levels, or do I need to join the beta for the app?

1 Like

Are you studying using the fill-in-the-blank style questions?
Currently there is only audio available for those.

1 Like

Oh I see… that’s it then, I do the vocab reviews with the manual translate mode.

Are there plans to extend audio to that mode? I guess AI could be less reliable for accurate generation of single words compared to full sentences, but even so I think it would be great to have the option.

I totally get why there is AI audio for vocabulary, and while I don’t think that is a problem, it is quite annoying when there are mispronunciations or when it just uses the incorrect reading for the context of the sentence.

It would be appreciated if each audio clip would at least be verified by a human. If that is too much work to put on the Bunpro team, maybe adding a way for users to easily flag incorrect audio could be a step to making it a little better.

The problem is that a user can identify bad audio only when their level is higher than the mistake…

4 Likes