AI generated audio 👎

Hi there, is it just me, or is the intonation and musicality of the new AI-generated audio not that great?
I used to practice shadowing on the old audio clips but now it sounds so off I don’t even bother playing it.
Bit of a downer if I may say, what do you guys think?

16 Likes

Indeed, I would rather have native audio!

13 Likes

Is there new audio out? I haven’t done any reviews these last couple days so I can’t say.

Just adding that the grammar points are all native audio. There’s like a bajillion vocab sentences; having those all be native audio would take a lot of time, effort, and resources that are probably better spent elsewhere. The last AI-generated vocab audio I heard was completely fine for vocab review, but I know they were looking into a different service.

5 Likes

True, I hadn’t thought about the sheer number of example sentences. The AI audio does the job, but sadly the overall musicality of the sentence is often wrong.

Maybe I’ve only heard real audio so far on the site. But so long as it’s correct and not too robotic sounding, then all good, whatever.

3 Likes

Can you point out a specific vocab word that has sentences you feel sound off? We can definitely take a look :+1:

1 Like

I can point you to some examples but generally, the longer sentences are the ones that suffer the most.
Maybe it’s just me :blush: or maybe it’s the slightly robotic way of delivering the lines that puts me off.
I’ll go back in my cave now but thanks for looking into it.


1 Like

Btw I’m using the female voice. Thanks

2 Likes

Native audio would make a world of difference indeed!

I feel every single sentence I have come across sounds very good but not quite there, and that constant small off-sounding bit has pushed me, personally, not to use the audio on Bunpro at all.

8 Likes

I would highly advise anyone to listen to the audio, so long as there are no mistakes in it. Listening is a skill that you build basically for free, and not listening to something just because it is not perfect is like refusing to eat cake because there’s no cream on the side.

In saying that, we will update the audio again in the future when an even better AI model comes out. As of right now, we are using the best/one of the best ones. Unfortunately, with the sheer amount of example sentences, human actors are not a possibility (except for grammar, which we aim to have all human audio for).

19 Likes

Asher, I totally see your point and agree that a large volume of audio exposure is crucial.

I disagree, however, on tolerating the imperfections, as they are not subtle enough to be overlooked.
I am currently focusing on shadowing on the side, and the AI audio is really problematic: if it is even slightly off, it is counterproductive to the effort I put in outside of Bunpro in that regard.
It isn’t a matter of someone having an accent or speaking in an unusual manner, which you could adapt to and even learn from, but literally issues that wouldn’t come from a human speaker. I can’t build any trust on that basis, as I value pronunciation so much.

I was initially very impressed by the AI output on Bunpro and indeed thought the imperfections were fine, but I progressively realised these “details” change everything in comparison to my other audio resources.

2 Likes

I 100% agree with what you are saying for the purpose of shadowing. I do not recommend using AI audio for shadowing at all. It is probably still a few years away from AI audio being at that level.

My statement is purely in regard to training your ears to listen, as it’s arguably one of the hardest steps to achieve, as you don’t have the benefit of seeing kanji/katakana/hiragana in front of your eyes when listening to things. What that does is allow you a chance to develop the ability to cycle through options of words in your head in real time and get faster and faster at ‘following speech’. Even if there are faults in the pronunciation, I think the benefit far outweighs the negatives.

The best example I can give is that it is like running on a treadmill. There are not the small bumps all over the street like a real road would have, and you are not able to adjust your pace with every step, but that doesn’t mean that it would be better to sit on the couch if you can’t get outside.

I will say though that this is 100% a personal opinion, and people absolutely should do whatever they want. They aren’t, however, going to injure their ability to listen with AI (just look at how many native YouTube channels use absolutely the worst vocaloid robot type voices).

5 Likes

This website allows you to type jp words and have youtube videos with that word come up.

13 Likes

If it’s possible to do for grammar decks, then how is it not for vocab decks? It doesn’t have to be done at once or fast; you can start with the words with the highest frequencies.

1 Like

I am aware you have a much more advanced command of Japanese than I have so part of me is just like “now just take the advice and do what he says!” :slight_smile:
I have no doubt your point is very much valid, but at the same time I don’t lack audio resources to fill my exposure time, so I’ll just keep AI-free as much as I can. It is a personal feeling, as you say as well.

I understand hiring actors is a whole different matter given the massive amount of content on Bunpro and how it must be constantly evolving, so I get the obvious benefit on your side as well.
Now, I have been using NativShark on the side, which I am not completely convinced by for various reasons, but their audio content is fantastic and brings up the value immensely IMO. It isn’t much though, maybe 8 sentences per grammar point, also with a choice of male/female actor. They have natural speech and very different speech styles.
I am bringing up the comparison because they are also a small business, so it seems doable, but the content is probably not as vast as Bunpro’s.

1 Like

All valid points. I also would not recommend anyone just blindly listen to anyone’s advice :bowing_man:.

We’ll always do real audio for our grammar stuff, of which there are over 1000 points alone now.

Time. It has taken us several years to do audio for the 1000 or so grammar points we have on site. But there are over 1000 vocab items in N5 alone, and 10,000 by the end of N1. We would probably have to hire full-time voice actors for it to be possible, something we do not currently have the means to do.

12 Likes

It’s actually so ridiculous to see people complain with such selfishness. It takes a lack of critical thinking to even have that tone when asking why you guys can’t have genuine actors for every single line.

Thank you for creating such an amazing program.

11 Likes

What a great way to start off your first message :joy:

4 Likes

Try using the male voice, it’s much better.

2 Likes

I’ll give it a go! Thanks

2 Likes