What Happened to Audio?

It seems like the auto-generated audio quality may have degraded recently. Previously, as I recall, only the key word would sometimes have an incorrect kanji reading, but now the intonation/emphasis on many words sounds wrong. Is anyone else noticing this?

Just wondering if there’s been some change to the way Bunpro handles auto-generated audio recordings. Still better to have them than not, though, imo.

Do you have examples?

One that I’ve just encountered would be the following sentence. The vocab word is 下宿:

“下宿で働く。
I will work at a lodging house.”

The emphasis is pretty clearly off. Until about a month ago I never encountered this, but now it seems very common.

More than the emphasis, it sounds almost like げしゆく.

Yeah, that one is just plain wrong. It seems like the TTS thinks it is げしゆく :man_facepalming:

The sentences 父と母が仕事の関係で引越しをすることになったので、私は今親戚の家で下宿させてもらっている。 and 下宿する場所を探す際には、必ずネットで調べてあらゆるサイトのレビューを比較するようにしている。 also seem to get it wrong when deciding whether to pronounce は as わ.

Yes, there are a number of sentences where the TTS says しゆ instead of しゅ (and similar cases). Just now I stumbled upon 宿題 (しゅくだい) をやる, and the TTS reads it as しゆくだい. There are also a good few cases of 母は being pronounced "hahaha" :laughing:

I just thought it was odd that before, the TTS would sometimes use the wrong kanji reading, but that was about it. Now these other pronunciation issues suddenly seem very common. I use the audio a lot for listening practice, so it seemed like something changed rather suddenly. Maybe a new batch of audio files was generated; I don’t know. But it definitely wasn’t common before.

@Curnan2 @EbonyMidget @Flandre5carlet
Looking into those!

There are quite a lot of problems with the “new update”.

Misreading - は
君はちゃんと履歴書を持ってきたの? - Kimi HA
私の夢は警官です。 - yume HA

Reading out punctuation.
ねぇ! 先生! 子供達を放っておいてよ!
The audio reads it as: nee MIZO sensei MIZO

I recently started using Bunpro to study vocab, which is all auto-generated, and quickly noticed all the pronunciation mistakes. Now I only study vocab with my volume off. Glad it wasn’t just me who thought this!

Hi, thanks for expressing your concern about this! :bowing_man:

Unfortunately, you are correct about the degradation in quality. The software we use (Voicepeak) has a technical limitation: it cannot always pick the correct reading for kanji. Because of this, the original batch of audio had a lot of odd mistakes, such as 人 (ひと) being read as じん or にん, along with many similar cases.

To get around this, we tried programmatically removing the kanji from everything so that the correct kana would be read 100% of the time. However, this produced what we have now: the audio can’t tell where one word ends and the next begins, which creates very strange pronunciations. Our small-scale test didn’t show this issue, but with the full bulk input there turned out to be many cases that confused the software. In retrospect, we should have held off on rolling this out until we had verified that everything was correct, and we apologize for the inconvenience!
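
To illustrate why stripping the kanji can confuse a TTS engine, here is a minimal Python sketch. The tokenizer output below (readings and part-of-speech tags) is hypothetical and is not Bunpro's or Voicepeak's actual pipeline. It shows that once 母は学生です is reduced to kana, the particle は becomes one of three identical は characters in a row, and the word boundary that the kanji made visible is gone.

```python
# Hypothetical tokenizer output for 母は学生です: (surface, reading, part of speech).
# These tuples are illustrative only; they are not Bunpro's or Voicepeak's data.
TOKENS = [
    ("母", "はは", "noun"),
    ("は", "は", "particle"),
    ("学生", "がくせい", "noun"),
    ("です", "です", "copula"),
]


def kanji_text(tokens):
    """Rebuild the original mixed kanji/kana sentence."""
    return "".join(surface for surface, _, _ in tokens)


def kana_only(tokens):
    """Replace every token with its kana reading, dropping the kanji."""
    return "".join(reading for _, reading, _ in tokens)


def particle_offsets(tokens):
    """Character offsets, in the kana-only string, where a particle starts."""
    offsets, pos = [], 0
    for _, reading, tag in tokens:
        if tag == "particle":
            offsets.append(pos)
        pos += len(reading)
    return offsets


kana = kana_only(TOKENS)
# Every は in the kana string is a candidate; only the particle should be read "wa".
candidates = [i for i, ch in enumerate(kana) if ch == "は"]

print(kanji_text(TOKENS))                    # 母は学生です
print(kana)                                  # はははがくせいです
print(candidates, particle_offsets(TOKENS))  # [0, 1, 2] [2]
```

The token list still knows the particle sits at offset 2, but the flat kana string handed to the engine does not, so the engine has to guess.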

For now, we are looking at rolling back to the old audio files, as they had far fewer mistakes. We have also started looking into alternatives such as 音読さん and Microsoft Azure for generating sentence audio, as both are far more reliable at getting kanji readings right and are currently the top picks for lifelike TTS.

Thanks a lot for this information, Asher. I have experience using Azure speech-to-text (the inverse of what you’re doing, but the same underlying technology) in my job, with both English and Japanese, and it is indeed a solid choice in terms of quality and speed. I suspect the tool you’re currently using simply won’t scale to the literal tens of thousands of sentences you probably need to run through.

Misreading は is definitely what I find most annoying atm (more than kanji misreadings). Some sentences become so unnatural that they’re unusable, in my opinion. I know the team is working on fixes; looking forward to that! :slight_smile:

Not really a fan of the muffled lady voice either. I find it hard to follow.

You mean the new voice that seems to cover the N5 vocab sentences? I assume that’s the new Azure AI. I agree it sounds more “muffled” (or tbh I kinda thought it just sounded like an older woman), but if it doesn’t have the issues we were having before, it’s a welcome change from my perspective.

Yes. It sounds really muffled.

@Curnan2 @FlippFuzz @kizzlesully

Any examples of particularly muffled sentences? We chose the voices that the native speakers on our team judged to be the most natural. Among the male voices, the current one is definitely the best imo, but a few of the female ones were close.

We’re potentially open to changing it as it isn’t a huge hassle for us, but would need some examples first. I can provide a clip of the other available voices to compare if needed.

Edit - The only ones I can find that seem a little muffled are the ones that begin with 「. We fixed this by adding a full stop between topics and quotations, but the corrected set may not have been uploaded. I will double-check that the correct files were uploaded, as that fixes the pacing issues. One other thing I noticed is that playback tends to start very quickly, sometimes cutting off the first fraction of the sentence. Replaying the sentence fixes this, so maybe try replaying it to see if that is the issue?

2nd Edit - I have just compared the audio on the site with the same audio on Azure. It appears that heavy compression has been applied somewhere, and the audio is definitely lower quality than it should be. I will get this and the spacing problem fixed asap!

"Compression" is probably a better word for it than "muffled". It is likely the lack of dynamics that makes it difficult to focus while listening and to distinguish what is being said.