What Happened to Audio?

Asher · May 14, 2024, 3:18am

Hi, thanks for expressing your concern about this!

Unfortunately, you are correct about the degradation in quality. The software we use (Voicepeak) has a technical limitation: it can’t pronounce kanji with the correct reading 100% of the time. Because of this, we had a lot of weird mistakes in the original batch of audio, where something like ひと would be read as じん or にん, and a lot of similar things.

To get around this, we tried programmatically removing the kanji from everything so that the correct kana would be read 100% of the time. That resulted in what we have now, where the audio can’t seem to tell where one word ends and another begins, creating very strange pronunciations. In our small-scale test we didn’t have this issue, but with the massive bulk input there turned out to be many cases that confused the software. In retrospect, we should have held off on implementing this until we had tested that everything was correct, and we apologize for the inconvenience!
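To illustrate why stripping kanji can blur word boundaries, here is a minimal sketch. The token list and readings are hand-written, hypothetical data, and the helper names are made up for this example; how the actual pipeline tokenizes or converts sentences is not stated in this thread.

```python
# Hypothetical tokens: (surface form, kana reading). A real pipeline would
# get these from a morphological analyser; here they are written by hand.
TOKENS = [
    ("今日", "きょう"),
    ("は", "は"),      # topic particle, pronounced "wa" in speech
    ("人", "ひと"),
    ("が", "が"),
    ("多い", "おおい"),
]


def surface_text(tokens):
    """Rebuild the original sentence, kanji included."""
    return "".join(surface for surface, _reading in tokens)


def strip_kanji(tokens):
    """Replace every token with its kana reading and join with no separators."""
    return "".join(reading for _surface, reading in tokens)


print(surface_text(TOKENS))  # 今日は人が多い
print(strip_kanji(TOKENS))   # きょうはひとがおおい
```

In the kana-only string, nothing marks where one word stops and the next starts, and the particle は looks the same as the syllable は inside a word. A TTS engine has to guess those boundaries, which would be consistent with the odd pronunciations described above.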

For now, we are looking at rolling back to the old audio files, as there were far fewer mistakes. We have also started looking into alternatives such as 音読さん and Microsoft Azure for creating sentence audio, as both are far more reliable at getting kanji readings correct, and they are our current top picks for lifelike TTS.

Curnan2 · May 14, 2024, 3:51am

Thanks a lot for this information, Asher. I have experience using Azure VTT (Voice to Text, sort of the inverse of what you’re doing but same underlying technology) as part of my job, both with English and Japanese, and it is indeed a solid choice in terms of quality and speed. I suspect the current tool you’re using simply won’t scale properly with the literal tens of thousands of sentences you probably need to run through.

mekomori · May 15, 2024, 7:17pm

This post was flagged by the community and is temporarily hidden.

Keat17 · May 19, 2024, 7:40am

Misreading - は is definitely the thing I find the most annoying atm (more than kanji misreadings). Some sentences just become so unnatural it’s unusable in my opinion. I know the team is working on fixes, looking fwd to that!

kizzlesully · May 19, 2024, 3:05pm

Not really a fan of the muffled lady voice either. I find it hard to follow.

Curnan2 · May 20, 2024, 3:22am

You mean the new voice that seems to cover N5 vocab sentences? I assume that’s the new Azure AI. I agree it sounds more “muffled” (or tbh I kinda thought it just sounded like an older woman), but if it doesn’t have the issues we were having before, it’s a welcome change from my perspective.

FlippFuzz · May 20, 2024, 3:43am

Yes. It sounds really muffled.

Asher · May 20, 2024, 4:56am

@Curnan2 @FlippFuzz @kizzlesully

Any examples of particularly muffled sentences? We chose the voices that the native speakers on our team judged to be the most natural. Out of the male voices, the current one is definitely the best imo, but there were a few female ones that were close.

We’re potentially open to changing it as it isn’t a huge hassle for us, but would need some examples first. I can provide a clip of the other available voices to compare if needed.

Edit - The only ones I can find that seem muffled a little are the ones where there is a 「 at the beginning. We fixed this by adding a full stop between topics and quotations, but the correct set may not have been uploaded. I will double check that the correct ones were uploaded, as that fixes the pacing issues. One other thing I noticed is that they tend to play very quickly, sometimes cutting off the first fraction of the sentence. Replaying the sentence fixes this. Maybe try replaying the sentence to see if that is the issue?

2nd Edit - I have just tested the audio myself on site compared to on Azure. It appears that some massive compression has happened somewhere, and the audio is definitely lower quality than it should be. I will get this and the spacing problem fixed asap!

kizzlesully · May 20, 2024, 5:27am

“Compressed” is probably a better word than “muffled”. It is likely the lack of dynamics that is making it difficult to focus while listening and to distinguish what is being said.

Flandre5carlet · May 20, 2024, 8:04am

Yeah the audio on the new TTS sounds a little compressed, but the sentence-reading itself is MUCH better than the previous TTS imo. If I had to choose between the two I’d choose the new one despite the audio compression, so it’s good to know the compression is unintended.

Asher · May 20, 2024, 10:37am

It will be fixed in a few hours or so. We figured out the problem and are reprocessing the files now.

kizzlesully · May 20, 2024, 4:15pm

The voice is MUCH better now.

Edit: For what has been updated, so far :).

There is a bit of artifacting (?) with the female voice when there is a word with a quick pitch upward, like 毎日. It happens with all vowels; ランク is an example of an ‘a’ vowel word; 便利 is an example of an ‘e’ vowel word. The male voice has it in places, too.

Just pointing it out in case there is something you can do, or it is something that can be passed on to the Azure team. Not a big deal.

BunproSupport · May 20, 2024, 5:31pm

@Flandre5carlet @kizzlesully @FlippFuzz @Curnan2

The files were using a low sample rate, which caused them to sound flat and muffled.
It should be fixed now.
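As background on why a low sample rate sounds muffled: sampled audio can only represent frequencies up to half its sample rate (the Nyquist limit), so the higher frequencies that make speech sound crisp are lost. The sketch below reads the sample rate from a WAV file and reports that ceiling. The 8000 Hz and 24000 Hz figures are illustrative assumptions only, since the actual rates involved are not given in this thread, and the helper names are made up for the example.

```python
import io
import wave


def make_silent_wav(sample_rate, seconds=0.1):
    """Return the bytes of a mono, 16-bit WAV file containing only silence."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        w.setframerate(sample_rate)
        w.writeframes(b"\x00\x00" * int(sample_rate * seconds))
    return buf.getvalue()


def highest_frequency_hz(wav_bytes):
    """Nyquist limit: the highest frequency the file can represent."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as w:
        return w.getframerate() / 2


# Illustrative rates only; the site's real settings are unknown.
for rate in (8000, 24000):
    limit = highest_frequency_hz(make_silent_wav(rate))
    print(f"{rate} Hz sample rate -> content up to {limit:.0f} Hz")
```

A check like this on a handful of uploaded files is one way to spot a file that was exported or re-encoded at a lower rate than intended.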

Asher · May 21, 2024, 3:24am

If you can give me examples of specific sentences that sound particularly odd, I can see whether I can fiddle around with them or create custom lexicon entries so that they read correctly. Feel free to go nuts on the suggestions!

I know there are a few misreadings like びみしい instead of おいしい sometimes when kanji is used, but these seem to be random errors that occur every now and then rather than every time, so it will just be a matter of fixing the ones that are off. Thankfully from my testing less than 1% of sentences I would say have mispronunciations.
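For readers curious what a per-sentence pronunciation fix can look like, here is a minimal sketch using the standard SSML `sub` element, which tells a TTS engine to speak an alias in place of the written text. This is an assumption for illustration: the thread does not say whether the fixes use SSML substitutions, a lexicon file, or something else. The kanji spelling 美味しい, the example sentence, the voice name, and the helper names are also assumptions.

```python
from xml.sax.saxutils import escape, quoteattr


def sub_alias(written, spoken):
    """Wrap text so the engine speaks `spoken` in place of `written`."""
    return f"<sub alias={quoteattr(spoken)}>{escape(written)}</sub>"


def build_ssml(body, voice="ja-JP-NanamiNeural"):
    """Wrap an SSML fragment in speak/voice elements.

    The default voice name is an assumption; the thread does not say
    which voices are in use.
    """
    return (
        '<speak version="1.0" '
        'xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="ja-JP">'
        f"<voice name={quoteattr(voice)}>{body}</voice>"
        "</speak>"
    )


# Hypothetical sentence: force the reading おいしい for 美味しい.
ssml = build_ssml("このケーキは" + sub_alias("美味しい", "おいしい") + "です。")
print(ssml)
```

An inline substitution like this fixes one sentence at a time, while a lexicon entry applies the same reading everywhere the word appears, which suits errors that recur across many sentences.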

Sending error reports on the site, in this thread, or directly to me via PM is completely fine! I will fix the things I can, and might make some suggestions to Microsoft about anything weird that I can’t fix myself.

Note - There’s no need to report spacing things like one word being said too quickly after the previous one. I am already working on a fix for this in most places where it would matter.

Curnan2 · May 21, 2024, 4:00am

The new uncompressed audio seems really great to me. Please roll this out to all audio sentences! It really makes a difference in the quality of listening practice.

Asher · May 21, 2024, 4:42am

Shall do. We are doing one N-level at a time as I fix a few formatting things that allow us to get much better pacing consistency and make bulk changes easily. Just finished N3, so that should be getting processed today and uploaded tonight all going well. N2 and N1 will be a few days to a week after that, and then we’ll do all the remaining additional list stuff.

Curnan2 · May 21, 2024, 11:19pm

Just an additional point here, as of last night a large number of the (I assume yet-to-be-updated) recordings have stopped playing. I assume this is all part of the upgrade, but just flagging this in case.

Jake · May 22, 2024, 5:01am

I believe this should be fixed now.

chicharron · May 22, 2024, 4:06pm

Hi!
I think the new audio is very good! But I noticed some mistakes during my reviews this morning.
On 放っておく (JLPT N4), I think the audio for that word is wrong.
Also on 表面 (JLPT N3), in the first sentence, the audio for 月 is wrong too. After that I decided to check the vocabulary for 月 (moon) (JLPT N5), and it seems some of them are wrong too.
Also on 包む (JLPT N4), the furigana for that word is くるむ but the TTS keeps reading it as つつむ, so I can’t tell which one is wrong here.

Keat17 · May 24, 2024, 3:59am

Just want to give a shout-out to the team because the new audio is better by miles than it used to be! Makes a huge difference…