How good are translators?

I get curious, and sometimes when I’m feeling a bit lazy with immersion I’ll grab the very English book I’m reading, Ctrl+C then Ctrl+V it into some translator, and use a decent-sounding TTS to read it to me as I follow along. I find it kind of helpful for learning new vocab and getting exposed to as much JP as possible, but honestly now I’m having doubts about whether it’s building up bad or awkward language. I know they’re not fully accurate, but is it OK to use this kind of method for a small portion of immersion, or is it better to avoid them entirely?
Sorry if this is kind of a dumb question, but I’ve been contemplating it more recently, and honestly even outside of this scenario I’m curious about this community’s opinions on ENG → JP translators!

oh, cool idea! personally i have never done this before as a form of practice or immersion. i think the issue is that literary japanese is much different from spoken japanese - same goes for english and therefore many things get ‘lost in translation’ in the same way that both languages have weird sayings that aren’t translatable directly between them. also in my experience the longer the passage the more clunky the translation tends to be haha but i am curious what others have to say about their experiences as well!

The method is good. People underestimate how effective corrective feedback in your native language is. You try to puzzle out a sentence, check your interpretation vs a translation, and then go back and figure out things you got wrong. Extremely, extremely effective for making big leaps in reading comprehension.

Machine translation is probably not the way to go about it, though.

I think after some time you just get a feel of “ohh it uses this but it’s cringe”

Like even if I speak with my granny a lot, I do not use her expressions, because my brain just sees that they are not used in the circles where I exist. Even though I grew up with her, I still quite easily switch to the register I need in a given situation, and use the constructions people use there.

The worst thing that, for example, AI can give you is probably that it uses a bunch of elevated terms that no one really uses, but that’s very noticeable once you’ve had some previous contact with the language.

Ye, maybe something like this

What tools are you asking about, specifically? It really depends what tool and what you’re trying to use it for. I assume you mean Google Translate?

If you want quick and dirty audio, Google Translate is hard to beat. You can’t go full autopilot with it, as it will mix things up occasionally. In more ridiculous cases the audio will be at odds with the phonetic pronunciation Google itself is providing, usually a kun/on reading mix-up. When I’m making flash cards and I can’t get native audio of something at a decent sound quality, I use Coeiroink to output small words, e.g. I was trying to make an Anki card of 「やるじゃないか」 and I wanted a clean audio sample. Coeiroink’s pretty decent with this. However, Coeiroink is not a translator but a voice synthesizer. But for the purposes of “I put Japanese text in - I get sound out” it’s in the same general category.

I think the amount of hate machine translation gets is insane. I remember using Google Translate in the early 2010s, and Altavista and Babelfish before that existed. It was straight garbage. Modern machine translation is amazing, including the audio Google spits out. If I may be allowed a strawman argument, the complaints about modern machine translators feel in league with “I fed an entire legal document into DeepL and IT MADE A MISTAKE! In a mission critical application!!!” Well… no shit Sherlock. Machine translators are a tool, not the whole toolbox.

Speaking about audio specifically, I’m not sure what other tools you might be referencing. Maybe some LLMs? I don’t have any experience there; I’m allergic to creating accounts unless I absolutely have to, and the free & account-less models I can use don’t do audio output. If you know of some other notable spoken audio generators, I’d love to know about them. 🙂

I wouldn’t use a translator to generate Japanese text. There are too many “untranslatable” ideas that will only come from Japanese source material. So a word for word translation is often clunky and awkward.

I would probably suggest an LLM since you can be more specific about the style, tone and ask it to break down the grammar. Rather than a literal translation you can ask for several organic ways to say something. This makes it easier to interrogate and verify the answer. LLMs aren’t perfect, they hallucinate and tend to be sycophantic, but it’s probably the best option second to getting a native speaker or a professional translator to do it for you.

Ideally for what you’re doing you have 2 copies of the same book, English and Japanese and compare like for like. Results may vary based on the context of the book, but it’s very valuable to do.

However, native Japanese text is the best way to immerse. Some things (e.g. いらっしゃいませ) are not said in English, so English to Japanese translation won’t ever give you a sentence using those things.

I was using Apple Translate for the rough machine translation, and I was using an AI voice (I forget what it was called, just something random I found online lol). I don’t love LLMs but it sounded a bit more natural to me. Coeiroink sounds interesting, I’ll def check it out!

Hmm yeah I considered that it could be just straight up wrong in some areas, but I guess I was curious if the amount it was wrong was enough to warrant the hate or prevent it from being an effective tool. I agree that it kinda feels like mistakes are more emphasized, and I just think I’m trying to understand the impact of these small mistakes and if that’ll make it overall ‘bad’ for a method like this

oooh yeah that makes a lot of sense. After I finish the book I was going to read No Longer Human and try that with the Japanese audio. I agree that it’s not going to be perfect, but since it was a small portion of immersion I was wondering if there was any value in it at all. I totally understand though!

Yeah I think I can already tell it’s a bit clunky lol
It uses あなた WAYYY too much compared to natural Japanese, and everything is often translated into 丁寧, so I’m sure it’s awkwardly formal.
It’s been hard for me to avoid putting a bunch of random unnecessary vocab from these translators into my Anki deck, I think mostly because it’s difficult for me to tell or isolate the weird or ‘cringe’ words and phrases
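
A rough way to put numbers on the two symptoms above (あなた overuse and everything coming out in 丁寧 form) is sketched below. This is only a crude heuristic under stated assumptions: the function names and the list of polite endings are made up for illustration, the ending list is incomplete, and a real check would want a proper tokenizer and a native-text baseline to compare against.

```python
import re

# Polite (丁寧) sentence endings. Illustrative and incomplete, an assumption.
POLITE_ENDINGS = ("です", "ます", "ました", "ません", "でした", "でしょう", "ください")


def split_sentences(text: str) -> list[str]:
    """Split on sentence-final punctuation and drop empty pieces."""
    return [s.strip() for s in re.split(r"[。！？!?\n]+", text) if s.strip()]


def style_report(text: str) -> dict:
    """Count あなた per sentence and the share of sentences with a polite ending."""
    sentences = split_sentences(text)
    n = len(sentences)
    anata = sum(s.count("あなた") for s in sentences)
    # Strip trailing particles (ね, よ, か) before checking the ending.
    polite = sum(1 for s in sentences if s.rstrip("ねよか").endswith(POLITE_ENDINGS))
    return {
        "sentences": n,
        "anata_per_sentence": anata / n if n else 0.0,
        "polite_ratio": polite / n if n else 0.0,
    }


if __name__ == "__main__":
    sample = "あなたは学生ですか。あなたの本はどこにありますか。行くよ。"
    print(style_report(sample))
```

Running the same report on a page of native text and on the machine-translated page would at least show whether the translator's output is skewed, even if it can't say which individual words are the odd ones.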

You are way better off using native JP content or at least official human translations of content into JP and then trying to parse those sentences.

Why form weak, singular connections to these concepts that are being translated dubiously when you could spend time with native content that is unquestionably natural and which you can determine nuance from slowly and deliberately?

I mean, this is the typical LLM double-edged sword, i.e. succinct answer vs gaslighting. Personally, unless I’m working on a specific problem, the juice is not worth the squeeze of having to fact check everything it says. I sometimes use LLMs to break down sentences into their constituent parts. For instance, LLMs helped me parse out 「チャーシューえぐないっすかこれ!」where です is casually abbreviated into っす. Once I had that, I could suss out and cope with the mostly rhetorical か and the slang usage of えぐい. The LLM busted this all out cleanly and was well worth the fact checking. I don’t think any other tool could have gotten this done for me without me already knowing the answer to my question beforehand.

I’m not sure I trust LLMs at the moment beyond a “step 1 of an investigative process” kind of thing. I’d equate LLMs and machine translators to taking a dog out with no leash. They’re one squirrel/hallucination(lie) away from losing their shit. Yeah, they’re generally trustworthy, but you do need to actively keep an eye on them.

Well, first, I wouldn’t lump LLMs and machine translators together. They are different tools with different shortcomings. Secondly, I think here is where I’d advocate for a conservative approach. Making flashcards is rather different from the image I got from your OP. “I want some quick and dirty audio to listen to that I’m not going to marry” is rather different from “I’m going to build a flashcard that I will use to commit a foundational idea to memory”. I think this is a dubious idea liable to end in failure (at best) or virtual self-sabotage (at worst). I think you should really be using native audio/examples for this. Personally, I’ve made 150-200 cards in the past couple of months, and I think maybe one or two of those have a full ass sentence with synthesized audio. Even then, I did it because I had a specific phrase I wanted to use (a specific memory tied to where I encountered a word in the wild) but I couldn’t find good audio of it. Even then there was a lot of time spent verifying pitch accents (Coeiroink gives you control over this, so it’s up to you to make sure it isn’t wrong) and grammar structures with similar examples.

Maybe you might take this as an overly conservative approach, but I’ve read enough r/learnjapanese answers to see confidently wrong or misguided answers - or even wholesale approaches. And in line with “you are not immune to propaganda”, I believe “I am not immune to making bad Japanese”. Thus, I’ll let Japanese people make the sentences I ingest.

But of course, I’m not your mother; do what you want. It’s certainly possible I got crackhead ideas and some other approach is valid. I’m only speaking from my anecdotal experience, and I’m biased towards an aversion of poor data on top of that narrow slice of experience.

No, thanks so much for that view!!! It’s not all the time, but sometimes I add vocab from those translations (though it’s a small percentage of what I’m studying). Thanks for the really well written and thought out response, it’s given me a lot to think about and I’ll likely be more wary of it in the future!

If you are willing to read the same book in English then Japanese that’s super effective for immersion, at least in the limited times I’ve done it.

I think language learning is a use case for LLMs where 90-95% accurate is good enough for a lot of use-cases, especially for immersion. The mistakes they make are unlikely to be repeat misses, so you’ll have a positive upwards trend.

Unlike lawyering, engineering, healthcare or really any piece of work where a single hard fact being hallucinated is a critical failure, language learning is about aggregating experience. So as long as you get many examples of grammar, vocab etc. the mistakes will iron themselves out.
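
To put a rough number on “the mistakes will iron themselves out”, here is a back-of-the-envelope sketch. It assumes each example you meet for a given word or grammar point is independently correct with probability `p` (that independence is my assumption, standing in for “unlikely to be repeat misses”), and asks how likely it is that the majority of `n` examples are right. The function name is made up for illustration.

```python
from math import comb


def prob_majority_correct(n: int, p: float) -> float:
    """Chance that more than half of n independent examples are correct."""
    need = n // 2 + 1  # smallest count that is a strict majority
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(need, n + 1))


if __name__ == "__main__":
    for n in (1, 5, 15):
        print(n, round(prob_majority_correct(n, 0.9), 5))
    # With p = 0.9: n=1 gives 0.9, n=5 gives 0.99144
```

The catch is the independence assumption: a tool that makes the same mistake every time (a systematic bias rather than a random slip) will not average out no matter how many examples you read.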

The main caution I have with LLMs is what prompts you use. If you’re asking it to review Japanese you’ve written, it being sycophantic is problematic. If you’re asking for a translation, ask for an organic translation and then study the differences to see what’s clunky and what’s not. For people who don’t have a tutor or native speaker to help, LLMs are very good support.

Also yeah, I wouldn’t use a machine voice. For studying pronunciation, phonetics, etc. Dogen’s course or other free resources are pretty good. Shadowing too is very useful.
That said a kanji app I had did have a machine voice and I turned out fine I think. So probably not harmful.

Regarding the usage of LLM, I use ChatGPT quite often to translate from Japanese and sometimes to help me finish/polish up sentences in Japanese.

It’s important to give the LLM some context about the task, really. If you explicitly ask it to translate things in one way or another, state some rules and give it a few examples, it will do better. It’s good to check the instructions on some simple examples to figure out if those rules were correctly applied.
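
As a minimal sketch of the “state some rules and give it a few examples” idea, the snippet below just assembles the prompt text; it does not call any model or API. The function name, the rule wording and the example pair are all made up for illustration, so swap in whatever rules you actually care about.

```python
from typing import Sequence


def build_prompt(
    text: str,
    rules: Sequence[str],
    examples: Sequence[tuple[str, str]],
) -> str:
    """Assemble a Japanese-to-English translation prompt with rules and examples."""
    parts = ["Translate the Japanese text into natural English.", "", "Rules:"]
    parts += [f"{i}. {rule}" for i, rule in enumerate(rules, 1)]
    parts += ["", "Examples:"]
    for ja, en in examples:
        parts += [f"Japanese: {ja}", f"English: {en}", ""]
    parts += ["Now translate:", f"Japanese: {text}", "English:"]
    return "\n".join(parts)


if __name__ == "__main__":
    prompt = build_prompt(
        "チャーシューえぐないっすかこれ!",
        rules=[
            "Keep the register of the original (casual stays casual).",
            "After the translation, list any slang or contractions and what they stand for.",
        ],
        examples=[("やるじゃないか", "Not bad!")],
    )
    print(prompt)
```

Trying the same rules on a couple of sentences you already understand is a cheap way to see whether the model is actually following them before you trust it on something new.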

I wouldn’t trust an LLM to go from English → Japanese (seeing how LLMs generally aren’t as good at Japanese as they are at English), but I generally find that ChatGPT’s ability to translate and break down Japanese with English explanations is extremely good. If anything, reading Japanese books and having ChatGPT explain the paragraphs has skyrocketed my Japanese.

Also, if you’re ever confused about explanations on Bunpro, you can feed sentences into ChatGPT and have it explain for you. It seems very good at explaining and breaking sentences down, and its accuracy is very high (I haven’t really ever had any major problems with its correctness so far).

This is probably the archetypal example of ‘LLM fails miserably at this’. I remember having a good laugh when I figured out I could reliably feed it good sentences said by Japanese speakers and it would continuously rewrite them to sound like they came out of the Genki textbook, or just otherwise butcher it completely and completely pervert the original intent of the speaker, hahaha 🤣

I mean it does this with English too. It assumes you want to rewrite stuff so it will always offer suggestions or rewrites even on perfectly good writing. I have to forcibly tell it not to correct or suggest edits when I ask it something about a piece of writing, regardless of the language…