Language learning and AI

Found it

2 Likes

Nope. We have no Japanese sentences on the website created by LLMs. We do, however, have some (temporary) English translations of those sentences that were produced by GPT, and even those are getting fewer and fewer, as we have been focusing on eliminating them. They were just placeholders until we were able to manually check the English.

No native content that we use will ever be created by ChatGPT.

14 Likes

It’s come a very long way. A year or two ago I tried using ChatGPT to correct my journal entries in Japanese and the suggestions it gave me were always awful, but these days it seems to do exactly what I ask. I guess I can’t really judge it since I’m not a native speaker, but it seems pretty natural to me. Of course it does still make mistakes, but I feel like, so long as you’re not a beginner, they stay within a range that you will probably notice.

Not all textbooks are created equal. Some are written by native speakers who are licensed teachers, who passed the same test you’re studying for as part of getting that license, and whose books were created to be used as curriculum at the school where they teach. On the other hand you have stuff like George’s books, and books of varying quality everywhere in between.

There are some amazing textbooks out there, but you need to be careful and research a book before you buy it.

1 Like

@marziotta - I use it all of the time, and have found it really helpful.

I read NHK Easy news each morning, and will toss the articles into ChatGPT so I can see what it thinks the translation is. It has taken a while to train it to give translations that are as literal as possible in a format I like, but I think it has been well worth it. After doing this for several months, I am getting good enough at reading the NHK Easy news that I am not sure how much longer I will need to have ChatGPT translate them unless they are strange or on a new topic.
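For anyone who would rather script this workflow than paste articles into the chat window, here is a rough sketch of what it could look like with the OpenAI Python SDK. The model name and the prompt wording are just placeholders (and it assumes an API key in the OPENAI_API_KEY environment variable), not necessarily what I actually use:

```python
# Sketch only: scripting the "literal translation" workflow with the
# OpenAI Python SDK. Model name and prompt wording are placeholders;
# assumes OPENAI_API_KEY is set in the environment.
from openai import OpenAI

client = OpenAI()

SYSTEM_PROMPT = (
    "You are helping a Japanese learner. Translate the article into English "
    "as literally as possible, sentence by sentence, keeping the original "
    "sentence order. Output each Japanese sentence followed by its literal "
    "English translation on the next line."
)

def literal_translation(article_text: str) -> str:
    """Send one NHK Easy article and return a literal, sentence-by-sentence translation."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": article_text},
        ],
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    print(literal_translation("(paste the NHK Easy article text here)"))
```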

I also love tossing out specific sentences and having ChatGPT break them down, especially the grammar points. So far it has always seemed to get things right when I cross-reference it with my other learning materials.

There are also times when I am reading something and don’t recognize a kanji, but it has furigana, so I can ask ChatGPT what it is. It also works the other way around: when something is in a dialect or is super colloquial, ChatGPT can tell me what it means, and several times none of my other tools could.

2 Likes

Mostly AI is helpful, especially when translating phrases that are hard to parse. But when I do a deep-dive analysis on something outside the training guardrails, hallucinations start to reign. These images aren’t from an LLM, but it’s the same sort of idea.


1 Like

That sounds like you actually messed up the input of that question. Words of kanji+katakana? That makes it sound like you don’t know what kanji means in this context, so it just assumed you wanted words and that you thought romaji is ‘katakana’. It’s a case of “garbage in, garbage out”. Though it also appears to be an older model, or worse, a Google model.
In my experience, there are very, very few kanji+katakana okurigana, if any, so it had no choice but to try to compromise, because these systems are too dumb to be disagreeable or to tell you something doesn’t exist.

If you ask questions with clear boundaries that aren’t confusing, it does pretty well.

But as said previously, use it as a supplement, not a replacement.

It’s been a while since I’ve had somebody phrase things in such an insecure way to me. Don’t be so on edge, we’re all here to learn Japanese and have a good time doing it. :slight_smile:

Indeed, there aren’t many kanji+katakana compound words; that’s why I was hoping the AI could list some. I don’t see how my question didn’t have clear boundaries, though, and it’s certainly not true that it should have told me kanji+katakana words don’t exist. ビー玉、シャボン玉、メイド喫茶、サラ金、サラ金地獄、蛮カラ、コピー機, to give a few examples. All these words are on Bunpro too, have a look.

2 Likes

Thanks for the clarification, I really appreciate the amount of work you are putting into this app.

2 Likes

Insecure? Could you please look up what that word means, instead of trying to insult someone? The words you are using there are paired words or slang. It’s like saying “ice cream” is a single word. Japanese speakers also love to combine words to make them shorter, so the result looks like a single word when it’s really a mashup of two. While they often do appear together, that doesn’t make it a true word by itself.
As for the AI prompt, “garbage in, garbage out” is a real thing. I’m not sure of your age, but remember old Google? Knowing how to look something up was more important than the question itself. AI is just an advanced internet search at the end of the day. I’m sorry you misinterpreted me as trying to insult you, but at least try to learn what you’re talking about before saying such things.

This is a fundamental misinterpretation of what an LLM is: it’s a text generator. The answer is simply a generated sequence of words, each chosen as the most probable word given the ones before it. LLMs are trained on a very simple task: predict the next token (word or subword) given a sequence of previous tokens.

Given: The cat sat on the ___
Learn to output: mat

This is called “causal language modeling”.

Doing this on a massive corpus of text (books, websites, code, and more) exposes the LLM to so much linguistic variety that it can give us the illusion that it is reasoning. But in the end it’s just a text generator. It will therefore hallucinate on the simplest topics, with no guarantee whatsoever that what it outputs makes the slightest sense. And trusting what it says comes with other dangers, as per this study by Microsoft.
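To make that concrete, here is a toy sketch in Python. It is nothing like a real LLM internally (no neural network, no subword tokenizer, a three-sentence “corpus”), but it performs the same basic task: predict the next token given the previous ones.

```python
# Toy illustration of next-token prediction ("causal language modeling").
# Real LLMs learn this with huge neural networks over massive corpora;
# here it is just a frequency table, but the task is the same.
from collections import Counter, defaultdict

corpus = [
    "the cat sat on the mat",
    "the dog sat on the rug",
    "the cat slept on the mat",
]

# Count how often each word follows each pair of preceding words.
follows = defaultdict(Counter)
for sentence in corpus:
    w = sentence.split()
    for a, b, c in zip(w, w[1:], w[2:]):
        follows[(a, b)][c] += 1

def predict_next(context: str) -> str:
    """Return the most frequent next word given the last two words of the context."""
    counts = follows[tuple(context.split()[-2:])]
    return counts.most_common(1)[0][0] if counts else "<no idea>"

print(predict_next("the cat sat on the"))  # -> "mat" (2 of the 3 continuations)
```

There is no understanding of cats or mats anywhere in there, only counts of which token tends to follow which, and that is the point being made above.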

Further reading:

Here’s a very simple example of how the chosen words (the token sequence) determine the response, with no regard for what is actually being output.

[Two screenshots: example prompts and the responses they generated]

5 Likes

@BreadmanNin @Rukifellth Let’s both not get off track here please. I doubt anyone meant anything negatively :bowing_man:.

3 Likes

About the con cafe: LLMs have a tendency to never say they do not know something, so they make stuff up. From a European point of view, I see a bit of the extreme confidence I often see in some Americans who are always right. I do not think all Americans do that, but it is a trait that is surely promoted in many cases.

4 Likes

They should learn the quote “All I know is I know nothing.” I don’t know who said that.

2 Likes

This is because “I don’t know” is rarely present in the training set. Wikipedia, online discussion platforms, scientific papers and so on all have an explanatory tone. People usually do not reply “Sorry, I have no clue” to a question on Reddit or StackExchange, for example (both of which are part of the training set for pretty much any LLM). If they don’t know, they simply don’t reply.

Hence, in the overwhelming majority of an LLM’s training data, a question is followed by a confident answer; that’s what usually happens on the internet. Because the LLM does not truly think, but only concatenates tokens by probability given the previous tokens (starting with your prompt), you will almost always get an answer that is explanatory and confident in tone. However, the confidence of the reply says nothing about the usefulness of the information in it.
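As a rough illustration (the numbers below are completely made up, not real training statistics), imagine tallying the reply styles in a scraped Q&A corpus:

```python
# Made-up tally illustrating the imbalance described above. If "I don't
# know" almost never appears as a reply in the data, a model that imitates
# the data will almost never produce it, no matter how unsure it "should" be.
replies = {
    "confident, correct explanation": 9_800,
    "confident, wrong explanation": 150,
    "hedged answer": 45,
    "i don't know": 5,
}

total = sum(replies.values())
for style, count in replies.items():
    print(f"{style:32s} {count / total:6.2%}")

# "i don't know" ends up at about 0.05% of replies, so the imitated
# behaviour is: always answer, and always sound sure of it.
```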

6 Likes

Exactly this! I’m a 3D artist and the first thing I had to learn was: ALWAYS use reference images. That is literally an industry-standard workflow. The reference either comes from photos other people have taken, from paintings other people have painted, or from 3D art other people have sculpted/modeled.

People who create movies, music, and other forms of media didn’t create those things out of thin air; they were all influenced by things they’ve heard and watched in their lives.

1 Like

At least with a textbook you are able to tell an explanation was bad, or you’ve read reviews saying that textbook X provides better explanations than textbook Y. With AI, you are alone, talking to a bot. If you want to confirm something, you fall back on a textbook or Google.

I find Claude pretty invaluable for asking questions like “Can you unpack the grammar in this Japanese sentence? [sentence]. I don’t understand the role that [word] is playing.” It usually gives an excellent explanation in response to that kind of thing.

Don’t get into a long conversation with it, though: just start a new chat, ask the question, get the answer, and move on. The longer the conversation gets, the more likely it is to start making up random stuff, in my experience.

Also, for complicated or obscure questions (especially about Japanese culture), it helps if you say “answer first in Japanese and then in English.” The Japanese explanation will usually be better than the one it would give in English, because it’s predicting what a Japanese speaker would say instead of what an English speaker would say. So if it writes the answer in Japanese first and then translates, it can give higher-quality answers.
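For what it’s worth, if you want to automate that “one question per chat” pattern, here is a rough sketch with the Anthropic Python SDK; the model name is a placeholder, the prompt is just the pattern described above, and it assumes an API key in ANTHROPIC_API_KEY. Each call is a fresh single-turn conversation, which is exactly the “ask, get the answer, move on” approach:

```python
# Sketch only: one fresh, single-turn request per question via the
# Anthropic Python SDK. Model name is a placeholder; assumes
# ANTHROPIC_API_KEY is set in the environment.
import anthropic

client = anthropic.Anthropic()

def explain_grammar(sentence: str, word: str) -> str:
    """Ask for a grammar breakdown, in Japanese first and then in English."""
    prompt = (
        f"Can you unpack the grammar in this Japanese sentence? {sentence} "
        f"I don't understand the role that {word} is playing. "
        "Answer first in Japanese and then in English."
    )
    message = client.messages.create(
        model="claude-3-5-sonnet-latest",  # placeholder model name
        max_tokens=1024,
        messages=[{"role": "user", "content": prompt}],
    )
    return message.content[0].text

print(explain_grammar("猫が好きなわけではない", "わけ"))
```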

I’m very torn on AI use. I do sometimes use it both for language learning and for my programming job, but in both cases I find that you should absolutely never blindly trust it. So, for instance, I find that generative AI can be pretty OK for generating code boilerplate if I already know what I expect and can easily check the output for correctness.

For learning a language, however, that’s trickier, because by definition you probably don’t have the skill level required to vet the answers and identify incorrect or inaccurate statements. I still use it from time to time when I really struggle to parse a sentence in something I’m reading; it usually does a good job at pure translation and a somewhat worse job at explaining the grammar.

But I would absolutely never trust it with a prompt like “can you explain Japanese い-adjective conjugations to me” or something similarly generic. I know some people use AI like they would a textbook, and that seems like a terrible idea to me. Even generating Japanese text for reading practice seems odd to me: you have access to a virtually infinite amount of Japanese at basically any level online, so why risk having AI generate something broken?

1 Like