Language learning and AI

They should learn the quote “All I know is I know nothing.” I don’t know who said that.

2 Likes

This is because “I don’t know” is rarely present in the training data. Wikipedia, online discussion platforms, scientific papers and so on all have an explanatory tone. People usually do not reply “Sorry, I have no clue” to a question on Reddit or StackExchange, for example (both of which are part of the training set for pretty much any LLM). If they don’t know, they simply don’t reply.

Hence, in the overwhelming majority of an LLM’s training data, a question is followed by a confident answer. That’s what usually happens on the internet. Because the LLM does not truly think but only concatenates tokens by probability, conditioned on the previous tokens (starting with your prompt), you will almost always get an answer that is explanatory and confident in tone. However, the confidence of the reply is not representative of the usefulness of the information it contains.
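As a rough illustration of what “concatenating tokens by probability” means, here is a minimal toy sketch in Python (the vocabulary and the probabilities are invented purely for illustration, not taken from any real model):

```python
import random

# Toy next-token distribution after a context like "Question: ... Answer:".
# A real model scores tens of thousands of tokens; these numbers are made up.
next_token_probs = {
    "Certainly": 0.40,
    "The": 0.35,
    "Here": 0.20,
    "I'm not sure": 0.05,  # rarely follows a question in web text
}

def sample_next_token(probs):
    """Pick one token at random, weighted by its probability."""
    tokens = list(probs)
    weights = list(probs.values())
    return random.choices(tokens, weights=weights, k=1)[0]

# The model appends whichever token it samples and repeats, so a confident,
# explanatory opening is far more likely than an admission of ignorance.
print(sample_next_token(next_token_probs))
```

Nothing in that process checks whether the sampled continuation is true, only whether it is likely.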

7 Likes

Exactly this! I’m a 3D artist and the first thing I had to learn was: ALWAYS use reference images. That is literally an industry-standard workflow. The reference either comes from photos other people have taken, from paintings other people have painted, or from 3D art other people have sculpted/modeled.

People who create movies, music and other forms of media didn’t create those things out of thin air; they were all influenced by things they’ve heard and watched in their lives.

1 Like

At least you are able to tell an explanation was bad, or you’ve read reviews saying that textbook X provides better explanations than textbook Y. With AI, you are alone talking to a bot. If you want to confirm something, you fall back on a textbook or Google.

1 Like

I find Claude pretty invaluable for asking questions like “Can you unpack the grammar in this Japanese sentence? [sentence]. I don’t understand the role that [word] is playing.” It usually gives an excellent explanation in response to that kind of thing.

Don’t get into a long conversation with it, though; just start a new chat, ask the question, get the answer, and move on. The longer the conversation gets, the more likely it is to start making up random stuff, in my experience.

Also, for complicated or obscure questions (especially about Japanese culture), it helps if you say “answer first in Japanese and then in English.” The Japanese explanation will usually be better than the one it would give in English, because it’s predicting what a Japanese speaker would say instead of what an English speaker would say. So if it writes the answer in Japanese first and then translates, it can give higher-quality answers.

I’m very torn on AI use. I do sometimes use it both for language learning and for my programming job, but in both cases I find that you should absolutely never blindly trust it. For instance, I find that generative AI can be pretty OK for generating code boilerplate if I already know what I expect and can easily check the output for correctness.

For learning a language, however, that’s trickier, because by definition you probably don’t have the skill level required to vet the answers and identify incorrect or inaccurate statements. I still use it from time to time when I really struggle to parse a sentence in something I read; it usually does a good job at pure translation and a somewhat worse job at explaining the grammar.

But I would absolutely never trust it with a prompt like “can you explain Japanese い-adjective conjugations to me” or something similarly generic. I know some people use AI like they would a textbook, and that seems like a terrible idea to me. Even generating text in Japanese for reading practice seems odd to me: you have access to a virtually infinite amount of Japanese for basically any level online, so why risk having AI generate something broken?

2 Likes

Even generating text in Japanese for reading practice seems odd to me: you have access to a virtually infinite amount of Japanese for basically any level online, so why risk having AI generate something broken?

That happens with any language, even English. LLMs are just statistical models; they regurgitate whatever was fed to them. They overuse words (“Why Does ChatGPT ‘Delve’ So Much?”: FSU researchers begin to uncover why ChatGPT overuses certain words - Florida State University News), they don’t know whether their output is correct (they have no concept of “knowing”, even), and they’re programmed to say whatever we want them to say, which is not to say that they’re programmed to say the right things.
With some models, even if they give you the right explanation and you wrongly correct them, or pose the question in a certain way, they will “admit” they were wrong and incorrectly correct themselves, just because that’s what you were looking for.

1 Like

Yeah, as has been pointed out in this thread already, a huge issue with these language models in my experience is their inability to answer “I don’t know” or even ask follow-up questions when they lack information. Instead they just write out a plausible-sounding but sometimes completely wrong answer.

And the more niche the topic, the worse it gets. That’s why it looks really impressive when you just ask simple questions, since the model will usually have ample data to reply with something coherent. But as soon as you start getting into trickier topics, the results can get really poor really quickly.

I usually just ask it what the nuances are between two words. Like “How is 自分 different from 私 or 僕?”
The answer it gave sounded pretty legit to me.

「TL;DR:
Use 私 when being polite and neutral

Use 僕 if you’re male and in a friendly or soft setting

Use 自分 when you’re emphasizing responsibility, identity, or doing something yourself」

There was obviously more to this, but it all made understanding 自分 easier. It even added a note about 俺 when explaining 僕.

At the end of the day, I follow up with many resources. I like Bunpro, but I also find it to be aggravating at times. Without an actual 日本人 spending time teaching me Japanese, from an area I might go to, every resource has its pros and cons.

1 Like

If that’s what it said, that’s actually a pretty terrible explanation of 自分 IMO. The key to 自分 is that it’s not a first-person pronoun, it’s a reflexive pronoun, so it can be applied to third persons, unlike 私、僕、俺、etc. The closest equivalent in English would be “oneself”. There’s more to it, but if I were to give a quick explanation, that’s certainly where I would start.

I’ll repeat what I said above and assert that for these very simple questions you’re almost always better off just searching for articles online before rolling the dice with AI. For instance:

Amusingly, while looking for other articles I found this one, which seems to be mostly or entirely AI-generated and of extremely low quality as a result:

3 Likes

While I didn’t ask it to define 自分, if that’s what you’re implying, it did say those things too. Overall, I think it deepened my understanding and clearly showed me the differences. I actually wasn’t trying to figure out 自分. I was trying to figure out:

I saw the “I, me, you” at the tail end of the first word, and wondered what made 自分 different.
I admit I could have put a little more emphasis on the fact that that was only a snippet of what it said, but if you were trying to convince me that it was worthless, all you did was confirm that I should use multiple resources.

Thank you for the Tofugu link; that is something I don’t utilize very often.

1 Like

Tried the same prompt in Gemini 2.5 Pro, got a decent result.


1 Like

Just a few years ago we only had to worry about bad search results from content farms; now it’s also LLM content farms…
I wonder if there is, or will be, an equivalent of uBlock for search engines: a shared list of domains I can just forever remove from my search results.

3 Likes

There is, and it works fairly well: uBlacklist with shared AI-content lists.
I mainly use it to exclude AI-generated images from my Google Images results, but I also use it to remove stuff like Quora or Pinterest in general from all results.
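For anyone who hasn’t tried it: as far as I know, uBlacklist takes a personal blocklist of match patterns, one per line, roughly like this (the domains below are just the ones mentioned above, not a recommendation of any particular list):

```
*://*.pinterest.com/*
*://*.quora.com/*
```

Shared lists work the same way, except you subscribe to a URL someone else maintains instead of adding the rules yourself.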

3 Likes

Fascinating. After I got that weird result from old Gemini, I tried the same thing with a different AI and it came up with a decent response, but the answer you got from current Gemini is the best I’ve seen so far. Thanks for sharing!

1 Like

I use it to come up with prompts for me to write short essays. Although it’s based on nothing but speculation, I assume ChatGPT is most accurate when it comes to formal, complete Japanese. I also asked it how confident it was in Japanese, and it said that it was trained on much less Japanese than English, but still a lot of Japanese. Other than essay prompts (and corrections to the silly essays I make from them), I just use it for nuance between words and individual grammar and vocab points that don’t make sense.
Edit: I checked, and ChatGPT is as good in Japanese as in English on everything except very casual/slangy stuff, artsy references and language, and other things like that.

I personally will never use AI for language (or anything), since its job, first and foremost, is to give you an(y) answer to your prompt. If it doesn’t have the right answer, it will hallucinate one. If you call ‘it’ out on the wrong answer, it’ll “apologize” and give you more hallucinations.

Plus, there are literally thousands of Anki decks out there and hundreds of resources. Anything you could look for to study Japanese with has been handcrafted by a human being. Why rely on AI?

8 Likes

It’s a common behaviour among learners: we always search for more resources and the next big thing, and discuss study materials instead of actually studying.

5 Likes

Because it can be tedious to sort through the plethora of Anki decks. But you bring up a semi-decent point. The answer is that the question an individual has at the time usually isn’t conveniently answered by an Anki deck. Also, it is usually a distraction-free way of getting an answer. Countless sites surround a simple answer with fluff, plus so many ads, subscription offers, or sidebars. It’s often easier to get an answer and then verify it against those other sources. Just like KLC is ordered in a way that allows side-by-side comparisons of kanji, I for one like to see quickly what similarities might exist, and then skim through other ways of confirming.
I digress. I realize most of what I like to know could be learned through a Japanese thesaurus. But I still wonder if one exists that could compete, and whether it could give the kind of insight I can quickly ask for and then confirm. In general, I prefer hard-copy items, as I learn more from a book in my hands and from writing than from anything screen-related. Out of the numerous books I do have, none completely satisfies all my needs and questions.
I can’t speak for everyone, but I imagine this is also a basis for why many use an LLM tool. Not necessarily as a shortcut, but as a better way to cultivate their own journey.

If anyone has a suggested Japanese thesaurus book, I would love to know what it is. My searches haven’t been fruitful. I can’t say I had really given it a thought before. I believe I heard that one of the tests starts asking about word opposites; I’m not sure if it’s the JLPT or 漢字検定.
Bunpro does seem to be getting better about including similar words and related words, but it’s far from perfect.

1 Like