Known words vs. unknown words

At this point, I know over 2k vocab words, but I still feel like if I try to read anything, it’s like 80 percent unknown words to 20 percent known. When does it start to tip the other way?

3 Likes

At around 5k words I felt more comfortable (still plenty of unknown words and a struggle, though). Now my passive vocab is maybe 15-20k and I still have to look things up constantly. I can quite comfortably read or listen to something aimed at children, although there are still unfamiliar words or phrases, but for serious literature or academic stuff I spend more time in the dictionary than not. To have no lookups you realistically need a passive vocab of 40-50k, although I believe things become comfortable at around 30k. The main issue once you have more words under your belt is not really knowing “words” but knowing how they are used, especially in collocations and phrases. Lots of the words you know probably have ways of being used that you don’t know. Equally, plenty of the early vocab words are kind of “grammar” words, or just have some “cementing” usage, and are not “content” words which provide meaning.

Reading literary fiction or classics requires a massive amount of cultural knowledge and also a very, very good sense of collocations and how the language is normally used (so that if an expected turn of phrase is subverted or a cliche is avoided, you can actually understand and appreciate that fact). So although it is possible to get the gist of it with a dictionary, or even without one, there is still a massive gap between that and seriously understanding it. Consider that natives have to take literature classes and go to university to understand even their own native literature at a serious level. News or academically inclined stuff is actually not too difficult to read, provided you know the specialist vocab and/or understand the topic in your native language, as the sentences tend to be informative and to the point by design. If your interests lie with anime or other more casual mass-entertainment stuff, then you can comfortably enjoy it from a pretty early stage, although obviously some things are harder than others (5k, as I said, is a reasonable number, assuming you are mining from relevant sources). Lots of popular anime/manga/LNs are aimed at children or have simple settings and stories, so they are much more approachable, and due to their relatively light and disposable nature it is no big deal if you miss details.

I would warn you against just cramming vocab at the expense of getting input though. You need the input but the SRS is just to help you go faster. Reading a lot is the fastest way to grow your vocabulary, not SRS. The main thing is to keep going. Good luck!


(I don’t want to waffle on about vocab sizes and how to count words etc., but it is not a simple topic. The numbers I have given here are all based on one consistent way of counting words/vocab size. Keep in mind that, depending on how and what you count, these numbers could look very different, so different people may give different answers.)

10 Likes

I hate to say it, but 2k isn’t a lot. 4k isn’t a lot. 8k isn’t a lot.

It also depends on what you are trying to read, though. I have a far easier time reading Pokémon Mystery Dungeon (except for the ungodly amount of katakana) than I do reading something like a kindergarten book.
I’ve taken some online tests and they say my vocab is somewhere between 4000 and 8000, but most of that is just visual and mental recognition. I would be unable to actively use the words in conversation.

Remember, life is a marathon, not a sprint. Keep up the good work.

2 Likes

I don’t have the direct link to these stats, but here’s a rough estimate of what to expect in terms of “known words” to “unknown words”.

Taken from subtitles on Netflix Japan:
80% of words fall within a range of 1600 words
85% of words fall within a range of 2500 words
90% of words fall within a range of 5000 words
95% of words fall within a range of 13000 words

Note that 1600 words make up 80% of the Japanese subtitles on Netflix Japan, not necessarily of the spoken language itself. But I think it’s a good estimate of what to expect.
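For anyone who wants to see how figures like these are derived, here is a minimal sketch. It assumes you already have a text split into word tokens (segmenting Japanese into words is a separate problem and is not handled here); the function name and the toy corpus are made up for illustration and are not the Netflix data.

```python
from collections import Counter

def words_needed_for_coverage(tokens, targets=(0.80, 0.85, 0.90, 0.95)):
    """For each target share, return how many of the most frequent
    distinct words are needed to cover that share of all tokens."""
    counts = Counter(tokens)
    total = sum(counts.values())
    remaining = sorted(targets)
    result = {}
    covered = 0
    # Walk the vocabulary from most to least frequent, accumulating coverage.
    for rank, (_, freq) in enumerate(counts.most_common(), start=1):
        covered += freq
        while remaining and covered / total >= remaining[0]:
            result[remaining.pop(0)] = rank
    return result

# Hypothetical toy corpus: 5 distinct "words", 100 tokens in total.
toy = ["a"] * 50 + ["b"] * 30 + ["c"] * 10 + ["d"] * 5 + ["e"] * 5
print(words_needed_for_coverage(toy))
```

The shape is the same as in the subtitle stats: a handful of very frequent words covers most of the text, and each extra few percent costs far more vocabulary than the last.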

I want to go more in depth on what exactly those numbers mean, because it’s important. This might deviate from your question a bit, but I feel the need to mention it. Despite 80% seeming pretty high, that remaining 20% is really important and will make it seem like you still know almost nothing. Something else to note is that even if you know the top 80% of spoken words, that doesn’t mean you understand them 100%: how they’re used in different sentences, how they fit with the grammar, and other things. When it comes to comprehension, here’s a good example of what 80% feels like:

“Bingle for help!” you shout. “This loopity is dying!” You put your fingers on her neck. Nothing. Her flid is not weafling. You take out your joople and bingle 119, the emergency number in Japan. There’s no answer! Then you muchy that you have a new befourn assengle. It’s from your gutring, Evie. She hunwres at Tokyo University. You play the assengle. “…if you get this…” Evie says. “…I can’t vickarn now… the important passit is…” Suddenly, she looks around, dingle. “Oh no, they’re here! Cripett… the frib! Wasple them ON THE FRIB!…” BEEP! the assengle parantles. Then you gratoon something behind you…

You’ll notice that there’s quite a bit missing here. Lots of details are missed. However, it’s not unbearable. Given context, and if you had visual clues, you could fill in the gaps on some of those unknown words. However, I’ve noticed that even though I know the 1600 words that fall within that 80% range, I was nowhere close to being able to decipher Japanese the way I can decipher the English paragraph above. I’ve been exposed to English my whole life, so I don’t have any issue with the grammar, subtle nuances, etc., and I can focus all my attention on those unknown words and tie them in with the words I already know (very well). This just isn’t the case with Japanese (at least early on). Even if you know 2000 words, you most likely haven’t been exposed to them as much as you have to the words of your native language. The same goes for grammar and sentence structure, which play an important role in comprehension too. And there’s a lot more to it: like CursedKitsune said, cultural knowledge is also very important in certain contexts.

A word of encouragement though: although there are a lot of words you need to learn to eventually reach higher comprehension, the more you read the easier it will get. Once again, like CursedKitsune said, avoid cramming vocab. It won’t help, trust me. I did it up to 15k words (50 words a day for maybe 6-7 months) and it didn’t help as much as I thought it would; it only gave me a nice 2-3 month burn-out. Continue reading and get as much input as you can, because that’s going to help you out a lot. If I could, I would have spent a lot more time reading earlier on, because I know now how helpful it is.

6 Likes

I agree with the above comments. According to my ex-partner, who is Japanese, knowing 3k kanji is good, but when it comes down to vocab you’re going to find yourself learning more than 30k just to be comfortable. Being in college and going beyond it will take more than that. She said about 50k.

I also found a chart: https://imgur.com/iqVEfsX Hope this helps.

Think about it like this. There are about 100 words in a paragraph. So even if you know 99% of all words, there will still be about one word per paragraph you don’t know.

Once you hit that level, consider that there are about 1000 words every page or two. So even if you know 99.9% of words, you will still have one word every page or so that you won’t know.
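The arithmetic above can be sketched in a few lines. This is only an illustration: the function names are made up, and it assumes unknown words are spread evenly through the text, which real texts do not guarantee.

```python
def expected_unknown(words_in_text, coverage):
    """Expected number of unknown words in a text of the given length,
    where coverage is the share of running words you know (0 to 1)."""
    return words_in_text * (1 - coverage)

def words_between_unknowns(coverage):
    """Average number of running words per unknown word."""
    return 1 / (1 - coverage)

# Paragraph of ~100 words and a page or two of ~1000 words, as in the post above.
for coverage in (0.80, 0.95, 0.99, 0.999):
    print(
        f"{coverage:.1%} known: "
        f"{expected_unknown(100, coverage):.1f} unknown per 100 words, "
        f"{expected_unknown(1000, coverage):.1f} per 1000 words, "
        f"one unknown every {words_between_unknowns(coverage):.0f} words"
    )
```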

Even in English, as a teacher with a high vocab level, I usually find one or two words a chapter that are either outright new or a novel use of a word I haven’t seen before. That puts my percentage at 99.99% or 99.999% or so.

It’s intractable. Ambiguity tolerance is the true skill you have to master.

5 Likes