Where do the Vocab Frequency Lists Come From?

Apologies if this has been addressed somewhere (I tried searching but couldn’t find it). I mean these lists specifically:

[image: screenshot of the vocab frequency lists]

Like, where does the Bunpro team get them from? I’m also curious what is meant by “General” frequency. We already have Default frequency (what Bunpro recommends for ease of learning) and Dictionary (I guess just pulled from an online dictionary), so where does that leave General?

My concern is that despite setting “Anime” as the priority, I’ve only seen Anime frequencies in the 500–1,000 range, when I was expecting them to start at 1 and go down the list from there.

Bumping this thread, though; I’m also curious where the data is pulled from.

The word frequency data we use is aggregate data from a list that you can find online. Much like the JLPT data, there isn’t really any “true” frequency list, because whatever subset of anime, novels, etc. is used to make a frequency list will end up biasing the data.

@TangoTangoSIerra The data does start from 1, but in most cases the first couple hundred entries are going to be particles and super basic common words like いい, する, and ある.

@Jake
Thank you for your reply!
I understand, but would it really go as low as 1,000+? Or do I have a setting wrong somewhere?