Learning science from BunPro data?

BunPro has, I imagine, a fair amount of cloud-based data on Japanese language acquisition. Have y’all published language-learning data anywhere? For example, “average number of reviews for acquisition,” “how successful learners organize their reviews,” etc.

6 Likes

We don’t have any published learning data. We do, however, have ~150 million review data points that we periodically crunch to look at accuracy and other things.

I’m not really sure what publishing data would look like or what types of data people would be interested in seeing.

3 Likes

I can’t speak for what kind of data people would be interested in, but it might be fun to release some statistics each year around the same time the Year In Review information is released? I’d think of it almost like a Bunpro-wide “Wrapped”.

Something I would personally be interested in seeing is the number of new items learned vs accuracy (and maybe across JLPT level?). I feel like that could be a data point users could use to help calibrate how many new items they’re adding in a day.

Of course, learning is a highly individual journey, so you may not want people comparing themselves to general averages, but it could be helpful, especially for people starting out, since there are often posts in the forum from new people asking how many new items they should be adding!
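To make the "new items vs accuracy" idea concrete, here is a rough sketch of the kind of summary I mean. Everything in it is an assumption for illustration: the user records are invented, and the bucket thresholds are arbitrary, not anything Bunpro actually uses.

```python
from collections import defaultdict

# Hypothetical per-user records: (new items added per day, review accuracy 0-1).
# These numbers are made up for illustration, not real Bunpro data.
users = [
    (3, 0.92), (5, 0.88), (4, 0.90),
    (10, 0.81), (12, 0.78),
    (20, 0.65), (25, 0.60),
]

def bucket(new_per_day):
    """Assign a user to a coarse pace bucket (thresholds are arbitrary)."""
    if new_per_day <= 5:
        return "1-5/day"
    if new_per_day <= 15:
        return "6-15/day"
    return "16+/day"

def accuracy_by_pace(records):
    """Return the mean accuracy for each pace bucket."""
    grouped = defaultdict(list)
    for new_per_day, accuracy in records:
        grouped[bucket(new_per_day)].append(accuracy)
    return {name: sum(vals) / len(vals) for name, vals in grouped.items()}

summary = accuracy_by_pace(users)
for name, mean_acc in summary.items():
    print(f"{name}: mean accuracy {mean_acc:.3f}")
```

The same grouping could be repeated per JLPT level by adding the level to the bucket key.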

7 Likes

Yeah, perhaps some unsupervised (or not) clustering or topic modeling, and then telling you which group you are in, like: you’re in the group of users that take it slow but make little to no mistakes (and let the marketing team put a funny name on them), or you’re in the group of users that take it very fast, but watch out for those ghosts.
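As a toy version of the clustering idea, here is a minimal k-means sketch in plain Python. The users, the two features (pace and accuracy), and the choice of two clusters are all assumptions on my part; a real analysis would likely use more features and a proper library.

```python
import math

# Hypothetical users as (new items per day, accuracy). Invented numbers.
users = {
    "a": (3, 0.95), "b": (4, 0.93), "c": (5, 0.96),
    "d": (20, 0.70), "e": (22, 0.65), "f": (25, 0.72),
}

def normalize(points):
    """Min-max scale each feature to [0, 1] so pace does not dominate accuracy."""
    dims = list(zip(*points))
    lows = [min(d) for d in dims]
    highs = [max(d) for d in dims]
    return [
        tuple((v - lo) / (hi - lo) if hi > lo else 0.0
              for v, lo, hi in zip(p, lows, highs))
        for p in points
    ]

def kmeans(points, centroids, iterations=10):
    """Plain k-means with caller-supplied starting centroids."""
    centroids = list(centroids)
    labels = []
    for _ in range(iterations):
        # Assignment step: nearest centroid by Euclidean distance.
        labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                centroids[k] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels

names = list(users)
scaled = normalize([users[n] for n in names])
# Start one centroid at the first user and one at the last, for determinism.
labels = kmeans(scaled, [scaled[0], scaled[-1]])
groups = dict(zip(names, labels))
print(groups)
```

With this made-up data the slow-and-accurate users land in one group and the fast-with-ghosts users in the other; the funny names are left to marketing.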

A quick tf-idf can also tell you which grammar points are more difficult or easier for you than for the average person. Perhaps some N2 grammar is relatively easy for you, while you tend to forget より or something like that.
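A sketch of the comparison, with a caveat: this is not tf-idf proper, just a simpler ratio in the same spirit (your own rate weighed against the global one). The miss rates below are invented, and the function name is my own.

```python
# Hypothetical miss rates (fraction of reviews failed) per grammar point.
global_miss_rate = {"より": 0.10, "にもかかわらず": 0.40, "ている": 0.15}
my_miss_rate = {"より": 0.30, "にもかかわらず": 0.20, "ている": 0.15}

def relative_difficulty(mine, everyone):
    """Ratio of my miss rate to the global one; above 1 means harder for me."""
    return {
        point: mine[point] / everyone[point]
        for point in mine
        if point in everyone and everyone[point] > 0
    }

ratios = relative_difficulty(my_miss_rate, global_miss_rate)
hardest_for_me = max(ratios, key=ratios.get)
easiest_for_me = min(ratios, key=ratios.get)
print("Harder for me than average:", hardest_for_me)
print("Easier for me than average:", easiest_for_me)
```

Here the made-up learner finds the N2-ish point relatively easy but keeps forgetting より, which is exactly the kind of mismatch the ratio is meant to surface.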

And the following question is more relevant to the Bunpro team and user retention: bringing time into the equation, when do people take pauses? Do accounts that are a few months old start strong and then burn out? Or do they start slow and gain momentum? Which of these two groups keeps using Bunpro after a year?
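Roughly, the retention question could be framed like this. The monthly counts are invented, and both rules (what counts as a "strong start" and what counts as "still using it after a year") are arbitrary assumptions I picked to keep the sketch short.

```python
# Hypothetical monthly review counts per account (index 0 = first month).
accounts = {
    "u1": [900, 700, 300, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    "u2": [800, 850, 600, 400, 300, 200, 150, 100, 100, 90, 80, 80, 70],
    "u3": [100, 150, 250, 300, 350, 400, 400, 420, 450, 460, 470, 480, 500],
    "u4": [120, 200, 260, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
}

def start_style(monthly):
    """'strong' if month 1 is busier than month 3, else 'slow' (arbitrary rule)."""
    return "strong" if monthly[0] > monthly[2] else "slow"

def retained_after_year(monthly):
    """An account counts as retained if it has any reviews in month 13."""
    return len(monthly) > 12 and monthly[12] > 0

def retention_by_style(data):
    """Fraction of accounts retained after a year, per start style."""
    totals, kept = {}, {}
    for monthly in data.values():
        style = start_style(monthly)
        totals[style] = totals.get(style, 0) + 1
        kept[style] = kept.get(style, 0) + int(retained_after_year(monthly))
    return {style: kept[style] / totals[style] for style in totals}

rates = retention_by_style(accounts)
print(rates)
```

With real data, the interesting part would be whether the two rates actually differ, and where in the timeline the pauses cluster.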

5 Likes

My research team has published both academic papers analyzing errors people make in our learning environment, and an anonymized corpus of learning data for other researchers to use.

Internally we use the data to improve the learning materials, of course.

1 Like

I’m extremely curious about this too. While there are plenty of anecdotes about what keeps people studying and progressing, I’d really like to see some authoritative data at some point. For example, is doing reviews at least twice a day associated with getting to higher levels?

2 Likes

I can imagine a research.bunpro.jp xD

1 Like

This is my curiosity too. Plus, the way I use the SRS for vocabulary is … maybe not unique, but I don’t know if it’s optimal: once I hit 6-10 missed words in a review session, I pull off the main review branch and focus on just those words. I cycle through them until I can remember all of them, and then “complete” them to proceed. If I try this with too many new words, I can’t keep them all in my head at once, which defeats the purpose of the exercise: to guarantee that I’ve had to recall each word at least once, or N times. I don’t know if this is an optimal strategy, whether other folks have similar strategies for words they’re reviewing but still have problems with, whether this affects the number of reviews required to acquire a word, etc.
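In case the description is hard to follow, here is the strategy written out as a small simulation. This is only a sketch of my own habit, not how Bunpro works: `review_with_drills`, the `recall` callback, and the toy learner are all hypothetical names I made up.

```python
def review_with_drills(queue, recall, threshold=6):
    """Run through `queue`; once `threshold` misses pile up, drill them.

    `recall(word)` is a hypothetical callback returning True on a correct
    answer. Returns the number of prompts shown per word. Note that a word
    that is never recalled would keep the drill loop going forever.
    """
    prompts = {}
    missed = []

    def ask(word):
        prompts[word] = prompts.get(word, 0) + 1
        return recall(word)

    def drill(words):
        # Cycle through the missed words until each has been recalled once.
        remaining = list(words)
        while remaining:
            remaining = [w for w in remaining if not ask(w)]

    for word in queue:
        if not ask(word):
            missed.append(word)
        if len(missed) >= threshold:
            drill(missed)
            missed = []
    drill(missed)  # Handle leftovers below the threshold at the end.
    return prompts

seen = set()

def second_try_recall(word):
    """Toy learner: misses any 'hard' word the first time it sees it."""
    if word.startswith("hard") and word not in seen:
        seen.add(word)
        return False
    return True

queue = ["easy1", "hard1", "hard2", "easy2", "hard3"]
counts = review_with_drills(queue, second_try_recall, threshold=2)
print(counts)
```

The per-word prompt counts are the thing I would want compared across strategies, since that is a rough proxy for "number of reviews required to acquire a word."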