Bunpro's bad SRS algorithm is discouraging

d11 · May 12, 2024, 6:43am

I just finished up another depressing study session with ~55% accuracy. When thinking about how this session has taken so many cards down an SRS level, and how many extra reviews that will generate in the future, I’m overwhelmed with frustration.

I’m about two weeks past the point of finishing N1 grammar, and so my daily Bunpro sessions are all about trying to push grammar points to higher levels and achieve full recall of all the 915 grammar points I’ve learned. But Bunpro’s SRS algorithm is sabotaging this goal, and instead just producing a daily ritual of feeling bad about what I’ve forgotten.

The basic problem is shown in an example like this. Let’s say you have a grammar point at level 9. This means you haven’t seen it in 2 months. You miss it. Now it’s down to level 8.

The next time you see this grammar point will be in 1 month! So you’re probably going to miss it again too. Now it’s down to level 7.

Again, you still won’t see this grammar point for 2 weeks. 2 weeks is quite a while; a lot of other Japanese is being studied in those 2 weeks, during which you haven’t seen the problematic grammar point. So there’s a good chance you’ll miss it again! So down to level 6.

And even level 6 is an 8 day interval, which doesn’t give us a great chance of success. I think you see the problem.
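
The cascade described above can be traced in a few lines. This is only a sketch: the table holds just the level intervals quoted in this post (with "1 month", "2 months" and "4 months" approximated as 30, 60 and 120 days), and `drop_one_level` is a hypothetical name for the "miss it, go down one level" rule, not Bunpro's actual code.

```python
# Intervals in days for the levels mentioned in this post.
# "1 month" / "2 months" / "4 months" are approximated as 30 / 60 / 120 days.
INTERVAL_DAYS = {6: 8, 7: 14, 8: 30, 9: 60, 10: 120}


def drop_one_level(level: int) -> int:
    """Hypothetical failure rule: a miss moves the item down exactly one level."""
    return level - 1


def waits_after_misses(start_level: int, misses: int) -> list[int]:
    """Return the wait (in days) before each review that follows a miss."""
    waits = []
    level = start_level
    for _ in range(misses):
        level = drop_one_level(level)
        waits.append(INTERVAL_DAYS[level])
    return waits


waits = waits_after_misses(start_level=9, misses=3)
print(waits)       # [30, 14, 8]
print(sum(waits))  # 52
```

Under these assumptions, three misses in a row starting from level 9 add up to 52 days of waiting with only three reminders along the way.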

This has been my experience. Grammar points that aren’t completely trivial spend a lot of time bouncing between levels 10 (4 months) and 6 (8 days). Sometimes I’ll get “lucky” twice in a row and push from level 10 to 12, so the grammar point will disappear forever. Even this is a bittersweet victory; I don’t feel confident I truly remember the 410 level 12 grammar points I’ve supposedly “mastered”, since I haven’t been tested on them since whenever it was that I “mastered” them.

Personally, I try to mitigate this problem a little bit by turning on full ghosts. That gives me a 4 + 12 + 24 + 48 = 88 hour ≈ 4 day period in which I’m getting reminded of the problematic grammar point to try to boost it. But there’s plenty of time between those 4 days and the 14 day interval it’ll take for a level 7 grammar point to come around, for it to fall out of my brain again. And if I get “unlucky” too much, I’ll end up back at level 5, seeing the same grammar point while I’m still reviewing its ghost. (Sometimes even the same sentence, for the higher-level grammar points where Bunpro’s sentence inventory is low!)
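
To make that gap concrete, here is the same arithmetic as a tiny script. The ghost gaps and the 14-day level 7 interval are the numbers from this post; the assumption that both clocks start at the moment of the miss is a simplification for illustration.

```python
# Ghost review gaps in hours, as listed in the post.
GHOST_GAPS_HOURS = [4, 12, 24, 48]

ghost_window_hours = sum(GHOST_GAPS_HOURS)
ghost_window_days = ghost_window_hours / 24

# Level 7 interval quoted in the post: 14 days.
LEVEL_7_INTERVAL_DAYS = 14

# Assumption: the ghost window and the level 7 interval both start at the miss.
uncovered_days = LEVEL_7_INTERVAL_DAYS - ghost_window_days

print(ghost_window_hours)            # 88
print(round(ghost_window_days, 2))   # 3.67
print(round(uncovered_days, 2))      # 10.33
```

So even with full ghosts, roughly ten of the fourteen days pass with no reminder at all.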

Let’s take a step back and remember what SRS is meant to accomplish and how it works.

Remember that the goal of spaced repetition is to remind you of something right before you’re likely to forget it, thus reinforcing the memory. And in theory, each time you do this, it’ll reinforce the memory for a little longer.

Bunpro’s design assumes that all of our brains are calibrated to match Bunpro’s SRS intervals: i.e., we forget things after a bit more than 4 hours, but reminding us then extends the memories for 8 hours, after which they fade in ~24 hours, but reminding us extends the memories for 2 days, etc. Obviously, this is not true for all brains. And it’s worse than that: different grammar points are hard for each brain in different ways. Using the exact same intervals for every brain + grammar point combination is bad!

But it’s even worse than that. Bunpro’s SRS design assumes that we never forget anything: it assumes we can always remind ourselves right before forgetting. But if you’ve forgotten something that’s at level X, your brain is actually much closer to the initial zero-knowledge state (i.e. level 1) than it is to the level Bunpro assumes (just level X - 1). Sure, you’ve gained something from repeated exposure, but you need to build that memory back up before you can be confident that it’s truly back to near level X.

How could Bunpro do better? Fortunately, this is a solved problem.

Remember, our goal is to figure out a SRS system that will prompt the user “right before” they’re about to forget. A good way of operationalizing this is something like: when the user has a 90% chance of remembering the grammar point. So we need to take all the data so far about the user’s performance, both on Bunpro grammar points in general and on this specific grammar point we’re quizzing them on. And we need to predict when they have a 90% chance of remembering it.
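
As a rough illustration of "schedule the review for when recall probability hits 90%", here is a sketch using a simple exponential forgetting curve. The curve shape and the `stability_days` parameter are assumptions chosen for illustration; they are not Bunpro's model and not necessarily the exact formula any particular scheduler uses.

```python
import math


def recall_probability(days_elapsed: float, stability_days: float) -> float:
    """Assumed exponential forgetting curve.

    `stability_days` is defined here as the time at which recall
    probability has decayed to exactly 90%.
    """
    return 0.9 ** (days_elapsed / stability_days)


def days_until_target(stability_days: float, target: float = 0.9) -> float:
    """Invert the curve: how long until recall probability falls to `target`?"""
    return stability_days * math.log(target) / math.log(0.9)


# Two hypothetical items: one well-learned, one recently forgotten.
print(round(days_until_target(60.0), 1))        # 60.0
print(round(days_until_target(5.0), 1))         # 5.0

# Recall for the weak item if it is only shown again after a month.
print(round(recall_probability(30.0, 5.0), 3))  # 0.531
```

Under this toy model, an item that really only holds for about 5 days but is scheduled a month out has close to coin-flip odds of being remembered, which is the situation a fixed one-level drop can create.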

It turns out, over the last few years computers have gotten really good at using data to make predictions! And even refining those predictions over time! This field is called machine learning. And some smart people have applied it to spaced repetition systems, and created one that is extremely accurate. It’s what powers the latest version of Anki.

This method also does a good job of finding the right “re-learning” curve, after the user forgets a grammar point. In my experience using this in Anki, it starts out with a small-but-not-tiny interval (e.g. 5 days). If you’re good at that point, it’ll pick increasingly-large intervals. If you forget again, it’ll bump you down to 1 day, building you back up to large intervals more slowly. The key thing is that the intervals it chooses are customized to your overall review performance, and to the specific problematic grammar point. It’s much better than Bunpro’s two options, of “reset to level - 1” or “reset to level - 1 and also add some short-interval ghosts over the next 4 days”.

And this method is generally much less discouraging, since it trains itself to give the user 90% accuracy in every review session.

Finally, as a bonus, SRS systems based on real user performance like this will never count an item as “Mastered” and thus fail to remind you of them ever again. You’ll keep getting reminded of them as necessary forever, and if you forget them (which will happen, statistically, 10% of the time), they’ll come back into the rotation for just long enough to keep them in your brain.

Well, I got that off my chest.

Honestly, I’m not that hopeful that Bunpro will make core changes to their platform to use a more user-friendly SRS algorithm. The simple-to-explain SRS interval system with cute names like “Adept” or “Expert” seems pretty baked into the site. And it’ll meet users’ expectations coming from WaniKani. (I probably will be posting another such rant over there, where I have a similar problem after spending 270 days post level 60 with ~800 items bouncing between levels 4 and 8.)

But if anyone else is in a similar situation to me, where sometimes they finish a Bunpro session and just feel depressed and discouraged about all the things they’ve forgotten, I want to provide a little bit of an explanation. It’s not your fault: you’re not bad at Japanese, or bad at studying. You’re just using a SRS system which is poorly designed and will inevitably fail at helping you remember the harder grammar points, since it doesn’t customize to your review history and it doesn’t have a good re-learning strategy to react to forgetting a higher-level grammar point.

And maybe writing this essay will finally give me the courage to do what I should have done some time ago, and give up on Bunpro and reallocate that daily time toward more reading practice and native content consumption. It’s really hard for me, since I enjoyed using Bunpro as a learning tool so much. I’m intrinsically motivated by pushing things along progress bars, and I think working with specialized SRS web apps like WaniKani and Bunpro is a great low activation energy way of learning Japanese. But if Bunpro is not actually helping me remember the things it taught me, then what’s the point?

Thanks for listening.

stephane · May 12, 2024, 8:08am

Great post. I can relate, despite not having used Bunpro long enough to feel it to its full extent.

There are discussions on this forum about resets for a reason. To be honest there should not be such concerns whatsoever but it is relevant in the current state of the SRS algorithm in my opinion.
I personally have a constant worry in the back of my head that I am levelling up too fast, so I regularly lower some grammar points manually. At least I am glad we have the ability to do so.

I think the ghost system is a great help already, but it loses its point at higher levels.
I agree there is an issue at higher levels that should be rethought. Maybe the intervals stretch so much that we don’t have enough levels?
I am not sure.

My advice for now is to use cram regularly, as it makes you study not only more frequently but also across a wider range than the SRS algorithm would expose you to.
Also, don’t limit yourself to Bunpro for grammar: keep rereading all the grammar points you have already studied, very quickly, just so they don’t get buried in the depths of your memory. If you keep doing this, it doesn’t cost much time.

I feel your points are all very much valid, and I hope Bunpro will be improved in this regard as well.

Gacee · May 12, 2024, 7:05am

I initially came into this post a bit trepidatious, but I think you’ve made some really good points and don’t actually have much to critique from this, except for one thing:

Something missing from the argumentation here is the study that (must) exist outside of the SRS system.

It seems to me that bunpro’s SRS is designed around a few assumptions of the learner, namely that when they fail a grammar point they will take the time to review the point outside of the failed question and that after a certain level of exposure in the SRS the learner should also be making efforts to natively encounter grammar content.

With these assumptions in mind, I believe that in bunpro’s SRS system failing is actually a good thing because it reveals what your brain needs to /process/ more rather than /practice/ more.

I think you’ve made a very lucid recommendation for an alternative, although I guess I potentially disagree with the main conceit that bunpro’s SRS is frustrating.

Jake · May 12, 2024, 7:28am

Thank you for the detailed feedback!

Improving the SRS algorithm is something that has been on my radar for the past year or so. A couple things we are considering/have considered:

Incorporating an improved model like FSRS
Looking at the 100m+ review data points we have and getting a general global difficulty weight for each sentence as well as each grammar point and incorporating that into how calculations are made.
Using the overdue time on a review (how many hours/days past due it is) and calculating that into the next timing/srs level, because it doesn’t really make sense to bump from srs 1 to 2 if you still remember it a week since it became due.

All in all, I definitely agree that despite making a few tweaks since we first started with our SRS intervals, there is still room for a lot of improvement.

m09 · May 12, 2024, 7:53am

I won’t repeat OP’s very good points but I’d like to basically +1000 the post (I already gave one heart but it’s not enough): the SRS is a core part of Bunpro features and it’s extremely lackluster compared to other propositions in this space.

The ghosts are what we call in French a “plaster on a wooden leg”, and especially when reaching higher levels the system completely falls apart. I see Bunpro’s SRS as a first exposure to grammar points with semi-arbitrary repetitions, where it could be a proper SRS with some work.

I know the main problem with publishing Anki decks is monetizing, so that’s not an option, but having at least 90% of Anki’s efficiency would be so much appreciated that I considered scraping the site to create a (private) deck several times. Not to stop paying and keep on using Bunpro, I would have continued to sub, but to save time while doing reviews (I know it’s probably against the ToS, I’m just mentioning that to illustrate the frustration that comes with the SRS).

I’m very happy to see that you’re strongly considering improvements to the system Jake, that’s great! Especially since Bunpro is so good at basically everything else, I feel like having a strong SRS would really cement its position as a top resource.

LagonKa · May 12, 2024, 8:17am

Couldn’t agree more with what’s been said. FSRS is definitely the way to go.

I’ve been thinking about switching back to Anki recently. I’ve started experiencing the SRS biases way too much, especially with vocab.
While grammar hasn’t posed much of an issue at the moment, knowing that more advanced users are experiencing this, I’m sure it’s only a matter of time before I feel the same.

d11 · May 12, 2024, 8:24am

Thanks so much for your prompt and encouraging reply, Jake.

Let me just try to strongly urge you to go straight to FSRS. I can see how things like your second and third bullets might sound appealing. But I think they’re rooted in a bias toward the existing system, and toward simple, easy-to-come-up-with tweaks you can apply to it. The instinct to use per-user or global data is good, but ultimately the algorithm design you apply to that data will not be as good as FSRS.

What’s brilliant about FSRS is that it goes back to the basics and says, instead of trying to design a good algorithm using human intuition, why don’t we just learn the perfect customized algorithm for each (human, flashcard) pair. And the results speak for themselves.

If you do try to design your own algorithm using human intuition instead, then please at least collect data that lets you do an objective comparison to FSRS. Then, when the data very-likely shows that you haven’t managed to beat a state-of-the-art machine learning system at minimizing prediction error, you can revisit that decision.

If I could wave a magic wand, I’d have the global all-Bunpro-users RMS metric on a dashboard at Bunpro HQ, and have the team celebrate all efforts that drive that particular metric down. Because ultimately, that metric is what’s measuring Bunpro’s success at helping us all learn Japanese, which I imagine is the Bunpro team’s mission.
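
One way such a dashboard number could be computed is sketched below: group reviews by predicted recall probability, compare each group's average prediction with the fraction actually answered correctly, and take the review-weighted root mean square of the gaps. The bin count, the function name, and the sample data are all made up for illustration; this is an assumed reading of "RMS metric", not a specification of what FSRS or Bunpro measures.

```python
import math
from collections import defaultdict


def calibration_rmse(predictions, outcomes, bins=10):
    """Weighted RMSE between predicted and observed recall, per prediction bin.

    predictions: predicted recall probabilities in [0, 1]
    outcomes:    1 if the review was answered correctly, else 0
    """
    grouped = defaultdict(list)
    for p, y in zip(predictions, outcomes):
        index = min(int(p * bins), bins - 1)  # p == 1.0 goes in the top bin
        grouped[index].append((p, y))

    total = len(predictions)
    squared_error = 0.0
    for items in grouped.values():
        mean_predicted = sum(p for p, _ in items) / len(items)
        mean_observed = sum(y for _, y in items) / len(items)
        squared_error += len(items) * (mean_predicted - mean_observed) ** 2
    return math.sqrt(squared_error / total)


# Made-up example: a scheduler that promises 90% recall on every review,
# while the learner actually gets 55% of them right.
predicted = [0.9] * 20
actual = [1] * 11 + [0] * 9
print(round(calibration_rmse(predicted, actual), 2))  # 0.35
```

A lower value means the scheduler's predictions match what learners actually remember; a perfectly calibrated scheduler would score 0 on this measure.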

JamesBunpro · May 12, 2024, 8:34am

Do you have a suggestion for how to deal with the fact that Bunpro uses multiple sentences for each grammar point? I have thought about this before and this is the clearest roadblock to just simply switching the systems. I am not an expert on FSRS by any means though.

Unrelated to the question of what is best for the Bunpro SRS: OP, if I were in your shoes I would slowly reset any leeches to 0 as they come up or simply delete them for the moment and come back to them later when they are easier. You could also study them in more depth to help stop them leeching. I am sure you have thought of these things already though.

Flandre5carlet · May 12, 2024, 8:32am

I’m nowhere near N1 or anything, but I’ve always felt like Bunpro’s interval reductions were a bit odd when getting into Seasoned+ territory.
The only other example I have is Wanikani, but failing a Master/Enlightened item on there bumps you back down all the way to Guru 1 or 2, your SRS interval going from 1 month to 1 week or from 4 months to 2 weeks. It feels a lot more drastic (and useful) a reduction in comparison to Bunpro where failing an Expert 1 item with a review timer of 4 months bumps you down to Seasoned 3 where your next review is still 2 months away. Ghosts alleviate that somewhat, but… only somewhat.

m09 · May 12, 2024, 8:41am

[nitpick on]
I find Jake’s intuition to use global data valid in the sense that in FSRS, different users are not expected to work on the same cards so global data cannot be computed and used. The only valid data usable to compute complexity for FSRS is the series of grading events produced by the user for a given card.

For Bunpro OTOH, such restriction is not necessary since we are all interacting with the same cards, so the global complexity statistics are really interesting and can nicely complement the series of grading events produced by the user.
[nitpick off]

I do agree though that vanilla-FSRS will probably be better than a custom algorithm using global statistics so I second your opinion that going straight to FSRS is best.

LagonKa · May 12, 2024, 9:08am

I don’t perceive this as a limitation at all. On the contrary, FSRS approaches each user in an extraordinarily detailed, personalized, and ‘holistic’ manner. These data are like a learner profile, tailoring intervals to individual needs.

Using Global Complexity Stats would limit things to “what’s hard for most”, completely disregarding other “hidden” variables that FSRS accounts for.

I suppose there’s likely a method to adjust the algorithm to encompass entire grammar points rather than operate on a ‘per-sentence’ basis. That said, the per-sentence approach wouldn’t be that bad of an idea.

m09 · May 12, 2024, 9:12am

I do agree that using only global stats would be way worse than proceeding with vanilla-FSRS, I was not suggesting that. I was suggesting that it’s an option to complement the series of grading events with global statistics, in a way that’s not possible in the general FSRS framework. Then, the model is free to use this additional data or to disregard it completely (e.g. by setting the corresponding parameters to 0) during the training phase.

d11 · May 12, 2024, 9:14am

I’d do the simple thing. If FSRS suggests that I should be tested on ならいざ知らず in 6 days, then I’d show it the next sentence in Bunpro’s sentence-cycle in 6 days. If I miss it and the interval goes down to 1 day, then show me the next sentence. Basically, treat the grammar point as the “flashcard”; use specific sentences only to add variety.

This uses the fact that Bunpro often has a great inventory of sentences per grammar point to avoid the sentence-memorizing “cheating” I sometimes catch myself doing with Anki, while still benefiting from the FSRS algorithm.
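
A minimal sketch of that "grammar point is the flashcard" idea follows. All names here (`GrammarPointCard`, `toy_scheduler`) and the placeholder sentences are hypothetical, and `toy_scheduler` is only a stand-in so the example runs; a real implementation would plug in FSRS or whatever scheduler is chosen.

```python
from dataclasses import dataclass
from itertools import cycle
from typing import Callable, Iterator


@dataclass
class GrammarPointCard:
    """One schedulable "flashcard" per grammar point; sentences only add variety."""
    grammar_point: str
    sentences: list[str]
    interval_days: float = 1.0

    def __post_init__(self) -> None:
        self._rotation: Iterator[str] = cycle(self.sentences)

    def next_prompt(self) -> str:
        """Return the next sentence in the rotation for this grammar point."""
        return next(self._rotation)

    def record_review(self, correct: bool,
                      scheduler: Callable[[float, bool], float]) -> None:
        """Update the grammar point's interval; the sentence shown is not tracked."""
        self.interval_days = scheduler(self.interval_days, correct)


def toy_scheduler(interval_days: float, correct: bool) -> float:
    """Stand-in scheduler: double the interval on success, reset to 1 day on a miss."""
    return interval_days * 2 if correct else 1.0


card = GrammarPointCard("ならいざ知らず",
                        ["sentence A", "sentence B", "sentence C"],
                        interval_days=6.0)
first = card.next_prompt()
print(first)                # sentence A
card.record_review(correct=False, scheduler=toy_scheduler)
print(card.interval_days)   # 1.0
second = card.next_prompt()
print(second)               # sentence B
```

The scheduling state lives on the grammar point, so a miss shortens the interval for the point as a whole while the next review still shows a different sentence.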

d11 · May 12, 2024, 9:22am

I think using global stats as input into a machine learning algorithm is a great idea. I agree that in theory a smart machine learning system should learn to ignore useless or harmful data. Still, the best way to do that so that it’s in-practice an overall enhancement to FSRS, instead of a regression, might be hard.

(Or it might be easy? E.g. just use it to compute an initial value for the “D” parameter in the algorithm, and then let FSRS take over from there?)
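
A minimal sketch of that seeding idea, under loud assumptions: `global_miss_rate` is a hypothetical per-grammar-point statistic (the fraction of all users' reviews answered incorrectly), and the 1-to-10 range and linear mapping are illustrative choices rather than anything taken from an FSRS implementation.

```python
def initial_difficulty(global_miss_rate: float,
                       d_min: float = 1.0, d_max: float = 10.0) -> float:
    """Map a global miss rate in [0, 1] linearly onto an assumed difficulty range."""
    clamped = min(max(global_miss_rate, 0.0), 1.0)
    return d_min + clamped * (d_max - d_min)


print(initial_difficulty(0.0))   # 1.0
print(initial_difficulty(0.5))   # 5.5
print(initial_difficulty(1.0))   # 10.0
```

After this starting value, the per-user review history would take over, so a grammar point that is hard for most people but easy for one learner would drift toward that learner's own difficulty.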

How I interpreted Jake’s second bullet point was perhaps overly-pessimistic: I thought it’d be something like taking the global data, using it to calculate some multiplier, and then changing the existing Bunpro SRS system intervals with a per-grammar-point multiplier. That sort of “let’s come up with something using our human intuition” algorithm is what I was warning against in my reply to Jake.

But yeah, re-reading Jake’s second bullet point I think he left both possibilities open. So that’s even more encouraging!

m09 · May 12, 2024, 9:46am

I completely agree that the way to add global stats to the FSRS computation so that they are beneficial is not straightforward (and I like your suggestion as a starting point).

A good idea might be to make review data public (without user-identifying information), to allow the community to contribute ideas backed by experiments.

Rukishou · May 12, 2024, 9:51am

Just here to quickly throw my vote behind prioritizing updating this, since it’s like the number one thing Bunpro (and any other SRS) can offer. Can’t imagine much else should take precedence.

casual · May 12, 2024, 9:56am

Very good points. I also would love BunPro to at some point consider dynamic intervals, taking into account review data you’ve already accumulated over the years.

Now, I think BunPro grammar points are very different in nature from “one review item”. Often one sentence is easy while another one is hard because the context is different. And so I’m not going to pretend I’ve tested specifically Anki’s current algorithm against other options possible here. (And by the way there are platforms that have tested it against their own review data and went with their own algo as a result).

But ultimately the goal is to allow the user to fail questions without regret. For me the regret is either “oh no my review load will explode”, or “oh no I’ll have to see this specific ghost several times”.

On the review load, the current BunPro system is not bad at all. While it boosts intervals from 0 quite slowly, it also doesn’t drop them back to 0 on failure. A new algo would need to improve the boosting if it improves the dropping, I think.

On the ghosts: I love them and keep them always on, but I’m sorry, some sentences on some grammar points make little sense. It’s not clear why this specific quasi-synonym is ok while others are not, and especially some translations are hurting more than helping by saying something different from the JP sentence. Sometimes I try to challenge translations, or ask for clarifications in the review thread, and 90% of the time it doesn’t help me. So between fairly failing and taking on a specific ghost that I don’t want to see ever again, or unfairly boosting the SRS level to my own future detriment, neither option is great.

das · May 12, 2024, 10:41am

Truth be told, I never had any strong feelings about Bunpro’s SRS since it seemed to work for me (~90% card retrievability on Anki and Wanikani; no data on Bunpro), and the criticism I saw was rarely backed up by any reasoning, but this is a well-written post with data to support the claim! The more I think about it, the less these set intervals you conquer like a staircase seem to make sense. I know that SRS improvements have been discussed for an eternity, but it would be nice if this could be tackled sooner rather than later because it might hamper the progress of thousands of learners. Looking forward to the changes!

lnj_lodr · May 12, 2024, 11:17am

I second this. BunPro’s algorithm works great for studying short term, say for a test like the JLPT (or for a course if you use the textbook route). The lessons are great and everything is explained really well. The fact that the sentences cycle and don’t use just one example per grammar point is really good. The interface is well designed. I switched from Anki due to all these features (while studying for the exam). If you want to use BunPro short term to study for some sort of test, then that is great, there is no system better. In fact, the current system with its ‘staircase’ and ghosts is quite effective for this kind of study. But after that…

The current SRS algorithm is awful to the point of being almost useless for long term retention. I definitely second the addition of an FSRS algorithm. Having used FSRS on two different programs already, it is a game changer for long term retention (which, let’s face it, is the real thing you should be concerned about if you are learning a language).

Perhaps you could offer users the choice to choose which works best for them? With that being said I know it would be a lot of work to maintain two algorithms. If you have to go with one it has to be FSRS to make this a viable program for long term users.

Christophegand · May 12, 2024, 2:34pm

That’s a funny topic…

I have no idea of how SRS algorithms work.
The only thing I know is that I walked away from Anki because it was, by far, the most punishing SRS I have ever used. After one month of use I was so afraid of Anki that I almost stopped reading.

Fortunately, I gave up Anki instead. And I switched to jpdb.io
I don’t know how their algorithm works (they don’t explain how), but I consistently hit 70-75% accuracy and I don’t feel punished.
In Bunpro I don’t feel punished either.

Since then I hate Anki with all my soul and don’t want to go back to it.