Bunpro's bad SRS algorithm is discouraging

Florance · June 12, 2024, 3:12am

I was thinking about the cram feature but haven’t used it yet, because wouldn’t that mess up the SRS? The system would think that you haven’t seen the grammar point in a month, while you actually saw it during your cram session earlier that week. So when you get the review right, it’ll give you an even bigger interval. But at that point you don’t actually know it and would be even more likely to get it wrong later.

stephane · June 12, 2024, 6:36am

Well, you are technically correct if you consider the SRS to be the core of your study, but I think it shouldn’t be, so in my opinion it isn’t an issue at all.
It is exactly the same as what would happen with natural exposure, and of course you wouldn’t reduce that for the sake of Bunpro’s SRS: the more you get, the better.

The SRS is not so accurate that it should be followed religiously.
As far as I am concerned, I even adjust the SRS levels manually, regularly stepping them down if I have difficulty with a specific grammar point. I am fine with that: my aim is not fast success on Bunpro but solid learning of Japanese. I’d just encourage you to do the same whenever you feel you need more testing on a grammar point.

Noxsora · June 13, 2024, 2:46am

TLDR: I study by deck because it’s got some filters like cram and adjusts the SRS appropriately.

So there are 3 ways to reschedule after a cram:

Bunpro’s: don’t reschedule at all.
This is good if you want straight-up extra practice.

jpdb: modified schedule. If you do a card early, it is rescheduled to the longer of the scheduled due date and the time elapsed between reviews.
If you have a card due in a month and you do it in a week, it will still be due 3 weeks after your early review.

If you do a card late, it is rescheduled based on the time that actually elapsed.
If you have a card that was going to come back the next day, but you did it a week later, it’s rescheduled another week later, because apparently you can remember it for a whole week.

(A slightly more complicated answer is “It uses 1.2 * actual interval instead of the scheduled interval if you don’t do your reviews on time.”)

Old Anki (I’m still using 2.1.48; I heard in Bunpro’s forums that 2.3 has a different scheduler):
You can toggle between not rescheduling at all, like Bunpro, or

Longest reschedule: push your reviews forward by the same amount of time as if you did them on time.
This is bad if you like to cram stuff you just did while it’s still fresh: you’re doubling the length of time between your reviews each time you cram.
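
To make the three behaviours above concrete, here is a minimal Python sketch. It only illustrates the rules as described in this post and is not the actual code of Bunpro, jpdb, or Anki; the function name, the policy labels, and the `growth` and `late_factor` parameters are made up for the example.

```python
from datetime import date, timedelta


def next_due_after_off_schedule_review(
    policy: str,
    last_review: date,
    due: date,
    reviewed_on: date,
    growth: float = 2.0,       # assumed interval multiplier for a correct answer
    late_factor: float = 1.0,  # the post mentions 1.2 as a more precise value
) -> date:
    """Next due date after a correct review done off schedule (hypothetical)."""
    scheduled: timedelta = due - last_review        # interval the scheduler planned
    elapsed: timedelta = reviewed_on - last_review  # interval that actually passed

    if policy == "none":
        # Cram without rescheduling: the original due date is untouched.
        return due
    if policy == "longer_of":
        if reviewed_on <= due:
            # Early review: keep the later, originally scheduled date.
            return due
        # Late review: the card survived `elapsed`, so wait about that long again.
        return reviewed_on + elapsed * late_factor
    if policy == "as_if_on_time":
        # Treat the early review as a normal one and grow the planned interval.
        return reviewed_on + scheduled * growth
    raise ValueError(f"unknown policy: {policy}")
```

With a card last reviewed on 1 June and due on 1 July, cramming it on 8 June leaves it due on 1 July under the first two policies, while the third pushes it out to 7 August (given the assumed `growth` of 2).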

Noxsora · June 13, 2024, 3:16am

I agree I shouldn’t stress about SRS as much as I do. Ooops
I’ve been studying Japanese for 11 years now. The best part of SRS is that it comes back eventually.
Sure, coming back to 3000 reviews is scary, but it’s better than coming back to LingoDeer with 300 lessons done, not sure which ones I still remember.
I’ve never wanted *more* reviews. I understand why you do, to get more practice. I don’t find reviews fun.

This is the SRS readjusting. You probably crammed a point because you thought you would get it wrong now. Getting it wrong now or later is the same on net. You mess up the SRS if you cram so regularly that you ‘never’ get a review wrong. SRS is built on the assumption of 70-95% accuracy; if you are outside that range, the SRS can get out of sync.

LagonKa · June 13, 2024, 8:08am

Hi there!

Unrelated to the recent discussion, but I’m curious about your experiments with FSRS and the vocab section. Any updates? Could we expect any changes in the near future? And what about mixed production/recognition reviews? @Jake

tlock · June 16, 2024, 3:25pm

@Asher @Jake

I appreciate your thoughtful responses to @d11’s urging to move to the FSRS algorithm, and I understand that your responsibility is to the entire community of users, which makes core changes difficult, especially if they are potentially divisive.

Looks like the conversation broke down above when it turned into a bit of back-and-forth when asking for linked research and then whether or not that research was valid for this use case, but if I may, I’d like to bring it back up, and hopefully steer it in a more helpful direction.

To be clear, the only thing in question here is how the timing of the next review is set. Neither the method of input/recall as compared to other decks/platforms, nor the fact that Bunpro uses multiple examples per “point” (card), should distract from this. Those points matter, and are what make Bunpro special, but the time-to-next-review question is the one that is being asked.

So, given the research and proven effectiveness of FSRS in similar contexts (of course there is no published research on different spacing algorithms as applied to specifically Bunpro – that would be rather absurd to expect), I wonder:

what is the fear / expected downside of using that algorithm? (Do you think it will hurt some users, and if so why)

And, given that the current algorithm is so basic (it just gets longer every time and you go back a step if you get it wrong, as far as I can tell):

what evidence/reasoning would there be to expect that this specific approach is more effective than FSRS?

And lastly, assuming it is not a technical limitation, since the platform is quite customizable already, and given that Jake said he would have something available for beta testing only a short time after the OP landed on the forum:

Is there any reason not to implement FSRS and make it available behind a settings flag (even if only beta, or not set as default, so that power users can find it)?

Finally, I’ll just interject my opinion into the mix. I think that a modified version of FSRS that does allow for “mastery” after enough correct answers (or a button to manually mark it) is much better. I would argue against the OP’s statement that “statistically, 10% of the time [you’ll forget]”, as that doesn’t take into account any real-world usage. I’d wager there are zero users on Bunpro who don’t use Japanese at all in daily life, reading, or media, unless they are tire-kickers in the first weeks of trying before giving up. It’s highly doubtful N4-N1 speakers are going to forget 10% of their N5 grammar points that they encounter every time they use Japanese outside of Bunpro.
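
For context on the “10%” figure: FSRS schedules each review for the moment when predicted recall falls to a chosen “desired retention”, commonly 90%. A rough Python sketch using the forgetting curve published for FSRS v4 follows. Later FSRS versions change the curve’s constants, nothing here reflects Bunpro’s code, and the function names are only for illustration.

```python
def retrievability(days_elapsed: float, stability: float) -> float:
    """FSRS v4 forgetting curve: predicted recall probability after `days_elapsed`."""
    return 1.0 / (1.0 + days_elapsed / (9.0 * stability))


def interval_for(desired_retention: float, stability: float) -> float:
    """Days until predicted recall drops to `desired_retention`."""
    return 9.0 * stability * (1.0 / desired_retention - 1.0)


# "Stability" is defined as the interval at which predicted recall is 90%.
print(retrievability(10, stability=10))
print(interval_for(0.9, stability=10))
```

Note that the curve only takes the time since the last review as input, so exposure to a grammar point outside the app is invisible to it unless it shows up as better review results.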

Neon_Kitsune · June 16, 2024, 4:25pm

While everyone was arguing about SRS types and other stuff, I’m just here, trying to figure out why I need to care about the math behind SRS and such. I just wanted to learn Japanese to watch anime without subs…

(Please don’t take my post seriously)

Jake · June 16, 2024, 9:19pm

@LagonKa Thanks for the follow ups. This wasn’t abandoned, we just have a few things in the pipeline that we were working on getting out before we push the changes we have decided on.

Here is a broad overview:

We looked at global review data for over 100 million reviews.
- For grammar, users have on average 88% accuracy in the lower streaks and at streaks 11/12; in streaks 7-10 the accuracy drops to 81-82%.
- For vocab, it is 92%, dropping to 86-87% in streaks 7-10.
- Overall we feel that is pretty decent accuracy but will implement changes to the default SRS timings to reduce the interval slightly for streaks 7-10.
  - This should help improve accuracy at those intervals.
We will be making adjustments to how Ghost Reviews work.
- Instead of static intervals, they will have dynamic intervals based on the streak the main review is at when they are created. The higher the streak, the longer the Ghost Review intervals will be, but they will always finish before the midway point of the main review timing.
  - This will help spread out ghost reviews which should reduce day to day review count.
  - This will help give users some practice for that specific grammar point (the sentences will change) closer to their next review (somewhat mimicking what coming across it naturally would be like).
We looked at global accuracy grammar point by grammar point and vocab by vocab to identify grammar and vocab that have low accuracy.
- We will be slowly going through them and making adjustments to how we present them, the sentences we use for reviews, and the hints we provide. That should help improve overall accuracy for those specific pieces of content.
We looked at FSRS.
- We tested it on existing user reviews to see what the intervals would look like for a user switching to it, and also tested it on new reviews.
  - For existing reviews, due to our custom timings, the data used to build the prediction means FSRS out of the box quickly goes to very, very long intervals.
  - For new reviews, we would need to make a lot of custom changes (10 minute intervals, dropping back to very short intervals when missing a review etc are things we don’t really like about Anki’s approach).
We reaffirmed our philosophy/approach
- At its core, we think of Bunpro as an immersion tool disguised as a SRS tool/grammar guide.
  - Our primary goal and the thing that has the most outsized impact on users’ overall learning is giving users lots of exposure to Japanese in the form of sentences that are level appropriate/build upon previously learned content and fit within the framework we follow (see below).
    - We focus on using a combination of the four most powerful memory tools according to the most up to date neurological studies of how memory works:
      - Repetition - Basic memory based on repeated exposure (basic SRS stuff)
      - Association - Advanced memory based on links between previously learned knowledge, or artificial associations like mnemonics. We do i+1 and have a lot of natural associations in our vocab sentences.
      - Novelty - How new something is. Novelty in learning revolves around experiencing things for the first time in fresh ways. In Bunpro we provide novelty by having new sentences each review.
      - Emotional Response - Both positive and negative are far better than ‘neutral’ in terms of creating long lasting memories. Our example sentences are a very broad mix of happy, sad, funny, weird, mildly offensive, and completely neutral things. While we do sometimes get complaints about a few sentences, there is overwhelming evidence that being provoked emotionally is a good thing when it comes to remembering.
  - In terms of what we “optimize” for:
    - Outside of three situations (simple front back recognition flashcards, timings that are too long, and users who try to speedrun their reviews by not actually reading the sentences), the “optimization” of the review timings to maximize the time between reviews isn’t something that has much of an impact on the success or failure of our core goal (immersion).
    - While it might seem like optimizing the intervals with something like FSRS to try to maximize the time between reviews is a no-brainer, the SRS timing isn’t what helps you “learn”; it is being forced to read a lot of Japanese, and exposure to grammar and vocab in new and varying contexts, that is the “secret” to learning.
    - That is why we have put the effort into making the ~100k sentences that are on the site and why we will keep adding to it and implementing new ways to get more immersion within Bunpro.
- We may consider implementing FSRS for straight flashcard style reviews (front/back) for power users who want to use it.
  - It isn’t something that we can’t do, it’s just that while it works well for static reviews (like simple flashcards) where the data never changes and the focus is on maximizing the interval, we try to encourage using dynamic reviews that change (cloze or reading) and focus on maximizing exposure.
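
As a rough illustration of the dynamic Ghost Review intervals described above, here is a small Python sketch. The number of ghost steps, the doubling pattern, and the 45% budget are assumptions chosen only to satisfy the stated constraint (longer ghost intervals at higher streaks, all finished before the midway point of the main review interval); the real values are not given in this post.

```python
from datetime import timedelta


def ghost_intervals(
    main_interval: timedelta,
    steps: int = 3,        # assumed number of ghost reviews
    budget: float = 0.45,  # assumed share of the main interval, kept below 0.5
) -> list[timedelta]:
    """Split part of the main review interval into growing ghost intervals."""
    weights = [2 ** i for i in range(steps)]  # 1, 2, 4, ... (assumed pattern)
    total = sum(weights)
    return [main_interval * budget * w / total for w in weights]


# A longer main interval (higher streak) stretches every ghost interval.
for days in (14, 60):
    plan = ghost_intervals(timedelta(days=days))
    print(days, [round(g.total_seconds() / 86400, 1) for g in plan])
```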

This turned out to be a lot longer of an overview than I had planned on writing but I hope it gives everyone a better idea of our thinking, our overall philosophy and the steps we are taking/will take in the future to improve.

Always happy to hear thoughts from you all so please don’t hesitate to let us know what you think.

tlock · June 17, 2024, 2:41am

Thanks @Jake for taking the time for an in-depth update here. I appreciate all the other considerations you brought up, but again, I think the FSRS camp is pretty much just concerned specifically with review timing.

The things from your response that specifically relate to this are:

and

The short version, if I may summarize, is that:

looking at the review timings produced by the FSRS algorithm with real data, the Bunpro team determined they were too long apart generally, and simultaneously too short in the early stages of new reviews. This is based on the team’s experience and intuition of running Bunpro, and hints at a distrust of FSRS for producing the “right” timings.
the goal of Bunpro is not to produce ideal timings for memorization, but instead to provide immersion into written Japanese, with some structure regarding timing.

For me, the interesting take-away here is that FSRS is alleged to produce longer review times, not shorter, which is at odds with the OP’s assertion that the current SRS timings hurt the user who forgets / misses a point in the higher stages of srs.

Regarding the closing point that Bunpro may consider implementing FSRS for straight flashcard-style reviews:

My 2 cents would be that this may fall flat for users who want FSRS, since it wouldn’t be implemented into the rich Bunpro-style review system. Users who want that could just use Anki today, with no need to wait for a maybe-maybe-not implementation, and users who want FSRS powering the full Bunpro experience, as opposed to front/back flashcards, will not get that in the near term, and likely not in the medium to long term.

Cheers, thanks again for explaining the core team’s vision and plans for improving the Bunpro experience.

akkim2 · June 17, 2024, 3:05am

Perhaps this is oversimplifying things, but it seems to me that, FSRS totally aside, most of us would prefer if getting a Seasoned (or greater) point wrong would simply bump it back to a low Adept. The current system doesn’t bump it back enough, leading to further disappointment and prolonging learning the concepts.

That feels like an easy enough setting to implement without totally overhauling Bunpro’s existing algorithm.

@Jake , would you be opposed to an optional setting that simple?

Jake · June 17, 2024, 3:26am

I don’t think we necessarily have anything against FSRS. My understanding is it is the best algorithm out there for optimizing that time interval to get it as long as possible.

However I do think it to some degree goes against our goals of giving lots of exposure and building a user up to a point where they start to feel comfortable going out and consuming native material.

At the end of the day Bunpro or any SRS for that matter is good for building up a foundation but you won’t ever reach fluency without interacting with a massive volume of Japanese, whether that is reading, listening or speaking.

Ultimately, I think our approach is a bit different from other services.

Rather than try to maximize the length of time a user uses the site (content gating, trapping users into spending years doing reviews, etc), our goal is to get them to the point they no longer need Bunpro as quickly as possible.

If by using Bunpro, a user progresses to the point they feel they no longer need Bunpro, then we consider that a win in our books.

Regarding your points, since our approach is to generally try to provide flexibility and customization and trust that the user knows what works best for them, I don’t in particular see anything wrong with your suggestion of putting the option for FSRS behind a settings/beta flag (for either all vocab or all vocab and grammar?) as long as we include caveats about how it will impact their review timings, ability to earn xp, etc.

Jake · June 17, 2024, 3:28am

Instead of -1 to the streak, something like cut the streak in half for content over a specific srs level?
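
A quick sketch of the two miss penalties being compared, in Python. The `threshold` of 7 and the maximum streak of 12 are placeholders, and the function is hypothetical rather than Bunpro’s implementation.

```python
def next_streak(
    streak: int,
    correct: bool,
    policy: str = "minus_one",
    threshold: int = 7,    # assumed level above which the heavier drop applies
    max_streak: int = 12,  # assumed top streak
) -> int:
    """Streak after one review under the current or the proposed miss penalty."""
    if correct:
        return min(streak + 1, max_streak)
    if policy == "halve" and streak >= threshold:
        # Proposed: cut the streak in half once the item is past the threshold.
        return streak // 2
    # Current behaviour as described in this thread: step back one level.
    return max(streak - 1, 0)


print(next_streak(9, correct=False))                  # minus one
print(next_streak(9, correct=False, policy="halve"))  # halved
```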

akkim2 · June 17, 2024, 3:41am

Either the ability to choose between more options than (-1), or implementing something similar to WaniKani’s system (see “WaniKani’s SRS Stages” in the WaniKani Knowledge base) that will do a heavier drop if you’ve clearly forgotten the point. That seems to me to address my (and OP’s!) concerns for the most part.

In my experience and I think many others’, it’s easy-ish to get something up pretty high (Seasoned) when the point is still fresh in your mind, but also to forget before you see the next review as the intervals lengthen. Dropping down “slowly” means that the intervals are still so long that you’ll surely still have forgotten it by the next time you see it, and so on and so forth. An option to enforce a steeper “drop” on an item upon getting it wrong some number of times (1-3?) when it’s at a high enough level would help us really learn points that didn’t quite stick for the long term without them languishing for weeks or months!

This also meets your aim above:

Rather than try to maximize the length of time a user uses the site (content gating, trapping users into spending years doing reviews, etc), our goal is to get them to the point they no longer need Bunpro as quickly as possible.

since this will help us learn points for good rather than having items stick around far longer than they should.

I always wince when WaniKani drops me back to Guru on something, but I have no clue how I’d learn it otherwise!

In contrast, when I forget something in Bunpro, I tend to continue in a state of not having learned it for a long time, prolonging my long-term Bunpro use

Thank you for your consideration!

tlock · June 17, 2024, 3:45am

Jake:

I don’t think we necessarily have anything against FSRS. My understanding is it is the best algorithm out there for optimizing that time interval to get it as long as possible.

However I do think it to some degree goes against our goals of giving lots of exposure and building a user up to a point where they start to feel comfortable going out and consuming native material.

At the end of the day Bunpro or any SRS for that matter is good for building up a foundation but you won’t ever reach fluency without interacting with a massive volume of Japanese, whether that is reading, listening or speaking.

Ultimately, I think our approach is a bit different from other services.

Rather than try to maximize the length of time a user uses the site (content gating, trapping users into spending years doing reviews, etc), our goal is to get them to the point they no longer need Bunpro as quickly as possible.

If by using Bunpro, a user progresses to the point they feel they no longer need Bunpro, then we consider that a win in our books.

I vibe with this answer, a lot, actually.

Cheers.

Jose7822 · June 17, 2024, 4:08am

I agree with this suggestion. Simply adding it would mitigate most of the current issue, and it’s a simpler solution (I think).

rustx · June 17, 2024, 5:51am

I don’t like WaniKani’s system, actually, as I find it also gives too much leeway before you see the card you’re struggling with again. The Anki equivalent is the most efficient for me, as a lot of the time when I get something wrong on Bunpro it’s not “It was that point instead of this one!” but “I forgot that point even exists.” I never get that feeling with Anki, as a card will keep showing up until I get it, and it is also never ‘truly’ mastered (burned, in WaniKani terms), as I’ll see it again in a year’s time or so.

I don’t want to hijack this thread, but there is something much more brutal for me than the SRS lengths here. Does anyone else run into this? It happens so often that I get a couple wrong just because what it’s asking for is simply not taught.

Asher · June 17, 2024, 5:58am

I really like this idea a lot, especially if it is a setting that users can have a little bit of extra control over in terms of aggressiveness. Like drop back to 3, drop back to 5, drop back to 1 even, etc. One of the things that really doesn’t work with FSRS with the way that Bunpro designs content and reviews is that, specifically because our reviews give a different example sentence with each new SRS level, it slightly defeats the purpose of something like FSRS.

Here is an example of what I mean by defeats the purpose. FSRS is designed to work very well specifically when users very honestly rate themselves on remembering a piece of previously seen information, rather than new information. They are then given options such as

‘Again’, ‘Hard’, ‘Good’, or ‘Easy’.

‘Again’ is the only one that drops the time, while ‘Hard’ generally keeps it around the same. ‘Good’ and ‘Easy’ then have their own separate timers. Implementing this would require a Bunpro user to not only know how FSRS is supposed to work, it would also require them to evaluate for every single sentence the reason why they didn’t get the answer correct. Did they completely forget it, did they have too many better known grammar patterns in their head, did they just make a simple typo?

The user would have to analyze which button to press for each and every sentence, and also think back on other sentences where they have reviewed the same grammar. If we imagine that the student actually knows the grammar, but the sentences themselves have varying levels of contextual difficulty, then the instance where they press ‘Easy, Easy, Again, Hard, Again, Easy’, when each sentence is actually different, might prove nothing more than that they had two easy sentences, followed by one they didn’t get, followed by one they kinda got, followed by one they didn’t get, followed by one they did get.

The SRS is building them a retrievability predictor based on something that cannot be predicted, as they are seeing randomized or progressive sentences, rather than the same thing that they made the mistakes on. Basically we’d have to give every single sentence its own SRS timer, rather than each grammar point, for FSRS to work effectively. All that would actually achieve though is people memorizing the exact sentences they struggle with, rather than exposing them to many different usages of a grammar pattern. It would also result in a huge overabundance of reviews where people press ‘Hard’ just because they want to keep something at a similar interval.

@tlock Another thing that we are taking into consideration and building on improving is what we feel to be one of the primary reasons a lot of users get certain questions wrong. This would be the fact that they are just seeing more common grammar points far more often in their day-to-day lives, so they’re not getting the amount of exposure needed for trickier/rarer grammar points to be at the front of their mind when doing reviews where potentially many different answers would be acceptable. As @Jake mentioned, we’re working on a fix for this that we think will mitigate this problem a lot.

Not sure if these examples provided are actually helpful or not, but hopefully some of the points that we’re concerned with stick out a bit more clearly now. Basically we want to find the best solution without creating a state of analysis paralysis in users where they don’t know what difficulty they should be pressing on every single review.

Flandre5carlet · June 17, 2024, 6:49am

Yeah, I seem to struggle the most with points that are between Adept 3 and Seasoned 3 because when I miss one of them due to clearly not remembering it, the interval before I see it again seems so long still, and thus it takes a long time for an item to get to an SRS stage where I actually remember it again (especially a Seasoned 3 item dropping to Seasoned 2… and thus slowly dropping down to Seasoned 1, then Adept 3…)
Comparatively, I missed an Enlightened item on WK (4 months) and it dropped down to Guru 2 (2 weeks) which is a HUGE drop.
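
For reference, the drop described here matches the rule in WaniKani’s knowledge base, where a miss at Guru or above costs twice as many stages. Below is a minimal Python sketch of that rule, with stages numbered 1 to 8 for Apprentice 1 through Enlightened. Treat the details as an assumption and check WaniKani’s own documentation.

```python
import math

STAGES = [
    "Apprentice 1", "Apprentice 2", "Apprentice 3", "Apprentice 4",
    "Guru 1", "Guru 2", "Master", "Enlightened",
]


def stage_after_miss(current_stage: int, incorrect_answers: int = 1) -> int:
    """New stage (1-8) after answering incorrectly during one review."""
    penalty = 2 if current_stage >= 5 else 1  # heavier drop from Guru upwards
    drop = math.ceil(incorrect_answers / 2) * penalty
    return max(1, current_stage - drop)


new_stage = stage_after_miss(8)  # one miss on an Enlightened item
print(STAGES[new_stage - 1])
```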

Jake · June 17, 2024, 9:50pm

@Jose7822 @Flandre5carlet I have added it on the list to go out with the SRS timings changes when that gets implemented.

Jose7822 · June 17, 2024, 10:28pm

@Jake Thank you