Hi everyone!
Hope you’re all enjoying your week and getting excited to start some book club action together!
This is a quick post about the Bunpro vocab decks for novels/manga etc, and how the evolution of this system is taking place on our back end. It will also include a bit of information about what you can expect in the near future.
The Bunpro Text Wrapper
Some of you may remember that we started ‘wrapping’ grammar points within standard grammar sentences about 2 years ago. This was a new feature at the time which meant that you could hover over a specific part of any sentence that you didn’t fully understand, click on it, and if the part you clicked on was one of our grammar structures, a little popup would tell you what the grammar was.
Believe it or not, when we first started doing this, it was just @Fuga and myself sitting in Google Docs all day, staring at walls of Japanese text and inserting HTML tags manually for each and every grammar point. Not the most effective system in the world, and certainly prone to a few… human errors.
Despite the rocky start, we knew that grammar wrapping would make life far easier for any student, and that we should explore the concept further. Since last year we have been slowly developing a behind-the-scenes tool that does this work for us.
We have adapted part of this tool to provide vocab decks for any and all media going forward.
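To give a rough idea of what 'wrapping' means in practice, here is a minimal sketch in Python. It is purely illustrative and is not the actual Bunpro tool: the `GRAMMAR_POINTS` table, the `wrap_grammar` function, and the `span` markup are all assumptions made up for this example, and real grammar detection has to deal with conjugation and context rather than plain string matching.

```python
import html

# Hypothetical grammar patterns and labels; not Bunpro's real data.
GRAMMAR_POINTS = {
    "なければならない": "must do",
    "ている": "ongoing action or state",
}

def wrap_grammar(sentence: str) -> str:
    """Wrap each known grammar pattern in a <span> that a UI could turn into a popup."""
    # Try longer patterns first so a short pattern never shadows a longer one.
    patterns = sorted(GRAMMAR_POINTS, key=len, reverse=True)
    out = []
    i = 0
    while i < len(sentence):
        for pattern in patterns:
            if sentence.startswith(pattern, i):
                label = html.escape(GRAMMAR_POINTS[pattern], quote=True)
                out.append(
                    f'<span class="grammar" title="{label}">{html.escape(pattern)}</span>'
                )
                i += len(pattern)
                break
        else:
            # No pattern starts here: copy the character through unchanged.
            out.append(html.escape(sentence[i]))
            i += 1
    return "".join(out)

print(wrap_grammar("食べている"))
```

Doing this by hand for every sentence is exactly the kind of repetitive work that invites human error, which is why automating it was worth the effort.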
Beta Testing (kind of)
This tool is still in a v1 alpha phase and we are working to tweak the output when applied to this specific use case, but the best way for us to do so is to put it to work and adjust it as it parses more content.
To start with, we will only be providing decks that cover the first chapter of each book. The second chapter will be added to the decks the following week, and so on. This lets us do a lot of quality checking on relatively small amounts of text each time, and potentially fix things so that each new batch of words contains fewer errors.
Many of these ‘errors’ are things that are far easier for people to spot than for our tool, so we would like to actively ask those using the decks to give us feedback. Here are some examples of things that our tool cannot catch (just yet), but that people can catch easily.
- Words that can be said in different ways, but use the same kanji. For example - 避ける with the reading of さける, and 避ける with the reading of よける. Unless there is furigana, the only way to know which one is being used is through context. If you think that one of the words in a deck has the wrong reading, please let us know!
- In the reverse of the situation above, sometimes words in books are written as hiragana only, when there are several options for that word. An example here is おさめる. Does the book mean 修める, 収める, 治める, or 納める? Again, if you think our tool chose the wrong one, please let us know!
- Set phrases being split (if we don’t have a record of the phrase already). There are many times that words are joined in Japanese as part of a set phrase, and are better off learned that way, rather than as individual words. This can be verbs that use other verbs as auxiliaries like 持ち上げる ‘to lift up’, or just things that are set patterns like 目を凝らす ‘to stare intently’.
In either case, these are better learned as a pattern than as 持つ followed by 上げる in a deck, so we would love for you to point out any that you notice are not joined correctly. Thankfully we already have many collocations and phrases in our vocab database, so this type of error will become far less common very quickly as we keep adding new words and phrases.
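The three cases above can be sketched roughly as follows. This is only an illustration under stated assumptions, not how our tool actually works: the `READINGS`, `SPELLINGS`, and `PHRASES` tables are tiny made-up stand-ins for a real vocab database, the token lists are a hypothetical tokenizer output, and `join_phrases` and `flag_ambiguous` are names invented for this example.

```python
# Hypothetical lookup tables standing in for a real vocab database.
READINGS = {"避ける": ["さける", "よける"]}                      # kanji -> possible readings
SPELLINGS = {"おさめる": ["修める", "収める", "治める", "納める"]}  # kana -> possible kanji
PHRASES = {
    ("持ち", "上げる"): "持ち上げる",
    ("目", "を", "凝らす"): "目を凝らす",
}

def join_phrases(tokens):
    """Greedily merge the longest run of tokens that matches a known set phrase."""
    max_len = max(len(key) for key in PHRASES)
    out = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate first, down to two tokens.
        for n in range(min(max_len, len(tokens) - i), 1, -1):
            key = tuple(tokens[i:i + n])
            if key in PHRASES:
                out.append(PHRASES[key])
                i += n
                break
        else:
            # No phrase starts at this token, so keep it as an individual word.
            out.append(tokens[i])
            i += 1
    return out

def flag_ambiguous(words):
    """Return the words that have more than one candidate reading or spelling."""
    flags = {}
    for word in words:
        options = READINGS.get(word) or SPELLINGS.get(word)
        if options and len(options) > 1:
            flags[word] = options
    return flags

# Assumed tokenizer output for a made-up snippet of text.
tokens = ["目", "を", "凝らす", "箱", "を", "持ち", "上げる"]
print(join_phrases(tokens))
print(flag_ambiguous(["避ける", "おさめる", "箱"]))
```

Note that a lookup like this can only flag that a word is ambiguous. Picking the right candidate still depends on context, which is where feedback from readers helps the most.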
Although these are the most obvious ones, there are bound to be some other oddities that show up here and there.
Lastly, we can only link to words we already have in our database (about 30k at present), so words that show up in books but that we don’t have yet will be added as we go. If you see a word in the book, but not in the deck, please let us know so we can add it and update the deck.
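As a rough sketch of that last point, a chapter's words could be split into those that can be linked right away and those that still need a database entry. The `split_known` function and the sample data below are hypothetical and only illustrate the idea; they are not our actual pipeline.

```python
def split_known(chapter_words, database):
    """Split a chapter's words into deck entries and words missing from the database.

    `database` is assumed to be a set of words that already have vocab entries.
    Duplicates are dropped and order of first appearance is kept.
    """
    deck = []
    missing = []
    seen = set()
    for word in chapter_words:
        if word in seen:
            continue
        seen.add(word)
        if word in database:
            deck.append(word)
        else:
            missing.append(word)
    return deck, missing

# Made-up example data.
database = {"避ける", "持ち上げる"}
deck, missing = split_known(["避ける", "箱", "避ける", "持ち上げる"], database)
print(deck)     # words that can be linked now
print(missing)  # words to add to the database before the deck is updated
```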
Inconsistency of Example Sentences
One of the best things about the vocabulary database here on Bunpro is all of the natively written example sentences that are designed to give you lots of usage context about new words, at the same time as using a huge variety of the grammar that you will have learned on the site.
That said, our book club decks are created based on which words appear in the book rather than on the level of those words, so you are likely to encounter many words that do not have example sentences yet. If a vocab item is in a deck, you can rest assured that it is already in our queue to have example sentences written for it. It just might take a while, depending on where the word sits in the list that we follow.
Writing quality example sentences for all vocabulary, using grammar that students are currently learning, is a huge priority for us, so we will continue to get these out as quickly as we can.
Wrapping things up
We can’t wait to get this tool up to a level of accuracy that will enable us to instantly create decks from almost any text file, whether that be subtitle scripts, novels, light novels, you name it. We estimate that with each new book we parse, the number of new words and errors will decrease, so it is only a matter of time until it is at the high standard that we appreciate our users holding us to.
Book club vocab decks will come first, but it is highly likely that we will start adding decks for more and more books in the background as time goes on, half for the purpose of training our tool as quickly and efficiently as possible, and half for the purpose of making a wide array of popular titles available to our users in a timely manner.
Thanks again to everybody participating in the book clubs, and we hope that through working together, this will be yet another addition to our catalog of features that will help you achieve mastery of Japanese in record time!
Have a rockin day!
User requests/development plans (Ongoing Edit)
A list of things that users have requested to be able to do with our book decks, as well as with the wrapping tool in general.
- Be able to see a list of possibilities when a word in the deck is ambiguous due to its appearance in the book (kana only when there are multiple kanji, or kanji only when there are multiple kana).
- Link vocab entries that are also grammar points together so that progress is saved for people that have already studied the grammar.
- Use the parser to make custom decks from content that users upload themselves.
- Be able to filter decks by JLPT level, etc.
- Be able to track progress across books and display how many unknown words are in unread books.
- Make suggestions for new books to read/decks to study based on previously studied book categories.