Community Decks (Beta) - Nov 19th 2024

Sean · November 20, 2024, 2:19am

Yes, we’re going to need to create a way to add to a Deck through the individual Vocab pages as well as through Search pretty soon.

@haldo
As for API, nothing planned as of yet… CSV support coming soon.

For CSV support, is there anything you want added or that would be nice to have?
Literally in the process of building it right now.

machinaeZER0 · November 20, 2024, 2:43am

This is incredible!

lantana · November 20, 2024, 3:28am

I have created a deck for the textbook 新中級から上級への日本語 Authentic Japanese Progressing from Intermediate to Advanced [New Edition]. Deck available here. Grammar is broken down by chapter. Not all grammar points are on Bunpro, but I’ve included all I could find.

I’d been studying this textbook for a few months by adding the points individually so I hope this can save others a bit of time.

I had some errors with the title. Using the Japanese name wouldn’t let me edit and would take me to the ‘no decks match’ page. There’s also a character limit on titles, so I couldn’t put the full English textbook name in either.

Jake · November 20, 2024, 8:20am

I added that option!

@adorable that is a perfect example of the kind of awesome deck we figured you all would come up with! Well done 🥹

haldo · November 20, 2024, 8:55am

It could be interesting to be able to add the name of the unit/description directly in the CSV.
Possibly also an option to reimport a CSV for an existing deck, to replace it or add items.
The complicated part is still how to differentiate vocabulary from grammar in a single file, and how to handle items that are not present in the site resources.
Maybe present an option or a prompt to manage these cases.
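
To make the idea concrete, here is a minimal sketch of one way such a file could be laid out and checked. The column names (`unit`, `type`, `item`) and the `KNOWN_ITEMS` set are purely hypothetical and are not Bunpro's actual import format. A `type` column tells vocabulary and grammar apart, and anything not found in the site resources is collected so the user could be prompted about it.

```python
import csv
import io
from collections import defaultdict

# Hypothetical layout: these column names are assumptions, not Bunpro's format.
SAMPLE_CSV = """unit,type,item
Chapter 1,vocab,妖怪
Chapter 1,grammar,ている
Chapter 2,vocab,潜む
Chapter 2,vocab,ケータ
"""

# Stand-in for the site's own resources; a real importer would query its database.
KNOWN_ITEMS = {("vocab", "妖怪"), ("grammar", "ている"), ("vocab", "潜む")}


def plan_import(csv_text, known_items):
    """Group rows by unit and separate out items the site does not have."""
    units = defaultdict(list)
    missing = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        key = (row["type"].strip(), row["item"].strip())
        if key in known_items:
            units[row["unit"].strip()].append(key)
        else:
            # Kept aside so the user can be asked what to do with them.
            missing.append((row["unit"].strip(), *key))
    return dict(units), missing


units, missing = plan_import(SAMPLE_CSV, KNOWN_ITEMS)
print(units)
print(missing)
```

Reimporting for an existing deck could reuse the same function and simply compare the new `units` against what the deck already contains.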

Flutter · November 20, 2024, 12:59pm

I just finished a rough (but working) version of a script to automatically extract all vocab from a manga into a csv file to be imported to Bunpro: GitHub - Fluttrr/manga-wordlist-extractor

Can’t wait for the import feature to be released now, hope this script will help other people too!

eefara · November 20, 2024, 2:22pm

Thank you so much! Will definitely have to use this once CSV imports get done. Do you (or anyone else) know if there’s an equivalent for novels?

Flutter · November 20, 2024, 3:48pm

If you have a PDF that you can just copy all the text out of, I could easily make it work with text files too. If they’re images this script might work with them? The models are trained on manga but some of them should work okay on other forms of media too. Otherwise I’m not aware of any other options, but I also haven’t really done any research on anything other than manga (because anything else is covered pretty well by jpdb)

eefara · November 20, 2024, 4:01pm

I might be able to extract the novel file from an epub. Never tried it, but it seems like something that should be doable. Extracting from an epub would be the best solution, I would think (less need to convert to other formats), but if your text extractor is able to accommodate PDFs, all the better.

Flutter · November 20, 2024, 4:00pm

I just mean any format that you can open in some sort of viewer, hit CTRL+A, copy all the text and just put it into a text file. Actually working with PDFs or epubs directly is probably a pain, unless there’s a way to just automatically extract all the text in one go; I’d have to look into that. But implementing a pure text extractor would probably be really easy. (I say probably because the thing I’m using to get vocab from sentences is pretty flawed with verbs, I’ll just have to look into alternatives before the feature actually comes out)

eefara · November 20, 2024, 4:03pm

Whoops, I must’ve misread your original post, haha. Anyway, would definitely be something I’d need to look into, seeing how easy it would be to set it up so the entire novel is captured in ctrl + a.

simias · November 21, 2024, 10:11am

There are a bunch of tools and packages you can use to convert PDFs to plain text, either through the command line or Python modules. See for instance PDFMiner.

There are also Python libraries to manipulate epub files, but I have no familiarity with them: EbookLib · PyPI
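
As a rough sketch of the "extract all the text in one go" idea: an epub is a zip archive of (X)HTML files, so the text can be pulled out with only the Python standard library. This deliberately does not use EbookLib, and it reads files in archive order rather than the book's declared reading order, which is an assumption that may not hold for every epub. The `pdf_to_text` helper relies on `extract_text` from the third-party pdfminer.six package and only works if that package is installed.

```python
import io
import zipfile
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collects the text nodes of an (X)HTML document, skipping script/style."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.parts.append(data.strip())


def epub_to_text(source):
    """Return the text of every .xhtml/.html file in an epub, in archive order."""
    chunks = []
    with zipfile.ZipFile(source) as archive:
        for name in archive.namelist():
            if name.lower().endswith((".xhtml", ".html", ".htm")):
                collector = _TextCollector()
                collector.feed(archive.read(name).decode("utf-8", errors="replace"))
                collector.close()
                chunks.extend(collector.parts)
    return "\n".join(chunks)


def pdf_to_text(path):
    """PDF counterpart; needs the third-party pdfminer.six package installed."""
    from pdfminer.high_level import extract_text
    return extract_text(path)


# Build a tiny fake epub in memory to show the call.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as demo:
    demo.writestr("chapter1.xhtml", "<html><body><p>今日は何処に行く?</p></body></html>")
buffer.seek(0)
print(epub_to_text(buffer))
```

The resulting string can be saved to a text file and fed to whatever vocab extractor you use.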

Flutter · November 21, 2024, 10:35am

Thank you a lot, I’ll look into that!

danyramdas · November 21, 2024, 4:24pm

Hi, thank you for doing this. Maybe @Asher or @Jake could contact you to get the missing vocab added.

I think this will be really beneficial for everyone learning Japanese!

This is indeed exactly why it’s great that this exists. So thank you so much to the whole team.

@Jake you asked what is needed for the import tool? Well, I have an idea.

The great book club users on WaniKani use Excel sheets for vocab (I think most of them, if not all, use the same format). It would be great if this could be added as an import template structure.

I would hope that much of the data is preserved, such as the useful remarks, so that it is as useful as it can be. This would also serve as a backup in case said forums are deleted.

If all the data is added, it could also mean book club users will start using this in addition.

If not, I will not be the only one making the same vocab list on here.

Hence the request to support the format WaniKani uses.

DeclanF · November 22, 2024, 2:39am

I looked into seeing if I could get game data into custom decks; however, the issue is more figuring out where spaces are inside Japanese sentences than extracting the text. If anyone has any tools for extracting words from text, or even if Bunpro could use the massive vocabulary data they have to allow importing from plain text, I would appreciate it.

Flutter · November 22, 2024, 8:49am

I found the best way is using MeCab along with some dictionary. You can look at the tokenizer.py in the repository I posted in this thread a few posts ago. If you’re using Python, I think you could even just import my package and use the method directly to turn a list of strings into a set of vocab items.
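
For anyone who wants to see the general shape of this, here is a minimal sketch. It is not the tokenizer.py from the repository. It assumes the third-party mecab-python3 package plus a dictionary (for example unidic-lite) are installed, and the helper names are made up for illustration. MeCab's wakati output gives surface forms, so conjugated verbs are not reduced to dictionary form here.

```python
from collections import Counter


def mecab_tokenizer():
    """Return a function that splits a sentence with MeCab.

    Assumes the third-party mecab-python3 package plus a dictionary
    (for example unidic-lite) are installed.
    """
    import MeCab
    tagger = MeCab.Tagger("-Owakati")  # wakati = space-separated surface forms
    return lambda sentence: tagger.parse(sentence).split()


def count_words(sentences, tokenize):
    """Count how often each token appears across all sentences."""
    counts = Counter()
    for sentence in sentences:
        counts.update(tokenize(sentence))
    return counts


def to_tsv(counts):
    """Format counts as 'word<TAB>count' lines, most frequent first."""
    return "\n".join(f"{word}\t{n}" for word, n in counts.most_common())


# Demo with a stand-in tokenizer (pre-spaced text) so this runs without MeCab.
# With MeCab installed, pass mecab_tokenizer() and raw sentences instead.
demo = count_words(["今日 は 何処 に 行く", "今日 は 妖怪 探し"], str.split)
print(to_tsv(demo))
```

Keeping the tokenizer as a parameter makes it easy to swap MeCab for another segmenter later.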

danyramdas · November 22, 2024, 10:36am

I’m slowly working on this. I use the vetted list at jpdb.io as the source.
Currently 4 chapters done.

nemuikamu · November 22, 2024, 8:57pm

this looks amazing!
I’m excited to get decks related to manga/anime/movies to help me study phrases and vocab that’s related specifically to the content

in particular, I’d like to see ones for Solo Leveling and Perfect Days

DeclanF · November 23, 2024, 2:21am

Thanks! I went a little crazy and made a tab-separated CSV file for myself of the vocabulary from Yo-kai Watch 1, since I’ve heard that game is good for beginners (and it’s fun).

github.com

Declan-F/Shape-Lagging-Elsewhere/blob/main/finaltext.csv

さ	394
ケータ	31
君	245
で	2131
は	3731
妖怪	837
探し	14
と	1845
参る	11
ます	1888
か	2882
うん	221
今日	199
何処	165
に	4454
行く	837
至る	3
所	230
潜む	15
て	5387

This file has been truncated.
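
A file in this word/count layout is easy to post-process. The sketch below is only an illustration with made-up helper names: it parses the tab-separated lines and drops one-character hiragana tokens (mostly particles such as で and は), which is a crude heuristic and an assumption about what is worth studying, then sorts the rest by frequency.

```python
import csv
import io

# First rows of the excerpt above, tab-separated: word, count.
SAMPLE = "さ\t394\nケータ\t31\n君\t245\nで\t2131\nは\t3731\n妖怪\t837\n"


def load_counts(text):
    """Parse 'word<TAB>count' lines into a list of (word, count) tuples."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    return [(row[0], int(row[1])) for row in reader if len(row) == 2]


def is_single_hiragana(word):
    """True for one-character hiragana tokens, which are mostly particles."""
    return len(word) == 1 and "\u3041" <= word <= "\u3096"


def study_candidates(rows):
    """Drop single-hiragana tokens and sort the rest by descending count."""
    kept = [row for row in rows if not is_single_hiragana(row[0])]
    return sorted(kept, key=lambda row: row[1], reverse=True)


print(study_candidates(load_counts(SAMPLE)))
```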

AndrewA · November 23, 2024, 3:10am

This is cool! Someone should make Japanese from Zero decks (someone might be me).