Thank you for removing the limit, it is much easier to import now.
I made a zip with all the words from the anime present on jpdb, converted into both supported formats. I think it could be useful.
We’re going to need a way to skip/blacklist words.
With scrapers and CSV importing, there’s going to be a lot of words included that are either wrong (wrong word pulled), just filler words and various laughs or exclamations, or words users just don’t care to learn through vocab decks (looking at you, particles). The deck creator can delete at will to their liking, but other users are currently forced to deal with them by either adding to master or having a forever incomplete deck progress.
A skip for later feature would be nice in general, but I’m thinking also a blacklist where you can add words you don’t want thrown at you ever again.
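To make the blacklist idea concrete, here is a minimal sketch of how blacklisted words could be left out of both the review queue and the progress count, so the deck can still reach 100%. The function names and the plain-list data shapes are assumptions for illustration, not how Bunpro actually stores decks.

```python
def filter_blacklisted(deck_words, blacklist):
    """Return the deck's words in their original order, minus blacklisted ones."""
    blocked = set(blacklist)
    return [word for word in deck_words if word not in blocked]


def deck_progress(deck_words, learned, blacklist):
    """Fraction of non-blacklisted words that are learned (hypothetical metric)."""
    remaining = filter_blacklisted(deck_words, blacklist)
    if not remaining:
        return 1.0  # nothing left to learn counts as complete
    done = sum(1 for word in remaining if word in learned)
    return done / len(remaining)


deck = ["食べる", "あはは", "は", "飲む"]
blacklist = {"あはは", "は"}  # a laugh and a particle the user never wants to see
print(filter_blacklisted(deck, blacklist))
print(deck_progress(deck, learned={"食べる", "飲む"}, blacklist=blacklist))
```

With something like this, a user-level blacklist would not touch the deck creator’s word list at all; it would only change what each user is shown and how their progress is counted.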
I’m trying to curate my decks with every unit imported, but goodness, I never knew there were so many different ways to add laughter to vocab.
A giant import of ~18k words worked until I tried to save them. After around 25%, I started getting this error logged in the console dozens of times a second and nothing was happening:
EDIT: Same thing happened again on the second try. Here’s some more stuff about it:
This is the network tab around the time the error started happening:
The fetch errors look a bit different this time:
This is the Response Header of one of those POST requests that keep failing:
Not a PITA at all. It was late, and what I wrote wasn’t clear. Here goes (with pics !)
If I first create an empty deck, and then manually create units
When I use the import tool to add words into already created units, I end up with the original unit and overflows.
I can edit the overflow units, but not the main one (here 神家没落), because there is no edit button for it.
After I’ve changed the titles, it ends up looking like this
This is with all three original units manually created and with vocab imported afterwards. Three bugs :
- I can’t edit the title of the main units (so I have a lot of “Pt 2” units, but no “Pt 1”)
- All the overflow units are at the end, and I have to manually re-order them
- Overflow units are created even though the preceding unit isn’t full : I guess the parser first checks that there are, let’s say 755 words in the import, creates two units of 500 and 255 items, and only then checks for duplicates in the whole decks, deleting any, so I end up with two units of, for example, 233 and 211 items
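The third bug is only a guess about the parser’s order of operations, but the fix it implies can be sketched: remove duplicates first, then split into units. This is a minimal sketch under that assumption; `split_into_units` and its arguments are hypothetical names, and only the 500-item limit comes from this thread.

```python
UNIT_LIMIT = 500  # per-unit cap discussed in this thread


def split_into_units(import_words, existing_words, limit=UNIT_LIMIT):
    """Drop duplicates first, then chunk, so only the last unit can be short."""
    seen = set(existing_words)
    fresh = []
    for word in import_words:
        if word not in seen:  # skips words already in the deck or earlier in the import
            seen.add(word)
            fresh.append(word)
    return [fresh[i:i + limit] for i in range(0, len(fresh), limit)]


# 755 imported words, 311 of which are already somewhere in the deck.
existing = [f"w{i}" for i in range(311)]
incoming = [f"w{i}" for i in range(755)]
units = split_into_units(incoming, existing)
print([len(unit) for unit in units])
```

In this made-up example the 444 remaining words fit in a single unit, instead of ending up as two half-empty units of 233 and 211 items.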
If I now create the same, empty, main deck
and upload all the vocabulary at once, creating units via the importer
I can change the title of any units (because now every unit has an “edit” button), overflow units are in order, and there are no “main” units with less than 500 items (bug #3 has gone away)
and my whole deck ends up looking like this :
(much neater !)
TL;DR :
- if you first manually create the units, and then import, you can’t change the main unit title right after importing because there is no “edit” button (EDIT : I’ve found out that you can edit it, but only afterwards if you return to the main deck screen)
- if I import unit by unit, I guess the parser first checks the size of the import, and only after checks for duplicates, and the final number of items in each unit can get weird
Of course, it’s better if I import all the vocabulary at once, but it might not be always easy/possible. If the 500-item limit for units goes away, I guess all those bugs go away too…
Or a way to suspend/clear words by Bunpro tags, at least ? For example, freezing all words tagged N3 and below, or onomatopoeia, or whatever ?
I think if the Unit limit of 500 is going to persist, there should at least be a solution that automatically gives the auto-generated units useful names. It’s quite annoying to have to go through a dozen units and manually rename each one to “Part 1”, “Part 2”, etc. when this could easily be done during the import. If the unit already has a title and is being split up, I think the “Part 1” could be added in parentheses after the unit name. All in all, though, this is really just a quality-of-life thing that would bother me if I’m trying to create a bunch of decks in a row (mostly for movies).
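As a rough illustration of the naming suggestion, here is a minimal sketch that appends “(Part N)” to a title only when the unit actually gets split. The function name and the exact title format are assumptions, not anything the importer does today.

```python
def name_units(base_title, unit_count):
    """Return one title per unit; only add "(Part N)" when there is a split."""
    if unit_count <= 1:
        return [base_title]
    return [f"{base_title} (Part {n})" for n in range(1, unit_count + 1)]


print(name_units("Episode 1", 1))
print(name_units("Episode 1", 3))
```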
I have been working on an MGS deck, and I try to check every word to make sure there isn’t anything weird in there. One thing I noticed that was a little annoying is how the tool sometimes grabs words from the E1 or A lists instead of the JLPT list first.
For example, for something like 服, instead of showing 服 (JLPT N5) | Bunpro, it was showing 服 | Bunpro first. It happened a lot, but the ones I made notes of were words like:
汗: showing 汗 | Bunpro instead of 汗 (JLPT N3) | Bunpro first.
現代: 現代 | Bunpro instead of 現代 (JLPT N3) | Bunpro.
スイッチ: スイッチ | Bunpro instead of スイッチ (JLPT N5) | Bunpro.
I think it should give priority to JLPT words before anything else, and maybe the “A” ones after it, and the “E” ones last.
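The priority order suggested here (JLPT first, then “A”, then “E”) could look something like the sketch below. The `source` labels and the dictionary shape are assumptions made up for the example; the real word lists may be tagged differently.

```python
def source_rank(label):
    """Lower rank wins: JLPT lists, then "A" lists, then "E" lists, then anything else."""
    if label.startswith("JLPT"):
        return 0
    if label.startswith("A"):
        return 1
    if label.startswith("E"):
        return 2
    return 3


def pick_entry(candidates):
    """Pick the preferred entry among several matches for the same word."""
    # min() keeps the first candidate on ties, so the original order breaks ties.
    return min(candidates, key=lambda entry: source_rank(entry["source"]))


matches = [
    {"word": "服", "source": "E1"},       # hypothetical label for the non-JLPT entry
    {"word": "服", "source": "JLPT N5"},
]
print(pick_entry(matches)["source"])
```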
I used the Japanese text import tool, so I don’t know if this happens with the JMDict one too.
Wouldn’t it be better to prompt the user to pick which one they want? I could easily see 現代 | Bunpro or スイッチ | Bunpro being used in a book/piece of media instead of the more common definition.
@veritas_nz thank you for the heads up! Been a bit busy this weekend, but I’ll give the importer another shot! o7
That could be good too; I guess it depends on the media you are consuming. But a prompt for every one, and having to check and pick each manually, would be a gigantic task. I think JLPT should be the default, but I guess having that option too wouldn’t hurt!
You are 100% correct. Perhaps there could be a setting per user? Default to one word list or the other, or ask for every word in doubt? That way users who are aiming for a particular word list as a source don’t need to manually confirm everything.
@veritas_nz Did another test upload of 1,994 items; it went very smoothly this time!
Feedback:
- I’m seconding the issue @Magyarapointe is seeing with un-editable pre-created Unit names and Units not having the max of 500 items before auto-generating new ones.
- I’m quite happy to see the importer auto-correcting katakana → kanji words (ex: ロンドン to 倫敦, ジブラルタル to 日巴拉太; why is this word even in Bunpro’s vocab list, I’m impressed) and some hiragana → kanji (ex: あらゆる to 有らゆる, ほど to 程), but I notice it still doesn’t seem to like auto-converting katakana to a hiragana equivalent (ex: マジ → まじ, ダラダラ → だらだら), and doesn’t seem to be able to catch converting a common word written in katakana (when you might expect hiragana) into the kanji that Bunpro teaches (ex: カモメ in the book not converted to 鴎).
- (Not really feedback, but just wanted to mention that I’m sorry if all my nitpicking about converting vocab written in stylistic ways is way off base and a huge pain with the vocab that Bunpro is referencing against; I haven’t actually used the Bunpro vocab feature extensively, so I’m not actually super familiar with its setup.)
- I was searching to replace ハッキリ with はっきり, but the latter is not selectable, despite showing up in search:
I think this one at least is a case of はっきり already existing somewhere in the word list as well as ハッキリ; I just did a ctrl + f on the document I imported and they both show up. Could some sort of feedback be implemented to let the user know this is the reason why Bunpro won’t let me select that option?
- I like the new confirm on delete dialog box added for individual items, but could one be added for the Delete All button as well?
- Would it be possible to have a button to re-adjust Units? So what I’m imagining is: let’s say Unit 1.1 has 500 items and Unit 1.2 has 500 items and Unit 2.1 has 500 items. After cleaning up the auto-import, Unit 1.1 now has 450 items. Would it be possible to have some selection boxes and a button to allow me to shuffle the top 50 items from Unit 1.2 into Unit 1.1, retaining whatever word order I had before and leaving 2.1 alone? This is more of a nice-to-have than anything else, but thought I’d toss it in the pile.
- Related not important nice-to-have: Same scenario as above, but the user wants to retain a 480 item maximum for some arbitrary reason in Unit 1.1, so instead of filling all 50 missing item slots, the Unit reorganizer lets them type in to add only 30 items from Unit 1.2 into 1.1.
- Back to actual issues I’d like to bring up: I saved my test list, went back to the deck edit screen, then realized that 風 should have been parsed as かぜ and not ふう. Once I go back to the import screen, though, I’m no longer able to re-search an alternative for 風 like I was able to before saving. Would it be possible to allow us to do so? Or is the only alternative to delete the wrong word and add the correct one? I’m potentially concerned about large word lists that you want to keep in a specific order. 風 just happened to be at the top here, but what if it’s #228 out of 500 words? I suppose I could add the new word, drag it up next to the incorrect one, then delete the incorrect one… But wanted to ask about the viability of just modifying the incorrect one directly anyway.
- Potential issue for smaller word lists: when importing small amounts of words, I can repeatedly click on the “Import Successful!” button and generate identical Units over and over, and the dialog box doesn’t leave until I stop:
(All the “New” Units there are me mashing the button.) I don’t know if you can do this for larger (500+) word lists or not, and I don’t want to try and accidentally lock up my computer, haha.
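For the Unit re-adjust idea in the list above, here is a minimal sketch of what such a button could do behind the scenes: take items from the top of the following Unit until the target size is reached, keeping word order. The function name and the `target` and `max_move` parameters are hypothetical.

```python
def top_up(unit, donor, target=500, max_move=None):
    """Move items from the top of `donor` into `unit` until `unit` reaches `target`.

    Returns new (unit, donor) lists; word order is preserved in both.
    """
    room = max(target - len(unit), 0)
    if max_move is not None:
        room = min(room, max_move)  # optional user-chosen cap on items moved
    return unit + donor[:room], donor[room:]


unit_1_1 = [f"a{i}" for i in range(450)]  # 450 items left after cleanup
unit_1_2 = [f"b{i}" for i in range(500)]

filled, rest = top_up(unit_1_1, unit_1_2)
print(len(filled), len(rest))

# The 480-item variant: only 30 items are pulled over.
filled, rest = top_up(unit_1_1, unit_1_2, target=480)
print(len(filled), len(rest))
```

Unit 2.1 is never passed in, so it stays untouched, which matches the scenario described.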
I think other than comments on Bunpro not catching vocab to auto-convert, I’m pretty happy with the 1.0 version of this. There’s a fair amount of manual work needed from the user for large word lists, but that’s inevitable.
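On the katakana → hiragana point, the conversion itself is mechanical, because the two Unicode blocks are laid out in parallel (each katakana from ァ to ヶ sits 0x60 above its hiragana counterpart). Here is a minimal sketch of a fallback lookup built on that; `lookup` and the `vocab` set are hypothetical stand-ins for whatever Bunpro matches against.

```python
KATAKANA_START = 0x30A1  # ァ
KATAKANA_END = 0x30F6    # ヶ
KANA_OFFSET = 0x60       # distance between the katakana and hiragana blocks


def to_hiragana(text):
    """Convert katakana to hiragana; other characters (kanji, ー, etc.) pass through."""
    return "".join(
        chr(ord(ch) - KANA_OFFSET) if KATAKANA_START <= ord(ch) <= KATAKANA_END else ch
        for ch in text
    )


def lookup(word, vocab):
    """Try the word as written, then its hiragana form (hypothetical fallback)."""
    if word in vocab:
        return word
    as_hiragana = to_hiragana(word)
    return as_hiragana if as_hiragana in vocab else None


vocab = {"まじ", "だらだら", "はっきり"}
print(lookup("マジ", vocab), lookup("ダラダラ", vocab), lookup("ハッキリ", vocab))
```

This only covers the kana-to-kana case; mapping カモメ to 鴎 would still need a reading-based dictionary lookup.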
Edit: Unrelated to importing, but on the deck edit screen, I just noticed that the X button for editing a Unit doesn’t seem to work. Clicking it doesn’t close the dialog box, but clicking outside the dialog box does:
First of all, sorry for all the lost time.
Not sure what this could be…
A few questions to begin:
- Roughly what time/date did you start this save process?
- Were you doing the episode uploads in batches? Or all 18k at the same time?
- How many items/Units were you trying to save at once?
- You didn’t manage to check the duration of each request? Were they all sub-30 seconds?
- Did you somehow go above the 500 items-per-Unit limit? Or were they all sub-500 items?
I think that’s all the questions I can think of.
Sorry for the laundry list!
If you could also please DM me the text that you uploaded, I’ll try myself on my end. 🙇♂️
Thanks for the detailed analysis!
- I’ve found out that you can edit it, but only afterwards if you return to the main deck screen
Yes this is one of the quirks of the Importer.
The Edit Deck and Import steps are separate (one exists on the old system, the importer on the new).
The more this stuff keeps coming up though, it might just be worth merging them into one big Edit/Import page now rather than later.
As it is now, it’s just confusing.
- if I import unit by unit, I guess the parser first checks the size of the import, and only after checks for duplicates, and the final number of items in each unit can get weird
Looking at your screenshots, it looks like you had 4 Overflow Units, and then after you edited the titles, it increased to quite a few more?
I’m guessing the sub-500 count is because you are removing items that weren’t found?
@veritas_nz
No, there are more overflow units because I took a screenshot after importing vocabulary for other units too. That’s all, and that was to also show that the importer, if you do it by unit, will put all overflow units at the end of the list, instead of having them after the unit they overflowed from.
And no, I didn’t remove anything, and there were, at most, 1 or 2 missing items (the JMDICT id importer is very effective !) It’s the importer removing duplicates from decks only after creating overflow units… Compare this screenshot after importing without manually creating units, letting the importer take care of it : the only units with fewer than 500 items are the final overflow units for each main unit. And I used the same vocabulary list, but fed it differently to the importer (unit by unit in the first case, as a whole in the second). I did absolutely no editing, it’s all the importer’s doing…
When there are multiple possible results, I can maybe mark them as orange (instead of red/green).
That way it can be visually scanned more easily, and doesn’t block the save-step.
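A minimal sketch of that three-colour idea, assuming green means exactly one match and red means no match (those two meanings are my assumption; only the orange case is described above). The function names are hypothetical.

```python
from collections import Counter


def match_status(candidates):
    """Classify an imported word by how many dictionary matches it has."""
    if not candidates:
        return "red"     # assumed: nothing found
    if len(candidates) == 1:
        return "green"   # assumed: one unambiguous match
    return "orange"      # several possible results, worth a quick visual check


def summarize(results):
    """Count statuses so the user can see how many words need a look."""
    return Counter(match_status(candidates) for candidates in results.values())


results = {
    "服": ["服 (JLPT N5)", "服"],
    "汗": ["汗 (JLPT N3)"],
    "あはは": [],
}
print(summarize(results))
```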
- I started around the time I made my post, I’d say roughly 30 minutes before it
- I did all 18k words at the same time
- I was saving all of them at once
- I didn’t check the time each request was taking sadly, but all the batches seemed quite a bit below 30 seconds
- None of the units were above 500 words
Ah, that would be great!
That sounds good!
How do I replace a word in a unit without having to delete it and add it again? I think that would change the position of the word too.