Community Decks (Beta) - Nov 19th 2024

Of course, just provide the folder with the epubs and use the --separate option (plus --id for Bunpro specifically). This works for all file types: it will create a section for each file (or each sub-folder in the case of manga).

2 Likes

I couldn’t get it to work, but it’s not your program; it’s something Calibre does when splitting (everything works fine with an unsplit epub). I ended up reconverting to .txt and making a folder of those, and then it worked.

2 Likes

If you could send me the epub files that don’t work, maybe I could try to troubleshoot it, but if it really is Calibre then that’s a shame. Maybe there are other ways to do it?

2 Likes

Sure, I’ll send you the zipped folder with the split epubs inside. I tried running your program on the (unzipped) folder with --separate, and later tried running it on each file individually; I always ran into the same error:

/ebooklib/epub.py", line 358, in get_body_content
if len(html_root.find('body')) != 0:
TypeError: object of type 'NoneType' has no len()

But running on a folder with text files with flags --separate --id --type txt worked flawlessly.

And as you suggested, I asked ChatGPT… which suggested a patch to epub.py, which caused another error to appear… at which point I gave up.

But I really think it somehow has to do with manipulating the epub file in Calibre, because when I tried with another epub that I had split, then merged, then reconverted, it gave a different error. I guess epub editing is just above my pay grade…

Just send me an email and I’ll send you the file (it’s a burner account, and since the novel is not public domain, I don’t want to send the whole text with a public link here…)

2 Likes

So I finally uploaded the whole vocab list for the novel I’m reading, by chapter, and I ran into a couple of strange things:

  • The first import I did was over 500 items (I hadn’t checked beforehand): it filled the Unit I had created and spilled over into a new one, but the first unit was filled with 501 items, which meant I couldn’t save it. I deleted one item, and only then could I save. The extra item had been added correctly; there wasn’t a problem with parsing or recognizing it.
  • After that, I did several other imports of over 500 items. This behaviour (first unit filled with 501 items) didn’t happen again, but Bunpro apparently created the extra units before checking for duplicates: that is, I ended up with, for example, 3 units (the one I had created, and two “overflow” units), but all were well below 500 items. (One import ended with two units of 335 and 319 items, one ended with three units of 237, 237 and 62 items, and the last one ended with two units of 130 and 97 items.)
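
A rough sketch of the order one would expect instead (drop duplicates first, then split into units). This is only an illustration of the idea, not how Bunpro's importer actually works; `split_into_units` and `already_in_deck` are made-up names, and the 500 limit is the one mentioned in this thread.

```python
from typing import Iterable, List

UNIT_LIMIT = 500  # per-unit item cap mentioned in this thread


def split_into_units(
    words: Iterable[str],
    already_in_deck: Iterable[str] = (),
    unit_limit: int = UNIT_LIMIT,
) -> List[List[str]]:
    """Deduplicate first (keeping first-seen order), then chunk into units."""
    seen = set(already_in_deck)
    unique: List[str] = []
    for word in words:
        if word not in seen:
            seen.add(word)
            unique.append(word)
    # Chunking after deduplication means only the last unit can be short.
    return [unique[i:i + unit_limit] for i in range(0, len(unique), unit_limit)]
```

With this order, an import of 1200 entries that contains only 600 distinct words would give one full unit of 500 and one of 100, rather than several half-empty units.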
3 Likes

Should be fixed now if you update to v1.2.1! I just made it skip the messed up documents, from looking into it those should contain no text anyway.
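
For anyone curious what "skipping the messed up documents" can look like: the traceback above comes from a document with no body element at all. Below is a minimal, standard-library-only sketch of that kind of guard. It is not the actual v1.2.1 change and it does not use ebooklib; `body_text` and `usable_documents` are hypothetical helpers for illustration.

```python
from html.parser import HTMLParser


class _BodyText(HTMLParser):
    """Collects text found inside <body>, and records whether <body> exists."""

    def __init__(self):
        super().__init__()
        self.in_body = False
        self.saw_body = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "body":
            self.in_body = True
            self.saw_body = True

    def handle_endtag(self, tag):
        if tag == "body":
            self.in_body = False

    def handle_data(self, data):
        if self.in_body:
            self.chunks.append(data)


def body_text(markup):
    """Return the body text, or None when the document has no <body>."""
    parser = _BodyText()
    parser.feed(markup)
    parser.close()
    if not parser.saw_body:
        return None
    return "".join(parser.chunks).strip()


def usable_documents(documents):
    """Keep only (name, text) pairs for documents that contain some body text."""
    kept = []
    for name, markup in documents:
        text = body_text(markup)
        if text:  # skips both a missing body (None) and an empty one ("")
            kept.append((name, text))
    return kept
```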

EDIT: I made a thread to properly announce the program, any further discussion about it should definitely be moved to that thread: Vocabulary Extractor: Make your own Decks from Manga/Anime/EBooks

4 Likes

Yes! It works great! Thank you once more!!!

1 Like

When you import decks, all missing vocab is thrown to the bottom of the deck, which is great for fixing them all in one place. But after you fix them and replace them with what you’re looking for, they stay at the bottom of the deck, which messes with the chronological order in which they appear in the book. Trying to reorder them takes too much time, both to find the location each one should have been in and to drag it up through 500 items.

Is there a way to maintain an item’s original location in the import after the missing vocab error is fixed?
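
To make the request concrete, here is a small sketch of one way it could work: keep a placeholder at the original position for anything that wasn't matched, and fill it in place later. This is purely an illustration and says nothing about how Bunpro stores decks; `DeckSlot`, `import_in_order`, `unresolved` and `resolve` are hypothetical names, and the IDs in the usage example are invented.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class DeckSlot:
    source_text: str                 # the word as it appeared in the import
    vocab_id: Optional[int] = None   # None means "not matched yet"


def import_in_order(words: List[str], lookup: Dict[str, int]) -> List[DeckSlot]:
    """Create one slot per word, in source order, matched or not."""
    return [DeckSlot(word, lookup.get(word)) for word in words]


def unresolved(slots: List[DeckSlot]) -> List[int]:
    """Positions that still need fixing, so they can be listed in one place."""
    return [i for i, slot in enumerate(slots) if slot.vocab_id is None]


def resolve(slots: List[DeckSlot], index: int, vocab_id: int) -> None:
    """Fill a placeholder without moving it."""
    slots[index].vocab_id = vocab_id
```

Usage: `slots = import_in_order(["猫", "ねこぢる", "犬"], {"猫": 1, "犬": 2})` leaves position 1 unresolved; after `resolve(slots, 1, 99)` the item is still in the middle, where it appeared in the book.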

4 Likes

Would it be an idea to have an optional field for a custom study sentence in the import process? It would allow the user to preserve the context of where the word came from.

It might be awkward to fit it in conceptually between shareable custom decks and personal custom sentences, but I thought it’d be an interesting feature.

5 Likes

Are we able to add grammar items to custom decks or just vocab?

2 Likes

You can add Grammar, just not through the importer.

2 Likes

Ah that makes sense, thanks (⁠.⁠ ⁠❛⁠ ⁠ᴗ⁠ ⁠❛⁠.⁠)

2 Likes

Any idea when maintenance will be done on importing decks? I cooked up some Ace Attorney CSVs I can’t wait to throw in the deck pool.

I did get half a chapter of Harry Potter done before the maintenance, which had to be broken into parts because of the 500-item unit limit. I can see this number being a lil low for book chapters or anime episodes. Don’t suppose increasing the number of vocab allowed in units is part of the maintenance :eyes:

3 Likes

Probably back on the live site without any changes tomorrow.
We thought it was the thing causing the server issues we were having, but it wasn’t.

Was working on adding the “Add to Deck” button for the individual Grammar/Vocab pages + on the Search results page.
So that should be out tomorrow at least.

2 Likes

Made a note of the other stuff, but just wanted to mention:
The parser only searches for Vocab, not Grammar!
Hence the ぞ stuff not working

If I accidentally delete something right now it’s gone forever

Probs not gonna implement saving of the not-found items, but I will add a confirmation modal before allowing the deletion of items.

3 Likes

Do you have the text that generated this error?
If so I will try to reproduce and fix it before re-publishing the importer.

1 Like

I assume it’s the process of what I was doing rather than the list specifically so I’ll describe that in detail as well, but here you go: A Silent Voice Vocab · GitHub

What I did was manually copy-paste around 250 words at a time from the file, and import each batch separately. I didn’t rename any of the units at first, so I ended up with 5 units with the same name in the import page. As I tried to rename the first one, all the ones below it duplicated and turned into the frozen state. The duplicate ones were inserted between the edited unit and the other 4 that still worked at the bottom, if I remember correctly. If I then edited the next editable unit (the 2nd one, now at position 6 I believe), the same thing happened with the units below that one, and so on until I finished all of them. Only editing the last one didn’t cause any problems.

Also, another thing I’d like to add: If I remember correctly, any time I tried to import those 250 items, it lost a couple of them even though they were not shown as a fixable mistake. I ended up with a couple words missing from every Unit even though the JMDict IDs should all be entirely unique. Maybe it’s very similar words being consolidated? Or counters being combined into just the standalone counter vocab?
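
A quick way to narrow this down before reporting is to check the list itself for repeated IDs and to compare what was submitted against what ended up in the unit. The sketch below assumes both sides are available as plain lists of IDs; `duplicate_ids` and `missing_after_import` are made-up helper names, not part of any tool mentioned here.

```python
from collections import Counter
from typing import Dict, Hashable, List, Sequence


def duplicate_ids(ids: Sequence[Hashable]) -> Dict[Hashable, int]:
    """IDs that occur more than once in a batch, with their counts."""
    counts = Counter(ids)
    return {item: n for item, n in counts.items() if n > 1}


def missing_after_import(
    submitted: Sequence[Hashable], imported: Sequence[Hashable]
) -> List[Hashable]:
    """Submitted IDs that are absent from the imported unit, in source order."""
    imported_set = set(imported)
    seen = set()
    missing = []
    for item in submitted:
        if item not in imported_set and item not in seen:
            seen.add(item)
            missing.append(item)
    return missing
```

If `duplicate_ids` comes back empty but `missing_after_import` does not, the loss happened during the import rather than in the list.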

3 Likes

Wasn’t sure whether to mention it here or in suggestions, but when importing a CSV (without ID numbers), the wrong word will sometimes be chosen for kana words or even single-character kanji. That’s understandable, of course, but sometimes I’ll miss that it did this until after the import. To fix it, you then have to delete > add correct vocab > move into chronological spot.

Can there be a way to simply edit/exchange vocab and replace it with the correct vocab exactly where it is in the unit lineup? Being able to pull up the vocab chooser you see for missing items on import anytime during deck edit would be really useful

Also big thanks to the adjustment of how it handles missing vocab on import😭 my missed vocab doesn’t sit at the bottom of the unit, but actually where it belongs thank youuuuu

2 Likes

Just pushed a bunch of fixes for issues that have recently been mentioned.

Also switched to a “batching” system for both searching for items and saving content.
This should fix those mass-import timeouts.
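
For readers unfamiliar with the idea, batching just means handling the list in fixed-size slices so that no single request has to process the whole import. The sketch below is a generic illustration, not Bunpro's code; `batched`, `process_in_batches`, the `handle_batch` callback and the batch size of 100 are all assumptions.

```python
from typing import Callable, Iterator, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def batched(items: Sequence[T], size: int) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def process_in_batches(
    items: Sequence[T],
    handle_batch: Callable[[Sequence[T]], List[R]],
    size: int = 100,  # assumed value, chosen only for the example
) -> List[R]:
    """Run `handle_batch` on each slice and collect the results in order."""
    results: List[R] = []
    for batch in batched(items, size):
        results.extend(handle_batch(batch))
    return results
```

Each call to `handle_batch` stays small and bounded, which is why this kind of change tends to help with timeouts on very large imports.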

Tomorrow I will remove the per-Unit item limit (currently 500) and the total item limit (currently 2000).

8 Likes

I just tried this out with a bunch more anime and it works great! That ghost unit bug when renaming units still exists though, sadly. And of course it’ll be amazing once the limits are gone. Thank you so much for all the hard work!

2 Likes