[Request] Bridging the gap between Bunpro and Yomitan

I tried working on this. I scraped the Grammar Points page and generated a working dictionary. It only covers grammar points, though, not vocabulary.


Running python script.py does the following:

  1. Download (then cache) the Grammar Points Page
  2. Parse the page and generate a clean JSON of the data (then cache)
  3. Append new grammar points to the conjugation.csv file (used to customize recognition reading and inflection for each point)
  4. Use the JSON and the conjugation.csv file to generate a dictionary according to the schemas
  5. Output a file like Bunpro Grammar-2025-02-06.zip in the build/ directory.
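For anyone curious, the caching and packaging parts of those steps can be sketched roughly like this. This is not the actual script.py: the helper names `cached` and `write_dictionary` are made up for illustration, and the `index.json` fields (`title`, `format`, `revision`) are what I understand the Yomitan index schema to expect, so check them against the schemas before relying on this.

```python
import json
import zipfile
from datetime import date
from pathlib import Path
from typing import Callable


def cached(path: Path, produce: Callable[[], str]) -> str:
    """Return the text stored at `path`; call `produce` only on a cache miss."""
    if path.exists():
        return path.read_text(encoding="utf-8")
    text = produce()  # e.g. download the page or parse it into JSON
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(text, encoding="utf-8")
    return text


def write_dictionary(terms: list, build_dir: Path, today: date) -> Path:
    """Package an index and a single term bank into a dated zip in `build_dir`."""
    build_dir.mkdir(parents=True, exist_ok=True)
    out = build_dir / f"Bunpro Grammar-{today.isoformat()}.zip"
    # Assumed index fields; verify against the Yomitan index schema.
    index = {"title": "Bunpro Grammar", "format": 3, "revision": today.isoformat()}
    with zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("index.json", json.dumps(index, ensure_ascii=False))
        zf.writestr("term_bank_1.json", json.dumps(terms, ensure_ascii=False))
    return out
```

Passing the download or parse step in as `produce` keeps the cache logic in one place, so the same helper can be reused for both the HTML page and the cleaned JSON.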

The dictionary itself is pretty bad in its current state. It only works for non-conjugatable structures, so it will find です、これ、でしょう、好き、くらい、のがすき、らしい、ぜんぜん、かも, but it will not find かもしれません、あまり〜ない、真っ赤、しか〜ない, etc. So it's pretty much useless.
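To show where the conjugation support would plug in, here is a rough sketch of a single term bank entry. It assumes the version 3 term bank layout as I understand it (an 8-element array: term, reading, definition tags, rules, score, definitions, sequence, term tags). The function name `make_term_entry` is hypothetical, and the `adj-i` rule on the second entry is only an illustrative guess, not a checked value.

```python
def make_term_entry(term, reading, rules, glossary, sequence, score=0):
    """Build one term bank row in the assumed v3 layout.

    `rules` is a space-separated string of part-of-speech identifiers
    (for example "v1", "v5" or "adj-i"). An empty string means the term
    is only matched exactly as written, with no deinflection.
    """
    return [term, reading, "", rules, score, [glossary], sequence, ""]


# A point with no rule is only found in its literal form.
plain = make_term_entry("かも", "", "", "placeholder glossary", 1)

# A point with a rule can, in principle, be reached from inflected forms.
# "adj-i" here is a guess for illustration only.
inflecting = make_term_entry("かもしれない", "", "adj-i", "placeholder glossary", 2)
```

With every entry currently generated like `plain`, only exact matches are found, which would explain the behaviour described above.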

To make it better, it needs manual cleanup (of all 927 points): setting up the terms correctly, since it will never find “くらい ②” or “う-Verb (Dictionary)” in real text, adding readings where necessary, and adding inflection tags to support conjugations. It would also be good to scrape every individual grammar point page, to display more useful information directly, to pick up more variants of the same grammar point, and other things I can't think of right now. That's a ton of work and I probably won't be doing it anytime soon.
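A small part of that cleanup could be automated. The sketch below strips a trailing circled number (so “くらい ②” becomes “くらい”) and flags titles that still need a human, such as ones containing 〜 or Latin letters. Both function names are hypothetical and the heuristics are my own assumption; they are not in the current script.

```python
import re

# Trailing circled digits ① to ⑳ (U+2460 to U+2473), with optional whitespace before.
CIRCLED_SUFFIX = re.compile(r"\s*[\u2460-\u2473]$")


def clean_term(title: str) -> str:
    """Drop a trailing circled number from a grammar point title."""
    return CIRCLED_SUFFIX.sub("", title.strip())


def needs_manual_review(term: str) -> bool:
    """Flag terms that can never match text literally.

    A wave dash (U+301C) marks a gap in the pattern, and Latin letters
    usually mean the title is a description rather than a term.
    """
    return "\u301c" in term or re.search(r"[A-Za-z]", term) is not None
```

Anything flagged by `needs_manual_review` would still have to be fixed by hand in the CSV.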

The conjugation.csv file is currently empty; running the script for the first time will populate it. Fixing all the terms and adding readings and inflection tags in that file would be enough to make the dictionary usable. Beyond that, it would be a good idea to add a way to “exclude” certain super common points in the file, or perhaps to add other points as aliases with the same ID. If anyone wants to open a PR against the code or send an updated CSV file directly, I'll accept it.
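For reference, the "append only new points" behaviour could look something like the sketch below. The column names (`id`, `term`, `reading`, `rules`, `exclude`) are an assumption for illustration, including the `exclude` column, which does not exist yet; the real file may be laid out differently.

```python
import csv
from pathlib import Path

# Hypothetical column layout; "exclude" is the proposed opt-out flag.
FIELDS = ["id", "term", "reading", "rules", "exclude"]


def append_new_points(csv_path: Path, points: list) -> int:
    """Append points whose id is not already in the CSV; return how many were added."""
    seen = set()
    if csv_path.exists():
        with csv_path.open(newline="", encoding="utf-8") as f:
            seen = {row["id"] for row in csv.DictReader(f)}

    # Write the header only when the file is missing or empty.
    write_header = not csv_path.exists() or csv_path.stat().st_size == 0
    added = 0
    with csv_path.open("a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if write_header:
            writer.writeheader()
        for point in points:
            point_id = str(point["id"])
            if point_id in seen:
                continue  # keep rows that were already edited by hand
            writer.writerow({"id": point_id, "term": point["term"],
                             "reading": "", "rules": "", "exclude": ""})
            seen.add(point_id)
            added += 1
    return added
```

Because existing rows are never rewritten, manual fixes to terms, readings and rules survive later runs of the script.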

Anyway, during my research I found these Grammar Dictionaries, and I'll probably be using those instead for the near future, although a link to Bunpro would be cooler.
