Questions about JMDict and Bunpro's database

Edgar-BP · November 30, 2024, 5:41am

Sorry if this isn’t the right place to ask, but I’m going to ask here since it’s for “everything related to Bunpro”. This question is mainly for the staff (and any programmer with good ideas).

For a small SRS project of mine, I am trying to import the JMDict dataset into my SQL database for the same purpose Bunpro already uses it for: vocab lessons and reviews.
Since Bunpro’s implementation seems quite good, I wanted to know how the dataset is stored, rather than doing my own wobbly implementation from scratch.

My current implementation would be a vocab entry with kanji forms, kana forms and senses as foreign-key relations, following the structure of the dataset as closely as I can.
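
To make the relational layout described above concrete, here is a minimal TypeScript sketch of what the rows might look like and how they could be joined back into one entry. Every type, field name and the `assembleEntry` helper are hypothetical, invented for illustration; they are not taken from JMDict's own format or from Bunpro. The `jmdictId` value is a placeholder, not a real entry ID.

```typescript
// Hypothetical row shapes for a normalized layout: one entry row, with
// kanji forms, kana forms and senses each pointing back to it by entryId.
interface VocabEntry { id: number; jmdictId: string; }
interface KanjiForm { id: number; entryId: number; text: string; }
interface KanaForm { id: number; entryId: number; text: string; }
interface Sense { id: number; entryId: number; glosses: string[]; }

// The shape an application would probably want after joining the tables.
interface AssembledEntry {
  jmdictId: string;
  kanji: string[];
  kana: string[];
  senses: string[][];
}

// In-memory stand-in for the SQL joins: keep only the rows whose foreign
// key matches the entry, then flatten them into one object.
function assembleEntry(
  entry: VocabEntry,
  kanjiForms: KanjiForm[],
  kanaForms: KanaForm[],
  senses: Sense[],
): AssembledEntry {
  const ofEntry = <T extends { entryId: number }>(rows: T[]): T[] =>
    rows.filter((row) => row.entryId === entry.id);

  return {
    jmdictId: entry.jmdictId,
    kanji: ofEntry(kanjiForms).map((k) => k.text),
    kana: ofEntry(kanaForms).map((k) => k.text),
    senses: ofEntry(senses).map((s) => s.glosses),
  };
}

// Sample rows; the second kana row belongs to a different entry.
const entry: VocabEntry = { id: 1, jmdictId: "0" }; // placeholder ID
const assembled = assembleEntry(
  entry,
  [{ id: 10, entryId: 1, text: "食べる" }],
  [
    { id: 20, entryId: 1, text: "たべる" },
    { id: 21, entryId: 2, text: "のむ" },
  ],
  [{ id: 30, entryId: 1, glosses: ["to eat"] }],
);
```

The trade-off with this layout is that every lookup needs three joins (or three queries) to rebuild a single word.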

So my questions are:

What kind of database is Bunpro using? (postgresql, mongodb…)
How is the dataset implemented in the db?
Does Bunpro regularly update its JMDict data?

Thank you in advance!

veritas_nz · November 30, 2024, 7:18am

Hi there!

Thanks for the question.

We have Vocab as its own table in our Postgres DB.
On each Vocab, there are two fields: jmdict_id and jmdict_data.

We store the JMDict data as an object directly on jmdict_data.
The formatting for that comes from this third-party project.

This allows us to have clearly structured JMDict Word objects that we can parse, all while using their TypeScript type definitions.

We regularly update the DB using their releases.

TLDR: we don’t really implement the JMDict data in its own tables, but rather just tack each Word onto a field in our existing Vocab table.
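
For anyone wanting to try the same approach, here is a rough TypeScript sketch of reading a Word object off a single field. Only the names `jmdict_id` and `jmdict_data` come from the post above. The `JmdictWord` shape, the `VocabRow` type and both helper functions are assumptions made up for illustration; the real type definitions come from the third-party project and may look different. The ID is a placeholder.

```typescript
// Assumed, simplified shape of the stored Word object (hypothetical).
interface JmdictWord {
  id: string;
  kanji: { text: string }[];
  kana: { text: string }[];
  sense: { gloss: { text: string }[] }[];
}

// Hypothetical Vocab row: the whole Word lives in one JSON field.
interface VocabRow {
  id: number;
  jmdict_id: string;
  jmdict_data: unknown; // untyped until checked at runtime
}

// Runtime type guard: checks only the top-level fields this sketch uses.
function isJmdictWord(value: unknown): value is JmdictWord {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.id === "string" &&
    Array.isArray(v.kanji) &&
    Array.isArray(v.kana) &&
    Array.isArray(v.sense)
  );
}

// Returns the first kana reading, or undefined if the field is not a Word.
function primaryReading(row: VocabRow): string | undefined {
  const data = row.jmdict_data;
  if (!isJmdictWord(data)) return undefined;
  return data.kana[0]?.text;
}

// Simulate what a JSON column might hand back: a parsed object.
const rawJson =
  '{"id":"0","kanji":[{"text":"食べる"}],"kana":[{"text":"たべる"}],' +
  '"sense":[{"gloss":[{"text":"to eat"}]}]}';
const row: VocabRow = { id: 1, jmdict_id: "0", jmdict_data: JSON.parse(rawJson) };
const emptyRow: VocabRow = { id: 2, jmdict_id: "0", jmdict_data: null };

const reading = primaryReading(row);
const missing = primaryReading(emptyRow);
```

Compared with separate tables, this keeps each word in one place and needs no joins, at the cost of pushing validation of the stored object into application code.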