Thanks, I just gave it a go! I installed Python 3.12 as recommended, but something went wrong when I ran the tool: I got a couple of errors (below). Does this look like something on my end? To keep it simple, I put all of my manga JPG files into one folder and ran the tool on it, but the output was an empty CSV file:
C:\Users\zan>jpvocab-extractor --type manga C:\Users\~~\ALL_FILES
2025-01-07 02:17:44,722 | INFO | root - Extracting texts from C:\Users\~~\ALL_FILES...
2025-01-07 02:17:44,722 | INFO | root - Running mokuro with command: ['mokuro', '--disable_confirmation=true', 'C:/~~/ALL_FILES']
2025-01-07 02:17:44,722 | INFO | root - This may take a while...
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "C:\Users\zan\AppData\Local\Programs\Python\Python312\Scripts\mokuro.exe\__main__.py", line 4, in <module>
  File "C:\Users\zan\AppData\Local\Programs\Python\Python312\Lib\site-packages\mokuro\__init__.py", line 3, in <module>
    from mokuro.manga_page_ocr import MangaPageOcr as MangaPageOcr
  File "C:\Users\zan\AppData\Local\Programs\Python\Python312\Lib\site-packages\mokuro\manga_page_ocr.py", line 7, in <module>
    from comic_text_detector.inference import TextDetector
  File "C:\Users\zan\AppData\Local\Programs\Python\Python312\Lib\site-packages\comic_text_detector\inference.py", line 14, in <module>
    from comic_text_detector.utils.io_utils import imread, imwrite, find_all_imgs, NumpyEncoder
  File "C:\Users\zan\AppData\Local\Programs\Python\Python312\Lib\site-packages\comic_text_detector\utils\io_utils.py", line 11, in <module>
    NP_BOOL_TYPES = (np.bool_, np.bool8)
                               ^^^^^^^^
  File "C:\Users\zan\AppData\Local\Programs\Python\Python312\Lib\site-packages\numpy\__init__.py", line 427, in __getattr__
    raise AttributeError("module {!r} has no attribute "
AttributeError: module 'numpy' has no attribute 'bool8'. Did you mean: 'bool'?
2025-01-07 02:17:55,287 | ERROR | root - Mokuro failed to run.
2025-01-07 02:17:55,287 | INFO | root - Getting vocabulary items from all...
2025-01-07 02:17:55,287 | INFO | root - Vocabulary from all: , ...
2025-01-07 02:17:55,287 | INFO | root - Processing CSV(s) using dictionary (this might take a few minutes, do not worry if it looks stuck)...
100%|████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 79.32it/s]
2025-01-07 02:17:55,318 | INFO | root - Vocabulary saved into: vocab_all in folder C:/~~/ALL_FILES

