What's the accuracy on deepseek OCR? Archive.org has a copy of Black Shirts and Reds that has some really awful OCR text in it. I've wanted to get a better OCR of the text for some time. Deepseek OCR is probably not the right tool for it.
I have wondered how good something like Crush would be at building EPUBs out of raw text and an EPUB template/style guide.
☆ Yσɠƚԋσʂ ☆ - 2w
The accuracy depends on the quality of the source image, but it tends to do pretty well even with compressed ones. Doing OCR on a whole book might be a bit slow, but it could be worth running a few pages through to see what the output looks like. You could definitely use Crush to make a script that would feed a PDF through deepseek-ocr and output formatted text. You'd probably have to stream it through a few pages at a time.
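For what it's worth, the "few pages at a time" part could look roughly like the sketch below. It only covers the batching. The `ocr_page` callable is a hypothetical stand-in for whatever renders one PDF page to an image and runs it through deepseek-ocr; this doesn't assume anything about how deepseek-ocr itself is invoked.

```python
from typing import Callable, Iterator


def page_batches(total_pages: int, batch_size: int = 4) -> Iterator[range]:
    """Yield consecutive ranges of zero-based page indices, batch_size at a time."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, total_pages, batch_size):
        yield range(start, min(start + batch_size, total_pages))


def ocr_book(
    total_pages: int,
    ocr_page: Callable[[int], str],  # hypothetical: page index -> recognized text
    batch_size: int = 4,
) -> Iterator[str]:
    """Run OCR one batch at a time and yield the text of each batch.

    Yielding per batch means the output can be written to disk as it
    arrives instead of holding the whole book in memory.
    """
    for batch in page_batches(total_pages, batch_size):
        yield "\n\n".join(ocr_page(page) for page in batch)
```

You'd loop over `ocr_book(...)` and append each chunk to the output file, so a crash partway through still leaves you with the pages already done.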
yogthos in crushagent
Made a GUI app built on top of deepseek-ocr.rs
https://git.sr.ht/~yogthos/inscriptus