Wikipedia:Reference desk/Archives/Computing/2022 August 15



August 15

Is there any manual OCR tool that lets us assign characters by hand to identical glyphs?

Rather than a conventional OCR or an OCR trainer, is there any engine that groups identical glyphs and lets us manually assign a character to every occurrence of each glyph? This would help digitize old multilingual or handwritten documents faster and with better accuracy. Basically, something that outputs identical glyphs under the same code, which we can then search-and-replace with the required characters. Thanks. - Vis M (talk) 07:47, 15 August 2022 (UTC)

The idea seems sound, and the best OCR software must already contain components that do a great deal of the job (isolating glyphs and edge tracing), but I have not found anything that matches your description. Two scanned glyphs are unlikely to be identical images; at best they are very similar. So the full task involves cluster analysis, grouping glyph images that are similar enough to count as the same character, as an essential and non-trivial (but doable) component.  --Lambiam 08:32, 17 August 2022 (UTC)
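To illustrate the cluster-analysis step, here is a minimal Python sketch using a simple single-pass "leader" clustering, swapped in for illustration rather than taken from any OCR package. It assumes the glyphs have already been cut out and scaled to bitmaps of the same size (written here as strings of `#` and `.`), and it treats two glyphs as matching when the fraction of differing pixels is at most a threshold. The `max_diff=0.15` default is an arbitrary assumption that would need tuning on real scans.

```python
def pixel_distance(a, b):
    """Fraction of pixels that differ between two same-size bitmaps."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("bitmaps must have the same dimensions")
    total = sum(len(row) for row in a)
    differing = sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))
    return differing / total


def cluster_glyphs(glyphs, max_diff=0.15):
    """Leader clustering: put each glyph in the first cluster whose
    representative (its first member) is within max_diff; otherwise
    open a new cluster. Returns one cluster ID per glyph, in order."""
    leaders = []  # representative bitmap of each cluster
    labels = []
    for glyph in glyphs:
        for cid, leader in enumerate(leaders):
            if pixel_distance(glyph, leader) <= max_diff:
                labels.append(cid)
                break
        else:  # no existing cluster was close enough
            leaders.append(glyph)
            labels.append(len(leaders) - 1)
    return labels


# Usage: two 3x3 "characters", one copy damaged by a missing pixel.
X = ["#.#", ".#.", "#.#"]
X_noisy = ["#.#", ".#.", "#.."]  # differs from X in 1 of 9 pixels
O = ["###", "#.#", "###"]
print(cluster_glyphs([X, O, X_noisy, O]))
```

This prints `[0, 1, 0, 1]`: the damaged X is about 11% different from the first X, so it joins cluster 0, while O is about 56% different and gets its own cluster. The leader method is order-dependent and only compares against each cluster's first member; real scans would also need alignment and size normalization, and hierarchical clustering over the same distance would be a more robust choice.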
Ok, thank you! - Vis M (talk) 18:31, 22 August 2022 (UTC)[reply]