How to isolate glyphs

Hi. I am new to KNIME. I am delving into it because I cannot code much at all - hoping that an interface will help me achieve my goal.

In images like the one below, I am trying to put together a program or process so that each individual letter (glyph) is cut out and saved into its own image file, without any loss in resolution.

I have learned a lot and tried a lot. And I know it’s possible. But I still can’t figure out a solution on my own.

Any help would be very much appreciated. Yet if a reader would like a more specific issue to solve for me, I would love to know how to exclude objects from connected component analysis based on size. (I have learned how to exclude things based on color, brightness, and a number of other much more complex factors - “derivatives of derivatives,” it seems. But I can’t figure out how to do it simply based on object size. Smh.) If possible, I would like to know how to do it based on width, based on length, and based on area.

Much thanks in advance!

Hi @sb000

Welcome to the KNIME forum.

If I may say so, providing a solution to your question is not as easy as one might think. I worked on manuscript restoration through image processing in a previous “avatar life” for the British Library. This manuscript is pretty clean, which is very good news, but you would still need to do a bit of pre-processing on it. For instance, there is “ink bleeding” between the two sides of the leaf, which could be tackled using blind deconvolution, and the lines are warped because the leaf was not scanned flat, which affects the lighting of the leaf and eventually the line processing as well. This is only to begin with; after that you would need to do much of what is required for classic OCR on handwritten documents.

Could you please tell me a bit more about your aim in extracting the letters? Is it eventually OCR or something else? What would you like to do with the individual letters? What other tools are you currently using for image processing on these manuscripts?

Should you be interested, I’ll try to help on this (on some steps and gradually) once I have your reply.

Good luck & best regards,

Ael

Thank you so much, Ael.

This is “stage 1” of a roughly 3 stage hobby project.

I wish to have an image file for every individual glyph in a manuscript. (OCR is not needed per se.) These images need to be lossless with respect to resolution and color. The letter shape itself is preferable to a rectangle; but I’ll take either.

Stage two is to use these to make a bitmap font - programmed to utilize the full extent of alternates gleaned from stage one.

Stage three is to construct a new set of manuscript images along these lines: let’s say I use the open source images of the Lindisfarne Gospels; I would put together a set of images that contain the Gospels in English. They would look like Lindisfarne in every way (as authentic-looking and true to the original’s appearance as possible), but the reading content would be from a faithful, modern English translation of the Gospels (not Jerome’s per se).

Any input - particularly on gleaning glyph images (stage 1) - would be welcomed. :slight_smile:

Hi @sb000

Interesting hobby!

Thanks for making your aim clearer. I would recommend doing stages 1 and 2 manually rather than automatically if you only need to do this once. It will most probably take less time than putting in place an automatic algorithm to do the same job of creating an alphabet of glyphs. KNIME may not be the best environment for stages 1 and 2, whether done automatically or manually. Besides this, you will need to refine your cropped glyphs manually to reach the quality you need to compose a new manuscript.
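
On the narrower question of excluding objects from connected component analysis by size, the idea can be sketched outside KNIME. What follows is a minimal plain-Python illustration, not a KNIME workflow. It labels connected regions in a tiny hand-made binary mask, measures each region's bounding-box width, height (the “length” in the question) and pixel area, and keeps only the regions that fall within chosen ranges. The function names `connected_components` and `filter_by_size`, the toy mask and the thresholds are all assumptions made for illustration. On a real scan you would first binarize the image, and you would probably use a library's labeling routine instead of this hand-written flood fill.

```python
from collections import deque


def connected_components(mask):
    """Find 4-connected foreground regions in a 2D grid of 0/1 values.

    Returns one dict per region with its bounding box and pixel area.
    """
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    components = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill from this unvisited foreground pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            pixels = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            ys = [p[0] for p in pixels]
            xs = [p[1] for p in pixels]
            components.append({
                "top": min(ys),
                "left": min(xs),
                "height": max(ys) - min(ys) + 1,
                "width": max(xs) - min(xs) + 1,
                "area": len(pixels),
            })
    return components


def filter_by_size(components,
                   width=(1, float("inf")),
                   height=(1, float("inf")),
                   area=(1, float("inf"))):
    """Keep components whose width, height and area lie in the given
    inclusive (min, max) ranges."""
    return [
        comp for comp in components
        if width[0] <= comp["width"] <= width[1]
        and height[0] <= comp["height"] <= height[1]
        and area[0] <= comp["area"] <= area[1]
    ]


# Toy mask (an assumption, not real data): a one-pixel speck, a ring-shaped
# "glyph", and a short two-pixel vertical stroke.
mask = [
    [1, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 1, 0, 0],
    [0, 0, 1, 0, 1, 0, 0],
    [0, 0, 1, 1, 1, 0, 1],
    [0, 0, 0, 0, 0, 0, 1],
]

components = connected_components(mask)
# Drop anything smaller than 3 pixels in area (threshold chosen arbitrarily).
glyphs = filter_by_size(components, area=(3, float("inf")))
```

The bounding box stored for each kept component (`top`, `left`, `height`, `width`) is what you would use to crop the glyph from the original, untouched image, so the crop itself loses no resolution.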

Stage 3 could be automated once you have your alphabet of glyphs, and KNIME could eventually be used for this. Still, this would be quite a big project.

Best regards

Ael