Suggestions for how to handle fused labels

joshuahoran · February 16, 2018, 9:08pm

Hi all,

I'm just diving into the imaging nodes and I am blown away by how powerful they are. I am currently processing images of vials with circular labels on the caps. My goal is to isolate the digits shown on the label and then perform OCR on them. I have learned that Tess4J does a poor job when there is a border around the text, so I go to great lengths to remove as much of the peripheral (non-text) image as possible before performing OCR. To do this I use:

Global Thresholder -> Connected Component Analysis -> Feature calculator

I use the calculated features to filter out anything large or circular. Overall this works very well except in cases like the one shown in the attached image. Notice that the top '7' digit is fused with the border segment, which results in its removal along with the border. I am looking for suggestions on ways I can try to split the '7' from the circular border. So far I have tried the Waehlby Splitter node, but that seems to segment everything into a million smaller pieces and I am no longer able to identify which pieces are part of the border and which are part of the text.
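To make the failure mode concrete, here is a minimal pure-Python sketch of the same idea (threshold, label connected components, filter by features). It is not what the KNIME nodes do internally. The function names, the `cutoff` and `max_side` parameters, and the tiny test image are all made up for illustration, and "large" is approximated by bounding-box size only.

```python
from collections import deque

def threshold(gray, cutoff):
    """Global threshold: dark pixels (ink) become foreground (1)."""
    return [[1 if v < cutoff else 0 for v in row] for row in gray]

def label_components(mask):
    """Collect 8-connected foreground regions as lists of (row, col) pixels."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    components = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            seen[r][c] = True
            queue, pixels = deque([(r, c)]), []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and mask[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            components.append(pixels)
    return components

def features(pixels):
    """Area and bounding-box size of one component."""
    ys = [p[0] for p in pixels]
    xs = [p[1] for p in pixels]
    return {"area": len(pixels),
            "height": max(ys) - min(ys) + 1,
            "width": max(xs) - min(xs) + 1}

def keep_text_like(gray, cutoff=128, max_side=5):
    """Keep only components whose bounding box is small (hypothetical rule)."""
    kept = []
    for pixels in label_components(threshold(gray, cutoff)):
        f = features(pixels)
        if f["height"] <= max_side and f["width"] <= max_side:
            kept.append(f)
    return kept

# Toy 9x9 image: a dark square outline standing in for the label border,
# plus a small 5-pixel blob standing in for a digit.
image = [[255] * 9 for _ in range(9)]
for i in range(9):
    image[0][i] = image[8][i] = image[i][0] = image[i][8] = 0
for y, x in [(3, 3), (3, 4), (3, 5), (4, 5), (5, 4)]:
    image[y][x] = 0

print(keep_text_like(image))  # one component with area 5 (the "digit")
```

If two extra dark pixels at `(1, 4)` and `(2, 4)` join the blob to the outline, the labeling step sees a single large component and the filter discards everything, which is the situation with the fused '7'.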

Any suggestions are welcome!

Three_Circles.PNG

gab1one · February 18, 2018, 7:27pm

Hi Joshua,

As a starting point, try the Thinning node, followed by a Morphological Operations node with the Erode option (or Dilate, depending on whether your foreground is dark or light) to increase the size of the lines again. This should allow you to split the "7" from the rest of the border.
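A related idea, sketched in plain Python rather than with the KNIME nodes, is a morphological opening: erode first so that thin bridges disappear, then dilate to restore the remaining shapes. This is a different technique from the Thinning node, and it only helps under the assumption that the bridge between the digit and the border is thinner than the strokes you want to keep. The helper names and the toy mask below are hypothetical.

```python
from collections import deque

def _neighbourhood(mask, r, c):
    """Yield the 3x3 neighbourhood values; out-of-bounds counts as background."""
    rows, cols = len(mask), len(mask[0])
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            y, x = r + dy, c + dx
            yield mask[y][x] if 0 <= y < rows and 0 <= x < cols else 0

def erode(mask):
    """A pixel survives only if its whole 3x3 neighbourhood is foreground."""
    return [[int(all(_neighbourhood(mask, r, c))) for c in range(len(mask[0]))]
            for r in range(len(mask))]

def dilate(mask):
    """A pixel is set if any pixel in its 3x3 neighbourhood is foreground."""
    return [[int(any(_neighbourhood(mask, r, c))) for c in range(len(mask[0]))]
            for r in range(len(mask))]

def count_components(mask):
    """Count 8-connected foreground regions with a breadth-first flood fill."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            count += 1
            seen[r][c] = True
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and mask[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
    return count

# Toy 12x12 mask (1 = foreground): a thick bar standing in for the border,
# a 5x5 block standing in for the digit, and a 1-pixel-wide bridge fusing them.
fused = [[0] * 12 for _ in range(12)]
for r in range(0, 4):
    for c in range(12):
        fused[r][c] = 1          # "border" bar, rows 0-3
for r in range(6, 11):
    for c in range(4, 9):
        fused[r][c] = 1          # "digit" block, rows 6-10, cols 4-8
fused[4][6] = fused[5][6] = 1    # thin bridge between them

opened = dilate(erode(fused))    # morphological opening

print(count_components(fused))   # 1: digit and border are one component
print(count_components(opened))  # 2: the bridge is gone, the shapes are separate
```

On real labels the digit strokes may be no thicker than the bridge, in which case an opening would erase the digit as well, so the structuring element size would need tuning against your images.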

best,

Gabriel

joshuahoran · February 20, 2018, 9:23pm

Thanks Gabriel. This helps me move in the right direction.
