Anomalous behaviour of Dictionary Tagger

In the attached simple workflow, the dictionary contains a two-word keyword with a single space between the words. If the same keyword appears in a PDF doc with two or more spaces between the words, the multi-space version is tagged only when the single-space version is also present in the doc (test3 and test4). When the multi-space version is alone in the doc, it is not tagged (test1).

test_dictionary_tagger.knar (119.2 KB)
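
To make the expected behaviour concrete, here is a minimal Python sketch of whitespace-insensitive keyword matching. It is only an illustration and not how the Dictionary Tagger is implemented. The function names `normalize_spaces` and `contains_keyword` and the sample strings are hypothetical. The assumption is that a run of several spaces should be treated like a single space on both the dictionary side and the document side.

```python
import re


def normalize_spaces(text: str) -> str:
    """Collapse every run of whitespace into one space and trim both ends."""
    return re.sub(r"\s+", " ", text).strip()


def contains_keyword(document: str, keyword: str) -> bool:
    """Return True if the keyword occurs in the document, ignoring
    differences in the amount of whitespace between words.

    Hypothetical helper for illustration only: it does a plain substring
    check after normalization and does not tokenize the text.
    """
    return normalize_spaces(keyword) in normalize_spaces(document)


# Hypothetical sample data: the dictionary entry has one space,
# the document has three spaces between the same two words.
dictionary_entry = "machine learning"
multi_space_only = "A short text about machine   learning."

print(contains_keyword(multi_space_only, dictionary_entry))
```

With this kind of normalization, the multi-space variant matches whether or not the single-space variant also appears in the same text, which is the consistent behaviour I would have expected in test1.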

Hi @mpenalver -

Interesting - thanks for the workflow and example files. Let me ask one of our developers to take a look at this. There does seem to be an inconsistency in how whitespace is handled here, regardless of which tokenizer is used.

(EDIT: New ticket created, AP-15275)
That’s right. Thanks a lot @ScottF.