Anomalous behaviour of Dictionary Tagger

mpenalver · September 30, 2020, 11:29pm

In the attached simple workflow, a keyword made of two words appears in the dictionary with a single space between them. If the same keyword appears in a PDF doc with two or more spaces between the two words, the multi-space version is tagged when the single-space version is also present in the doc (test3 and test4), but not when the multi-space version appears alone (test1).

test_dictionary_tagger.knar (119.2 KB)

ScottF · October 2, 2020, 3:40pm

Hi @mpenalver -

Interesting - thanks for the workflow and example files. Let me ask one of our developers to take a look at this. There does seem to be an inconsistency in how whitespace is handled here, regardless of which tokenizer is used.

(EDIT: New ticket created, AP-15275)
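
For illustration only, the kind of whitespace normalization that would make the single-space and multi-space cases behave the same can be sketched in Python. This is not how the Dictionary Tagger is implemented, and it is not a KNIME API: the function names `normalize_whitespace` and `find_terms` and the sample keyword are hypothetical, and the matching is plain substring search rather than token-based tagging.

```python
import re


def normalize_whitespace(text: str) -> str:
    """Collapse every run of whitespace into one space and trim both ends."""
    return re.sub(r"\s+", " ", text).strip()


def find_terms(text: str, dictionary: list[str]) -> list[str]:
    """Return the dictionary terms found in the text after normalization.

    Hypothetical helper: a simple substring check, not a tokenizer-aware tagger.
    """
    normalized_text = normalize_whitespace(text)
    return [
        term
        for term in dictionary
        if normalize_whitespace(term) in normalized_text
    ]


# Hypothetical two-word keyword, stored with a single space in the dictionary.
dictionary = ["machine learning"]

# Multi-space version alone (the situation described as test1).
print(find_terms("only machine   learning here", dictionary))

# Single-space and multi-space versions together (as in test3 and test4).
print(find_terms("machine learning and machine   learning", dictionary))
```

With both the document text and the dictionary entries normalized the same way, the result no longer depends on whether a single-space occurrence happens to be present elsewhere in the document.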

mpenalver · October 2, 2020, 4:01pm

That’s right. Thanks a lot @ScottF.
