Adding weight to tagged terms

YGD · June 20, 2022, 8:46am

Hello KNIME users,

I’m currently working on a dictionary-based workflow for text classification, in which I use several Dictionary Tagger nodes. When I apply the Tag Filter after all my taggers, I end up with multiple tags for the same text. Is it possible to add weights to the Dictionary Taggers so that only the tag with the highest weight is kept?

This is the workflow I’m using:
classification_test.knwf (40.0 KB)

If weighting isn’t possible, I would like to keep only the last tag that was applied.

Thanks for your help

victor_palacios · June 20, 2022, 8:01pm

Hi, could you share your data or a picture of the output you are getting vs the output you are expecting? Thanks

YGD · June 21, 2022, 8:16am

Hi, sorry, I can’t share the data because I work with confidential data, but the output I get is this one

and I’m expecting this one

As you can see, the output I expect is a single string per text, without duplicates. When two different tags are found, I keep only one of them, if possible the last one tagged.

victor_palacios · June 21, 2022, 4:51pm

In that case, please share fake data (5 rows or so?) as input since I can’t execute the workflow without input.

Or you could try using a different workflow. We did a similar challenge for Just KNIME It! Here is my solution to a problem like this (see “complex solution”) and here are community solutions which might help you.

YGD · June 22, 2022, 7:51am

Here is the workflow with fake data.
Classification_dictionary_based.knwf (207.8 KB)

As I have multiple terms for each category, I don’t know if your solution will work but I will try anyway.

victor_palacios · June 22, 2022, 9:56pm

I see. I think a combination of regex split and column merge will work here, as long as your categories/tags/labels don’t have spaces in them.
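
To illustrate the idea outside of KNIME, here is a minimal Python sketch, not the contents of the attached workflow. It assumes the tags for one text arrive as a single whitespace-separated string, which is why tags containing spaces would break it. The function names `unique_tags` and `keep_last_tag` are hypothetical.

```python
import re


def split_tags(tag_string):
    """Split a whitespace-separated tag string into a list of tags."""
    # Splitting on runs of whitespace is why tags must not contain spaces.
    return [t for t in re.split(r"\s+", tag_string.strip()) if t]


def unique_tags(tag_string):
    """Return the distinct tags in first-seen order (duplicates dropped)."""
    return list(dict.fromkeys(split_tags(tag_string)))


def keep_last_tag(tag_string):
    """Return the last tag in the string, or None if there are no tags."""
    tags = split_tags(tag_string)
    return tags[-1] if tags else None
```

With this sketch, a text tagged twice with one category and once with another is reduced to the category that appears last in the string.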

Screen Shot 2022-06-22 at 2.53.10 PM

Screen Shot 2022-06-22 at 2.55.22 PM

Classification_dictionary_based_with_regex.knar.knwf (228.6 KB)
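
On the original question about weights: nothing in this thread confirms a built-in weight setting on the taggers, so the following is only a sketch of the selection logic in plain Python. It assumes you keep your own mapping from tag to weight; the name `pick_tag` and the `weights` dictionary are hypothetical. Ties go to the later tag, so with no weights at all it falls back to keeping the last tag.

```python
def pick_tag(tags, weights):
    """Pick the tag with the highest weight from an ordered list of tags.

    Tags missing from `weights` count as weight 0. On a tie, the tag that
    appears later in the list wins, so an empty `weights` mapping simply
    returns the last tag.
    """
    if not tags:
        return None
    best = tags[0]
    for tag in tags[1:]:
        # ">=" (rather than ">") makes later tags win ties.
        if weights.get(tag, 0) >= weights.get(best, 0):
            best = tag
    return best
```

For example, calling `pick_tag` with the tags found for one text and a mapping such as `{"FINANCE": 2, "SPORT": 1}` (made-up values) returns the single tag to keep for that text.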

YGD · June 23, 2022, 8:30am

It works really well, thank you for your help.

system · June 30, 2022, 8:30am

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.