Hello KNIME users,
I’m currently working on a dictionary-based workflow for text classification that uses several Dictionary Tagger nodes. When I use the Tag Filter after all my taggers, I end up with multiple tags for the same text. Is it possible to add a weight to each dictionary tagger so that only the tag with the highest weight is kept?
This is the workflow I’m using:
classification_test.knwf (40.0 KB)
If weighting isn’t possible, I would like the last tag applied to be kept.
Thanks for your help
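To make the request concrete, here is a minimal Python sketch of the selection rule described above, written outside KNIME. The tag names, the `TAG_WEIGHTS` table, and both function names are hypothetical and only illustrate the logic; this is not a feature of the Dictionary Tagger node.

```python
# Hypothetical weights per tag: a higher weight wins.
TAG_WEIGHTS = {"finance": 3, "legal": 2, "hr": 1}


def pick_weighted_tag(tags, weights=TAG_WEIGHTS):
    """Return the tag with the highest weight.

    `tags` is assumed to be listed in the order the taggers ran.
    Unknown tags get weight 0, and ties go to the tag applied last.
    """
    if not tags:
        return None
    # Sort key: (weight, position), so a later position breaks ties.
    position, tag = max(
        enumerate(tags),
        key=lambda pair: (weights.get(pair[1], 0), pair[0]),
    )
    return tag


def pick_last_tag(tags):
    """Fallback when no weights exist: keep the last tag applied."""
    return tags[-1] if tags else None
```

With these assumed weights, a text tagged `["hr", "finance", "legal"]` would keep `finance` under the weighted rule and `legal` under the fallback rule.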
Hi, could you share your data or a picture of the output you are getting vs the output you are expecting? Thanks
Hi, sorry, I can’t share the data because it is confidential, but this is the output I get:
and this is the output I’m expecting:
As you can see, the output I expect is a single string without duplicates, and when two tags are found I keep only one of them, if possible the last one tagged.
In that case, please share fake data (5 rows or so?) as input since I can’t execute the workflow without input.
Or you could try using a different workflow. We did a similar challenge for Just KNIME It! Here is my solution to a problem like this (see “complex solution”) and here are community solutions which might help you.
Here is the workflow with fake data.
Classification_dictionary_based.knwf (207.8 KB)
As I have multiple terms for each category, I don’t know if your solution will work, but I will try it anyway.
I see. I think a combination of regex split and column merge will work here, as long as your categories/tags/labels don’t have spaces in them.
Classification_dictionary_based_with_regex.knar.knwf (228.6 KB)
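For readers who cannot open the workflow, the regex-split-and-merge idea can be sketched in plain Python. This is only an illustration under stated assumptions: the tags are assumed to arrive as one string separated by whitespace, commas, or semicolons, and the function names are hypothetical. It is not a description of the nodes inside the attached workflow.

```python
import re

# A tag is any run of characters that is not whitespace, a comma, or a
# semicolon. This is why tags must not contain spaces themselves.
TAG_PATTERN = re.compile(r"[^\s,;]+")


def split_tags(tag_string):
    """Split the combined string into tags, dropping duplicates.

    The first occurrence of each tag keeps its position.
    """
    tokens = TAG_PATTERN.findall(tag_string)
    return list(dict.fromkeys(tokens))


def last_tag(tag_string):
    """Merge step: keep only the last tag found, or None if there is none."""
    tokens = TAG_PATTERN.findall(tag_string)
    return tokens[-1] if tokens else None
```

A tag such as `real estate` would be split into two tokens by this pattern, which is the reason for the no-spaces condition mentioned above.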
It works really well, thank you for your help!