Using KNIME to classify based on known strings

I am trying to set up a workflow that processes my data to improve the quality of the classifications applied to it. I will set out the overarching story first, and then the part I would like advice and guidance on:

I have a set of events which can have a number of yes/no type classifications. Any event can have any number of classifications, and it is normally down to the user to ensure that these are properly applied. There are some fields on the database that can be logic checked to help (if field X states Yes then so should field Y). I have this aspect sorted out, but I would also like to be able to flag those events that are missing classifications based on the presence of certain keywords.

My challenge is that I also have a descriptive text field that I would like to check. I want to match on words as words, rather than do a blanket string search, since that will miss or incorrectly flag many things. So if the check term is “dance” I would like to pull back “dancing” and “danced” as well, and ideally suggested synonyms too, although that is less important for now since they will be less strongly associated with the classification.
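To make the difference concrete, here is a minimal Python sketch (outside KNIME) comparing a blanket substring search with a word-level comparison on stems. The `crude_stem` helper and its suffix list are assumptions made up for illustration, not a real stemming algorithm and not what any KNIME node does internally.

```python
import re

# Illustrative suffix list only; a real stemmer (e.g. Porter) is far more careful.
SUFFIXES = ("ing", "ed", "es", "e", "s")


def crude_stem(word):
    """Strip one common English suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def words(text):
    """Lower-case the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def substring_match(term, text):
    """Blanket string search: is the term anywhere inside the text?"""
    return term.lower() in text.lower()


def stem_match(term, text):
    """Word-level search: does any word share the term's stem?"""
    target = crude_stem(term.lower())
    return any(crude_stem(w) == target for w in words(text))


# Substring search wrongly hits "Attendance" and misses "dancing".
print(substring_match("dance", "Attendance was high"))   # True
print(substring_match("dance", "Everyone was dancing"))  # False

# Stem comparison gets both cases the intended way round.
print(stem_match("dance", "Attendance was high"))        # False
print(stem_match("dance", "Everyone was dancing"))       # True
```

The point is only that comparing whole words after stemming avoids both the false hit and the miss; synonyms would need a separate lookup.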

I can build the workflow up to the point of the text processing, but I am not sure which node can apply this logic, and I would like a steer to help me get a few steps forward.

Okay, having looked through NodePit, the best approach to my untrained eye is to use one of the stemming nodes to strip words down to their stems. I get the principle and just need to see it in action, which leads to the issue many users seem to have of the text nodes not being available, but that is another topic.

The next step I need to get my head around is filtering by a group of keywords. Again, I am keen to know which node could hold my list of test terms. In my original example for dance events, the list of search words/stems would be “danc”, “disco”, “festival”, “rave”, I think.
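As a rough illustration of the filtering step, the Python sketch below checks each event description against that list of stems and reports events where a stem is present but the classification is not set. The event structure and the field names `id`, `description` and `is_dance` are hypothetical, and prefix matching is an assumed simplification (a stem such as “rave” would also match “raven”, so a real list needs some care).

```python
import re

# Stem list taken from the post above.
DANCE_STEMS = ["danc", "disco", "festival", "rave"]


def matching_stems(description, stems):
    """Return the sorted stems that some word in the description starts with."""
    tokens = re.findall(r"[a-z]+", description.lower())
    return sorted({s for s in stems for t in tokens if t.startswith(s)})


def missing_classification(events, stems):
    """Return (id, stems) for events whose text suggests the class but the flag is unset."""
    flagged = []
    for event in events:
        hits = matching_stems(event["description"], stems)
        if hits and not event["is_dance"]:
            flagged.append((event["id"], hits))
    return flagged


# Hypothetical sample data, not taken from the original database.
events = [
    {"id": 1, "description": "Outdoor festival with dancing", "is_dance": False},
    {"id": 2, "description": "Attendance review meeting", "is_dance": False},
    {"id": 3, "description": "Disco night", "is_dance": True},
]

result = missing_classification(events, DANCE_STEMS)
print(result)  # [(1, ['danc', 'festival'])]
```

Only event 1 is reported: event 2 contains no matching word, and event 3 already carries the classification.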

Hi @StevenFrancis -

It sounds like you might have a use for the Dictionary Tagger and Tag Filter nodes. Here is a link to a workflow on the Workflow Hub that demonstrates how they can be implemented. Just use your forum credentials to log in:

https://workflows.knime.com/knime/hub/workflows/08_Other_Analytics_Types%3A01_Text_Processing%3A04_Dictionary_based_Tagging

You can also find this workflow on the EXAMPLES server - search for “dictionary_based_Tagging”.

Best,
Scott

I have a similar task: searching ‘ItemDesc’ against a bank of keywords. It does not appear that the Dictionary Tagger and Tag Filter nodes are available… any recommendation?


Here’s the ruleset:

image

Hi @wpaschal -

Those nodes are part of the KNIME Textprocessing extension - you need to install it first.

If you open the example workflow discussed above in the KNIME Hub, you should be prompted to install the appropriate extension. Here’s a more recent link: https://kni.me/w/IBp9LRLyNKA9r0H6

So you feel those nodes would be the best solution for what I’m trying to accomplish?

In your case, maybe all you need is the Rule-based Row Filter. This would avoid having to deal with the text processing nodes at all.

The syntax in that filter might look something like this:

$ITEMDESC$ LIKE "*Rent*" => FALSE
$ITEMDESC$ LIKE "*Shipping*" => FALSE
// [Rest of your rules]
TRUE => TRUE
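For anyone who wants to reason about how such a rule list behaves, here is a small Python sketch that mimics it outside KNIME, on the understanding that rules are checked top to bottom, the first matching rule decides the outcome, and only rows with a TRUE outcome are kept. The `keep_row` function and the sample rows are made up for illustration, and Python's `fnmatchcase` is only an approximation of the LIKE wildcard (it additionally treats `[...]` as a character class).

```python
from fnmatch import fnmatchcase

# (pattern, outcome) pairs mirroring the two example rules above.
RULES = [
    ("*Rent*", False),
    ("*Shipping*", False),
]


def keep_row(item_desc, rules=RULES, default=True):
    """Return the outcome of the first matching rule, else the catch-all default."""
    for pattern, outcome in rules:
        if fnmatchcase(item_desc, pattern):  # case-sensitive wildcard match
            return outcome
    return default  # plays the role of the final "TRUE => TRUE" rule


# Hypothetical item descriptions.
rows = ["Office Rent March", "Shipping fee", "Laptop stand"]
kept = [r for r in rows if keep_row(r)]
print(kept)  # ['Laptop stand']
```

Because the first match wins, the catch-all rule has to come last; placed first, it would keep every row.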

Sweet!! This is super-powerful!

Thanks for the work you are doing.
