Split documents into two separate groups depending on whether a word is included

Dear Knimers
I’m sorry to bother you, but I’m feeling particularly dumb and stuck. I hope you can help me.
I have tweets about AI.
I would like to analyse the tweets in two separate groups:

  • Group 1 where the word ‘artificialintelligence’ is mentioned but NOT the word ‘machinelearning’.
  • Group 2 where ‘machinelearning’ is mentioned but NOT the word ‘artificialintelligence’.

How do I achieve this?

Many thanks for your help!!
Outi

Hi @outiloni

The Rule Engine node can help you out here:

$column1$ LIKE "*artificialintelligence*" AND NOT $column1$ LIKE "*machinelearning*" => "GROUP1"
$column1$ LIKE "*machinelearning*" AND NOT $column1$ LIKE "*artificialintelligence*" => "GROUP2"
TRUE => "UNK"

You can subsequently filter or split the desired group.
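
For anyone who wants to check the logic outside KNIME, here is a minimal Python sketch of the same three rules. It assumes the wildcard pattern "*word*" behaves like a case-sensitive substring test; the function name assign_group and the sample tweets are made up for illustration and are not part of KNIME.

```python
def assign_group(tweet: str) -> str:
    """Mirror the three rules above with plain substring checks."""
    has_ai = "artificialintelligence" in tweet
    has_ml = "machinelearning" in tweet
    if has_ai and not has_ml:
        return "GROUP1"
    if has_ml and not has_ai:
        return "GROUP2"
    # Neither word, or both words: same role as the TRUE => "UNK" catch-all.
    return "UNK"


# Hypothetical sample rows, only to show how the groups fall out.
sample_tweets = [
    "Loving the new #artificialintelligence tools",
    "#machinelearning course starts today",
    "#artificialintelligence meets #machinelearning",
    "Nothing relevant here",
]
for tweet in sample_tweets:
    print(assign_group(tweet), "|", tweet)
```

Note that tweets containing both words, or neither, end up in "UNK", so they are left out of both groups after filtering.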

THIS IS ABSOLUTELY BRILLIANT!!! Thank you very much!!!
I can see this working on string variables (each tweet).
Will it work on preprocessed documents, where I have coded the various versions of ‘artificialintelligence’ (AI, Artificial intelligence, etc.) under just one ‘artificialintelligence’? When I tried this, it put all 190K tweets/rows under just one category…

Many thanks for all your assistance!
Outiloni

You can use

to specify additional rules for AI, Artificial intelligence etc…
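
As a rough illustration of the idea of covering several spellings, here is a small Python sketch that treats a list of variants as one concept before grouping. Everything in it is an assumption for illustration: the variant lists (including "ml" and "machine learning"), the case-insensitive matching, and the function name assign_group_with_variants. It is not a KNIME feature, just the same logic written out.

```python
import re

# Assumed spelling variants; extend these lists to match your own data.
# \b keeps short forms like "ai" from matching inside words such as "said".
AI_PATTERN = re.compile(
    r"\b(artificialintelligence|artificial intelligence|ai)\b", re.IGNORECASE
)
ML_PATTERN = re.compile(
    r"\b(machinelearning|machine learning|ml)\b", re.IGNORECASE
)


def assign_group_with_variants(tweet: str) -> str:
    """Group a tweet after folding the spelling variants into two concepts."""
    has_ai = AI_PATTERN.search(tweet) is not None
    has_ml = ML_PATTERN.search(tweet) is not None
    if has_ai and not has_ml:
        return "GROUP1"
    if has_ml and not has_ai:
        return "GROUP2"
    return "UNK"


print(assign_group_with_variants("AI is everywhere"))
print(assign_group_with_variants("New ML models released"))
```

If every row lands in a single category, it may be worth checking that the column being tested really contains the text you expect, for example by running the rules on a plain string version of it first.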
