Split documents into two separate groups depending on whether a word is included or not

outiloni · October 17, 2023, 1:46pm

Dear Knimers
I’m sorry to bother you but I’m feeling particularly dumb and stuck, I hope you can help me.
I have tweets about AI.
I would like to analyse the tweets in two separate groups:

Group 1 where the word ‘artificialintelligence’ is mentioned but NOT the word ‘machinelearning’.
Group 2 where ‘machinelearning’ is mentioned but NOT the word ‘artificialintelligence’.

How do I achieve this?

many thanks for your help!!
Outi

ArjenEX · October 17, 2023, 2:22pm

Hi @outiloni

The Rule Engine node can help you out here:

$column1$ LIKE "*artificialintelligence*" AND NOT $column1$ LIKE "*machinelearning*" => "GROUP1"
$column1$ LIKE "*machinelearning*" AND NOT $column1$ LIKE "*artificialintelligence*" => "GROUP2"
TRUE => "UNK"

You can subsequently filter or split the desired group.
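
For anyone who wants to sanity-check the rule logic outside KNIME, here is a minimal Python sketch of the same three rules. The function name `assign_group` is hypothetical, and the plain substring test is an assumption standing in for the `LIKE "*...*"` wildcard match; in this sketch the comparison is case-sensitive.

```python
def assign_group(text: str) -> str:
    """Label one tweet using the same three rules as the Rule Engine snippet."""
    # Assumption: a plain, case-sensitive substring test stands in for LIKE "*...*".
    has_ai = "artificialintelligence" in text
    has_ml = "machinelearning" in text

    if has_ai and not has_ml:
        return "GROUP1"  # mentions artificialintelligence only
    if has_ml and not has_ai:
        return "GROUP2"  # mentions machinelearning only
    return "UNK"  # mentions both or neither


if __name__ == "__main__":
    samples = [
        "new paper on #artificialintelligence",
        "#machinelearning tips",
        "#artificialintelligence meets #machinelearning",
        "nothing relevant here",
    ]
    for tweet in samples:
        print(assign_group(tweet), "|", tweet)
```

As in the rules above, tweets that mention both words or neither end up in the catch-all "UNK" group.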

outiloni · October 18, 2023, 11:01am

ArjenEX:

$column1$ LIKE "*artificialintelligence*" AND NOT $column1$ LIKE "*machinelearning*" => "GROUP1"
$column1$ LIKE "*machinelearning*" AND NOT $column1$ LIKE "*artificialintelligence*" => "GROUP2"
TRUE => "UNK"

THIS IS ABSOLUTELY BRILLIANT!!! Thank you very much!!!
I can see this working on string variables (each tweet).
Will it work on preprocessed documents, where I have coded the various versions of ‘artificialintelligence’ (AI, Artificial intelligence, etc.) under the single term ‘artificialintelligence’? When I tried this, it put all 190K tweets/rows into just one category…

Many thanks for all your assistance!
Outiloni

izaychik63 · October 18, 2023, 7:53pm

You can use

to specify additional rules for AI, Artificial intelligence etc…
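
As a rough illustration of what "additional rules" for the variants could look like, here is a Python sketch that treats several spellings as the same term before grouping. It is not the KNIME setup itself. The patterns and the function name `assign_group_with_variants` are hypothetical; the variants "AI" and "Artificial intelligence" come from the question above, while "ML" and "machine learning" are assumed counterparts for the second term.

```python
import re

# Assumption: these variant lists are illustrative; extend them to match your data.
AI_PATTERN = re.compile(r"\b(?:ai|artificial\s*intelligence)\b", re.IGNORECASE)
ML_PATTERN = re.compile(r"\b(?:ml|machine\s*learning)\b", re.IGNORECASE)


def assign_group_with_variants(text: str) -> str:
    """Label one tweet, counting any listed variant as a mention of the term."""
    has_ai = AI_PATTERN.search(text) is not None
    has_ml = ML_PATTERN.search(text) is not None

    if has_ai and not has_ml:
        return "GROUP1"  # only an artificialintelligence variant
    if has_ml and not has_ai:
        return "GROUP2"  # only a machinelearning variant
    return "UNK"  # both or neither


if __name__ == "__main__":
    for tweet in ["AI is everywhere", "Machine learning rocks", "AI and ML together"]:
        print(assign_group_with_variants(tweet), "|", tweet)
```

The word boundaries (`\b`) keep short variants such as "AI" from matching inside longer words like "said".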
