How to calculate the number of keyword occurrences in a dataset

I have one dataset with a fixed number of rows (56) and a second dataset with 36,000 rows.
I want to take each record from dataset 1, count how many of those keywords occur in each row of dataset 2, and write the result to a new column in dataset 2. For example:
dataset 1
keyword

account hijacking
accumulo
acoustic cryptanalysis
active cyber defense
active defense
advanced encryption standard (aes)
advanced evasion technique
advanced persistent threat (apt)

dataset 2 should look like this

keyword sum
azaezaeaazerazraz account hijacking
azerazerazeazer accumulo
azeazerazer az r eaze acoustic cryptanalysis accumulo
zerazerazerazer
azerazerazerazer
azeazerazer az r eaze acoustic cryptanalysis
azeazerazer az r eaze acoustic cryptanalysis
azerazerazeazer accumulo

thanks

Sorry, like this:

| keyword | sum |
|---|---|
| azaezaeaazerazraz account hijacking | 1 |
| azerazerazeazer accumulo | 1 |
| azeazerazer az r eaze acoustic cryptanalysis accumulo | 2 |
| zerazerazerazer | 0 |
| azerazerazerazer | 0 |
| azeazerazer az r eaze acoustic cryptanalysis | 1 |
| azeazerazer az r eaze acoustic cryptanalysis | 1 |
| azerazerazeazer accumulo | 1 |

Hi @willson

Take a look at this workflow: how to calculate occurrence number into dataset.knwf (39.2 KB)


It loops over every keyword in the table and checks it against each sentence. I didn't calculate a sum yet, because a sentence can match multiple keywords (add a Column Aggregator node if you want to sum the matches up).
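For readers who want to see the same logic outside KNIME, here is a minimal Python sketch of the idea: loop over the keywords, flag each one as present or absent in a sentence, then add the flags up (the role the Column Aggregator would play). The function names `keyword_flags` and `keyword_sum` are made up for illustration, and the data is just the sample shown in the question, not the full 56 keywords or 36,000 rows.

```python
# Keywords from dataset 1 (only the rows shown in the question).
keywords = [
    "account hijacking",
    "accumulo",
    "acoustic cryptanalysis",
    "active cyber defense",
    "active defense",
    "advanced encryption standard (aes)",
    "advanced evasion technique",
    "advanced persistent threat (apt)",
]

# Sentences from dataset 2 (the sample rows shown in the question).
sentences = [
    "azaezaeaazerazraz account hijacking",
    "azerazerazeazer accumulo",
    "azeazerazer az r eaze acoustic cryptanalysis accumulo",
    "zerazerazerazer",
    "azerazerazerazer",
    "azeazerazer az r eaze acoustic cryptanalysis",
    "azeazerazer az r eaze acoustic cryptanalysis",
    "azerazerazeazer accumulo",
]


def keyword_flags(sentence, keywords):
    """Hypothetical helper: map each keyword to 1 if it occurs in the sentence, else 0."""
    return {kw: int(kw in sentence) for kw in keywords}


def keyword_sum(sentence, keywords):
    """Hypothetical helper: number of distinct keywords found in the sentence."""
    return sum(keyword_flags(sentence, keywords).values())


sums = [keyword_sum(s, keywords) for s in sentences]
print(sums)  # [1, 1, 2, 0, 0, 1, 1, 1]
```

This reproduces the `sum` column from the question. It counts each keyword at most once per sentence, so repeated occurrences of the same keyword are not reflected.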

gr. Hans


Hi @willson,

In addition to the solution by @HansS, I have prepared another one which I think covers more cases. But first, if you are going to use @HansS's solution, you have to modify the expression in the Rule Engine to this:

$new column$ >= 0 => 1
TRUE => 0

The only difference is the equal sign (=) in the first line ($new column$ > 0 => 1 becomes $new column$ >= 0 => 1).
indexOf() returns the zero-based position of the first match and -1 when there is no match. With the current expression, a keyword placed at the very beginning of the string (position 0) would therefore not be counted.
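The off-by-one can be illustrated with a small Python sketch. It assumes that indexOf() in the String Manipulation node behaves like Python's `str.find`, that is, it returns 0 for a match at the start and -1 for no match. The function names `found_strict` and `found_inclusive` are hypothetical.

```python
def found_strict(sentence, keyword):
    """Mimics the original rule: position > 0 => 1, otherwise 0."""
    return 1 if sentence.find(keyword) > 0 else 0


def found_inclusive(sentence, keyword):
    """Mimics the corrected rule: position >= 0 => 1, otherwise 0."""
    return 1 if sentence.find(keyword) >= 0 else 0


at_start = "accumulo azerazerazeazer"   # keyword at position 0
in_middle = "azerazerazeazer accumulo"  # keyword at a later position
absent = "zerazerazerazer"              # keyword not present

for text in (at_start, in_middle, absent):
    print(text, found_strict(text, "accumulo"), found_inclusive(text, "accumulo"))
```

Only the first sentence is treated differently by the two rules: the strict comparison misses the match at position 0, while the inclusive one reports it.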

I assumed a keyword may appear several times in a string, but the solution by @HansS only tells you whether a keyword exists in a string or not. Here is a workflow that counts the occurrences of each keyword in each string (I have modified your second example table to cover more cases):

count_keyworkds.knwf (30.2 KB)

I have used the count() function instead of indexOf() in the String Manipulation node.
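As a rough illustration of counting versus checking for presence, here is a short Python sketch. It uses Python's `str.count`, which counts non-overlapping occurrences; whether the node's count() treats overlapping matches the same way is an assumption here. The helper name `count_keywords` and the sample sentence are made up for the example.

```python
def count_keywords(sentence, keywords):
    """Hypothetical helper: map each keyword to its number of occurrences in the sentence."""
    return {kw: sentence.count(kw) for kw in keywords}


keywords = ["account hijacking", "accumulo", "acoustic cryptanalysis"]
sentence = "accumulo azeazerazer accumulo acoustic cryptanalysis"

counts = count_keywords(sentence, keywords)
total = sum(counts.values())

print(counts)
print(total)
```

Here "accumulo" is counted twice and "acoustic cryptanalysis" once, giving a total of 3, whereas a presence check would report 2 matching keywords.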

:blush:


@armingrudd

Thanks for pointing this out, I missed it in my solution.

Using count() instead of indexOf() is also a good option.

