Countif across 2 tables with multiple columns

Hello all,

I’m new here in the KNIME community and am trying my hand at gradually replacing Excel with KNIME. If this question is misplaced here, I apologize.

I find after days of research and thinking unfortunately no solution to a COUNTIF function over multiple columns or tables. Groupby or join alone have not helped me.

Here’s what I’m trying to do:

Table 1 (all words from table 2).

ID, Word
Flower
Windshield
Pear
Road
Tire
Grass
Cow
Gear
Plants
Animals

Table 2 (words themed; row 1 = topic)
There can be thousands of topics here!

nature,car
flower,windshield
Pear,road
Grass,tire
cow,gear
Plants,
Animals,

Now I would like to create the following table, which I would solve in Excel with Countif:

Table 3
ID, Word,Nature,Car
flower,1,0
windshield,0,1
Pear,1,0
road,0,1
Tire,0,1
Grass,1,0
Cow,1,0
Gear,0,1
Plants,1,0
Animals,1,0

Maybe the background will help a bit too, as this looks a bit abstract. I have a table with a good 10,000 entries (trainig data). In this table there are two columns

  1. Texts
  2. classification
    The whole texts were once divided by hand into 9 classes. The texts are all of different length and very different. In principle I would like to train an algorithm on the basis of the appearing words that this makes the classification. For this I thought to write all words in own columns (for each entry). This would correspond then transposed the table 2 described above. Subsequently, I have created a table that contains all words, this corresponds to table 1. Now I hope that I can build a basis for a ML algorithm with the creation of table 3.

Hello @Sebastian_hro ,

have a look at this workflow:

Hope it solves your problem!

Have a nice day,
Raffaello
LinkedIn

3 Likes

Hi Raffaello,

thank you very much for your response!

The challenge for me, however, is that Table 2 can also have 10,000 columns. Accordingly, I would then need 10,000 joiner nodes.

I have also tried the whole thing with a loop, but it does not want to succeed.

I am therefore looking for a solution that reacts more flexibly to the number of columns in Table 2.

With “10k entries” I thought you meant 10k rows D:

I’ll come back to you with a solution :slight_smile:

RB

@Sebastian_hro have a look at the same link above, I have added a new workflow.
This time you can make it work with multiple columns in table 2.

RB

3 Likes

Hello @Sebastian_hro ,

here is another way to solve the task.


countif_example.knwf (17.6 KB)

Can you give it a try?
If you do not need the ID from the first table, you could also consider using the Reference Row Filter instead of the Joiner to limit the Words to those available in the first table.

Cheers,
nan

@lelloba was faster. Nice workflow.

4 Likes

Hi @lelloba and @nan,

thank you so much!!!

Both workflows work perfectly. It took me 3 full days and you solve it within minutes :blush:

1 Like

Hi @Sebastian_hro

I’ve seen that @lelloba and @nan have already provided you with answers but I’d started working on this, so thought I’d finish and post anyway :slight_smile:

If you already have table 2, then I believe you can do this without needing to use table 1, by doing a series of Pivoting, Unpivoting and Transposing

Here is my workflow on the hub:

Best wishes
Heather

4 Likes

Hi @HeatherPikairos,

also a big thank you to you!

After trying the solution of @nan I realised the same. Now I am doing an Unpivoting → Pivoting → Missing Value → Transpose and it works just fine.

Your solution is great as well =)

2 Likes

That’s great :slight_smile: I’m glad you’ve got it working!

1 Like