Filter out rows with columns with filter criteria from a different csv

phil21oo · May 5, 2023, 9:36am

Hello,

I have a database and want to filter out rows that contain certain words I keep in a Google Sheet (for example buy, cheap… and many more).

column

burger cheap
burger tasty
buy burger
burger new york

filter out 1 and 3

My question: how can I apply these filter criteria in a row filter?

Thx for your help

MoLa_Data · May 5, 2023, 9:46am

Morning!
I think this can help you.

phil21oo · May 5, 2023, 10:02am

Hmm, ty, but I want to filter out the row if the cell contains the filter word, not only if the cell is exactly equal to the filter word.

for example
filter
cheap

database

burger cheap
burger
pizza cheap

1 and 3 out
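To make the difference concrete, here is a small Python sketch of the "contains" behaviour described above. It is only an illustration outside KNIME; the function name `split_rows` and the sample lists are made up for this example.

```python
def split_rows(rows, filter_words):
    """Split rows into (kept, removed) using a contains-match, not an exact match."""
    kept, removed = [], []
    for row in rows:
        # A row is removed as soon as any filter word occurs anywhere in the cell.
        if any(word in row for word in filter_words):
            removed.append(row)
        else:
            kept.append(row)
    return kept, removed


rows = ["burger cheap", "burger", "pizza cheap"]
kept, removed = split_rows(rows, ["cheap"])
```

An exact match (`row in filter_words`) would remove nothing here, because no cell is exactly "cheap".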

aworker · May 5, 2023, 10:03am

Hi @phil21oo

Although the question seems simple, the answer is not and needs recursion, hence a loop as follows:

Hope it helps.

Best
Ael

HansS · May 5, 2023, 10:03am

Hi @phil21oo

See this wf filter_out_rows.knwf (52.5 KB). It uses a loop for every word you want to search for so you can exclude the input-row.

gr. Hans

phil21oo · May 5, 2023, 10:07am

thank you very much, i will try it

phil21oo · May 5, 2023, 10:23am

I have a question: what does “SWords” mean?

aworker · May 5, 2023, 10:33am

Hi @phil21oo,

This workflow is, let’s say, at L2 KNIME level, where you need to use loops and variables.

The word $${SWords}$$ means variable Words, which appears in the “Flow Variable List”. It is named with an S character at the beginning to indicate that it is of type String.

The variable $${SWords}$$ is created by the -Table Row to Variable- node. At every iteration of the recursive loop, it passes the word to be replaced to the -String Manipulation- node.

A recursive loop is needed because, I assume, you want to eliminate any word from your table of words (the second table), even when several of them appear in the same sentence. The recursive loop feeds the updated table of sentences back into the next iteration, and each iteration checks and removes one word. For instance, ‘new york’ is removed from your sentences in the first iteration; ‘paris’ is then removed from those updated sentences in the second iteration, and so on until every word in the table of words has been checked. The number of iterations therefore equals the number of words to check, which the -Extract Table Dimension- node provides.

Eventually, all the sentences are tested for all the words.
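For readers who think more easily in code, the loop logic can be sketched roughly like this in Python. This is not what the KNIME nodes run internally, just the same idea under simple assumptions: plain substring replacement, and a hypothetical helper name `remove_words`.

```python
def remove_words(sentences, words):
    """Remove every word in `words` from every sentence, one word per iteration."""
    current = list(sentences)
    for word in words:  # one pass per word, like one loop iteration
        # The result of this pass becomes the input of the next pass.
        # str.replace works on substrings, so "paris" would also hit "parish".
        current = [" ".join(s.replace(word, " ").split()) for s in current]
    return current


sentences = ["burger new york", "burger paris", "new york paris burger"]
cleaned = remove_words(sentences, ["new york", "paris"])
```

The number of passes equals the number of words, which matches the iteration count described above.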

Hope it is clear. Otherwise, please reach out again for further help.

Best
Ael

phil21oo · May 5, 2023, 11:27am

wow, awesome, thank you so much. Where in the process can i double check which sentences are filtered out? ty

aworker · May 5, 2023, 11:32am

My pleasure @phil21oo

Sentences are not really filtered out; words are removed from the sentences. If you remove the -Duplicate Row Filter- node, the sentences at the end keep the same number and position as at the beginning. If you duplicate the sentence column at the start to keep a trace of the initial sentences, you can check at the end of the process which sentences have changed and which haven’t.

I’ll leave that to you as an exercise while I do it on my side and upload it here.

I’ll be back soon,
Ael

iCFO · May 5, 2023, 12:00pm

This can be done without loops using the Rule-based Row Splitter (Dictionary). It is my standard approach, since additional criteria can easily be added to an existing process when necessary.

aworker · May 5, 2023, 12:08pm

Hi @phil21oo

The modified solution is as follows:

The last node (-Rule-based Row Splitter-) splits the table based on the difference between the original sentences and the modified ones.
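The comparison idea behind that last step can be sketched in Python as follows. It is only an illustration with invented sample data and a hypothetical helper `split_by_change`; the workflow itself does this with nodes.

```python
def split_by_change(original, modified):
    """Return (changed, unchanged) original sentences by comparing both columns."""
    changed, unchanged = [], []
    for before, after in zip(original, modified):
        # A sentence that differs after cleaning contained at least one filter word.
        if before != after:
            changed.append(before)
        else:
            unchanged.append(before)
    return changed, unchanged


original = ["burger cheap", "burger tasty", "buy burger", "burger new york"]
modified = ["burger", "burger tasty", "burger", "burger new york"]
changed, unchanged = split_by_change(original, modified)
```

Here `changed` holds the rows that would be filtered out and `unchanged` the rows to keep.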

Hope it helps.

Best,
Ael

gonhaddock · May 5, 2023, 12:29pm

Hello @phil21oo
Have you already searched the forum?
I would follow an approach similar to this one, based on regex, as it lets you handle letter case easily… You can avoid using loops as well.
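As a rough illustration of the regex idea in Python (not the KNIME configuration itself): all filter words are joined into one case-insensitive pattern, so no loop over the words is needed at match time. The name `build_pattern` is made up, and the sketch assumes a non-empty word list.

```python
import re


def build_pattern(words):
    """Build one case-insensitive pattern matching any filter word as a whole word."""
    # re.escape keeps characters like "." or "+" in a filter word literal.
    escaped = (re.escape(w) for w in words)
    return re.compile(r"\b(?:" + "|".join(escaped) + r")\b", re.IGNORECASE)


pattern = build_pattern(["buy", "cheap"])
rows = ["Burger CHEAP", "burger tasty", "Buy burger", "burger new york"]
kept = [r for r in rows if not pattern.search(r)]
```

The `\b` word boundaries are a design choice: they stop "buy" from matching inside "buyer". Drop them if partial matches should count too.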

BR

HansS · May 5, 2023, 1:19pm

Hi @phil21oo

Interesting discussion. I added an alternative to my wf filter_out_rows.knwf (79.4 KB). I think it can be done much more easily with no loops, by making use of the KNIME Text Processing nodes. I used a Stop Word Filter to filter out the words. And by using the Reference Row Splitter you have both the modified and unmodified records.

gr. Hans

badger101 · May 5, 2023, 2:19pm

@phil21oo Glad to see you found many alternatives. Not meaning to spoil the party, but just pointing out an alternative here which:

- doesn’t require loops
- uses only native and basic KNIME nodes (no extensions)

Input: [screenshot: ps2]

Output: [screenshot: ps3]

Here’s the link:

badger101 · May 5, 2023, 3:07pm

Edit: Workflow updated to standardize the text to lowercase.

system · August 3, 2023, 3:07pm

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.