Filter out rows using filter criteria from a different CSV

Hello,

I have a database and want to filter out rows that contain certain words I keep in a Google Sheet (for example buy, cheap… and many more).

column

  1. burger cheap
  2. burger tasty
  3. buy burger
  4. burger new york

filter out 1 and 3

Now my question is: how can I implement these filter criteria in a row filter?

Thx for your help :slight_smile:

Morning!
I think this can help you.


Hmm, ty, but I want to filter out the row if the cell contains the filter word, not only if the cell is exactly equal to the filter word.

for example
filter
cheap

database

  1. burger cheap
  2. burger
  3. pizza cheap

1 and 3 out
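
For readers who want to see the "contains" logic outside KNIME, here is a minimal Python sketch of the requirement above. The function name `filter_out_rows` is hypothetical, and the lowercasing is an assumption on my part (the question does not say whether case matters).

```python
def filter_out_rows(rows, filter_words):
    """Keep only the rows that contain none of the filter words as a substring."""
    # Assumption: matching is case-insensitive, so both sides are lowercased.
    lowered = [word.lower() for word in filter_words]
    return [row for row in rows if not any(word in row.lower() for word in lowered)]


database = ["burger cheap", "burger", "pizza cheap"]
kept = filter_out_rows(database, ["cheap"])
print(kept)  # ['burger']
```

Note that a plain substring test also drops a row such as "cheaper burger"; whether that is wanted depends on the data.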

Hi @phil21oo

Although the question seems simple, the answer is not: it needs recursion, hence a loop as follows:

Hope it helps.

Best
Ael


Hi @phil21oo

See this wf filter_out_rows.knwf (52.5 KB). It loops over every word you want to search for so that you can exclude the matching input rows.


gr. Hans


Thank you very much, I will try it :slight_smile:


I have a question: what does “SWords” mean?

Hi @phil21oo,

This workflow is, let’s say, at KNIME level L2, where loops and variables are needed.

The expression $${SWords}$$ refers to the flow variable Words, which appears in the “Flow Variable List”. The S at the beginning indicates that it is of type String.

The variable $${SWords}$$ is created by the -Table Row to Variable- node, which passes the word to be replaced to the -String Manipulation- node at every iteration of the recursive loop.

A recursive loop is needed because I guess you need to eliminate ANY word from your table of words (2nd table), even if several of them appear in the same sentence. The recursive loop feeds the same table of sentences back into the next iteration, and at every new iteration one word is checked and eliminated. So, for instance, ‘new york’ is eliminated from your sentences in the first iteration, and once this is done, ‘paris’ is eliminated from those updated sentences in the second iteration, and so on until all the words in the table of words have been checked. The final number of iterations is necessarily the number of words to check, and this information is gathered by the -Extract Table Dimension- node.

Eventually, all the sentences are tested for all the words.
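
The idea of the recursive loop can be sketched in a few lines of Python. This is only an illustration of the logic, not the workflow itself: the function name `remove_words` and the sample sentences are made up, and the whitespace clean-up after each removal is my own assumption.

```python
def remove_words(sentences, words):
    """Remove each word in turn; every pass works on the output of the previous pass."""
    current = list(sentences)
    for word in words:  # one iteration per word, so len(words) iterations in total
        # Assumption: collapse leftover whitespace after removing the word.
        current = [" ".join(s.replace(word, "").split()) for s in current]
    return current


sentences = ["burger new york", "burger paris cheap", "burger tasty"]
cleaned = remove_words(sentences, ["new york", "paris"])
print(cleaned)  # ['burger', 'burger cheap', 'burger tasty']
```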

Hope it is clear. Otherwise, please reach out again for further help :blush:

Best
Ael


Wow, awesome, thank you so much. :slight_smile: Where in the process can I double-check which sentences are filtered out? ty


My pleasure @phil21oo

Sentences are not really filtered out. What is filtered out are the words within the sentences. If you remove the -Duplicate Row Filter- node, the sentences at the beginning and at the end will remain the same in number and position. If you duplicate the column of sentences at the beginning to keep a trace of the initial sentences before they change, you will be able to check at the end of the process which sentences have changed and which haven’t.

I’ll let you do it as an exercise while I do it on my side, so I can eventually upload it here :wink:

I’ll be back here soon,
Ael

This can be done without loops using the Rule Based Row-Splitter (Dictionary). It is my standard approach since additional criteria can easily be added to an existing process when necessary.


Hi @phil21oo

The modified solution is as follows:

The last node (-rule based row splitter-) splits the table based on the difference between the original sentences and the modified ones.
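
As a rough illustration of that final split, here is a small Python sketch that compares each original sentence with its modified version. The function name `split_by_change` and the hard-coded `modified` list are assumptions for the example; in the workflow the modified column comes out of the loop.

```python
def split_by_change(originals, modified):
    """Split the original sentences into those that were changed and those that were not."""
    changed, unchanged = [], []
    for before, after in zip(originals, modified):
        # A sentence counts as "changed" when at least one word was removed from it.
        (changed if before != after else unchanged).append(before)
    return changed, unchanged


originals = ["burger cheap", "burger tasty", "buy burger", "burger new york"]
# Assumption: this is what the sentences look like after removing "buy" and "cheap".
modified = ["burger", "burger tasty", "burger", "burger new york"]

changed, unchanged = split_by_change(originals, modified)
print(changed)    # ['burger cheap', 'buy burger']
print(unchanged)  # ['burger tasty', 'burger new york']
```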

Hope it helps.

Best,
Ael


Hello @phil21oo
Have you already searched the forum?
I would follow an approach similar to this one, based on regex, as you can easily deal with letter case… You can avoid using loops as well.
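
To give an idea of what a regex-based filter could look like, here is a minimal Python sketch. It is not the linked workflow: the helper `build_pattern` is hypothetical, and matching whole words only (via `\b`) is my own assumption.

```python
import re


def build_pattern(filter_words):
    """Build one case-insensitive pattern that matches any of the filter words."""
    # Assumption: filter_words is non-empty and words must match as whole words.
    alternation = "|".join(re.escape(word) for word in filter_words)
    return re.compile(rf"\b(?:{alternation})\b", re.IGNORECASE)


pattern = build_pattern(["buy", "cheap"])
rows = ["Burger CHEAP", "burger tasty", "Buy burger", "burger new york"]
kept = [row for row in rows if not pattern.search(row)]
print(kept)  # ['burger tasty', 'burger new york']
```

A single pattern tests every word at once, so no loop over the word list is needed, and the `re.IGNORECASE` flag takes care of letter case.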

BR


Hi @phil21oo

Interesting discussion. I added an alternative to my wf filter_out_rows.knwf (79.4 KB). I think it can be done much more easily with no loops, by making use of the KNIME TextProcessing nodes. I used a StopWord filter to filter out the words. And by using the ReferenceRow Splitter you have both the modified and unmodified records.
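
Outside KNIME, the stop-word idea can be sketched in Python as a token lookup. This only mimics the concept, not the TextProcessing nodes themselves; the function name `has_stop_word` and the whitespace tokenization are assumptions.

```python
def has_stop_word(sentence, stop_words):
    """Return True if any whitespace-separated token of the sentence is a stop word."""
    # Assumption: tokens are split on whitespace and compared in lowercase.
    return any(token.lower() in stop_words for token in sentence.split())


stop_words = {"buy", "cheap"}
rows = ["burger cheap", "burger tasty", "buy burger", "burger new york"]

modified = [row for row in rows if has_stop_word(row, stop_words)]
unmodified = [row for row in rows if not has_stop_word(row, stop_words)]
print(modified)    # ['burger cheap', 'buy burger']
print(unmodified)  # ['burger tasty', 'burger new york']
```

Because the lookup works on single tokens, a multi-word entry such as "new york" would not match in this sketch.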


gr. Hans


@phil21oo Glad to see you found many alternatives. Not meaning to spoil the party, but just pointing out some alternatives here which:

  • don’t require loops
  • use only native and basic KNIME nodes (no extensions)

Input: [image: ps2]

Output: [image: ps3]

Here’s the link:


Edit: Workflow updated to standardize the text to lowercase.

