Remove strings from one column that are contained in another column

mkubita · May 30, 2023, 8:03pm

Hi,
I wanna remove substrings from the first column that are in second column.
Example:

I have a table that looks like this:

As a result my table would be like:

It should search for all matches of a values that are in second column and replace them.

dnaki · May 31, 2023, 1:14am

Hi @mkubita , and welcome to the forum!
If I understand you correctly, you wish to search the string value in the first column for the string value in the second column and wherever found, remove the matching string from the first column?
If so, the 3rd row of your example result table should be:
| Wanna dog and a cat | have
Is that correct?

In any case, this could be accomplished several ways. Perhaps the most straightforward would be to use the String Manipulation node. You could select the option to replace the value in the first column, and enter the expression as in the screenshot below.

If instead you want to use the value from the second column that is in the previous row, you could use precede the String Manipulation node with the Lag column node. I hope this helps!
-Don

mkubita · May 31, 2023, 6:21am

Hi,
thanks for the answer. I did not explain it correctly, so I may clarify my question.

I have two columns in my dataset and the second column contains the strings that should be replaced in the first column - but the thing is that that the order of rows does not matter.

It means that if I have in me second column strings like that: dog / cat / horse and in my first column I have phrases like: dog likes cat / I like you. Firsly it should search for ‘dog’ in first column and remove this strings, then it should search for ‘cat’ and remove it, then for ‘horse’. As a result my dataset should look like: likes / I like you

HansS · May 31, 2023, 7:17am

Hi @mkubita

I don’t understand the logic.

row1 → I have a cat => expected result: a cat
but in row 3, column2 there is: have => I would expect for row 1 just: I

result: =>
row 1 → I
row 2 → there is dog
row 3 → dog and

gr. Hans

mkubita · May 31, 2023, 7:50am

Yes, you are correct @HansS. I made mistake

mkubita · May 31, 2023, 7:57am

So, do you know how to do this?

mkubita · May 31, 2023, 7:59am

I will clarify, because I made mistakes.

I have a table that looks like this:

As a result my table would be like:

HansS · May 31, 2023, 8:01am

Hi @mkubita

Yes I have a solution. I used the KNIME Text Processing nodes for this, see
KNIME_project3.knwf (37.1 KB)

I created a custom StopWord list from column2 to filter the words in column1.
Schermafdruk van 2023-05-31 09-58-43

gr. Hans

HansS · June 6, 2023, 8:15pm

Hi @mkubita

So, does my suggestion helps you out?

mkubita · June 13, 2023, 5:29pm

Yes, thanks! That works great

system · September 11, 2023, 5:30pm

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.