How can we match the substrings of a string?

I have two strings, each made up of different substrings.
I need to check whether the two strings are the same or not.
String 1: “pl12, 78we, fg45” String 2: “78we, fg45, pl12”
Answer: true (because all the substrings of string 1 are in string 2)

Use

with list sorted aggregation.
Then use

to mark similar strings based on aggregated columns.
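The sort-then-compare idea can also be sketched outside KNIME. This is only a minimal Python sketch, not the node configuration from the post above; the helper name `canonical_key` and the comma delimiter are assumptions made for illustration.

```python
def canonical_key(text, sep=","):
    """Split on sep, trim whitespace, and return the parts as a sorted tuple."""
    parts = [p.strip() for p in text.split(sep) if p.strip()]
    return tuple(sorted(parts))


s1 = "pl12, 78we, fg45"
s2 = "78we, fg45, pl12"

# Two strings with the same substrings in any order share the same key.
same = canonical_key(s1) == canonical_key(s2)
print(same)
```

Because the key is order-independent, it could also serve as a grouping column to mark rows that contain the same substrings.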

Hi @PankajChaudhary ,

I think regex is an easier solution. Check this: KNIME_project300.knwf (51.9 KB)

GL,
Mehrdad

I tried out using the Column Aggregator but could not get it to turn “e, b, d, c, a” into a sorted list [a, b, c, d, e], which is odd as it feels like it should be capable. Not sure what I was doing wrong.

I also took a look at the following post, which on the face of it is the same problem, and may contain the answer

but in the end, I thought I’d also just have a go. (You can never have too many variations and ideas!! :-))

This workflow also produces the result, in a somewhat convoluted way…

Broadly the task is
(1) Turn both Strings into sets of data A and B that can be compared
(2) Find where an element of set A is not in set B
(3) Mark string A as “is subset” where all of its elements are in set B
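For readers who think in code, the three steps above could be sketched in Python roughly as follows. This is not the attached workflow; the function names `split_elements`, `missing_elements` and `is_subset`, and the comma delimiter, are assumptions for illustration. Note that plain sets ignore repeated elements.

```python
def split_elements(text, sep=","):
    """Step 1: turn a delimited string into a list of trimmed elements."""
    return [p.strip() for p in text.split(sep) if p.strip()]


def missing_elements(a, b):
    """Step 2: elements of A that do not occur in B."""
    return set(split_elements(a)) - set(split_elements(b))


def is_subset(a, b):
    """Step 3: A is a subset of B when nothing is missing."""
    return not missing_elements(a, b)


print(is_subset("pl12, 78we, fg45", "78we, fg45, pl12"))
print(missing_elements("a, x", "a, b"))
```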

My initial stab at this had a question mark: what is the desired outcome if an element in String A is repeated? Can that occur? I had to assume that “c,c,d,e” was NOT considered a subset of “c,d,e”, and that meant I had to add some more nodes to count elements and handle that bit too. So my result wasn’t as small a workflow as I’d like, but it’s another option … :rofl:
KNIME_test_is_subset.knwf (51.2 KB)
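If repeated elements matter, as assumed above, the comparison has to count elements rather than just test membership. A minimal Python sketch of that idea, using `collections.Counter` (the function names and the comma delimiter are assumptions, not part of the workflow):

```python
from collections import Counter


def element_counts(text, sep=","):
    """Count how often each trimmed element occurs in the string."""
    return Counter(p.strip() for p in text.split(sep) if p.strip())


def is_multiset_subset(a, b):
    """True when B holds each element of A at least as many times as A does."""
    counts_a = element_counts(a)
    counts_b = element_counts(b)
    # A Counter returns 0 for elements it has never seen.
    return all(counts_b[item] >= n for item, n in counts_a.items())


print(is_multiset_subset("c,c,d,e", "c,d,e"))
print(is_multiset_subset("c,d,e", "c,c,d,e"))
```

Under this rule “c,c,d,e” is not a subset of “c,d,e”, while the reverse direction is.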

Hi @PankajChaudhary,

This can be solved with KNIME in very different ways, as nicely shown by @izaychik63, @mehrdad_bgh and @takbb.

Here goes my contribution based on aggregation, considering both cases (with and without repeated substrings) :slight_smile:

20210501 Pikairos Compare Sets Example.knwf (61.0 KB)

Hope this helps :slight_smile:

Best,

Ael

Hi @mehrdad_bgh
In my case, the number of elements in the string is different for every row. How can we generalise it for that?

Hi Pankaj,

Sorry for the delay.

:smile:
Regex.knwf (46.0 KB)

GL,
Mehrdad

Hi @mehrdad_bgh
Actually, the two columns are in different tables, as you had them in the first workflow.

Hi Pankaj,

Regex.knwf (50.5 KB)

It took me some time to figure out the splitting, but maybe a Python node would be an option as well.

@Daniel_Weikert Enjoying your Python solutions. Am I right in thinking that your equal() function returns exactly what it says, as in True if the sorted collections are the same?

So if we want to return True if Column A either equals B or is a subset of B, then your method could be changed to this:

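def equal(a, b):
    # a and b are collections of substrings (e.g. lists), not raw strings;
    # True when every element of a also occurs in b
    return all(x in b for x in a)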

p.s. my mate Google told me this… hope it’s right! :wink:
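A short usage sketch of that function, assuming the columns hold comma-separated text that is split before the check. The helper `to_list` and the sample values are assumptions for illustration. The sketch also shows two behaviours worth knowing: raw strings are compared character by character, and repeated elements are ignored.

```python
def equal(a, b):
    return all(x in b for x in a)


def to_list(text, sep=","):
    """Split a delimited string into trimmed elements (assumed helper)."""
    return [p.strip() for p in text.split(sep) if p.strip()]


col_a = "pl12, 78we"
col_b = "78we, fg45, pl12"

# Split first, so that whole substrings are compared.
subset_result = equal(to_list(col_a), to_list(col_b))

# Without splitting, each character of a is looked up in b.
raw_result = equal("21lp", "pl12")

# Membership tests do not count repeats.
repeat_result = equal(to_list("c, c, d"), to_list("c, d, e"))

print(subset_result, raw_result, repeat_result)
```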

I think you are correct.

Speaking of solutions: whenever I visit the forum I find at least 10 new solutions by @takbb :sweat_smile:
Excellent work Brian, the only problem is that I can’t keep up with reading all of them :joy:
Best regards

:slight_smile: … thanks @Daniel_Weikert. I discovered the best way to learn KNIME was to pretend I knew what I was doing and just dive in head first attempting solutions… which might be why some of my solutions are perhaps a little off the wall!

Here’s another thread with a similar theme, looks like this had a few other options to add to the list!
