JOIN with the help of regex

Hello all,

I would like to join column “Access” from the second table to the first one.

First table:

ID NAME
123454dcv Schmidt
123454fff Skorski
1234avcfrt Li
654744rt Mumford
12dcf Bryan

Second Table:

ID Access
123 55
654744 27
12dcf 25
So, if values in "column ID" in the first table match partially/fully the values from "column ID"(Second table), then add the value from column Access.

Result should be:

ID NAME Access
123454dcv Schmidt 55
123454f Skorski 55
1234avcfrt Li 55
654744rt Mumford 27
12dcf Bryan 25

Any ideas how to achieve that?

Thank you.

Best,
Ram

Hello @IMR2KA
You can take a look into this recent topic and discussion. I think the use case is very similar to your challenge:
https://forum.knime.com/t/join-2-data-sets-using-contains-value-in-joiner-node/66655

I hope this helps. Please comment if further support is needed.

BR

2 Likes

@gonhaddock
Thank you, it works for me partialy.

Can you please explain what does this mean?

regexMatcher($column1$,
join(
“(?i)(.?("
, $column1 (#1)$
, ")[^$]
)$”
)

Before regexMatcher:

After:

So, ?i is case insencitive, but after that cannot understand.

THANK YOU!

Best,
Ram

Hello @IMR2KA
What do partially means? Be aware that you are applying a generic workflow, and regex can be tricky depending the source, encoding…

If you can provide some dummy data or sample representing your use case, when it is only partially matching; we can try to find a more suitable regex approach. But I guess the mechanics are valid for your requested JOIN function.

regexMatcher: Applies regex to string and returns true if regex matches. $LOGIC$ [TRUE/FALSE]

The(?!) means insensitive case; if your look for an exact case match, it can be removed. [$] avoids to end the code if the reference text includes such character…

regex101.com can explain better than I do…

Did you try @takbb’s H2 Database proposed Component?

BR

1 Like

Sorry for being not precise. I wrote partially, because I added some steps to the solution from @takbb, but as a result it works.

@gonhaddock I used regex101.com to understand the regex but it is kind of difficult.
1. (?i) CASE INSENSITIVE
2. (.*? Any characters before the column value
3. )[^$] takes into account all values except for ones with $ at the end
4. *) repeat
5. $ finish the cell with $ to go to the next cell (see 3)
Correct?

@takbb @gonhaddock Thank you all!

Best regards,
Ram

Hello @IMR2KA

Just forget about [^$] as it is optional and it looks a remaining of the reused code from some previous use case. So here is redundant (or not).
Just be aware that you have to include dollar symbol between the square brackets, when it is present in your wildcard (see example). This is why working with regex can’t be generic, specially when working with special characters.

BR

1 Like

You could have tried string similarity search as an alternative.
br

1 Like

@takbb’s “Join Contains” component works with your sample data.

2 Likes

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.