Joiner with Partial information

Bleck · May 7, 2024, 1:15pm

Hello, I'm having a problem with a Joiner where one side has the complete information while the other was typed by a person, so it has been shortened, but in a somewhat random way. The text seems to match exactly up to a certain point, but it always gets cut off.
Example:

RCV 100 X 150000 P NAT = RCV 100 X 150000 P NATURAL PVC 30 60

I would just use substr( , , ) in String Manipulation, but the cut-off point is kind of random, so sometimes I will need more letters and sometimes fewer.

Is there some kind of joiner that reads both sides and joins them if they match up to a certain point, or is that impossible and I HAVE to know the exact cutting point?

Thank you for your time.

rfeigel · May 7, 2024, 2:30pm

You might try the LIKE function in the Rule Engine node. You’ll need to add a wildcard to the end of the shorter string.
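To make the wildcard idea concrete, here is a minimal Python sketch of the same logic outside KNIME: a shortened value matches every full value that starts with it, which is what `LIKE 'short*'` style matching amounts to. The function name `prefix_candidates` and the second full name (the BLACK one) are made up for illustration; only the NATURAL string comes from the post above.

```python
def prefix_candidates(short_name, full_names):
    """Return every full name that starts with the shortened name."""
    key = short_name.strip().upper()
    return [full for full in full_names if full.strip().upper().startswith(key)]


full_names = [
    "RCV 100 X 150000 P NATURAL PVC 30 60",  # from the example in the question
    "RCV 100 X 150000 P BLACK PVC 30 60",    # hypothetical second row
]

# A shortened value that is still specific enough gives exactly one candidate.
matches = prefix_candidates("RCV 100 X 150000 P NAT", full_names)
print(matches)

# A value that was cut too early matches several rows, so it needs manual review.
ambiguous = prefix_candidates("RCV 100", full_names)
print(len(ambiguous))
```

Returning a list instead of a single value keeps the ambiguous cases visible, so they can be filtered out and handled by hand.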

Bleck · May 7, 2024, 5:50pm

I can already shorten the string in String Manipulation. My problem is that I have 2 documents: one has the full name and the other has it shortened, but how short it is varies randomly. Sometimes it cuts a word, sometimes it goes only as far as the numbers, and sometimes it's the full thing. So I don't have a good place to cut in the middle: if I shorten too much, the match can break because I removed the identifying number, but if I don't shorten enough, the Joiner does not work because the values are different. I was wondering if there is some kind of joiner or code node that will try to find the most similar value to connect the 2 documents. If that is not possible, that's OK; I can always find ways to do the job, even if only partially, and fill in the rest by hand.
Still, thank you for your time.

izaychik63 · May 7, 2024, 6:51pm

You can try this solution

takbb · May 7, 2024, 7:06pm

Hi @Bleck, you may be interested in some components that might be able to assist you here.

If you know a little SQL, there is

Otherwise, if you can add a wildcard to the end of the shortened column value, you can try this:

A workflow to demonstrate the different “joiner components” is available here:

Another possibility (if you know SQL) is to use my Table Connector components, which allow you to treat your KNIME tables like a database. You can then write a custom join between the different tables using DB Query Reader.
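As a rough idea of what such a custom join could look like, here is a small self-contained sketch using Python's built-in `sqlite3` module rather than the KNIME components themselves. The table and column names (`short_doc`, `full_doc`, `short_name`, `full_name`) and the second full row are assumptions for illustration, and the exact concatenation syntax may differ in whatever SQL dialect the DB nodes end up talking to.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE short_doc (short_name TEXT)")
conn.execute("CREATE TABLE full_doc (full_name TEXT)")

conn.executemany(
    "INSERT INTO short_doc VALUES (?)",
    [("RCV 100 X 150000 P NAT",)],
)
conn.executemany(
    "INSERT INTO full_doc VALUES (?)",
    [
        ("RCV 100 X 150000 P NATURAL PVC 30 60",),
        ("RCV 200 X 150000 P NATURAL PVC 30 60",),  # hypothetical row
    ],
)

# Join where the full name starts with the shortened name.
# '||' appends the '%' wildcard to the end of the short value.
rows = conn.execute(
    """
    SELECT s.short_name, f.full_name
    FROM short_doc AS s
    JOIN full_doc AS f
      ON f.full_name LIKE s.short_name || '%'
    """
).fetchall()

print(rows)
```

Note that `%` and `_` inside the data itself would also act as wildcards in `LIKE`, so values containing those characters would need escaping.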

rfeigel · May 7, 2024, 9:36pm

Could you upload some sample data for both tables? That would make testing easier.

Bleck · May 8, 2024, 11:45am

1.xlsx (5.4 KB)

I created this as an example. Imagine column A is one document and column D is the other. If I cut too much, I can't differentiate natural from natura, but if I don't cut enough, KNIME won't match BLK with BLACK. I hope this helps to visualize my problem.
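One way to picture the BLK versus BLACK case is a word-by-word check where each shortened word only has to be an in-order abbreviation of the matching full word. The sketch below is a guess at such a rule, not something from the attached file: the function names, the sample strings, and the rule that numbers must match exactly are all assumptions. It does not resolve the natural/natura ambiguity, since NAT is a valid abbreviation of both.

```python
def is_abbreviation(short_token, full_token):
    """True if short_token could be a shortened form of full_token."""
    if short_token.isdigit() or full_token.isdigit():
        # Assumption: numbers identify the item, so they must match exactly.
        return short_token == full_token
    if not short_token or short_token[0] != full_token[:1]:
        return False
    remaining = iter(full_token)
    # Each letter of the short token must appear in the full token, in order.
    return all(letter in remaining for letter in short_token)


def could_match(short_name, full_name):
    """Compare the names word by word, ignoring extra words at the end."""
    short_tokens = short_name.upper().split()
    full_tokens = full_name.upper().split()
    if not short_tokens or len(short_tokens) > len(full_tokens):
        return False
    return all(is_abbreviation(s, f) for s, f in zip(short_tokens, full_tokens))


# Hypothetical rows in the style of the example data.
print(could_match("RCV 100 X 150000 P BLK", "RCV 100 X 150000 P BLACK PVC 30 60"))
print(could_match("RCV 100 X 150000 P NAT", "RCV 100 X 150000 P BLACK PVC 30 60"))
```

A shortened value that was cut in the middle of a number would fail this check because of the exact-number rule, so those rows would still need another approach.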

rfeigel · May 9, 2024, 1:39am

Given the configuration of your data I can’t think of a way to parse it to create matches. Also there’s probably more going on with your complete data set than you’ve explained. The attached workflow illustrates 5 simple similarity tests including @takbb’s component. The String Similarity Joiner component performs an actual join based on the similarity index you input. You can be as (un)conservative as you want.
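For anyone who wants to see the general idea of a similarity-based join in code, here is a minimal Python sketch using the standard library's `difflib.SequenceMatcher`. This is not how the workflow or the component computes its score; the function name `best_match`, the 0.6 threshold, and the sample rows other than the NATURAL string from the first post are assumptions for illustration.

```python
from difflib import SequenceMatcher


def best_match(short_name, full_names, threshold=0.6):
    """Return (full_name, score) for the most similar full name,
    or (None, score) if nothing reaches the threshold."""
    if not full_names:
        return None, 0.0
    scored = [
        (SequenceMatcher(None, short_name, full).ratio(), full)
        for full in full_names
    ]
    score, full = max(scored)
    return (full, score) if score >= threshold else (None, score)


full_names = [
    "RCV 100 X 150000 P NATURAL PVC 30 60",
    "RCV 100 X 150000 P BLACK PVC 30 60",  # hypothetical row
]

for short in ["RCV 100 X 150000 P NAT", "RCV 100 X 150000 P BLK"]:
    match, score = best_match(short, full_names)
    print(short, "->", match, round(score, 2))
```

Raising the threshold makes the join more conservative (fewer but safer matches), and lowering it does the opposite, so the borderline rows are the ones worth checking by hand.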
