Hello, I'm having a problem with a Joiner where one side has the complete information while the other was typed in by a person, so it has been shortened in a fairly random way. The two values seem to match exactly up to a certain point, but the shortened one always drops the rest.
Example:
RCV 100 X 150000 P NAT = RCV 100 X 150000 P NATURAL PVC 30 60
I would just use the String Manipulator's substr( , , ), but the problem is that the shortening is fairly random, so sometimes I will need more letters and sometimes fewer.
Is there a way to use some kind of joiner that reads both sides and joins them if they match up to a certain point, or is that impossible and I HAVE to have the exact cutting point?
I can already shorten the string with String Manipulator. My problem is that I have 2 documents: one has the full name and the other has it shortened, but how short it is varies. Sometimes it cuts a word, sometimes it goes only as far as the numbers, and sometimes it's the full thing, so I don't have a good place to cut in the middle. If I shorten too much, the join can break because I removed the identifying number; in other cases, if I don't shorten enough, the joiner doesn't work because the values are different. I was wondering if there is some kind of joiner or code node that will try to find the most similar value to connect the 2 documents. If that's not possible, it's OK; I can always find ways to do the job, even if only partially, and fill in the rest by hand.
Still, thank you for your time.
Hi @Bleck, you may be interested in some components that might be able to assist you here.
If you know a little SQL, there is
Otherwise, if you can add a wildcard to the end of the shortened column value, you can try this:
A workflow to demonstrate the different “joiner components” is available here:
Another possibility (if you know SQL) is to use my Table Connector components, which allow you to treat your KNIME tables like a database. You can then write a custom join between the different tables using DB Query Reader.
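To give a rough idea of what such a custom join could look like, here is a minimal sketch in Python using the standard-library sqlite3 module rather than the KNIME components themselves. The table names (short_t, full_t), the column names, and the helper prefix_join are hypothetical, and the BLK/BLACK sample values are made up for illustration. It joins a shortened value to every full value that starts with it, which is the same idea as adding a wildcard to the end of the shortened value.

```python
import sqlite3

def prefix_join(short_names, full_names):
    """Left-join each shortened name to every full name that starts with it."""
    con = sqlite3.connect(":memory:")
    # Hypothetical table and column names, used only for this sketch.
    con.execute("CREATE TABLE short_t (short_name TEXT)")
    con.execute("CREATE TABLE full_t (full_name TEXT)")
    con.executemany("INSERT INTO short_t VALUES (?)", [(s,) for s in short_names])
    con.executemany("INSERT INTO full_t VALUES (?)", [(f,) for f in full_names])
    # '%' is the SQL wildcard: match any full name beginning with the short one.
    # Caveat: '%' or '_' inside short_name would also act as wildcards, and
    # SQLite's LIKE ignores ASCII case by default.
    rows = con.execute(
        """
        SELECT s.short_name, f.full_name
        FROM short_t AS s
        LEFT JOIN full_t AS f
          ON f.full_name LIKE s.short_name || '%'
        ORDER BY s.short_name, f.full_name
        """
    ).fetchall()
    con.close()
    return rows

if __name__ == "__main__":
    shorts = ["RCV 100 X 150000 P NAT", "RCV 100 X 150000 P BLK"]
    fulls = ["RCV 100 X 150000 P NATURAL PVC 30 60",
             "RCV 100 X 150000 P BLACK PVC 30 60"]
    for row in prefix_join(shorts, fulls):
        print(row)
```

A prefix join like this only works when the shortened value is a true truncation of the full one. An abbreviation such as BLK is not a prefix of BLACK, so that row comes back with no match and would need a similarity-based approach or manual review.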
I created this as an example. Imagine column A is one document and column D is the other. If I cut too much, I can't differentiate natural from natura, but if I don't cut enough, KNIME won't match BLK with BLACK. I hope this helps to visualize my problem.
Given the configuration of your data I can’t think of a way to parse it to create matches. Also there’s probably more going on with your complete data set than you’ve explained. The attached workflow illustrates 5 simple similarity tests including @takbb’s component. The String Similarity Joiner component performs an actual join based on the similarity index you input. You can be as (un)conservative as you want.
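For anyone who would rather try the idea in a Python Script node, here is a minimal sketch of a similarity-based join using difflib from the Python standard library. This is not how the String Similarity Joiner component is implemented; the function names, the 0.9 threshold, and the BLK/BLACK sample values are assumptions chosen for illustration. Each shortened value is compared with the equally long start of every full value, so the dropped tail does not lower the score.

```python
from difflib import SequenceMatcher

def similarity(short_name: str, full_name: str) -> float:
    """Score in [0, 1] between the short name and the same-length start of the full name."""
    a = short_name.upper().strip()
    b = full_name.upper().strip()[: len(a)]
    return SequenceMatcher(None, a, b).ratio()

def similarity_join(short_names, full_names, threshold=0.9):
    """Pair each short name with its best-scoring full name, or None below the threshold.

    The threshold is an assumed starting point; raise it to be more conservative.
    """
    result = []
    for short in short_names:
        if not full_names:
            result.append((short, None, 0.0))
            continue
        # Ties are resolved by list order, so equal scores still need a manual check.
        best = max(full_names, key=lambda full: similarity(short, full))
        score = similarity(short, best)
        result.append((short, best if score >= threshold else None, round(score, 3)))
    return result

if __name__ == "__main__":
    fulls = ["RCV 100 X 150000 P NATURAL PVC 30 60",
             "RCV 100 X 150000 P BLACK PVC 30 60"]
    shorts = ["RCV 100 X 150000 P NAT", "RCV 100 X 150000 P BLK", "XYZ 999"]
    for row in similarity_join(shorts, fulls):
        print(row)
```

With these sample values, the truncated NAT row and the abbreviated BLK row both find their full names, while the unrelated value stays unmatched. Cases like natural versus natura can still tie, so it is worth reviewing rows whose top two candidates score the same.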