Loop over string to create partial join

Hi guys,

I need your help to finish building my flow.
I have a table like below

ID Name 1 Name 2
1 Mario Rossi Luigi Verdi
2 Antonio Marco de Giusti
3 Fabio Michele
4 Carlo Michele Stefano Gino

Starting from this table, I want to create a flow/loop where I join totally or partially the values in Name 1 or Name 2 as below:

ID Output
1 Mario - Luigi
1 Mario Rossi - Luigi
1 Mario - Luigi Verdi
1 Mario Rossi - Luigi Verdi
2 Antonio - Marco
2 Antonio - Marco de
2 Antonio - Marco de Giusti
2 Antonio - de
2 Antonio - Giusti
2 Antonio - de Giusti
2 Antonio - Marco Giusti
3 Fabio - Michele
4 Carlo - Gino
4 Michele - Gino
4 Stefano - Gino
4 Carlo Michele - Gino
4 Carlo Stefano - Gino
4 Michele Stefano - Gino
4 Carlo Michele Stefano - Gino

Can you help me?
Many thanks
R

This sounds like a task for the Cross Joiner node.

Hi @ScottF , I had the same reaction at first, that is to use a Cross Joiner. But looking at the desired output, it’s not. It’s not joining all records from Name 1 to Name 2, but rather combining all combinations of single word of Name 1 and Name 2 of the same row. Though eventually the Cross Joiner can be used after some initial manipulations and within a Loop (Group Loop by ID)

@Reader91 , for ID 1, should you not also have these 3:
1 Rossi Luigi
1 Rossi Verdi
1 Rossi Luigi Verdi

@bruno29a yeah, i forgot them.
It should have all the possible combinations

R

Hi @Reader91 , do combinations include reverse/different order of the string?

For example, while you have Mario Luigi Verdi, can you also have Mario Verdi Luigi?

I haven’t considered it at the beginning but I think it can be useful.

Maybe Pythons Itertools package can help you here, specifically combinations from it.
Otherwise like @ScottF suggested maybe first split and ungroup the data and then a crossjoin.
Not a perfect solution but maybe it gets you started.
br

Hi @Reader91 , I’ve come up with something using Python. The reason why I used Python is that it has a function called combinations() from the itertools library, and it basically finds all combinations for you automatically.

The workflow looks like this:
image

Input data (same as yours):
image

Results:

I know I asked this: “while you have Mario Luigi Verdi, can you also have Mario Verdi Luigi?”. This is more of a permutation for that combination. Combination-wise, you will get only “Mario Luigi Verdi”.

If you want all permutations of all combination, we can also add a permutation on top of these combinations. Python also has a permutations() function :wink:

Here’s the workflow: Get all combinations of names.knwf (14.1 KB)

2 Likes

Thank you @bruno29a for the solution. Very appreciated

I have one question because when I start the python script I receive an error message saying

ERROR Python Script        0:3        Execute failed: Could not start Python. There are problems with your Python environment:
Could not start Python executable at the given location (no_conda_environment_selected\python.exe): Cannot run program "no_conda_environment_selected\python.exe": CreateProcess error=2,

Hi @Reader91 , ah, it looks like you don’t have a Python environment set up. You need to set this up in order to be able to use Python.

Please check this documentation for installing Python:
https://docs.knime.com/2018-12/python_installation_guide/index.html#quickstart

@Reader91 I’ve completely rewritten my workflow. I also used python and it seems to work better than the preceding version 20220307_split_names – KNIME Hub
Let me know if this solution fits your needs

Hi @Reader91 , since you don’t have Python installed, I’ve written an alternate version in Java. You should not need to install Java as Knime comes with an embedded version. I get the same combinations. I’ve also optimized the logic where instead of starting with 1 element for combinations, I start with 2, that way I don’t need to do row filter anymore. I’ve added that optimization in both the Python and Java version.

The new workflow has both versions, and looks like this:
image

Results from Java:

Here’s the workflow: Get all combinations of names.knwf (17.4 KB)

EDIT: I would still suggest that you get Python installed. You’re limited if you don’t have Python. Python is essential when you need to access several other rows while processing a row (which is NOT in this case, hence why it was doable in Java). But if you are restricted for time, you can use the Java version for this workflow in the meantime.

Is the Java language so limited? Is that the reason why we don’t have an Alteryx Multi-Row Formula like node in Knime?

Luca

Latest version is much simpler. Still needs Python to run. 20220307_split_names – KNIME Hub

Hi @Reader91,

I tried to do this by using KNIME base nodes (no code). I’m not sure if this is the most optimized approach, I just tried to do it quickly. You or the other guys here may improve it:

40473.knwf (57.7 KB)

2 Likes

(it’s becoming an obsession) I’ve edited the Python nodes and now the result should be correct. 20220307_split_names – KNIME Hub

1 Like

Hi @Luca_Italy , I would not say that the Java language is “limited” compared to Python, I think it’s just a matter of how Knime decided to allow Python to navigate through the rows, but not with Java. Java has the same capability - it’s just an array in the end.

Thanks guys for the support on this issue.

I’ve built the flow based on @bruno29a tip with python, very quick and it covers my needs.

Many thanks
R

1 Like

Ok, so why we don’t have the same functionality like alteryx’s multi-row formula?

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.