Find connection between different tables

Hello,
I would like to compare two tables and find out connections between them. Both tables are different, but they are connected in some way. One table is connected to the other table by certain columns and I would like to find out the columns. Is there a solution for this with Knime?

Hi @student96 and welcome to the forum !

Would it be possible for you to share (upload here) a bit of the data you want to compare between tables ? Depending on the nature of the data (numerical / categorical / String, etc.), different solutions are possible. Thanks in advance.

Best,

Ael

3 Likes

@aworker

Thank you :blush:. Unfortunately, I cannot upload any data. But it is rather strings that would have to be compared with each other.

Hi @student96

Without an example of your data, it is going to be difficult to set up and share a workflow, because I believe this is a very much a data dependent problem. Different data require different metrics for comparison, even between strings.

Could you please share a bit of dummy data that is not confidential and that looks at least similar to the one you need to use and compare ? This point is important because the “metric” to use for comparison has to be chosen in terms of the data nature.

A few rows would be enough to understand your problem and eventually suggest possible solutions.

Thanks,

Ael

4 Likes

Thanks for your help @aworker
Okay so in both files there are about over 1000 entries. They are CSV files which have approximately this format:
File 1:
“Column1”; “Column2”; “Column3”
“String1”;“String2”;“String3”
“String11”;“String12”;“String13”

File 2:
“Column4”; “Column5”; “Column6”; “Column7”
“String4”;“String1”;“String5”;“String6”
“String14”;“String12”;“String15”;“String16”

As you can see in this example, the tables are connected via Column 2 and 5. I would like to find this out automatically.

In my case, however, it is also unknown via which column the entries are connected.

Best regards

1 Like

Hi @student96, so are you saying that columns are considered “connected” if they have any data (even if it is just one cell) in common?

Why don’t you consider column 1 and column 5 connected, as they both share String 1?

2 Likes

In Row1, String1 appears in Column1 and Column5. In Row2, String12 appears in Column2 and Column5. According to your explanation, Column2 and Column5 are the connection, but that only holds true for Row2. Is there a mistake in your example or in your explanation?

2 Likes

In the original data set there are several columns that are somehow connected. I would like to automatically determine which columns are related.

@G47_2 That was a mistake, have improved it:

File 1:
“Column1”; “Column2”; “Column3”
“String1”;“String2”;“String3”
“String11”;“String12”;“String13”

File 2:
“Column4”; “Column5”; “Column6”; “Column7”
“String4”;“String2”;“String5”;“String6”
“String14”;“String12”;“String15”;“String16”

1 Like

@student96, thanks for the clarification on the example.

So can you confirm what actually constitutes two columns being “linked”? Is it that ALL rows for column can be matched to a row in another column, or is it that at least one row has to match, or is it that a significant number of rows have to match? Eg more than 50% for example?

2 Likes

Hi @student96 , in addition to the clarification that @takbb is asking, would a column from a table match only to one column in the other table, or could it match to more than 1 column? And if so, can the following data happen?
File 2:
“Column4” ; “Column5” ; “Column6” ; “Column7”
“String4” ; “String2” ; “String12”; “String6”
“String14”; “String12”; “String2” ; “String16”
Notice Column5 and Column6, they would both match Column2, but the matches will return different rows.

And, can multiple columns from File 1 match with the same column from File 2? Similarly with the above comments, but obviously the reversed data.

And, can we have multiple columns from File 1 matching to unique columns from File 2? For example, can we have Column1 match with Column4 and also Column2 match with Column5?

And finally, can we have multiple columns from File 1 matching to multiple columns from File 2? (basically a combination of all of the above)

2 Likes

@bruno29a

Only this can happen:

“Column4” ; “Column5” ; “Column6” ; “Column7”
“String4” ; “String12” ; “String2”; “String6”
“String14”; “String12”; “String2” ; “String16”

And, can multiple columns from File 1 match with the same column from File 2? Similarly with the above comments, but obviously the reversed data.

No

And, can we have multiple columns from File 1 matching to unique columns from File 2? For example, can we have Column1 match with Column4 and also Column2 match with Column5?

Yes

And finally, can we have multiple columns from File 1 matching to multiple columns from File 2?

No

Thanks for clarifying my questions @student96 . Can you also answer what @takbb is asking - what are the files that make 2 columns to be “linked”?

2 Likes

@bruno29a

So I always have to compare column by column. I want to find out if one column contains data that also appears in the other table. I want to find out the connections. Some connections are clear to me, but I want to find out if there are also unknown connections between the data.

For me, it would be enough to take values from one column in the first table, and compare them to each value in the other table. Once there is a matching value, I know there are connections between the columns.

Hi @student96 , thank you for the additional comments, but I think we need more precisions I’m afraid.

Based on what you said:

Would the following example qualify as a connection between Column2 and Column5?
File 1:
“Column1”; “Column2”; “Column3”
“String1”;“String2”;“String3”
“String11”;“String12”;“String13”

File 2:
“Column4”; “Column5”; “Column6”; “Column7”
“String4”;“String2”;“String5”;“String6”
“String14”;“String12”;“String15”;“String16”
“String17”;“String18”;“String19”;“String20”

Based on what you said, this case would qualify, as the data in Column2 also appears in Column5, though Column5 has additional data.

We’re also not clear what defines “Once there is a matching value”. As @takbb asked, does this mean ALL rows are matching, or only a number of rows? And if it’s only a number of rows, what’s the magic number? 50%?

2 Likes

@bruno29a
I would like to compare each row until there is a matching value. I have adjusted the example a bit again:

File 1:
“Column1”; “Column2”; “Column3”
“String1”;“String2”;“String3”
“String11”;“String12”;“String13”

File 2:
“Column4”; “Column5”; “Column6”; “Column7”,“Column8”
“String4”;“String2”;“String5”;“String6”,“String2”
“String14”;“String12”;“String15”;“String16”,“String2”
“String17”;“”;“String19”;“String20”,“String12”

There are also sometimes empty values in a column for a row.

This topic was automatically closed 182 days after the last reply. New replies are no longer allowed.