Comparing two tables, Find Header and choose options

Sebastian_Baran · August 29, 2024, 7:55am

Hello,

I would like to compare two tables. One created by me where all possible table headers are collected in rows.

The second table is a random Excel data table loaded into the analytic platform.

I would like the program to compare Table 1 and Table 2 and recognize what data I am dealing with (using the header). Then send this data to another method of processing this data (depending on which data was read in)

Probably the main problem is that this data has a different number of columns and the header does not always have to be in the first row.

Table with collection header

Nr	Column 1	Column 2	Column 3	Column 4	Column 5
1	AAA1-ExampleHeader	AAA2-ExampleHeader	AAA3-ExampleHeader	AAA4-ExampleHeader
2	BBB1-ExampleHeader	BBB2-ExampleHeader	BBB3-ExampleHeader
3	CCC1-ExampleHeader	CCC2-ExampleHeader	CCC3-ExampleHeader	CCC4-ExampleHeader	CCC5-ExampleHeader
4	DDD1-ExampleHeader	DDD2-ExampleHeader	DDD3-ExampleHeader	DDD4-ExampleHeader

Example excel data table

Column 1	Column 2	Column 3	Column 4
blablabla

DDD1-ExampleHeader	DDD2-ExampleHeader	DDD3-ExampleHeader	DDD4-ExampleHeader
Data-Column1A	Data-Column2A	Data-Column3A	Data-Column4A
Data-Column1B	Data-Column2B	Data-Column3B	Data-Column4B
Data-Column1C	Data-Column2C	Data-Column3C	Data-Column4C

Thx for your replies.

ipazin · August 29, 2024, 8:41am

Hello @Sebastian_Baran,

and welcome to KNIME Community!

If I got it right you can simply use Reference Row Filter to leave only one row in collection header table. Before that prepare your random data.

Something like this:
HeadersForum

In Reference Row Filter you can choose any column from collection header table. Assumption is that column names match. If that is not the case you should play regular expression…

Take a look here:

Br,
Ivan

takbb · August 29, 2024, 9:13am

Hi @Sebastian_Baran , in your actual data, will the column names always be completely different between the different table structures or will they actually have common column names?

ie. In your “example” data, you have AAA1-, BBB1- and so on, but I’m assuming that real life isn’t so straightforward, and that your column name list could be more like:

Nr	Column 1	Column 2	Column 3	Column 4	Column 5
1	ID	Name	City	Town
2	ID	Department	Project
3	CODE	Name	Joining Date	Department	Project
4	Name	ID	Location	Grade
5	ID	Name	City

so that there are column names “in common”, and you’d need to match ALL column names (and in the right order??), rather than just the first column name, to be sure of which type of data was represented?

Sebastian_Baran · August 29, 2024, 9:23am

Exactly as you write. I get various data tables with different headers, but I know these headers. Now I would like to automatically, after recognizing the header from the list, pass this data to the appropriate method of processing this data (New Metanode).

Edit
Recognition from just one column is not enough. Some headings may be repeated in the same places and only matching the whole (all the headings from the columns) differentiate them from others.

takbb · August 29, 2024, 9:54am

Thanks for the confirmation @Sebastian_Baran. Here is a workflow that may be able to assist. Replace my “delimited text reader” components (which just enabled me to quickly paste in your example data) with your actual nodes representing data/excel files.

I have used KNIME 5.2.5 for this.
It should work with any KNIME 5.x version. If you are using KNIME 4.x, I’ll have to modify it (and have a small rethink), as you won’t have the Table Splitter node. The Table Splitter enables searching a specific column for a particular value, and then splitting the table at that point. This was useful for the “header” row matching.

Sebastian_Baran · August 29, 2024, 11:58am

This solution is helpful. After some minor modifications it will work for me. Thank you

takbb · August 29, 2024, 12:11pm

That’s good to hear @Sebastian_Baran, and thank you for confirming the solution.

system · September 5, 2024, 12:12pm

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.