method request for big data

Hello, I’m trying to compile some data but haven’t been able to. I’d appreciate any help or suggestions on methods.

I have three tables. The first table has 4,000 entries, and the other two tables have 1 million entries each.

  1. A B C

  2. K_in L_in H_in P_in T_in A B C

  3. K_out L_out H_out A B C

Here’s what I want to do. If the A and B values in the table match the A and B values in the second table, then take the K_in = K_out, L_in = L_out, and H_in = H_out data from the second and third tables and add the matching rows to a single table.

Then, take the A and B values from the third table and search for them again in the second table. Take the matching data where K_in = K_out, L_in = L_out, and H_in = H_out, and add them to the bottom of the table created earlier.

I want to repeat this process 20 times. I did it as shown in the image, but it wasn’t very efficient. It works, but I’m not sure about its accuracy.

@msg90 moste likely you will have to use some sort of loop. Is it possible to provide some sample data (with dummy values) and try again to describe what should happen with which column from each table. This would be easier to follow.

https://medium.com/low-code-for-advanced-data-science/knime-paths-and-loops-automate-everything-213455788685

I think there is still a gap in the description, but this sounds like you just want to do 2 inner joins and duplicate the matching data 20 times?

you could just do those two inner joins and then either loop the result 20 times, or pipe it 20 times into a concatenate node

First of all thank you for your interest. I just upload sample review.

if 1st table’s red values = 2nd table’s value

look for a second case, if 2nd table’s blue values = 3rd table’s blue value

save the this row a another table and contunie,

which table is equal blue in 3rd table, green value is search in 2nd table again and again

@msg90 I tried a few things with something resembling your data and some dummy data. The dummy data does work your ‘data’ does not. Maybe you can provide some real data and also the desired outcome or you can try and adapt the example with three databases and a loop using either DuckDB or H2 for large datasets that you might want to process.

I think you will have some work to doe defining your exact task.

1 Like

working around Knimes inability for advanced functions, you could still do this “rather” easily:

concat the key columns to have 1 column keys only.

reference row filter your first condition (red). after that chunk loop each record and iterative loop, using again a reference row filter, until it doesnt find a match.

after closing all the loops, you can still join / value lookup all necessary information from the first and second table.

Other tools include build-in solutions to this kind of parent-child-hierarchy problem - Knime doesnt.

Thank you so much , i just setup DuckDB but i’m not sure how can i connect it wiht H2 connector. Do you have any training

@msg90 DuckDB and H2 are two different local databases. I have put both examples in the workflow.

You can read more about knime and databases here

https://medium.com/low-code-for-advanced-data-science/knime-databases-and-sql-273e27c9702a

1 Like