Joining data from two tables and comparing with third table

#1

I have 3 tables -Table A, Table B and table C

Table C is has processed data from Table A, table C has 2 new columns which takes data from the join of table A and table B based on the some join condition on the columnns .
I want to develop a workflow which compares the data in table C and table A in which I want to perform the join on table A and table B and then the result of this
Steps I have taken:

I have read data from table A and table B and now I have to use join condition like TableA.C1 = tableB.c1 and TableAC2-tableBc2, then 'new_col after join should have tableB.c2 values if join gives null then ‘new_col’ will have value string prefix to ‘Fe’ in column TableA.C1, once I have the data I want to compare the data of the join to the data in table C .
can you tell me the approach to perform this action.

0 Likes

Along with join ,use some condition
#2

Hi @analytics_sharma -

I think an example workflow including some sample data for each of the three tables, along with the steps you’ve already taken, would be helpful to post. Then you could maybe give an idea of what your output would ideally look like. It’s a bit hard to parse based on your description above.

0 Likes

#3

Hi,

I am attaching the sample workflow data (Not sure if this could be helpful, since I cannot share the data)
Table C is a table in cloud
1)Column ‘sname’ of Table C takes value as below

TableB.name=TableA.aname and TableB.Id=TableA.Id , fetch name from TableB. If the join is null, fill it from first portion of the name column TableA, (refix of ‘Fe’)

1)Column ‘name’ of table C takes value as below
TableB.name_new=TableA.aname and TableB.code=TableA.code , fetch name from TableB. If the join is null, TableB.name_new

I have to combined/joined data from Table A and TableB, and the result of which I want to compare with Table C data (to verify if the data in Table C is correct/ or is there any mismatch)

Test_demo.knwf (7.7 KB)

let me know if this is helpful if you can give me the approach to go ahead with the workflow

0 Likes

#4

I am really looking for the answer on this problem, can anyone suggest?

0 Likes

#5

Hi there @analytics_sharma,

I tried it too but not sure I managed to figure it out.

Lets see. You have three tables: A, B and C where C is somehow joined from A and B (and this is not done in KNIME). The thing you want now is to join again A and B in KNIME and then compare it to C you already got (in cloud) to verify its values. Am I somewhat close? If so are you having trouble joining A and B tables in KNIME or comparing joined table to existing C table?

Or I missed it completely? :smiley:

Br,
Ivan

0 Likes

#6

Hi Ivan,

Yes the trouble I am having while joining Table A and B, since I want to write 2 join queries for two different columns :slight_smile:
tableC.Col1 takes input from left join of table A and table B
tableC.col2 takes input from left join of table A and table B
Now to get theses columns I have to write two different join queries. I have used database reader to write one left join query , now where can I write the other join query, ? Is there a way if I can write 2 join queries in 1 databse selector node and sum the records?

0 Likes

#7

Hi there,

I see. There are multiple ways to do this. I would push all the logic down into database by using SQL Executor node. Then read newly created temporary table into KNIME. After finished reading drop the new table. Something like this:

JoinsInDB

Not sure you can use only one Table Selector node.

Br,
Ivan

0 Likes

#8

I am able to connect DB SQL Executor to hive, can you help me in creating a new table in which 2 columns of temporary table takes data from join queries ?? that would be of great help.

0 Likes

#9

OK I tried your sample data and uploaded them to a local Big Data environment where you could then use the generic KNIME DB nodes or SQL code. But to be honest I did not fully understand what you want to so. Maybe you try to explain it again or try to solve it with SQL.

An example of how you could determine what to do with nodes could be illustrated by this code.

A note: I would like to post screenshots but MacOS Catalina seems to have broken my trusted screenshot tool Greenshot.

DROP TABLE IF EXISTS default.table_join_c;

CREATE TABLE default.table_join_c
AS SELECT

CASE WHEN t1.Column1 IS NOT NULL
THEN t1.Column1 ELSE t2.Column1 END AS Column1
, CASE WHEN t1.Column1 IS NOT NULL
THEN t1.Id ELSE t2.Id END AS Id
, t1.aname

FROM default.table_a t1

FULL OUTER JOIN
default.table_b t2
ON t1.aname = t2.name
AND t1.Id = t2.Id
;

kn_example_hive_load_files_case_when.knwf (37.7 KB)

1 Like