Hello,
I am trying to implement an outer join between two data frames in a KNIME node. The lists of keys to join on, for df1 and df2 respectively, come from flow variables (since I take them as input from the user).
So basically, for File A I have fileA_key1, fileA_key2 and fileA_key3.
Similarly, for File B I have fileB_key1, fileB_key2 and fileB_key3 (I am taking a fixed 3 keys).
What I am looking for is a PySpark statement parallel to the pandas call joindf = pd.merge(df1, df2, left_on = list1, right_on = list2, how = 'left'), so that I can define my left_on and right_on based on user input. I referred to the Stack Overflow article "Pyspark Merge", but to no avail. Please see the image attached below.
How could I address this scenario? Any help with PySpark for this in a KNIME node would be appreciated. I want to join two data frames on multiple columns, where the column names for each data frame come from a list, and each list is built from flow variables (string type). A new approach or code solution from scratch is also welcome.
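To make the scenario concrete, here is a minimal sketch of the kind of thing I am after, not a confirmed solution. It assumes df1 and df2 are PySpark DataFrames and that the key names have already been read from the flow variables into two Python lists of strings (how the node exposes flow variables is left out). The helper names build_key_pairs and join_on_key_lists are made up for illustration. The idea is to build one equality condition per key pair, combine them with AND, and pass the result to DataFrame.join.

```python
from functools import reduce


def build_key_pairs(left_keys, right_keys):
    """Pair the i-th left key with the i-th right key (hypothetical helper)."""
    if len(left_keys) != len(right_keys):
        raise ValueError("left and right key lists must have the same length")
    return list(zip(left_keys, right_keys))


def join_on_key_lists(df1, df2, left_keys, right_keys, how="left"):
    """Rough PySpark counterpart of pd.merge(..., left_on=..., right_on=...).

    Assumes df1 and df2 are PySpark DataFrames (hypothetical helper).
    """
    pairs = build_key_pairs(left_keys, right_keys)
    # df1[name] == df2[name] yields a Column expression; & combines them with AND
    conditions = [df1[left] == df2[right] for left, right in pairs]
    join_condition = reduce(lambda acc, cond: acc & cond, conditions)
    return df1.join(df2, on=join_condition, how=how)


# Key names as in the post; in the node these strings would come from flow variables
list1 = ["fileA_key1", "fileA_key2", "fileA_key3"]
list2 = ["fileB_key1", "fileB_key2", "fileB_key3"]
pairs = build_key_pairs(list1, list2)

# Intended usage (df1 and df2 must exist in the node for this to run):
# joindf = join_on_key_lists(df1, df2, list1, list2, how="left")
```

Passing how="outer" (or "full") instead of "left" should give the outer join. Unlike a join on a shared column name, a join on an expression like this keeps the key columns from both data frames in the result, so the duplicates may need to be dropped or renamed afterwards.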
Thanks!