Hive- join table with itself with inequality conditions

I would like to join a table with itself to find a parent of every line with an equality conditions. Also I need to keep only one parent by taking the one who’s the field “counts” is minimum. here’s a sample data:

product  date1           date2          version   counts
A.16    15/04/2021      17/04/2021      1         20
A.18    18/04/2021      16/04/2021      1         15
A.15    17/04/2021      18/03/2021      1         18
A.20    19/04/2021      20/03/2021      1         33
A.35    20/04/2021      30/03/2021      1          8

here’s the request:

select  b.product
      , b.date1
      , b.date2
      , b.version
      , c.product as parent
from pr_convert b
left join pr_convert c
on (c.date1< b.date1 and c.date2> b.date2)

This did not work, it threw me this error:
Both left and right aliases encountered in JOIN 'date1'
Then I tried to put the inequality condition in a where clause by doing this:

select  b.product
      , b.date1
      , b.date2
      , b.version
      , c.product as parent
from pr_convert b
left join pr_convert c
on b.version=c.version
where b.date1< c.date1 and b.date2> c.date2

As the value of the field “version” is the same for all records and my table being very large, it consumes a lot of resources and takes a lot of time.

Any help would be appreciated.

@Stellak welcome to the KNIME forum

I tried to understand your task but I still have some questions. What would constitute a ‘parent’. And could you provide a more complete sample that would have the problem where one pair would constitute a problem but for the “counts” column.

And also could you provide a result that would demonstrate what you want to have. With such examples it is best to create one that would represent your full challenge. So you might want to have one with more examples of “version” than just “1”.

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.