# Duplicate values table joining

Hello,

I get an error joining tables. I suspect it might come from the values, but I don’t see a way to solve it.

I have two tables. Imagine:

TABLE 1 sanction proposal
plate_number | date | other values
2345A | 2021-02-03 | other values
6735B | 2021-03-04 | other
2345A | 2021-02-03 | mre values
8754C | 2021-01-04 | more values

TABLE 2 fines
plate number | date | some more values
2345A | 2021-02-03 | even more values
6735B | 2021-03-04 | more and more values
2345A | 2021-02-03 | values and values

I want the results that match both tables, so I make an inner join. But the result is as follows:

TABLE RESULT INNER JOIN
plate | date| values and values
2345A | 2021-02-03 | even more values
2345A | 2021-02-03 | even more values
6735B | 2021-03-04 | more and more values
2345A | 2021-02-03 | values and values
2345A | 2021-02-03 | even more values

As you can see, I get 4 times plate number 2345A. I imagine that the plate in table 2 is checked twice and I receive 2x2 reults, as it matches two times in each check. I don’t really know if a join should act like this. But the thing is that plate and date are the only joining indexes and both must be in the join. Imagine that it is a sanction table, so a plate can be sanctioned twice. I don’t have the time detail (which would help me solving this), only date and plate.

Any idea how to avoid the duplicity in the results?

Hi @jquadrada, unfortunately the duplication is essentially in your data already and the join is doing exactly what a join should do in this situation. It is giving you every combination that it can for joining based on your criteria as it has no other information about which results you want.

Take for example the following returned rows

2345A | 2021-02-03 | values and values
2345A | 2021-02-03 | even more values

It has joined correctly based on your criteria. There is no basis by which it could determine which of those rows you think is the correct result. If you were doing it manually, how would you decide?

If you really don’t care which result is correct, you can follow the join with a duplicate row filter node, telling it to remove duplication based on the same columns you are using in your join and then tell it which to keep based for example on first or last row found. I hope that helps.

3 Likes

Great! Thank you so much!

2 Likes

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.