We need a new kind of join tool - Any Takers?

Ok, So maybe this is still a part of me thinking in Alteryx, but perhaps not so.
Some tools are like the steering wheel of car, you just need them.

There is plenty of good reason why a join tool that offers an Inner Join, Left and Right as three outputs concurrently makes a lot of sense from a data wrangling perspective.

For starters, debugging joins, being able to check quickly if there are any outer results means i can confirm my join conditions are correct, or that perhaps, some elements in my data need cleaning up.

Combined with a union tool, i can then decide if i want an left outer etc… and gives me more options to work my data.

Yes i know there are ways round it, but…

ok, i’m ready for the flack… hit me :slight_smile:

or, anyone fancy making this tool?

1 Like

Hi @Gavin_Attard,

Just so i understand you correctly.
You just want a component which has two data input ports (for the joining data) and then has as a result the joined data for each join-type (left, right, inner) as output table?

Then you can just put that together by yourself:
Add 3 joiner nodes:
(for each join type you want to check)

Then use the concate node to combine the results:

*optional you could add an Indikator which join-type belongs to which data using the string manipulation node

And finally put all together as a “component” so you can reuse in the future :slight_smile:

I think there won’t be an easier solution… as I don’t believe many will have such requirements…
But I somewhat see what you want…

2 Likes

Hi @AnotherFraudUser

Yes i figured there was some work around like that, but ti still doesnt meet the requirement.

For starters, the joiner tool only provides a left outer and a right outer, which is not a left and right output.
The point of a node like this is that i don’t need to pull out 3 nodes + config every time i want to do a join.

In terms of use case, very often we are working with data that is less than perfect, including join keys we choose to use.

By having a join tool that has 3 outputs for Left, Inner and Right, means that you can easily check if all the records joined on teh keys selected, and if not why and what does that mean in terms of the data you are working with.

The current joiner assumes you have 100% confidence in your join keys

Hi @Gavin_Attard,

“The point of a node like this is that i don’t need to pull out 3 nodes + config every time i want to do a join”
<-if you create a workaround with multiple nodes and then create a component, you will only have to pull and configure once!
You can save the component as quasi new node and then pull in the component “node” the next time you are needing it :slight_smile:
(also if you update the component later on you can also update multiple workflows at once - which is nice)

Hi @Gavin_Attard,

Maybe I don’t understand your problem completely, but the Joiner node mentioned by @AnotherFraudUser has the option of a Full Outer Join as well.
When you don’t remove the the joining columns on the Column selection tab, i.e. leave the both tickmarks blank as shown below, you will have either both joined columns filled (inner), the second join-column empty (left) or the first one empty (right).

afbeelding

Is that what you are looking for?

2 Likes

Hi @JanDuo

we are actually currently rewriting the Joiner node.
There will be various improvements, not only but of course speed as well.

Watch out for the 4.2. release.

8 Likes

Hi All

Appreciate the posts.
@AnotherFraudUser - Thanks for the steer on components, bit like Macros in Alteryx. Will give that a go to create the tool i need. Now i need to figure out how to get a Left and Right output, but i think @JanDuo you pointed me in the direction to be able to filter out the rows.

@Iris - Looking forward to the tool update.

2 Likes

@Gavin_Attard Great! :slight_smile:
Hope you manage to setup your tool!

@JanDuo actually throught about the full outer join but somehow thought it was missing in the node :confounded:
Thanks for sharing :slight_smile:

A faster joiner would be :heart_eyes:

Is it live in the nightly build?

3 Likes

No, this is really live in development right now.

2 Likes

@Gavin_Attard, this kind of node has been requested before and is quite high on our priority list for the joiner rewrite, which is in progress right now.

@AnotherFraudUser using components for this is a great workaround (I love components). However, directly integrating this into the joiner would be more performant, as we save the redundant computations.

5 Likes

Really great to hear - looking forward to it.

@CarlWitt - I also noticed that the joiner doesn’t seem to handle missing values very (null) well. Is that right?

Once i convert the nulls to a blank string the joiner works to expectation.

Hi there @Gavin_Attard,

what do you mean when you say the missings re not handled very well? Can you share some example?

Br,
Ivan

1 Like

Do you mean that a missing value doesn’t equal a missing value? This can be counterintuitive, but it is analogous to SQL, where also NULL ≠ NULL. I think the intuition is that a missing value is like a hole in a table. It shouldn’t be compared to anything else : )

2 Likes

To share my experience with missing values and join, whenever I have a table in which the joining key has missing values, there is a very significant decrease in the performance of the execution. It can be a bit frustrating if you have large volumes. My workaround is to filter the missing values first, perform the join and then concatenate then back. If the the new Join node could handle this, it would be great. Perhaps that is what Gavin indicates as well.

4 Likes

For me (with my SQL background) it would be strang when a join node has an option to include records which have missing values in one (or more) joining columns. I would never expect a join to keep those rows. For me it’s often the solution to get rid of rows with those empty values.

What would such an option have to do if both left and right tables have missing values in those columns?
If you would allow this on just one side and there are many rows with missing values on that side: which row is to be joined in this case?

3 Likes

@JanDueo,
Maybe it was not clear above, but I do not expect the missing values to be in the result. I was just indicating that their presence affects the performance of the execution, sometimes even crashing. If the Join could anticipate that, I think that would be beneficial as it would mean not having to use additional nodes every time.

2 Likes

@toscanomatheus thanks for the clarification!

+1 for your idea on better anticipation on missing values (and a better performance).

1 Like

Hi All

Great response to this, all valid.

I attach a workflow with an example of observed working and then the new expectation (i simulated this by turning missing values to empty ).

I get that for many sql users this may seem a tad odd, on the other hand i find this super frustrating.
An example in sql of this is where i am searching for rows not matching a value and it omits the to return the nulls… bahhhhhh

anyhoot i digress…

In particular in my domain, we are very rarely working with clean join keys, indeed often the absence of a value is a important signal in and of itself and those rows cannot be discounted.
Changing to blank brings in other issues downstream where blank is hard to address… for example i can’t filter on empty string

Anyhoot i hope the example make sense. I think there should also be a thread for discussion on Null and Blank strings, so mething which i think Knime could handle a lot better.
Handling Null in join.knwf (16.5 KB)

kr

Gavin