Best practice for completeness checks when using join

Hi all,

I wanted to ask what the best practice is for handling Joiner node outputs, specifically in terms of not losing any data.

Example scenario:

  1. The main workflow is connected to a second table via Joiner. The second table is a shared Excel sheet that may be modified over time.
  2. I need to ensure that the Excel sheet contains all the keys from the main workflow and that no data from the main workflow remains unmatched.

I can check this manually by outputting Left unmatched rows and manually checking if the table is empty (as it should be). However, if the workflow is sufficiently large, it becomes unpractical to check each node manually.

Is there any kind of error reporting system that I can use for this? I know that Alteryx has a Test node that can error the entire workflow if the number of rows in a specified output table is not zero. Would there be anything similar to this in KNIME?

(I tried the Breakpoint node, but it only seems to work for tables that are empty, as opposed to tables that are not empty.)

Thank you in advance and greatly appreciate any advice.

Hey there,

I think if you set up the joiner node to send the different join results to the three different output ports you could catch any mismatches by using the Empty Table Switch Node for the middle and the bottom port:

In the example I used text widgets so in case you turn this into a component only the text widgets in the active branches will be shown, but you could build additional logic into the branches - one idea could be to use the Email Processing Extension to send an Email to someone if there are mismatches etc.

Depending on what you are doing downstream you could also do this validation first and only “write the final result” if the validation was passed…

Here is the WF that also has the additional three scenarios (left mismatch, right mismatch, additional keys in both):

joinervalidation.knwf (145.1 KB)

1 Like

Thank you, Martin. This is very helpful.

Is there a way to raise an error message in KNIME somehow? Anything akin to a simple JavaScript alert would be fine. Ideally, I would like to create a component that I can just plug in somewhere that will raise an error when there is an issue anywhere in the workflow. The email solution is fantastic for hosted workflows, but I think it may be a bit overkill for my local usage. :slight_smile:

well you are in luck :slight_smile: There’s a breakpoint node that you can configure so that it fails if it is in an active branch and allows you to specify a custom message.

Here’s an updated version where I added this in the “Additional Key in WF” example.

joinervalidation.knwf (151.3 KB)

5 Likes

Thank you again for the huge help, Martin! This works great for me and completely solves my problem. :partying_face:

Just one quick question that is tangentially related: is there a way for me to save the Empty Table Switch and Breakpoint combination as a node that I can easily reuse in other workflows?

I tried following this guide, but I can’t for the life of me get it to become a wrapped metanode. Would you know what the best practice is for saving pre-made batches of components at the moment? I am using the modern UI and working in a local environment.

1 Like

I think the best option is to turn it into a component. That allows you to add a string configuration node to configure the custom error message:

I’ve updated the example workflow
joinervalidation.knwf (179.6 KB)

And hae also turned that idea into a shared component which you can access on the KNIME Community hub:

If you want to learn how to do that yourself I recommend to check out this Guide - that I think is also the updated version of what you read about Meta Nodes:

https://docs.knime.com/latest/analytics_platform_components_guide/index.html#introduction

3 Likes

Thank you, Martin, this is literally perfect and more than I could possibly have hoped for! Wishing you a fantastic weekend ahead and many thanks once again!

2 Likes

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.