Is there an easy 'Visual' way to track # of records throughout a workflow with multiple joins?

Hi Everyone,

Often my work involves a series of joins bringing together different datasets.
Sometimes these joins produce unexpected results due to a variety of factors, none of which are KNIME’s fault.

My main way to trace back and find the source of an issue is to look at the record counts at each step of the workflow.

My current approach is to right click and check the properties of the nodes and then type the record count into the node description.

This lets me visually walk through the workflow to see where records are being added or dropped.

However, when I re-use the nodes with updated data, these counts change and I have to go through the same manual process again.

In SAS, I used to go back through the log, which would show me the same type of information as the code executed.

Is there an option / feature I am missing in KNIME where I could see this easily to help me track these sorts of things as a workflow is executed?

Is there something I can turn on in the log to show this?

Are there other workflow based solutions?

Thanks so much for your time,

Nathan

Hi @NKlassen

One way to keep track of this is the Extract Table Dimension node. If you are only interested in the row count, an alternative is a GroupBy node.

The test setup below illustrates how it could look:

Here:

  • I created a few dummy datasets and applied two Joiner nodes that alter the number of rows and columns in the main flow.
  • After each Joiner, I branch off an Extract Table Dimension node to get the count at that particular point in the flow.
  • Because the node generates a generic column name, I rename it to indicate which step the count belongs to (Joiner number X).
  • Next, I append all the counts per Joiner, which gives an overview of how the count progresses through the workflow.

The GroupBy approach is comparable. Either way, this takes away all manual intervention: just execute the Appender node.
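For readers who think more easily in code, the same pattern (capture the row count after each join, label it, and collect everything into one overview) can be sketched in plain Python. This is not KNIME code and not the contents of the attached workflow; the datasets, the `inner_join` helper, and the `track` helper are all hypothetical names invented for illustration.

```python
from collections import defaultdict


def inner_join(left, right, key):
    """Inner-join two lists of dict rows on `key` (duplicate keys multiply rows)."""
    index = defaultdict(list)
    for row in right:
        index[row[key]].append(row)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]


def track(counts, label, rows):
    """Record the row count under `label` and pass the rows through unchanged."""
    counts[label] = len(rows)
    return rows


# Hypothetical dummy datasets, invented for illustration only.
customers = [{"id": 1, "name": "A"}, {"id": 2, "name": "B"}, {"id": 3, "name": "C"}]
orders = [
    {"id": 1, "order": "x"},
    {"id": 1, "order": "y"},
    {"id": 2, "order": "z"},
    {"id": 2, "order": "w"},
]
regions = [{"id": 1, "region": "N"}]

counts = {}  # plays the role of the appended overview table
table = track(counts, "input", customers)
table = track(counts, "joiner 1", inner_join(table, orders, "id"))
table = track(counts, "joiner 2", inner_join(table, regions, "id"))

# One line per step, so added or dropped records are easy to spot.
for label, n in counts.items():
    print(f"{label}: {n} rows")
```

Because the counts are recomputed every time the script runs, the overview stays current when the input data changes, which is the same benefit the branched counting nodes give in the workflow.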

If you want to use the row count as a flow variable, the Row Counter component created by @bruno29a and @takbb is well suited for this.

Its inner workings are comparable to what I show in full in my test setup (the Extract Table Dimension node also automatically creates flow variables, by the way).

Hopefully this provides some inspiration.

WF:
track records throughout WF.knwf (66.2 KB)

@NKlassen the function you are looking for might be Hiliting

If this process is repeated many times and you want a visual check each time, you could use flow variables with a Text Output Widget after each join, and wrap all your Joiner nodes and Text Output Widgets in a component dashboard. If you’re unfamiliar with this process, see The Wonderful World of Widgets! A Mini-Guide for using KNIME Widget Nodes | KNIME

I’d be happy to help if you have any more questions.
