How to incorporate data elements from several KNIME tables into a single Excel sheet

Cosmo72 · November 24, 2022, 2:36am

Hello KNIME Community,

I am very new to KNIME, but learning so much from all of you each day!

I have a workflow problem that I am hoping someone can help with. I have eight tables generated through my workflow – one main table and seven supplementary tables – for which I need to merge one field from each table into the main table and write to an excel sheet. All eight tables contain the ID column to match on, but the seven supplementary tables may have the ID listed in more than one row. Also, the supplementary tables will not have all of the IDs in the main table, only a subset. And, to complicate things further, there are times when one or more of the tables can be blank due to not having data provided earlier in the workflow.

I am thinking that I probably need to use a loop and a joiner nodes, but I am not sure how to get the output that I need. Here is a simplified view of the scenario.

Here would be a simplified view of the tables to join:

And let’s say that Table 2 is an empty table with no results. Any ideas what the best way is to join this data?

I was able to achieve the desired result with several sequential joiner nodes, however, if a table is empty (which is common), it breaks the workflow. I was hoping to create something more dynamic, such as a loop, that would only process the data tables that are present at the time the workflow is run.

I would appreciate any ideas!!

Marlene

iCFO · November 24, 2022, 3:34am

Have you tried out the Empty Table Switch or the If Switch?

takbb · November 24, 2022, 9:23am

Hi @Cosmo72 ,

If your tables are all KNIME data tables from different parts of your workflow, then I don’t really see any way of accessing these via a loop.

On the face of it, a series of Joiner nodes should be all you need, but I agree that doesn’t scale well for a large number of tables being joined, in terms of the growing number of nodes required.

However, I’m also not sure why an empty table is causing your joins to break. Can you give an example of how that happens?

In this mockup, I’m representing tables that have been created in your workflow using CSV Readers, but let’s imagine that these are just a set of data tables from elsewhere in your workflow.

Even though Table B (Table 2) is empty, the workflow completes and generates the following:

The exclamation marks on the branch from TableB(2) are advisory warnings and don’t stop the workflow from completing.

If instead of individual tables in your workflow, you output the different tables from parts of your workflow into files (e.g. numbered CSV files), then I could see more potential for some kind of loop construct.

You could have a counting loop which causes numbered CSV (or other) files to be read in turn from a single folder, and joined to the main table. The results would be collected and the output at the end.

In the case where a CSV is actually missing (if that could occur), rather than just empty , a try-catch could be used to replace the missing file with an empty table, which again would function perfectly well.

Some additional work was required within the loop here to ensure rows were collected in the same order through each iteration, to avoid a misalignment of rows when the columns were combined on exiting the loop. There are possibly other (better) approaches that could be taken here, but time was short and it kind of worked so hopefully it gives some ideas.

Combining multiple sheets 1.knwf (149.1 KB)

Cosmo72 · November 25, 2022, 1:58pm

Hi @takbb ,

Thank you for taking the time to look into my issue. I looked closely at your mockup and despite being very similar to my actual workflow, there seems to be one fundamental difference. In the mockup, the csv contains the data structure, it does not contain data rows. In that case, yes, it will not cause the joiner node to fail. But in my case, the table does not exist, so when the joiner node tries to make the join, there is no column to join on - and so it does fail.

Is there any way to do a test like ’ if table exists, do the join, if not, skip to the next joiner node’ - probably wishful thinking! In the meantime, I have come up with a bit of a hack workaround, and that is earlier in my workflow, where the user is supposed to load in some data to feed into the process, I have created an empty file with column headers to use as the default. When this loads, the table structure is created with no data, and the workflow works as your does above.

I was also considering the empty table creator node, thinking that maybe that could be invoked when a data table does not exist, but I don’t see how to flow into that node from within my workflow. Is it only used at the beginning of a workflow?

I will also look at your combining multiple sheets workflow - as it may provide some ideas for my next iteration!

Thanks so much!

Marlene

takbb · November 25, 2022, 3:22pm

Hi @Cosmo72 , you’re welcome.

Would you be and to upload a small demo workflow to give an idea of what you mean when you say a table doesn’t exist? I’m trying to picture that situation or how it would appear. So from what you say you aren’t talking about an “empty table”. If I can better understand how the table gets to be in that state, maybe I can think of a way to handle it. Thanks

In the meantime, the closest I could think of was this where I have a Table Creator representing what I would consider a “missing” table. This is a data table with no rows or columns.

The empty table switch detects the empty table, but the trick is how to then do something about it. A Table Creator (creating a table with the required columns but zero rows) can be incorporated via a flow-variable link, but the way to add this only when the table is Empty is to supply an additional node that does nothing that can hangs off the lower port of the Empty Table Switch, merely as a means of passing the flow variable link to the Table Creator. This is an example of where a “dummy” node is sometimes called for.

I have generally used Add Empty Rows (and tell it to add zero rows) for this purpose, which is what I’m doing here, to demonstrate, although actually in my own workflows I now use a downloadable node called “NoOp” (see Simple trick for “swapping” out nodes - #7 by qqilihq ) as it better documents that it is doing nothing

Handle table with no rows or columns.knwf (23.6 KB)

Cosmo72 · November 25, 2022, 8:54pm

Hi @takbb ,

I have created a much simplified workflow that I think illustrates my problem. You will see that at the top there is a main table that contains book title information. This is the file that I want to update with information from the other sources. The two other sources that I have included are the ISBN data and the ISSN data. In this scenario, the user of the workflow does not have any ISSN data to provide. Because of this, when the data starts to be compiled for output to excel, the join fails because there is no data coming from the ISSN pipeline.

Table_does_not_exist.knwf (42.1 KB)

Thanks for helping!

Marlene

takbb · November 25, 2022, 9:06pm

Hi @Cosmo72, Thanks for the sample workflow, but could you execute it again and then reexport it and untick the “Reset Workflow(s) before export” checkbox, and then re-upload it so that we can see the sample data.

thanks

takbb · November 25, 2022, 10:03pm

Hi @Cosmo72 , actually I think I have something that should work although without your data files I cannot properly test it, so it would still be useful for you to re-upload the workflow as mentioned above if you can.

Basically, in a CSV Reader node you do have to specify a filename, as otherwise the node considers itself mis-configured and will not execute. So there will be a problem if you leave the filename blank. Usually, I would think that you would have a specific filename for the workflow to use and some other process would make that file available.

So, to make this work, you will need to put a name in the filename box. However, if there is no file available you could just put “x” in there. You just need to satisfy the node configuration.

If you put an invalid name such as “x”, the node will of course fail because the file does not exist, but failing is different to not executing and allows us to do something about it.

If you then wrap the csv reader between Try - Catch nodes, you can catch the failure and replace the non-existing file with an empty table containing the columns required by the Joiner.

The TRY node has a mandatory variable flow input port, so needs to have another node prior to it. This can be resolved by throwing in a Variable Creator, or other node that is set to do nothing. Then, fingers crossed, it will at least run.

In this screenshot, it isn’t running because the paths for all the other CSV readers aren’t available on my system, but hopefully they will work on yours

Table_does_not_exist with try-catch.knwf (65.9 KB)

Hopefully that gives an idea of a possible workaround. There might be better ways to do this from a user-experience point of view, especially in terms of specifying the file names. The variable creator, or other string node could for example be the place to enter the file name, and this could then be passed as a variable to the CSV Reader, but if they entered a blank string, then the CSV Reader could be bypassed and the empty table used in its place.

Cosmo72 · November 25, 2022, 11:34pm

Hi @takbb

Here is the workflow again. I will review what you’ve posted above!
Table_does_not_exist_new.knwf (67.9 KB)

takbb · November 26, 2022, 7:32pm

Hi @Cosmo72 ,

Thanks for uploading the workflow with sample data.

I used the sample data files (stored in the workflow’s data folder) to demo this component which may (or may not) be of interest. I’ve basically enclosed a CSV reader into a component utilising the technique from my previous post.

I may be going wide of the mark here now, as possibly your original workflow doesn’t have csv files, but if that is the case, I still don’t fully understand how you end up with tables “that don’t exist” in your KNIME workflow. so anyway this may give some additional ideas, or be useful to somebody else(!).

One of the problems with component solutions in KNIME though is that they then bury away all the settings of the thing you are trying to encapsulate and whilst some things can be exposed through configuration nodes, things like “file/folder browser” functionality can’t be easily replicated. However, something like this may suffice. Obviously the CSV Reader node has a lot of configurable options on it too, and not all of that could be replicated in the component. I have not tried to replicate any of the other settings, as it would be quite a painstaking (probably unworkable) exercise. If other settings on the CSV reader are required, then the best option would be to clone the component and adjust the settings to those required, and then reshare it yourself. If this solution is generally workable and you want more info on what I mean by that, then please ask and I’ll try to explain more fully.

So basically the component offers the following minimal settings:

Where the user can simply type in the name of the file (including the full file path if “absolute path” is chosen. Alternatively the file can be located relative to the workflow by choosing, as in the demo the “Workflow…” location options).

The other setting is the “Default Columns”. This is the minimal required set of columns that the component must return as an empty table if the file does not exist, or no file is specified. They are delimited using the | character.

This component can then be included in your workflow like this, which keeps things reasonably simple:

To make this work, I had to fix your joiners so I took a guess at what they should be. Given that you have “optional” files in your workflow, for it to end up producing anything, you need to change some or all of your joiners to include “unmatched” rows from one or other of the tables. If they only contain “matched” rows, and the user doesn’t provide anything to match, then clearly the joiner cannot return anything. As I reread your original post, I’m wondering if this is actually central to the problem you are experiencing? Are you only returning “matching” rows in your joiner and expecting it to return data when one of the tables being joined is empty?

Anyway, here is the above workflow demo

Table_does_not_exist_new with component.knwf (457.2 KB)

The CSV Reader (error-safe) component can be found here:

Cosmo72 · November 28, 2022, 5:49pm

Hi @takbb ,

There’s a lot to digest here…particularly for a newbie to KNIME! I think you have given me a lot to experiment with. I am particularly intrigued with the use of components. I think I now have enough to at least get over my initial problem and as I work with the potential users of this workflow, we’ll determine of more finessing is needed.

Thank you so much for taking the time to help out. It is great to know that there is such a knowledgeable and generous community behind this software!

Marlene

system · February 26, 2023, 5:49pm

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.