Load a table once and use it at several different points in the workflow

I have data sets with many features that are in local currencies. The workflow needs to process them in their local currencies, but many times during the workflow I need to convert these figures into one common currency (EUR) to make them comparable. So at many points in the flow (within metanodes) I need to load an exchange rate table and transform the values. IMHO, this is not efficient: every time I load the table from disk it uses up another portion of memory, and I also tend to lose track of the nodes where I loaded this table. Loading it once and passing it to all nodes makes the flow chart seem way more complicated than it is.

Is there a way to load one data table once and then use a pointer to it? I guess I would still lose track of where I used the table. Unless… is a shared component exactly that? That is, would I create a shared component whose only purpose is to load this table, and then drag it into the workflow at the different places? And if I then deleted the shared component, would I find out exactly where I used this table, because the metanodes would fail? Does a shared metanode that loads one table from disk take up memory only once, even if it is used many times in one workflow?

Hi,
why do you load the same table several times? You can make multiple connections from a single node to other nodes within a workflow (including metanodes).


Hi @morpheus,

are you referring to my quote “Loading it once and passing it to all nodes makes the flow chart seem way more complicated than it is.”? That is, you load it once and then pass the table to each node that needs to do a calculation based on it? If so, this leads to a very messy overview of the whole flow.

Best,
Dominic

Hi there Dominic,

Don’t think so. Although this could be useful in certain use cases, it is not the way KNIME works.

Nope. But you can use a Component and drag it into the workflow every time you need it (if you create a template out of it). Here is more on Components and Metanode history: https://www.knime.com/blog/knime-analytics-platform-40-components-are-for-sharing

No. The link will be broken (if one was created), but the nodes within the Component are still there and executable.

Don’t think so. But maybe this could be a feature request! :wink:

After a lot of noes, here is a possible workaround, depending on the number of columns in your table. Read the table once. Use the Table Column to Variable node (multiple times if needed) and merge all variables using the Merge Variables node (prior to that you will have to change the row IDs to be unique). If you do this at the start of your workflow, these flow variables will be available in every subsequent node. If you can use them directly to make the comparison happen, that’s it. If not, you need to reconstruct your table from these flow variables. For that you will need the Variable to Table Column node and some manipulation nodes, depending on your original table (if you share it, I can try to make an example). Once you have reconstructed it, create a Component out of it and save it as a template to use wherever you want in that or any other workflow…
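To make the idea concrete, here is a loose conceptual sketch in plain Python (an illustration, not KNIME code): the Table Column to Variable node turns each cell of a column into one flow variable named after its row ID, which is exactly why the row IDs must be made unique before merging.

```python
# Hypothetical illustration: a table read once, with row IDs as keys.
table = {"rate_to_eur": {"Row_USD": 0.92, "Row_GBP": 1.17}}

# Table Column to Variable (conceptually): each cell becomes one
# variable whose name is the row ID, so duplicate row IDs would clash.
variables = {}
for column, cells in table.items():
    for row_id, value in cells.items():
        variables[row_id] = value  # row ID becomes the variable name

print(variables)  # {'Row_USD': 0.92, 'Row_GBP': 1.17}
```

Downstream nodes can then read these variables anywhere in the workflow without re-reading the table from disk.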

Hope this helps!

Br,
Ivan


Thank you very much for this detailed explanation! And thank you also for pointing to a possible solution via variables. I had been thinking about this a bit as well, but figured the marshalling and unmarshalling would add too much complexity, as all other approaches do. However, your approach would at least ensure that the data is only loaded once, and also make it possible to find out where the data is used (since the workflow would fail if the marshalling is disconnected). Having a component that does the unmarshalling would also not add too much complexity to the workflow. I like it!

This seems to be the most feasible approach as of today, therefore accepted!


My current implementation:

Marshalling:

  1. Load the table
  2. “Column Aggregator” joins all relevant columns into one (delimited by “,”, like CSV)
  3. “GroupBy” concatenates all rows into one (delimited by “|”), which finalizes the marshalling operations
  4. Now that I have all information in one cell, I write this into a variable.
    -> Steps 1-3 can be wrapped into a shared component. Step 4 generates a variable which, by definition, stays contained in the component and therefore can’t be used outside. This is an interesting behaviour: you can actually pass the variable out through a flow-variable output port, but it doesn’t arrive at the connected node.
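The marshalling steps above can be sketched in plain Python (an illustration of the string operations, not KNIME code): cells within a row are joined by “,” and the rows are then joined by “|”, leaving everything in one string.

```python
# Hypothetical exchange rate rows; column/row content is made up.
rows = [
    ["DE", "1.00", "EUR"],
    ["US", "1.08", "USD"],
]

# Step 2 (Column Aggregator): join the cells of each row by ",".
# Step 3 (GroupBy): join all rows by "|" into a single cell.
marshalled = "|".join(",".join(cells) for cells in rows)
print(marshalled)  # DE,1.00,EUR|US,1.08,USD
```

The resulting single string is what gets stored in the flow variable in step 4.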

Unmarshalling:

  1. Read the variable into a table row
  2. “Cell Splitter” splits the only existing cell by “|” to recover the rows
  3. “Unpivoting” turns all generated columns into rows
  4. “Cell Splitter” splits the cells by “,” to recover all previously marshalled columns
    -> Steps 2-4 can be wrapped into a shared component. Step 1 reads a variable, which can’t be done inside the component.
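The unmarshalling steps are the inverse, again sketched in plain Python rather than KNIME nodes: split the single cell by “|” to get the rows back, then split each row by “,” to get the columns back.

```python
# The single-cell string produced by the marshalling steps.
marshalled = "DE,1.00,EUR|US,1.08,USD"

# Step 2 (Cell Splitter on "|") recovers the rows;
# step 4 (Cell Splitter on ",") recovers the columns of each row.
rows = [row.split(",") for row in marshalled.split("|")]
print(rows)  # [['DE', '1.00', 'EUR'], ['US', '1.08', 'USD']]
```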

And of course you need to make sure that no “,” or “|” characters appear in your data set; otherwise you need to add escaping/unescaping to the marshalling/unmarshalling.
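One possible escaping scheme (an assumption on my part, not part of the thread's implementation) is to backslash-escape the delimiters before joining and to split only on unescaped delimiters afterwards:

```python
import re

def escape(cell: str) -> str:
    # Escape the escape character first, then the two delimiters.
    return cell.replace("\\", "\\\\").replace(",", "\\,").replace("|", "\\|")

def unescape(cell: str) -> str:
    # Drop every backslash, keeping the character it protected.
    return re.sub(r"\\(.)", r"\1", cell)

cells = ["1,5", "a|b"]  # cells that contain the delimiter characters
joined = ",".join(escape(c) for c in cells)
# Split only on commas that are not preceded by a backslash.
restored = [unescape(c) for c in re.split(r"(?<!\\),", joined)]
print(restored)  # ['1,5', 'a|b']
```

Note the negative lookbehind is a simplification: a cell that ends in a literal backslash would defeat it, so a production version would need a proper tokenizer.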

