How about a "Table to Variable" and "Variable to Table" node?

Dear fellows,

I have encountered several occasions where a certain table serves only as a helper table at many points of a complex workflow, for example an exchange rate table that is joined against many other tables at different points of the workflow.
It would be very handy if you could create a pointer to an existing table and store it in a variable, which later on in the workflow could be resolved back into the data table. This would basically need a “Table to Variable” and “Variable to Table” pair of nodes that work as counterparts. It would help me keep workflows tidier. I can also see that people might overuse this, but then again, with great power comes great responsibility :slight_smile:

Any thoughts on this?

Best,
Dominic

Hello @dobo

Have you tried these nodes, which help convert data tables to flow variables and vice versa?


@dobo

There isn’t a set of nodes that converts an entire table to a variable and then reverses the process. However, you can achieve the same effect by serialising the table, converting the serialised table to a variable, and then reversing the process when you need it. Serialisation is the process of converting a collection of data objects into a single object that can be stored, such as when you save an Excel workbook to a file.

I’ve created a workflow that demonstrates the process using JSON as the serialisation format. It is not perfect and will not serialise all types of data. Because JSON supports only a limited set of data types, some values change representation on the round trip, which can cause errors; in particular, dates are converted to strings and are not automatically converted back. You should use this approach with caution: it does work, but you need to be aware of the cases where it does not.
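As a rough illustration of that caveat, here is a minimal Python sketch of the round trip (plain Python rather than the KNIME nodes, with made-up column names), showing that numbers survive the JSON round trip while dates come back as plain strings unless you convert them back explicitly:

```python
import json
from datetime import date

# Column-oriented view of a tiny table: column name -> list of values.
table = {
    "trade_date": [date(2023, 1, 25), date(2023, 1, 26)],
    "rate": [1.0862, 1.0911],
}

# JSON has no date type, so the dates have to be written out as ISO strings.
serialised = json.dumps(
    {col: [v.isoformat() if isinstance(v, date) else v for v in values]
     for col, values in table.items()}
)

restored = json.loads(serialised)
print(type(restored["rate"][0]))        # <class 'float'> -- numbers survive the round trip
print(type(restored["trade_date"][0]))  # <class 'str'>   -- dates come back as strings

# The date type is only recovered if you parse it back explicitly.
restored["trade_date"] = [date.fromisoformat(v) for v in restored["trade_date"]]
```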

The workflow can be downloaded from here.

Dummy data table

For testing purposes, I created a table containing date, integer, double and string values. The table has 20 rows.

Convert table to var

To convert the table to a variable, first serialise the table with a Table to JSON node. Note that the Aggregation Direction needs to be changed from the default Row-Oriented to Column-Oriented; if left row-oriented, it is not possible to recover the original table structure. This creates a one-cell table containing the serialised original table. I added a column with the name that I wanted to give the variable and then set the RowID to that column. The one-cell table is then converted to a variable, which is what you wanted. That variable can then be injected into the workflow so that it can be accessed later.
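Purely as an illustrative sketch (plain Python, not KNIME's exact JSON layout, and with made-up column names), the point of column-oriented packing is that the whole table becomes a single JSON value, which is why it fits into one cell:

```python
import json

# Row-oriented view: one JSON object per row.
rows = [
    {"id": 1, "value": 10.5},
    {"id": 2, "value": 11.2},
]

# Column-oriented view: one JSON object for the whole table -> a single
# string, i.e. something that fits into one table cell.
columns = {
    "id": [row["id"] for row in rows],
    "value": [row["value"] for row in rows],
}
one_cell = json.dumps(columns)
print(one_cell)  # {"id": [1, 2], "value": [10.5, 11.2]}
```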

Convert var to table

Converting the variable back into a table reverses the process. The variable is extracted from the workflow and appended to a one-cell table. The string column is converted to JSON and the dummy column from the one-cell table is removed. The JSON is then converted back to a table; at this point, however, the rows are still grouped as a set within each column. The data is then ungrouped to retrieve the original table structure.
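Again as a rough Python sketch of the steps just described (not the KNIME nodes themselves), deserialising gives the columns back as grouped lists, which then get zipped back into rows:

```python
import json

# The single serialised cell, recovered from the flow variable.
one_cell = '{"id": [1, 2], "value": [10.5, 11.2]}'

# Parse it back into column name -> list of values: the rows are still
# grouped inside each column at this point.
columns = json.loads(one_cell)

# "Ungroup": zip the per-column lists back into per-row records.
rows = [dict(zip(columns.keys(), values)) for values in zip(*columns.values())]
print(rows)  # [{'id': 1, 'value': 10.5}, {'id': 2, 'value': 11.2}]
```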

You now have your original table accessible from a variable stored in the workflow. Note that the date columns are now strings, though the numeric columns retain their types.

As components

It’s a faff to do that every time, so I created two components, table->var and var->table, which are included in the workflow. The variable name is a configurable option. The screenshot below shows how the workflow looks with the components.

[Screenshot: the workflow rebuilt using the two components]

Hope that helps

DiaAzul
LinkedIn | Medium | GitHub


Hi Artem,
thank you very much for your quick suggestions! I use those nodes quite regularly, but here I am really talking about a simple pointer to a whole data table. Using those nodes would clutter up my flow variables excessively and might not work for some data types such as images or documents.
Best,
Dominic

Hi @DiaAzul,

wow thank you very much for your lengthy and detailed answer!

In fact, I have done some similar marshalling and unmarshalling in the past (load a table once and use it at several different points in the workflow).

However, as you said, this does not work for all data types and might be error-prone. A standard implementation that creates a pointer to the table as a variable and later dereferences it would be far more robust and, in my ideal world, wouldn’t be too complicated to implement in the background of KNIME :slight_smile:

Best,
Dominic


I would agree with that. My conceptualisation of it was more visual, in that connections could tunnel across the worksheet, i.e. a tunnel entry portal would create a reference to a node output port and a tunnel exit portal would provide the data to a node input port. I like your idea, as it makes the calculation graph more explicit and provides a mechanism for naming the tunnels.

DiaAzul
