Reference Column Filter with 'dynamic' reference table

Kazimierz · March 5, 2023, 6:17pm

Dear KNIMErs,

I have already build a workflow that meets my needs, however I’m searching for more elegant solution now.
My current workflow contains 200+ such components with identical structure:

Background info:

Input dataset cover 600+ columns.
There are 200+ product attributes columns among above mentioned 600+ columns.
There is 1 component per each product attribute.
Each component generates report containing selected rows (not important in this description) and a couple of columns with product attributes data (this is important).
List of columns (product attributes) changes for each product attribute.
Number of column in the outcome is dynamic (it means – it varies depending on product attribute specified before)

Key functionalities:

Constant Value Column node contains name of attribute the outcome is prepared for.
Table Creator contains list of columns (product attributes) to be used for attribute specified in Constant Value Column.

Question: is there a solution that could:

Let user specify input product attribute (selection from drop-down menu would be the best) as an equivalent of the current Constant Value Creator.
Use a table like this as reference table (200+ product attributes with various composition of columns for each product attribute)?

For_KNIME_forum_2903×174 62.9 KB

Simply: instead of 200+ components (portion visible below) I would prefer to have an elegant solution with short workflow covering all dependencies between product attributes and reference columns.

Thank you in advance for any suggestions,
Kaz

mlauber71 · March 6, 2023, 6:11am

@Kazimierz you could organize your column and data selection maybe with a component. I have an article and video about how to do visualization with python but you can modify this to be used with other output as well. Also check out the additional links at the end.

Also think about using the result from one component as input for the next.

Then you might want to create a sample workflow that would represent your challenge without spelling any secrets. That way the community might be able to solve the core of your problem. Also demonstrate what a desired outcome would be.

Kazimierz · March 6, 2023, 7:39am

Hi @mlauber71 and other KNIMErs,

Thank you for your prompt response.
It seems I described my goal unclearly. So, let me explain it once again.

First of all, my ‘output’ is 1 data table for each product attribute. Output data contains identified false data fields and all product attributes that are necessary to identify root-cause for false data fields. Output data requires human intervention in the next step. Thus, no special visualization is required.

In the very beginning, I was working with Excel. At that time, scrolling right and left to find appropriate columns, hiding unwanted columns and unhiding wanted ones, and re-doing all these again and again was a nightmare.

Currently, I’m using 1 component per each single product attribute I need to focus on. This gives me 200+ components, what is manageable, however could be improved.

In the future, I would like to use short workflow which interacts with the user (to grab product attribute for which the output should be prepared) and bases on a reference containing list of all product attributes and all those columns that are necessary to prepare the output for each product attribute.

In other words:

There is ‘big’ data table as the input.
There is ‘small’ data table as the output. However, user specifies which product attribute should be showed in output data table, while KNIME ‘knows’ (form reference table) which columns should be associated with product attribute specified by user.

I hope this helps a bit.

Kazimierz · March 6, 2023, 9:31am

I could use Column Selection Widget to let user select the name of product attribute.
Then I need a multi-line reference table to specify columns that are associated with all available product attributes. Now, I’m using Table Creator to specify columns associated with 1 product attribute, but don’t know how to apply this multi-line concept.

DanielBog · March 6, 2023, 1:10pm

Hey @Kazimierz,
this sounds very interesting, though I am not 100% sure I get what you are trying to do. Could you maybe provide a minimal example with dummy data to show what you try to achieve? Going with the column selection widget to create only the relevant tables sounds like a good idea.

Greetings,
Daniel

iCFO · March 6, 2023, 1:36pm

Perhaps incorporate the item code into the associated column names, then select specific items via a widget, then filter the column names by a Regex match with the associated item code? You could also rename the column names to remove the item code before export to clean up the export table presentation.

Kazimierz · March 6, 2023, 2:55pm

Hi @DanielBog & @iCFO
Thank you for your comments.
Let me describe the way I would like to proceed with the final solution:

I have data table as the input.
I select product attribute (out of 200+ product attributes).
KNIME recognizes selected product attributes and grabs the set of reference columns (that is specific for each product attribute).
KNIME prepares filtered table that contains reference columns for selected product attribute.

Analogy in Excel - dependent drop-down menu where:

User works with 2 drop-down lists.
Users selects item from the 1st drop-down list (which is fixed).
The 2nd drop-down list is dynamic: the set of available items depends on the item selected from the 1st drop-down list.

The table I have presented above shows the concept of the content of reference table:

where:

‘Attribute name’ column refers to the product attribute selected by the user (or item selected from the 1st Excel drop-down menu).
‘Reference col.#’ is the set of reference columns for each product attribute. ‘Reference col.1’ is the same for all product attributes in my example as it would contain ID.
Number of reference columns vary per each product attribute.
Mix of reference columns vary per each product attribute.

Currently, my workflow follows such a concept:
if product attribute = x then
apply Column Reference Filter x (with associated Table Creator x).
This means, I’m using 200+ such combinations of nodes, which is not very convenient, but works properly.

Does this sound more clear?

mlauber71 · March 6, 2023, 3:23pm

@Kazimierz I think such a construct would be possible. Can you provide data that would represent your task so one might build a sample workflow? Best to make the sample so that it would contain the possible quirks that you expect to encounter.

mlauber71 · March 6, 2023, 4:42pm

@Kazimierz I have built a demo that would illustrate how such a system could work. The interface of the component would look like this:

you select the item
the list of columns belonging to the item are dynamically selected
the basic table is filtered by the Item and only the referenced columns are being selected

When you press the Execute button the select would happen. And an Excel file with the data and the selected columns would be prepared. Also the results will be available further down the stream.

Maybe you check it out:

Kazimierz · March 7, 2023, 7:16am

@mlauber71 Your workflow:

looks simple
is quite intuitive, even if some nodes are new to me
works
provides the output I was expecting.

The only think I’m not sure how to proceed with is Collect Local Metadata component. It generates links to your local space, dropbox, workspace and forum. How could I updated all those links to my workspace?

mlauber71 · March 7, 2023, 7:21am

Glad you like the workflow. The component can be reset and will then provide a link to the absolute path where the workflow is. You can ignore it if you have other paths.

Kazimierz · March 7, 2023, 7:41am

Works like a charm!
Thank you so much @mlauber71

Next, I need to populate all Reader nodes with appropriate data. This will be a huge step forward.

Kazimierz · March 7, 2023, 8:14am

@mlauber71 One more question: is there any risk connected with replacing parquet inout files with Excel files?
I’ve already read items_df.xlsx (instead of items_df.parquet) and no errors reported. However, I’d prefer to be sure .xlsx will work fine before moving forward with other modifications.

mlauber71 · March 7, 2023, 8:21am

@Kazimierz I only use parquet to transfer data between Python/Jupyter and KNIME - I employed ChatGPT to create the sample data since I was too lazy to do it on my own. The notebook still is in the /data/ folder if you want to check.

Once the data is safely in KNIME (the black triangle ports) it does not matter what import you used. Excel though quite common is also sometimes not the best format to store data

Kazimierz · March 7, 2023, 2:46pm

@mlauber71 Luckily enough, one can work smarter with ChatGPT

system · March 14, 2023, 2:46pm

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.