dynamic column filtering . . .

sanderlenselink · May 5, 2021, 11:48am

Hi all,

I have a semi-structured dataset. It consists of 500+ rows and 100+ variables.

As you can see below there is some structure; the YEAR preceeds a fixed number of variables (in the example 4 variables). The values of the variables are string type or missing).

I puzzled for a while to filter dynamically the right columns. For example to retrieve all the data regarding e.g. 2020. or 2022. See the screenshot below.

Is there someone with an idea to get this done?

Please find attached the example-dataset

KNIME_column_filtering_dynamically.xlsx (10.8 KB)

bruno29a · May 5, 2021, 12:33pm

Hi @sanderlenselink , do all the sets have 4 variables each? In other words, is it always:
Year_(x), var#1, var#2, var#3, var#4 ?

Do any of these headers repeat on other rows?

sanderlenselink · May 5, 2021, 1:20pm

Hi Bruno,

all sets have 6 variables each
(for simplicity reasons I took 4 variables in the example).

Every header/variable is unique (see screenshot from on the real data, hopefully that helps. )

sanderlenselink · May 5, 2021, 1:25pm

. . . the latest screenshot is with 3 variables. Don’t let that be confused (otherwise the picture become very confusing/complicated)

bruno29a · May 5, 2021, 2:16pm

Hi @sanderlenselink , since you have a fixed number of columns in a set, I think the logic would be to move the horizontal data into vertical, where you would end up with 8 columns in total:
Company, Year, and the 6 vars

I believe the unpivot node can help with this - I understand what pivot and unpivot can do, but I don’t quite understand how to use them (I’m not an Excel user per se )

Once you have the data in this form, you can then do row filter based on year, and/or on company as you like.

bruno29a · May 5, 2021, 3:30pm

Hi @sanderlenselink , as I mentioned, I am not too familiar with working with pivot/unpivot, so I have tried an alternative way via transpose to reach the unpivot results.

In this case, I want to transpose every 5 columns (every 7 columns in your real dataset) AFTER the Company column. I was trying to do this with Chunk Loop. I know how to use it with rows, but was not sure how to proceed with columns, and because of this, I transpose the whole data set so that all columns would become rows, and then proceeded transposing by chunks.

This is what the workflow looks like:

From the Excel Reader (basically the excel you provided):

Result at the end of the loop (giving what unpivoting would give):

After that, you can filter as you like, in this case I filtered on year 2018. You can do this via a row filter, or this can also be done interactively with a table view.

From Row filter:

From Table View:

Here’s the workflow: Dynamic Column Filtering.knwf (41.9 KB)

Note, if you have more variable columns in your real dataset as you mentioned, simply change the chunk size from the Chunk Loop:

ipazin · May 6, 2021, 10:24am

Hello @sanderlenselink,

here is how it would work with Unpivoting. The upper part is only necessary if the number of value columns is changing.
Dynamic Column Filtering_ipazin.knwf (30.8 KB)

Br,
Ivan

Daniel_Weikert · May 6, 2021, 5:33pm

Thanks @ipazin .
I used window loop + moving aggregation for the first part of your workflow to see if that works too.
Sometimes I got the feeling that 99% of questions can be answered with unpivot or regex
best regards

ipazin · May 7, 2021, 2:50pm

Hello @Daniel_Weikert,

you are welcome. Did window loop + moving aggregation worked? Mind sharing workflow to check it out? (Un)pivoting is really powerful operation when doing above and similar data transformations. And big plus is that it usually replaces long lasting loops. And regarding regex although I’m not a big fan of it have the same feeling

Br,
Ivan

Daniel_Weikert · May 7, 2021, 3:07pm

I am very sorry I did not save it (will do next time so more people benefit from it). Yes it worked. The main idea was just to see whether I can reproduce the 1-5 counter you implemented in your workflow.
best regards and enjoy your weekend

ipazin · May 7, 2021, 3:33pm

Ok. I see now what you meant by “first part of workflow”. Had a trouble with this part as couldn’t find easy way to determine wanted number…
Tnx and you too!
Ivan

sanderlenselink · May 12, 2021, 1:47pm

Hi all,
many thnx for the ideas. They worked for me.

The idea of @Daniel_Weikert . . . can you explain a bit more in detail what you did? I looked at window-loop and moving-aggregation but I don’t see the clue (that’s fully my “mistake”)

-Sander

Daniel_Weikert · May 13, 2021, 1:35pm

all I did was just borrowing a part @ipazin nice workflow for educational purposes and see whether I can get the repeating count to 1-5 with the two nodes
window

So it was only a part of the workflow. Hope that helps
br

sanderlenselink · May 17, 2021, 2:05pm

@Daniel_Weikert THNX

See the attached workflow where integrated the idea’s of bruno29a, ipazin and daniel.
SL_KNIME Dynamic Column Filtering_ALL_DEFDEF 20210517.knwf (64.6 KB)

So, 3 different approches produce the same result. For me an import learning to understand these methods.

In general I prefer short code and the method of daniel is really short.

THNX

-Sander

Daniel_Weikert · May 17, 2021, 4:41pm

Great work and thanks for sharing @sanderlenselink . This is certainly helpful for everyone here so your contribution is highly appreciated.
best regards

sanderlenselink · May 19, 2021, 8:23am

Hi all,

please find attached a renewed workflow with the 3 methods
(I made a mistake coping the 3rd method of @bruno29a . . . grrrrr.)

SL_KNIME Dynamic Column Filtering_ALL_DEFDEF 20210519.knwf (69.5 KB)

-Sander

ipazin · May 19, 2021, 9:23am

Hello @sanderlenselink,

you can also share workflows on KNIME Hub. Additionally you can add file(s) into data folder within workflow directory and use relative path(s) in order for workflow to be executed by others when downloaded. See here a bit more about it:

https://docs.knime.com/latest/analytics_platform_file_handling_guide/index.html#standard-fs

Br,
Ivan

system · November 17, 2021, 9:24pm

This topic was automatically closed 182 days after the last reply. New replies are no longer allowed.