I have a semi-structured dataset. It consists of 500+ rows and 100+ variables.
As you can see below there is some structure; the YEAR preceeds a fixed number of variables (in the example 4 variables). The values of the variables are string type or missing).
I puzzled for a while to filter dynamically the right columns. For example to retrieve all the data regarding e.g. 2020. or 2022. See the screenshot below.
Hi @sanderlenselink , since you have a fixed number of columns in a set, I think the logic would be to move the horizontal data into vertical, where you would end up with 8 columns in total:
Company, Year, and the 6 vars
I believe the unpivot node can help with this - I understand what pivot and unpivot can do, but I don’t quite understand how to use them (I’m not an Excel user per se )
Once you have the data in this form, you can then do row filter based on year, and/or on company as you like.
Hi @sanderlenselink , as I mentioned, I am not too familiar with working with pivot/unpivot, so I have tried an alternative way via transpose to reach the unpivot results.
In this case, I want to transpose every 5 columns (every 7 columns in your real dataset) AFTER the Company column. I was trying to do this with Chunk Loop. I know how to use it with rows, but was not sure how to proceed with columns, and because of this, I transpose the whole data set so that all columns would become rows, and then proceeded transposing by chunks.
Result at the end of the loop (giving what unpivoting would give):
After that, you can filter as you like, in this case I filtered on year 2018. You can do this via a row filter, or this can also be done interactively with a table view.
here is how it would work with Unpivoting. The upper part is only necessary if the number of value columns is changing. Dynamic Column Filtering_ipazin.knwf (30.8 KB)
Thanks @ipazin .
I used window loop + moving aggregation for the first part of your workflow to see if that works too.
Sometimes I got the feeling that 99% of questions can be answered with unpivot or regex
best regards
you are welcome. Did window loop + moving aggregation worked? Mind sharing workflow to check it out? (Un)pivoting is really powerful operation when doing above and similar data transformations. And big plus is that it usually replaces long lasting loops. And regarding regex although I’m not a big fan of it have the same feeling
I am very sorry I did not save it (will do next time so more people benefit from it). Yes it worked. The main idea was just to see whether I can reproduce the 1-5 counter you implemented in your workflow.
best regards and enjoy your weekend
Ok. I see now what you meant by “first part of workflow”. Had a trouble with this part as couldn’t find easy way to determine wanted number…
Tnx and you too!
Ivan
Hi all,
many thnx for the ideas. They worked for me.
The idea of @Daniel_Weikert . . . can you explain a bit more in detail what you did? I looked at window-loop and moving-aggregation but I don’t see the clue (that’s fully my “mistake”)
all I did was just borrowing a part @ipazin nice workflow for educational purposes and see whether I can get the repeating count to 1-5 with the two nodes
Great work and thanks for sharing @sanderlenselink . This is certainly helpful for everyone here so your contribution is highly appreciated.
best regards
you can also share workflows on KNIME Hub. Additionally you can add file(s) into data folder within workflow directory and use relative path(s) in order for workflow to be executed by others when downloaded. See here a bit more about it: