I’m trying to filter out all columns that only contain boolean data (represented as 0 and 1 integers), and with “filter” I don’t actually mean to remove them but rather to treat them differently downstream. Most likely by splitting into two tables, then joining back after some manipulations.
The nearest I have come to my goal was by looking at the table spec, which includes the “domain” column, telling me in which range the data spreads. I know that I can safely update the domain data with the “Domain Calculator” node. But I haven’t yet found a way to access that data in any way.
So if there is any way to access the domain data in the table spec, please tell me. If not, how would I solve the problem then?
Hi @Raph0007, welcome to the forum.
If you convert the 0/1 to True/False and set the column type to Boolean, you can use use the Column Filter (or Column Splitter) node with the Type Selection option set to Boolean value.
Thanks for replying! Sadly, the columns of concern do not actually contain boolean data, but it only consists of zeroes and ones, so it’s semantically still boolean (although it is marked as integer). Column filter with type selection therefore won’t work just like that; is there any way to autocast integer values only containing 0 and 1 to boolean values? Then I could use type selection
This can be accomplished by the Rule Engine node:
$columnnamehere$ = 1 => TRUE
$columnnamehere$ = 0 => FALSE
I fear that will not work, since this would require me to know the columns containing boolean values in advance. The data table I am working with is full with integer values, presumably also containing zeroes and ones.
I could as well manually inspect the columns and use a column splitter to choose exactly the columns that I checked to be boolean, but I’d like to make this process automatic and dynamic, so that it will work with different tables as well. Is this possible?
@Raph0007 you could use statistics node or maybe group by to find integers that would contain only 1 and 0. Depending on your data you might have to combine them. And then you would use a loop to handle only those that would be suitable for booleans.
If you could give a sample that represents your data that could help generating new ideas.
I created this workflow seperate_boolean_data.knwf (33.7 KB)
It uses the Extract Table Spec node to save all possible values per column into a Collection. That makes it possible to filter columns with a collection size =2 (=boolean).
Thank you very much! The “Extract Table Spec” seemed to be exactly the node I was looking for! From that step on, I chose to take my own path (although it might be not as elegant / efficient). I used a rule-based row splitter which checks on the following spec values:
Type has to be “Number (integer)”
Lower bound has to be 0
Upper bound has to be 1
Then I transposed both outgoing branches and fed them as reference tables into two Reference Column Filter nodes, which both take the original table as their data table. Now I end up with two branches, one with all non-boolean columns and another one with all boolean columns.
Now I can safely apply “Low variance column filter” to the non-bools, without fearing to lose all boolean columns.
That’s it, thanks!
This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.