Yes @bruno29a, I would like to see the data without the columns that have just one unique value
Maybe a bit late, but you could try this node: Constant Value Column Filter – KNIME Hub
Hi @tims , in that case you have to use what you are doing - and it actually make sense to download the data in this case.
Most probably the reason why it’s not working is about correctly using the “Enforce exclusion” vs “Enforce inclusion” (sort of “deny all except” vs “accept all except” or “blacklist” vs “whitelist”).
Can you show how you configured your Column Filter?
EDIT: I just saw @goodvirus 's post (I think it came in as I was writing mine). It looks like the Constant Value Column Filter suggested by @goodvirus actually does exactly the job that you are trying to do. In fact, if you don’t care about the count, you could even apply this node directly to your data without having to do a unique count. This node will check if the values are the same in the columns.
Nice share @goodvirus . Thanks for that, I never used this node.
Here’s a quick test @tims of the Constant Value Column Filter:
As expected, Column2 and Column5 are filtered out. Only the columns that do not have 1 unique value are retained.
@goodvirus constant value column filter is very convenient.
It is desirable to exclude more columns by increasing the number of columns having low unique count.
@bruno29a here’s the Column Filter. I tried both “Enforce” inclusion and exclusion.
Hi @tims ,
Can I ask why you are doing this? If its for machine learning, to remove constant column is valid (if the future data has the same constant values!), but even a low variance or low unique count could benefit your model and I would include it and if you have to remove a few columns, I would use a feature selection loop.
Hi @tims , thank you for the screenshots.
I’m not seeing the output of the Python script though. Based on the fact that you are using a Row Filter, I’m assuming that the output from the Python script is vertical (I had originally assumed that it would be horizontal). If it is indeed vertical, where in your Row Filter you are keeping those with value
<=1, I’m assuming that you still have a vertical result after the Row Filter, that is a table with multiple rows.
The Table Row to Variable will take only the first row, giving you a single column name.
In terms of Enforce inclusion and exclusion, in your case, you want to use Enforce exclusion (which is the equivalent of “allow all except those to be excluded”.
But why aren’t you using the Constant Value Column Filter instead? It’s a much better and simpler approach.
And to @goodvirus 's point, it might be better to tell us why you are doing this which can help us give you what the best approach would be.
Yes, constant filter is an excellent node. I am passing on the data to others in the team and would like to be prepared if they say - “let’s decide to get rid of columns with unique count 2 and 3 as well”
Yes, ROWID is returning multiple rows, but Table Row to Variable seems to be picking up only the first one. How do I exclude all these in the list?
Hi @tims ,
You can still use the Constant Value Column Filter even if you want to keep your unique counts. Just fork the data, like you are already doing, but like this:
And if I am to fit the above into your workflow, you would simply need to replace this:
Now, if you really want to use what you have done, or at least for your own knowledge, I’ll put something together that works. You basically need to create a list of the columns you want to exclude.
Here’s a quick demo that looks like this:
My demo varies a bit from your workflow as I am not reading from a DB nor doing the unique count via Python. It is still doing a unique count, but via the GroupBy node. You can think of my Table Creator as equivalent as your DB Reader node, and my RowID node as equivalent to your RowID node, meaning anything between these 2 nodes on my side are simulating what you have between these 2 nodes on your side.
As you see, I added a GroupBy node (Node 6) that will create a list of the columns I want to exclude.
We expect column2 and column4 to be columns we want to remove.
My RowID is as expected. It contains column2 and column4.
In my GroupBy, I create a List with the values in that column (column “exclude columns”):
Convert to variable:
Notice the type of the variable. The type is a list of String, denoted by
In my Column Filter, I want to use Enforce exclusion and pass my variable:
column2 and column4 are excluded.
Here’s the demo file: Remove data column after performing unique counts.knwf (17.4 KB)
The Table spec node can also extract the possible values which can then be counted and used as filter.
I borrowed @bruno29’s great solution
Thanks for sharing @bruno29a. Great work as always. Highly appreciate your contribution here
The solution is really cool. I was missing the following two steps based on your solution:
- Set to “Keep original names” in GroupBy
- Use GroupBy to get a column name list from RowID
I had some trouble though. There were a few column names with spaces and those starting with numbers. the workflow failed. I set “column2” to “column 2” in your workflow as an example of this.
I had to add solution from @mlauber71 to get rid of the spaces in between and numbers at the front to make the workflow work.
@Daniel_Weikert could you please share the configuration/KNIME hub workflow? What’s going on in Ungroup? I see empty lists:
Hi you probably see an empty list because you did not check the extract possible values box. The ungroup then ungroups the set returned by the extract table spec after checking the box
This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.