I’m trying to use the Nominal Value Row Filter to filter a list of websites by the top-level domain (TLD). For example, I only want to include .com, .gov, .org, etc. I’ve created a column for TLD, and I processed a sample set of websites. It worked just fine. However, when I processed the full list of websites (about 3,000 records, and 98 TLDs) it seems to have broken my Nominal Value Row Filter. I now get the following error message:
I looked around on the forums, and saw a post from a few years ago mentioning a similar issue.
However, I don’t understand the solution presented in that thread. It seems like they are saying that I have to limit the number of values in my data to 60 somehow. I don’t really understand how to do that so that it processes all of my data. I tried just throwing in the “domain calculator” right before the Nominal Value Row Filter, but it didn’t do the trick.
I’m uploading a sample workflow below. Can anyone help me figure out how to make this work?
Hi,
I can’t look at your workflow since I am on mobile, but the problem should be easy to fix. Generally, KNIME stores the possible values of nominal columns, but only up to 60 unique values. If a column has more than that, its possible values are not stored at all. In the domain calculator, under possible values, you can specify (in the bottom left, if my memory serves me right) how many values should be stored. This overrides the maximum of 60 for the table coming into the domain calculator. So just change that value from 60 to 2000 and you should be good. Please note that this could impact performance, though.
Kind regards
Alexander
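To make the 60-value cap described above a bit more concrete, here is a toy Python sketch of the idea. It is not KNIME's actual code: the function name possible_values and the max_values parameter are made up for illustration, and it only assumes the behaviour described in this thread (distinct values are collected until the limit is exceeded, then dropped entirely).

```python
from typing import Iterable, List, Optional


def possible_values(column: Iterable[str], max_values: int = 60) -> Optional[List[str]]:
    """Collect distinct values in first-seen order.

    Returns None as soon as more than max_values distinct values are seen,
    mimicking a domain whose possible values are not stored.
    (Hypothetical helper, not part of any KNIME API.)
    """
    seen = {}  # dict keeps insertion order, so values stay in first-seen order
    for value in column:
        if value not in seen:
            seen[value] = None
            if len(seen) > max_values:
                return None  # too many distinct values: give up on the list
    return list(seen)


# 98 made-up TLD labels, standing in for the 98 TLDs in the question
tlds = [f"tld{i}" for i in range(98)]

print(possible_values(tlds))                        # default cap of 60 is exceeded
print(len(possible_values(tlds, max_values=2000)))  # cap raised, all values kept
```

With the default cap the function has no list to offer, which is roughly why a filter that relies on the stored possible values has nothing to show. Raising the cap for that one table is the equivalent of changing the setting in the domain calculator.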
You can also change the default from 60 to some other value by adding the following line to the knime.ini file:
-Dknime.domain.valuecount=1000
(where 1000 is whatever value you think is big enough to cover your needs and not so big as to gobble up all the available memory during node execution, as the nominal values are stored in memory as the output table is generated)
Thanks Alexander! I just didn’t know how to use the domain calculator properly. All I had to do was increase the number of possible values, and it worked fine!
Nice! Thanks for that tip. I think I’d rather just increase that on a case by case basis, but good to know I could modify the global setting if I needed.