Missing value node does not replace ? sign

Hi @clos, can you upload an actual sample of the csv file please (not just a screenshot). That way people can assist you with your specific problem. As I wrote above, I think your problem is that your data actually contains question marks, but only by seeing the physical example of the data can we be sure. Thanks

1 Like

Hi @clos,
As @takbb wrote, a question mark is a perfectly valid, normal string value. If you had a column “My favourite punctuation mark” with contents like !, ?, ., etc., how could KNIME distinguish between a ? that is meant as missing value and one actually having a meaning? This means you need to process your data with a different node than the Missing Value node. One possibility is the Rule Engine node with a rule like:

$yourcolumn$ = "?" => "replacement"
TRUE => $yourcolumn$

This will replace all “?” by “replacement” and keeps all other values the same, if you set the node to replace the column “yourcolumn”.
Kind regards,
Alexander

2 Likes

Hi @takbb

I am trying to upload my csv file but in Knime forums, apparently, this is not authorized file extension.

Do you know how I can to upload it?

Hi @clos, I had forgotten there is a restriction for csv files, and I don’t know why there is, but if you rename the extension to .txt, hopefully you can then upload that. thanks

Hi @takbb no problem.

I am sending the file, as you requested

Thanks
adult.txt (3.8 MB)

Hello @clos
You can test the following code into a ‘Rule Engine’ node:

$native-country$  MATCHES "^(?!\?$).*$" => $native-country$

As a the Rule Engine node works on a single column basis. You may want to deploy it in a column loop. Otherwise you will end up with a 15 Rule Engine nodes workflow, as the number of columns in your sample file.

BR

1 Like

Hi @clos
Thank you for uploading the file. Yes this confirms that it does indeed contain the literal “?” as data items, rather than having missing values. See how in the following output from the CSV Reader, they are black.

If you want the Missing Values node to be able to work on these, then you would have to convert “?” to “missing values” first.

As @gonhaddock has said, you can do this with the Rule Engine in a column loop to convert these.

Another way of doing this, using a single node would be to use String Manipulation (multi-column) node

Copy and paste the following code into that node:

$$CURRENTCOLUMN$$.equals("?")?null:$$CURRENTCOLUMN$$

You need to specify all of the String columns that might be affected by the presence of “?” values, as the function will work only on string columns, but that should be ok since numeric columns cannot contain a “?”.

Once that runs, the output will look like this, and you can see that the ? symbols are now red. As mentioned before, these cells no longer contain “?” but instead KNIME is displaying a red question mark to signify missing value.

After this, your missing value node should work as expected.
forum - missing values.knwf (414.5 KB)

I just realised that I’ve been here before :wink:

5 Likes

@takbb Master :tophat:

BR
@gordon.padawan

1 Like

@gonhaddock , lol… you should know by now I’ll try to find either a String Manipulation or Java snippet solution to every problem :wink:

4 Likes

Interestingly, having just looked back at that post I mentioned from 2 years ago, there is a suggestion that the File Reader node could handle this. Except that I just looked, and it can’t. So I was curious about that because from experience, @ipazin is never wrong about these things!

And it appears that the current File Reader node no longer has this “missing values pattern” option. I wonder why. :thinking:

See reference here to the deprecated version.

and what I have also just noticed is that this post also refers to the “adult.csv” file… this file is apparently in the KNIME example data sets (in the data folder of knime/Examples – 34_GDPR_examples – KNIME Community Hub), and has “?” as a placeholder for null ! :open_mouth:

I can now understand the confusion!!

[Edit: see further down:
Missing value node does not replace ? sign - #22 by ipazin]

5 Likes

Hi @takbb
Just for the records. An alternative code for your String Manipulation (Multi Column) current configuration, and based in regexMatcher()

regexMatcher($$CURRENTCOLUMN$$, "^(?!\\?$).*$").equals("True")
? $$CURRENTCOLUMN$$
: null

BR

2 Likes

Hi @gonhaddock Thank for your Support.

I am going to put in practice your recommendation.

Thanks for the tip.

1 Like

@takbb thanks so much

2 Likes

If memory serves me well at a certain point in time there were two file reader nodes - one of first KNIME nodes ever created, File Reader, which over years got many features including one I was probably mentioning in linked post and new, faster and simplified node for reading data into KNIME called Simple File Reader. Then somewhere along the process of introducing new file handling framework this was changed in a way that Simple File Reader became File Reader and File Reader became File Reader (Complex Format). So comparing File Reader from couple of years ago and now you are actually not comparing same node.

Br,
Ivan

3 Likes

Thanks @ipazin, I knew you’d have the answer! :slight_smile:

Yes I realised that the File Reader referenced in that previous post was now the “deprecated” one, but what I hadn’t spotted was the File Reader (Complex Format) which replaced the advanced features, and does indeed retain the ability to specify “missing values” patterns.

image

This then is actually the answer to @clos 's original problem… replace the CSV Reader with File Reader (Complex Format), and set the “Missing Pattern Value” to ?

Learn something new every day! :slight_smile:

5 Likes

This is fixed now. Folks should be able to upload CSVs (and PDFs as well).

Thanks for the heads up!

3 Likes

Thanks @ScottF

Of course I had to test! :slight_smile:
testfile.csv (49 Bytes)
testfile.pdf (74.8 KB)

Hi @ipazin. Interesting to know that. Thanks

1 Like

Hi @takbb you are rigth. Thank for test that node (complex format).

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.