Help Needed: -9999 Not Converting to Missing Values in KNIME Workflow

Hi everyone,

I’m having trouble replacing specific numeric values with missing values inside a looped CSV Reader, and I would really appreciate your help.

Context:
I have a workflow that loops through multiple CSV files (List Files/Folders → Table Row to Variable Loop Start → CSV Reader).
Each file contains several numeric columns where the value -9999 should be treated as missing.

The problem:
After reading the data with the CSV Reader, the -9999 values are still present.
I tried using Column Expressions (legacy) to replace -9999 with null, but the values are not being converted into missing cells as expected.

This is the expression I tested:

if (column("PRECIPITAÇÃO TOTAL, HORÁRIO (mm)") == -9999) null;
else column("PRECIPITAÇÃO TOTAL, HORÁRIO (mm)");

I selected Replace Column and changed the output type to Number (Double), but the -9999 values remain unchanged in the output table.

What I tried:

  • Changing the output type (String, Double, Float)
  • Making sure the expression runs on the correct column
  • Checking the data domain and column types
  • Using “Evaluate” inside the dialog, which seems to work, but the final output does not update
  • Testing inside and outside the loop (same behavior)

What I need:
I simply want every -9999 in numeric columns to become a true missing value (?) after the CSV Reader.
I’m not trying to convert anything to string — the columns should remain numeric.
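For reference, outside KNIME the transformation I'm after is exactly what pandas does with `replace` (a small sketch with made-up sample values; my real files have many more columns):

```python
import numpy as np
import pandas as pd

# Made-up sample: -9999 is the placeholder I want turned into missing
df = pd.DataFrame({
    "PRECIPITACAO TOTAL, HORARIO (mm)": [0.2, -9999.0, 1.4],
    "RADIACAO GLOBAL (KJ/m²)": [-9999.0, 350.0, 410.0],
})

# Replace the placeholder with NaN in every numeric column; dtypes stay numeric
num_cols = df.select_dtypes(include="number").columns
df[num_cols] = df[num_cols].replace(-9999, np.nan)
```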

I am attaching:

The CSV file I’m using for testing

If anyone has seen this behavior or knows the correct configuration to make Column Expressions replace numeric values with missing values, I would be extremely grateful.

Thanks in advance for any guidance!

Ricardo

INMET_CO_DF_A001_BRASILIA_01-01-2001_A_31-12-2001.CSV (802.0 KB)

Hi @RicardoLeles, welcome to the KNIME Forum!

For this I would use the Rule Engine node; rows that match no rule get a missing value, so a single "keep" rule is enough. Configure it like this:

$RADIACAO GLOBAL (KJ/m²)$ > -9998 => $RADIACAO GLOBAL (KJ/m²)$

Gr. Hans

2 Likes

@RicardoLeles you could take a look at this setup. It does some preparation on the CSV: it converts some string columns to numbers while setting the comma as the decimal separator, then iterates over these columns and sets every -9999 to missing.

1 Like

Hi @HansS,

Thanks a lot for your reply and for the example with the Rule Engine. It works perfectly for one column! Lol! :wink:

However, in my case the dataset is a bit larger: I’m processing 10,030 CSV files, each with around 8,000 rows, and several numeric columns may contain the placeholder value -9999. At the moment, the CSV Reader is loading only one file for testing, but the loop will eventually handle all files in the folder.

Because of this, configuring the Rule Engine manually for every single column would be quite time-consuming and not very scalable.

My question is:
Is there a recommended way to replace -9999 across all numeric columns at once, without needing to create a separate rule for each column? Maybe something like a column expression, a wildcard rule, or a more automated approach inside the loop?

Thanks again for your help — I really appreciate it

Hi @mlauber71,

Thanks a lot for the example workflow!
I’ve opened it and I can see that you are using a loop to iterate over all numeric columns while applying a replacement rule for -9999 → missing.

This is exactly the kind of automated approach I was looking for, since my dataset contains many numeric columns and over 10,000 CSV files.

I’ll test your setup inside my workflow.
Thank you again for the clear and helpful reference!!!

:wink:

1 Like

@RicardoLeles if you have that many files you might want to invest a moment in thinking about error handling. Maybe set up a scenario where you track which files have already been successfully processed and which are still to go. Also, if your CSVs have varying headers, you can try and detect the start of the header and then process it automatically.

In cases like this it often makes sense to spend some time planning. It will make your life easier down the road.

https://medium.com/low-code-for-advanced-data-science/knime-cases-switches-and-catching-errors-5ab748cbeaa3

If you don't mind replacing the values at the source, you can use the Files to Binary Objects node followed by Binary Objects to Strings, a string replacement, and then write the files back.

If you want to replace just within KNIME, use the Math Formula (Multi Column) node.

(You could also unpivot, replace, and pivot back, but this is computationally expensive.)

1 Like

Hi,
you probably want to convert these columns into numeric ones, don't you?

I suggest using the String Manipulation (Multi Column) node:

  1. select all columns you want to manipulate

  2. replace all "-9999" with null and then convert to double

    toDouble(replace($$CURRENTCOLUMN$$,"-9999",toEmpty((String)null)))
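The same read-and-convert step can also be sketched in pandas (sample data and separator are assumptions; `na_values` maps the placeholder straight to missing while keeping the columns numeric):

```python
import io

import pandas as pd

# Made-up sample imitating an INMET CSV: ';' separator, ',' decimal, -9999 placeholder
raw = "PRECIPITACAO;RADIACAO\n0,2;-9999\n-9999;350,1\n"

# na_values turns '-9999' into NaN during parsing; decimal=',' keeps values numeric
df = pd.read_csv(io.StringIO(raw), sep=";", decimal=",", na_values=["-9999"])
```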

4 Likes