replace empty cell with missing (?) globally in table

Another option would be a Python snippet and then with pandas you can do all the data mangling.

Dear @ipazin, Would you please take a look at the workflow I provided earlier and tell me if you think it works or not? Cause I think I could do the looping on every column and transforming all empty cells into missing.
The output of the loop would be a set of columns all in string type but I used “Column Auto Type Cast” to solve that. I couldn’t produce empty cells to test that. (Actually you have to read some data to test the workflow as I didn’t include my own)

Hi @armingrudd!

Sure. I have taken a look.

It seems it does not work. I guess it is up to logic in the Column Expressions node.

Also if you have non String column in you data set you will change those values to 0 which might be wrong. And if you have a lot of those columns and you loop last long your execution time might last much more then it should so I would leave them out of loop. Don’t see benefit of iterating over them. But a good idea is to use Column Auto Cast Type node :wink:

Here is a workflow for you to test.

Empty cells to missing.knwf (8.5 KB)

Br,
Ivan

I think you didn’t get this part correctly, I don’t change any non string value to 0. The first “if” statement in column expression exists because when the expression without this statement reaches any “0” value in integer columns, it transforms that value to missing, so I put an exception for "0"s, so that when the expression reaches those values it skips them like the other values. (I don’t know why but the expression act the same for empty cells and 0s).

Now that you provided a table with empty cells, I think the problem with my solution is that the expression cannot detect empty cells.
So what’s your opinion. How should I change this:
else if (column(variable(“currentColumnName”))==""){
missingValue();
}

It seems that simply using “” or “null” and even length = 0 or null doesn’t let us recognize empty cells here.
I guess fixing this can let us have a complete solution for the problem.
@Iris Would you please provide some solution here and let us know how to define empty cells in an expression?

The weird point here is when I check the length of values in “string manipulation” node, it returns 0 for those empty cells but when I use the condition “length(column(variable(“currentColumnName”))) == 0” in the statement, it doesn’t work.

Hi Both,

how about using the Column Auto Type Cast for this:

You can define a Missing Value pattern in the dialog. If you do this, it will automatically transform all emptys to Missing Values (And as a nice Side Effect the columns are automatically converted to the best fitting type)

https://hub.knime.com/knime/nodes/Column_Auto_Type_Cast*iQCmMxBEl2szprD5

Cheers, Iris

11 Likes

GREAT. That’s the solution, I didn’t have noticed this option before. Thank you so much.
But still I have my question: How to define empty cells in an expression?

if a String is empty this expression is true: column(“column1”)==0

And as you were first checking for 0 it always returned the orignal value.
If you first check for empty and return missing, than for 0 and return the value, it works.

Cheers, Iris

1 Like

Exactly! Thank you so much @Iris. Great lessons I learned here in this topic.
Just another minor question: How can I define the column type as a flow variable here in the loop, so that I don’t have to use “column auto type cast” after the loop. (I wish it could be possible from the “column list loop start” like the other variable “column name”)

I just send the same question to @moritz.heine who wrote the node. Let’s see what he tells us.

1 Like

Hi!

It seems the best to just use the Column Auto Cast Type node. Only you lose a bit of control with auto casting (don’t like it) and if you have a iteration per every column you can do additional things so I still like loop, column expressions node and toNull() function :smiley:

Glad you two found a same language :slight_smile:

Br,
Ivan

2 Likes

So, Here is the final workflow to solve the problem.
Empty cells to missing.knwf (36.6 KB)
2 methods are provided: 1st and the easier is using “Column Auto Type Cast” and using “empty” as the missing value pattern. 2nd is using loop and column expression nodes. Using column auto type cast is mandatory after the loop end to determine original column types. There may be a way to use column types as a flow variable in the column expression node so that we don’t need the column auto type cast node after the loop end.
Good luck everyone.

1 Like

Hi,

sorry for the late answer. Using flow variables to determine the data type of the output column is the same as using it in the Java Snippet Node. Unfortunately, as we discovered it is a little bit tricky to do so.
To change the output type using a flow variable you have to set ‘cell_class’ of outputType to the correct value. Here are the most commonly used types:

String - “org.knime.core.data.def.StringCell”
Double - “org.knime.core.data.def.DoubleCell”
Int - “org.knime.core.data.def.IntCell”

If you need more, just let me know. We will also discuss on how to make it easier for the user.

Cheers,
Moritz

1 Like

Hi @armingrudd!

not sure you got it but my first reply actually does solve it. In a loop and without Column Auto Type Cast Node :slight_smile:

Br,
Ivan

Hey @ipazin,
No I think I didn’t get it there clearly. Would you please provide the workflow instead of the image for the solutions?
As I understood, you filtered string columns to include for the loop and appended the rest of columns after the loop. Right? But I wanted to solve the problem in one loop without filtering and appending.
One thing I didn’t get at all is this part:

Would you please explain it?
Thanks.

Thank you @moritz.heine for taking the time and replying to us.
But I think I didn’t get the answer to my question. I was looking for a way to determine the column type automatically from the loop start like the other ones (currentColumnName and currentIteration ) and use it as a flow variable in other nodes (here column expression). Is it possible to add this feature to “Column List Loop Start” node to define the current column type as a flow variable?

Hi @armingrudd,

Sure. Workflow is attached. I usually don’t give people a workflow. Instead I give them explanation/idea how to do it and leave them to explore Knime and build their own workflow. I believe this is much more useful to them. If they have trouble making it I provide workflow.

Ok. Know I see. You wanted to do it without filtering and appending. Fine with me. Just I still think it is better to filter before the loop and append later on. Lets say for example you have 10 or more non string columns and more then 10 million rows. There is no need to go with all these cells in loop (Column Expressions node and possibly some other node) and converting them to string and later on outside of loop convert them again.

Remove filter node in my workflow, filter non string column in loop start node and then you will see that non string column is multiplied in your result (loop end node).

Column loop.knwf (25.6 KB)

Br,
Ivan

Yes, now I see what you are talking about. That’s because you used “toNull()” function. But I used this expression to find and convert empty cells: (This approach doesn’t make any faults on non string columns except as there is no flow variable for current column type, they will lose their original type and will become string)

if (length(column(variable("currentColumnName"))) == 0){
    missingValue();
}

And one more thing is that when you provided your solution, I thought there might be the same issue with non string columns, so I decided to find a general solution so that even if there was the same problem in non string columns, the solution solves it.
It’s really great for me to have such discussions with someone like you be cause I learn a lot.
Thanks. :heart_eyes::blush:

2 Likes

Ok. I’m glad. Helps me as well :slight_smile:
Br,
Ivan

Okay thank you for your feedback. I will redirect this feature request and we will discuss this internally.

Cheers,
Moritz

1 Like