Referencing Previous Column with Array in Column Expression

Dear KNIME Community,

I have the following report:

After importing the report into KNIME, it looks like this:

My goal is add the missing date information to the empty cells in Row0. In the end column D, E und F of Row 0 should contain “2020-01-01” and columns H, I and J of Row0 should contain “2020-02-01”.

I’ve tried adapting the solution from @ipazin from this post: Difference versus Previous Column - #7 by ipazin

After importing the file, I use a Row Splitter Node to separate Row0 from the rest and then apply the Column Expression Node to Row0.

The expression in my Column Expression currently looks like this:

arr=arrayCreate(column(0))
var i;
for (i = 1; i < numberOfColumns()-1; i++) {
        arr=arrayAdd(arr,column(i))
        if(arr[i-1]="?"){
            arr[i-1]=arr[i-2]
        }
        else{
            arr[i-1]=arr[i-1]
        };
}

But something must be wrong with the expression because I always get the following error message:

Any ideas on how to fix that?

Best,
Henan

Hi @Henan , it’s always best to share some data that we can work with to provide you with a solution that reflects the results that you want.

The code that you show is overkill for what you need to do. Here’s a simpler approach to this that makes use of the Missing Value node:
image

The Missing Value node allows you to replace empty/missing values with, for example the value of the previous row in the same column. And because it does this for columns, I first transpose the row Row0 so that it’s in a column, and apply the Missing Value, and then transpose back.

Input - Don’t mind the values as it’s tedious to type values from a screenshot, it’s the empty cells that are important:

Results:

Here’s the workflow: Filling empty column with value of previous column.knwf (14.4 KB)

5 Likes

Thanks a lot, @bruno29a ! Your workflow works great!!

I’m curious though if I could have also solved this problem with the Column Expression Node. Do you happen to know what’s the problem with the code I’ve shared?

ps: Sorry for not uploading the dataset. I’ve added it to this post.

Example Array.xlsx (10.8 KB)

Hi @Henan , I can’t really answer this without seeing the whole code. It looks to be complaining that you are generating/converting a result whose type is undefined, to type List.

One thing though:
if(arr[i-1]="?")

This is not the correct way to do comparison. For comparing, you should use double equal ==. A single equal is an assignment, like when you are doing arr[i-1]=arr[i-2]. But that’s not how you check for missing values. If you are checking for “?”, you are actually checking for the string “?”.

Knime simply displays an empty cell/missing value with a red “?”, it’s for display only, it does not mean that there is actually a “?” there - and that is why it’s red as opposed to black. It’s to distinguish between an empty cell/missing value (red ?) and an actual string “?” (black ?). As you can see, all of your string cells have values in black. A black “?” means there is a string “?” there. A red “?” means it’s empty, so you are not going to check for empty cells with arr[i-1]=="?".

The correct function to use is the isMissing() function.

2 Likes

Thanks for your tips @bruno29a !

I couldn’t make the expression work yet, but I’ll keep trying. I’ve added a sample workflow in case anyone is interested in the expression I’m using (even though it’s still incorrect).

Replace Missing Values with Expression.knwf (14.7 KB)

Hi @Henan, keep in mind also that what you have in your Column Expression will be applied on ALL the rows. As I mentioned, it might not be applicable for what you need.

1 Like

Hi @bruno29a , that’s why I’ve added a Row Splitter Node before the Column Expression which separates the first row with the missing values from the rest of the dataset. The Column Expression is only being applied to this one row.

Unlike the Java Snippet node that allows arbitrary code, the Column Expressions has one separate expression for each output column. You can still access (and iterate over) all input columns though.

This means that whatever result you create will sit in a single column. You’d have to create a separate expression for every column that has a missing value, effectively hardcoding everything to assign the value of Column3 or Column 7 respectively (without the need for any loops).

So the answer is yes, but it will be ugly. Brunos solution is elegant and scales without further configuration.

1 Like

I tend to rely heavily on the column expression node as well. I am building a component that I call the “Change Engine” which uses excel controlled formula construction to generate a “condition test” and then can apply “changes” various columns that meet the criteria. Eventually I will also build “else if” branches as well. I am bringing everything into the Column Expression nodes via Flow Variables which allows you to designate the column the formula will apply to on changes.

You can do a “condition test” via a column expression node (simple formula tests like columns1==“Revenue” will output true or false to a new test column, then pass it to a rule splitter node that routes the rows that meet your criteria toward “change” formulas or bypass, multiple column changes can then be applied via the column expression node via flow variables and then everything is joined back at the end.

I will probably share my Change Engine setup eventually when I have dialed in and simplified the dropdown based excel controller.

2 Likes

@iCFO please do share. That would be great :slight_smile:

1 Like

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.