Fill Cells if neighboring cell contains value

Hello there!
I am quite new to KNIME and I am lookin to a solution to my task:

I have a table with product data, such as length, width, weight (as numbers) … each row is a product record.
however products are not complete, so values may be filled or not.

I need to add a column for each value column with the unit of that value; but the unit must only be filled, when there is a value for that column. No value, not unit, that is.

length         length-unit      weight           weight-unit
123.45.        MILLIMETER       17.54            KILOGRAM
?              ?                 3.5             KILOGRAM
0.34.          MILLIMETER        ?               ?

Since I have dozens of these columns (and they are all different in respect to their completeness), I tried to solve this with a loop. But i just cannot get the rule engine or the string manipulation to work with the flow variable containing the respective column name. Although I have some basic understanding of code (not Java/-script), I don’t get to grips with the syntax.

I believe there must be a way to do that without java programming?
Thanks!

Hi @roberting , there are a few different ways to do this.

Can you should us what you have done so far? Can you share your workflow?

And also, do you have a mapping of which column should have which mapping? - I am assuming you have such mapping or rule since you attempted to do this via a loop.

1 Like

Hi @roberting

I don if I understood your question correctly, but I came up with this idea of a workflow
fill cells.knwf (46.0 KB)


I added a reference table and Rule Engine dictionary to do the mappings.

gr. Hans

2 Likes

Hi @roberting , I put something together, slightly different from @HansS 's:

I am also assuming that there might be other columns in your dataset, but only some of the columns need to have a corresponding -unit column, therefore I added an extra column called ID in my input data:
image

@HansS , fyi, this (having other columns that do not need to have a -unit column) is not handled in your workflow. As it stands, your workflow will create duplicated columns of ID :slight_smile:

Results:
image

Also, I added the logic directly in the Rule Engine as this:

NOT MISSING $generic_col_name$ AND $${ScurrentColumnName}$$ = "length" => "MILLIMETER"
NOT MISSING $generic_col_name$ AND $${ScurrentColumnName}$$ = "weight" => "KILOGRAM"

So, you can add whatever rules you want right there. This way, it will add the -unit column only for the columns that you add to the rules (length, weight, etc).

Here’s the workflow:
Fill Cells if neighboring cell contains value.knwf (19.6 KB)

3 Likes

Hi @bruno29a and @HansS,

thanks for your replies, which I haven’t yet had the time to evaluate in detail.
My approach looked similar to bruno29s’s, but I failed to get the expression right:

I use loop to cycle through the columns containing the values; the string manipulation (variable) creates names for the new columns (simple concatenation, adding “-unit” to Col name) and create a new constant value (empty) column. this new column should then be filled according to the rule node

however I fail to get the expression right; but this should be the most simple, straightforward expression, but it just doesn’t work?!

hi @bruno29a ,

thanks for your reply; but I do not understand the term:

NOT MISSING $generic_col_name$ AND $${ScurrentColumnName}$$ = "length" => "MILLIMETER"

what would be the “generic_col_name” and why would it be linked with “AND” to the actual column name?
I was thniking of

NOT MISSING $${ScurrentColumnName}$$ => "MILLIMETER"

In the long run, I want to avoid using fixed names (like “length” and units (“MILLIMETER”) in the expression, because there will be dozens of other columns with volume flow, electrical, and whatever data with other units …

Hello @roberting,

You can’t use column names stored in flow variables to reference column values in Rule Engine node(s) (String Manipulation, Math Formula also). There are couple of workarounds and most usual one is with changing column inside loop to a generic column name and then back after manipulation is done.

Regarding your task. From experience ideas/suggestions/solutions come faster and are more complete with input data (dummy works just fine) and desired output provided including logic behind it if not obvious :wink:

Br,
Ivan

3 Likes

Hi @roberting , the $generic_col_name$ is a generic name I used after renaming a column.

In my case, I was dealing with the 2 columns that you had in your example: columns “length” and “weight”.

Normally, if I want to use the column “length”, I would refer to it as “$length$”, and similarly for “weight” as “$weight$”. That would work if I was referring to each of them directly. However, here I’m trying to refer to them within a loop. So, I can’t do $length$ or $width$ in the loop, as the column name would change in each of the iterations.

In other words, in the first iteration of the loop, only $length$ will be available, and in the second iteration, only $weight$ will be available.

I had to find a way to use each of these columns “generically” in the loop. So I temporarily change the column being used in the iteration to “generic_col_name”. That’s done in the Column Rename node (3rd node), and after the column is processed, it gets renamed back to the original name via the other Column Rename.

That’s where $generic_col_name$ came from. It’s not linked to AND. The AND is just an AND operator. It’s part of the logic.

NOT MISSING $generic_col_name$ AND $${ScurrentColumnName}$$ = "length" => "MILLIMETER"
means if both conditions “NOT MISSING $generic_col_name$” and “$${ScurrentColumnName}$$ = "length"” are satisfied, then return the string “MILLIMETER

“In the long run, I want to avoid using fixed names (like “length” and units (“MILLIMETER”) in the expression, because there will be dozens of other columns with volume flow, electrical, and whatever data with other units”

Well, the mapping has to be defined somewhere, there is no getting out of this, since the unit values are different per columns. Whether it’s defined there, or via a mapping table, it’s still need to be defined - that’s why I was asking you if you had a mapping. You need a place where it says that units for “length” is “MILLIMETER”, for “weight” it’s “KILOGRAM”, etc.

I you have a mapping table, we can integrate it in the workflow and have the logic read from the mapping table.

1 Like

Hi @bruno29a ,

thanks so much for the explanation – thanks to @ipazin’s hint, that dynamic column-variables cannot be used in expressions, it becomes clear why you had to do the workaround with the column rename …

btw, I am aware that I will have to use some sort of mapping table or dictionary to feed the appropriate unit in my newly created columns; this is what I’m going to explore next … :wink:

best regards
Robert

2 Likes

This topic was automatically closed 182 days after the last reply. New replies are no longer allowed.