How can I replace content of a cell in a row with another cell in the same row when a condition is true?

Hello,
I have a data set of cities and provinces in two columns, but in a few rows there are the names of the provinces instead of cities. For example, there is “bushehr province” in the ‘city’ column while it should be in the ‘province’ column. I want to replace the content of the cell in the column of ‘province’ with the content of the ‘city’ column when there is the word “province” in the end of the content of the cell in the ‘city’ column.

The Rule Engine should be able to do this. If you have more complicated rules you might have to combine it with other nodes or use the Java Snippet.

2 Likes

Hi,

As @mlauber71 replied, you can solve this issue simply by the Rule Engine node.
I have seen such an issue before. There are 3 columns for City, Province and Country. If any city is missing the province and country shift left and the column for country has missing. If city and province are missing then the country goes to the City column and both province and country columns have missing.

To solve that with the Rule Engine node, If you have several countries, you may prefer to use Rule Engine (Dictionary), but if there is one country for example country x (with some preceding or subsequent word) and all your provinces have the word “province” subsequently, you can use 3 Rule Engine and a joiner. One Rule Engine for the country column and one for the province column and the third one to clean up the city column.

For country, use this (replaces the country column):
$city$ LIKE "*x*" =>$city$
$province$ LIKE "*x*" => $province$
$country$ LIKE "*x*" => $country$
(Do not define the TRUE => xxx statement so that you will have missing for the rows without a country or if you want you can add it with something like “No Country”)

You can use this rule for the province column (replaces the province column):
$city$ LIKE "*province" => $city$
$province$ LIKE "*province" => $province$
(The same here about TRUE => xxx statement)

Now that you have provinces and the country, you can clean up the city column.
If the both appended columns for provinces and country are not missing, you keep the value of this column as well (city), else, the city is missing.

Use another Rule Engine which gets its input directly from the original dataset (not after the last Rule Engine) and use this expression (the second statement is for the cases that only the province is missing):

NOT MISSING $country$ AND NOT MISSING $province$ => $city$
NOT $city$ LIKE "*province" AND NOT $city$ LIKE "*x*" => $city$
(The same here about TRUE => xxx statement)

And finally use a Joiner to join this new city column and those new province and country columns.

missing%20city

The input table:
input

And the final output:
output

The workflow:
missing city.knwf (21.2 KB)

Best,
Armin

3 Likes

Thank you very much for your reply. It seems my problem should be solved by ‘rule engine’ node as you said.

1 Like

Thank you Armin for the comprehensive reply. My problem is exactly as you described. I have some missing data and that’s why the name of provinces are in the column of cities.
I think your workflow is exactly what I need, I just need to apply it on my data to see what happens, and I will let you know how it worked.

3 Likes

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.