Hi, I have a table which has a column with some missing values, as long as the data is properly sorted, I can just use the value from the prior row. Because I might have more than 1 row of missing data at a time, I decided to use a loop.
The loop does the following
- Sorts the data
- Create a lag column on source data
- If the column is empty, use the value from the lag column
- Remove lag column so that I don’t need to worry about the lag column name changing on me on every iteration
- check if all rows in source column have a value, if not repeat loop
The loop seems to work fine for iteration 0, but then nothing happens on subsequent iterations. Can anyone tell me what I’m doing wrong?
My workflow is currently set to finish after 3 iterations, but it should ideally finish when Sum(rows_to_fix) = 0
KNIME_project11.knwf (27.4 KB)!
Example of source data, where I need to backfill category column:
What output are you expecting?
If you step through the loop 1 iteration at a time, the issue becomes clearer. You set up the loop to have the same data entering it on every iteration, therefore it shouldn’t be surprising that you get the same result after every iteration. If you want the updated data to recirculate through the loop you’d need to use a Recursive Loop. For this to work, I had to create a variable that returns “true” when the sum(rows_to_fix) = 0. When that condition is satisfied, the loop ends.
Having said that, I’m not sure I understand why this loop is necessary. It seems to me that you can just sort the data by the order column then use the Missing Value node to fill in missing values in the category column with the previous value. Using only 3 nodes gets you the same result:
Thank you so much for your response! Ok, it’s good to know that I needed the recursive loop. I did try and step through the loop and was wondering why my values weren’t updating - this makes sense.
In this data set using the missing nodes work fine, but I have a lot more columns in my original data set that should not be updated so I wanted to test this out. I could split the category column out, use the missing values node, then join it back in and that would be much easier/faster, so that would actually work too, thanks!!!
Column Settings tab of Missing Value node allows you to specify missing value strategy separately for each column so splitting can be skipped as well
This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.