Oke, this might need some finetuning later on but you should be able to manage that yourself. As typically with KNIME, this can be approach in various ways (this is just one).
To properly work with the months, a string to date conversion is first due. I opt to do this with a string manipulator to make it yyyy-mm-dd and subsequently convert it to a date format.
Next step is to establish the range of months that need to be evaluated. Meaning I need to know which months are in the data set and what its predecessor is to make the comparison on an employee level.
First step is pretty straightforward with a groupBy to get the months.
The relationship to the previous month can be determined with a Date&Time Shift node with -1M duration
Now you can start to loop through each month. Since the data range is off course dynamic, this is initiated by a Table Row to Variable Loop start. This basically means that it will iterate through each row and make month and month_prev_month available as flow variable to other nodes.
If you are unfamiliar with flow variables, I would recommend doing some research on this. Then it will make a lot more sense. Example:
In the main data flow, those flow variables are used to filter the subject month from the main data set. For example, for the first iteration of the loop, it takes:
You’ll see that there a flow variable connection (red line) between the loop start and the row filter for current_month. This means that the values can be used.
For the row filter, you would normally fill in something in the seach pattern (blue), however this time it is controlled by the variable as indicated(red) so KNIME automatically fills in 2022-01-01 in the first iterations, 2022-02-01 in the second iteration, etc.
The flow variable is applied under the Flow Variables tab sheet.
data:image/s3,"s3://crabby-images/01b6f/01b6fad4b2b97837a7a9ec76eb841be20fd9bc0f" alt="image"
Same principle applies to the next_month row filter, although it’s empty in the first iterations because there is no data for December 2021. Next step is then a left outer join based on the column user to retrieve the associated information from the current month and the previous month.
Then the hard work needs to happen, the rules. I’m opting for a Java Snippet in this case because it allows updating multiple columns in one go pretty conveniently. Something which the Rule Engine or Column Expression cannot do that quickly.
The comments in the below code outline what is going on.
if (v_currentIteration == 0) { // first month in the dataset is always the first iteration, the loop keeps a record of this as flow variable.
out_Attribute = "New";
out_OldOrgL1 = "First Month = Unknown";
out_OldOrgL2 = "First Month = Unknown";
out_OldOrgL3 = "First Month = Unknown";
} else if (c_OrgL1_nextval == null && c_OrgL2_nextval == null && c_OrgL3_nextval == null) { // if all the previous month valuesare empty, the employee is new.
out_Attribute = "New";
out_OldOrgL1 = "External";
out_OldOrgL2 = "External";
out_OldOrgL3 = "External";
} else if (c_OrgL1 == null && c_OrgL2 == null && c_OrgL3 == null) { // if all current month values are null, the employee left
out_Attribute = "Left";
out_OldOrgL1 = "External";
out_OldOrgL2 = "External";
out_OldOrgL3 = "External";
} else if (!c_OrgL1.equals(c_OrgL1_nextval)) { // if current L1 is not equal to previous L1, L1 is new.
out_Attribute = "New Org L1";
out_OldOrgL1 = c_OrgL1_nextval;
out_OldOrgL2 = c_OrgL2_nextval;
out_OldOrgL3 = c_OrgL3_nextval;
} else if (!c_OrgL2.equals(c_OrgL2_nextval)) { // if current L2 is not equal to previous L2, L2 is new.
out_Attribute = "New Org L2";
out_OldOrgL1 = c_OrgL1_nextval;
out_OldOrgL2 = c_OrgL2_nextval;
out_OldOrgL3 = c_OrgL3_nextval;
} else if (!c_OrgL3.equals(c_OrgL3_nextval)) { // if current L3 is not equal to previous L3, L3 is new.
out_Attribute = "New Org L3";
out_OldOrgL1 = c_OrgL1_nextval;
out_OldOrgL2 = c_OrgL2_nextval;
out_OldOrgL3 = c_OrgL3_nextval;
} else {
out_Attribute = "No change"; // otherwise, nothing has changed
}
After some clean-up, this is the outcome.
See the WF:
Identify changes in data compared to previous period.knwf (81.5 KB)
Hope this provides some inspiration!