Identify changes in data compared to previous period

LostInTheFlood · October 14, 2022, 1:33pm

Hi

Imagine working in HR and you need to track resource changes between months.
It could be org changes, job changes and so on.
How can I achieve such check using KNIME?
Bear in mind I’m on a early beginner level in KNIME.

See dummy table below for potential structure and wanted output.
The date is unfortunately in String format by default but it can be reworked if needed.

ScottF · October 14, 2022, 4:58pm

Hi @LostInTheFlood and welcome to the forum.

Could you upload your dummy table as an Excel or CSV file, please? Probably no one wants to take the time to type that all in

LostInTheFlood · October 14, 2022, 6:16pm

dummy.xlsx (12.8 KB)
Good point

ArjenEX · October 14, 2022, 11:29pm

What’s the logic behind external exactly? Is it that anyone after the first month that is either new or leaves is considered as external?

It should definitely be possible to do this in KNIME but just want to be sure on the ruleset

Currently have it like this:

LostInTheFlood · October 15, 2022, 3:43pm

Hi @ArjenEX
This is a movement & changes analysis within a company.

The “rule” for external is that somebody is joining the company from outside any org level 1 or leaving the company i.e. no new org level 1.
My reason for stating “First Month…” on 2022-01 in this example is to do a easy filter as first period will all be “new” and not relevant for any movement analysis.

From what I see in your picture it looks really promising. I knew there were smart people here
Looking forward to view and understand your workflow.

ArjenEX · October 15, 2022, 4:50pm

Oke, this might need some finetuning later on but you should be able to manage that yourself. As typically with KNIME, this can be approach in various ways (this is just one).

To properly work with the months, a string to date conversion is first due. I opt to do this with a string manipulator to make it yyyy-mm-dd and subsequently convert it to a date format.

Next step is to establish the range of months that need to be evaluated. Meaning I need to know which months are in the data set and what its predecessor is to make the comparison on an employee level.
First step is pretty straightforward with a groupBy to get the months.

The relationship to the previous month can be determined with a Date&Time Shift node with -1M duration

Now you can start to loop through each month. Since the data range is off course dynamic, this is initiated by a Table Row to Variable Loop start. This basically means that it will iterate through each row and make month and month_prev_month available as flow variable to other nodes.

If you are unfamiliar with flow variables, I would recommend doing some research on this. Then it will make a lot more sense. Example:

In the main data flow, those flow variables are used to filter the subject month from the main data set. For example, for the first iteration of the loop, it takes:

You’ll see that there a flow variable connection (red line) between the loop start and the row filter for current_month. This means that the values can be used.

For the row filter, you would normally fill in something in the seach pattern (blue), however this time it is controlled by the variable as indicated(red) so KNIME automatically fills in 2022-01-01 in the first iterations, 2022-02-01 in the second iteration, etc.

The flow variable is applied under the Flow Variables tab sheet.

Same principle applies to the next_month row filter, although it’s empty in the first iterations because there is no data for December 2021. Next step is then a left outer join based on the column user to retrieve the associated information from the current month and the previous month.

Then the hard work needs to happen, the rules. I’m opting for a Java Snippet in this case because it allows updating multiple columns in one go pretty conveniently. Something which the Rule Engine or Column Expression cannot do that quickly.

The comments in the below code outline what is going on.

if (v_currentIteration == 0) { // first month in the dataset is always the first iteration, the loop keeps a record of this as flow variable. 
    out_Attribute = "New";
   	out_OldOrgL1 = "First Month = Unknown";
   	out_OldOrgL2 = "First Month = Unknown";
   	out_OldOrgL3 = "First Month = Unknown";  	
} else if (c_OrgL1_nextval == null && c_OrgL2_nextval == null && c_OrgL3_nextval == null) { // if all the previous month valuesare empty, the employee is new.
   	out_Attribute = "New";
   	out_OldOrgL1 = "External";
   	out_OldOrgL2 = "External";
   	out_OldOrgL3 = "External";
} else if (c_OrgL1 == null && c_OrgL2 == null && c_OrgL3 == null) { // if all current month values are null, the employee left
   	out_Attribute = "Left";
   	out_OldOrgL1 = "External";
   	out_OldOrgL2 = "External";
   	out_OldOrgL3 = "External";
} else if (!c_OrgL1.equals(c_OrgL1_nextval)) { // if current L1 is not equal to previous L1, L1 is new.
 	out_Attribute = "New Org L1";
   	out_OldOrgL1 = c_OrgL1_nextval;
	out_OldOrgL2 = c_OrgL2_nextval;
	out_OldOrgL3 = c_OrgL3_nextval;
} else if (!c_OrgL2.equals(c_OrgL2_nextval)) { // if current L2 is not equal to previous L2, L2 is new.
 	out_Attribute =  "New Org L2";
 	out_OldOrgL1 = c_OrgL1_nextval;
	out_OldOrgL2 = c_OrgL2_nextval;
	out_OldOrgL3 = c_OrgL3_nextval;
} else if (!c_OrgL3.equals(c_OrgL3_nextval)) { // if current L3 is not equal to previous L3, L3 is new.
   out_Attribute =  "New Org L3";
    	out_OldOrgL1 = c_OrgL1_nextval;
	out_OldOrgL2 = c_OrgL2_nextval;
	out_OldOrgL3 = c_OrgL3_nextval;
} else {
   	out_Attribute =  "No change"; // otherwise, nothing has changed
}

After some clean-up, this is the outcome.

See the WF:
Identify changes in data compared to previous period.knwf (81.5 KB)

Hope this provides some inspiration!

LostInTheFlood · October 16, 2022, 9:06am

@ArjenEX
wow. awesome! Thank you for spending your valuable time on this and to create the step by step tutorial for dummies
I will dig in to this do start understanding all the steps you have taken.

From what I understand so far this will give me a very good base to tweak the analysis depending on what type of changes I need to analyze.

I will mark as solution when I have spent time studying the solution in case I have any questions.

LostInTheFlood · October 17, 2022, 10:50am

Work like a charm! I have tried on my real data and I get the correct output.
I think I understand the logic but I would not have thought of it myself.
The java code is basically where I will do my small adjustments.
Once again, big thank you.

ArjenEX · October 17, 2022, 10:58am

Happy to hear! Indeed the Java snippet is the core engine of it all.

system · October 24, 2022, 10:59am

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.