Working with historic client data

JPM · November 27, 2017, 7:39pm

Hi Everyone,

I am a complete Knime noobie, so I hope this isn´t completely dumb.

In my line of work (financial institution) I have a data source that has a monthly snapshot of many of our customer characteristics. In our personal loans portafolio analisis scenarios, is commonly true that past months customer behavior predicts future default rates pretty well.

I want to design a template that enables us to easily test different hypothesis regarding the customer's historic behavior. Example input table:

Customer ID	Date	Days Past Due	Account Balance	Available to disburse
1111	01/17	0	100	10
1111	02/17	31	100	10
1111	03/17	0	93	17
1111	04/17	0	86	24
2222	01/17	64	50	0
2222	02/17	95	50	0
2222	03/17	125	50	0
2222	04/17	95	45	5

This is a simplification of what we have, but it works as an example. This table would continue downward for 400k customers and about 5 years of snapshots.

Common features that we would like to create on the fly with this two example dimensions:

- Max Days Past Due in last 6 months (hopefully this would be some kind of lag calculation so that all the snapshots of the clients have this calc)

Ex:

Customer ID	Date	Calc_MaxDPD6M
1111	01/17	0
1111	02/17	31
1111	03/17	31
1111	04/17	31
2222	01/17	64
2222	02/17	95
2222	03/17	125
2222	04/17	125

- Days past Due Delta between Row Date and 3 Months ago. ( For example customer 2222 would have 156-64 = 92)

- from 6,9, 12 months ago ( Account Balance/ (Account Balance + Available to Disburse)

All this features would be used in a classification model that predicts the probability of default N months after a point of observation.

I would really appreciate any help.

Thanks a lot!

JPM

johannes_clarifydata · November 28, 2017, 6:49pm

Hi JPM,

a combination of the "Group Loop" node and the "Moving Aggregation" node should do the trick here. The group in your case is the customer_id. For each customer_id you sort by date and then apply the "Moving Aggregation" node to calculate, e.g., the max of days past due in the last six months. Now you have this aggregate value for all customers for all months in your dataset. I suggest you play around with the "Moving Aggregation" node a bit as I think all your calculations can be done in this way (probably using the help of the "Math Formula" node occasionally - e.g. for calculating deltas between aggregate measures).

Hope this helps, Johannes