Working with historic client data

Hi Everyone, 

I am a complete Knime noobie, so I hope this isn´t completely dumb. 

In my line of work (financial institution) I have a data source that has a monthly snapshot of many of our customer characteristics. In our personal loans portafolio analisis scenarios, is commonly true that past months customer behavior predicts future default rates pretty well. 

I want to design a template that enables us to easily test different hypothesis regarding the customer's historic behavior. Example input table:

 

Customer ID Date Days Past Due Account Balance Available to disburse
1111 01/17 0 100 10
1111 02/17 31 100 10
1111 03/17 0 93 17
1111 04/17 0 86 24
2222 01/17 64 50 0
2222 02/17 95 50 0
2222 03/17 125 50 0
2222 04/17 95 45 5
         

This is a simplification of what we have, but it works as an example. This table would continue downward for 400k customers and about 5 years of snapshots.

 

Common features that we would like to create on the fly with this two example dimensions:

- Max Days Past Due in last 6 months (hopefully this would be some kind of lag calculation so that all the snapshots of the clients have this calc)

Ex:

Customer ID Date Calc_MaxDPD6M
1111 01/17 0
1111 02/17 31
1111 03/17 31
1111 04/17 31
2222 01/17 64
2222 02/17 95
2222 03/17

125

2222 04/17 125

- Days past Due Delta between Row Date and 3 Months ago. ( For example customer 2222 would have 156-64 = 92)

- from 6,9, 12 months ago ( Account Balance/ (Account Balance + Available to Disburse)

 

All this features would be used in a classification model that predicts the probability of default N months after a point of observation. 

 

I would really appreciate any help.

Thanks a lot!

JPM

Hi JPM,

a combination of the "Group Loop" node and the "Moving Aggregation" node should do the trick here. The group in your case is the customer_id. For each customer_id you sort by date and then apply the "Moving Aggregation" node to calculate, e.g., the max of days past due in the last six months. Now you have this aggregate value for all customers for all months in your dataset. I suggest you play around with the "Moving Aggregation" node a bit as I think all your calculations can be done in this way (probably using the help of the "Math Formula" node occasionally - e.g. for calculating deltas between aggregate measures).

Hope this helps, Johannes