Hi Everyone,
I am a complete Knime noobie, so I hope this isn´t completely dumb.
In my line of work (financial institution) I have a data source that has a monthly snapshot of many of our customer characteristics. In our personal loans portafolio analisis scenarios, is commonly true that past months customer behavior predicts future default rates pretty well.
I want to design a template that enables us to easily test different hypothesis regarding the customer's historic behavior. Example input table:
Customer ID | Date | Days Past Due | Account Balance | Available to disburse |
---|---|---|---|---|
1111 | 01/17 | 0 | 100 | 10 |
1111 | 02/17 | 31 | 100 | 10 |
1111 | 03/17 | 0 | 93 | 17 |
1111 | 04/17 | 0 | 86 | 24 |
2222 | 01/17 | 64 | 50 | 0 |
2222 | 02/17 | 95 | 50 | 0 |
2222 | 03/17 | 125 | 50 | 0 |
2222 | 04/17 | 95 | 45 | 5 |
This is a simplification of what we have, but it works as an example. This table would continue downward for 400k customers and about 5 years of snapshots.
Common features that we would like to create on the fly with this two example dimensions:
- Max Days Past Due in last 6 months (hopefully this would be some kind of lag calculation so that all the snapshots of the clients have this calc)
Ex:
Customer ID | Date | Calc_MaxDPD6M |
1111 | 01/17 | 0 |
1111 | 02/17 | 31 |
1111 | 03/17 | 31 |
1111 | 04/17 | 31 |
2222 | 01/17 | 64 |
2222 | 02/17 | 95 |
2222 | 03/17 |
125 |
2222 | 04/17 | 125 |
- Days past Due Delta between Row Date and 3 Months ago. ( For example customer 2222 would have 156-64 = 92)
- from 6,9, 12 months ago ( Account Balance/ (Account Balance + Available to Disburse)
All this features would be used in a classification model that predicts the probability of default N months after a point of observation.
I would really appreciate any help.
Thanks a lot!
JPM