PU Learning

I have an extensive list of payments on contracts from the last 6 years, and need to train a model to predict future payments.

In order to do so, I need to answer two questions:

  1. when will a payment concerning a specific contract take place?
  2. how much will each payment be?

Using a regression tree, I trained a model that answers question 2 quite reliably, but I am now stuck on question 1.

Should this be treated as a case of PU (positive-unlabeled) learning? I have a broad sample of positive cases, which I would need to combine with another sample of unlabeled, mixed cases to build a classifier.

Do you have any suggestions on how to proceed? I am also open to reconsidering the overall strategy if you can point me to concrete examples.

Many thanks for your help.

Hi lucamanu, 

Imagine you are standing at the point in the past when the second-to-last payment was made on each contract. From there you can generate features such as the number of days by which the most recent due date was exceeded, the number of days by which the due date before that was exceeded, and so on. The target to predict is then the number of days from the second-to-last payment date to the last payment date. Train a model on this data, and you should be able to use it to predict the number of days from the last payment date until the next one. The same feature generation should also be part of the preprocessing step for every new example.
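
To make the feature generation concrete, here is a minimal sketch in plain Python. The payment records, the field names (days_late, prev_days_late, prev_gap_days, days_to_next) and the helper build_rows are all made up for illustration, not taken from your data. It also assumes every payment has a known due date. Each payment with a known successor becomes a training row, which is a slight generalisation of the idea above: the second-to-last payment is simply the final training row per contract. The last payment per contract becomes the row you would score to predict the next gap.

```python
from collections import defaultdict
from datetime import date

# Hypothetical history: (contract_id, due_date, paid_date)
payments = [
    (1, date(2023, 1, 1), date(2023, 1, 5)),
    (1, date(2023, 2, 1), date(2023, 2, 3)),
    (1, date(2023, 3, 1), date(2023, 3, 11)),
    (2, date(2023, 1, 15), date(2023, 1, 15)),
    (2, date(2023, 2, 15), date(2023, 2, 20)),
]

def build_rows(payments):
    """Return (train_rows, predict_rows) as lists of feature dicts."""
    by_contract = defaultdict(list)
    for contract_id, due, paid in payments:
        by_contract[contract_id].append((paid, due))

    train_rows, predict_rows = [], []
    for contract_id, history in by_contract.items():
        history.sort()  # chronological order by payment date
        for i, (paid, due) in enumerate(history):
            row = {
                "contract_id": contract_id,
                # days by which this payment exceeded its due date
                "days_late": (paid - due).days,
                "prev_days_late": None,
                "prev_gap_days": None,
            }
            if i > 0:
                prev_paid, prev_due = history[i - 1]
                row["prev_days_late"] = (prev_paid - prev_due).days
                row["prev_gap_days"] = (paid - prev_paid).days
            if i + 1 < len(history):
                # target: days from this payment to the next one
                row["days_to_next"] = (history[i + 1][0] - paid).days
                train_rows.append(row)
            else:
                # latest payment: target unknown, this is what we predict
                predict_rows.append(row)
    return train_rows, predict_rows

train_rows, predict_rows = build_rows(payments)
```

The rows in train_rows can then be fed to any regressor (for example the same kind of regression tree you already use for the amounts), with days_to_next as the target. The rows in predict_rows go through the identical preprocessing before scoring. The None values mark payments with no earlier history and need to be handled (dropped or imputed) before training.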