I am completely new to KNIME and am experimenting with machine learning.
I would like to predict a buy/no-buy decision. I have the following data set:
Client ID, contact date, result of contact, type of product sold.
Basically, for the same client ID I have more than one contact attempt. Some contacts result in a purchase, some do not. For example, within a specific time period I could have 10 contacts with a specific client, but he purchased something on only 2 of them.
I do not know how to handle this “contact history” when preparing an input table for a predictor, e.g. a decision tree. In the KNIME examples, like “churn prediction”, the input tables have 1 client = 1 row. In my case, 1 client = multiple rows (because each contact generates a new record/row in the CRM). How should I best approach this? Should I build some supporting statistical indicators, such as “average time between contacts”, to transform my table into a 1 client = 1 row table? Or somehow aggregate these rows into one? Both seem highly impractical.
Some thoughts about your question. Take some time to get a clear view of your business question before you translate it into a data science question. What exactly do you want to predict?
Whether a customer will buy or not?
Whether a contact will lead to a sale?
And under what conditions: in a certain situation, within a certain period of time, or something else?
I guess your question is more like the first one, and in that case it is best to end up with a table based on individual customers. The way to get from transactions (one row per contact) to individual customers (one row per customer) is the Pivot Node.
You will end up with a table like:
customer_id ; contact_1_features, contact_2_features (…) contact_n_features
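If it helps to see this reshaping outside KNIME, here is a minimal pandas sketch of the same pivot. The column names (`client_id`, `contact_date`, `result`) and the sample rows are made up for illustration; they are assumptions, not your actual CRM export.

```python
import pandas as pd

# Hypothetical contact log: one row per contact, as a CRM might export it.
contacts = pd.DataFrame({
    "client_id": [1, 1, 1, 2, 2],
    "contact_date": pd.to_datetime(
        ["2023-01-05", "2023-02-10", "2023-03-01", "2023-01-20", "2023-04-02"]
    ),
    "result": ["no buy", "buy", "no buy", "no buy", "buy"],
})

# Number each client's contacts in chronological order (1, 2, 3, ...).
contacts = contacts.sort_values(["client_id", "contact_date"])
contacts["contact_no"] = contacts.groupby("client_id").cumcount() + 1

# Pivot: one row per client, one group of columns per contact number.
wide = contacts.pivot(
    index="client_id",
    columns="contact_no",
    values=["contact_date", "result"],
)

# Flatten the two-level column names into e.g. "contact_1_result".
wide.columns = [f"contact_{no}_{name}" for name, no in wide.columns]
wide = wide.reset_index()
print(wide)
```

Clients with fewer contacts than the maximum get missing values in the later contact columns, which the model or a preprocessing step then has to handle.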
Another thing to consider: if there is an order in the contacts, make sure that every group of variables in contact_features describes the same event (e.g. first visit, or first time a brochure was sent…).
The next step is to derive new features within a group of contact_features and between groups of contact_features (e.g. time between the contacts). Maybe you can create some RFM variables (Recency, Frequency, Monetary value).
And yes, this is highly impractical. Data understanding, cleaning and preparation cost a lot of time and effort, but they will result in better predictions.
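As a rough illustration of such derived features, here is a pandas sketch that computes recency, frequency and the average time between contacts per client. The column names, the sample rows and the `snapshot` reference date are all assumptions. There is no monetary column in the data as described, so that part of RFM is left out.

```python
import pandas as pd

# Hypothetical contact log: one row per contact.
contacts = pd.DataFrame({
    "client_id": [1, 1, 1, 2, 2],
    "contact_date": pd.to_datetime(
        ["2023-01-05", "2023-02-10", "2023-03-01", "2023-01-20", "2023-04-02"]
    ),
    "result": ["no buy", "buy", "no buy", "no buy", "buy"],
})
snapshot = pd.Timestamp("2023-05-01")  # assumed "today" for recency

contacts = contacts.sort_values(["client_id", "contact_date"])
contacts["bought"] = contacts["result"].eq("buy")
# Days since the same client's previous contact (missing for the first one).
contacts["gap_days"] = (
    contacts.groupby("client_id")["contact_date"].diff().dt.days
)

# Collapse to one row per client.
features = contacts.groupby("client_id").agg(
    n_contacts=("contact_date", "size"),      # frequency of contacts
    n_purchases=("bought", "sum"),            # frequency of purchases
    last_contact=("contact_date", "max"),
    avg_gap_days=("gap_days", "mean"),        # average time between contacts
).reset_index()
features["recency_days"] = (snapshot - features["last_contact"]).dt.days
features["purchase_rate"] = features["n_purchases"] / features["n_contacts"]
print(features)
```

The result has one row per client, which is the shape the decision tree examples expect.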
To add to the already well-rounded suggestions and questions from @HansS and @beginner: if you want to predict whether a client will make a purchase or not, you need explanatory variables that can lead you to that conclusion. The only explanatory variable in your data set (or at least in the one you presented) is the date, and it is not very explanatory. So I would suggest you try to get more variables on your clients to predict their behavior. Ask yourself what might influence a client’s decision to buy or not to buy your product(s). Some answers might be product type, price, seller, the client’s need for it, time of day, or the number of attempts already made to sell the same product…
Thank you both for your comprehensive comments; it seems I have to go back to school :) Your comments inspired me to spend more time on data understanding/exploration and building hypotheses (so basically some data mining), and only then think about modelling.
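Some of these variables, such as the attempt number, can be derived from the contact history itself. Here is a minimal pandas sketch that keeps one row per contact and adds features known before each contact happens. The column names and sample rows are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical contact log: one row per contact.
contacts = pd.DataFrame({
    "client_id": [1, 1, 1, 2, 2],
    "contact_date": pd.to_datetime(
        ["2023-01-05", "2023-02-10", "2023-03-01", "2023-01-20", "2023-04-02"]
    ),
    "result": ["no buy", "buy", "no buy", "no buy", "buy"],
})

contacts = (
    contacts.sort_values(["client_id", "contact_date"]).reset_index(drop=True)
)
contacts["bought"] = contacts["result"].eq("buy").astype(int)  # target column

by_client = contacts.groupby("client_id")
# "Try number x": 1 for the first contact with a client, 2 for the second, ...
contacts["attempt_no"] = by_client.cumcount() + 1
# Purchases made strictly before this contact (running total minus current).
contacts["prior_purchases"] = by_client["bought"].cumsum() - contacts["bought"]
# Days since the previous contact with the same client.
contacts["days_since_prev"] = by_client["contact_date"].diff().dt.days
print(contacts)
```

With a table like this, each row is one training example and `bought` is the target, which matches the "will this contact lead to a sale" version of the question.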