Hi Community,
I’m fairly new to KNIME but have quite extensive experience with predictive analysis. I have a use case and data for a Customer Lifetime Value prediction model, but I lack the experience with KNIME nodes to put it together. I am planning to use SVM in regression mode for this, but please let me know if other methods would be more suitable. The output of the model should be the predicted spend of each customer from the point of modelling until they stop purchasing.
My desired input data to the model would look like this:
Customer ID - The unique ID of each customer
Recency Score - My data set spans 12 months, and I would like each customer to be given a score from 1 to 12 based on the month of their latest purchase
Frequency - The number of orders placed by each customer over the last twelve months
Spend Q1 - The total spend per customer in the first quarter of the past year
Spend Q2 - The total spend per customer in the second quarter of the past year
Spend Q3 - The total spend per customer in the third quarter of the past year
Data prep questions:
- I have used the GroupBy node to get the latest order date for each customer, but it returns a date-time value such as “2018-04-02T10:32”. How can I convert this to the number of the month (in this case 4)?
- I have transactional spend on an order level for each customer, for example:
CustomerID  OrderNo  Date        Spend
123         345      2018-01-12  £54.65
123         478      2018-04-24  £32.21
123         678      2018-11-15  £75.32
What is the best way to calculate the spend per quarter for each customer and write the sums into the “Spend Q” columns mentioned above?
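To make the target layout concrete, the two data prep steps can be sketched in plain Python rather than KNIME nodes. This is only an illustration under stated assumptions: the 12-month window is taken to coincide with one calendar year (so the month number can serve directly as the recency score), quarters are calendar quarters, and the names `month_of`, `build_features` and the `spend_q…` keys are hypothetical. A Q4 bucket is included so that no order is dropped, even though only Q1 to Q3 are listed as inputs above.

```python
from collections import defaultdict
from datetime import datetime

# Sample order rows from the post: (CustomerID, OrderNo, Date, Spend).
orders = [
    ("123", "345", "2018-01-12", "£54.65"),
    ("123", "478", "2018-04-24", "£32.21"),
    ("123", "678", "2018-11-15", "£75.32"),
]


def month_of(timestamp):
    """Return the month number (1-12) of an ISO date or date-time string."""
    return datetime.fromisoformat(timestamp).month


def build_features(rows):
    """Aggregate order rows into one feature record per customer."""
    features = defaultdict(lambda: {
        "recency": 0, "frequency": 0,
        "spend_q1": 0.0, "spend_q2": 0.0, "spend_q3": 0.0, "spend_q4": 0.0,
    })
    for customer_id, _order_no, date, spend in rows:
        month = month_of(date)
        quarter = (month - 1) // 3 + 1  # assumption: calendar quarters
        record = features[customer_id]
        # Recency score = month of the latest purchase (assumes one calendar year).
        record["recency"] = max(record["recency"], month)
        record["frequency"] += 1
        # Strip the currency symbol before converting the spend to a number.
        record[f"spend_q{quarter}"] += float(spend.lstrip("£"))
    return dict(features)


features = build_features(orders)
```

If the data window does not start in January, the month number and the quarter boundaries would need to be offset from the first month of the window instead.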
Modelling questions:
- I couldn’t find any specific SVM regression nodes but found the more generic “SVM Learner” and “SVM Predictor” nodes. Will these work for my regression model?
- Do I still need to use a “Partitioning” node before the SVM, and how should this be configured?
- What is the best way to evaluate the results in terms of the Correlation Coefficient and the Root Relative Squared Error?
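For reference, the two evaluation measures named above can be written out as a short plain-Python sketch. It uses the usual definitions: the Pearson correlation between actual and predicted values, and the root relative squared error as the model's squared error divided by the squared error of always predicting the mean of the actual values. The function names and the example numbers are hypothetical and not taken from the data set; both measures would normally be computed on held-out test rows rather than on the training rows.

```python
import math


def correlation_coefficient(actual, predicted):
    """Pearson correlation between actual and predicted values."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_a * var_p)


def root_relative_squared_error(actual, predicted):
    """Squared error of the model relative to always predicting the mean."""
    mean_a = sum(actual) / len(actual)
    model_error = sum((p - a) ** 2 for a, p in zip(actual, predicted))
    baseline_error = sum((a - mean_a) ** 2 for a in actual)
    return math.sqrt(model_error / baseline_error)


# Made-up spend values, purely to show the call pattern.
actual_spend = [100.0, 200.0, 300.0]
predicted_spend = [110.0, 190.0, 310.0]

r = correlation_coefficient(actual_spend, predicted_spend)
rrse = root_relative_squared_error(actual_spend, predicted_spend)
```

A correlation close to 1 and a root relative squared error well below 1 would indicate that the model does better than simply predicting the average spend for every customer; a value of 1 means it does no better than that baseline.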
Input data set - I’m using this sample data to set up my model:
https://archive.ics.uci.edu/ml/datasets/online+retail
I really appreciate your help with this and thanks in advance.