Data Warehouse Lifecycle

Hi, I am new to KNIME and new to data science as well. I come from the world of traditional data warehousing, and I am looking to begin introducing customers and clients to predictive capabilities at each of our operational/analytic reporting engagements.

I'll start by saying that I may not be understanding and applying this technology correctly. However, I envision this technology living alongside a traditional warehouse so that prescriptive analysis is continuously feeding future predictions.


Here is where I am struggling. In this case I am applying k-means clustering to wrap my head around customer churn, and I am using the example here:

I have fed some of our data in and been able to generate some high-level insight. Additionally (and here is where this breaks) I want to drill into the results. For example, I recognize that I have a cluster with very low retention, and those customers are using very specific product features/behavior. My next question may be: OK, where did we acquire these customers? (i.e. were they part of some crazy promotion? Next, let's find the sales person and department responsible and slap their hand, etc.)

To do this, I need to assign a cluster key to specific customer accounts and use that account information to join back to the warehouse and analyze acquisition source, department, etc.

K-means only accepts numeric inputs, and it seems to calculate on every field, which would include an account key.
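
To make the problem concrete outside of any particular tool, here is a minimal sketch in plain Python of one general way around it: hold the account key to one side, run k-means on the numeric feature columns only, and then zip the resulting labels back onto the keys. The column names, sample values, and the tiny k-means routine are all made up for illustration; this is not how KNIME implements its node.

```python
import math

# Hypothetical sample rows: (account_id, monthly_logins, support_tickets)
rows = [
    ("A100", 2.0, 9.0),
    ("A101", 3.0, 8.0),
    ("A102", 30.0, 1.0),
    ("A103", 28.0, 0.0),
    ("A104", 1.0, 10.0),
]

def nearest(point, centers):
    """Index of the center closest to point (Euclidean distance)."""
    return min(range(len(centers)), key=lambda i: math.dist(point, centers[i]))

def kmeans(points, k, iterations=20):
    """Plain Lloyd's algorithm, seeded with the first k points for repeatability."""
    centers = [list(p) for p in points[:k]]
    for _ in range(iterations):
        labels = [nearest(p, centers) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # leave an empty cluster's center where it was
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return centers, [nearest(p, centers) for p in points]

# The account key never enters the distance calculation...
ids = [r[0] for r in rows]
features = [r[1:] for r in rows]
centers, labels = kmeans(features, k=2)

# ...but row order is preserved, so each label maps back to its account.
cluster_by_account = dict(zip(ids, labels))
print(cluster_by_account)
```

The resulting account-to-cluster mapping is the piece that can be joined back to the warehouse.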

My first question is, am I even using this technology correctly?  Or do we stop at the high level cluster and inform the organization that if you do promotions in the future, be sure to market other features since some of our product/features are resulting in high churn?

My original goal was for the prediction system to almost close the loop and provide insight and foreign keys to join back into the warehouse to find out more about the clusters, but that may need to be something humans do rather than a technical solution.



If I was going to be stubborn about this... 

I guess I could devise a way to make reporting bins auto-update, or record the bands of the various attributes in a lookup table and use some ETL to assign a cluster based on the output of KNIME, but I want to make sure I'm not missing something basic within the tool that would assign these clusters.

Also, if you have taken the time to read this, please answer from the perspective of a newbie who has just cracked open the example churn job and may not be all that familiar with the XPath mappings, etc. in the clustercenter tables.




First, welcome to KNIME!

Your application sounds reasonable, and it should be fairly easy to do a couple of things to help link your results back to your original data.  

It sounds like you might want to use a Database Update node to tag interesting users in your database. To do this, you will need a column for the k-means segmentation in your customer table on the DB side (or something similar). Then you can do an update where you SET the customer segment WHERE the customer IDs match. In your subsequent SQL queries you should then be able to reference your segmentation as generated in KNIME. Also possibly worth mentioning is the Cluster Assigner node, which will let you apply an existing clustering model to new data. This is handy if it is computationally prohibitive to calculate your k-means on the full data.
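
If it helps to see the idea outside of KNIME, here is a rough sketch in plain Python using an in-memory SQLite database. It mimics the two steps above: assign each customer to the nearest of some previously computed cluster centers (conceptually what applying an existing model to new data does), then write the segment back with an UPDATE keyed on customer ID and query it alongside warehouse attributes. The table, column names, centers, and sample values are all hypothetical, and this is not what the nodes run internally.

```python
import math
import sqlite3

# Hypothetical centers saved from an earlier k-means run:
# (monthly_logins, support_tickets)
centers = [(2.0, 9.0), (29.0, 0.5)]

def assign(point):
    """Index of the nearest saved center (Euclidean distance)."""
    return min(range(len(centers)), key=lambda i: math.dist(point, centers[i]))

# Hypothetical customers to score, keyed by customer ID.
customers = {"A100": (1.0, 10.0), "A101": (3.0, 8.0), "A102": (30.0, 1.0)}
assignments = [(assign(feats), cid) for cid, feats in customers.items()]

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (
        customer_id TEXT PRIMARY KEY,
        acquisition_source TEXT,
        segment INTEGER            -- column that will hold the cluster
    );
    INSERT INTO customer (customer_id, acquisition_source) VALUES
        ('A100', 'spring_promo'),
        ('A101', 'spring_promo'),
        ('A102', 'referral');
""")

# SET the segment WHERE the customer IDs match, one row per assignment.
conn.executemany(
    "UPDATE customer SET segment = ? WHERE customer_id = ?", assignments
)
conn.commit()

# The segment is now just another column for ordinary SQL analysis.
rows = conn.execute("""
    SELECT segment, acquisition_source, COUNT(*) AS customers
    FROM customer
    GROUP BY segment, acquisition_source
    ORDER BY segment, acquisition_source
""").fetchall()
print(rows)
```

Once the segment lives in the customer table, questions like "where did we acquire the customers in the low-retention cluster?" become regular GROUP BY or JOIN queries.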

Let us know if this helps or if you have further questions.