Hi, I am new to KNIME and to data science as well. I come from the world of traditional data warehousing, and I am looking to begin introducing customers and clients to predictive capabilities at each of our operational/analytic reporting engagements.
I'll start by saying that I may not be understanding and applying this technology correctly. However, I envision it living alongside a traditional warehouse so that prescriptive analysis is continuously feeding future predictions.
Here is where I am struggling. In this case I am applying k-means clustering to wrap my head around customer churn, using the example here: http://www.knime.org/knime-applications/churn-analysis
I have fed some of our data in and been able to generate some high-level insight. Additionally (and here is where this breaks), I want to follow up with further analysis. For example, I recognize that I have a cluster with very low retention whose customers use very specific product features/behaviors. My next question may be: OK, where did we acquire these customers? (I.e., were they part of some crazy promotion? Next, let's find the salesperson and department responsible and slap their hand, etc.)
To do this, I need to assign a cluster key to specific customer accounts and use that account information to join back to the warehouse and analyze acquisition source, department, etc.
K-means only accepts numeric inputs and seems to calculate on every field, so I don't see how to carry an account key through without it being treated as a feature.
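To make concrete what I am after, here is a minimal sketch in plain Python rather than KNIME. The account IDs, the feature values, and the tiny k-means loop are all made up for illustration and are not taken from the churn example. The idea is that the account key is kept out of the distance calculation and only re-attached to the cluster label afterwards.

```python
import math

# Hypothetical customer rows: (account_id, [numeric features]).
# The account_id is a key, not a feature, so it never enters the distance math.
customers = [
    ("A001", [1.0, 2.0]),
    ("A002", [1.5, 1.8]),
    ("A003", [8.0, 8.0]),
    ("A004", [9.0, 9.5]),
    ("A005", [1.2, 0.9]),
    ("A006", [8.5, 9.0]),
]

def nearest(point, centers):
    """Return the index of the center closest to point (Euclidean distance)."""
    return min(range(len(centers)), key=lambda i: math.dist(point, centers[i]))

def kmeans(points, k, iterations=20):
    """Very small k-means: first k points as initial centers, fixed iteration count."""
    centers = [list(p) for p in points[:k]]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centers)].append(p)
        for i, g in enumerate(groups):
            if g:  # keep the old center if a cluster ends up empty
                centers[i] = [sum(col) / len(g) for col in zip(*g)]
    return centers

# Cluster on the numeric features only.
features = [f for _, f in customers]
centers = kmeans(features, k=2)

# Re-attach the key: account_id -> cluster label, ready to join back to the warehouse.
cluster_by_account = {acct: nearest(f, centers) for acct, f in customers}
```

If something equivalent exists in the tool, the resulting account-to-cluster mapping is exactly the table I would want to load back into the warehouse as a foreign key.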
My first question is: am I even using this technology correctly? Or do we stop at the high-level cluster and inform the organization that, if it runs promotions in the future, it should be sure to market other features, since some of our products/features are resulting in high churn?
My original goal was for the prediction system to almost close the loop, providing insight and foreign keys to join back into the warehouse to find out more about the clusters, but that may need to be something humans do rather than a technical solution.
If I was going to be stubborn about this...
I guess I could devise a way to make reporting bins auto-update, or record the bands of the various attributes in a lookup table and use some ETL to assign a cluster based on the output of KNIME. But I want to make sure I'm not missing something basic within the tool that would assign these clusters.
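For what it's worth, this is roughly how I picture the stubborn ETL route, sketched in plain Python. Rather than recording bands per attribute, it labels each row with its nearest cluster center, which is how k-means itself assigns rows. The center values, account IDs, and names below are hypothetical, and it assumes the warehouse rows use the same columns and scaling as the data that was clustered.

```python
import math

# Hypothetical cluster centers, as if copied out of the tool's cluster-center table.
# Keys are cluster labels; values are feature coordinates in a fixed column order.
cluster_centers = {
    "cluster_0": [1.2, 1.6],
    "cluster_1": [8.5, 8.8],
}

def assign_cluster(features, centers):
    """Return the label of the center with the smallest Euclidean distance."""
    return min(centers, key=lambda label: math.dist(features, centers[label]))

# Hypothetical warehouse rows keyed by account ID, same column order as the centers.
warehouse_rows = {
    "A101": [0.9, 2.1],
    "A102": [9.1, 7.9],
}

# Lookup table: account_id -> cluster label, to be loaded back into the warehouse.
lookup = {acct: assign_cluster(f, cluster_centers) for acct, f in warehouse_rows.items()}
```

The lookup would have to be rebuilt whenever the clusters are retrained, which is part of why I would rather the tool hand me the assignment directly.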
Also, if you have taken the time to read this, please answer from the perspective of a newbie who has just cracked open the example churn job and may not be all that familiar with the XPath mappings, etc., in the clustercenter tables.