Hi all –
I was wondering if, at a high level, someone could explain how one can go about deploying, in a nightly batch process, a model developed using a Weka node (Bagging w/CART). I thought I read somewhere here that the Weka nodes don’t spit out PMML or other standardized code for deployment, rather they use some code specific to Weka/KNIME.
Just a little background: the model scores incoming leads/hand-raisers on likelihood to purchase. We receive leads via call center and online form throughout the day. In theory, higher-scoring leads could be prioritized for more involved follow-up activities. We have a vendor that handles our database marketing, so I just want to be able to give them a few paths to explore if we agree to deploy the model.
For the "nightly" part of your task, we recommend the KNIME Server, which is one of our commercial offerings. It allows you to run KNIME jobs on a schedule and provides a web service layer that you can use to interact with other applications. There are other ways to schedule jobs in KNIME, but they would require using something like cron, and that is a bit more complicated.
For model deployment and scoring, you would probably end up training, validating and saving the model in one workflow, and then deploying it for use in a separate workflow. You can read Weka models that have been saved with the Model Writer using the Model Reader.
That is pretty "high level" but hopefully it helps.
Perfect, thanks for the feedback Aaron!
I have successfully implemented a KNIME/Weka job that runs every five minutes. I have a related job that stores a Weka model and then runs data against that model. This is on a Windows box, using Windows' version of cron (Windows Task Scheduler). The "top" program is written in C#, and issues a DOS command that launches KNIME.
The hardest part (by far) was learning the escaping and quoting rules for calling KNIME from the DOS command line with the parameters I wanted. These parameters included KNIME workflow variables that have dates in them, which are in turn quoted. As a result, I feel that I am somewhat of an expert now. :-)
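For anyone who wants to try this, here is a rough sketch (in Python rather than C#) of how such a command line might be assembled. It assumes KNIME's headless batch application and its -workflowDir and -workflow.variable=name,value,type options; the install path, workflow directory and the run_date variable are made-up placeholders, not my actual setup. Building the command as a list of arguments and letting the library do the quoting avoids most of the hand-escaping.

```python
import subprocess


def build_knime_batch_command(knime_exe, workflow_dir, variables):
    """Return the argument list for a headless KNIME batch run.

    variables is an iterable of (name, value, type) tuples, where type is
    one of "String", "int" or "double".
    """
    args = [
        knime_exe,
        "-nosplash",    # no splash screen
        "-consoleLog",  # send log output to the console
        "-reset",       # reset the workflow before executing it
        "-nosave",      # do not save the workflow after execution
        "-application", "org.knime.product.KNIME_BATCH_APPLICATION",
        f"-workflowDir={workflow_dir}",
    ]
    for name, value, var_type in variables:
        args.append(f"-workflow.variable={name},{value},{var_type}")
    return args


# Hypothetical paths and variable name, for illustration only.
cmd = build_knime_batch_command(
    r"C:\Program Files\KNIME\knime.exe",
    r"C:\workflows\score_leads",
    [("run_date", "2012-12-24 23:00", "String")],
)

# Show the command with Windows-style quoting applied, e.g. for pasting
# into a Task Scheduler action or a .bat file.
print(subprocess.list2cmdline(cmd))

# To actually launch KNIME, pass the list itself so no manual quoting is needed:
# subprocess.run(cmd, check=True)
```

The printed line is what you would schedule with Windows Task Scheduler or cron; arguments that contain spaces (such as the dated variable) come out wrapped in double quotes.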
The second hardest part is making Weka robust to changes in my run-time data. My experience (at least with J48) is that Weka does not like new attributes that show up in run-time data (and are not in the training data). I have some workarounds for that which I am willing to share as well. I believe the original idea for those workarounds came from Aaron or one of his friends at Konstanz, as all good ideas do!
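To give a flavor of the general idea (this is a generic illustration, not necessarily the workaround mentioned above): before scoring, force each run-time record onto the attribute list the model was trained on, dropping anything unseen and filling anything absent with a missing-value marker. The function name, the sample attribute names and the use of "?" (the marker ARFF files use for missing values) are assumptions for this sketch.

```python
def align_to_training_schema(rows, training_attributes, missing="?"):
    """Return (aligned_rows, dropped_attributes).

    Each aligned row has exactly the training attributes, in training order.
    Attributes that were not seen at training time are dropped and reported.
    """
    known = set(training_attributes)
    dropped = set()
    aligned = []
    for row in rows:
        dropped.update(name for name in row if name not in known)
        aligned.append({name: row.get(name, missing) for name in training_attributes})
    return aligned, sorted(dropped)


# Hypothetical lead records: the second one carries a new attribute and
# lacks one the model expects.
training_attributes = ["channel", "region", "prior_purchases"]
runtime_rows = [
    {"channel": "web", "region": "NE", "prior_purchases": 2},
    {"channel": "phone", "prior_purchases": 0, "promo_code": "XMAS"},
]

aligned_rows, dropped_attributes = align_to_training_schema(runtime_rows, training_attributes)
print(aligned_rows)
print("dropped:", dropped_attributes)
```

Logging the dropped attributes each run is a cheap way to notice when the incoming data has drifted far enough that the model should be retrained.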
Let me know if any of this is of interest to you, and I can share what I know.
Best wishes and Merry Christmas,
I am interested in understanding the methods for scheduling a KNIME workflow, say every 5 minutes or every night, without using the KNIME Server.
Any help on this would be appreciated!