Hello, I am looking for some help with the bootstrap validation method.
I'd like to implement it in KNIME. Is that possible? I need to sample the data with replacement in each iteration of the method, but I am not able to do that in KNIME. Data can be partitioned with the option DataManipulation-Row-Partitioning, but that works without replacement.
Does anyone know how I could develop it?
Bootstrapping can't be implemented with an "ordinary" node because you would want it to work with (almost) any learning algorithm and you also would need a repeated execution of that algorithm.
For that sort of problem we have meta nodes (the current release contains a Cross Validation node, which is a meta node and, I believe, has functionality similar to bootstrapping).
I am tempted to suggest having a look at the implementation of the cross validation node (look for the class XValidateModel), but I must admit that the meta node concept is undergoing a major revision. Whatever you implement with meta nodes right now won't work in KNIME 2.0 (scheduled for early 2008), unfortunately. It will be a lot easier to define a meta node in 2.0 (look at XValidateModel to see what I mean). To make a long story short: if you can wait for 2.0, wait.
Regarding the partitioning node: if you need customized partitioning, you would have to implement that node yourself (or have us adapt the existing partitioning node to allow for the desired functionality). How does the resampling take place? Is it just random partitioning (without any interaction between two successive runs)? Is it partitioning at all, or may the training set contain duplicates?
Thank you for your reply, Bernd,
I need to implement bootstrap for my dissertation. I am working on it and need to finish it as soon as possible, so I can't wait too long.
I am going to look at the implementation of the cross validation node to see if I can do something.
What I would need in the partitioning node is just random sampling with replacement (so I would draw duplicates). All the instances that have been chosen form the training set, each of them only once: if an instance has been chosen three times, it still appears just once in the training set. The others form the test set. I would have to do this for each iteration of the bootstrap method.
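To make the split described above concrete, here is a minimal sketch in plain Python (not KNIME code). The function name `bootstrap_split` and the use of row indices instead of data rows are assumptions made only for this illustration.

```python
import random

def bootstrap_split(n_rows, rng):
    """Draw n_rows row indices with replacement and return (train, test).

    train: the distinct indices that were drawn at least once
    test:  the indices that were never drawn
    """
    drawn = [rng.randrange(n_rows) for _ in range(n_rows)]
    chosen = set(drawn)  # duplicates collapse here
    train = sorted(chosen)
    test = [i for i in range(n_rows) if i not in chosen]
    return train, test

rng = random.Random(42)  # fixed seed only so that the run is repeatable
train, test = bootstrap_split(100, rng)
print(len(train), len(test))
```

Calling `bootstrap_split` again with the same `rng` gives the split for the next iteration.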
If you need it now, you will need to look at the cross validation node (it shouldn't be too much pain since the cross validation node gives you a good prototype).
I still don't get the partitioning idea: if you say you may draw duplicates, but then in a second step you remove the duplicates from the training set, what is the difference from drawing random samples without replacement (where the training size may vary)? Sorry, this is not KNIME related, but you triggered my interest.
I need to do it this way because that is how bootstrap works. I'll try to explain it:
If I have a set with 100 instances, bootstrap draws 100 instances to build the training set. Without replacement, it would simply pick all 100 instances that exist in the set, and that is not what should happen. The replacement is necessary because the instances that have not been chosen form the test set.
For example, take a dataset with 5 instances. Bootstrap draws 5 of them with replacement, and the selected ones are: number 2 (twice), number 3 (twice) and number 5 (once). So the training set will be (2, 3, 5) and the test set will be (1, 4). Is it clear now?
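The five-instance example can be checked with a few lines of plain Python. This is only an illustration; the helper name `split_from_draw` is made up for this sketch, and the draw is fixed to the one given in the example rather than generated at random.

```python
def split_from_draw(all_ids, drawn_ids):
    """Split instance ids into (train, test) for one given bootstrap draw."""
    chosen = set(drawn_ids)  # an id drawn several times counts once
    train = [i for i in all_ids if i in chosen]
    test = [i for i in all_ids if i not in chosen]
    return train, test

# The draw from the example: 2 twice, 3 twice, 5 once.
train, test = split_from_draw([1, 2, 3, 4, 5], [2, 2, 3, 3, 5])
print(train)  # [2, 3, 5]
print(test)   # [1, 4]
```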
Yes, got it. But then again: it's a matter of determining the size of the training set and then sampling without replacement. For that you could use our partitioning node (you would just need to programmatically set the partition size and the random seed in each epoch).
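A rough sketch of that two-step idea in plain Python (not KNIME code): first determine the training set size by counting the distinct values in a with-replacement draw, then partition without replacement using that size. The function names and the choice of the epoch number as the seed are assumptions made for this illustration.

```python
import random

def draw_training_size(n_rows, rng):
    """Number of distinct rows hit when drawing n_rows times with replacement."""
    return len({rng.randrange(n_rows) for _ in range(n_rows)})

def partition_without_replacement(n_rows, train_size, rng):
    """Ordinary random partitioning: train_size rows for training, the rest for testing."""
    train = sorted(rng.sample(range(n_rows), train_size))
    chosen = set(train)
    test = [i for i in range(n_rows) if i not in chosen]
    return train, test

n_rows = 100
for epoch in range(3):
    rng = random.Random(epoch)  # assumption: a new seed per epoch, here simply the epoch number
    size = draw_training_size(n_rows, rng)
    train, test = partition_without_replacement(n_rows, size, rng)
    print(epoch, len(train), len(test))
```

Given the size, every subset of that size is equally likely under both procedures, so only the size has to be drawn specially. Each row is missed by all n draws with probability (1 - 1/n)^n, which approaches 1/e, so for larger datasets the training set holds roughly 63% of the rows on average.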
Thanks for the clarification!
Hi again Bernd,
I am going to implement a new node to use in my bootstrap project. This node will do partitioning with replacement. My question is whether you could show me the code of the partitioning node or the row sampling node. I am writing the new node now, and it would be easier if I had that code.
If you want to develop a node on your own, the best thing to do is to download the KNIME Developer Version (if you do not already have it).
The new node wizard can help you implement your node (choose File->New->Other->new KNIME Node-Extension and follow the instructions).
Another helpful resource might be the documentation section: http://www.knime.org/documentation.html
The extension guide in particular should help you realize your plan.
If you type Ctrl+Shift+T and enter "PartitionNodeModel" you'll find the code of the partitioner node (the execute method is a good place to start looking).
Hope that helps.
I already have the KNIME Developer Version. I have developed quite a few methods and classes for my own node, but I had some doubts about a few things.
I'll look at the code of the partitioning node tomorrow; it will help me a lot. Anyway, if I have any doubts, I'll ask you.
I am very grateful to all of you because you are helping me a lot.
Just don't hesitate to ask if you get stuck.
It would definitely be very helpful to have the ability to sample with replacement in the row sampling node.
How is the development of the bootstrap node coming along? This type of node will be very useful as well!
I've finished the sample-with-replacement node. To implement bootstrap, I used it together with the looper and cross validation meta nodes, a variation of the Scorer node that I also wrote myself, and the Java Snippet node for the mathematical operations. Together these nodes form the workflow I built for the bootstrap method. So I haven't made a bootstrap node, but a workflow.
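For readers who want to see the overall loop outside KNIME, here is a rough plain-Python analogue of such a workflow. It is not the workflow described above: the majority-class "learner", the toy labels and all function names are stand-ins invented for this sketch. It only shows the shape of the procedure: split, train, score on the rows that were not drawn, and average over the iterations.

```python
import random
from collections import Counter

def bootstrap_split(n_rows, rng):
    """Sample row indices with replacement; undrawn rows form the test set."""
    chosen = {rng.randrange(n_rows) for _ in range(n_rows)}
    train = sorted(chosen)
    test = [i for i in range(n_rows) if i not in chosen]
    return train, test

def majority_label(labels):
    """Stand-in learner: always predicts the most frequent training label."""
    return Counter(labels).most_common(1)[0][0]

def bootstrap_error(labels, iterations, seed):
    """Average test error rate over the bootstrap iterations."""
    rng = random.Random(seed)
    errors = []
    for _ in range(iterations):
        train, test = bootstrap_split(len(labels), rng)
        if not test:
            continue  # every row was drawn, so there is nothing to score
        predicted = majority_label([labels[i] for i in train])
        wrong = sum(1 for i in test if labels[i] != predicted)
        errors.append(wrong / len(test))
    return sum(errors) / len(errors) if errors else float("nan")

labels = ["a"] * 70 + ["b"] * 30  # made-up toy data
print(bootstrap_error(labels, iterations=50, seed=1))
```

Any real learning algorithm would take the place of `majority_label`; the splitting and averaging stay the same.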
The KNIME Team will very soon (planned for the next few weeks) provide a way to share self-implemented nodes and workflows.
It will be our next step to support all of you who are doing great stuff with KNIME.