We would like to use the SVM Learner and SVM Predictor on a larger amount of data.
Therefore we have a workflow which uses a loop.
Chunk Loop Start --> Concatenate (Add Training Data to table) --> Document Vector --> Row Splitter (Split Training Data) --> SVM Learner --> SVM Predictor --> Loop End
We would like to avoid running the SVM Learner in every loop iteration. If we do so, we get a different column structure for training data and analysis data.
How can we run the SVM Learner only once and use its model for the SVM Predictor within the loop execution?
Thanks for any support.
Every node between the loop start and loop end is executed in each loop iteration. However, if one of the input ports is fed from outside the loop, that input is not reset between iterations.
So your flow should be as follows:
Training Data --> Document Vector (Training Data) --> SVM Learner --> (goes to SVM Predictor)
Data --> Chunk Loop Start --> Document Vector --> SVM Predictor --> Loop End
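To make the idea concrete outside of KNIME, here is a minimal Python sketch of the same pattern: the model is built once before the loop and only the prediction step runs per chunk. The nearest-centroid "learner" is just a stand-in I made up for illustration (it is not an SVM), and the function names and toy data are hypothetical.

```python
from itertools import islice


def train_centroids(rows, labels):
    """Stand-in 'learner': one mean vector per class (not an SVM)."""
    sums, counts = {}, {}
    for row, label in zip(rows, labels):
        acc = sums.setdefault(label, [0.0] * len(row))
        for i, value in enumerate(row):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def predict(model, row):
    """Stand-in 'predictor': return the class with the nearest centroid."""
    def sq_dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(row, centroid))
    return min(model, key=lambda label: sq_dist(model[label]))


def chunks(rows, size):
    """Yield successive chunks of rows, similar in spirit to a chunk loop."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


# Train once, outside the loop (toy data, purely illustrative).
train_rows = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
train_labels = ["a", "a", "b", "b"]
model = train_centroids(train_rows, train_labels)

# Inside the loop only the predictor runs; the model is reused unchanged.
data_rows = [[0.8, 0.2], [0.2, 0.7], [1.0, 0.1], [0.0, 0.8], [0.6, 0.3]]
predictions = []
for chunk in chunks(data_rows, 2):
    predictions.extend(predict(model, row) for row in chunk)
```

Note that this only works if every chunk has the same columns, in the same order, as the training rows.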
Let me know if this helps, otherwise I can send you an example workflow :)
Thank you for your answer.
I tried again as you suggested, but failed.
The error message from the SVM Predictor is something like
"SVM Predictor Column 'xyz' not found in test data", and therefore the node does not execute.
I attached a screenshot of my workflow.
An example would be really helpful.
The problem might be that, if you split up the training and test documents and create the document vectors separately, they end up in different feature spaces. The feature space is defined by the preprocessed terms of the documents, so each set of documents produces its own set of columns. To create the same feature space for all documents, you need to create the document vectors for all documents at once: keep all documents in one data table and apply the Document Vector node to it. To create training and test data, split the resulting document vector table afterwards, e.g. with the Partitioning node. If the feature space differs between learner and predictor, the predictor will fail with the message "column xy not found".
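Here is a small Python sketch of why this happens, in case it helps. It is a plain bag-of-words illustration, not what the Document Vector node does internally; the helper names and the example documents are made up.

```python
def build_vocabulary(documents):
    """Collect the sorted set of terms; each term becomes one column."""
    return sorted({term for doc in documents for term in doc.lower().split()})


def to_vector(document, vocabulary):
    """Count term occurrences, in the column order given by the vocabulary."""
    terms = document.lower().split()
    return [terms.count(term) for term in vocabulary]


# Hypothetical example documents.
train_docs = ["good fast service", "bad slow service"]
test_docs = ["good cheap product"]

# Vectorizing separately: each table gets its own columns.
train_only_vocab = build_vocabulary(train_docs)
test_only_vocab = build_vocabulary(test_docs)
# Columns the model was trained on but that the test table lacks.
missing = [t for t in train_only_vocab if t not in test_only_vocab]

# Vectorizing together: one shared feature space, split afterwards.
all_docs = train_docs + test_docs
shared_vocab = build_vocabulary(all_docs)
all_vectors = [to_vector(doc, shared_vocab) for doc in all_docs]
train_vectors = all_vectors[:len(train_docs)]
test_vectors = all_vectors[len(train_docs):]
```

In the separate case, `missing` lists the columns a predictor would look for and not find in the test table. In the shared case, every vector has one entry per term in `shared_vocab`, so training and test rows line up column by column.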