Hi all
I have a question about "duo mining", i.e. mining text and numeric data together. First, I have 200 text documents with 150,000 terms after pre-processing. I used a random forest to select 1,000 features for subsequent document classification with an ANN. Because the sample size is small, I applied 10-fold cross-validation to make the predictions. Second, I have another set of numeric data (3 continuous variables) which can be treated as market data to enrich the document classification.
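For reference, a minimal sketch of how 10-fold cross-validation is usually turned into predictions, assuming each document is scored exactly once by the model trained on the other nine folds. With 200 documents each held-out fold holds 20 documents, but collecting the held-out scores over all 10 folds gives one out-of-fold prediction per document (200 in total). The `fit`/`predict` functions below are hypothetical stand-ins for the ANN, not any tool's API.

```python
import random

def kfold_indices(n_samples, k=10, seed=0):
    """Shuffle 0..n_samples-1 and deal them into k roughly equal folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def out_of_fold_predictions(n_samples, k, fit, predict, seed=0):
    """Collect one held-out prediction per sample across all k folds."""
    folds = kfold_indices(n_samples, k, seed)
    oof = [None] * n_samples
    for held_out in folds:
        held = set(held_out)
        train = [i for i in range(n_samples) if i not in held]
        model = fit(train)              # trained without the held-out fold
        for i in held_out:
            oof[i] = predict(model, i)  # scored by a model that never saw doc i
    return oof

# Hypothetical stand-in for the ANN: "fit" memorises the training-set
# positive rate and "predict" returns it. Replace with the real classifier.
labels = [i % 2 for i in range(200)]

def fit(train_idx):
    return sum(labels[i] for i in train_idx) / len(train_idx)

def predict(model, i):
    return model

folds = kfold_indices(200, k=10)
oof = out_of_fold_predictions(200, 10, fit, predict)
print([len(f) for f in folds])  # 10 folds of 20 documents each
print(len(oof))                 # 200: one out-of-fold score per document
```

These out-of-fold scores, rather than the 20 scores from a single fold, are what a second-stage model would normally be trained on.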
Questions:
- If I use 2-stage modeling, i.e. feed the ANN results plus the market data (the 3 variables above) into a second model, the sample size seems too small, as the output from the ANN is only 20 values (from 10-fold cross-validation).
- If I instead combine the market data (3 variables) with the features selected from the documents (1,000 features), the contribution of the market data will likely be negligible (3 vs 1,000 inputs).
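One commonly used option for the 2-stage idea is stacking: train a small second-stage ("meta") model on the out-of-fold ANN score plus the 3 market variables, so the market data competes as 3 of 4 standardized inputs instead of 3 of 1,003. The sketch below assumes a binary label and uses plain logistic regression as the meta-learner; both are assumptions, and all data is synthetic and made up for illustration. The text score must come from out-of-fold predictions, otherwise the meta-model sees leaked, over-optimistic scores, and its own performance should still be estimated with cross-validation.

```python
import math
import random

def standardize(columns):
    """z-score each column so no input dominates purely because of its scale."""
    out = []
    for col in columns:
        mean = sum(col) / len(col)
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / len(col)) or 1.0
        out.append([(v - mean) / sd for v in col])
    return out

def fit_logistic(X, y, lr=0.5, epochs=500):
    """Batch-gradient-descent logistic regression used as the meta-learner."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi
            for j in range(d):
                gw[j] += err * xi[j]
            gb += err
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b

def predict_proba(w, b, x):
    z = b + sum(wj * xj for wj, xj in zip(w, x))
    return 1.0 / (1.0 + math.exp(-z))

# Synthetic stand-ins: y = document labels, text_score = out-of-fold ANN
# output, m1..m3 = the three market variables (m1 weakly informative).
rng = random.Random(1)
y = [rng.randint(0, 1) for _ in range(200)]
text_score = [0.7 * yi + 0.3 * rng.random() for yi in y]
m1 = [yi + rng.gauss(0, 1) for yi in y]
m2 = [rng.gauss(0, 1) for _ in y]
m3 = [rng.gauss(0, 1) for _ in y]

X = list(zip(*standardize([text_score, m1, m2, m3])))
w, b = fit_logistic(X, y)
probs = [predict_proba(w, b, x) for x in X]
accuracy = sum((p >= 0.5) == (yi == 1) for p, yi in zip(probs, y)) / len(y)
print("weights (text, m1, m2, m3):", [round(v, 2) for v in w])
print("training accuracy:", accuracy)
```

Logistic regression is chosen here only because a meta-model with 4 inputs and 200 rows should stay simple; any other low-capacity model could take its place.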
As I don't want to use voting or simple weighted averaging to combine the results, could I have your professional view on how to make a sensible prediction?
If this is possible, are there any built-in nodes to do it?
Thanks
Lawson