Identifying variables in Boosting Trees

svalerov · July 13, 2016, 8:54am

Hi,

I would like to know which variables are ultimately used by the boosting process in a decision tree analysis built with this node sequence:

Boosting Learner Loop Start - Decision Tree Learner - Decision Tree Predictor - Boosting Learner Loop End.

Is the result transparent?

Thank you very much.

christian.birkhold · August 2, 2016, 10:17am

Hi Svalerov,

I'm not sure I understand your question. The output of the boosting loop is a table of weighted models (also called weak classifiers), which are then combined into a global model (the ensemble). The model weights are determined by a variant of AdaBoost called AdaBoost.SAMME, which reweights the instances in each training iteration. Instances that are classified correctly by the ensemble in the current iteration get a lower weight, and instances that are misclassified get a higher weight. That way, instances that are harder to classify have more influence on the selection of the next weak classifier.
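
To make the reweighting concrete, here is a minimal pure-Python sketch of textbook AdaBoost.SAMME with one-split decision stumps as weak classifiers. It is not KNIME's implementation: the function names (`fit_stump`, `samme`, `feature_usage`), the use of stumps instead of full decision trees, and reweighting based on the current weak classifier's mistakes are all assumptions made for illustration. `feature_usage` sums the model weights per split variable, which is one simple way to see which variables the ensemble actually relies on.

```python
import math

def fit_stump(X, y, w):
    """Best one-split stump under instance weights w (assumed to sum to 1).
    Returns (weighted_error, feature_index, threshold, left_label, right_label)."""
    classes = sorted(set(y))
    best = None
    for j in range(len(X[0])):
        for t in sorted(set(row[j] for row in X)):
            # Weighted class mass on each side of the split x[j] <= t.
            left = {c: 0.0 for c in classes}
            right = {c: 0.0 for c in classes}
            for row, label, wi in zip(X, y, w):
                (left if row[j] <= t else right)[label] += wi
            l_lab = max(classes, key=lambda c: left[c])
            r_lab = max(classes, key=lambda c: right[c])
            err = 1.0 - left[l_lab] - right[r_lab]
            if best is None or err < best[0]:
                best = (err, j, t, l_lab, r_lab)
    return best

def stump_predict(stump, row):
    _, j, t, l_lab, r_lab = stump
    return l_lab if row[j] <= t else r_lab

def samme(X, y, n_rounds=10):
    """Textbook SAMME; assumes at least two classes. Returns [(alpha, stump), ...]."""
    n, k = len(X), len(set(y))
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(n_rounds):
        stump = fit_stump(X, y, w)
        err = max(stump[0], 1e-10)            # avoid log(0) for a perfect stump
        if err >= 1.0 - 1.0 / k:              # no better than random guessing
            break
        # Model weight: the log(k - 1) term is what distinguishes SAMME.
        alpha = math.log((1.0 - err) / err) + math.log(k - 1.0)
        ensemble.append((alpha, stump))
        if stump[0] <= 1e-10:                 # nothing left to correct
            break
        # Raise the weight of misclassified instances, then renormalise,
        # so correctly classified instances end up with relatively lower weight.
        w = [wi * math.exp(alpha) if stump_predict(stump, row) != label else wi
             for row, label, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def predict(ensemble, row):
    """Weighted vote of all weak classifiers."""
    votes = {}
    for alpha, stump in ensemble:
        label = stump_predict(stump, row)
        votes[label] = votes.get(label, 0.0) + alpha
    return max(votes, key=votes.get)

def feature_usage(ensemble):
    """Sum of model weights per feature index used as a split variable."""
    usage = {}
    for alpha, stump in ensemble:
        usage[stump[1]] = usage.get(stump[1], 0.0) + alpha
    return usage
```

Calling `feature_usage(samme(X, y))` on a list of numeric rows `X` and labels `y` returns a dictionary keyed by column index. Columns that never appear as a key were not used by any weak classifier in the sketch.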

Does this help?

Best,

Christian

Ergonomist · August 2, 2016, 1:41pm

Hi,

This may help, too (taken from https://www.knime.org/summit2016):

16:45 – 17:30 Special Session:

Dean Abbott (Abbott Analytics): Measuring Variable Importance with Target Shuffling

Works for any model!
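
A rough sketch of the general idea, as I understand target shuffling: shuffle the target many times, recompute each variable's importance score against the shuffled target, and see how often chance alone matches the real score. This is not taken from the talk itself. The names `target_shuffle` and `abs_correlation` are hypothetical, and absolute correlation is only a stand-in score; any importance measure derived from a fitted model could be passed as `score` instead.

```python
import math
import random

def abs_correlation(xs, ys):
    """Absolute Pearson correlation; a placeholder importance score."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    vx = sum((a - mx) ** 2 for a in xs)
    vy = sum((b - my) ** 2 for b in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return abs(cov) / math.sqrt(vx * vy)

def target_shuffle(columns, y, score=abs_correlation, n_shuffles=200, seed=0):
    """columns: dict mapping variable name -> list of numeric values.
    Returns {name: (actual_score, fraction_of_shuffles_scoring_at_least_as_high)}."""
    rng = random.Random(seed)
    actual = {name: score(col, y) for name, col in columns.items()}
    matched = {name: 0 for name in columns}
    shuffled = list(y)
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)  # breaks any real link between inputs and target
        for name, col in columns.items():
            if score(col, shuffled) >= actual[name]:
                matched[name] += 1
    return {name: (actual[name], matched[name] / n_shuffles) for name in columns}
```

A variable whose fraction is close to zero scores higher on the real target than on almost every shuffled one, which suggests its importance is not just chance. A variable with a large fraction is hard to distinguish from noise.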

Cheers
E