This workflow deploys an advanced parameter optimization protocol with four machine learning methods. In this implementation, the choice of features (fingerprints) and one hyperparameter per method are optimized. However, if you have completely different data, we encourage you to use this workflow as a template and customize it by including additional parameters in the optimization loop. Parameter optimization is performed on 80% of the original dataset. The optimization loops are encapsulated in Metanodes that carry the names of the machine learning methods. Model performance can be evaluated, and the best model selected, in the interactive view of the Pick best Model component. Finally, the selected model is scored on the remaining 20% of the dataset (which was not part of the optimization cycle), and the results are displayed with the Model Report component.

The dataset is a subset of 844 compounds evaluated for activity against CDPK1. 181 compounds inhibited CDPK1 with an IC50 below 1 µM and have "active" as their class. More information is available at https://chembl.gitbook.io/chembl-ntd/#deposited-set-19-5th-march-2016-uw-kinase-screening-hits (see Set 19).

This workflow is a revised version of the original workflow: https://kni.me/w/-ATVMu9EmIURm8kr
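
For readers who want to see the protocol outside of KNIME, here is a minimal Python sketch of the 80/20 partitioning and the candidate grid described above. It is not the workflow's implementation: the stratified sampling, the fixed seed, the names `stratified_split`, `pick_best`, and `score_fn`, and the placeholder method, fingerprint, and hyperparameter values are all assumptions made for illustration. Only the class counts (181 active out of 844) come from the description.

```python
import itertools
import random
from collections import defaultdict


def stratified_split(labels, holdout_fraction=0.2, seed=0):
    """Split row indices into (optimization, holdout), keeping class proportions.

    Stratification is an assumption; the workflow may partition differently.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    optimization_idx, holdout_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_holdout = round(len(indices) * holdout_fraction)
        holdout_idx.extend(indices[:n_holdout])
        optimization_idx.extend(indices[n_holdout:])
    return sorted(optimization_idx), sorted(holdout_idx)


def pick_best(candidates, score_fn):
    """Return the (candidate, score) pair with the highest score."""
    scored = [(candidate, score_fn(candidate)) for candidate in candidates]
    return max(scored, key=lambda pair: pair[1])


# Class counts taken from the dataset description.
labels = ["active"] * 181 + ["inactive"] * (844 - 181)
optimization_idx, holdout_idx = stratified_split(labels)

# Hypothetical search space: every name and value below is a placeholder.
methods = ["method_1", "method_2", "method_3", "method_4"]
fingerprints = ["fingerprint_a", "fingerprint_b"]
hyperparameter_values = [1, 2, 3]
candidates = list(itertools.product(methods, fingerprints, hyperparameter_values))

print(len(optimization_idx), len(holdout_idx), len(candidates))
```

In practice `score_fn` would train a candidate on the optimization rows only and return a validation metric. The holdout rows are used once, to score the candidate returned by `pick_best`.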
This is a companion discussion topic for the original entry at https://kni.me/w/VxE7Y-PGz8jP6LaU