Lately, I have been building a QSAR/QSPR modeling workflow in KNIME (predicting properties from chemical structures). From what I have found in the literature, partial least squares (PLS) regression is widely used for this. It is also essential in other applications with wide data sets (more features than samples) or highly correlated features, such as chemometric data.
Over the last week, I have been running the R “pls” package in KNIME. This worked for simple examples, but I started running into problems when I tried to expand the workflow.
Specifically, it runs extremely slowly with leave-one-out (LOO) cross-validation (X-Partitioner/X-Aggregator loop), much more slowly than the stand-alone regression nodes in KNIME. This makes it practically impossible to use in workflows that also include parameter optimization and feature selection, because those repeat the whole cross-validation loop many times.
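To see why the slowdown compounds, it helps to count model fits. LOO trains one model per held-out sample, and every parameter value or candidate feature subset triggers a full LOO run. The short sketch below only does that arithmetic; the function name `loo_fit_count` and the example sizes (100 compounds, 10 component counts, 50 feature subsets) are hypothetical, chosen for illustration.

```python
def loo_fit_count(n_samples: int, n_param_values: int = 1, n_feature_subsets: int = 1) -> int:
    """Hypothetical helper: total model fits when LOO CV is nested inside
    a parameter grid and a feature-subset search."""
    # LOO fits one model per held-out sample, so n_samples fits per evaluation.
    fits_per_evaluation = n_samples
    # Each parameter value and each candidate feature subset repeats the full LOO run.
    return fits_per_evaluation * n_param_values * n_feature_subsets


# Hypothetical sizes, for illustration only.
print(loo_fit_count(100))          # plain LOO: 100 fits
print(loo_fit_count(100, 10))      # plus 10 component counts: 1000 fits
print(loo_fit_count(100, 10, 50))  # plus 50 feature subsets: 50000 fits
```

Under these assumptions, any fixed per-iteration overhead in the loop is paid tens of thousands of times, so a node that is only moderately slower per fit becomes impractical once the loops are nested.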