I'm struggling with the input format for the Model Acceptability Criteria node, which seems to require vector-based input data.
I have a data set described by RDKit descriptors, which I convert to a bit vector using the Create Bit Vector node, but the Model Acceptability Criteria node keeps complaining that 'Inputs 1, 2, and 3 should be vectors'.
How should I format the input data?
It turns out that this node only needs the experimental and predicted data columns, no descriptors or anything else as input (thanks to Daniel Mucs, SweTox).
This node is applicable only for continuous models (not classification).
This specific node has 3 input ports:
Port 0: Values for the dependent variable, predicted by the model (ypred)
Port 1: Values for the dependent variable for the test set (yexp)
Port 2: Values for the dependent variable for the training set (ytr)
For each input you need one vector (the dependent variable), not the descriptors.
Please note that you need to pass only the values of the dependent variable to the node, not the whole data set (you can put a splitter before the node for this job).
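As a rough illustration of what "only the dependent variable" means, here is a minimal Python sketch (not KNIME code). The row layout, the values, and the helper name `column` are all made up for this example; the point is only that each port receives one column of double values and the descriptor columns are left behind.

```python
# Hypothetical tables; every number here is invented for illustration.
# Test set rows: (descriptor_1, descriptor_2, experimental, predicted)
test_rows = [
    (0.12, 3.4, 5.1, 4.9),
    (0.50, 2.1, 6.3, 6.6),
    (0.33, 1.8, 4.8, 5.0),
]
# Training set rows: (descriptor_1, descriptor_2, experimental)
train_rows = [
    (0.10, 3.0, 5.0),
    (0.45, 2.5, 5.5),
    (0.61, 1.2, 6.0),
    (0.27, 2.9, 4.5),
]


def column(rows, index):
    """Return a single column of a row-oriented table as a list of floats."""
    return [float(row[index]) for row in rows]


# One vector of doubles per port; the descriptors are not passed on.
ypred = column(test_rows, 3)   # port 0: predicted values for the test set
yexp = column(test_rows, 2)    # port 1: experimental values for the test set
ytr = column(train_rows, 2)    # port 2: experimental values for the training set
```

In a KNIME workflow the same selection would be done with column filtering or splitting nodes before the Model Acceptability Criteria node.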
The values of the dependent variable are also needed; please have a look at the attached paper, in which Tropsha’s equations are included (eq. 3, 4 & 5).
Since there have been several questions about the training set: the training set (ytr) is used in equations 1 and 3 (what is actually used is the mean value of the dependent variable over the training set).
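To show why the training set values matter even though they are not predicted, here is a small Python sketch of one commonly used external validation statistic that depends on the training set mean. This is an assumption for illustration: the paper's equations are not reproduced in this thread, so this is not necessarily equation 1 or 3, and the function name `q2_external` is hypothetical.

```python
def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


def q2_external(ypred, yexp, ytr):
    """External Q^2 = 1 - sum((yexp - ypred)^2) / sum((yexp - mean(ytr))^2).

    Only the mean of ytr enters the formula, so ytr can have any length.
    Illustrative form only; the node may use a different expression.
    """
    ytr_mean = mean(ytr)
    # Squared prediction errors on the test set.
    press = sum((e - p) ** 2 for e, p in zip(yexp, ypred))
    # Squared deviations of the test set from the training set mean.
    ss_ref = sum((e - ytr_mean) ** 2 for e in yexp)
    return 1.0 - press / ss_ref


# Invented example values.
q2 = q2_external(
    ypred=[4.9, 6.6, 5.0],
    yexp=[5.1, 6.3, 4.8],
    ytr=[5.0, 5.5, 6.0, 4.5],
)
```

Note that ytr contributes only through its mean, which is consistent with the remark above that the averaged training set value is what is used.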
I have problems with the input of this node. I converted only the experimental and predicted values to vectors using the “Create Bit Vector” node, but after the conversion the “Model Acceptability Criteria” node doesn’t work, because it needs the data to be either all “double” or all “integer”, yet my experimental data are all “double”.
There is no need to generate bit vectors; you should use double values as input for all ports.
But now, why should inputs 1 and 2 have equal length?
The test set is a different set from the training set.
OK, I understand: "inputs 1 and 2" are ports 0 and 1. Thanks!!
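For anyone hitting the same message, the constraint can be sketched in Python as below. This is a hypothetical pre-check written for this thread, not the node's actual implementation: ypred (port 0) and yexp (port 1) both describe the test set, so they must pair up one to one, while ytr (port 2) may have a different length.

```python
def check_inputs(ypred, yexp, ytr):
    """Hypothetical sanity check mirroring the constraints discussed above."""
    for name, values in (("ypred", ypred), ("yexp", yexp), ("ytr", ytr)):
        if not values:
            raise ValueError(f"{name} is empty")
        # All ports should carry double values, not bit vectors.
        if not all(isinstance(v, float) for v in values):
            raise TypeError(f"{name} must contain only double (float) values")
    # Predicted and experimental values refer to the same test compounds.
    if len(ypred) != len(yexp):
        raise ValueError("ypred and yexp must have equal length")
    # No length requirement on ytr: the training set is a different set.
    return True


# Test set of 3 compounds, training set of 4 compounds: this is fine.
ok = check_inputs([4.9, 6.6, 5.0], [5.1, 6.3, 4.8], [5.0, 5.5, 6.0, 4.5])
```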
This is a bit misleading. What is meant is that you only need the column with the dependent variable as input for the Model Acceptability Criteria node.