MDS Projection problem

Hi,

I have a dataset of, let's say, 1,000 cpds, and these have been transformed into two-dimensional space using the MDS Distance Matrix node so they can be visualised on a 2D scatterplot. The workflow is an RDKit Fingerprint node, then a Distance Matrix Calculate node, then the MDS Distance Matrix node.
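The distance step of this pipeline can be sketched in plain Python to show where the cost comes from. This is a minimal sketch that assumes Tanimoto (Jaccard) distance on bit-vector fingerprints stored as Python integers; the Distance Matrix Calculate node may be configured with a different measure, and the function names are hypothetical. A full matrix for n compounds needs n(n-1)/2 comparisons, i.e. 499,500 for 1,000 cpds.

```python
from itertools import combinations


def tanimoto_distance(a: int, b: int) -> float:
    """Tanimoto (Jaccard) distance between two fingerprints held as int bitmasks."""
    union = bin(a | b).count("1")
    if union == 0:
        return 0.0  # two empty fingerprints are treated as identical
    inter = bin(a & b).count("1")
    return 1.0 - inter / union


def distance_matrix(fps):
    """Full symmetric distance matrix: n*(n-1)/2 pairwise comparisons."""
    n = len(fps)
    d = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d[i][j] = d[j][i] = tanimoto_distance(fps[i], fps[j])
    return d


# Three toy 4-bit fingerprints.
print(distance_matrix([0b1011, 0b1001, 0b0110]))
```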

Now, if I have an additional 10 cpds, is there any way I can project these onto the existing MDS result without recalculating the whole distance matrix and MDS? I see there is an MDS Projection (DistMatrix) node, which sounds like what I want, but I cannot for the life of me work out how to get it to do this.

Any ideas?

Simon.

Hi Simon, 

As far as I know, it is not possible to recalculate the projection without first recomputing the entire distance matrix. Would something like a Distance Matrix Extender, which allowed you to add more compounds to a distance matrix after the initial calculation, solve this problem?

Firstly, I think a Distance Matrix Extender would be a useful addition anyway: I have lost count of the times I've had to recalculate a distance matrix of thousands of compounds just to add a handful of new ones. Constantly building on an existing dataset of cpds is common in the pharma industry, so I can only assume it would benefit others too.
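An extender along these lines only has to fill in the rows for the new compounds. The sketch below is hypothetical (function names and integer-bitmask fingerprints are assumptions, and Tanimoto distance is assumed again): it copies every existing entry and computes only new-vs-old and new-vs-new distances, i.e. m·n + m(m-1)/2 comparisons instead of (n+m)(n+m-1)/2.

```python
def tanimoto_distance(a: int, b: int) -> float:
    """Tanimoto distance between two fingerprints held as int bitmasks."""
    union = bin(a | b).count("1")
    if union == 0:
        return 0.0
    return 1.0 - bin(a & b).count("1") / union


def extend_distance_matrix(old_d, old_fps, new_fps, dist=tanimoto_distance):
    """Return a larger matrix that reuses every entry of old_d.

    Only new-vs-old and new-vs-new distances are computed;
    old_d itself is left unmodified.
    """
    n, m = len(old_fps), len(new_fps)
    all_fps = list(old_fps) + list(new_fps)
    # Copy the existing rows and pad them with room for the new columns.
    d = [row[:] + [0.0] * m for row in old_d]
    d += [[0.0] * (n + m) for _ in range(m)]
    for i in range(n, n + m):
        for j in range(i):  # j < i: all old compounds plus earlier new ones
            d[i][j] = d[j][i] = dist(all_fps[i], all_fps[j])
    return d
```

For 5,000 existing and 10 new cpds this means 50,045 comparisons rather than the 12,547,545 needed to rebuild the matrix from scratch.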

However, in this instance it doesn't fully solve the problem, as the MDS Distance Matrix node takes considerable time to process a distance matrix of 5,000 cpds. So if I want to project new compounds onto the MDS-scaled columns, I have to go through a hugely time-consuming process each time.

There is a commercial offering that seems able to project new compounds onto an existing MDS scale very rapidly (effectively instantly), but I am unsure what process it uses to do this.
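One standard way to do such a projection (not necessarily what that commercial tool does) is to keep the existing 2D coordinates fixed and place each new compound by minimising the stress between its layout distances and its fingerprint distances to the reference compounds only. The sketch below uses the SMACOF (Guttman) fixed-point update for a single free point; the function names, the start at the centroid of the k nearest references and the iteration count are assumptions. It also assumes the new distances are on the same scale as those the original MDS was fitted to. Each new compound needs only n distances plus a few hundred cheap iterations, which would explain a projection that appears instant.

```python
import math


def project_point(ref_coords, dists, n_iter=200, k_init=3):
    """Place one new compound in an existing 2D MDS layout.

    ref_coords: (x, y) positions of the reference compounds (kept fixed).
    dists: distances from the new compound to each reference compound.
    Minimises raw stress sum_i (||p - y_i|| - d_i)^2 over p only, using the
    SMACOF/Guttman fixed-point update, which never increases the stress.
    """
    n = len(ref_coords)
    # Start at the centroid of the k nearest reference compounds.
    nearest = sorted(range(n), key=lambda i: dists[i])[:k_init]
    px = sum(ref_coords[i][0] for i in nearest) / len(nearest)
    py = sum(ref_coords[i][1] for i in nearest) / len(nearest)
    for _ in range(n_iter):
        sx = sy = 0.0
        for (yx, yy), d in zip(ref_coords, dists):
            dx, dy = px - yx, py - yy
            r = math.hypot(dx, dy)
            # Target for this reference: y_i + d_i * unit(p - y_i); if p sits on y_i, use y_i.
            scale = d / r if r > 0 else 0.0
            sx += yx + scale * dx
            sy += yy + scale * dy
        px, py = sx / n, sy / n
    return px, py


def stress(p, ref_coords, dists):
    """Raw stress of point p against the fixed reference layout."""
    return sum((math.hypot(p[0] - x, p[1] - y) - d) ** 2
               for (x, y), d in zip(ref_coords, dists))
```

For example, with references at (0, 0), (4, 0) and (0, 3) and distances 5, 3 and 4, `project_point` converges to approximately (4, 3), where the stress is essentially zero.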

Do you think there is any way to do this with a future node?

Simon.

Hi Simon,

The PCA nodes create a model that can be reused. It's not MDS, but maybe it's helpful.

Cheers, Kilian

Thanks for the tip Kilian,

I am not sure PCA is exactly what I need, as it tries to put as much of the diversity (variance) as possible into the first dimension (component) and progressively less into subsequent dimensions.

I am using MDS solely to convert a chemical fingerprint into a 2D or 3D scatterplot representation for visualisation. Unfortunately, PCA also doesn't support BitVector columns.

It would be great if something like this could be considered for a future KNIME release: MDS Compute and MDS Apply nodes that support the new Distance nodes.

Simon.