I would like to calculate RMSD values between the 3D coordinates of a small molecule (fragment) and those of larger compounds that contain the fragment as a substructure.
In more detail:
Let's say I have a table of molecules with SMILES in one column and 3D coordinates in SDF format in another. These are the results of a docking workflow, so the column contains poses of many different molecules, but also several poses of the same molecule. All of these molecules weigh about 300-450 Dalton and contain a common substructure the size of a small fragment (120-200 Dalton). For this substructure, the fragment, I also have 3D coordinates as SDF, determined experimentally by X-ray crystallographic screening.
What I want to know:
How large is the 3D RMSD between the fragment and the same atoms inside the bigger molecule?
I want to use this value for further filtering, i.e. to find the docking poses that moved far away from the input fragment coordinates.
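To make the quantity concrete, here is a minimal sketch in plain Python of an in-place RMSD (no alignment, since the question is how far the docked atoms moved from the crystal position). It assumes an atom mapping from fragment to pose is already available; the function names, the toy coordinates and the two example matches are made up for illustration. If the fragment is symmetric, several mappings are possible, so the sketch takes the minimum over all of them.

```python
import math

def in_place_rmsd(ref_xyz, pose_xyz, match):
    """RMSD without any alignment.

    match[i] is the index, in the pose, of the atom that corresponds
    to fragment atom i (heavy atoms only in this sketch).
    """
    if len(match) != len(ref_xyz):
        raise ValueError("mapping must cover every fragment atom")
    squared = 0.0
    for i, j in enumerate(match):
        squared += sum((a - b) ** 2 for a, b in zip(ref_xyz[i], pose_xyz[j]))
    return math.sqrt(squared / len(ref_xyz))

def best_in_place_rmsd(ref_xyz, pose_xyz, matches):
    """Smallest in-place RMSD over all symmetry-equivalent mappings."""
    return min(in_place_rmsd(ref_xyz, pose_xyz, m) for m in matches)

# Toy data (hypothetical): a 2-atom "fragment" and a 3-atom "pose".
ref = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
pose = [(5.0, 5.0, 5.0), (1.0, 0.0, 1.0), (0.0, 0.0, 1.0)]  # atom 0 is not part of the fragment
matches = [(1, 2), (2, 1)]  # both mappings are valid for a symmetric fragment

rmsd = best_in_place_rmsd(ref, pose, matches)
print(round(rmsd, 3))
```

With RDKit, the mappings could come from `pose_mol.GetSubstructMatches(fragment_mol, uniquify=False)` and the coordinates from `mol.GetConformer().GetPositions()`. Whether the crystallographic fragment matches as a substructure query without further preparation (hydrogens, bond orders at the attachment point) is an assumption to check on the real data.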
My unsuccessful attempts so far:
As far as I can tell, the RDKit RMSD filter cannot do it, because
a) I cannot supply a reference molecule
b) as far as I tested, it computes the RMSD over the entire molecule
(so I would have to cut the large molecule back to the fragment (= MCS), but how can I do that in KNIME?)
Secondly, I tried the 3D RMSD node from CDK in KNIME. It has
similar problems to the RDKit node, plus
c) it does not seem to give pairwise RMSD values, only an overall RMSD for the whole column
I have a machine with 64 CPU cores (128 threads), and the solution should process about 1 million poses in a reasonable time (e.g. 2 h).
So as long as the solution can be parallelized, it should be fine.
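For scale: 1 million poses in 2 h is roughly 139 poses per second overall, and every pose can be scored independently of the others. The sketch below shows one possible way to spread that over processes with Python's standard `concurrent.futures`. It assumes the fragment atoms of each pose have already been extracted in fragment atom order; `REF_XYZ`, the pose IDs, the chunk size and the 2.0 Å cutoff are hypothetical placeholders, not recommendations.

```python
import math
from concurrent.futures import ProcessPoolExecutor

# Hypothetical reference coordinates of the fragment (heavy atoms only).
REF_XYZ = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]

def rmsd(ref_xyz, xyz):
    """In-place RMSD between two equally ordered coordinate lists."""
    squared = sum(
        (a - b) ** 2 for p, q in zip(ref_xyz, xyz) for a, b in zip(p, q)
    )
    return math.sqrt(squared / len(ref_xyz))

def score_chunk(chunk):
    """Score one chunk of (pose_id, fragment_atom_coordinates) pairs."""
    return [(pose_id, rmsd(REF_XYZ, xyz)) for pose_id, xyz in chunk]

def chunked(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

def score_all(poses, workers=None, chunk_size=5000):
    """Score all poses in parallel; workers=None uses the default worker count."""
    results = []
    with ProcessPoolExecutor(max_workers=workers) as pool:
        for part in pool.map(score_chunk, chunked(poses, chunk_size)):
            results.extend(part)
    return results

def keep_close(scores, cutoff=2.0):
    """IDs of poses whose fragment atoms stayed within `cutoff` of the reference."""
    return [pose_id for pose_id, value in scores if value <= cutoff]

# Serial demonstration on two made-up poses.
poses = [
    ("pose_a", [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]),  # identical to the reference
    ("pose_b", [(3.0, 0.0, 0.0), (4.5, 0.0, 0.0)]),  # shifted by 3 along x
]
scores = score_chunk(poses)
kept = keep_close(scores)
```

On the real table, `score_all(poses)` would replace the serial `score_chunk(poses)` call. It should be called from a script guarded by `if __name__ == "__main__":` so that worker processes can import the functions. Sending whole chunks rather than single poses keeps the inter-process overhead small.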
Has anyone here dealt with a similar problem before and could help me, or point me in the right direction?
I know this is quite a specific problem (at least for us), but any tip is very much appreciated.