I am developing a workflow that ingests SD files in mixed format (V2000 and/or V3000, sometimes in the same file).
Is there a way to force the SDF Writer, for example, to generate the output file exclusively in V2000 format?
Alternatively, is there any functionality I could use to ensure all mol entries are in V2000, e.g. a converter that allows me to configure the output format? This is required for downstream applications.
Hi @Jo1607,
While the SDF Writer does not have a flag to specifically write in v2000, you can use an RDKit To Molecule node to "convert" it to v2000 by selecting SDF as the destination format.
See attached screenshot or workflow.
Hi @tkaynak ,
thanks for your quick reply. I tried the hint you suggested with partial success.
Basically, I have an input file of 48,028 molecules, all in V3000.
Using your suggestion, I created an output file that contains
41,396 molecules in V2000
6,632 molecules still in V3000 format
I will investigate further what causes this split. I know, for example, that V2000 is limited to a maximum of 999 atoms.
Can I get back to you in case I can't solve the puzzle?
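For anyone wanting to reproduce such a split outside the workflow, here is a minimal Python sketch (standard library only) that counts how many records of an SD file are V2000 and how many are V3000. It assumes records are separated by `$$$$` lines and that the version tag sits on the counts line, i.e. the fourth line of each mol block. The function names are made up for illustration and are not part of KNIME or RDKit.

```python
from collections import Counter


def split_sdf_records(sdf_text):
    """Split SD file text into records using the '$$$$' delimiter line."""
    records, current = [], []
    for line in sdf_text.splitlines():
        if line.strip() == "$$$$":
            records.append("\n".join(current))
            current = []
        else:
            current.append(line)
    # Keep a trailing record that lacks a final '$$$$' delimiter.
    if any(part.strip() for part in current):
        records.append("\n".join(current))
    return records


def molfile_version(record):
    """Return 'V2000', 'V3000' or 'unknown' based on the counts line."""
    lines = record.splitlines()
    if len(lines) < 4:
        return "unknown"
    counts_line = lines[3]  # header is 3 lines, counts line is the 4th
    if "V3000" in counts_line:
        return "V3000"
    if "V2000" in counts_line:
        return "V2000"
    return "unknown"


def count_versions(sdf_text):
    """Count the records per molfile version in SD file text."""
    return Counter(molfile_version(r) for r in split_sdf_records(sdf_text))
```

Calling `count_versions(open(path).read())` on the output file (with `path` being your own file) should give a tally comparable to the numbers above, provided the file follows the usual SD layout.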
Hi @Jo1607 ,
Hm, then it seems this solution is not thorough... Could you share a small example dataset with a few of the structures that remain in v3000?
Thanks, best regards,
Tugrul
Sure, I randomly sampled 100 entries from the 6,000+ remaining in V3000. Please find attached the corresponding file for your review: Test_remainingV3000.csv (146.0 KB).
Hi @Jo1607 ,
Thank you for providing the sample.
As seen, for example, in
M V30 BEGIN COLLECTION
M V30 MDLV30/STERAC1 ATOMS=(2 1 6)
M V30 END COLLECTION
...those structures contain v3000-only features (enhanced stereochemistry). It seems to me that the RDKit To Molecule node automatically chooses to write v3000 in those cases, so as not to lose this information.
If you can afford to lose the stereochemical information and specific coordinates, you can quickly work around this by converting to SMILES and back. If you need to retain this information, you have the nontrivial problem of simplifying the stereo information into what v2000 supports.
Hi Tugrul @tkaynak
thanks for this very quick turnaround.
The explanation that the node is "careful" not to lose information makes sense to me.
I will review this and use the workaround you provided.
Thanks a lot for your support, as usual, the forum is absolutely great.