CDK nodes

I run a software company (Treweren Consultants Ltd) and we are contemplating integrating our THINK software into KNIME. We are very reluctant to proliferate "trivial" but essential nodes providing functionality such as reading and writing SD files, but it is also apparent that there are some significant issues with the CDK nodes. Our understanding is that the TRIPOS alternatives are not freely available and that, should the ChemAxon functionality be integrated, those nodes will not be freely available either.

Does anyone know if there are plans to upgrade the current CDK implementation? Is there a list of planned enhancements?

Thanks.

THINK wrote:
Does anyone know if there are plans to upgrade the current CDK implementation? Is there a list of planned enhancements?

We plan to add the following nodes to the next major release:
  • A node to extract the properties in SD files into columns in a data table
  • A node that does the reverse (i.e. writes a column into an SD property)
  • Nodes to convert from the internal CDK representation to SDF, Mol2 and SMILES
  • A substructure search node (essentially a node that splits the input table based on whether a molecule contains a certain fragment or not)
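
To illustrate the first item in the list, here is a minimal sketch in plain Python (not CDK or KNIME code) of how the data items in an SD file could be pulled out into table rows. It relies only on the SD file layout: records end with `$$$$`, and each data item is a header line such as `> <NAME>` followed by value lines and a blank line. The function names `sd_properties_to_rows` and `rows_to_table` are made up for this example, and the sketch assumes that no line inside the molblock starts with `>`.

```python
import re

# Data item header, e.g. "> <NAME>" or ">  <MW> (1)"; the field name sits in <...>.
HEADER = re.compile(r"^>.*<([^>]+)>")


def sd_properties_to_rows(sd_text):
    """Return one dict of data items per record of an SD file given as a string."""
    rows = []
    for record in sd_text.split("$$$$"):
        if not record.strip():
            continue  # skip the empty tail after the last record separator
        props, name, values = {}, None, []
        for line in record.splitlines():
            match = HEADER.match(line)
            if match:
                name, values = match.group(1), []
            elif name is not None:
                if line.strip():
                    values.append(line)
                else:
                    # A blank line closes the current data item.
                    props[name] = "\n".join(values)
                    name = None
        if name is not None:
            # Tolerate a final data item without a trailing blank line.
            props[name] = "\n".join(values)
        rows.append(props)
    return rows


def rows_to_table(rows):
    """Turn the per-record dicts into (column names, list of row value lists)."""
    columns = sorted({key for row in rows for key in row})
    data = [[row.get(column) for column in columns] for row in rows]
    return columns, data
```

Records that lack a property simply get `None` in that column, which is roughly what a missing cell in a data table would look like.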

Regards,

Thorsten

The SD file improvement plans are encouraging - and address my concerns. Thanks.

You're probably aware that the described substructure search functionality falls short of what is often useful for medicinal chemistry R-group series of molecules. For such series a Markush-style R-group query can be used to identify the R-groups. The R-groups can then be displayed (2D coordinates) in a separate column. Some series (including Combinatorial Chemistry Libraries) have more than one variable R-group position. I'm not sure how much interest there would be either in Combinatorial Chemistry Libraries or in breaking down a series into R-groups.

Hmm, finding those groups using Markush structures would already be a problem in itself. We don't have the algorithms in house to do that and I am not sure we can use CDK tools here either.
Other than that I would be all in favor of someone else putting such tools in there and adding functionality to deal with combinatorial chemistry spaces...

- Michael

For typical Medicinal Chemistry series of molecules (and also for Combinatorial Chemistry libraries) there is sometimes more than one possible choice for a core group and side-chains. The choice can be influenced by the series of molecules, as might be the case for a set of peptides, but is often a matter of personal preference. In other words, I wasn't envisaging a maximum-common-substructure type of algorithm being used to identify the "core" and "R-groups" but instead a substructure search with the core as a user-specified query.

The appropriate output would then be a CDK column for each R-group. For convenience it may be best to have a separate table for each set of R-groups - so that R-group properties could be easily computed. Joining these tables should not be a problem in principle.
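
As a rough illustration of that table layout, the sketch below uses plain Python dicts rather than KNIME tables. It assumes the R-group decomposition has already been done elsewhere, so the input format, the function names `split_by_r_group` and `join_r_group_tables`, and the fragment strings in the usage note are all invented for the example.

```python
def split_by_r_group(decomposition):
    """Build one table per R-group label from {mol_id: {label: fragment}}."""
    tables = {}
    for mol_id, groups in decomposition.items():
        for label, fragment in groups.items():
            tables.setdefault(label, {})[mol_id] = fragment
    return tables


def join_r_group_tables(tables):
    """Join the per-R-group tables back together on the molecule id."""
    labels = sorted(tables)
    mol_ids = sorted({mol_id for table in tables.values() for mol_id in table})
    joined = []
    for mol_id in mol_ids:
        row = {"id": mol_id}
        for label in labels:
            # A molecule without a substituent at this position gets None.
            row[label] = tables[label].get(mol_id)
        joined.append(row)
    return joined
```

For example, `split_by_r_group({"mol1": {"R1": "[*]C", "R2": "[*]Cl"}, "mol2": {"R1": "[*]CC"}})` gives an R1 table with two entries and an R2 table with one. Properties could then be computed on each table separately before joining on the molecule id.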

At present, implementing such functionality is not a priority for THINK integration into KNIME. We are giving priority to computational chemistry tools - although THINK does have reagent searching and combinatorial library enumeration functionality.

Hmm, that would require a bit more chemical "intelligence" than our internal tools offer. Our discriminative substructure search node could be convinced to search for fragments and maybe even split the underlying molecule into pieces, but we currently have no way to identify specific side groups as R1, R2, ... to make sure it's always the right R-group in the output.
This is only an implementation issue, of course, but we have a lot of these issues on our plate right now :-(
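
One possible way to keep the labels consistent, sketched here in plain Python and not based on any existing KNIME or CDK code, is to tie each R label to an atom of the user-specified core query and follow the substructure match into the molecule. The sketch assumes the match is supplied by some other tool as a list mapping query atom index to molecule atom index. The function name `label_r_groups` and its input format are hypothetical. It also assumes side chains do not ring-close back onto the core, and it merges several substituents on the same core atom into one group.

```python
def label_r_groups(bonds, match, r_sites):
    """Assign side-chain atoms to R labels via the core query's attachment atoms.

    bonds   -- iterable of (atom, atom) index pairs of the molecule
    match   -- match[q] is the molecule atom matched to core query atom q
    r_sites -- {"R1": query_atom_index, ...} chosen by the user on the core
    """
    core = set(match)
    neighbours = {}
    for a, b in bonds:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    labelled = {}
    for label, query_atom in r_sites.items():
        anchor = match[query_atom]
        # Flood-fill from the attachment atom without stepping onto core atoms.
        stack = [n for n in neighbours.get(anchor, ()) if n not in core]
        seen = set()
        while stack:
            atom = stack.pop()
            if atom in seen:
                continue
            seen.add(atom)
            stack.extend(
                n for n in neighbours[atom] if n not in core and n not in seen
            )
        labelled[label] = sorted(seen)
    return labelled
```

Because the label follows the query atom and not the atom order in the molecule, a molecule whose core is matched "backwards" still reports its substituents under the intended labels. A symmetric core remains ambiguous, though, since either orientation of the match is then equally valid.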