Setting column headings for Substructure Counter node

I'd like to set the name of the substructure count column to a name rather than the SMARTS.

So I have a SMARTS file:

SmartsValue    SmartsName
[F,Cl,Br,I]    hal
[$([F,Cl,Br,I]),$(OC(F)(F)(F))]    pseudhal

When I use this in the Substructure Counter, I get columns labelled [F] and [*H], whereas what I want is the names in the SmartsName column. I've poked about a bit but can't work out how to take the column names from the SMARTS file. This becomes a major issue when the same column name (e.g. [*H]) is assigned to two columns, at which point the Substructure Counter node fails.
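For illustration, here is a minimal Python sketch (not part of KNIME or the RDKit nodes) that reads a query file in the layout shown above and checks whether the names would give unique headers. It assumes whitespace-separated columns, a single header line, and SMARTS without embedded spaces; the function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def read_smarts_file(text: str) -> List[Tuple[str, str]]:
    """Return (smarts, name) pairs from whitespace-separated query text.

    Assumes one header line and SMARTS without spaces; a row with no
    name gets an empty string.
    """
    rows = [ln for ln in text.splitlines() if ln.strip()]
    pairs = []
    for ln in rows[1:]:  # skip the "SmartsValue SmartsName" header line
        parts = ln.split(None, 1)
        name = parts[1].strip() if len(parts) > 1 else ""
        pairs.append((parts[0], name))
    return pairs


def duplicate_headers(headers: Sequence[str]) -> List[str]:
    """Return header strings that occur more than once, in first-seen order."""
    seen, dups = set(), []
    for h in headers:
        if h in seen and h not in dups:
            dups.append(h)
        seen.add(h)
    return dups


example = """SmartsValue    SmartsName
[F,Cl,Br,I]    hal
[$([F,Cl,Br,I]),$(OC(F)(F)(F))]    pseudhal
"""

queries = read_smarts_file(example)
print(duplicate_headers([name for _, name in queries]))  # [] -> names are unique
print(duplicate_headers(["[*H]", "[*H]"]))  # ['[*H]'] -> these headers would clash
```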

KNIME provides a number of nodes dealing with column names: a) Column Rename is probably the easiest one; b) Column Rename (Regex) is a little more advanced but also more generic; c) Insert Column Header replaces the column headers with names that are available as _data_ (to turn column headers into data, use the Extract Column Header node). Hope this helps somewhat.

Mmm, this doesn't really help. I can't rename the column headers after processing, because the Substructure Counter node generates the column headers, and if it generates two identical ones, it fails.

Thanks for sharing your issue. I had a look at the code after reproducing your example. A few things come together here:


1. You found a bug in the RDKit Substructure Counter code that makes new column names unique. Thanks! :-) I have fixed it, and the fix should be available in the next nightly RDKit Node Community build (please check again in 24 hours).


2. You are using SMARTS as input. The Substructure Counter node was originally written for SMILES substructure queries. SMARTS work as well, but when the column headers were derived from the query molecules, we used their SMILES representation, which looks strange for a SMARTS query and is not really what you want in the new column headers. With the code I checked in, this now works better in most cases.
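Item 1 above refers to the step that makes a new column name unique. The node's actual code is not shown in this thread; the following is a hypothetical Python sketch of the general technique, in which a clashing name gets a numeric suffix until it no longer collides. The ` (#n)` suffix format is an assumption chosen for illustration.

```python
from typing import Iterable, List, Sequence


def make_unique(proposed: str, existing: Iterable[str]) -> str:
    """Return `proposed`, or `proposed (#n)` with the smallest n >= 1 that
    does not clash with any name in `existing` (suffix format assumed)."""
    taken = set(existing)
    if proposed not in taken:
        return proposed
    n = 1
    while f"{proposed} (#{n})" in taken:
        n += 1
    return f"{proposed} (#{n})"


def unique_headers(proposals: Sequence[str], existing: Sequence[str] = ()) -> List[str]:
    """Assign a unique header to each proposal in order, also avoiding
    names of columns already in the table (e.g. the molecule column)."""
    taken = list(existing)
    result = []
    for p in proposals:
        name = make_unique(p, taken)
        taken.append(name)  # later proposals must not reuse this name
        result.append(name)
    return result


print(unique_headers(["[*H]", "[*H]", "[*H]"], existing=["Molecule"]))
# ['[*H]', '[*H] (#1)', '[*H] (#2)']
```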


Your suggestion to improve the node so that you can select another column of the query table whose values are used as the names of the new count columns was well received. We may add this in the future.


For now, the node should no longer fail in the latest version, and you can follow Thomas Gabriel's suggestion to use the Column Rename node afterwards to change the column names.


Kind regards,

Manuel Schwarze

This ability would be one of the most important features. I too have been stumped by this numerous times. Looking forward to it becoming a built-in feature.

Thanks for actioning this so quickly. I've tested it and it works fine. The column rename method suggested by Thomas Gabriel works too, though it's a bit of a clunky workaround if you have reasonably sized SMARTS files, or ones you use repeatedly. I look forward to seeing the target column naming option added to the counter in the future.
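For larger or frequently reused query files, the old-to-new pairs for the Column Rename node could be generated rather than typed by hand. A hypothetical Python sketch, assuming the count columns appear in the same order as the query rows and that their headers are already unique (as they should be after the fix); the helper names are illustrative only.

```python
from typing import Dict, List, Sequence


def rename_mapping(count_headers: Sequence[str], query_names: Sequence[str]) -> Dict[str, str]:
    """Pair each generated count header with a query name, by position.

    Assumes the count columns are listed in query-row order.
    """
    if len(count_headers) != len(query_names):
        raise ValueError("expected exactly one count column per query row")
    if len(set(count_headers)) != len(count_headers):
        # Duplicate headers would collapse in a dict and silently lose a rename.
        raise ValueError("count column headers must be unique")
    return dict(zip(count_headers, query_names))


def apply_renames(headers: Sequence[str], mapping: Dict[str, str]) -> List[str]:
    """Rename mapped columns; leave all other columns (e.g. the molecule) as is."""
    return [mapping.get(h, h) for h in headers]


mapping = rename_mapping(["[F]", "[*H]"], ["hal", "pseudhal"])
print(apply_renames(["Molecule", "[F]", "[*H]"], mapping))
# ['Molecule', 'hal', 'pseudhal']
```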

The option to have a names column beside the SMARTS column, which is processed when performing substructure counting, has been implemented and made available in the latest nightly build.
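Conceptually, such an option picks each count column's header from the names column. The following hypothetical Python sketch also falls back to the query text when a name is empty or missing; that fallback is an assumption about how such an option might behave, not something confirmed in this thread.

```python
from typing import List, Optional, Sequence, Tuple


def headers_from_names(queries: Sequence[Tuple[str, Optional[str]]]) -> List[str]:
    """Return one header per query: its name if given, else the query text.

    The fallback for missing or blank names is an assumption for illustration.
    """
    headers = []
    for smarts, name in queries:
        if name is not None and name.strip():
            headers.append(name.strip())
        else:
            headers.append(smarts)
    return headers


print(headers_from_names([("[F,Cl,Br,I]", "hal"), ("[#6]", ""), ("[#7]", None)]))
# ['hal', '[#6]', '[#7]']
```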
