Generating Bit Strings using SMARTs query node

I have a database of chemicals and want to evaluate each chemical against a list of SMARTs patterns also from a database.  The goal is to generate my own BitStrings for each chemical that relate to the SMARTs patterns.  I have successfully done this BUT I can only do it one SMARTs pattern at a time (attached workflow).

Ideally, the workflow would be to run each chemical from one database table through all the SMARTs patterns (ie: 1-10) from another database table and generate a concatenated bit-string like "1010101111".  Then do a database update for each chemical bit-string.  Each position in the bit-string means match(1) or non-match(0) relating to the SMARTs ID (1-10) in the database.

I've tried to use Chunk Loop Start on the SMARTs patterns but this doesn't allow the Bitstring to be looped back so that it can be updated (at a particlur bit position) using the String manipulation node.

I downloaded the example workflows for the Recursive loop nodes but have had limited success.  I must be missing something but this seems like a doable task for KNIME.  In essence, I need to do a nested FOR loop like this:

for x=1 to numChemicals
     for y=1 to numSMARTs
         run SMARTs query and generate BitString using String manipulation node

My recommendation is to use the RDKit nodes instead.

The node which should deliver what you need is;

RDKit Substructure Counter (with Add Column with tags for matching queries selected).


You can then use the Column Aggregator node to Aggregate all these 0 and 1 columns together using Concatenate option in the Options tab, and make sure to clear out the Value Delimiter box.


Thanks for the input Simon.  Your instructions were clear and concise. 
I modified the SQL query that retrieves the SMARTs to have a prefix:
SELECT concat("SubStruc-",FGroupName) AS FGroupName, SMARTS FROM tbl_ChemFuncGroups

This allowed me to select columns in the aggregator by wildcard(SubStruc*) avoiding the need to manually add each column as my list of SMARTs grows.

I didn't foresee the value of having "counts" makeup the bitstring as opposed to true/false.  The only problem arises when a count is greater than 9; the bit will be 2 chars (unless I convert to something like a 36-base  0-9-A-Z).

Do you have any suggestions on how to handle the possibility of a substructure count >9? For most of my substructures, they will not be present more than 9, but it's possible. 

I could also delimit the bitstring with a comma as this would be adaptable in my application.


Hi, if you use a comma delimiter, in the column aggregator node, you could follow up your resultant combined column with a string replacer using RegEx to replace all values greater than 0 with a 1, setting the option as a pattern rather than the whole string. For example;

[2-9]          and then in replace box put a 1

then in a second string replacer node use; (note commas are deliberate here)

[10|11],       and then in replace put a 1,

You should now be left with just 0's and 1's,


if if you finally want the commas out you could use a third string replace node with , in the find box and replace box as empty. 


Thanks Simon.
Using string replacer node was a good idea.

I was able to use 1 string replacer node to capture counts 1-99 using the regex expression:


1-9 in the 2nd digit OR
1-9 followed by a 0 OR

The order of this regex pattern matters; putting [1-9] at the beginning will negate the [1-9][0] condition.