I have thousands of structures to search with a few substructure queries. The key requirement is that a hit must match multiple substructures and that the matches must not overlap, i.e. the matched substructures lie in different parts of the molecule.
Since the substructure match node has neither an option for this nor any output of the coordinates of the matches, is there some way to achieve this? Could such options be added? Note that the "number of substructure hits" option is not useful here.
I think the attached workflow is a start towards what you're looking for.
The key is to use the "Match handling" pulldown in the RDKit Substructure Filter node to add a column that contains the indices of the atoms matching each substructure. A Java Snippet node then uses this information to count the atoms shared between the matches of two queries, and adds that count as a new column. A count of zero means the two matches do not overlap.
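For anyone who wants to see the counting step outside the workflow, here is a minimal sketch in plain Java. It is not the code from the attached workflow: the class name `OverlapCounter` and both method names are made up for illustration, and it assumes the atom indices for each match have already been parsed into lists of integers (how the node actually formats that column is not shown here).

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of KNIME or RDKit.
class OverlapCounter {

    // Counts the atom indices that appear in both matches.
    // Assumption: each match is a list of atom indices from the same molecule.
    static int countOverlap(List<Integer> matchA, List<Integer> matchB) {
        Set<Integer> shared = new HashSet<>(matchA);
        shared.retainAll(new HashSet<>(matchB));
        return shared.size();
    }

    // Returns true if at least one match of query A shares no atoms
    // with at least one match of query B.
    // Assumption: every match of each query is available as its own list.
    static boolean hasDisjointPair(List<List<Integer>> matchesA,
                                   List<List<Integer>> matchesB) {
        for (List<Integer> a : matchesA) {
            for (List<Integer> b : matchesB) {
                if (countOverlap(a, b) == 0) {
                    return true;
                }
            }
        }
        return false;
    }
}
```

Filtering for rows where the overlap count is 0 (or where `hasDisjointPair` is true, if a query can match in several places) would then keep only the molecules in which the two substructures sit in different parts of the structure.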