I've come across a strange issue that I was able to reproduce in a simple example workflow. The issue is quite hard to explain in words, but I'll try:
I create Murcko scaffolds with the corresponding node. The original Molecule column is kept in the table. I then split off all molecules that don't have a Murcko scaffold (e.g. linear ones). In a Python snippet that uses RDKit, I then assign a scaffold to the linear molecules (code omitted in the example; everything is set to CC for simplicity).
The code (which might also be the cause of the issue, due to me using Pandas wrong):
for index, row in input_table.iterrows():
    row['Murcko'] = Chem.MolFromSmiles("CC")
output_table = input_table.copy()
This code actually works, but with a limitation that I only discovered later, when the code suddenly stopped working. What does "not work" mean? In the output of the Python Snippet, the column "Murcko" is set to missing for all rows and is now of type String (and not RDKit Molecule like it was before).
When does the code fail? It seems that if there is a Double column in the table, the code fails silently, without any error, and the column "Murcko" is converted to a String column containing only missing values. If I remove the Double columns with a Column Filter, it works again (shown in the example).
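One thing that may be relevant: the pandas documentation for iterrows warns that you should not modify something you are iterating over, because depending on the data types the iterator returns a copy and not a view, and writing to it then has no effect. Below is a minimal sketch of a variant that avoids writing to the row at all. The small table, the "Weight" column and the make_scaffold helper are made up for illustration; the helper returns a plain string as a stand-in for Chem.MolFromSmiles("CC") so the snippet runs without RDKit. I have not confirmed that this is the cause inside the KNIME node.

```python
import pandas as pd

# Hypothetical stand-in for the KNIME input_table:
# two object-like columns plus one Double (float64) column.
input_table = pd.DataFrame({
    "Molecule": ["CCCC", "CCO", "CCN"],
    "Murcko": [None, None, None],
    "Weight": [58.12, 46.07, 45.08],
})


def make_scaffold(smiles):
    # Placeholder for Chem.MolFromSmiles("CC"); returns a plain string
    # so that the example does not depend on RDKit being installed.
    return "CC"


# Work on a copy and assign the whole column at once, instead of
# writing to the row objects yielded by iterrows().
output_table = input_table.copy()
output_table["Murcko"] = [make_scaffold(s) for s in output_table["Molecule"]]

print(output_table)
```

Assigning the complete column does not depend on whether a row is a view or a copy, so it should behave the same with or without a Double column in the table.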
You really need to look at the example as the issue is very obscure.
What can cause this behaviour?
For some reason I can't see the embedded image in the post. It works in the preview but not after submitting. Here is the link to the image showing the issue: