When I join the SDF (18300 rows) with the XLS (18300 rows) on Compound_ID, I get 32505756 rows. What is going on?
Alex
If a Compound_ID occurs more than once, the join is done for every instance of it, which can make the output table much larger than either input (I haven't looked at your workflows).
Because the compound ID is not unique in these files.
E.g. one of your compounds (TRA006372) appears 5700 times in the xls as well as in the sdf.
For each joined pair you get a new row: 5700 * 5700 = 32 490 000, which already accounts for nearly all of the 32505756 rows.
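As a rough sketch of that arithmetic (plain Python, not the join node itself): an inner join on a key produces, for every key value, the number of occurrences on the left times the number on the right. The function name `joined_row_count` and the toy ID lists are made up for illustration.

```python
from collections import Counter

def joined_row_count(left_keys, right_keys):
    """Row count of an inner join on a single key column (hypothetical helper)."""
    left = Counter(left_keys)
    right = Counter(right_keys)
    # Each key contributes (occurrences on the left) * (occurrences on the right).
    return sum(n * right[key] for key, n in left.items())

# Toy data: "A" occurs 3 times on each side, "B" once on each side.
sdf_ids = ["A", "A", "A", "B"]
xls_ids = ["A", "A", "A", "B"]
print(joined_row_count(sdf_ids, xls_ids))  # 3*3 + 1*1 = 10

# The duplicated compound alone: 5700 occurrences on each side.
print(joined_row_count(["TRA006372"] * 5700, ["TRA006372"] * 5700))  # 32490000
```

If every ID were unique on both sides, the same function would return at most 18300 for two 18300-row tables.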
Cheers, Iris
Thanks a lot, the duplicate IDs were the problem. I had grouped on SMILES and did not expect different structures to share the same ID.
Problem solved, thanks.
Alex