Strange behaviour of SDF writer

stoeter · March 30, 2010, 6:37pm

Hi, I am using KNIME to combine SD files from different sources that contain different columns. After renaming, converting, and re-sorting columns and concatenating the tables, I removed some unwanted columns with a Column Filter. However, if I then write an SDF and read the same file in again, the previously filtered columns are still in the table. Even after deselecting them in the reader node and going through write SDF -> read SDF, they are back again. This topic seems similar to this previous one: http://tech.knime.org/node/20637 I just want to make clear that the columns I tried to remove are not extracted from the SD field, and I don't want to change the SD content. Somehow it should be possible to delete columns from a table containing SD fields; some of the columns I created myself in order to merge the different SDF tables properly, and I want to delete them afterwards. Thanks for help, Martin

thor · March 30, 2010, 6:45pm

As I already mentioned in the post you are referring to, KNIME does not touch the SD cells. If you read in the file, modify the table, and write out an SD file again, the structures are still the same as before since they haven’t been changed. You need to extract the Mol part of the SD structures, then add only the columns you want to include (SD Properties Insert), and finally write this column.

stoeter · March 31, 2010, 2:43pm

Sorry, maybe I didn't understand the concept/format of SD files!?
I have a column of SD cells and other columns of integer, double, string… values, which are independent of the structure. These columns I would like to delete and save only the untouched SD fields column. I can filter these columns out, nevertheless they are written to the sdf. Thanks for your help, Martin

thor · March 31, 2010, 9:01pm

Did these columns originate from the SD files you read in? Or were they added by other means to the data table? If the former is the case, then my previous comment applies in that the SD structures are not touched by KNIME. Therefore deleting the extracted columns will not change the contents of the SD cells, and thus they are written to the file.
If you want to delete properties from SD cells (if I interpret your post right), then you need to extract the Mol entry, possibly add some properties you want to include with the SD Properties Insert node, and then write this modified SD column, not the original one.

stoeter · April 6, 2010, 1:56pm

As I said, I don't want to touch the SD field!
There are columns that are read in with the SD file. Some of them theoretically could have been generated from the SD field, others I have created myself. However, all of them are present in the SD file from the beginning. They can be filtered out by Column Filter, yet the SD writer writes even those columns.
Martin

thor · April 6, 2010, 3:53pm

Sorry, I don’t understand your problem. If “all of them are present in the SD file from the beginning” then yes, they are there and will be there forever and will be written out. If they were not present in the original file but “you have created yourself” then they are only written if you selected them in the SD Writer's dialog. If you don’t include any properties there and they were not present in the original SD file, then they do not end up in the written SD file (just tested this, to be absolutely sure); only those properties that were in the original file do.

s.roughley · March 26, 2012, 12:29pm

The entire contents of each sdf record are contained in the 'SDF molecule' cell from the sdf reader, in addition to the fields being tabulated into data columns. Therefore, when you process the table and write it out, all the fields stored inside the sdf column are written back into the output file, whether or not you kept the matching table columns. (You can see this if you view the sdf column as an sdf string: the structure data is followed by any other property/data fields in the cell.)
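
To make the point above concrete, here is a minimal Python sketch (not KNIME code) that splits the text of one SD record into its mol block and its embedded data fields. The function name `split_sd_record`, the one-atom record, and the property names `Source` and `MergeKey` are all made up for illustration; the sketch only assumes the usual SD layout, where the mol block ends at `M  END` and each data item starts with a `> <Name>` header line.

```python
import re

def split_sd_record(record):
    """Split one SD record into (mol_block, properties). Illustrative only."""
    lines = record.splitlines()
    # The structure part ends at the "M  END" line.
    end = next(i for i, line in enumerate(lines) if line.rstrip() == "M  END")
    mol_block = "\n".join(lines[: end + 1])

    properties = {}
    name = None
    for line in lines[end + 1:]:
        if line.startswith("$$$$"):      # record delimiter
            break
        header = re.match(r">.*?<([^>]+)>", line)
        if header:                       # data header such as "> <Source>"
            name = header.group(1)
            properties[name] = []
        elif line.strip() == "":         # blank line closes a data item
            name = None
        elif name is not None:           # value line of the current item
            properties[name].append(line)
    return mol_block, {k: "\n".join(v) for k, v in properties.items()}

# Hypothetical one-atom record with two made-up property fields.
RECORD = """\
methane
  hypothetical

  1  0  0  0  0  0  0  0  0  0999 V2000
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
M  END
> <Source>
vendor_a

> <MergeKey>
42

$$$$
"""

mol_block, properties = split_sd_record(RECORD)
print(sorted(properties))
```

Both fields are still part of the cell text, which is why removing the corresponding table columns alone does not remove them from the written file.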

The solution is to check the 'extract mol blocks' option in the sdf reader, which will give you a second structure-containing column of type 'mol'. Now you can process any other columns as you wish, and then when you use the sdf writer, select the 'mol' column (not the 'sdf' column) as the molecule-containing column. If I understand your problem correctly, this should solve it.

Steve

richards99 · August 12, 2012, 10:45pm

You can also use the SDF Extractor node and choose to extract only the MOL block (i.e. the molecule part). This will give you a MOL column which is identical to the SDF structure column except that all those property fields are now removed. Now simply use a Column Filter to remove the original SDF column.

If you now use the SDF Writer node, simply select the MOL column for the structure column, and select all those property columns you are interested in. Don't worry, the SDF format of the structure is exactly the same as it was in the original SDF.
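
As a rough picture of what that combination produces, the following Python sketch (again not KNIME code) assembles an SD record from a bare mol block plus only the properties you choose. `build_sd_record`, the `keep` parameter, and the sample row are hypothetical names and values used only for illustration.

```python
def build_sd_record(mol_block, properties, keep):
    """Assemble one SD record from a mol block and only the named properties."""
    parts = [mol_block.rstrip("\n")]
    for name in keep:
        if name in properties:
            parts.append(f"> <{name}>")        # data header
            parts.append(str(properties[name]))
            parts.append("")                   # blank line ends the data item
    parts.append("$$$$")                       # record delimiter
    return "\n".join(parts) + "\n"

# Hypothetical mol block (one carbon atom) and a made-up table row.
MOL_BLOCK = (
    "methane\n"
    "  hypothetical\n"
    "\n"
    "  1  0  0  0  0  0  0  0  0  0999 V2000\n"
    "    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0\n"
    "M  END\n"
)
row = {"Source": "vendor_a", "MergeKey": 42}

# Only "Source" is selected, so "MergeKey" never reaches the output.
record = build_sd_record(MOL_BLOCK, row, keep=["Source"])
print(record)
```

Because the record is built from the mol block rather than from the original SDF cell, nothing can come along unnoticed: only the selected properties are written.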

Simon.

Nico1990 · February 26, 2015, 1:17pm

Hi,

I have noticed this problem recently when I saw that hundreds of columns supposed to be removed were still in the sd file.

Weirdly, before this, I was sure that filtering the properties using a Column Filter node or deselecting them in the SDF Writer was enough... I thought it was due to a recent modification in KNIME, but now that I see this thread is 5 years old, I am puzzled!

Anyway, Steve's solution avoids using any other node, which is perfect.

Thanks,

Nicolas

richards99 · February 27, 2015, 6:50am

Even filtering columns doesn't work, as the data is kept hidden inside the cell with the structure. Because of the number of users having issues with this, there is now a new node called SDF Stripper.

This will clear out any data columns hidden inside the SDF Structure cell. After using the SDF Stripper node and then the SDF Writer, you will no longer get any surprise extra columns appearing.
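
If you want to check or clean a file outside KNIME, the same idea can be sketched in a few lines of Python. This is only an illustration of the concept, not how the SDF Stripper node is implemented; `strip_sdf` and the sample data are made up, and the sketch assumes every record has an `M  END` line and a `$$$$` delimiter.

```python
def strip_sdf(text):
    """Drop every data item from each record, keeping only the mol blocks."""
    out = []
    in_mol_block = True
    for line in text.splitlines():
        if in_mol_block:
            out.append(line)
            if line.rstrip() == "M  END":
                in_mol_block = False   # what follows is property data
        elif line.startswith("$$$$"):
            out.append(line)
            in_mol_block = True        # next record begins after the delimiter
    return "\n".join(out) + "\n"

# Two hypothetical records that each carry one made-up property field.
MOL = (
    "mol\n"
    "  hypothetical\n"
    "\n"
    "  1  0  0  0  0  0  0  0  0  0999 V2000\n"
    "    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0\n"
    "M  END\n"
)
SDF = MOL + "> <Old>\n1\n\n$$$$\n" + MOL + "> <Old>\n2\n\n$$$$\n"

stripped = strip_sdf(SDF)
print(stripped)
```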

simon.