Hi, I am a new user of KNIME. I've been looking through the forums for an answer, but I couldn't tell whether what I found was what I wanted, or whether it is possible at all.
I have a few molecular databases I've created, and I wish to combine two at a time as input, then filter out the entries that are duplicates, leaving those that are not duplicates in a separate data stream if possible.
I'm not sure whether I can use a component with this feature already included, or whether I need to write a piece of code for another component to carry out the task.
An example of what I'm thinking of. Two databases with some molecules in common:
I cannot think of any node in the current version that can accomplish this, not even for "simple" data types like strings or numbers. For molecules it would be even more complicated, I guess, because a SMILES string is not unique: the same molecule can be written in more than one way.
I was just looking for a simple way of filtering duplicates out of datasets in a KNIME workflow, exactly in the way Dan asked for. Dan, did you post the Java commands you mentioned in your last message? They would be really helpful for me!
If the molecules have unique values in a column of each table, you can use the Row ID node to put those values into the virtual Row ID column, and then use the Reference Row Filter to include or exclude all row IDs present in the second (reference) table. This node is new in KNIME 1.3.5.
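To make the include/exclude idea concrete outside of KNIME, here is a minimal Python sketch of the same logic. It is not the Reference Row Filter's implementation or any KNIME API; the function name `reference_row_filter`, the `key` argument, and the sample tables are all made up for illustration, and it assumes each row has a key that is unique within its table.

```python
def reference_row_filter(rows, reference_rows, key, include=True):
    """Keep the rows whose key is present (include=True) or absent
    (include=False) in the reference table. Hypothetical helper."""
    # Collect the reference keys once so each lookup is a set membership test.
    reference_keys = {key(row) for row in reference_rows}
    return [row for row in rows if (key(row) in reference_keys) == include]


# Two small, made-up tables keyed by compound name.
table_a = [{"name": "aspirin"}, {"name": "caffeine"}, {"name": "ibuprofen"}]
table_b = [{"name": "caffeine"}, {"name": "paracetamol"}]


def by_name(row):
    return row["name"]


# Rows of table_a that also occur in table_b (the duplicates).
duplicates = reference_row_filter(table_a, table_b, by_name, include=True)
# Rows of table_a that do not occur in table_b.
unique_to_a = reference_row_filter(table_a, table_b, by_name, include=False)
```

Running the filter twice, once in include mode and once in exclude mode, gives the two separate streams asked for in the original question.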
Thanks Gabriel, I hadn't worked with the Reference Row Filter yet, but indeed, it works quite well.
At least it does if the molecules aren't in different protonation states. I use SMILES as the reference, so different protonation states of the same molecule are not recognized as duplicates. Nevertheless, if I put all molecules into the same protonation state with *other* programs, the problem is solved. Or is there a node in KNIME to change the protonation state of molecules?
I have written a node which filters duplicates; as soon as I have access to the nodes4knime project, I will publish it there. The node takes two input streams and compares them. If a match is found, it returns that row from the selected input port in the +ve stream; otherwise it places the row in the -ve stream (no match found). The node works by string comparison. I would not recommend using it for SMILES comparison unless you have standardised the SMILES strings so that identical molecules give identical strings, and even then I wouldn't do this.
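For anyone who wants to see the behaviour described above in a few lines, here is a rough Python sketch of a two-stream split by plain string comparison. It is only an illustration under my own assumptions, not the code of the node mentioned in this post; `split_by_match`, the `smiles` column name, and the sample data are hypothetical.

```python
def split_by_match(selected, other, column):
    """Split the rows of `selected` into (matched, unmatched) by exact
    string comparison of `column` against the rows of `other`."""
    other_values = {row[column] for row in other}
    matched, unmatched = [], []
    for row in selected:
        if row[column] in other_values:
            matched.append(row)      # +ve stream: a match was found
        else:
            unmatched.append(row)    # -ve stream: no match found
    return matched, unmatched


# "CCO" and "OCC" are both valid SMILES for ethanol, but they are
# different strings, so a string comparison treats them as different.
db1 = [{"smiles": "CCO"}, {"smiles": "c1ccccc1"}, {"smiles": "CC(=O)O"}]
db2 = [{"smiles": "OCC"}, {"smiles": "c1ccccc1"}]

positive, negative = split_by_match(db1, db2, "smiles")
```

Here benzene ends up in the +ve stream, while ethanol lands in the -ve stream even though it is present in both databases. That is the reason for the warning about unstandardised SMILES.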