PDB connector requests

Hello,

As you know, the PDB connector node is very helpful and its integration in KNIME is remarkable. However, to improve the node further, could you add the following (if feasible)?

- When you perform a similarity query with a ligand, it would be great if the similarity index could be returned in output 1, in the same row as the ligand. Currently, when you need this information, you have to calculate the indices separately.

- When you perform a similarity query with a ligand, the node returns every molecule (ions, solvent etc.) attached to the PDB code that contains the similar ligand. I think in most cases this is unnecessary, so the user needs to filter the molecules. Maybe an option to choose which molecules you want could be a solution.

- In the flow variables tab, the list of variables is huge and it is not easy to find the variable you want to modify. Maybe you could group some of them?

Thank you,

Nicolas

Nicolas,

Thanks for the feedback.  I will attempt to answer these points in order:

1.  Similarity indices - it would be nice if the PDB supplied this in their webservice return - unfortunately they don't.  I think it would complicate the node if it then tried to calculate a similarity score itself (and it would be misleading, as it would not be the exact score used in the search).  Obviously, if the PDB start returning that in their query, then we can update the node to include it.

2. We entirely agree!  Unfortunately, the PDB query/report system is very 'structure-centric' - i.e. no matter what query you run, it always returns a list of Structure IDs to report on.  It is very frustrating when you then run a ligand report and get a lot of extra ligands which do not relate directly to the query.  We don't have a solution to this at the moment, but there are several possibilities - there are other PDB webservices, e.g. for data around a heterogen ID, which might make some filtering easier.  Would it be useful to have a node which could filter based on either an internal list of solvents, inorganic ions and so on, or an optional list of heterogen IDs to be removed, supplied via a second input port (similar to e.g. the RDKit SaltStripper node)?  Also, there is a ligand query webservice which returns a list of ligand/structure ID combinations - we might be able to build something to query that eventually...

3. Flow variables tab - also agree.  Unfortunately, this tab is built by the KNIME core and so cannot be controlled.  However, we are shortly (within the next week, we hope) going to release a new sister node, which will allow the entire XML query (as you see it in the main query dialogue tab) to be pasted in, or supplied via a flow variable (which will be at the top of the flow variables tab!).  The XML can be copied from the PDB website after clicking on the "query details" link, and both nodes will also supply it on their output side as a flow variable.  There will also be a few improvements following an upgrade to the PDB's reporting service, which should make long queries much quicker again.
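
To make the filtering idea in point 2 a little more concrete, here is a rough Python sketch of what such a filter might do. It is only an illustration under stated assumptions, not the node's implementation: the function name `filter_heterogens`, the built-in exclusion list and the example rows are all made up for this sketch, and the list of heterogen IDs is deliberately short and not exhaustive.

```python
# Illustrative, non-exhaustive heterogen IDs for common solvents and ions.
# This list is an assumption for the sketch, not the list any node uses.
DEFAULT_EXCLUDED = frozenset(
    {"HOH", "SO4", "PO4", "GOL", "EDO", "DMS", "NA", "CL", "MG", "CA", "ZN"}
)


def filter_heterogens(rows, extra_excluded=()):
    """Keep only (structure_id, het_id) rows whose het_id is not excluded.

    rows           -- iterable of (structure_id, het_id) pairs
    extra_excluded -- optional extra heterogen IDs to remove, playing the
                      role of the list supplied via a second input port
    """
    excluded = DEFAULT_EXCLUDED | {h.strip().upper() for h in extra_excluded}
    return [
        (structure_id, het_id)
        for structure_id, het_id in rows
        if het_id.strip().upper() not in excluded
    ]


# Made-up example rows: structure IDs are placeholders.
rows = [
    ("1ABC", "HOH"),
    ("1ABC", "STI"),
    ("1ABC", "SO4"),
    ("2XYZ", "ATP"),
    ("2XYZ", "MG"),
]

kept_default = filter_heterogens(rows)
kept_custom = filter_heterogens(rows, extra_excluded=["atp"])
```

With the built-in list alone, only the `STI` and `ATP` rows remain; passing `atp` as an extra ID (case is normalised) removes the `ATP` row as well.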

Hope some of this helps!

Steve

Thank you, Steve, for your answer. I clearly understand that you are dependent on what the PDB gives you access to, and they clearly lack flexibility.

1 - Yes, it is better for the user to calculate the similarity themselves if the PDB does not return it.

2 - Any help you could provide to filter out non-ligand molecules will be welcome. Currently, in my case, it is not a problem. It becomes a problem when the PDB connector is part of a loop and returns a lot of results, because the resulting filtering may take time.

3 - Such a node would definitely help a lot.
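
For point 1, calculating the similarity separately could look roughly like the Python sketch below. It assumes the fingerprints have already been generated elsewhere (for example with a cheminformatics toolkit) and are available as sets of "on" bit positions; the function name `tanimoto` and the example bit sets are made up for illustration. As noted earlier in the thread, a score computed this way will not necessarily match the one the PDB uses internally for its search.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient between two fingerprints given as on-bit sets."""
    a, b = set(bits_a), set(bits_b)
    union = len(a | b)
    if union == 0:
        # Two empty fingerprints: define the similarity as 0.0 here.
        return 0.0
    return len(a & b) / union


# Made-up on-bit positions for a query ligand and one returned ligand.
query_bits = {1, 4, 7, 9}
hit_bits = {1, 4, 8, 9, 12}

score = tanimoto(query_bits, hit_bits)  # 3 shared bits / 6 distinct bits = 0.5
```

The score could then be appended as an extra column next to each returned ligand.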

Thank you again.

Nicolas

We've just released the new XML query node to the nightly build - it should be available now.  See this post for further details, and this link for a more detailed description of its use.

Regarding the PDB - it may be worth contacting them directly about the similarity search. They are open to suggestions, although there may be a long delay in implementing them unless they get a lot of requests.

Steve