Vernalis PDB Connector node

Dear All,

I am pleased to announce that we have just officially released our first (hopefully of many!) Vernalis KNIME node - PDB Connector.  The node was developed in collaboration with Enspiral Discovery.  Further information, including the update site address and installation instructions, can be found on our website here.

To give an idea of its utility, I have attached a sample workflow (plus screenshot) and a page from the corresponding KNIME report, as a picture is worth 1000 words (the report was too large to upload here, but of course you can generate your own using the workflow)!  Finally, the example workflow description (from the metadata) is copied below; if you have any questions, please email <knime at>






This workflow uses the Vernalis PDB Connector node to connect to the RCSB PDB and retrieve data on newly-released PDB structures.
Once retrieved, any ligands are assessed for likely 'interest', based on counting the number of aromatic rings and ensuring that the MW is greater than a threshold value.  Phosphorus-containing ligands are removed from the resulting list, because the subsequent search takes too long when e.g. ADP and ATP are present!
The list of interesting ligands is then looped back over using another PDB Connector node - passing in the SMILES and running a 95% similarity search to find PDB structures that contain a similar (or identical) ligand to the ones in the new releases.
All of the interesting ligands are also searched for on ChemSpider, ChEMBL, and PubChem - using their respective web services.
The information is then formatted in a report that contains hyperlinks to the external data repositories.
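
As a rough illustration of the 'interest' filter described above, here is a minimal Python sketch. It is not how the workflow itself is built (that uses KNIME nodes); the threshold values, the Ligand record and the LIG1 entry are hypothetical, chosen only to show the logic.

```python
from dataclasses import dataclass

# Hypothetical thresholds: the workflow's actual values are not given in this post.
MIN_AROMATIC_RINGS = 1
MIN_MOL_WEIGHT = 150.0


@dataclass
class Ligand:
    """Illustrative record holding precomputed ligand properties."""
    ligand_id: str
    aromatic_rings: int
    mol_weight: float
    elements: frozenset


def is_interesting(lig):
    """Apply the three rules: no phosphorus, enough aromatic rings, MW above threshold."""
    if "P" in lig.elements:
        # Phosphorus-containing ligands (e.g. ADP, ATP) are dropped up front.
        return False
    return (lig.aromatic_rings >= MIN_AROMATIC_RINGS
            and lig.mol_weight > MIN_MOL_WEIGHT)


# Illustrative input only; LIG1 is a made-up ligand.
ligands = [
    Ligand("ATP", 2, 507.18, frozenset({"C", "H", "N", "O", "P"})),
    Ligand("LIG1", 2, 320.4, frozenset({"C", "H", "N", "O"})),
    Ligand("GOL", 0, 92.09, frozenset({"C", "H", "O"})),
]

interesting = [lig.ligand_id for lig in ligands if is_interesting(lig)]
print(interesting)
```

Only the ligands that pass all three rules would go on to the similarity search loop.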
Nodes from the following community contributors were used:


I am passing gene names via a flow variable to the PDB Connector's text field (TableRow to Variable / TableRow to Variable Loop). However, it returns null. I then tried using the PDB Connector node without passing any parameter, and it returns a blank.

If I use the Vernalis node from the attached example workflow, it always fetches the latest 129 entries, regardless of whether I pass the gene name as a parameter to the text field flow variable. I can't work out how to retrieve PDB IDs by passing the gene name.

Also, which tab is configured in the workflow's PDB Connector node so that it fetches the latest entries? It's hard to find.


Many thanks in advance



Glad to see you are using the node(!) - hopefully I can help with the questions:

Latest entries can be fetched by selecting the 'Latest released structures' option in the 'Deposition' tab.

When you say that the example workflow always passes the same number of results back regardless of the flow variable, my guess is that you are setting TEXT_SEARCH.VALUE but have not manually checked the corresponding check-box to use this value (this can also be set with a variable, TEXT_SEARCH.SELECTED).

It is, unfortunately, quite a complex interface - which is purely a result of the amount of functionality that the RCSB provide!  As a double-check, you can always click the 'Test Query' button on the 'Query Options' tab - which will show you the actual XML that will be sent, which usually shows when something isn't set as you think (this process will also display how many results will be returned with the query).
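
To make the VALUE / SELECTED point concrete, here is a toy Python model of the behaviour described above. It is only a sketch of the logic, not the node's actual code; the function name and the "EGFR" gene name are hypothetical.

```python
def effective_text_query(flow_vars):
    """Return the text-search term that would take effect, or None if the field is inactive.

    Toy model only: the value is ignored unless the matching
    'selected' flag (the check-box) is also set.
    """
    selected = flow_vars.get("TEXT_SEARCH.SELECTED", False)
    value = flow_vars.get("TEXT_SEARCH.VALUE", "")
    if not selected:
        return None
    return value or None


# Setting only the value: the text search is not part of the query.
only_value = effective_text_query({"TEXT_SEARCH.VALUE": "EGFR"})

# Setting the value and ticking the box (or setting the variable): it is used.
value_and_flag = effective_text_query(
    {"TEXT_SEARCH.VALUE": "EGFR", "TEXT_SEARCH.SELECTED": True}
)

print(only_value, value_and_flag)
```

In the first case the rest of the configured query (for example 'Latest released structures') is all that gets sent, which would explain getting the same results every time.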


Kind regards


Yay! It worked. Thanks again!



Hi James,


Would it be possible to add column(s) to the result table that indicate what the query field and query itself was?

In this case column name would be text_search and row values would correspond to each query passed by the variable.





I think the easiest way to do this at present would be to run the output from the node into a Variable to Table Column node. If you are specifying your query from flow variables anyway, then this should work; you will need to connect your flow variable to this second node in addition to the PDB Connector node.


Haven't tried this yet, but will this enable me to correlate the query with the returned values? Especially if some queries do not have results?

I'm assuming you are running multiple queries via a TableRow to Variable Loop Start?  In which case, on each iteration it will add a column to the query output containing whatever the flow variable contains (NB If your query is in several separate flow variables, you can use an edit variable node to combine them into a single query variable first, or just add multiple columns).  The only queries you will not see listed are those for which there are no results.
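
The effect of the loop can be sketched in Python as below. This is only an illustration of the logic, not of the KNIME nodes themselves: run_queries, fake_search, the keep_empty option and the gene names / PDB IDs are all hypothetical. With keep_empty=False it mirrors what is described above (queries without hits leave no rows); keep_empty=True shows one possible way of keeping a placeholder row for them.

```python
def run_queries(queries, search, keep_empty=False):
    """Run each query and tag every result row with the query that produced it."""
    rows = []
    for query in queries:
        hits = search(query)
        if not hits and keep_empty:
            # Placeholder row so that queries without results stay visible.
            rows.append({"text_search": query, "pdb_id": None})
        for pdb_id in hits:
            rows.append({"text_search": query, "pdb_id": pdb_id})
    return rows


def fake_search(query):
    """Stand-in for the real search; returns canned, made-up IDs."""
    canned = {"geneA": ["1ABC", "2DEF"], "geneB": []}
    return canned.get(query, [])


default_rows = run_queries(["geneA", "geneB"], fake_search)
all_rows = run_queries(["geneA", "geneB"], fake_search, keep_empty=True)

for row in all_rows:
    print(row)
```

In the default case geneB simply disappears from the output, so comparing the output against the original list of queries is what tells you which ones returned nothing.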