Query PubMed from KNIME?

Dear Knimers,

I have a trivial question: given a file of PubMed IDs, I would like to download the corresponding records from NCBI.

How can I do this?
I've tried the Generic WebService node with all the SOAP bindings that NCBI offers (from version 1.0, 1.1, 1.2, ... up to the latest 2.0), but none of them worked.

I've also tried the EBI WebServices, but I could not find a "simple" WebService that, given a PMID as a parameter, returns the plain PubMed record...

Given my inclination, I was thinking of writing a Perl script and embedding it as a KNIME node, but I feel that this is not very elegant...
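For comparison, the script-based route could look roughly like the following minimal Python sketch (Python rather than Perl). It assumes NCBI's E-utilities `efetch` endpoint, which is not mentioned elsewhere in this thread. It reads one PMID per line from the file contents and builds a single request URL asking for the records in MEDLINE text format. The sample PMIDs are hypothetical, and the actual download (commented out) needs network access.

```python
from urllib.parse import urlencode

# Assumption: NCBI E-utilities efetch endpoint (not part of the original thread).
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def read_pmids(text):
    """Return the PMIDs in a file with one ID per line, skipping blanks and headers."""
    return [line.strip() for line in text.splitlines() if line.strip().isdigit()]


def efetch_url(pmids):
    """Build one efetch request that returns all records as MEDLINE text."""
    params = {
        "db": "pubmed",
        "id": ",".join(pmids),  # comma-separated list of PMIDs
        "rettype": "medline",
        "retmode": "text",
    }
    return EFETCH_URL + "?" + urlencode(params)


if __name__ == "__main__":
    # Hypothetical file contents: a header line followed by two PMIDs.
    sample = "pmid\n12345678\n23456789\n"
    print(efetch_url(read_pmids(sample)))
    # Downloading requires network access:
    # from urllib.request import urlopen
    # records = urlopen(efetch_url(read_pmids(sample))).read().decode("utf-8")
```

Wrapping such a script in a node is what the poster wanted to avoid; the sketch only shows how little the request itself involves.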


Is there a ready-made node that does this?
Looking forward to your comments :-)

Thanks!

There is a node in the Textprocessing plugin named "Document Grabber". This node can query PubMed, download the search results, parse them, and bring them into KNIME as DocumentCells in a DataTable.

Cheers, Kilian

Hi Kilian,

this sounds great. Could you be a bit more specific about how to do that?



Hi Sebastian,

you simply specify a PubMed query, e.g. "IL6 AND IL12 AND cancer AND mouse", in the node dialog of the DocumentGrabber node (you need the Textprocessing plugin for that node). Then you specify a directory and, optionally, a maximum number of results to download, and execute the node. The node sends the query to PubMed, downloads the (freely available) results (titles, abstracts, etc.), parses them, and converts them into DocumentCells. DocumentCells are DataCells containing documents; for this type of DataCell you need the Textprocessing plugin as well. You can then use other nodes of the Textprocessing plugin to process the text data you got from PubMed.
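To make the "parses them" step more concrete, here is a minimal Python sketch of turning downloaded records into simple title/abstract documents. It is not the DocumentGrabber's implementation. It assumes the records arrive in NCBI's MEDLINE text format, where each field line starts with a tag padded to four characters followed by "- ", continuation lines are indented by six spaces, and records are separated by blank lines. The names `parse_medline` and `to_documents` are hypothetical.

```python
def parse_medline(text):
    """Split MEDLINE-format text into records, each a dict of tag -> list of values."""
    records, current, last_tag = [], {}, None
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current record.
            if current:
                records.append(current)
            current, last_tag = {}, None
        elif line.startswith("      ") and last_tag:
            # Continuation line: extend the most recent value of the previous tag.
            current[last_tag][-1] += " " + line.strip()
        elif len(line) > 4 and line[4] == "-":
            # Field line such as "TI  - Some title": 4-char tag, then "- ".
            tag = line[:4].strip()
            current.setdefault(tag, []).append(line[5:].strip())
            last_tag = tag
    if current:
        records.append(current)
    return records


def to_documents(records):
    """Keep only the fields a text-mining step would typically need."""
    return [
        {
            "pmid": rec.get("PMID", [""])[0],
            "title": rec.get("TI", [""])[0],
            "abstract": rec.get("AB", [""])[0],
        }
        for rec in records
    ]


if __name__ == "__main__":
    # Hypothetical sample with two short records.
    sample = (
        "PMID- 1\n"
        "TI  - A short\n"
        "      title.\n"
        "AB  - Some abstract.\n"
        "\n"
        "PMID- 2\n"
        "TI  - Second record.\n"
    )
    for doc in to_documents(parse_medline(sample)):
        print(doc)
```

Values are kept as lists because some tags, such as AU (author), repeat within a single record.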

Cheers, Kilian

PS: It is possible to specify a PubMed query with all its features and specifics, e.g.:

IL6[All Fields] AND ("interleukin-12"[MeSH Terms] OR "interleukin-12"[All Fields] OR "il12"[All Fields]) AND ("neoplasms"[MeSH Terms] OR "neoplasms"[All Fields] OR "cancer"[All Fields]) AND ("mice"[MeSH Terms] OR "mice"[All Fields] OR "mouse"[All Fields])
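A fielded query like this has to be URL-encoded before it is sent to PubMed. The following minimal Python sketch assumes NCBI's E-utilities `esearch` endpoint (an assumption; the thread does not say how the DocumentGrabber talks to PubMed) and maps a "maximum number of results" onto the `retmax` parameter. The helper `esearch_url` and its default of 100 are hypothetical.

```python
from urllib.parse import urlencode

# Assumption: NCBI E-utilities esearch endpoint; not confirmed as what DocumentGrabber uses.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

# The fielded query from the post above, unchanged.
QUERY = (
    'IL6[All Fields] AND ("interleukin-12"[MeSH Terms] OR "interleukin-12"[All Fields] '
    'OR "il12"[All Fields]) AND ("neoplasms"[MeSH Terms] OR "neoplasms"[All Fields] '
    'OR "cancer"[All Fields]) AND ("mice"[MeSH Terms] OR "mice"[All Fields] '
    'OR "mouse"[All Fields])'
)


def esearch_url(query, max_results=100):
    """Build an esearch request; retmax caps the number of PMIDs returned."""
    params = {"db": "pubmed", "term": query, "retmax": str(max_results)}
    # urlencode escapes the quotes, brackets, parentheses and spaces in the query.
    return ESEARCH_URL + "?" + urlencode(params)


if __name__ == "__main__":
    print(esearch_url(QUERY, max_results=20))
```

The `esearch` response lists the matching PMIDs, which could then be retrieved as full records with `efetch`.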

Thanks a lot! I will try that at once!


I am facing a few issues in KNIME, described below.
1. I am trying to convert a file path value to a variable using the "Value Selection Quickform" node (under the Quickform nodes) and am feeding it to a variable-based File Reader to read the file from the input path. The first time, the output of the "Value Selection Quickform" reflects the correct input value. However, in the next iteration, if the input value (file path) changes, the "Value Selection Quickform" still outputs the previous value (file path). In other words, its output does not change dynamically when the input changes.

Please let me know if there are any nodes that convert an input value to a variable dynamically as the input changes, so that I can use the variable-based File Reader.
2. My workflow has many CSV outputs to track the data. However, when I give this entire workflow to my client, the output path has to be configured again for each individual CSV output. Is there any way to use a flow variable for the CSV output path, so that I need not reconfigure every CSV output for the client's paths?

3. When I transfer the entire workflow to another desktop, the XLS Reader becomes corrupted: it fails to read the file even if we browse to the file at the correct location. At that point we have to delete the current XLS Reader and use a new one.




Hi Shrini,

this is the wrong thread and forum for this kind of question about the File Reader and flow variables; better use the general KNIME forum (http://tech.knime.org/forum/knime-general). Anyway, I attached a workflow which hopefully helps you use flow variables as paths for the File Reader and CSV Writer nodes.

You can use the Quickform nodes (String Input) to specify an input base directory containing all the CSV files to read, as well as an output base directory to write all the output CSV files to. The "List Files" node creates a table with the locations of all CSV files contained in the input directory. Use the "TableRow to Variable Loop Start" node to loop over these locations and read the files using the "File Reader". The next "String Input" Quickform node specifies the output base directory to write CSV files to. The "Java Edit Variable" node appends a dynamically created file name. Finally, the "CSV Writer" uses this file location, which is passed over as a flow variable. When you want to apply this workflow on another machine, you just need to adjust the input and output base directories set by the two Quickform nodes.
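Purely to illustrate the data flow of that loop (this is not a KNIME workflow), here is a minimal Python sketch. The `<input name>_out.csv` naming rule is a hypothetical stand-in for whatever the "Java Edit Variable" node computes, and the function names are hypothetical too. As with the two Quickform nodes, the only machine-specific inputs are the two base directories.

```python
import csv
from pathlib import Path


def output_path(output_dir, input_file):
    """Stand-in for the "Java Edit Variable" step: output base dir + generated name.

    The "<input name>_out.csv" naming rule is hypothetical.
    """
    return Path(output_dir) / f"{Path(input_file).stem}_out.csv"


def process_dir(input_dir, output_dir):
    """Read every CSV in input_dir and write it to output_dir; return written paths."""
    Path(output_dir).mkdir(parents=True, exist_ok=True)
    written = []
    # Like "List Files": collect the CSV files in the input base directory.
    for csv_file in sorted(Path(input_dir).glob("*.csv")):
        # Like the loop start + "File Reader": one file per iteration.
        with open(csv_file, newline="") as f:
            rows = list(csv.reader(f))
        # Any processing steps would operate on `rows` here.
        target = output_path(output_dir, csv_file)
        # Like the "CSV Writer" receiving a per-iteration location as a flow variable.
        with open(target, "w", newline="") as f:
            csv.writer(f).writerows(rows)
        written.append(target)
    return written
```

Moving to another machine then only means calling, for example, `process_dir("C:/data/in", "C:/data/out")` with that machine's (hypothetical) directories.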

I hope this answers your first two questions. I do not fully understand the third question. Is it possible that the structure of the XLS file you want to read on the other desktop is different from the structure of the XLS file with which you configured the node?

Cheers, Kilian