Possible Document Data Extractor bug with PubMed queries

s.roughley · December 8, 2011, 4:55pm

I have run a PubMed query using the 'Document Grabber' node, and then used the 'Document Data Extractor' node to attempt to extract some of the relevant data from the query results. Whilst most of the fields seem to be extracted OK, the Abstract field appears to retrieve only the title. In the Document Viewer node the abstract is displayed correctly, so I suspect a bug in the Document Data Extractor?

Also, are there plans to expand the range of databases covered? The dropdown currently only has PubMed. What about, e.g., PubChem?

Steve

kilian.thiel · December 9, 2011, 10:59am

Hi Steve,

The text of a document resulting from a PubMed query is treated as the full text of the document (even though PubMed delivers only abstracts). If you want to extract the full text of a PubMed document, you need to extract the "Text" field with the "Document Data Extractor".
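
As a rough illustration of how the three fields relate (a sketch only, not KNIME's actual implementation): the record below is a hand-written sample in the style of PubMed's XML, and `extract_fields` is a hypothetical helper. It assumes that the "Text" field is simply the title followed by the abstract, which is how the thread describes it.

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-written record in the style of PubMed's XML output.
SAMPLE = """
<PubmedArticle>
  <MedlineCitation>
    <Article>
      <ArticleTitle>An example title</ArticleTitle>
      <Abstract>
        <AbstractText>First part of the abstract.</AbstractText>
        <AbstractText>Second part of the abstract.</AbstractText>
      </Abstract>
    </Article>
  </MedlineCitation>
</PubmedArticle>
"""


def extract_fields(xml_text):
    """Return title, abstract and combined text for one record (illustrative)."""
    root = ET.fromstring(xml_text)
    title = (root.findtext(".//ArticleTitle") or "").strip()
    # An abstract may be split over several AbstractText elements; join them.
    abstract = " ".join(
        "".join(node.itertext()).strip() for node in root.iter("AbstractText")
    )
    # Assumption: the "Text" field is the title followed by the abstract.
    text = " ".join(part for part in (title, abstract) if part)
    return {"title": title, "abstract": abstract, "text": text}


fields = extract_fields(SAMPLE)
for name, value in fields.items():
    print(f"{name}: {value}")
```

Under these assumptions the abstract and the title are distinct values, and the text contains both of them.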

So far there are no plans for accessing PubChem, but I have put it on the wish list.

Cheers, Kilian

s.roughley · December 9, 2011, 11:19am

Kilian,

The problem appears to be that the Document Data Extractor is extracting the same text (the title) as both title and abstract, even though the text output contains both title and abstract. The Document Viewer appears to be correctly assigning the abstract text and title text.

Steve

kilian.thiel · December 9, 2011, 6:11pm

I see your point and you are right, it's a bug (in the Document Data Extractor). It is fixed already and will be released with the next bugfix release of the plugin (together with the next KNIME bugfix release).

Kilian
