For a literature review I would like to apply the workflow ‘topic modeling on biomedical literature’, also explained in detail in a webinar.
However, I noticed that the Document Grabber node does not allow search queries against any search engine other than PubMed. My current literature review does not focus on biomedical literature, so I’d prefer to use an alternative search engine.
Is there a workaround that lets me use the same workflow, with its loops, to extract documents from, for instance, Google Scholar or Scopus?
I’d prefer to let KNIME run both the search and the PDF extraction. I expect the alternative would be a PDF parser node applied after manually saving all the PDF files.
I hope to hear your thoughts on the matter. Thank you in advance.
As far as I know there isn’t a completely automated way to approach this with a single node in KNIME. There have been a couple of recent threads around this topic on the forum:
The complicating factor with something like Google Scholar is that the search results point to all sorts of different landing pages, and the available formats vary from document to document (HTML, PDF, DOC, etc.).
Now, if you already have a bunch of documents downloaded in a folder, this is much more straightforward, since you can just use the Tika Parser node to ingest the files.
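If you go the manual-download route, it can help to check what actually ended up in the folder before pointing the parser at it, since the mix of formats is the tricky part. Below is a minimal Python sketch of such a check (it could be run standalone or adapted for a scripting node). The function name `inventory_documents` and the `SUPPORTED` extension set are hypothetical and chosen only for illustration; this is not part of KNIME or Tika.

```python
from collections import defaultdict
from pathlib import Path

# Hypothetical list of extensions to keep; adjust to the formats you downloaded.
SUPPORTED = {".pdf", ".html", ".htm", ".doc", ".docx"}


def inventory_documents(folder):
    """Group files under `folder` (including subfolders) by lowercase extension."""
    groups = defaultdict(list)
    for path in sorted(Path(folder).rglob("*")):
        suffix = path.suffix.lower()
        # Skip directories and anything outside the SUPPORTED set.
        if path.is_file() and suffix in SUPPORTED:
            groups[suffix].append(path)
    return dict(groups)
```

Calling `inventory_documents("papers")` on a folder named `papers` (an assumed name) returns a dictionary mapping each extension to the list of matching files, which gives a quick picture of how many PDFs versus HTML pages you collected.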