Big Data Mining

masterfloxxl · February 23, 2016, 12:20pm

Hello fellows,

First of all, I could not find a forum search, so I am not sure whether this has been asked before. If there is a search that I just missed, please point me to it. Many thanks.

My question is quite fundamental right now. I just want to know whether this is possible at all before I spend weeks of my free time developing something that cannot work.

I am a chemist doing my PhD in a field where a lot of papers have been published over the years and are still being published right now. If I wanted to read every paper to find out whether it is useful, it would probably take me a lifetime. As a problem-oriented person, I thought about filtering all the data instead.

So my idea:

Use all published papers that my library has access to as the data basis. These are of course text files (pdf or html or htmlx) with structures, links, pictures, etc.

The fundamental question is: would it be possible to use this as a database for further data analysis? KNIME would need access to these databases somehow (there are of course databases which are used by SciFinder, Reaxys and so on).

Afterwards I would like to build intelligent scripts which can filter the text for the content of interest by keywords, structures, etc. But first things first.
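
To make the keyword-filtering idea concrete, here is a minimal sketch in plain Python (not a KNIME workflow), assuming the papers have already been converted to plain text. The names `Paper`, `keyword_hits` and `filter_papers` and the sample texts are made up for illustration; structure search and PDF/HTML parsing are not covered.

```python
import re
from dataclasses import dataclass


@dataclass
class Paper:
    """Hypothetical container: a title plus the already-extracted plain text."""
    title: str
    text: str


def keyword_hits(text, keywords):
    """Count whole-word, case-insensitive occurrences of each keyword."""
    hits = {}
    for kw in keywords:
        # \b restricts matches to whole words, so "catalyst" does not match "catalysts".
        pattern = r"\b" + re.escape(kw) + r"\b"
        hits[kw] = len(re.findall(pattern, text, flags=re.IGNORECASE))
    return hits


def filter_papers(papers, keywords, min_hits=1):
    """Return (title, total_hits) for papers with at least min_hits, most hits first."""
    scored = []
    for paper in papers:
        total = sum(keyword_hits(paper.text, keywords).values())
        if total >= min_hits:
            scored.append((paper.title, total))
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Invented sample data, only to show how the functions are called.
papers = [
    Paper("A", "Palladium catalyst for coupling. The catalyst is reusable."),
    Paper("B", "A study of protein folding."),
    Paper("C", "Palladium nanoparticles."),
]
ranked = filter_papers(papers, ["palladium", "catalyst"])
print(ranked)
```

A plain count like this is only a first filter; it ignores plurals, synonyms and chemical structures, which is where a dedicated text processing tool would have to take over.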

So, what do you think in general? Is it possible, or should I just scroll through everything in SciFinder?

Greetings

Z1745566 · February 24, 2016, 8:33pm

Hello masterfloxxl,

See if this is of some use for you.

https://tech.knime.org/forum/knime-textprocessing/find-similar-documents

I am also new to KNIME and found this while searching for something else.

Hope this helps you.

//Z1745566

Iris · February 26, 2016, 7:14pm

Hi,

Yes, this is possible with our text processing extension. You can also find examples on our examples server that demonstrate how it can be used.

Best, Iris

(We do have a search for the webpage, which will return the forum entries as well, but I know this is not optimal.)