Fetching sentences containing a particular word

Hi Killian,

Long time. I hope you remember me; I visited your group last year. Great job!

I have two questions.

Can we fetch information from structured databases other than PubMed, such as USPTO or ESPACENET? If so, how?

Is it possible to fetch sentences containing particular words (e.g. TNF and toxicity)? Once we have fetched PubMed abstracts using the Document Grabber node, what would be the ideal way to proceed? Should we convert them into sentences, and if so, how do we then filter those sentences?

Best,

Anil

Hi Anil,

1.) There is no dedicated node that queries databases other than PubMed. In general, there are a few ways to create a list of documents:

- If you have a database that you can access via SQL, you can use the KNIME database nodes to select/extract the text as strings, i.e. select the title, text, authors, etc.

- If you have no such database access, you need to write a script or something comparable to access USPTO or ESPACENET. You can save the documents as DML or SDML (XML formats for representing documents in KNIME), or simply as strings in which sections such as title, text, etc. are separated in a consistent way. Using the File Reader node, you can then read the file into KNIME and create string cells from it.
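Outside KNIME, the two routes above might look roughly like the following Python sketch. It assumes a hypothetical SQLite table `patents` with `title`, `text` and `authors` columns, and it writes one document per line as tab-separated text, which is just one possible way of separating the sections so that the File Reader node can split them into string columns. It does not show how to retrieve records from USPTO or ESPACENET; that part depends on the respective service and is left out.

```python
import csv
import io
import sqlite3


def fetch_documents(conn):
    """Return (title, text, authors) tuples with a plain SQL SELECT.

    The table name "patents" and its columns are hypothetical.
    """
    cur = conn.execute("SELECT title, text, authors FROM patents ORDER BY rowid")
    return cur.fetchall()


def write_tsv(records, out):
    """Write one document per line, tab-separated, with a header row."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["title", "text", "authors"])
    for record in records:
        # Collapse tabs, newlines and repeated spaces so that each
        # document stays on exactly one line.
        writer.writerow([" ".join(str(field).split()) for field in record])


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE patents (title TEXT, text TEXT, authors TEXT)")
    conn.executemany(
        "INSERT INTO patents VALUES (?, ?, ?)",
        [
            ("Anti-TNF antibody", "An antibody that binds TNF.\nLow toxicity.", "Doe, J."),
            ("Kinase inhibitor", "A small-molecule inhibitor.", "Roe, R."),
        ],
    )
    buf = io.StringIO()
    write_tsv(fetch_documents(conn), buf)
    print(buf.getvalue(), end="")
```

To write a real file instead of an in-memory buffer, pass a file opened with `open(path, "w", newline="", encoding="utf-8")` to `write_tsv`; the header row lets the File Reader pick up the column names directly.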

Once you have strings containing the title, text, etc. of the documents, you can create documents from them using the "Strings To Document" node. A document is created from each row, using the specified string columns as title, text, and so on.
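To illustrate what "one document per row" means, here is a rough Python analogue of that step. The `Document` class and `rows_to_documents` function are hypothetical stand-ins rather than KNIME's API, and splitting authors on `;` is an assumption about how the input strings are laid out.

```python
import csv
import io
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Optional


@dataclass
class Document:
    # Hypothetical, minimal stand-in for a KNIME document.
    title: str
    text: str
    authors: List[str] = field(default_factory=list)


def rows_to_documents(
    rows: Iterable[Dict[str, str]],
    title_col: str = "title",
    text_col: str = "text",
    authors_col: Optional[str] = None,
) -> List[Document]:
    """Create one Document per row from the chosen string columns."""
    docs = []
    for row in rows:
        authors = []
        if authors_col and row.get(authors_col):
            # Assumption: multiple authors are separated by ";".
            authors = [a.strip() for a in row[authors_col].split(";") if a.strip()]
        docs.append(Document(title=row[title_col], text=row[text_col], authors=authors))
    return docs


if __name__ == "__main__":
    data = "title\ttext\tauthors\nAnti-TNF antibody\tBinds TNF. Low toxicity.\tDoe, J.; Roe, R.\n"
    rows = csv.DictReader(io.StringIO(data), delimiter="\t")
    for doc in rows_to_documents(rows, authors_col="authors"):
        print(doc)
```

Making the column names parameters mirrors the idea of choosing which string column serves as title and which as text.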

2.) To extract sentences containing specific words, you first need to extract the sentences using the "Sentence Extractor" node. Then you can simply use the "Row Filter" node with a regular expression to filter the sentences. Alternatively, you can create a document for each sentence using the "Strings To Document" node and then use, e.g., the "RegEx Filter" node to filter for the terms you are looking for. Another option is to tag the terms of interest using the "Dictionary Tagger" node with your own dictionary and then filter for these tags.
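For prototyping the filtering logic outside KNIME, the following Python sketch mimics these steps: a naive regex sentence splitter stands in for the Sentence Extractor, a whole-word regular-expression filter stands in for the Row Filter, and a simple dictionary lookup stands in for the Dictionary Tagger plus a tag filter. All function names are hypothetical, and the splitter will mis-handle abbreviations such as "e.g.".

```python
import re
from typing import List, Sequence, Tuple

# Naive splitter: a sentence ends at ".", "!" or "?" followed by whitespace.
# Abbreviations such as "e.g." or "et al." will cause false splits.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def split_sentences(text: str) -> List[str]:
    return [s.strip() for s in _SENTENCE_END.split(text) if s.strip()]


def _word_pattern(term: str):
    # Whole-word, case-insensitive match; re.escape guards special characters.
    return re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)


def filter_all_terms(sentences: Sequence[str], terms: Sequence[str]) -> List[str]:
    """Keep sentences that contain every term (like a regex Row Filter)."""
    patterns = [_word_pattern(t) for t in terms]
    return [s for s in sentences if all(p.search(s) for p in patterns)]


def tag_sentences(sentences: Sequence[str], dictionary: Sequence[str]) -> List[Tuple[str, List[str]]]:
    """Tag each sentence with the dictionary terms it contains and keep only
    sentences with at least one tag (like a dictionary tagger plus a tag filter)."""
    patterns = [(term, _word_pattern(term)) for term in dictionary]
    tagged = []
    for s in sentences:
        tags = [term for term, p in patterns if p.search(s)]
        if tags:
            tagged.append((s, tags))
    return tagged


if __name__ == "__main__":
    abstract = ("TNF inhibitors are widely used. Toxicity of anti-TNF drugs was low! "
                "Unrelated sentence?")
    sentences = split_sentences(abstract)
    print(filter_all_terms(sentences, ["TNF", "toxicity"]))
    print(tag_sentences(sentences, ["TNF", "toxicity"]))
```

Whole-word matching means "toxicity" will not match "hepatotoxicity"; dropping the `\b` anchors gives plain substring matching instead, at the cost of more false hits.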

Hope this helps,

Kilian
