Text Processing Basics

Buglish · February 11, 2013, 10:30pm

Hi,

I am new to KNIME and found the tutorial for text processing, but the action blocks it refers to are not in my version of KNIME.

Would one have to import the relevant library, or is the document just referring to older modules that have been renamed?

Regards,

Bug

kilian.thiel · February 12, 2013, 11:23am

Hi Bug,

Which tutorial do you mean? Is it from the text processing website? If it is not from the website, it may be an older tutorial. To run the examples from the website, the required plugins of course need to be installed. Which modules in particular are missing or cannot be found?

If you want to start with KNIME text mining, I recommend downloading the newest version of KNIME and installing the newest version of the plugins you want or need. For a quick start, the example workflows available on the website (http://tech.knime.org/knime-text-processing-0) may help you. Documentation is available there as well. If you are willing to read a bit more, I also recommend the introduction report: http://tech.knime.org/files/knime_text_processing_introduction_technical_report_120515.pdf

Cheers, Kilian

Buglish · February 13, 2013, 10:38pm

Hi again,

The correct function blocks were available in the latest version after updating all the extensions.

The Document Grabber wants a folder that's empty, so how do you get input data into the first item?

Also, is there a method for unstructured text processing?

Regards,

Bug

knimeknoob · February 14, 2013, 10:44pm

I'm also in search of beginner-level training materials. I learn best through examples and am willing to document what I learn.

I am hoping to use KNIME to extract data from reports and organize it into spreadsheets. The reports I have contain several rows of data relevant to a single record, and I'm struggling with how to associate an index value from the first row in the series with the rest of the rows in that series. The data may be identifiable either by its fixed position within a given row of the series, or by a field label that precedes the data and can be used to determine its location.

Is KNIME capable of doing this?

Are there search terms I should be using in my search, or modules I should focus on?

kilian.thiel · February 19, 2013, 10:49pm

Hi Buglish,

About the DocumentGrabber: the grabber node wants an empty directory because it downloads the documents from PubMed that result from the specified query. The documents are downloaded, parsed and stored in the (empty) directory.

If you already have text you want to process, say a CSV with e.g. title and fulltext columns, you can use the "FileReader" node to read the CSV text data, and then transform the text data, represented as strings, into documents using the "Strings to Documents" node. Once you have documents (Document cells), you can use all the tagger nodes (e.g. to POS tag terms) and the filter nodes (e.g. Stopword filter etc.) provided by the Textprocessing plugin. These nodes work on unstructured text.

Attached you will find an example of how to convert a table containing string columns into documents and use some text mining nodes.
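For readers who think in code, the same flow (read CSV, turn strings into documents, filter stopwords) can be roughly sketched in plain Python. This is only an analogy and not the KNIME API: the `Document` class, the helper functions, the tiny stopword list and the sample data are all made up for illustration.

```python
import csv
import io
import re
from dataclasses import dataclass


# Hypothetical stand-in for a KNIME Document cell (not the KNIME API).
@dataclass
class Document:
    title: str
    terms: list


# Tiny illustrative stopword list; real lists are much longer.
STOPWORDS = {"a", "an", "the", "is", "of", "and", "to", "in"}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def strings_to_documents(csv_text):
    """Read CSV rows (columns: title, fulltext) and build one Document per row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [Document(row["title"], tokenize(row["fulltext"])) for row in reader]


def stopword_filter(doc):
    """Return a copy of the document without stopword terms."""
    return Document(doc.title, [t for t in doc.terms if t not in STOPWORDS])


# Made-up sample data standing in for the CSV file.
SAMPLE_CSV = (
    "title,fulltext\n"
    "Intro,The cat is in the garden\n"
    "Second,An apple and a pear\n"
)

docs = [stopword_filter(d) for d in strings_to_documents(SAMPLE_CSV)]
for d in docs:
    print(d.title, d.terms)
```

In KNIME each of these functions corresponds to a node in the workflow, so no code has to be written there.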

Cheers, Kilian

stringstodocumentsexample.zip

kilian.thiel · February 19, 2013, 11:04pm

Hi knimeknoob,

I didn't get exactly what you want to do, but I'd say yes, it is possible. If each row carries a unique identifier saying which record it belongs to, you can join the rows based on these IDs using the Joiner node. That way you bring the extracted data for a record together in one row.

If you have several text documents for one record and you want to extract e.g. terms from those, you can give them the ID as category. Then you can apply the text mining and extract the ID again afterwards in order to join the extracted data together.

Is this roughly what you want to do? Hope this helps. If you could specify your use case a bit, maybe I could give you some more specific tips.
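To make the idea concrete, here is a small plain-Python sketch of the two steps involved: first copy the ID from the first row of a series onto the following rows, then merge all rows with the same ID into one row. It is not how the Joiner node is implemented. The field names, the sample rows and the assumption that only the first row of each series has a non-empty `id` are invented for the example.

```python
from collections import defaultdict


def carry_forward_ids(rows):
    """Copy the ID from the first row of a series onto the rows that follow.

    Assumption: a row that starts a new record has a non-empty 'id' field.
    """
    current = None
    out = []
    for row in rows:
        if row.get("id"):
            current = row["id"]
        out.append({**row, "id": current})
    return out


def join_by_id(rows):
    """Merge all rows sharing an ID into a single row per record."""
    merged = defaultdict(dict)
    for row in rows:
        for key, value in row.items():
            if key != "id" and value:
                merged[row["id"]][key] = value
    return [{"id": rid, **fields} for rid, fields in merged.items()]


# Made-up report rows: only the first row of each series has an ID.
rows = [
    {"id": "R1", "name": "Alice"},
    {"id": "", "city": "Berlin"},
    {"id": "", "phone": "555-0100"},
    {"id": "R2", "name": "Bob"},
    {"id": "", "city": "Zurich"},
]

records = join_by_id(carry_forward_ids(rows))
for record in records:
    print(record)
```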
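As a rough illustration of the category idea, the following plain-Python sketch tags each text with its record ID, extracts terms, and collects the term counts per ID so they can be joined back to the record later. The sample texts, the simple regex tokenizer and the function names are assumptions for the example and do not correspond to KNIME nodes.

```python
import re
from collections import Counter, defaultdict

# Made-up documents, each tagged with the record ID as its category.
tagged_docs = [
    ("R1", "Pump failure reported, pump replaced"),
    ("R1", "Pump tested"),
    ("R2", "Valve leak"),
]


def extract_terms(text):
    """Very simple term extraction: lowercase alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def term_counts_by_category(docs):
    """Aggregate term counts over all documents sharing a category (record ID)."""
    counts = defaultdict(Counter)
    for category, text in docs:
        counts[category].update(extract_terms(text))
    return counts


counts = term_counts_by_category(tagged_docs)
for record_id, terms in counts.items():
    print(record_id, dict(terms))
```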

Cheers, Kilian