Ways to visualize long texts in tables

Hi,

 

I am working with article abstracts that were pasted into Excel sheets and then imported into Knime (workflow attached).

 

I have used the 'Interactive Table' for my analysis, but the text is hard to read because it appears in one very wide column. I tried increasing the row height, but even at 100 or more the text stays on a single line (I was expecting it to wrap). I also tried hovering the mouse pointer over the text to see whether the full text would appear in a pop-up, but this feature does not seem to exist.

 

I would appreciate help understanding how long texts (250 to 500 words) can be visualized better in Knime.

 

Many thanks,

Cadu

Hi Cadu,

 

you can change the column renderer by right-clicking the column header. There you can choose between String (your current choice) and Multi Line String.

Also, have you installed the Text Processing Extension? http://tech.knime.org/knime-text-processing . It provides a document viewer and a tag cloud viewer for texts.

 

Cheers, Iris

Hi Iris,

 

Many thanks for your answer. I hadn't been using the Text Processing Extension so far. I've started learning it, and I would appreciate some clarification on a couple of points:

 

a) I was able to use the 'Document Viewer' node. However, with a long list of documents I couldn't find a way to search for specific documents; I had to scroll through the full list to find the one I wanted. Is there any way to search the content (find a string) in the 'Document Viewer' node?

 

b) I couldn't get the 'Tag Cloud' to work. For the 'Document Viewer' node, I simply connected it to the 'Strings to Document' output. Does another node need to be connected before the 'Tag Cloud' for it to work? I would like the tag cloud to show the content of the abstracts plus the titles.

 

c) I was able to set 'Multi Line String' in the 'Interactive Table', but each row is still shown as a single line; 'Multi Line String' just centers the line. Besides setting the renderer to 'Multi Line String', is there anything else I need to do?

 

I've attached a workflow with data and the nodes I mentioned above.

 

Any clarification would be very welcome!

 

Cheers,

Cadu

Hi Cadu,

 

a) In the Document Viewer, click the button with the magnifying-glass icon at the top left and type a word or regular expression into the text field on the right, e.g. problems or probl.*; all terms matching that expression are highlighted.
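To illustrate what "all terms matching that expression" means for a pattern such as probl.*, here is a minimal Python sketch. It is not the Document Viewer's implementation; it assumes each whitespace-separated term is matched as a whole and case-sensitively, and the function name highlight_matches is hypothetical.

```python
import re

def highlight_matches(text, pattern):
    """Wrap every term that fully matches `pattern` in [[...]] (hypothetical helper)."""
    regex = re.compile(pattern)
    out = []
    for term in text.split():
        # Strip surrounding punctuation so "problems." is still checked as "problems".
        core = term.strip(".,;:!?()\"'")
        if core and regex.fullmatch(core):
            out.append(term.replace(core, f"[[{core}]]", 1))
        else:
            out.append(term)
    return " ".join(out)

sample = "Two problems were studied. The problem of probability remains."
print(highlight_matches(sample, r"probl.*"))
```

Note that probl.* matches "problems" and "problem" but not "probability", because the fifth letter must be "l".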

 

To filter rows containing a substring, use the Row Filter node. You can specify plain strings, strings with wildcards, or regular expressions to match.
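The difference between wildcard and regular-expression patterns can be sketched in Python as follows. This is only an analogy to the Row Filter options, not the node's code. It assumes the pattern must match the whole cell value, so a substring search needs leading and trailing wildcards (*probl*) or .* in a regex. Python's fnmatch also treats [...] as a character class, which may differ from Knime's wildcard syntax. The rows and the filter_rows helper are hypothetical.

```python
import fnmatch
import re

# Hypothetical table: one dict per row.
rows = [
    {"id": "doc1", "abstract": "Several problems arise when tagging abstracts."},
    {"id": "doc2", "abstract": "We present a tag cloud for scientific texts."},
    {"id": "doc3", "abstract": "The problem of long texts in tables is discussed."},
]

def filter_rows(rows, column, pattern, mode="wildcard"):
    """Keep rows whose `column` value fully matches `pattern`."""
    if mode == "wildcard":
        # '*' = any sequence of characters, '?' = any single character.
        keep = lambda value: fnmatch.fnmatchcase(value, pattern)
    elif mode == "regex":
        rx = re.compile(pattern)
        keep = lambda value: rx.fullmatch(value) is not None
    else:
        # Plain string: exact equality.
        keep = lambda value: value == pattern
    return [row for row in rows if keep(row[column])]

print([r["id"] for r in filter_rows(rows, "abstract", "*probl*")])
print([r["id"] for r in filter_rows(rows, "abstract", ".*probl.*", mode="regex")])
```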

 

b) The Tag Cloud needs a bag-of-words data table with a frequency column.
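As a rough picture of what such a table contains, the Python sketch below builds one row per (document, term) pair with an absolute and a relative term frequency, then sums the counts per term as possible cloud weights. It is an illustration under simplifying assumptions (lower-casing, letters-only tokens), not the Text Processing nodes' exact output. The bag_of_words and cloud_weights names are hypothetical.

```python
import re
from collections import Counter

def bag_of_words(docs):
    """Return (document, term, absolute_tf, relative_tf) rows for each document."""
    table = []
    for doc_id, text in docs.items():
        terms = re.findall(r"[a-z]+", text.lower())  # simplistic tokenizer
        total = len(terms)
        counts = Counter(terms)
        for term, n in sorted(counts.items()):
            table.append((doc_id, term, n, n / total))
    return table

def cloud_weights(table):
    """Sum the absolute frequency of each term over all documents."""
    weights = Counter()
    for _, term, n, _ in table:
        weights[term] += n
    return weights

docs = {"d1": "Text mining of text", "d2": "Mining abstracts"}
table = bag_of_words(docs)
for row in table:
    print(row)
print(cloud_weights(table))
```

The frequency column is what lets a tag cloud size each term; without it there is nothing to scale the words by.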

 

c) I guess your strings contain no line breaks, so the Multi Line renderer has nothing to break them on.
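If the renderer only breaks at existing newline characters, one workaround is to insert line breaks into the text beforehand. The Python sketch below does that with the standard textwrap module. It assumes this runs as a preprocessing step, for example before the data is imported or in a scripting step, and the width of 40 characters is an arbitrary choice. The add_line_breaks name is hypothetical.

```python
import textwrap

def add_line_breaks(text, width=80):
    """Insert newlines so lines break only at whitespace.

    A single word longer than `width` is kept whole on its own line.
    """
    return textwrap.fill(
        text,
        width=width,
        break_long_words=False,   # never split inside a word
        break_on_hyphens=False,   # keep hyphenated words together
    )

abstract = (
    "We study how long texts such as article abstracts can be displayed "
    "in tables so that they remain readable without horizontal scrolling."
)
wrapped = add_line_breaks(abstract, width=40)
print(wrapped)
```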

 

Attached you will find workflows with examples for a) and b).

 

Cheers, Kilian

Hi Kilian,

 

Many thanks for all the information. It's amazing to work with the text processing tools: document viewer, term frequency, tag cloud, etc. My use of Knime has increased a lot since then!

 

One last point: in the 'tag cloud' workflow you sent, one branch starts directly from the 'Strings to Document' node (1), and in the other branch a 'POS tagger' was added (2).

 

In the end, the outputs of (1) and (2) look very similar, except that in (2) the tags are coloured and look nicer than in (1); some tag positions also differ slightly.

Presumably there is a more technical difference between (1) and (2). What does inserting the POS tagger at the beginning of the flow change in the output?

 

Cheers,

Cadu

Hi Cadu,

 

the Tag Cloud can color terms based on the tags assigned to them. To assign tags, e.g. part-of-speech (POS) tags or named-entity tags, a tagger node such as the POS tagger needs to be applied. The terms are then colored according to these tags, so nouns get one color, verbs another, and so on. You can choose the colors in the Tag Cloud view, and you control which tags the terms carry by applying the appropriate tagger node.
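The difference between branches (1) and (2) can be pictured with the small Python sketch below. Without a tagger every term has no tag and gets one default color; with POS tags each term's color follows its tag. The tag values and the tag-to-color mapping are hypothetical (in Knime the colors are chosen in the Tag Cloud view), and the sketch simply uses the first tag seen for each term.

```python
from collections import Counter

# Hypothetical color choices per POS tag.
TAG_COLORS = {"NN": "blue", "VBZ": "red", "JJ": "green"}
UNTAGGED_COLOR = "black"

def cloud_entries(tagged_terms):
    """tagged_terms: (term, tag) pairs; tag is None when no tagger ran.

    Returns (term, frequency, color) triples, most frequent first.
    """
    freq = Counter(term for term, _ in tagged_terms)
    first_tag = {}
    for term, tag in tagged_terms:
        first_tag.setdefault(term, tag)
    return [
        (term, n, TAG_COLORS.get(first_tag[term], UNTAGGED_COLOR))
        for term, n in freq.most_common()
    ]

words = ["text", "mining", "finds", "text", "patterns"]
untagged = [(w, None) for w in words]                         # like branch (1)
tagged = list(zip(words, ["NN", "NN", "VBZ", "NN", "NN"]))     # like branch (2)
print(cloud_entries(untagged))
print(cloud_entries(tagged))
```

The frequencies, and therefore the term sizes, are the same in both cases; only the colors depend on the tags.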

 

Cheers, Kilian