Get the frequency of words by date

 

I have two questions:

1- I have a table with two columns: the date of the post and the post itself. I want to get the word (term) frequency (TF) by date; in other words, the frequency of each word on each day. Can KNIME do this?

2- Sometimes when I filter documents with the Punctuation Erasure node, I notice that some terms still contain punctuation, such as (I'm), ("design) or ('s). How can I remove these?

Hi Shaaban,

1.) Yes, KNIME can do this. Group the data by the date column using the GroupBy node and concatenate the strings (posts). Don't forget to add a separator character (e.g. whitespace) when concatenating the strings. Then create one document for each date containing the corresponding concatenated strings. Then do preprocessing (e.g. filtering) if necessary and finally use the TF node to count term frequencies.

2.) The Punctuation Erasure node does not filter characters like ' since they belong to the word and do not indicate e.g. the end of a sentence. To eliminate these characters, use the Replacer node. In the dialog of the node, a regex can be specified together with a replacement string. The matching substrings will be replaced.
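
For anyone who wants to check the result of 1.) outside KNIME, here is a minimal Python sketch of the same idea: group the posts by date, concatenate them with a whitespace separator, and count the terms per date. The sample rows, the function name `term_frequency_by_date`, and the simple lowercase tokenizer are assumptions for illustration only. It returns absolute counts and is not how the KNIME nodes are implemented.

```python
import re
from collections import Counter, defaultdict

# Hypothetical sample rows: (date of the post, post text)
posts = [
    ("2015-03-01", "Good food and good service"),
    ("2015-03-01", "The food was cold"),
    ("2015-03-02", "Great design, good food"),
]

def term_frequency_by_date(rows):
    """Return {date: Counter(term -> absolute count)}."""
    # Step 1: group the posts by date (the GroupBy step).
    grouped = defaultdict(list)
    for date, text in rows:
        grouped[date].append(text)

    result = {}
    for date, texts in grouped.items():
        # Step 2: concatenate with a whitespace separator so that the last
        # word of one post does not merge with the first word of the next.
        document = " ".join(texts)
        # Step 3: assumed tokenizer: lowercase runs of letters and digits.
        terms = re.findall(r"[a-z0-9]+", document.lower())
        # Step 4: count the terms of this date's document.
        result[date] = Counter(terms)
    return result

tf = term_frequency_by_date(posts)
for date in sorted(tf):
    print(date, tf[date].most_common(3))
```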

Cheers, Kilian

Thank you for the very useful reply. Regarding the second question: how can I remove any non-word characters such as (-/.,') using a regex together with a replacement string?
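
One pattern that might work here is `[^\w\s]` with an empty replacement string: it matches every character that is neither a word character nor whitespace. The sketch below tries it in Python; the helper name `strip_punctuation` is made up for illustration, and whether the node's regex dialect treats the pattern the same way is an assumption worth checking on a few sample terms.

```python
import re

# Every character that is neither a word character (letter, digit,
# underscore) nor whitespace.
NON_WORD = re.compile(r"[^\w\s]")

def strip_punctuation(term, replacement=""):
    """Replace all non-word, non-whitespace characters in `term`."""
    return NON_WORD.sub(replacement, term)

for term in ["I'm", '"design', "'s", "good-food", "-/.,'"]:
    print(repr(term), "->", repr(strip_punctuation(term)))
```

Note that an empty replacement turns I'm into Im and good-food into goodfood. If the parts should stay separate, a single space can be used as the replacement string instead.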

OK, thank you. There is no problem now; everything is working.

Can this be applied to search for a phrase? In your example, could I count the number of times "good food" occurs, or "good *** food"?

To do this you could first use the "Wildcard Tagger" node to search for "good *** food" e.g. with a regex. Then filter all documents containing the tagged terms (matches). Then use the "Dictionary Tagger" to search for "good food" exactly and filter again.
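
To illustrate what such a regex could look like, here is a small Python sketch that counts a two-word phrase while allowing a limited number of words in between. It reads "good *** food" as "good, then up to N arbitrary words, then food", which is an assumption about what the wildcard is meant to cover. The function name `count_phrase` and the sample text are made up, and this is plain Python, not the tagger nodes themselves.

```python
import re

def count_phrase(text, first, second, max_gap=0):
    """Count `first` followed by `second` with at most `max_gap` words between them."""
    pattern = re.compile(
        r"\b" + re.escape(first)
        + r"(?:\s+\w+){0," + str(max_gap) + r"}"  # up to max_gap filler words
        + r"\s+" + re.escape(second) + r"\b",
        re.IGNORECASE,
    )
    return len(pattern.findall(text))

# Hypothetical sample text
text = "Good food here. Good Italian food there. Good and very cheap food."

exact = count_phrase(text, "good", "food")                # "good food" only
one_gap = count_phrase(text, "good", "food", max_gap=1)   # also "good <word> food"
print(exact, one_gap)
```

With `max_gap=0` only the exact phrase is counted; raising `max_gap` widens the match, so the two counts can be compared or subtracted as needed.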

Cheers, Kilian
