Get the frequency of words by date

shaaban · December 11, 2013, 11:26pm

I have two questions:

1- I have a table containing two columns: the date of the post and the post itself. I want to get the word (term) frequency (TF) according to the date; in other words, I want the word frequency for each day. Can KNIME do this?

2- Sometimes when I filter documents with the Punctuation Erasure node, I notice that some terms still contain punctuation, such as (I'm) or ("design) or ('s). How can I remove these?

kilian.thiel · December 12, 2013, 9:01am

Hi Shaaban,

1.) Yes, KNIME can do this. Group the data by the date column using the GroupBy node and concatenate the strings (posts). Don't forget to add a separator character (e.g. whitespace) when concatenating the strings. Then create one document for each date containing the corresponding concatenated strings. Then do preprocessing (e.g. filtering) if necessary and finally use the TF node to count term frequencies.

2.) The Punctuation Erasure node does not filter characters like ' since they belong to the word and do not indicate e.g. the end of a sentence. To eliminate these characters, use the Replacer node. In the dialog of the node, a regex can be specified together with a replacement string. The matching substrings will be replaced.
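
For readers who want to see the logic of both steps outside of KNIME, here is a minimal Python sketch. It is not what the GroupBy, Replacer, or TF nodes do internally; it only mirrors the idea: group posts by date, concatenate them with a whitespace separator, replace non-word characters via a regex with an empty replacement string, and count terms. The sample rows, the function name `term_frequency_by_date`, and the pattern `[^\w\s]` are assumptions chosen for illustration.

```python
import re
from collections import Counter, defaultdict

# Hypothetical sample rows: (date, post text)
posts = [
    ("2013-12-11", 'I\'m happy with the "design of KNIME.'),
    ("2013-12-11", "KNIME's design is good."),
    ("2013-12-12", "Good food, good design."),
]

# Assumed pattern: anything that is neither a word character
# (letter, digit, underscore) nor whitespace
NON_WORD = re.compile(r"[^\w\s]")


def term_frequency_by_date(rows):
    """Return {date: Counter of absolute term counts} for (date, text) rows."""
    grouped = defaultdict(list)
    for date, text in rows:
        grouped[date].append(text)

    result = {}
    for date, texts in grouped.items():
        # Concatenate all posts of one day, separated by whitespace
        document = " ".join(texts)
        # Replace matching characters with an empty replacement string
        cleaned = NON_WORD.sub("", document).lower()
        result[date] = Counter(cleaned.split())
    return result


tf = term_frequency_by_date(posts)
for date, counts in tf.items():
    print(date, counts.most_common(3))
```

With an empty replacement string, "I'm" becomes "im" and "KNIME's" becomes "knimes". If you would rather split such terms, a single space could be used as the replacement string instead. The counts here are absolute; dividing each count by the total number of terms per day would give a relative frequency.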

Cheers, Kilian

shaaban · December 13, 2013, 12:20am

Thank you for the very useful reply. Regarding the second question: how can I remove any non-word characters like (-/.,') using a regex together with a replacement string?

shaaban · December 13, 2013, 3:01pm

OK, thank you. There is no problem now; everything is working.

bruce31511 · April 2, 2014, 7:52pm

Can this be applied to search for a phrase? In your example, could I count the number of times "good food" occurred, or "good *** food"?

kilian.thiel · April 3, 2014, 10:57am

To do this you could first use the "Wildcard Tagger" node to search for "good *** food" e.g. with a regex. Then filter all documents containing the tagged terms (matches). Then use the "Dictionary Tagger" to search for "good food" exactly and filter again.
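
As a rough illustration of the matching idea only (not of how the Wildcard Tagger or Dictionary Tagger nodes work), the following Python sketch counts a two-word phrase with a fixed number of arbitrary words in between. The function name `count_phrase`, the `gap` parameter, and the sample text are hypothetical, and "***" is assumed here to stand for exactly one arbitrary word.

```python
import re


def count_phrase(text, first, second, gap=0):
    """Count occurrences of `first`, then exactly `gap` arbitrary words, then `second`.

    gap=0 matches the exact phrase (e.g. "good food");
    gap=1 matches one word in between (e.g. "good Italian food").
    """
    pattern = re.compile(
        rf"\b{re.escape(first)}(?:\s+\w+){{{gap}}}\s+{re.escape(second)}\b",
        re.IGNORECASE,
    )
    # findall returns the full matches because the only group is non-capturing
    return len(pattern.findall(text))


# Hypothetical sample text
sample = "Good food here. Good Italian food there. Good food again."

exact_count = count_phrase(sample, "good", "food", gap=0)
wildcard_count = count_phrase(sample, "good", "food", gap=1)
print(exact_count, wildcard_count)
```

Matches are counted without overlap and ignore case. Words separated by punctuation rather than whitespace are not treated as adjacent.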

Cheers, Kilian
