Term Frequency (TF): applying it to individual documents and to a set of documents

Hi,

Is it possible to apply Term Frequency (TF) both to individual documents and to a set of documents? If so, how?

(a) By 'individual documents' I mean that TF counts words within each document, so the same word can appear more than once in the output column 'term' (e.g. if the word exists in two different documents, it will appear twice).

(b) By 'set of documents' I mean that TF counts words across the whole set of documents, so each word appears just once in the output column 'term'.

For instance, in the attached workflow, which contains 3 documents, Term Frequency (TF) works per 'individual document' (a): the word 'institutional' appears 3 times in the column 'term'.

I would like to find a way for 'institutional' to appear just once, with its frequency calculated over the whole set of documents (b).

Many thanks in advance,

Cadu

The best I can suggest is to add a GroupBy node after the TF node, group by Term, and aggregate the TF values by sum or by average, depending on what you want.
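
Outside KNIME, the difference between the two aggregations can be sketched in a few lines of plain Python. This is only an illustration of the idea, not what the GroupBy node does internally; the rows in `tf_rows` are made-up data standing in for the TF node's output, and `group_by_term` is a hypothetical helper.

```python
from collections import defaultdict

# Hypothetical per-document TF rows (document, term, absolute count),
# standing in for the output table of the TF node.
tf_rows = [
    ("doc1", "institutional", 2),
    ("doc2", "institutional", 1),
    ("doc3", "institutional", 3),
    ("doc1", "policy", 1),
]


def group_by_term(rows):
    """Collect the per-document counts of each term (group by Term)."""
    groups = defaultdict(list)
    for _doc, term, count in rows:
        groups[term].append(count)
    return groups


groups = group_by_term(tf_rows)

# Sum: total occurrences of the term in the whole set of documents.
tf_sum = {term: sum(counts) for term, counts in groups.items()}

# Average: mean count over the documents that contain the term
# (documents without the term have no row, so they are not counted as 0).
tf_mean = {term: sum(counts) / len(counts) for term, counts in groups.items()}

print(tf_sum)
print(tf_mean)
```

With absolute counts, the sum is the corpus-wide frequency asked for in (b); the average answers a different question, namely how often the term occurs in a typical document that contains it.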

simon

Yes, that is right. (a) Use the TF node to compute the frequency of a term in a single document (select the absolute count option in the dialog). (b) To compute the frequency of a term in the complete corpus, aggregate all TF values of that term (e.g. by sum) with the GroupBy node. To compute term frequencies in subsets of the corpus, append a column containing the subset label and run the GroupBy node on the term column plus the subset label column.
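
For readers who want to check the logic outside the workflow, here is a rough Python sketch of the three levels: per document, whole corpus, and per subset. The documents, the subset labels "A" and "B", and the whitespace tokenization are all assumptions made for illustration; KNIME's own preprocessing and the TF node will tokenize differently.

```python
from collections import Counter

# Hypothetical corpus: (document id, subset label, text).
docs = [
    ("doc1", "A", "institutional reform and institutional change"),
    ("doc2", "A", "institutional policy"),
    ("doc3", "B", "policy change and institutional design"),
]

# (a) Per-document absolute counts: one row per (document, subset, term),
# so a term shows up once for every document that contains it.
rows = []
for doc_id, subset, text in docs:
    # Naive whitespace tokenization, assumed here for simplicity.
    for term, count in Counter(text.lower().split()).items():
        rows.append((doc_id, subset, term, count))

# (b) Whole corpus: group by term and sum the counts,
# so each term appears exactly once.
corpus_tf = Counter()
for _doc_id, _subset, term, count in rows:
    corpus_tf[term] += count

# Subsets: group by (subset label, term) and sum the counts.
subset_tf = Counter()
for _doc_id, subset, term, count in rows:
    subset_tf[(subset, term)] += count

print(corpus_tf["institutional"])
print(subset_tf[("A", "institutional")], subset_tf[("B", "institutional")])
```

In this toy data 'institutional' has three rows at the per-document level, one entry at the corpus level, and one entry per subset, which mirrors grouping on the term column alone versus the term column plus the label column.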

 

Cheers, Kilian
