I'm having trouble understanding the nGram creator node (output table = nGram frequencies):
Document frequency evidently states in how many documents the respective nGram occurs, and sentence frequency likewise states in how many sentences it occurs. Accordingly, the corpus frequency "should" state how many times the nGram occurs in the "corpus". However, which corpus does this refer to? Some predefined corpus like the Brown Corpus? Or is the corpus the whole "bag of words"?
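The term frequency (TF) is the frequency of a term in a document. The inverse document frequency (IDF) is inversely related to the number of documents (in a corpus) that contain a certain term; it is commonly computed as the logarithm of the total number of documents divided by the number of documents containing the term. See http://en.wikipedia.org/wiki/Tf%E2%80%93idf for details.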
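To make the two definitions concrete, here is a minimal Python sketch. It assumes one common variant (relative TF and unsmoothed logarithmic IDF); other variants exist, and this is not necessarily the formula the node itself uses. The function names and the toy corpus are made up for illustration.

```python
import math
from collections import Counter


def term_frequency(term, document):
    """Relative frequency of `term` among the tokens of one document."""
    if not document:
        return 0.0
    counts = Counter(document)
    return counts[term] / len(document)


def inverse_document_frequency(term, corpus):
    """log(N / df), where df is the number of documents containing `term`.

    Returns 0.0 when no document contains the term (an assumption made
    here to avoid division by zero; smoothed variants handle this differently).
    """
    df = sum(1 for document in corpus if term in document)
    if df == 0:
        return 0.0
    return math.log(len(corpus) / df)


# Toy corpus (hypothetical): each document is a list of tokens.
corpus = [
    ["the", "cat", "sat"],
    ["the", "dog", "sat", "down"],
    ["the", "bird", "flew"],
]

tf = term_frequency("cat", corpus[0])
idf = inverse_document_frequency("cat", corpus)
print("tf:", tf, "idf:", idf, "tf-idf:", tf * idf)
```

In this sketch a term that occurs in every document (such as "the") gets an IDF of zero, while a term found in only one document gets the highest IDF. Here the "corpus" is simply the set of documents passed in, which is an assumption of the example.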