“Statistics of document-wise co-occurrence may be collected in two different ways. In the first case, f(w, w0) = f(w0, w) is simply the number of documents that contain both w and w0. Alternatively, we may want to treat each instance of w0 in a document that contains an instance of w to be a co-occurrence event. Therefore if w0 appears three times in a document that contains two instances of w, the former method counts it as one co-occurrence, while the latter as six co-occurrences.”
So far, I have counted the number of docs in which two terms co-occur, but now I want to count the co-occurrence events per doc as part of a doc-relevance metric. The Term Co-Occurrence Counter node, however, counts only two co-occurrences in the example above, just as if w0 appeared only twice. This is a problematic result, as shown next.
Imagine a doc describing the mechanism by which a chemical compound could induce a disease. The name of the disease might appear just a couple of times, whereas the compound might be referred to on multiple occasions as its effect on various bodily tissues is described. In this case, the node’s understated co-occurrence count for disease and compound would not reflect the prominent role their relationship plays in the doc.
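To make the difference concrete, here is a minimal Python sketch of the counts for a single doc, using the quoted example (two instances of w, three of w0). The function name `cooccurrence_counts` and the toy token list are made up for illustration. The `min_pairing` value is only my guess at a formula that reproduces the count of two I get from the node; I have not confirmed that this is how the node actually computes it.

```python
from collections import Counter


def cooccurrence_counts(tokens, w, w0):
    """Return three per-document co-occurrence counts for terms w and w0."""
    counts = Counter(tokens)
    n_w, n_w0 = counts[w], counts[w0]
    return {
        # First method in the quote: 1 if the doc contains both terms, else 0.
        "document_level": int(n_w > 0 and n_w0 > 0),
        # Second method in the quote: every (w, w0) instance pair is an event.
        "instance_level": n_w * n_w0,
        # Assumption: one formula that yields 2 for this example,
        # not the node's documented behaviour.
        "min_pairing": min(n_w, n_w0),
    }


# Toy doc: w appears twice, w0 appears three times.
doc = ["w", "w0", "w", "w0", "w0"]
result = cooccurrence_counts(doc, "w", "w0")
print(result)
```

For the disease/compound case, the instance-level count grows with every additional mention of the compound, whereas a min-style count stays capped at the number of disease mentions.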