I want to generate co-occurrence data for terms in a large collection of documents, but I'm not sure that the term co-occurrence counter can provide what I need.
What I want is a complete co-occurrence matrix (or pivoted equivalent). I want to be able to index the co-occurrence of every term against every other. In matrix form, this would mean a full square (symmetrical) matrix. In pivoted form (which is actually what I want), every term would be listed n-1 times (n being the number of unique terms).
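To make the target format concrete, here is a minimal sketch in plain Python (not a KNIME node or workflow) of the pivoted output I have in mind. It assumes that two terms "co-occur" when they appear in the same document, and that each document is already tokenized into a list of terms. The function name cooccurrence_long and the toy documents are made up purely for illustration.

```python
from collections import Counter
from itertools import permutations


def cooccurrence_long(docs):
    """Return (term_a, term_b, count) rows for every ordered pair of distinct terms.

    Assumption: count = number of documents that contain both terms.
    Pairs that never co-occur are still listed, with a count of 0.
    """
    vocab = sorted({term for doc in docs for term in doc})
    counts = Counter()
    for doc in docs:
        # Use the set of terms so repeats within a document count once.
        for a, b in permutations(sorted(set(doc)), 2):
            counts[(a, b)] += 1
    # Emit the complete pivoted table: each term appears n-1 times as term_a.
    return [(a, b, counts[(a, b)]) for a in vocab for b in vocab if a != b]


# Hypothetical toy input: three tokenized documents.
docs = [
    ["apple", "banana"],
    ["apple", "cherry"],
    ["apple", "banana", "cherry"],
]
rows = cooccurrence_long(docs)
for row in rows:
    print(row)
```

With n unique terms this gives n * (n - 1) rows in total, and both (a, b) and (b, a) are present with the same count, which corresponds to the symmetrical square matrix without its diagonal.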
The term co-occurrence counter node does something along these lines, but as far as I can tell, it does not package the results in a way that can be readily converted to the complete format. The distance matrix calculate node creates just the format I want, but I do not want to calculate distances; I want to record co-occurrence.
Is there some way I can do this? I don't care whether my input data needs to be in the form of documents or a table of counts, as I can easily convert between the two. Or perhaps the solution is to manipulate the co-occurrence counter outputs in some way... but so far I can only picture very complicated solutions along these lines. I've also seen references to the Statistics node producing an occurrences table, but I don't know how to make it provide the co-occurrence information that I am after.
Any help would be appreciated!