Data Generator

I've been looking at the data generator and I've become a bit confused. I'm not clear on the meaning behind Parallel Universes and the influence of dimensionality, cluster number, etc.

Any clarification would be helpful.


Hi Kirk,

I'm surprised that no one has asked earlier :-) The Data Generator is in the standard release by accident (at least a bit). It was released as part of 1.0; at that point we didn't really have a policy on what we should make public, and it has survived since then, mostly because we refrain from removing nodes from the repository: IF someone used the node, their old workflow wouldn't load completely after an upgrade.

Anyway: The motivation behind the node is best described in the paper "Fuzzy Clustering in Parallel Universes". The node generates Gaussian-distributed clusters in different universes (descriptor spaces, i.e. here a set of columns; if you switch to the "DataColumnProperties" tab of the outport view, you'll see which columns belong together).

There are currently no nodes in the standard release that consume the notion of Parallel Universes (but there will be soon). Most people in the group use it to generate clusters in one universe only, that is, they simply enter "2" in both text fields (this generates two clusters in a two-dimensional space). Try it and attach a scatter plot to see the outcome.
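To give a rough idea of what the single-universe setup described above produces, here is a minimal Python sketch. It is not the node's implementation: the function name, the cluster centers, the standard deviation and the number of points per cluster are all assumptions made purely for illustration.

```python
import random


def two_clusters_2d(points_per_cluster=100, std_dev=0.05, seed=0):
    """Return (x, y, cluster_label) rows for two Gaussian clusters in 2-D.

    All parameters and the fixed centers below are hypothetical; the
    actual node may pick centers and spreads differently.
    """
    rng = random.Random(seed)
    centers = [(0.25, 0.25), (0.75, 0.75)]  # assumed cluster centers
    rows = []
    for label, (cx, cy) in enumerate(centers):
        for _ in range(points_per_cluster):
            # Each coordinate is drawn independently around the center.
            rows.append((rng.gauss(cx, std_dev), rng.gauss(cy, std_dev), label))
    return rows


rows = two_clusters_2d()
```

Plotting the first two values of each row against each other should show two separate blobs, which is roughly what the scatter plot attached to the node shows.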


I'm pleased to see the Data Generator node available. I've often had to resort to my own (read: crappy) code to generate test data.

Just to check that I'm clear on the concept: you can generate data for multiple "universes", each with its own independent dimensionality. For any particular universe, there is a specified number of clusters. Seen from any other universe, the points of those clusters are uniformly distributed, so there is no clustering between universes. In other words, if I have 2 clusters in Universe 1, those clusters do not exist in Universe 2, and vice versa.
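To make sure I have this right, here is how I would sketch that reading in Python. This only reflects my understanding of the summary above, not the node's actual code; the function name, the unit-cube value range, the random cluster centers and the standard deviation are all assumptions.

```python
import random


def generate_universes(universes, points_per_cluster=50, std_dev=0.05, seed=0):
    """Generate rows that are clustered in exactly one universe each.

    universes: list of (dimensions, cluster_count) pairs, one per universe.
    Returns a list of (values, home_universe, cluster_index) tuples, where
    values concatenates the columns of all universes in order.
    """
    rng = random.Random(seed)
    # One random center per cluster, inside the unit cube of its universe.
    centers = [
        [[rng.random() for _ in range(dims)] for _ in range(n_clusters)]
        for dims, n_clusters in universes
    ]
    rows = []
    for home, (_, n_clusters) in enumerate(universes):
        for cluster in range(n_clusters):
            for _ in range(points_per_cluster):
                values = []
                for u, (dims, _unused) in enumerate(universes):
                    if u == home:
                        # Gaussian around the cluster center in the home universe.
                        values.extend(rng.gauss(c, std_dev) for c in centers[u][cluster])
                    else:
                        # Uniform noise in every other universe.
                        values.extend(rng.random() for _ in range(dims))
                rows.append((values, home, cluster))
    return rows


# Universe 0: 2 dimensions, 2 clusters. Universe 1: 3 dimensions, 1 cluster.
rows = generate_universes([(2, 2), (3, 1)])
```

With this setup, a scatter plot of the two Universe 0 columns should show its two clusters on top of a uniform background (the points whose cluster lives in Universe 1), while those same two clusters look like plain noise in the Universe 1 columns.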


Perfect. I'm tempted to copy your summary into the node's description. :wink:

The data generation process is a bit strange, but it nicely demonstrates the pros and cons of some algorithms.

Thanks again

You are welcome to use my description, or any modification thereof, if you like. I would be careful, though, as it may result in user confusion. 8^)

Thanks again!