Cluster Analysis with KNIME

Hi, we have to do a group project in class with KNIME.

We have to analyze a dataset, and we want to run a cluster analysis to identify different personas in it.

Can anyone help us do this with KNIME? Do you have any tips? Which node is best?

Our dataset consists of string columns.

Thank you for any hints!

Cady

Hi Cady,

KNIME offers a number of clustering algorithms (see the attached image).

In the Node Guide you can find a couple of examples: https://www.knime.org/nodeguide/analytics/clustering

These examples are also available on the EXAMPLES Server (knime://EXAMPLES/04_Analytics/03_Clustering). Which clustering algorithm to use really depends on the data you have.

You said that your data consists of strings, so you may find the following document clustering example interesting:

https://tech.knime.org/document-clustering-example
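The key point behind the document clustering example is that distance-based clustering needs numbers, so string data first has to be turned into vectors. Below is a minimal sketch in plain Python of that idea, not KNIME's implementation: each string becomes a bag-of-words vector, and a k-means variant groups the vectors (cosine distance for assignment, mean vectors as centroids, and a deterministic farthest-first seeding instead of random seeding). The function names and the survey-style sample answers are hypothetical.

```python
import math
from collections import Counter


def bag_of_words(texts):
    """Turn each string into a term-frequency vector over a shared vocabulary."""
    vocab = sorted({tok for t in texts for tok in t.lower().split()})
    index = {tok: i for i, tok in enumerate(vocab)}
    vectors = []
    for t in texts:
        vec = [0.0] * len(vocab)
        for tok, n in Counter(t.lower().split()).items():
            vec[index[tok]] = float(n)
        vectors.append(vec)
    return vocab, vectors


def cosine_distance(a, b):
    """1 - cosine similarity; empty vectors are treated as maximally distant."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 1.0
    return 1.0 - dot / (na * nb)


def kmeans(vectors, k, iterations=20):
    """k-means variant: farthest-first seeds, cosine assignment, mean centroids."""
    # Seed with the first row, then repeatedly add the row farthest from all seeds.
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        far = max(vectors, key=lambda v: min(cosine_distance(v, c) for c in centroids))
        centroids.append(list(far))
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assign every vector to its nearest centroid.
        labels = [min(range(k), key=lambda c: cosine_distance(v, centroids[c]))
                  for v in vectors]
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


answers = [
    "student likes sports music",
    "student likes music games",
    "retired likes gardening reading",
    "retired likes reading travel",
]
_, vectors = bag_of_words(answers)
labels = kmeans(vectors, k=2)
print(labels)
```

Each resulting cluster label is a candidate "persona"; with real data, the number of clusters k and the way strings are tokenized both need experimentation.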

Hope that helps,

Best,

Vincenzo

Thanks, Vincenzo. Good example.

I am dealing with a similar issue, but our dataset contains about 10 million rows with string fields (first name + last name; address; gender).

We intend to cluster this dataset in order to detect duplicates (the same person written with typos, in a non-normalized format, etc.).

For example, we want to detect that the following two records probably refer to the same person, by putting them into the same cluster:

Santiago Gonzalez; Road Street 1241 NY
Santiago Gonsales; Road St. 1241 NY

Is cluster analysis the right approach for this purpose? In the example posted above, KNIME would need to generate a 10 million x 10 million distance matrix, and I don't think that is feasible for a dataset of this size. Do you have any suggestions?
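To put the distance-matrix concern into numbers, here is a quick back-of-the-envelope estimate. It assumes one 64-bit floating-point value (8 bytes) per distance and ignores any overhead, so real memory use would be higher.

```python
# Rough memory estimate for a full pairwise distance matrix.
n = 10_000_000            # rows in the dataset (from the post above)
bytes_per_value = 8       # assumption: one 64-bit double per distance

full_entries = n * n                  # every ordered pair (i, j)
half_entries = n * (n - 1) // 2       # distinct unordered pairs only (symmetric matrix)

full_tb = full_entries * bytes_per_value / 1e12
half_tb = half_entries * bytes_per_value / 1e12
print(f"full matrix: {full_tb:,.0f} TB, upper triangle only: {half_tb:,.0f} TB")
```

Even storing only one triangle of the symmetric matrix comes to roughly 400 TB, so any approach that needs all pairwise distances is indeed out of reach at this scale.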

Thanks in advance, 

Regards

Clustering sounds like a suboptimal solution for finding duplicates. Take a look at the KNIME Similarity Search examples.
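The general idea behind similarity-based duplicate detection can be sketched in plain Python; this is not how KNIME's Similarity Search nodes are implemented. The sketch swaps in two common record-linkage techniques: *blocking* (only records sharing a cheap key are compared, which avoids the full n x n comparison) and a string similarity score from the standard library's `difflib.SequenceMatcher`. The abbreviation table, the blocking key (house number plus city), and the 0.85 threshold are all hypothetical choices for illustration.

```python
import re
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical abbreviation table; a real one would be much larger.
ABBREVIATIONS = {"st": "street", "rd": "road", "ave": "avenue"}


def normalize(record):
    """Lower-case, drop punctuation and expand a few address abbreviations."""
    tokens = re.findall(r"[a-z0-9]+", record.lower())
    return " ".join(ABBREVIATIONS.get(t, t) for t in tokens)


def blocking_key(normalized):
    """Hypothetical blocking key: first house number plus the last token (city)."""
    tokens = normalized.split()
    number = next((t for t in tokens if t.isdigit()), "")
    return (number, tokens[-1] if tokens else "")


def candidate_duplicates(records, threshold=0.85):
    """Compare records only inside their block instead of across all pairs."""
    blocks = defaultdict(list)
    for i, rec in enumerate(records):
        norm = normalize(rec)
        blocks[blocking_key(norm)].append((i, norm))
    pairs = []
    for members in blocks.values():
        for (i, a), (j, b) in combinations(members, 2):
            score = SequenceMatcher(None, a, b).ratio()
            if score >= threshold:
                pairs.append((i, j, round(score, 3)))
    return pairs


records = [
    "Santiago Gonzalez; Road Street 1241 NY",
    "Santiago Gonsales; Road St. 1241 NY",
    "Maria Lopez; Main Street 17 NY",
]
pairs = candidate_duplicates(records)
print(pairs)
```

The trade-off of blocking is recall: two records with a typo in the blocking key itself (e.g. in the house number) will never be compared, which is why several different blocking keys are often combined in practice.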

Hi all, I'm looking for some guidance: does KNIME support the neural gas clustering method? Or is it called something else in KNIME?