Clustering with data visualization

Learthgz · May 13, 2018, 4:36pm

Hi everyone,

The format of my input file is the following:

PERSON1 BUILDING1
PERSON2 BUILDING4
PERSON3 BUILDING4
PERSON5 BUILDING3
PERSON3 BUILDING2
PERSON3 BUILDING1
PERSON5 BUILDING6
PERSON4 BUILDING6
1000 more rows like this

Each row should be read as “person X visited building Y”.

I simply want to have clusters like this:

Cluster 1 : Persons that visited only 1 building (the same building)
Cluster 2 : Persons that visited only 2 buildings (the same buildings, let's say building 1 & 2)
Cluster 3 : Persons that visited only 2 buildings (the same buildings, let's say building 3 & 4)
Cluster 4 : Persons that visited only 3 buildings (the same buildings)
etc.

Is it possible to do it with KNIME? I tried many nodes with no success.

Thanks in advance,

morebento · May 14, 2018, 1:19am

I would use a GroupBy node to get a unique count of the buildings for each person, then separate filters to select by the buildings they visited.
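
Outside KNIME, the same idea (unique count per person, then a filter) can be sketched in plain Python. This is only an illustration: the variable names are made up, and the rows are the sample rows from the question.

```python
from collections import defaultdict

# Sample rows from the question: (person, building) pairs.
visits = [
    ("PERSON1", "BUILDING1"),
    ("PERSON2", "BUILDING4"),
    ("PERSON3", "BUILDING4"),
    ("PERSON5", "BUILDING3"),
    ("PERSON3", "BUILDING2"),
    ("PERSON3", "BUILDING1"),
    ("PERSON5", "BUILDING6"),
    ("PERSON4", "BUILDING6"),
]

# Collect the distinct buildings per person (a set drops repeat visits).
buildings_by_person = defaultdict(set)
for person, building in visits:
    buildings_by_person[person].add(building)

# Unique count of buildings for each person.
unique_counts = {p: len(b) for p, b in buildings_by_person.items()}

# One example filter: persons who visited exactly one distinct building.
single_building = sorted(p for p, n in unique_counts.items() if n == 1)
print(single_building)
```

The count alone does not say which buildings were visited, so a further filter on the building names is still needed to separate, for example, building 1 visitors from building 4 visitors.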

Martin_K · May 14, 2018, 6:52am

Hi,

See the attached workflow. Hope it helps!

Martin K.

Clustering Data.knwf (10.1 KB)

morebento · May 14, 2018, 7:07am

That’s a very elegant solution.

Learthgz · May 14, 2018, 10:50pm

This is amazing! Could you please briefly explain what you have done? Thanks.

Martin_K · May 15, 2018, 6:11am

Hi Learthgz,

Basically, there are two important “GroupBy” nodes.
The first one creates a list of visited buildings for each person; we can call these lists “clusters”. The same cluster can appear for more than one person. It also uses the “Unique count” aggregation to get the number of visited buildings in each cluster.
The second node outputs the unique clusters, each with its list of persons.
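
For anyone who wants to check the logic without KNIME, the two grouping steps described above can be sketched in plain Python. This is a rough illustration under assumptions, not the contents of the attached workflow: the function name `cluster_by_buildings` and the use of a sorted tuple as the cluster key are hypothetical choices.

```python
from collections import defaultdict


def cluster_by_buildings(visits):
    """Group persons by the exact set of buildings they visited.

    visits: iterable of (person, building) pairs.
    Returns a dict mapping a sorted tuple of buildings (the "cluster")
    to the sorted list of persons who visited exactly those buildings.
    """
    # Step 1 (first grouping): per person, the set of visited buildings.
    buildings_by_person = defaultdict(set)
    for person, building in visits:
        buildings_by_person[person].add(building)

    # Step 2 (second grouping): per unique building set, the persons.
    persons_by_cluster = defaultdict(list)
    for person, buildings in buildings_by_person.items():
        key = tuple(sorted(buildings))  # hashable, order-independent key
        persons_by_cluster[key].append(person)

    return {key: sorted(persons) for key, persons in persons_by_cluster.items()}


# Sample rows from the question.
visits = [
    ("PERSON1", "BUILDING1"),
    ("PERSON2", "BUILDING4"),
    ("PERSON3", "BUILDING4"),
    ("PERSON5", "BUILDING3"),
    ("PERSON3", "BUILDING2"),
    ("PERSON3", "BUILDING1"),
    ("PERSON5", "BUILDING6"),
    ("PERSON4", "BUILDING6"),
]

clusters = cluster_by_buildings(visits)

# Print clusters ordered by number of buildings, then by building names.
for key in sorted(clusters, key=lambda k: (len(k), k)):
    print(len(key), key, clusters[key])
```

The length of each key plays the role of the unique count: a key with two entries is a cluster of persons who visited exactly those two buildings.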
Hope my explanation is sufficient.
Best regards!