Clustering with data visualization

Hi everyone,

The format of my input file is the following:

PERSON1 BUILDING1
PERSON2 BUILDING4
PERSON3 BUILDING4
PERSON5 BUILDING3
PERSON3 BUILDING2
PERSON3 BUILDING1
PERSON5 BUILDING6
PERSON4 BUILDING6
1000 more rows like this

Each row should be read as “person X visited building Y”.

I simply want to have clusters like this:

Cluster 1 : Persons that visited only 1 building (the same building)
Cluster 2 : Persons that visited only 2 buildings (the same buildings, let's say building 1 & 2)
Cluster 3 : Persons that visited only 2 buildings (the same buildings, let's say building 3 & 4)
Cluster 4 : Persons that visited only 3 buildings (the same buildings)
etc.

Is it possible to do this with KNIME? I tried many nodes without success.

Thanks in advance,

I would use a GroupBy node to get a unique count of the buildings for each person, then apply different filters to pick out the buildings they visited.

Hi,

See the attached workflow; I hope it helps!

Martin K.

Clustering Data.knwf (10.1 KB)

That’s a very elegant solution.

This is amazing! Could you please briefly explain what you have done? Thanks.

Hi Learthgz,

Basically, there are two important “GroupBy” nodes.
The first one creates a list of visited buildings for each person; we can call these lists “clusters”. The same cluster may appear for more than one person. This node also uses the “Unique count” aggregation to count the visited buildings in each cluster.
The second node outputs the set of unique clusters, each with its list of persons.
I hope my explanation is sufficient :)
Best regards!
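
For readers who cannot open the attached workflow, the two grouping steps described above can be sketched roughly in plain Python. This is only an illustration of the logic, not the contents of the workflow itself: the function name `cluster_by_buildings` and the variable names are hypothetical, and the sample rows are the ones from the question.

```python
from collections import defaultdict

# Sample rows copied from the question: (person, building) pairs.
visits = [
    ("PERSON1", "BUILDING1"),
    ("PERSON2", "BUILDING4"),
    ("PERSON3", "BUILDING4"),
    ("PERSON5", "BUILDING3"),
    ("PERSON3", "BUILDING2"),
    ("PERSON3", "BUILDING1"),
    ("PERSON5", "BUILDING6"),
    ("PERSON4", "BUILDING6"),
]


def cluster_by_buildings(rows):
    """Group persons that visited exactly the same set of buildings.

    Hypothetical helper mirroring the two grouping steps; it is not
    taken from the attached workflow.
    """
    # Step 1 (like the first GroupBy): collect the distinct buildings per person.
    buildings_per_person = defaultdict(set)
    for person, building in rows:
        buildings_per_person[person].add(building)

    # Step 2 (like the second GroupBy): group persons by identical building sets.
    # A frozenset is hashable, so it can serve as the dictionary key.
    persons_per_cluster = defaultdict(list)
    for person, buildings in buildings_per_person.items():
        persons_per_cluster[frozenset(buildings)].append(person)
    return persons_per_cluster


clusters = cluster_by_buildings(visits)

# Print each cluster: number of buildings, the buildings, and the persons.
ordered = sorted(clusters.items(), key=lambda kv: (len(kv[0]), sorted(kv[0])))
for buildings, persons in ordered:
    print(len(buildings), sorted(buildings), sorted(persons))
```

The length of each building set plays the role of the “Unique count” column, so clusters can be sorted or filtered by how many buildings were visited.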
