Identifying strongly clustered data points

I have a data set of events, and when I record a new item I would like to identify similar events. The initial, simple data set contains a URN, easting, northing and date. Eventually it will have more information on event type and more time data (precise time, day of week, etc.).

My aim is that when one or more new items are added, the full data set is analysed to bring up any similar events, each with a score showing how similar it is. I would then use the score to sort and filter so that only the most similar events are returned. I do not just want a set of clusters based on the best fit across the whole data set.

Is anyone able to point me in the right direction, beyond some simple maths (trigonometry for the distance plus a date difference)? That would work, but it does not seem clever enough.
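For reference, the "simple maths" baseline might look like the sketch below. Because eastings and northings are planar grid coordinates, plain Pythagoras gives the separation with no trigonometry needed, assuming a metre-based grid (the values look like British National Grid, but that is an assumption). The function names are illustrative only.

```python
import math
from datetime import date


def spatial_distance(e1, n1, e2, n2):
    """Straight-line distance between two grid references (metres on a metre-based grid)."""
    return math.hypot(e2 - e1, n2 - n1)


def day_difference(d1, d2):
    """Absolute number of whole days between two dates."""
    return abs((d2 - d1).days)


# Two rows from the example data: (easting, northing, date)
j9t7c7 = (627523, 257885, date(2018, 6, 23))
z1z0z3 = (632080, 259475, date(2018, 6, 30))

dist = spatial_distance(j9t7c7[0], j9t7c7[1], z1z0z3[0], z1z0z3[1])
days = day_difference(j9t7c7[2], z1z0z3[2])
print(f"{dist:.0f} m apart, {days} days apart")  # 4826 m apart, 7 days apart
```

On their own the two numbers are in different units, so they do not yet give a single similarity score.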

Some example data:

Ref Easting Northing Date
J9T7C7 627523 257885 23/06/2018
L3P3L3 632793 264367 30/06/2018
J5X1G9 602185 238969 28/06/2018
N9D3Q4 612422 246446 25/06/2018
O8L0L5 612184 253255 28/06/2018
Y5K5J0 601229 256356 21/06/2018
A3E0V7 631644 235853 26/06/2018
Z1Z0Z3 632080 259475 30/06/2018
R4E1Y8 628048 237484 20/06/2018
L6Z3V1 615038 263243 30/06/2018
E1V4Y5 622783 246729 29/06/2018
J7J0X5 636660 249387 21/06/2018
T5Z6E5 612761 242451 27/06/2018
K6X6M6 602560 251602 21/06/2018
R4K2X6 625430 240024 30/06/2018
W7B3C8 596605 264910 21/06/2018
D8O5T4 604807 257592 28/06/2018
R7B9J0 633304 256405 26/06/2018
V8W1R3 609156 258966 30/06/2018
E0J7W2 620461 253935 26/06/2018
S3J9G1 601836 257057 22/06/2018
V4E3B8 632926 244902 24/06/2018

How about a distance matrix plus a similarity search? It depends a bit on how you want to define similarity for the date field, but from what I understand, custom distance functions are possible in KNIME.
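As a plain-Python illustration of the idea (not KNIME's own nodes), a custom distance can combine space and time by dividing each part by a scale and merging them, and a similarity search then scores a new event against every stored one and keeps the top k. The scales (5 km and 7 days), the `exp(-d)` score mapping and the new record `NEW001` are all hypothetical choices for this sketch.

```python
import math
from dataclasses import dataclass
from datetime import date


@dataclass
class Event:
    ref: str
    easting: float
    northing: float
    when: date


# Hypothetical weights: 5 km of separation counts the same as 7 days.
SPACE_SCALE = 5000.0
TIME_SCALE_DAYS = 7.0


def event_distance(a, b):
    """Divide each component by its scale, then combine them with Pythagoras."""
    ds = math.hypot(a.easting - b.easting, a.northing - b.northing) / SPACE_SCALE
    dt = abs((a.when - b.when).days) / TIME_SCALE_DAYS
    return math.hypot(ds, dt)


def similarity(a, b):
    """Map distance to a score in (0, 1]; 1.0 means same place and same day."""
    return math.exp(-event_distance(a, b))


def most_similar(new, history, k=3, min_score=0.0):
    """Score the new event against all stored events, filter by min_score, return the top k."""
    scored = [(similarity(new, e), e.ref) for e in history if e.ref != new.ref]
    scored = [pair for pair in scored if pair[0] >= min_score]
    scored.sort(reverse=True)
    return scored[:k]


history = [
    Event("J9T7C7", 627523, 257885, date(2018, 6, 23)),
    Event("Z1Z0Z3", 632080, 259475, date(2018, 6, 30)),
    Event("R7B9J0", 633304, 256405, date(2018, 6, 26)),
    Event("E0J7W2", 620461, 253935, date(2018, 6, 26)),
    Event("R4K2X6", 625430, 240024, date(2018, 6, 30)),
]
new_event = Event("NEW001", 628000, 258000, date(2018, 6, 24))  # hypothetical new record

result = most_similar(new_event, history)
for score, ref in result:
    print(f"{ref}: {score:.2f}")
```

Changing the two scales changes what "similar" means. Event type or time of day could later be added as further scaled terms inside `event_distance`.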


Thanks, this is the approach I have started to look at. The eastings/northings work well here since they are just planar grid references, so I am going to start on this aspect and see how well it works in terms of the strength of the similarities.
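Starting with the spatial part only, a minimal sketch of a pairwise distance matrix on the grid references, plus each event's nearest neighbour, could look like this. It uses four rows from the example data and assumes a metre-based grid; `math.dist` needs Python 3.8 or later.

```python
import math

# Four rows from the example data: ref -> (easting, northing)
points = {
    "J9T7C7": (627523, 257885),
    "Z1Z0Z3": (632080, 259475),
    "R7B9J0": (633304, 256405),
    "E0J7W2": (620461, 253935),
}

refs = list(points)
# Symmetric distance matrix: matrix[i][j] is the distance between refs[i] and refs[j]
matrix = [[math.dist(points[a], points[b]) for b in refs] for a in refs]

nearest = {}
for i, a in enumerate(refs):
    # Consider every other event and pick the closest one
    others = [j for j in range(len(refs)) if j != i]
    j = min(others, key=lambda idx: matrix[i][idx])
    nearest[a] = refs[j]
    print(f"{a}: nearest is {refs[j]} at {matrix[i][j]:.0f} m")
```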

Any tips on getting this right?

I don't know much about geospatial data, but I would point you to the Hierarchical Clustering nodes, which can be useful and have a handy JavaScript view (Hierarchical Cluster Assigner (JavaScript)).
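To get a feel for what hierarchical clustering does with this data, here is a minimal single-linkage sketch in plain Python (not the KNIME node). Cutting a single-linkage tree at height h gives the same groups as linking every pair of events no further apart than h and taking connected components, which a small union-find computes. Only easting/northing is used, and the 5 km threshold is a hypothetical choice.

```python
import math
from itertools import combinations

# Four rows from the example data: ref -> (easting, northing)
points = {
    "J9T7C7": (627523, 257885),
    "Z1Z0Z3": (632080, 259475),
    "R7B9J0": (633304, 256405),
    "E0J7W2": (620461, 253935),
}


def single_linkage_clusters(points, threshold):
    """Groups from a single-linkage tree cut at `threshold` (connected components)."""
    parent = {ref: ref for ref in points}

    def find(x):
        # Follow parent links to the root, halving the path as we go
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Merge the groups of any pair closer than the threshold
    for a, b in combinations(points, 2):
        if math.dist(points[a], points[b]) <= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for ref in points:
        groups.setdefault(find(ref), []).append(ref)
    return sorted(sorted(g) for g in groups.values())


clusters = single_linkage_clusters(points, threshold=5000)
print(clusters)
```

Note the chaining: J9T7C7 and R7B9J0 are about 6 km apart but land in the same cluster via Z1Z0Z3. That is typical of single linkage, and it is also why clusters alone do not give the per-event similarity score described in the question.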
