Yes, the problem is non-trivial. I have to say, though, that @andrejz's idea of using the Haversine formula was genius. I'd have done some sketchy angular Pythagoras because I didn't know that formula. Thanks for bringing it up, I learned something.
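For anyone else landing here who hasn't seen it: the Haversine formula gives the great-circle distance between two lat/lon points. A minimal stdlib-only Python sketch (the function name and the mean-Earth-radius constant are my own choices, not from this thread):

```python
from math import radians, sin, cos, asin, sqrt

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres (assumed constant)

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlmb = radians(lon2 - lon1)
    # Haversine formula: a is the squared half-chord length between the points
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))
```

Unlike a flat "angular Pythagoras", this stays accurate even for points far apart or near the poles.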
I haven’t read the full thread, so apologies if someone has already solved this for you.
Check out this presentation from the KNIME 2019 Summit for a geo-clustering example that uses DBSCAN to cluster locations based on lat/lon, treating locations within 50m as the ‘same’ location. Once that ‘same location’ identifier is added to your table, it becomes another field you can group on or use for duplicate removal.
Slides (slide 16 onwards): https://www.knime.com/sites/default/files/S05_02_BGIS_KNIMEConference_Slides_Final_Approved.pdf
Video explanation (Starts at 20 mins): KNIME User Session: Financial Applications of Self-Service ETL and Geospatial Analysis - YouTube
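To make the ‘same location’ idea concrete outside of KNIME: below is a stdlib-only Python sketch that greedily assigns each point to the first cluster whose seed is within 50m, then returns a cluster id per row. This is a simplification of my own, not the DBSCAN approach from the slides (DBSCAN grows clusters by density and is more robust), but the resulting id column is used the same way for grouping or dedup.

```python
from math import radians, sin, cos, asin, sqrt

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    a = (sin(radians(lat2 - lat1) / 2) ** 2
         + cos(phi1) * cos(phi2) * sin(radians(lon2 - lon1) / 2) ** 2)
    return 2 * 6_371_000 * asin(sqrt(a))

def label_same_location(points, eps_m=50.0):
    """Return one cluster id per (lat, lon) point; equal ids mark the 'same' location."""
    seeds, labels = [], []
    for lat, lon in points:
        # reuse the first existing cluster whose seed point is within eps_m
        for cid, (slat, slon) in enumerate(seeds):
            if haversine_m(lat, lon, slat, slon) <= eps_m:
                labels.append(cid)
                break
        else:
            # no nearby cluster: this point becomes the seed of a new one
            seeds.append((lat, lon))
            labels.append(len(seeds) - 1)
    return labels
```

With a real DBSCAN (e.g. scikit-learn's, which accepts `metric="haversine"` on radian coordinates), you'd get the same kind of label column without the order-dependence of this greedy pass.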
Cheers!
Hi @supersharp
I quickly went through the presentation and saw (as you said) that the lat/lon coordinates are clustered with DBSCAN. This is a very interesting approach, although I wonder how fast it would be on large lat/lon files. DBSCAN is a density-based clustering method, which is a wise choice here, though it can be time-consuming if not optimized beforehand.
Thanks a lot for sharing this very nice and informative presentation and work!
Best wishes,
Ael