Yes, the problem is non-trivial. I have to say, though, that @andrejz's idea of using the Haversine formula was genius. I'd have done some sketchy angular Pythagoras because I didn't know that formula. Thanks for bringing it up, I learned something.
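For anyone else landing here who hasn't seen it: the Haversine formula gives the great-circle distance between two lat/lon points. A minimal stdlib-only Python sketch (the function name and the mean-Earth-radius constant are my own choices, not from this thread):

```python
from math import radians, sin, cos, asin, sqrt

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres (assumed constant)

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlmb = radians(lon2 - lon1)
    # Haversine formula: a is the squared half-chord length between the points
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))
```

Unlike a flat "angular Pythagoras", this stays accurate even for points far apart or near the poles.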
I haven’t read the full thread, so apologies if someone has already solved this for you.
Check out this presentation from the KNIME 2019 Summit for a geo-clustering example that uses DBSCAN to cluster locations based on lat/lon, treating locations within 50m as the ‘same’ location. Once that ‘same location’ identifier is added to your table, it becomes another field you can group on or use for duplicate removal.
Slides (slide 16 onwards): https://www.knime.com/sites/default/files/S05_02_BGIS_KNIMEConference_Slides_Final_Approved.pdf
Video explanation (Starts at 20 mins): KNIME User Session: Financial Applications of Self-Service ETL and Geospatial Analysis - YouTube
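To make the ‘same location’ idea concrete outside of KNIME: below is a stdlib-only Python sketch that greedily assigns each point to the first cluster whose seed is within 50m, then returns a cluster id per row. This is a simplification of my own, not the DBSCAN approach from the slides (DBSCAN grows clusters by density and is more robust), but the resulting id column is used the same way for grouping or dedup.

```python
from math import radians, sin, cos, asin, sqrt

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    a = (sin(radians(lat2 - lat1) / 2) ** 2
         + cos(phi1) * cos(phi2) * sin(radians(lon2 - lon1) / 2) ** 2)
    return 2 * 6_371_000 * asin(sqrt(a))

def label_same_location(points, eps_m=50.0):
    """Return one cluster id per (lat, lon) point; equal ids mark the 'same' location."""
    seeds, labels = [], []
    for lat, lon in points:
        # reuse the first existing cluster whose seed point is within eps_m
        for cid, (slat, slon) in enumerate(seeds):
            if haversine_m(lat, lon, slat, slon) <= eps_m:
                labels.append(cid)
                break
        else:
            # no nearby cluster: this point becomes the seed of a new one
            seeds.append((lat, lon))
            labels.append(len(seeds) - 1)
    return labels
```

With a real DBSCAN (e.g. scikit-learn's, which accepts `metric="haversine"` on radian coordinates), you'd get the same kind of label column without the order-dependence of this greedy pass.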
Cheers!
Hi @supersharp
I quickly went through the presentation and saw (as you said) that the lat/lon coordinates are clustered with DBSCAN. This is a very interesting approach, although I wonder how fast it would be on large lat/lon files. DBSCAN is a density-based clustering method, which is a wise choice here, though it can be time-consuming if not optimized beforehand.
Thanks a lot for sharing this very nice and informative presentation and work!
Best wishes,
Ael