I have a workflow that tries to identify groups of people living close to each other. I have their lat/long coordinates, and I use the Palladian lat/long to coordinate and geo distances nodes before feeding the data into DBSCAN. The trouble is, I do not know what distance this is actually clustering on! Epsilon is currently set to 2.0 because I plucked that number out of the air and it worked for what I was looking at at the time, but I would really like to be able to identify people who live within, say, 1 km of each other. Can anyone suggest how I can get it to cluster those people together by distance? Thanks!
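In DBSCAN, epsilon is measured in the same units as whatever distance function feeds it, so the question comes down to what unit that distance is in. The sketch below is plain Python, not the KNIME nodes. It assumes great-circle (haversine) distance in kilometres and a mean Earth radius of 6371 km, so that `eps_km=1.0` really means 1 km. Whether the Palladian geo distances output is in kilometres has not been checked here. The function names and the `people` coordinates are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; assumption, good enough for city scale


def haversine_km(p, q):
    """Great-circle distance in kilometres between two (lat, lon) pairs in degrees."""
    lat1, lon1 = map(math.radians, p)
    lat2, lon2 = map(math.radians, q)
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def dbscan(points, eps_km, min_pts):
    """Plain DBSCAN: returns a cluster id (0, 1, ...) per point, or -1 for noise.

    eps_km is a real distance because haversine_km returns kilometres.
    min_pts counts the point itself.
    """
    n = len(points)
    labels = [None] * n  # None = not visited yet

    def neighbours(i):
        return [j for j in range(n) if haversine_km(points[i], points[j]) <= eps_km]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # noise for now; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise becomes a border point
            if labels[j] is not None:
                continue  # already assigned; border points are not expanded
            labels[j] = cluster
            j_neighbours = neighbours(j)
            if len(j_neighbours) >= min_pts:  # j is a core point: keep growing
                queue.extend(j_neighbours)
    return labels


people = [
    (51.5000, -0.1200),
    (51.5040, -0.1200),  # about 0.44 km north of the first person
    (51.5085, -0.1200),  # about 0.50 km north of the second
    (51.6000, -0.1200),  # about 10 km further north
]
print(dbscan(people, eps_km=1.0, min_pts=2))  # [0, 0, 0, -1]
```

One caveat: DBSCAN chains neighbourhoods, so two members of the same cluster are only guaranteed to be linked by steps of at most epsilon. They can themselves be more than epsilon apart.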
For this, I'd rather suggest using a different clustering mechanism, one that lets you define a real distance threshold instead of DBSCAN's epsilon. I guess you could use the following nodes for that:
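The sketch below illustrates what a real distance threshold means in hierarchical (agglomerative) clustering. It is plain Python, not the KNIME nodes. It uses complete linkage, which is an assumption: the thread does not say which linkage the nodes use. Clusters are merged closest-first and merging stops once the closest pair of clusters is farther apart than `threshold_km`. With complete linkage, every pair of people inside a returned cluster is at most that far apart. Haversine distance in kilometres, the function names and the sample coordinates are hypothetical, as above.

```python
import math
from itertools import combinations

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; assumption


def haversine_km(p, q):
    """Great-circle distance in kilometres between two (lat, lon) pairs in degrees."""
    lat1, lon1 = map(math.radians, p)
    lat2, lon2 = map(math.radians, q)
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def complete_linkage(points, threshold_km):
    """Naive agglomerative clustering (complete linkage) cut at a real distance.

    Returns clusters as lists of point indices.
    """
    # Precompute all pairwise distances once (this is the O(n^2) memory cost).
    dist = {(i, j): haversine_km(points[i], points[j])
            for i, j in combinations(range(len(points)), 2)}

    def d(i, j):
        return dist[(i, j)] if i < j else dist[(j, i)]

    clusters = [[i] for i in range(len(points))]
    while len(clusters) > 1:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            # Complete linkage: distance between clusters = their farthest pair.
            link = max(d(i, j) for i in clusters[a] for j in clusters[b])
            if best is None or link < best[0]:
                best = (link, a, b)
        link, a, b = best
        if link > threshold_km:
            break  # the cheapest remaining merge would exceed the threshold
        clusters[a].extend(clusters[b])
        del clusters[b]  # a < b, so index a is unaffected
    return clusters


people = [
    (51.5000, -0.1200),
    (51.5040, -0.1200),
    (51.5085, -0.1200),
    (51.6000, -0.1200),
]
print(complete_linkage(people, threshold_km=1.0))  # [[0, 1, 2], [3]]
print(complete_linkage(people, threshold_km=0.5))  # [[0, 1], [2], [3]]
```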
Caveat: this is probably not as memory-efficient as DBSCAN, which, as far as I remember, is quite well suited to huge datasets.
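A rough sense of the memory cost: hierarchical clustering works from all pairwise distances, and their number grows quadratically. The figures below assume a hypothetical 10,000 people and one 8-byte double per pair stored once (a triangular matrix). How KNIME actually stores its distance matrix may differ.

```latex
\frac{n(n-1)}{2}\Big|_{n = 10\,000} = 49\,995\,000 \text{ pairs},
\qquad 49\,995\,000 \times 8\ \text{bytes} \approx 400\ \text{MB}
```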
Anyway, good luck!
Do you know how I can set a 1 km distance with that? I tried a distance threshold of "1" (whatever that means!) and everyone ended up in cluster 0. Changing it to 0.5 clustered together people who live much further apart. So I am not really sure what this distance threshold means.
I haven't done it myself. The documentation says that the threshold is normalized to the maximum distance, so you can probably work backwards from that to determine the threshold you want.
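Working backwards, assuming "normalized to the maximum distance" means every distance is divided by the largest pairwise distance $d_{\max}$ in the dataset: the normalized threshold for a real target distance is their ratio. This also explains the behaviour above. With $t = 1$ the cutoff equals the largest distance, so everyone lands in one cluster, and $t = 0.5$ allows half of the largest distance. The $d_{\max} = 40$ km below is a hypothetical value.

```latex
t = \frac{d_{\text{target}}}{d_{\max}},
\qquad d_{\max} = 40\ \text{km},\ d_{\text{target}} = 1\ \text{km}
\;\Rightarrow\; t = \frac{1}{40} = 0.025
```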
Aha, normalisation. That explains it! That could complicate things, though, as the maximum distance could well change. I will have to have a think about how best to do it. Thanks
I would assume that you can "automate" that using flow variables: determine the maximum distance for your current dataset, calculate the desired threshold from it, and feed that value as a variable input into the Hierarchical Cluster Assigner.
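The per-dataset calculation is small. This plain-Python sketch finds the largest pairwise distance and expresses the target as a fraction of it. In a workflow, that number is the value you would pass on as a flow variable. It assumes haversine distances in kilometres and that the node normalizes by dividing by the maximum distance. The function names and coordinates are hypothetical.

```python
import math
from itertools import combinations

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; assumption


def haversine_km(p, q):
    """Great-circle distance in kilometres between two (lat, lon) pairs in degrees."""
    lat1, lon1 = map(math.radians, p)
    lat2, lon2 = map(math.radians, q)
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def normalized_threshold(points, target_km):
    """Express target_km as a fraction of the largest pairwise distance in points.

    Assumes the clustering node divides every distance by the maximum distance.
    """
    max_km = max(haversine_km(p, q) for p, q in combinations(points, 2))
    if max_km == 0:
        raise ValueError("all points coincide; there is no distance to normalize by")
    return target_km / max_km


people = [
    (51.5000, -0.1200),
    (51.5040, -0.1200),
    (51.5085, -0.1200),
    (51.6000, -0.1200),
]
print(normalized_threshold(people, target_km=1.0))  # roughly 0.0899 (1 km / ~11.12 km)
```

Because the threshold is recomputed from each dataset's own maximum, it follows the data when the maximum distance changes. A result of 1 or more would mean the target exceeds every distance in the dataset.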
Hope that helps!