Closest Store algorithm

I have a list of 200 stores, and want to determine the closest store for each of 1MM people. I can create lat/lon for each person and store, but am trying to figure out the most efficient way to determine the closest store for each person.

I am able to get the distance between any two points through the Geo Distance and Column Distance nodes.
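The thread doesn't say which formula the Geo Distance node uses internally. As an assumption, the minimal sketch below uses the haversine formula on a spherical Earth (mean radius 6371 km, an assumed constant) to show what "distance between two lat/lon points" usually means. `haversine_km` is a hypothetical helper name, not a KNIME API.

```python
import math

# Assumption: spherical Earth with mean radius 6371 km (not necessarily what the KNIME node uses).
EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    # Clamp to guard against tiny floating-point overshoot above 1.0.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


# One degree of longitude along the equator is roughly 111.2 km.
print(round(haversine_km(0.0, 0.0, 0.0, 1.0), 1))  # → 111.2
```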

The way I approached it initially was to do a cross join between the two tables, which produces 200MM records (200 stores × 1MM people), and then pick the minimum distance for each customer. That seems super inefficient. Is there a better way?
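A cross join plus "min per customer" is correct, but it materializes every person-store pair. A minimal sketch (plain Python, not a KNIME workflow) of the same brute-force logic that keeps only a running minimum per person is below. It still evaluates 1MM × 200 = 200MM distances, but it never stores them. The sample coordinates (New York, Los Angeles, Chicago) and the names `nearest_store_brute_force`, `people`, and `stores` are illustrative assumptions.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    a = (math.sin((phi2 - phi1) / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(math.radians(lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def nearest_store_brute_force(people, stores):
    """Return (person_id, store_id, distance_km) for each person.

    Same result as cross join + min per person, but only the current best
    store is kept in memory instead of every person-store pair.
    """
    results = []
    for person_id, p_lat, p_lon in people:
        best_id, best_d = None, math.inf
        for store_id, s_lat, s_lon in stores:
            d = haversine_km(p_lat, p_lon, s_lat, s_lon)
            if d < best_d:
                best_id, best_d = store_id, d
        results.append((person_id, best_id, best_d))
    return results


stores = [("A", 40.7128, -74.0060), ("B", 34.0522, -118.2437), ("C", 41.8781, -87.6298)]
people = [("p1", 40.73, -73.99), ("p2", 34.10, -118.30), ("p3", 42.00, -87.70)]
for row in nearest_store_brute_force(people, stores):
    print(row)
```

The cost is still proportional to people × stores, which is what a spatial index (as in the Nearest Join suggestion further down) tries to avoid.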

Thanks for any insights!

Since I don't have real data, I created dummy data to build the workflow. Assuming you have already calculated the distances from each customer's location to each of your stores, here is my (unoptimized) workflow, using distances to 4 store locations.
KNIME_dist_MM.knwf (925.2 KB)

If it doesn't meet your expectations, I hope a KNIME expert can help.

Linux, KNIME 5.1.x


Maybe our Geospatial nodes can help here. The Nearest Join node seems like what you need. There is also an example workflow. If you try it, please let us know if you can achieve better performance than with the cross join. Would be interesting to know.
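The thread doesn't describe how the Nearest Join node works internally, so the following is a swapped-in, general technique rather than its implementation: a minimal k-d tree sketch in plain Python. Stores are converted to points on a unit sphere, because the closest point by straight-line (chord) distance is also the closest by great-circle distance. The tree is built once over the 200 stores, and each person query then usually visits only a few stores instead of all 200. All function names are hypothetical. If SciPy is available, `scipy.spatial.cKDTree(...).query(...)` provides the same idea in optimized form.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def to_xyz(lat, lon):
    """Convert decimal-degree lat/lon to a point on the unit sphere."""
    phi, lam = math.radians(lat), math.radians(lon)
    return (math.cos(phi) * math.cos(lam), math.cos(phi) * math.sin(lam), math.sin(phi))


def build_kdtree(points, depth=0):
    """points: list of (xyz, store_id). Returns (point, axis, left, right) or None."""
    if not points:
        return None
    axis = depth % 3
    points = sorted(points, key=lambda p: p[0][axis])
    mid = len(points) // 2
    return (points[mid], axis,
            build_kdtree(points[:mid], depth + 1),
            build_kdtree(points[mid + 1:], depth + 1))


def nearest(node, target, best=None):
    """Return (squared_chord_length, store_id) of the store closest to target."""
    if node is None:
        return best
    (xyz, store_id), axis, left, right = node
    d2 = sum((a - b) ** 2 for a, b in zip(xyz, target))
    if best is None or d2 < best[0]:
        best = (d2, store_id)
    diff = target[axis] - xyz[axis]
    near, far = (left, right) if diff < 0 else (right, left)
    best = nearest(near, target, best)
    # Only search the far side if the splitting plane is closer than the best match so far.
    if diff * diff < best[0]:
        best = nearest(far, target, best)
    return best


def chord_to_km(d2):
    """Convert a squared chord length on the unit sphere to great-circle km."""
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(d2) / 2))


stores = [("A", 40.7128, -74.0060), ("B", 34.0522, -118.2437), ("C", 41.8781, -87.6298)]
tree = build_kdtree([(to_xyz(lat, lon), sid) for sid, lat, lon in stores])

people = [("p1", 40.73, -73.99), ("p2", 34.10, -118.30), ("p3", 42.00, -87.70)]
for pid, lat, lon in people:
    d2, sid = nearest(tree, to_xyz(lat, lon))
    print(pid, sid, round(chord_to_km(d2), 1))
```

Building the tree is cheap for 200 stores; the gain comes from each of the 1MM queries pruning most of the tree instead of computing 200 distances.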
Thanks to my colleague @tobias.koetter for the hint!

