Is it possible in KNIME to take two data sets that both have latitude and longitude coordinates and filter one based on its proximity to records in the other? The use case is to exclude records from data set 1 that fall within a variable distance (e.g., half a mile) of records in data set 2.
Ideally, I’d like to check the data set #1 accounts against the data set #2 occurrences. If an occurrence falls within a quarter of a mile of any data set #1 account, a “Potential Match” column added to data set #2 should show “Yes” for that occurrence.
So the new data set would look like this, with “Potential Match” showing “Yes” or “No” depending on the result:
Data set 2 - Occurrences
Name | Coordinate | Latitude | Longitude | Potential Match
Issue 1 | 30.360189 -97.67258 | 30.360189 | -97.67258 | Yes/No
Issue 2 | 30.277993 -97.757407 | 30.277993 | -97.757407 | Yes/No
Issue 3 | 30.345073 -97.68933 | 30.345073 | -97.68933 | Yes/No
Issue 4 | 30.355573 -97.6841 | 30.355573 | -97.6841 | Yes/No
Issue 5 | 39.78373 -100.445882 | 39.78373 | -100.445882 | Yes/No
This was very helpful. However, when I try to use those nodes, I get a distance of 0 in my output rows when I connect my occurrences data set to Port 0, the input table for Column Distance.
My account data set has 118 rows of data.
My occurrence data set has 8001 rows of data.
In the example shared, the latitude and longitude coordinates appear to line up row for row, with equal numbers of rows. Is there a way to use those nodes in my particular use case?
I adapted the workflow from the thread above, used the sample data you posted, and came up with this. It uses the Cross Joiner to match up all combinations of your Issues and Accounts. (Note that this is a computationally expensive procedure and will not scale well if your dataset becomes large, so you may need to do some more sophisticated matching or filtering.)
It then calculates distances between all combinations of points, and returns a Yes/No value based on whether the points are within 0.402 km (about a quarter of a mile) of each other.
The output table is a bit messy and could be cleaned up, but maybe this is closer to what you need?
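For anyone who wants to sanity-check the logic outside of KNIME, here is a rough Python sketch of the same idea: pair every occurrence with every account, compute the great-circle distance, and flag the occurrence if any account is within the threshold. This is not the workflow itself. The haversine formula, the Earth radius constant, the function names, and the "Account A" coordinates are all my own assumptions for illustration; the KNIME nodes may calculate distance differently.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # assumed mean Earth radius
THRESHOLD_KM = 0.402         # about a quarter of a mile, as in the workflow


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point rounding before asin.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def flag_potential_matches(occurrences, accounts, threshold_km=THRESHOLD_KM):
    """Return each occurrence with a trailing "Yes"/"No" Potential Match value.

    Both inputs are lists of (name, latitude, longitude) tuples. Every
    occurrence is compared against every account (the cross-join idea).
    """
    flagged = []
    for name, lat, lon in occurrences:
        near = any(
            haversine_km(lat, lon, a_lat, a_lon) <= threshold_km
            for _, a_lat, a_lon in accounts
        )
        flagged.append((name, lat, lon, "Yes" if near else "No"))
    return flagged


# Two occurrences from the sample data; the account is hypothetical.
occurrences = [
    ("Issue 1", 30.360189, -97.67258),
    ("Issue 5", 39.78373, -100.445882),
]
accounts = [("Account A", 30.3605, -97.6730)]

for row in flag_potential_matches(occurrences, accounts):
    print(row)
```

With 118 accounts and 8001 occurrences, the full cross join is 944,118 pairs, which is the scaling concern mentioned above.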