Is it possible in KNIME to take two data sets, both with latitude and longitude coordinates, and filter one based on its proximity to records in the other? The use case is to exclude records from data set 1 that fall within a variable distance (e.g. half a mile) of records in data set 2.
Thank you for sharing that link. I reviewed that discussion and that scenario is based on a single specific location. My scenario is based on two different data sets with multiple variables.
Here is an example of the two data sets that may be of help:
Ideally I’d like to check the data set #1 accounts against the data set #2 occurrences. If an occurrence falls within a quarter of a mile of any data set #1 account, a new “Potential Match” column in data set #2 should read “Yes” for that occurrence, and “No” otherwise.
So the new data set would look like this:
Data set 2 - Occurrences
Name | Coordinate | Latitude | Longitude | Potential Match
Issue 1 | 30.360189 -97.67258 | 30.360189 | -97.67258 | Yes/No (Depending on the result would show “Yes” or “No”)
Issue 2 | 30.277993 -97.757407 | 30.277993 | -97.757407 | Yes/No (Depending on the result would show “Yes” or “No”)
Issue 3 | 30.345073 -97.68933 | 30.345073 | -97.68933 | Yes/No (Depending on the result would show “Yes” or “No”)
Issue 4 | 30.355573 -97.6841 | 30.355573 | -97.6841 | Yes/No (Depending on the result would show “Yes” or “No”)
Issue 5 | 39.78373 -100.445882 | 39.78373 | -100.445882 | Yes/No (Depending on the result would show “Yes” or “No”)
This was very helpful. However, when I try to use those nodes, every row of my output shows a distance of 0 when I connect my occurrences data set to Port 0 (the input table) of Column Distance.
My account data set has 118 rows of data.
My occurrence data set has 8001 rows of data.
In the example shared, the latitude and longitude coordinates appear to line up row for row. Is there a way to use those nodes in my particular use case, where the two tables have different numbers of rows?
I adapted the workflow from the thread above, used the sample data you posted, and came up with this. It uses the Cross Joiner to match up all combinations of your Issues and Accounts. (Note that this is a computationally expensive procedure and will not scale well if your dataset becomes large, so you may need to do some more sophisticated matching or filtering.)
It then calculates the distance between every combination of points and returns a Yes/No value based on whether the points are within 0.402 km (about a quarter of a mile) of each other.
The output table is a bit messy and could be cleaned up, but maybe this is closer to what you need?
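If it helps to sanity-check the logic outside KNIME, here is a minimal Python sketch of the same cross-join idea: compare every occurrence against every account and flag the occurrence when any account is within the threshold. It uses the occurrence coordinates posted above. The two account rows are hypothetical placeholders (the real accounts table is not reproduced in this thread), and the haversine formula on a spherical Earth is my own assumption, not necessarily what the KNIME distance nodes compute.

```python
import math

# Mean Earth radius in km; treating the Earth as a sphere is an assumption.
EARTH_RADIUS_KM = 6371.0088


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def flag_potential_matches(occurrences, accounts, threshold_km=0.402):
    """Return (name, "Yes"/"No") per occurrence.

    An occurrence is "Yes" if at least one account lies within threshold_km.
    This checks every occurrence/account pair, like a cross join.
    """
    results = []
    for name, lat, lon in occurrences:
        near = any(
            haversine_km(lat, lon, a_lat, a_lon) <= threshold_km
            for _, a_lat, a_lon in accounts
        )
        results.append((name, "Yes" if near else "No"))
    return results


# Occurrences taken from the sample table in this thread.
occurrences = [
    ("Issue 1", 30.360189, -97.67258),
    ("Issue 2", 30.277993, -97.757407),
    ("Issue 3", 30.345073, -97.68933),
    ("Issue 4", 30.355573, -97.6841),
    ("Issue 5", 39.78373, -100.445882),
]

# Hypothetical accounts, made up for illustration only.
accounts = [
    ("Account A", 30.360500, -97.672000),
    ("Account B", 30.277993, -97.757407),
]

for name, match in flag_potential_matches(occurrences, accounts):
    print(name, match)
```

With 118 accounts and 8001 occurrences this loop evaluates 944,118 pairs, which is still manageable, but the pair count grows multiplicatively, so for much larger tables a coarse pre-filter (for example, discarding pairs whose latitudes differ by more than the threshold) would probably be worth adding before the exact distance check.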