Filter Records based on geo coordinates

Greetings,

Is it possible with KNIME to take two data sets, both of which have latitude and longitude coordinates, and filter one based on proximity to records in the other? The use case is to exclude records from data set 1 that fall within a variable distance (e.g., half a mile) of records in data set 2.

Example Data set 1:

Name | Date | Latitude | Longitude

Example Data set 2:

Name | Date | Latitude | Longitude

Thank you and look forward to your reply.

Signed,
John

Hi @jhandatx,

This topic may be of help:

To be more helpful, I need a sample dataset as input and your desired output.

:blush:

Greetings Armin,

Thank you for sharing that link. I reviewed that discussion and that scenario is based on a single specific location. My scenario is based on two different data sets with multiple variables.

Here is an example of the two data sets that may be of help:

Data set 2 - Occurrences:

Name | Coordinate | Latitude | Longitude

Issue 1 | 30.360189 -97.67258 | 30.360189 | -97.67258
Issue 2 | 30.277993 -97.757407 | 30.277993 | -97.757407
Issue 3 | 30.345073 -97.68933 | 30.345073 | -97.68933
Issue 4 | 30.355573 -97.6841 | 30.355573 | -97.6841
Issue 5 | 39.78373 -100.445882 | 39.78373 | -100.445882

Data set 1 - Accounts

Name | Latitude | Longitude

Account 1 | 30.32238177 | -97.68083156
Account 2 | 30.2304008 | -97.72977552
Account 3 | 30.32545194 | -97.73798132
Account 4 | 30.29467933 | -97.62999101
Account 5 | 30.45092881 | -97.78260329

Ideally, I’d like to check the data set 1 accounts against the data set 2 occurrences. If an occurrence lies within a quarter of a mile of any account in data set 1, a new “Potential Match” column on data set 2 should read “Yes”; otherwise it should read “No”.

So the new data set would look like this:

Data set 2 - Occurrences

Name | Coordinate | Latitude | Longitude | Potential Match

Issue 1 | 30.360189 -97.67258 | 30.360189 | -97.67258 | Yes/No (Depending on the result would show “Yes” or “No”)
Issue 2 | 30.277993 -97.757407 | 30.277993 | -97.757407 | Yes/No (Depending on the result would show “Yes” or “No”)
Issue 3 | 30.345073 -97.68933 | 30.345073 | -97.68933 | Yes/No (Depending on the result would show “Yes” or “No”)
Issue 4 | 30.355573 -97.6841 | 30.355573 | -97.6841 | Yes/No (Depending on the result would show “Yes” or “No”)
Issue 5 | 39.78373 -100.445882 | 39.78373 | -100.445882 | Yes/No (Depending on the result would show “Yes” or “No”)

Let me know if I can provide further background.

Thank you, and look forward to your reply.

Signed,
John

Hi @jhandatx -

You might also find this thread helpful:

Greetings Scott,

This was very helpful. However, when I use those nodes and connect my occurrences data set to Port 0 of the input table for Column Distance, every row in my output shows a distance of 0.

My account data set has 118 rows of data.

My occurrence data set has 8001 rows of data.

In the example shared, it appears the latitude and longitude coordinates in the two tables line up one-to-one. Is there a way of using those nodes in my particular use case, where the two tables have different numbers of rows?

Thank you and look forward to your reply.

Signed,
John

Hi @jhandatx -

I adapted the workflow from the thread above, used the sample data you posted, and came up with this. It uses the Cross Joiner to match up all combinations of your Issues and Accounts. (Note that this is a computationally expensive procedure and will not scale well if your dataset becomes large, so you may need to do some more sophisticated matching or filtering.)

It then calculates distances between all combinations of points, and returns a Yes/No value based on whether the points are within 0.402 km (roughly a quarter of a mile) of each other.
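For anyone who wants to see the same logic outside KNIME, here is a minimal Python sketch of the cross join plus distance threshold. It is not a translation of the attached workflow: the haversine formula, the Earth radius value, and the function names `haversine_km` and `flag_potential_matches` are assumptions made for illustration, and the geo nodes may compute distance differently. The coordinates are the sample rows posted earlier in this thread.

```python
import math
from itertools import product

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius for this sketch
QUARTER_MILE_KM = 0.402   # threshold mentioned in the post above


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    # min() guards against rounding pushing sqrt(a) slightly above 1
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


# Sample rows from the thread: (name, latitude, longitude)
issues = [
    ("Issue 1", 30.360189, -97.67258),
    ("Issue 2", 30.277993, -97.757407),
    ("Issue 3", 30.345073, -97.68933),
    ("Issue 4", 30.355573, -97.6841),
    ("Issue 5", 39.78373, -100.445882),
]
accounts = [
    ("Account 1", 30.32238177, -97.68083156),
    ("Account 2", 30.2304008, -97.72977552),
    ("Account 3", 30.32545194, -97.73798132),
    ("Account 4", 30.29467933, -97.62999101),
    ("Account 5", 30.45092881, -97.78260329),
]


def flag_potential_matches(issues, accounts, radius_km=QUARTER_MILE_KM):
    """Return {issue name: "Yes"/"No"}, "Yes" if any account is within radius_km."""
    nearest = {name: math.inf for name, _, _ in issues}
    # product() plays the role of the Cross Joiner: every issue/account pair
    for (iname, ilat, ilon), (_, alat, alon) in product(issues, accounts):
        d = haversine_km(ilat, ilon, alat, alon)
        nearest[iname] = min(nearest[iname], d)
    return {name: "Yes" if d <= radius_km else "No" for name, d in nearest.items()}


if __name__ == "__main__":
    for name, flag in flag_potential_matches(issues, accounts).items():
        print(name, flag)
```

On the five sample rows, no issue lies within 0.402 km of any account, so every flag comes out as No; passing a larger `radius_km` is a quick way to check that the logic responds. Keeping only the nearest distance per issue collapses the cross-joined pairs back to one row per occurrence. At the sizes mentioned earlier (118 accounts by 8001 occurrences) this is 944,118 pairs, which a plain loop can handle, but the pair count grows with the product of the two table sizes.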

The output table is a bit messy and could be cleaned up, but maybe this is closer to what you need?

[Screenshot: KNIME Analytics Platform]

LatLonDistancewithGeoNodes.knwf (19.6 KB)

Greetings Scott,

That was very helpful and solved my challenge. Thank you :grinning:

Signed,
John
