Address Cleaning in Knime

Hello All,

I need help correcting messy addresses received from clients and getting the nearest correct geocodes for those addresses. I am attaching a file for your reference. Please help me with a process or Python code to correct the addresses in KNIME or Python.

Thank you in advance for your support.

Regards,
Samir Nagure

Address_Correction - Knime.xlsx (11.3 KB)

Hello Samir,

I’ll try to give you a hint. If you need more precise information, I can help after work.

As for address cleaning, you can follow the workflow in this post: address Data Standardization - #5 by bruno29a. I recommend using the HERE API, since it gives you more free requests and it's more precise. The API call should automatically try to turn the messy input into an ordered output. If that doesn't work, we can think about another way to do it.

As for computing the distance between each address and its nearest geocode, follow these steps. First, take a representative point for each geocode (e.g. its centre). Second, use the Cross Joiner node to build every combination of address and geocode. Third, refer to this post, calculate distance between two latitude longitude points - #3 by danielesser, to calculate the distance for each address-geocode pair. Fourth, sort the results so that, for each address, the geocode with the smallest distance comes first (Sorter node). Finally, use a GroupBy node to keep the first result for each address (Group: addresses, Manual Aggregation: geocode, show first).
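For anyone who prefers to do this step in Python, the same cross join, distance, sort and "first per address" logic can be sketched roughly as below. This is only a minimal sketch, not the linked workflow: the function names, the sample coordinates and the zone names are made up for illustration, and the distance is the straight-line Haversine distance on a sphere with a mean Earth radius of 6371 km.

```python
import math
from itertools import product

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumption: spherical Earth)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_geocode(addresses, geocodes):
    """Both arguments map a name to (lat, lon); returns address -> (geocode, km)."""
    # "Cross Joiner": pair every address with every geocode and compute the distance
    pairs = [
        (addr, code, haversine_km(*a_pt, *g_pt))
        for (addr, a_pt), (code, g_pt) in product(addresses.items(), geocodes.items())
    ]
    # "Sorter": order by address, then by ascending distance
    pairs.sort(key=lambda row: (row[0], row[2]))
    # "GroupBy, first": keep only the first (closest) row per address
    nearest = {}
    for addr, code, km in pairs:
        nearest.setdefault(addr, (code, km))
    return nearest


# Hypothetical sample data, only to show the shape of the inputs
addresses = {"addr_1": (12.9716, 77.5946), "addr_2": (12.9600, 77.5100)}
geocodes = {"zone_A": (12.9719, 77.6412), "zone_B": (12.9611, 77.5127)}
result = nearest_geocode(addresses, geocodes)
```

The cross join grows as addresses times geocodes, so for many thousands of rows it may be worth pre-filtering by city or postcode first.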

Hope it helps. Let me know if you need further information. 🙂

RB

Hi RB,

Thank you so much for your response. I am very new to KNIME and I tried following the above workflow, but I am not getting any output from it. If you could help me with the address correction part, I will manage to get the geocodes for those addresses with Python. I already have Python code to extract geocodes.
Please let me know when your shift ends, so we can connect if it's okay with you.

My email id is sameer.nagure@gmail.com

Regards,
Samir

Hello Samir,

I should be free at 18:00 UTC+1. Ok for you?

RB

It will be 23:30 IST, but that is okay for me as it's an urgent requirement for us.

Thank you RB!!

Regards,
Samir

Hi @lelloba @Samir_Nagure, just to make sure: the geo distances referred to in the thread are not driving distances. They are based on the Haversine formula, which gives the "straight-line" (great-circle) distance.

If you can’t add extensions, I have a component that does this calculation:

If you want the actual driving distance, you can use a copy of the above loop to call another API service called routing (again from HERE). The output is the driving time and distance.

RB

Hi Bruno and RB, what I am looking for is a way to split the street address, automatically keep the correct part that matches the Google database, and fetch the geocode for that address.

Example Address: No 34,1st main 5th cross Kaverylayout nagarbhavi main road moodalapaly,No.9/17, Nagarabhavi Main Road, Govindaraja Nagar, Amarajyothi Nagar, Vijayanagar, 560040, Bangalore, Karnataka, India

In the above address, "nagarbhavi main road" is mentioned twice. It should be considered only once, and the geocode for "nagarbhavi main road" should be fetched from the Google API database.
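One possible way to drop such repeats before geocoding is sketched below in plain Python. It is only an illustration under several assumptions: the address is split on commas, a later component is treated as a repeat when it fuzzily matches a run of earlier words, and the 0.9 similarity threshold and the function names are arbitrary choices made for this example. It keeps the first occurrence and does not correct spellings.

```python
import re
from difflib import SequenceMatcher


def _words(text):
    """Lower-case a component and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def _is_repeat(component_words, seen_words, threshold):
    """True if the component fuzzily matches a same-length run of earlier words."""
    n = len(component_words)
    target = " ".join(component_words)
    for start in range(len(seen_words) - n + 1):
        window = " ".join(seen_words[start:start + n])
        if SequenceMatcher(None, target, window).ratio() >= threshold:
            return True
    return False


def drop_repeated_components(address, threshold=0.9):
    """Remove comma-separated components that repeat an earlier phrase."""
    kept, seen_words = [], []
    for component in (part.strip() for part in address.split(",")):
        words = _words(component)
        if not words:
            continue  # skip empty components such as ", ,"
        if _is_repeat(words, seen_words, threshold):
            continue  # near-duplicate of something already kept
        kept.append(component)
        seen_words.extend(words)
    return ", ".join(kept)


raw = ("No 34,1st main 5th cross Kaverylayout nagarbhavi main road moodalapaly,"
       "No.9/17, Nagarabhavi Main Road, Govindaraja Nagar, Amarajyothi Nagar, "
       "Vijayanagar, 560040, Bangalore, Karnataka, India")
cleaned = drop_repeated_components(raw)
```

With this threshold, the second "Nagarabhavi Main Road" is dropped even though its spelling differs slightly from the first. Very short components (a bare house number, for example) can be removed by mistake, so the threshold and the tokenisation would need tuning on real data.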

I have thousands of addresses like the one above and I want to automate this for my client. Can we do that? Can we correct the spellings in KNIME? Can we standardize/normalize the addresses in KNIME?

If we correct the street address then we will be able to fetch the geocodes for them, I guess.

Hope you understand my query.

Regards,
Samir

@Samir_Nagure what I can add at this time is this collection about KNIME and some additional resources for address handling.

For your specific case this Python package libpostal (1|2) might be of interest. Would have to see if we can set up an example:
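As a rough idea of what such an example could look like in Python: the sketch below assumes the libpostal C library and its Python bindings (the `postal` package) are installed, and uses `postal.parser.parse_address`, which returns a list of `(value, label)` pairs. The helper names `components_to_dict` and `parse_with_libpostal` are made up here, and keeping only the first value per label is just one possible choice. How well libpostal copes with addresses like the ones in this thread would still need to be tested.

```python
def components_to_dict(parsed):
    """Collapse (value, label) pairs into label -> first value seen for that label."""
    result = {}
    for value, label in parsed:
        # Keep the first occurrence, so a road that appears twice is stored once
        result.setdefault(label, value)
    return result


def parse_with_libpostal(address):
    """Parse a free-text address with libpostal (needs libpostal and pypostal installed)."""
    from postal.parser import parse_address  # imported lazily: optional dependency
    return components_to_dict(parse_address(address))


# Hand-written pairs in the shape parse_address returns (illustrative, not real output)
sample = [
    ("34", "house_number"),
    ("nagarbhavi main road", "road"),
    ("nagarabhavi main road", "road"),
    ("560040", "postcode"),
    ("bangalore", "city"),
]
fields = components_to_dict(sample)
```

The resulting dictionary (road, city, postcode and so on) could then be joined back into a shorter query string for whichever geocoding service is used.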

For anything that uses external services, please keep in mind the remarks in this thread concerning data privacy and (over-)use of external services.
