Hello everyone,
I am looking for a method to standardize addresses.
I have a single field that contains several pieces of information, such as the first name, last name, address, city, postal code…
The idea is to identify each part and move it into its own new field.
For example, I would have one column for the first name, one for the last name, one for the street number, one for the street name, one for the city, one for the ZIP code…
The problem is that this information does not always appear in the same order. I was thinking about using regular expressions, but it seems difficult to find a pattern that works every time.
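To illustrate why a single regular expression breaks down, here is a minimal Python sketch. The record layout, the field names (`first_name`, `zip_code`, etc.) and the five-digit ZIP code are assumptions for illustration, not taken from the attached file. The pattern only works when the parts arrive in exactly the order it was written for.

```python
import re
from typing import Optional

# Hypothetical pattern that assumes ONE fixed order:
# "<first name> <last name>, <number> <street name>, <5-digit zip> <city>"
FIXED_ORDER = re.compile(
    r"^(?P<first_name>\w+)\s+(?P<last_name>\w+),\s*"
    r"(?P<street_number>\d+)\s+(?P<street_name>[^,]+),\s*"
    r"(?P<zip_code>\d{5})\s+(?P<city>.+)$"
)


def parse_fixed_order(record: str) -> Optional[dict]:
    """Return the named groups as a dict, or None if the record does not fit."""
    match = FIXED_ORDER.match(record)
    return match.groupdict() if match else None


print(parse_fixed_order("John Smith, 12 Baker Street, 75001 Paris"))
# The same information in another order no longer matches:
print(parse_fixed_order("12 Baker Street, 75001 Paris, John Smith"))  # None
```

Every new ordering would need its own pattern (or an ever larger alternation), which is what makes the regex approach hard to maintain.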
I would like to use machine learning techniques instead, for example an algorithm that learns to identify each piece of information from data that has already been cleaned. Perhaps, given enough data, the algorithm would be able to identify the name, the city…
Unfortunately, I don’t know how machine learning algorithms would work in this case, but it’s something I’d like to learn.
If you can help me move this project forward, I would be very grateful.
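One simple way to frame the idea of learning from cleaned data is token classification: split each raw string into tokens and have a model predict which field each token belongs to. The sketch below swaps in a hand-written multinomial naive Bayes classifier (standard library only) as an illustration; `CLEANED_ROWS`, `token_features`, `TokenNaiveBayes` and `parse_record` are hypothetical names, and the three training rows are made-up examples, not data from the attached file.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical cleaned rows: one dict per record, field name -> value.
CLEANED_ROWS = [
    {"first_name": "John", "last_name": "Smith", "street_number": "12",
     "street_name": "Baker Street", "zip_code": "75001", "city": "Paris"},
    {"first_name": "Marie", "last_name": "Dupont", "street_number": "5",
     "street_name": "Rue de Lyon", "zip_code": "69002", "city": "Lyon"},
    {"first_name": "Paul", "last_name": "Martin", "street_number": "140",
     "street_name": "Main Street", "zip_code": "10001", "city": "Boston"},
]


def token_features(token: str) -> list[str]:
    """Describe a token by its lowercase form, its shape and its length."""
    # Shape: letters become 'a', then digits become 'd' ("75001" -> "ddddd").
    shape = re.sub(r"\d", "d", re.sub(r"[A-Za-z]", "a", token))
    return [f"word={token.lower()}", f"shape={shape}", f"len={min(len(token), 6)}"]


class TokenNaiveBayes:
    """Multinomial naive Bayes that assigns a field label to each token."""

    def __init__(self):
        self.label_counts = Counter()
        self.feature_counts = defaultdict(Counter)
        self.vocabulary = set()

    def fit(self, rows):
        # Every token of a cleaned value is a training example for that field.
        for row in rows:
            for label, value in row.items():
                for token in value.split():
                    self.label_counts[label] += 1
                    for feature in token_features(token):
                        self.feature_counts[label][feature] += 1
                        self.vocabulary.add(feature)
        return self

    def predict(self, token: str) -> str:
        total = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.label_counts.items():
            score = math.log(count / total)  # log prior of the field
            label_total = sum(self.feature_counts[label].values())
            for feature in token_features(token):
                # Add-one smoothing so an unseen feature does not rule a field out.
                numerator = self.feature_counts[label][feature] + 1
                score += math.log(numerator / (label_total + len(self.vocabulary)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def parse_record(model: TokenNaiveBayes, record: str) -> dict:
    """Split a raw string on spaces/commas and group tokens by predicted field."""
    fields = defaultdict(list)
    for token in re.findall(r"[^\s,]+", record):
        fields[model.predict(token)].append(token)
    return {label: " ".join(tokens) for label, tokens in fields.items()}


model = TokenNaiveBayes().fit(CLEANED_ROWS)
print(parse_record(model, "Anna Lee, 7 Oak Street, 98000 Nice"))
```

With only three training rows, the numeric parts are recognised from their shape (a five-digit token as the ZIP code, a short number as the street number), but words such as names and cities are easily mislabelled. This model also looks at each token in isolation, ignoring its position and neighbours; dedicated address parsers typically use sequence models (for example conditional random fields) trained on many labelled addresses for that reason.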
Standardization_data.xlsx (12.2 KB)
I am attaching an example file to show you the expected result.
Thank you