I try to find the best way to identify facilities on a questionnaire based on an existing database.
The questionnaire includes the following fields: Facility name, Parent company, Country, City
As the questionnaire is an Excel file completed manually, these fields might not be perfectly filled and don’t correspond perfectly to what is available in the database.
- the facility name and the parent company can be inverted,
- the city field can contain the whole address,
- there can be empty fields
- some facilities can be new and need to be included in our database,
- the value in the questionnaire is not exactly written as in our database. For example I could have “ChemComp Health and Care co., inc.” for the facility name in the questionnaire and “ChemComp” in the database)
To do that, after mainy tries, I’ve built the following workflow, mainly base on the component Fuzzy Address Matching
The problem is that it is still not very efficient. I’ve tried various strategies, but there is still work to achieve a decent efficiency.
I still have other strategies in mind, like to clean the location fields with an official database of countries and cities, or to split the facility name into words and perform the research on only one word or a group of words. But that seems quite complex.
To summarize, I need to find the facility ID in the database, and to identify new actors.
Do you have an idea of a better strategy to apply?
Identification_Test.knwf (276.8 KB)