Spanish Dictionary for address

elaadani · February 23, 2021, 10:38pm

Hello, good morning everyone.
I am creating a flow to normalize addresses. The addresses are in Spanish.
I would like some guidance with the following.
Avenues are written in different ways, such as “Avenida”, “AV”, “AVDA”, and “AVD”. To handle this, I made a dictionary with several ways of writing “avenida”. Then I use the POS tagger for Spanish, the “Dictionary Tagger”, and finally the “Tag Filter”.
How can I make the flow tag a misspelling such as “avnida” as “avenida”? Something like a lemmatizer, or like Google's "Did you mean: … " suggestion when you type something wrong?
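
To make the gap concrete, here is a minimal Python sketch of exact dictionary matching. It runs outside KNIME and only illustrates the idea; it is not how the Dictionary Tagger is implemented. The names `AVENIDA_VARIANTS` and `tag_exact` are made up for this example.

```python
# Hypothetical variant dictionary; entries mirror the spellings listed above.
AVENIDA_VARIANTS = {"avenida", "av", "avda", "avd"}


def tag_exact(token):
    """Return the tag 'AVENIDA' only if the token is literally in the dictionary."""
    normalized = token.lower().strip(".,")
    return "AVENIDA" if normalized in AVENIDA_VARIANTS else None


print(tag_exact("AVDA."))   # a listed variant, so it gets tagged
print(tag_exact("avnida"))  # a typo that is not listed, so it gets no tag
```

An exact lookup like this only catches spellings I have already listed, which is why typos slip through.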

elaadani · February 25, 2021, 2:19pm

Can anyone help me? Thanks for reading anyway.

julian.bunzel · February 25, 2021, 2:54pm

Hey @elaadani,

Having a good dictionary is a great start. A useful addition would be to extend the dictionary flexibly with terms that are similar to its existing entries, or to replace such similar words in the text.
To do that, I would recommend first creating a list of terms with the Unique Term Extractor node and then using the Similarity Search node to find the most similar dictionary word for each term. After that, you can either add the similar values to your dictionary or use the Dictionary Replacer node to replace the misspelled values in the text with the correct dictionary values.
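
The same idea can be sketched outside KNIME in a few lines of Python. This is only an illustration under assumptions: it uses the similarity ratio from the standard library's `difflib`, which is not necessarily the distance measure you would configure in the Similarity Search node, and the dictionary contents, the `0.8` cutoff, and the function names are hypothetical.

```python
from difflib import get_close_matches

# Hypothetical dictionary: every known spelling maps to one canonical term.
VARIANT_TO_CANONICAL = {
    "avenida": "avenida",
    "av": "avenida",
    "avda": "avenida",
    "avd": "avenida",
}


def closest_dictionary_term(token, cutoff=0.8):
    """Return the canonical term for a token, or None if nothing is similar enough."""
    word = token.lower().strip(".,")
    if word in VARIANT_TO_CANONICAL:  # exact hit, no fuzzy search needed
        return VARIANT_TO_CANONICAL[word]
    # Fuzzy step: best dictionary entry whose similarity ratio is >= cutoff.
    matches = get_close_matches(word, list(VARIANT_TO_CANONICAL), n=1, cutoff=cutoff)
    return VARIANT_TO_CANONICAL[matches[0]] if matches else None


def normalize_address(text):
    """Replace tokens that resemble a dictionary entry; keep all other tokens."""
    tokens = []
    for token in text.split():
        canonical = closest_dictionary_term(token)
        tokens.append(canonical if canonical is not None else token)
    return " ".join(tokens)


print(normalize_address("Avnida Libertador 123"))
```

The cutoff is the part to tune: a lower value catches more typos but also risks replacing unrelated short words, so it is worth checking the suggested matches before adding them to the dictionary.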

I hope this helps. If you have any questions, feel free to ask.

Best,
Julian
