Hi @qqilihq,
I’d like to propose a few optimizations for the Phone Number Formatter Node:
- Some area / city codes are recognized, some are not (I have a few German landline numbers showing this, which I can share in private)
- Ability to not assume a Default Region
- Ability to extract the Country name / ISO Code based on the recognized country code like 0049, +49, (0049)
- Ability to identify validity of recognized country and area / city code
The primary goal would be to identify individual parts of the data, like the country and the area / city, or whether it's a landline or mobile number (or unknown?).
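To illustrate the kind of country detection I have in mind, here is a minimal Python sketch that works without a default region. It is not how the node works internally; the function name `detect_country` and the tiny `CALLING_CODES` table are made up for this example, and a real lookup would need to cover all calling codes.

```python
import re

# Hypothetical, deliberately tiny lookup table (calling code -> ISO code, name).
CALLING_CODES = {
    "49": ("DE", "Germany"),
    "43": ("AT", "Austria"),
    "41": ("CH", "Switzerland"),
    "33": ("FR", "France"),
}

# Accepts the notations +49, 0049 and (0049); captures up to three digits.
PREFIX = re.compile(r"^\(?(?:\+|00)(\d{1,3})")


def detect_country(raw):
    """Return (iso_code, name) for an international prefix, or None if unknown."""
    match = PREFIX.match(raw.strip())
    if match is None:
        # No international prefix: without a default region we cannot tell.
        return None
    digits = match.group(1)
    # Calling codes are one to three digits long, so try each prefix length.
    for length in range(1, len(digits) + 1):
        country = CALLING_CODES.get(digits[:length])
        if country is not None:
            return country
    return None
```

With this sketch, `detect_country("(0049) 30 123456")` returns `("DE", "Germany")`, while a national number such as `"030 123456"` returns `None` instead of falling back to an assumed region.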
I also implemented a few optimizations to tackle poor data quality which you might consider adding too:
- Replace o by zero
- Remove HTML Characters using `&[^;]+;`
- Remove duplicated country codes: `^(\+\d+)\s?\1` to `$1`
- Remove duplicated country codes: `^\+(\d+)\s?00\1` to `+$1`
- Harmonize country codes (##) to +##: `^\((\d{2})\)\s?` to `+$1`
- Harmonize country codes (00##) to +##: `^\(00(\d{2})\)\s?` to `+$1`
- Harmonize country codes 00## to +##: `^00(\d{2})\s?` to `+$1`
- Fix wrong country code not starting with 0: `^([^0+])` to `+$1`
  Note: I am not fully confident this is not causing false positives as some area / city codes, like in the US, might not start with a zero.
- Remove (0): `\s?\(0\)` by an empty string
- Replace [-/] by space: `[-/]` by a single space
- Replace multiple whitespaces by one: `\s{2,}` by a single space
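For reference, the cleanup rules listed above can be chained roughly like this. This is only a Python sketch, not the actual workflow: the name `clean_phone_number` is made up, the `$1` back-reference is written `\1` in Python, and treating an upper-case `O` like a lower-case `o` is my own assumption.

```python
import re

# (pattern, replacement) pairs, applied in the order of the list above.
CLEANUP_RULES = [
    (r"[oO]", "0"),                   # letter o -> zero (upper case is an assumption)
    (r"&[^;]+;", ""),                 # remove HTML entities such as &nbsp;
    (r"^(\+\d+)\s?\1", r"\1"),        # "+49 +49 ..."  -> "+49 ..."
    (r"^\+(\d+)\s?00\1", r"+\1"),     # "+49 0049 ..." -> "+49 ..."
    (r"^\((\d{2})\)\s?", r"+\1"),     # "(49) ..."     -> "+49..."
    (r"^\(00(\d{2})\)\s?", r"+\1"),   # "(0049) ..."   -> "+49..."
    (r"^00(\d{2})\s?", r"+\1"),       # "0049 ..."     -> "+49..."
    (r"^([^0+])", r"+\1"),            # risky rule: may cause false positives
    (r"\s?\(0\)", ""),                # drop the optional trunk prefix "(0)"
    (r"[-/]", " "),                   # separators -> space
    (r"\s{2,}", " "),                 # collapse repeated whitespace
]


def clean_phone_number(raw):
    """Apply all cleanup rules in sequence and return the cleaned string."""
    text = raw.strip()
    for pattern, replacement in CLEANUP_RULES:
        text = re.sub(pattern, replacement, text)
    return text.strip()
```

For example, `clean_phone_number("0049 (0) 30 / 123-456")` gives `+49 30 123 456`. One thing the sketch makes visible: the trailing `\s?` in the harmonizing patterns also swallows the separating space, so `(0049) 30 123456` becomes `+4930 123456`.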
If you like, I can send you that part of the workflow.
Best
Mike
