Numbers in the Indian numeral system (tag and process)

This is not an urgent question and it probably won't attract a lot of attention, but I'm sure there are users interested in a suitable solution or an idea of how to approach this task.
In NLP, numbers and words representing numbers usually don't worry us, because the language of math is meant to be easy to understand. But what if the text uses another numeral system, such as the Indian one?
From time to time I need to process texts with numbers in the Indian numeral system (for example, "At Rs 5.03 lakh crore, Centre's second half borrowing in line with Budget estimates" - The Economic Times). In this and similar cases we can't simply "get" the numbers and process them (e.g. look up the exchange rate and compute the value in USD); first we need to convert them to a common international number.
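To make the conversion step concrete, here is a minimal Python sketch for the headline above. It assumes only the unit words "lakh" (10^5) and "crore" (10^7), possibly chained as in "lakh crore"; the names `INDIAN_UNITS` and `indian_to_decimal` are made up for illustration, and other spellings or units (e.g. "lac", "arab") are not covered.

```python
import re
from decimal import Decimal

# Multipliers for the two most common Indian unit words.
INDIAN_UNITS = {
    "lakh": Decimal(10) ** 5,   # 1,00,000
    "crore": Decimal(10) ** 7,  # 1,00,00,000
}

# A number (digit-group commas allowed) followed by one or more unit words.
_NUMBER_WITH_UNITS = re.compile(
    r"(?P<num>\d[\d,]*(?:\.\d+)?)\s*(?P<units>(?:(?:lakh|crore)s?\b\s*)+)",
    re.IGNORECASE,
)

def indian_to_decimal(text):
    """Return every '<number> lakh/crore ...' amount in text as a Decimal."""
    results = []
    for match in _NUMBER_WITH_UNITS.finditer(text):
        value = Decimal(match.group("num").replace(",", ""))
        # Chained units multiply: "lakh crore" = 10^5 * 10^7 = 10^12.
        for unit in match.group("units").lower().split():
            value *= INDIAN_UNITS[unit.rstrip("s")]
        results.append(value)
    return results

headline = "At Rs 5.03 lakh crore, Centre's second half borrowing in line with Budget estimates"
print(indian_to_decimal(headline))  # one amount, equal to 5,030,000,000,000
```

`Decimal` is used instead of `float` so that amounts like 5.03 are not distorted before the exchange rate is applied.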
Certainly, there are rules for how to "move" between different numeral systems - is not the worst example.
Also, the Indian system is not the most complex one, and it is far from the only widely used system that is hard to process. The same goes for the Chinese numeral system (luckily it is rarely used in Chinese media, but still).
The question is:

  • Have you ever faced this challenge? If so, could you share hints on how you approached it? I use a regex extract and then dozens of nodes to get a far-from-perfect result.

Wish you a great and lucky day!



Here is a description of how to convert between the number systems. There is also code in Java and Python…
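For the opposite direction, a small Python sketch (not the code from the linked page) that writes an integer with Indian digit grouping: the last three digits form one group and the remaining digits are grouped in twos. The function name `format_indian_grouping` is hypothetical.

```python
def format_indian_grouping(n: int) -> str:
    """Format an integer with Indian digit grouping (3 digits, then 2, 2, ...)."""
    sign = "-" if n < 0 else ""
    digits = str(abs(n))
    if len(digits) <= 3:
        return sign + digits
    head, tail = digits[:-3], digits[-3:]
    groups = []
    # Peel two-digit groups off the right end of the remaining digits.
    while len(head) > 2:
        groups.insert(0, head[-2:])
        head = head[:-2]
    if head:
        groups.insert(0, head)
    return sign + ",".join(groups + [tail])

print(format_indian_grouping(150000))    # 1,50,000  (1.5 lakh)
print(format_indian_grouping(10000000))  # 1,00,00,000  (1 crore)
```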


