I need to extract three values: first, 34039900; second, KG; and third (not captured by my code yet), 0,6000.
Here’s the code: .(\d{8})\s[0-9\s]+(KG|L|LT|Tambor|PC)\s[0-9.]+,\d+\s.
I used (KG|L|LT|Tambor|PC) because the unit varies, but the position of the data is fixed. I hope someone can help me capture the third value. Thank you!
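One way to capture the third value is to wrap the quantity part in its own group. The sketch below uses Python's `re` as a stand-in for whatever regex engine the workflow runs on; it assumes the quantity is always the first `digits,digits` token after the unit, and it is tried against the data line quoted later in this thread, since the first sample line is not reproduced here. The function name `extract` is hypothetical.

```python
import re

# The pattern from the question, unchanged except that the quantity
# ([0-9.]+,\d+) is wrapped in a third capture group. The trailing "\s."
# is dropped so a quantity at the very end of a line still matches.
PATTERN = re.compile(r".(\d{8})\s[0-9\s]+(KG|L|LT|Tambor|PC)\s([0-9.]+,\d+)")

def extract(line):
    """Return (code, unit, quantity), or None when the line does not match."""
    m = PATTERN.search(line)
    return m.groups() if m else None

# The data line quoted later in this thread.
line = ("31081800 OEM SYNTHETIC DEO 5W30 27101932 830 6651 L "
        "9.801,0000 13,1454 128.838,07 0, 00 0, 00 0, 00")
print(extract(line))  # ('27101932', 'L', '9.801,0000')
```

The same group also fits a quantity such as 0,6000, because `[0-9.]+` accepts the single leading 0.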
Hello @gonhaddock, I've tried the code with other data, and it doesn't seem to work for the line below; it returns no results.
Data: 31081800 OEM SYNTHETIC DEO 5W30 27101932 830 6651 L 9.801,0000 13,1454 128.838,07 0, 00 0, 00 0, 00
I'd appreciate it if you could take a look. Thank you!
That said, I've tried this code and it works! But I'm not sure whether it will hold for all data, or whether you could suggest any improvements to it.
Hello @trafalgarlaw
The problem is that a regex is sometimes too literal; when the range of possible cases is wide, it is better to test it against as many samples as possible.
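Testing against many samples is easy to automate. Below is a minimal Python harness, assuming each sample line can be paired with the (code, unit, quantity) you expect from it; only the line quoted in this thread is included, so more real lines should be appended. The names `failures`, `strict` and `loose` are hypothetical, and the two patterns are illustrative candidates, not the codes posted in the thread.

```python
import re

# Each case pairs a raw line with the (code, unit, quantity) we expect.
# Append every real line you can get hold of.
CASES = [
    ("31081800 OEM SYNTHETIC DEO 5W30 27101932 830 6651 L "
     "9.801,0000 13,1454 128.838,07 0, 00 0, 00 0, 00",
     ("27101932", "L", "9.801,0000")),
]

def failures(pattern, cases):
    """Return (line, expected, got) for every case the pattern gets wrong."""
    regex = re.compile(pattern)
    bad = []
    for line, expected in cases:
        m = regex.search(line)
        got = m.groups() if m else None
        if got != expected:
            bad.append((line, expected, got))
    return bad

# Hypothetical candidates to compare.
strict = r"\s(\d{8})\s[\d\s]+(KG|LT|L|Tambor|PC)\s([\d.]+,\d+)"
loose = r"(\d{8}).*?(KG|LT|L|Tambor|PC)\s([\d.]+,\d+)"

print(failures(strict, CASES))        # []
print(failures(loose, CASES)[0][2])   # ('31081800', 'L', '9.801,0000')
```

The loose pattern looks plausible, yet the harness shows it picks up the leading 31081800 instead of the code; a single real sample was enough to expose that.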
That said, in the first sample the code used the comma [,] as an extra anchor to locate the first numeric sequence.
In this new version there is no comma any more (it has been removed); the identifier is now simply the first sequence of eight digits that follows a whitespace character. Without that whitespace requirement, the match can be thrown off by the numeric sequence at the start of the string, which is also eight digits long.
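The effect of the whitespace requirement can be checked on the data line above. A short Python sketch (the variable names are illustrative only):

```python
import re

line = ("31081800 OEM SYNTHETIC DEO 5W30 27101932 830 6651 L "
        "9.801,0000 13,1454 128.838,07 0, 00 0, 00 0, 00")

# Any eight digits: the number at the very start of the line is found first.
first_any = re.search(r"\d{8}", line).group()
print(first_any)       # 31081800

# Eight digits with whitespace on both sides: the leading number has
# nothing before it, so the search moves on to the code we want.
first_bounded = re.search(r"\s(\d{8})\s", line).group(1)
print(first_bounded)   # 27101932
```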
This code works for both examples provided so far:
Just one last question, @gonhaddock: which one would you prefer I use, Code 1 or Code 2? I'd also appreciate any insights that would help me judge which one is better in the future. Thank you!
Hello @trafalgarlaw
There isn't an absolute answer to your question, as it depends on the user's background and on the data constraints (in this case, I don't know either of them).
As a general rule, choose the one that works for your use case and best meets the following criteria (not necessarily in this order):
It’s more efficient
It’s easier to track
It’s more robust
Sometimes robustness comes at the cost of efficiency.
That said, Code 1 simply cannot work on example 1: the line starts with a ten-digit sequence, so the . at the beginning needs a quantifier (+), which makes it greedy. The \s that follows in Code 2 may look redundant (and slightly less efficient), but it increases robustness, especially from my side, since I haven't seen the full range of cases. And so on.
Just analyzing this part of the code brings me back to the first paragraph.
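To see why a greedy leading . is fragile when a line starts with a ten-digit token, here is a Python sketch. The first sample line is not quoted in the thread, so the input below is a made-up, hypothetical string that only reproduces the ten-digit start.

```python
import re

# Hypothetical line: a ten-digit leading token and no standalone
# eight-digit code anywhere.
line = "1234567890 SOME TEXT"

# A greedy ".+" backtracks until eight digits followed by whitespace fit,
# so it returns the tail of the ten-digit token.
greedy = re.search(r".+(\d{8})\s", line)
print(greedy.group(1))   # 34567890

# Requiring whitespace on both sides rejects a fragment of a longer run.
bounded = re.search(r"\s(\d{8})\s", line)
print(bounded)           # None
```

Returning no match is the safer failure here: a wrong code that looks valid is harder to notice than an empty result.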