Hi all,
I'm still struggling with the output of some stupid OCR files. The output looks like this:
Document ID | weight | material type | Number of drillings | Number of bendings |
---|---|---|---|---|
ABC_1 | 1,31 | steel | 22x | 90.00º down 90.00º down |
ABC_2 | 2,35 kg | aluminium | (3x) | 85.00º down. 90.00º down 90.00º down |
ABC_3 | 3,5 KG | alu typeA | (14x | Bend Angle 90.00 |
ABC_4 | Gew.: 5,67 kg | steel Black Sea | 5 | unten 90º |
ABC_5 | …4,3 | steeeel | Diameter 8, 12x | 3x 45.00 |
The target structure should look like this:
Document ID | weight | material type | Number of drillings | Number of bendings | Number of bendings_COUNT |
---|---|---|---|---|---|
ABC_1 | 1,31 | steel | 22 | 90.00º down 90.00º down | 2 |
ABC_2 | 2,35 | aluminium | 3 | 85.00º down. 90.00º down 90.00º down | 3 |
ABC_3 | 3,5 | aluminium | 14 | Bend Angle 90.00 | 1 |
ABC_4 | 5,67 | steel | 5 | unten 90º | 1 |
ABC_5 | 4,3 | steel | 12 | 3x 45.00 | 3 |
So there are 4 transformations that I have to do:
- Extract the weight as a number (with decimal comma, e.g. 5,67) from the “weight” column
- Harmonize “similar” text expressions, containing steel or aluminium as “search” words
- Extract the integer from column “number of drillings”, but ONLY if it is marked with an “x”, so make a 12 out of “Diameter 8, 12x”
- Create a new column that counts the number of “angles” in column “number of bendings”. Unfortunately the angles are only separated by blanks, so counting the words would give a wrong number. I have to count entries like “90.00 down” and “85.0 up”, or just “90.00”, as one angle each.
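For reference, here is a minimal sketch of the four transformations above as plain Python regular expressions. All function names are hypothetical, and the rules are only inferred from the five sample rows: the weight is returned as a float rather than a decimal-comma string, “alu” is assumed to be enough to identify aluminium, a bare number such as “5” is accepted as a drilling count because the target table shows it for ABC_4, and “3x 45.00” is assumed to mean three bends.

```python
import re


def parse_weight(raw):
    """Return the first decimal-comma number in raw as a float, or None."""
    m = re.search(r"\d+(?:,\d+)?", raw)
    return float(m.group().replace(",", ".")) if m else None


def harmonize_material(raw):
    """Map noisy material text to 'steel' or 'aluminium' where possible."""
    text = raw.lower()
    if "alu" in text:                 # assumption: 'alu' always means aluminium
        return "aluminium"
    if re.search(r"ste+l", text):     # tolerates OCR repeats such as 'steeeel'
        return "steel"
    return raw.strip()                # unknown material: leave unchanged


def parse_drillings(raw):
    """Return the integer directly followed by 'x', e.g. 12 from 'Diameter 8, 12x'."""
    m = re.search(r"(\d+)\s*x", raw, flags=re.IGNORECASE)
    if m:
        return int(m.group(1))
    # Assumption: a cell holding only a number (ABC_4: '5') is also a count.
    bare = re.fullmatch(r"\s*(\d+)\s*", raw)
    return int(bare.group(1)) if bare else None


# An angle is a number with optional decimals; an optional 'Nx' prefix multiplies it.
ANGLE = re.compile(r"(?:(\d+)\s*x\s*)?(\d+(?:[.,]\d+)?)", flags=re.IGNORECASE)


def count_bendings(raw):
    """Count angle values, honouring a leading multiplier such as '3x 45.00'."""
    return sum(int(m.group(1) or 1) for m in ANGLE.finditer(raw))


if __name__ == "__main__":
    row = ("Gew.: 5,67 kg", "steel Black Sea", "5", "unten 90º")
    print(parse_weight(row[0]), harmonize_material(row[1]),
          parse_drillings(row[2]), count_bendings(row[3]))
```

Counting numeric angle values instead of words is what makes the last step independent of trailing text like “down”, “unten” or “Bend Angle”.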
I already tried the string manipulator node to replace “kg” with “” and then the string to number node, but I could not get several “replace” commands to work in one string manipulator expression, e.g. to also replace “Gew.:” with “”.
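To illustrate what I mean by several replacements in one go, here is a small Python sketch of that replace-then-convert idea. The `NOISE` list and both function names are made up for illustration, and the list only covers the junk seen in the sample rows.

```python
import re

# Hypothetical list of junk tokens to strip before converting to a number.
NOISE = ["Gew.:", "kg", "…"]


def strip_noise(text):
    """Remove every token in NOISE, ignoring upper/lower case."""
    for token in NOISE:
        text = re.sub(re.escape(token), "", text, flags=re.IGNORECASE)
    return text.strip()


def to_number(text):
    """Strip the noise, then turn the decimal comma into a float."""
    return float(strip_noise(text).replace(",", "."))
```

The loop is the equivalent of nesting one replace inside another, so each further junk token only needs a new entry in the list.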
Hope you guys can help with that!