Ok, so I took a look again (it was late at my place as well), I found out that the different RegEx expression is actually necessary (+ the summation at the Pivot node). @Amanda252’s example has "-"
in the molecular formulas, so those are removed as well. The rest is pretty much the same (1 String Manipulation node got split into 2, to reduce the neural RAM usage).
The sample input is now also processed and the results compared against the existing values. Ael’s solution with my modifications works, but there’re 2 molecules that have wrong values in the Excel file:
C11-F14-O2
C11-H13-F13-N2-O2-S
I should mention that this does not work with duplicate values (=molecular formulas)! To remedy this, a different ID column should be used as grouping column in the Pivot node (e.g. the column “name” in the excel file).
The workings are not changed, so I won’t explain that again (see above), but I can mention the general idea behind this: The molecular formulas need to be fragmented into its parts. This is done in two steps, both times with the cell splitter. The String Manipulation is used as helper node. All of this is done to bring the data into a format that the Pivot node understands.
Now starts the tl;dr part.
Table Validator (Reference):
I like to use it to resort columns by reference and insert missing columns. Can also check data types and optionally fail. Multi-node node that’s especially handy before a Table Difference Finder.
Table Difference Finder:
Compares two tables cell by cell. Any differences it finds will be in the first output. If it doesn’t find any, the output will be empty. It’s very literal though, so the RowID and ColumnNames have to match exactly, or you’ll get lots of unmatched cells. Most of the nodes in the lower branch prepare the tables so they look exactly the same. It’s a bit of a snob.
20220215 Pikairos Count Atoms Ocurrence in Formula - modified by Thyme.knwf (239.7 KB)