I have a large data set with a column of chemical formulas. Some of them contain square brackets and charges (+ and -) that I want to remove. For example (totally made-up formulas), it may look something like this:
Are you sure that you want to keep the numbers outside the square brackets?
Once the brackets are gone, how will you distinguish which numbers were inside and which were outside them?
If you just want the content in the square brackets, then the Regex Extractor node (part of the Palladian node collection) will work if you use the expression: [A-Z]\w+
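To see what that expression picks up outside KNIME, here is a minimal Python sketch. The function name `extract_formula` and the sample formulas are made up for illustration, and Python's `re` module is assumed to behave like the node's regex engine for this simple pattern.

```python
import re

# Same expression as suggested above: an uppercase letter followed by
# one or more word characters (letters, digits, underscore).
PATTERN = re.compile(r"[A-Z]\w+")


def extract_formula(text):
    """Return the first match of the pattern in text, or None if there is none."""
    match = PATTERN.search(text)
    return match.group(0) if match else None


# Made-up formulas, for illustration only.
print(extract_formula("[C6H12N]+"))  # C6H12N
print(extract_formula("[C2H3O2]-"))  # C2H3O2
print(extract_formula("C6H6"))       # C6H6
```

Two caveats with this pattern: it needs at least two characters, so a single-letter formula such as [H]+ gives no match, and it stops at anything that is not a word character, so a formula containing parentheses is cut at the first parenthesis.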
Are you sure you don’t want to do a proper chemical operation on these structure representations - perhaps using RDKit or the Infocom wrapped ChemAxon libraries?
Yes, I don’t need the charges. The key piece is the mol structure (in another column). Normally I’d keep them, but there’s a bit of software that gets upset by the charges in the formulae!
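If only the brackets and the charge need to go, a plain regex replace may be enough. The Python sketch below assumes the charge is always written directly after the closing bracket (optional digits followed by + or -); the function name `strip_brackets_and_charge` and the formulas are hypothetical.

```python
import re

# A bracketed group, optionally followed by a charge such as "+", "-", "2+" or "4-".
# The digits are only treated as part of the charge when a sign follows them.
BRACKET_WITH_CHARGE = re.compile(r"\[([^\]]*)\](?:\d*[+-]+)?")


def strip_brackets_and_charge(formula):
    """Drop square brackets and a trailing charge, keeping the bracket content."""
    return BRACKET_WITH_CHARGE.sub(r"\1", formula)


# Made-up formulas, for illustration only.
print(strip_brackets_and_charge("[C6H12N]2+"))   # C6H12N
print(strip_brackets_and_charge("[Fe(CN)6]4-"))  # Fe(CN)6
print(strip_brackets_and_charge("C6H6"))         # C6H6
```

A number after the bracket that is not followed by a sign is left in place, so it ends up glued to the bracket content. That is the ambiguity raised earlier in the thread, and it is worth checking against the real data before relying on this.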