String Cleaner: Factor in other string representations such as HTML

Hi,

Certain HTML-encoded strings such as "&nbsp;", a non-breaking space, make it difficult to parse data. In the sample workflow below, the String to XML node fails, or rather produces a missing value:

Cell in row:"Row0" and column "XML" could not be parsed: The entity "nbsp" was referenced, but not declared. Add missing value.

Having the String Cleaner also remove these characters in other representations, such as Unicode or HTML (hex, decimal, or named), would be nice.

Best
Mike

Hi @mwiegand,

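In XML, an entity is referenced by enclosing its name in & and ;. Apart from the five predefined entities (&amp;, &lt;, &gt;, &apos; and &quot;), an entity has to be declared before it can be referenced, which is why &nbsp; triggers the error. A declared entity looks like this: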

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE example-doc [
    <!ENTITY example-reference "armin">
]>
<root>
    <intro>my name is &example-reference;</intro>
</root>
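
As a quick check outside of KNIME, the declaration above can be tried with Python's standard library parser. This is only a sketch: it assumes Python's xml.etree.ElementTree, not the parser the String to XML node uses, and the variable names are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Same document as above; the XML declaration is omitted because the
# text is passed as a Python str rather than as encoded bytes.
doc = """<!DOCTYPE example-doc [
    <!ENTITY example-reference "armin">
]>
<root>
    <intro>my name is &example-reference;</intro>
</root>"""

# The parser expands the declared entity while reading the document.
intro = ET.fromstring(doc).find("intro").text
print(intro)
```

Because example-reference is declared in the DOCTYPE, the reference is expanded instead of being rejected.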

To fix the issue in the (HTML) string, you can use the String Replacer node to 1) replace &nbsp; with a space character, 2) enclose &nbsp; in a CDATA section, or 3) replace & with &amp; so that &nbsp; becomes &amp;nbsp;.
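
The three options can be sketched in Python to see what each one produces after parsing. This is an illustration under assumptions, not what the String Replacer node does internally: the sample string and the helper names (decode_html_entities, try_parse) are hypothetical, and the parser is Python's xml.etree.ElementTree. The last variant generalizes option 1 to all HTML named entities using the standard html.entities table.

```python
import re
import xml.etree.ElementTree as ET
from html.entities import name2codepoint

# Entities that XML itself predefines; these must stay escaped.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}
NAMED_ENTITY = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def decode_html_entities(text: str) -> str:
    """Replace HTML-only named entities (e.g. &nbsp;) with their characters."""
    def substitute(match):
        name = match.group(1)
        if name in XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)  # leave XML entities and unknown names alone
        return chr(name2codepoint[name])
    return NAMED_ENTITY.sub(substitute, text)


def try_parse(xml_text: str):
    """Return the root element's text, or None if the XML is not well-formed."""
    try:
        return ET.fromstring(xml_text).text
    except ET.ParseError:
        return None


raw = "<cell>a&nbsp;b &amp; c</cell>"  # hypothetical sample cell

failed = try_parse(raw)                                            # nbsp is undeclared
as_space = try_parse(raw.replace("&nbsp;", " "))                   # option 1
as_cdata = try_parse(raw.replace("&nbsp;", "<![CDATA[&nbsp;]]>"))  # option 2
escaped = try_parse(raw.replace("&nbsp;", "&amp;nbsp;"))           # option 3
decoded = try_parse(decode_html_entities(raw))                     # option 1, generalized
```

Note the difference in the result: option 1 changes the text (the entity becomes a real character), while options 2 and 3 keep the literal text &nbsp; in the parsed value.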

Here is an example of how to replace the HTML symbol entities: