Replacing HTML Entities in a string using String Manipulation fails on Trademark (ALT+0153)

Hi there. The ™ character is failing inside KNIME in the String Manipulation node.

I tried the following ones

replace(
replace(
replace(
replace(
replace(
replace(
replace(
replace(
replace(
replace(
replace(
replace(
replace(
replace(
replace(
replace(
replace(
replace(
replace(
replace(
replace(
replace(
replace(
replace(
replace(
replace(
$column_name$,
“€”, “€”),
“Š”, “Š”),
“ ”, " "),
“®”, “®”),
“°”, “°”),
“±”, “±”),
“²”, “²”),
“µ”, “µ”),
“º”, “º”),
“¼”, “¼”),
“½”, “½”),
“Ô, “Ô),
“Ä”, “Ä”),
“Ï”, “Ï”),
“Ö”, “Ö”),
“Ø”, “Ø”),
“à”, “à”),
“á”, “á”),
“â”, “â”),
“ë”, “ë”),
“ö”, “ö”),
“ø”, “ø”),
“ü”, “ü”),
“…”, “…”),
“™”, “™”)

And got the failure


Warning

Invalid settings:
org.knime.ext.sun.nodes.script.compile.CompilationFailedException: Unable to compile expression
ERROR at line 117
Syntax error, insert “)” to complete Expression
Line : 116 “…”, “…”),
Line : 117 “™”, “™”);

OK

I tried with Unicode escapes as well, and here the “™” line failed too:

replace(
replace(
replace(
replace(
replace(
replace(
replace(
replace(
replace(
replace(
replace(
replace(
replace(
replace(
replace(
replace(
replace(
replace(
replace(
replace(
replace(
replace(
replace(
replace(
replace(
replace(
$column_name$,
“€”, “\u20AC”), // Euro
“Š”, “\u0160”), // S with Caron
“ ”, “\u00A0”), // Non-breaking space
“®”, “\u00AE”), // Registered sign
“°”, “\u00B0”), // Degree sign
“±”, “\u00B1”), // Plus-minus sign
“²”, “\u00B2”), // Superscript two
“µ”, “\u00B5”), // Micro sign
“º”, “\u00BA”), // Masculine ordinal indicator
“¼”, “\u00BC”), // One quarter
“½”, “\u00BD”), // One half
“Ô, “\u00C3”), // Latin capital letter A with tilde
“Ä”, “\u00C4”), // Latin capital letter A with diaeresis
“Ï”, “\u00CF”), // Latin capital letter I with diaeresis
“Ö”, “\u00D6”), // Latin capital letter O with diaeresis
“Ø”, “\u00D8”), // Latin capital letter O with stroke
“à”, “\u00E0”), // Latin small letter a with grave
“á”, “\u00E1”), // Latin small letter a with acute
“â”, “\u00E2”), // Latin small letter a with circumflex
“ë”, “\u00EB”), // Latin small letter e with diaeresis
“ö”, “\u00F6”), // Latin small letter o with diaeresis
“ø”, “\u00F8”), // Latin small letter o with stroke
“ü”, “\u00FC”), // Latin small letter u with diaeresis
“…”, “\u2026”), // Horizontal ellipsis
“™”, “\u2122”) // Trade mark sign

Error:


Warning

Invalid settings:
org.knime.ext.sun.nodes.script.compile.CompilationFailedException: Unable to compile expression
ERROR at line 117
Syntax error, insert “)” to complete Expression
Line : 116 “…”, “u2026”),
Line : 117 “™”, “u2122”)

Are we doing something wrong here? Or is there a limit to how many replace calls you can enter? When we try with 15 it goes well, but with 20 it fails…

Or did KNIME forget about the trademark character?

I tried to narrow it down to
replace(
replace(
replace(
$$CURRENTCOLUMN$$,
“ü”, “\u00FC”), // Latin small letter u with diaeresis
“…”, “\u2026”), // Horizontal ellipsis
“™”, “\u2122”) // Trade mark sign

And it fails

However this works so it seems that there is something wrong with trademark symbol :slight_smile:

replace(
replace(
$$CURRENTCOLUMN$$,
“ü”, “\u00FC”), // Latin small letter u with diaeresis
“…”, “\u2026”) // Horizontal ellipsis

Hi @C_Skjerning and welcome to the KNIME community forum,

I tried the first expression in your first post and it seems there is an extra “replace” function at the beginning. So, just by removing the first replace( from the expression, it worked fine for me.

You can check this by putting the cursor next to the last closing parenthesis in your expression (in the String Manipulation node) and seeing which opening parenthesis is selected:

Hi Armin. OK, that parenthesis check is a nice feature. However, as you can see, I narrowed it down to the last three and it still fails. I also tried removing one replace from the first expression and I still get this error:

---------------------------
Warning
---------------------------
Invalid settings:
org.knime.ext.sun.nodes.script.compile.CompilationFailedException: Unable to compile expression

ERROR at line 66
Syntax error on token(s), misplaced construct(s)
  Line : 65    public java.lang.String internalEvaluate() throws Abort {
  Line : 66  return replace(

ERROR at line 94
Syntax error on tokens, Expression expected instead
  Line : 93  “Š”, “Š”),
  Line : 94  “ ”, " "),

ERROR at line 116
Syntax error, insert ";" to complete BlockStatements
  Line : 115  “…”, “…”),
  Line : 116  “™”, “™”);

I’m running KNIME 5.1.2. Could that be a problem?

I also tried removing all the line breaks, as I can see you did, just in case, and then it goes ballistic :slight_smile:

replace( replace( replace( replace( replace( replace( replace( replace( replace( replace( replace( replace( replace( replace( replace( replace( replace( replace( replace( replace( replace( replace( replace( replace( replace($Description Short$, “�”, “€”), “�”, “Š”), “ ”, " "), “®”, “®”), “°”, “°”), “±”, “±”), “²”, “²”), “µ”, “µ”), “º”, “º”), “¼”, “¼”), “½”, “½”), “Ô, “Ô), “Ä”, “Ä”), “Ï”, “Ï”), “Ö”, “Ö”), “Ø”, “Ø”), “à”, “à”), “á”, “á”), “â”, “â”), “ë”, “ë”), “ö”, “ö”), “ø”, “ø”), “ü”, “ü”), “…”, “…”), “™”, “™”)

---------------------------
Warning
---------------------------
Invalid settings:
Unable to compile expression
ERROR at line 66
Syntax error on token(s), misplaced construct(s)
  Line : 65    public java.lang.String internalEvaluate() throws Abort {
  Line : 66  return replace( replace( replace( replace( replace( replace( replace( replace( replace( replace( replace( replace( replace( replace( replace( replace( replace( replace( replace( replace( replace( replace( replace( replace( replace(__col1, “€”, “€”), “Š”, “Š”), “ ”, " "), “®”, “®”), “°”, “°”), “±”, “±”), “²”, “²”), “µ”, “µ”), “º”, “º”), “¼”, “¼”), “½”, “½”), “Ô, “Ô), “Ä”, “Ä”), “Ï”, “Ï”), “Ö”, “Ö”), “Ø”, “Ø”), “à”, “à”), “á”, “á”), “â”, “â”), “ë”, “ë”), “ö”, “ö”), “ø”, “ø”), “ü”, “ü”), “…”, “…”), “™”, “™”);

ERROR at line 66
Syntax error on tokens, Expression expected instead
  Line : 65    public java.lang.String internalEvaluate() throws Abort {
  Line : 66  return replace( replace( replace( replace( replace( replace( replace( replace( replace( replace( replace( replace( replace( replace( replace( replace( replace( replace( replace( replace( replace( replace( replace( replace( replace(__col1, “€”, “€”), “Š”, “Š”), “ ”, " "), “®”, “®”), “°”, “°”), “±”, “±”), “²”, “²”), “µ”, “µ”), “º”, “º”), “¼”, “¼”), “½”, “½”), “Ô, “Ô), “Ä”, “Ä”), “Ï”, “Ï”), “Ö”, “Ö”), “Ø”, “Ø”), “à”, “à”), “á”, “á”), “â”, “â”), “ë”, “ë”), “ö”, “ö”), “ø”, “ø”), “ü”, “ü”), “…”, “…”), “™”, “™”);

ERROR at line 66
Syntax error, insert ";" to complete BlockStatements
  Line : 65    public java.lang.String internalEvaluate() throws Abort {
  Line : 66  return replace( replace( replace( replace( replace( replace( replace( replace( replace( replace( replace( replace( replace( replace( replace( replace( replace( replace( replace( replace( replace( replace( replace( replace( replace(__col1, “€”, “€”), “Š”, “Š”), “ ”, " "), “®”, “®”), “°”, “°”), “±”, “±”), “²”, “²”), “µ”, “µ”), “º”, “º”), “¼”, “¼”), “½”, “½”), “Ô, “Ô), “Ä”, “Ä”), “Ï”, “Ï”), “Ö”, “Ö”), “Ø”, “Ø”), “à”, “à”), “á”, “á”), “â”, “â”), “ë”, “ë”), “ö”, “ö”), “ø”, “ø”), “ü”, “ü”), “…”, “…”), “™”, “™”);

Hi @C_Skjerning, your initial problem was not the trademark symbol. It was that you had one too many “replace(” calls.

i.e.
you effectively had

replace(
replace(
replace(
$column_name$,
"€", "\u20AC"), // Euro
"Š", "\u0160")

when you should have had

replace(
replace(
$column_name$,
"€", "\u20AC"), // Euro
"Š", "\u0160")

The compilation error was that there was no closing bracket on the outermost “replace(”. In fact the actual error is that the outermost “replace(” should not have been there.

If you count them, you’ll find you had 25 replacements but 26 “replace(” keywords.
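
For long expressions, counting by eye is error-prone. The following is a minimal sketch of a standalone check in plain Java. `ReplaceCounter` and its methods are hypothetical names, not part of KNIME. It counts the `replace(` calls and reports how many opening parentheses are left unclosed, ignoring parentheses inside string literals. It assumes the expression contains no `//` comments.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper, not part of KNIME: sanity-checks a nested replace(...) expression. */
final class ReplaceCounter {

    /** Counts how many times "replace(" occurs in the expression. */
    static int countReplaceCalls(String expression) {
        Matcher m = Pattern.compile("replace\\s*\\(").matcher(expression);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    /** Returns opening minus closing parentheses outside string literals; 0 means balanced. */
    static int parenBalance(String expression) {
        int balance = 0;
        boolean inString = false;
        for (int i = 0; i < expression.length(); i++) {
            char c = expression.charAt(i);
            if (inString && c == '\\') {
                i++; // skip the escaped character
            } else if (c == '"') {
                inString = !inString;
            } else if (!inString && c == '(') {
                balance++;
            } else if (!inString && c == ')') {
                balance--;
            }
        }
        return balance;
    }

    public static void main(String[] args) {
        String tooMany = "replace(replace(replace($column_name$, \"a\", \"b\"), \"c\", \"d\")";
        String correct = "replace(replace($column_name$, \"a\", \"b\"), \"c\", \"d\")";
        // Two replacement pairs but three calls: one parenthesis stays open.
        System.out.println(countReplaceCalls(tooMany) + " calls, balance " + parenBalance(tooMany));
        System.out.println(countReplaceCalls(correct) + " calls, balance " + parenBalance(correct));
    }
}
```

A balance of 1 together with one more call than replacement pairs points at a surplus outer `replace(`, which is the situation described above.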

As a side note, it would be easier to assist if when pasting code on the forum, you marked it as “preformatted text” using the </> button on the forum toolbar. That way, we can just copy and paste it instead of having to fix all the “smart quotes” before we can help :wink:

Hi Takk - Then you have not read my comments. I have tried with only three replaces, or two replaces, and it still fails. I have also tried the suggestion of removing the first replace, as stated. I will re-edit the post and format it as code, as you have suggested…

Here is the original code verified and with correct number of replaces - It still fails

    replace(
        replace(
            replace(
                replace(
                    replace(
                        replace(
                            replace(
                                replace(
                                    replace(
                                        replace(
                                            replace(
                                                replace(
                                                    replace(
                                                        replace(
                                                            replace(
                                                                replace(
                                                                    replace(
                                                                        replace(
                                                                            replace(
                                                                                replace(
                                                                                    replace(
                                                                                        replace(
                                                                                            replace(
                                                                                                replace(
                                                                                                    replace(
                                                                                                        $column_name$,
                                                                                                    "&#128;", "\u20AC"),  // Euro
                                                                                                "&#138;", "\u0160"),  // S with Caron
                                                                                            "&#160;", "\u00A0"),  // Non-breaking space
                                                                                        "&#174;", "\u00AE"),  // Registered sign
                                                                                    "&#176;", "\u00B0"),  // Degree sign
                                                                                "&#177;", "\u00B1"),  // Plus-minus sign
                                                                            "&#178;", "\u00B2"),  // Superscript two
                                                                        "&#181;", "\u00B5"),  // Micro sign
                                                                    "&#186;", "\u00BA"),  // Masculine ordinal indicator
                                                                "&#188;", "\u00BC"),  // One quarter
                                                            "&#189;", "\u00BD"),  // One half
                                                        "&#195;", "\u00C3"),  // Latin capital letter A with tilde
                                                    "&#196;", "\u00C4"),  // Latin capital letter A with diaeresis
                                                "&#207;", "\u00CF"),  // Latin capital letter I with diaeresis
                                            "&#214;", "\u00D6"),  // Latin capital letter O with diaeresis
                                        "&#216;", "\u00D8"),  // Latin capital letter O with stroke
                                    "&#224;", "\u00E0"),  // Latin small letter a with grave
                                "&#225;", "\u00E1"),  // Latin small letter a with acute
                            "&#226;", "\u00E2"),  // Latin small letter a with circumflex
                        "&#235;", "\u00EB"),  // Latin small letter e with diaeresis
                    "&#246;", "\u00F6"),  // Latin small letter o with diaeresis
                "&#248;", "\u00F8"),  // Latin small letter o with stroke
            "&#252;", "\u00FC"),  // Latin small letter u with diaeresis
        "&#8230;", "\u2026"),  // Horizontal ellipsis
    "&#8482;", "\u2122")  // Trade mark sign

Hi @C_Skjerning. I did read your comments. Have you counted? :wink: Have you tried removing the first replace ? :slight_smile:

Yes I did…

Here is a video of using two replaces; when I add the trademark one, it fails:

replace(
replace(
replace(
$$CURRENTCOLUMN$$,
"&#252;", "\u00FC"),  // Latin small letter u with diaeresis
"&#8230;", "\u2026"),  // Horizontal ellipsis
"&#8482;", "\u2122")  // Trade mark sign

Thanks for posting the actual code. Ok, there is some confusion here because I thought you were posting that you get an error when you try to replace TM symbol, which is what you had in the earlier posts. That error, as has already been noted was because of the additional replace( keyword.

But when I ran the actual code that you just uploaded, no such error occurred.

So are you saying that you don’t get an error, but that the replacement simply doesn’t occur? Apologies if I have misunderstood what the actual problem is that you are posting? Am I looking at the wrong problem?

(I’ve tried watching the uploaded screencast but for whatever reason it is coming through in low resolution and I can’t really see it properly)

Hi Takb - As you can see in the video, I get a runtime error from the String Manipulation (Multi Column) node, and the single-column String Manipulation node fails as well. When I enter the Unicode escape for the trademark sign (or just type it with ALT+0153) into the replace expression, it fails… It’s really weird :slight_smile:

So I guess it is the validation of the code that simply does not like it when ™ is inside?

What does the file look like? Are you guys sure those multiple replace functions are the best option for this?
br

Hi Daniel, we are also looking into using a Java Snippet node and reading directly from the XML file, but this requires a little more skill from our team to understand and set up :slight_smile:

// Import necessary classes from the Apache Commons Lang library
import org.apache.commons.lang3.StringEscapeUtils;
import org.w3c.dom.Document;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.stream.StreamResult;
import java.io.StringWriter;

// system variables
public class JSnippet extends AbstractJSnippet {
  // Fields for input columns
  /** Input column: "XML" */
  public Document c_XML;

  // Fields for output columns
  /** Output column: "XML" */
  public Document out_XML;

// Your custom variables:
Document inputXml = $c_XML$;  // Replace $inputXML$ with the actual XML variable from the XML Reader node
String decodedXml = StringEscapeUtils.unescapeHtml4(unescapedXml);
// expression start
    public void snippet() throws TypeException, ColumnException, Abort {
c_XMLStringWriter writer = new StringWriter();
Transformer transformer = TransformerFactory.newInstance().newTransformer();
transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "no");
transformer.setOutputProperty(OutputKeys.INDENT, "yes");
transformer.transform(new DOMSource(inputXml), new StreamResult(writer));
String unescapedXml = writer.toString();

// Decode HTML entities


// Return the decoded XML
out_XML = decodedXml;

// expression end

We have not got this to work yet, since we are still figuring out how to add the variables for input and output, but we are working on it. I still think it is funny that there is something wrong with the String Manipulation node, which should do the job :slight_smile:
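
As a point of comparison for the Java route, here is a minimal sketch of decoding decimal references such as `&#8482;` in one pass, using only the JDK (no Apache Commons). `NumericEntityDecoder` is a hypothetical name. The two overrides for 128 and 138 are an assumption taken from the replacement list in this thread, where those codes are mapped to the Euro sign and S with caron. How to wire this into a Java Snippet node's input and output fields is not covered here.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: decodes decimal references such as "&#8482;" in one pass. */
final class NumericEntityDecoder {

    private static final Pattern DECIMAL_REFERENCE = Pattern.compile("&#(\\d{1,7});");

    // Assumption: overrides copied from the replacement list in this thread,
    // where these two codes stand for Windows-1252 style characters.
    private static final Map<Integer, String> OVERRIDES = new HashMap<>();
    static {
        OVERRIDES.put(128, "\u20AC"); // Euro
        OVERRIDES.put(138, "\u0160"); // S with caron
    }

    static String decode(String input) {
        Matcher m = DECIMAL_REFERENCE.matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            int codePoint = Integer.parseInt(m.group(1));
            String replacement;
            if (OVERRIDES.containsKey(codePoint)) {
                replacement = OVERRIDES.get(codePoint);
            } else if (Character.isValidCodePoint(codePoint)) {
                replacement = new String(Character.toChars(codePoint));
            } else {
                replacement = m.group(); // leave out-of-range references untouched
            }
            // quoteReplacement keeps '$' and '\' in the replacement literal
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(decode("Br&#252;cke&#8482; &#8230; 5&#128;"));
    }
}
```

Compared with 25 nested `replace` calls, a single pattern handles any decimal reference, so new characters in the data do not require editing the expression.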

Hi @C_Skjerning, can you just confirm what the original data looks like? Some of your code tries to replace “™”, while other code uses the full HTML syntax “&#8482;”.

I’m assuming that if you are doing HTML replacement, then it would be the full & syntax.

In which case in my test:

replace(
replace(
replace(
replace(
replace(
replace(
replace(
replace(
replace(
replace(
replace(
replace(
replace(
replace(
replace(
replace(
replace(
replace(
replace(
replace(
replace(
replace(
replace(
replace(
replace($column_name$,
"&#128;", "€"),
"&#138;", "Š"),
"&#160;", " "),
"&#174;", "®"),
"&#176;", "°"),
"&#177;", "±"),
"&#178;", "²"),
"&#181;", "µ"),
"&#186;", "º"),
"&#188;", "¼"),
"&#189;", "½"),
"&#195;", "Ã"),
"&#196;", "Ä"),
"&#207;", "Ï"),
"&#214;", "Ö"),
"&#216;", "Ø"),
"&#224;", "à"),
"&#225;", "á"),
"&#226;", "â"),
"&#235;", "ë"),
"&#246;", "ö"),
"&#248;", "ø"),
"&#252;", "ü"),
"&#8230;", "…"),
"&#8482;", "™")

works.

I tried re-watching your video but it simply isn’t clear enough for me to see what exact script you have there when the node throws that error. Have you tried watching it back? Is it just me? :wink:

@C_Skjerning , taking @Daniel_Weikert 's point about this being possibly not the best option, and you mentioning possibly thinking about Java Snippet…

Would this work for you?

@C_Skjerning this is what I tried with the standard Column Expressions nodes. It will iterate over the columns.

Dear @C_Skjerning,

The issue with the short version of your expression and with the second post here is the comment at the end, which should be placed after a semicolon (this is also mentioned in the prompt message in your video):

replace(
replace(
replace(
$$CURRENTCOLUMN$$,
"&#252;", "\u00FC"), // Latin small letter u with diaeresis
"&#8230;", "\u2026"), // Horizontal ellipsis
"&#8482;", "\u2122"); // Trade mark sign
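
One plausible reading of why the trailing comment matters, based on the `return replace(...);` lines in the error dialogs earlier in the thread: the node appears to wrap the expression in `return ...;`, so a `//` comment on the last line would swallow the appended semicolon. The sketch below only illustrates that assumption in plain Java. `TrailingCommentDemo` and its methods are hypothetical and are not KNIME's actual code generator.

```java
/** Hypothetical illustration, not KNIME's real code generator. */
final class TrailingCommentDemo {

    /** Assumed wrapping, modelled on the "return replace(...);" lines in the error dialogs. */
    static String wrap(String expression) {
        return "return " + expression + ";";
    }

    /**
     * Returns true if the last line of the generated statement still ends with ';'
     * once a trailing line comment is cut off. Naive: it does not account for
     * comment markers that appear inside string literals.
     */
    static boolean terminatorSurvives(String statement) {
        String lastLine = statement.substring(statement.lastIndexOf('\n') + 1);
        int comment = lastLine.indexOf("//");
        String code = comment >= 0 ? lastLine.substring(0, comment) : lastLine;
        return code.trim().endsWith(";");
    }

    public static void main(String[] args) {
        String commentLast = "replace($col$, \"a\", \"b\") // note";
        String semicolonFirst = "replace($col$, \"a\", \"b\"); // note";
        System.out.println(terminatorSurvives(wrap(commentLast)));    // false
        System.out.println(terminatorSurvives(wrap(semicolonFirst))); // true
    }
}
```

Under this assumption, putting the semicolon before the comment, or dropping the comment on the last line, keeps the statement terminator outside the comment.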