Weird Character Appearing

Hi, can someone help me figure out why my Excel Reader node produces this weird character: |’

It only appears on the last column, only on certain rows, at the end of the string:

The original Excel file doesn’t have those:
DataToCheck.xlsx (84.2 KB)

Thanks in advance!

@badger101 maybe you can upload a complete workflow where these characters appear. They do not show up in my sample workflow with your data:

kn_forum_47511_excel_strange_characters.knwf (185.0 KB)

Here you go @mlauber71
Data2checkworkflow.knwf (63.1 KB)

As I mentioned, it only appears on certain rows of the last column. The portion on top as you showed wasn’t affected. You might wanna scroll down :grin:

Additional screenshot:

@badger101 On my Mac with KNIME 4.6.2 I see no such characters. Could you provide a complete workflow that also includes the file, or give my example a try and see whether the characters show up there too? Then save the data to a KNIME table: do the characters also appear there, or is it just the display? It would also help to indicate which rows are affected.

edit: OK I can see them in my Excel. Very strange … if I edit the cell they disappear … So they are there before the import …

Yeah, it’s strange, right? Here’s the context: I copy-pasted the strings from webpages. Could the copying process have picked up additional hidden characters? When pasting, I chose only ‘Paste values’ to avoid the original formatting. I also tried ‘Clear format’ in Excel to get rid of them.

@badger101 there indeed is a hidden character there

Do you know the right script to remove those? I tried

replace($$CURRENTCOLUMN$$, "[U+200E]", "") and I also tried with a backslash \ in front of the "[" and "]", but had no success.

And how do you check for the hidden characters in the first place, please?

Here’s a workflow to show that it’s not a display-only issue:

Data2checkworkflow2.knwf (155.9 KB)

The duplicates were not acknowledged in the Duplicate Node:
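To illustrate why the duplicates go undetected, here is a minimal Python sketch (outside KNIME, with made-up sample strings): two values that look identical on screen compare as different when one of them carries a trailing U+200E (LEFT-TO-RIGHT MARK). Presumably any duplicate check based on exact string equality behaves the same way.

```python
# Hypothetical sample values; the second one ends with an invisible U+200E.
clean = "Some text"
dirty = "Some text\u200e"

rows = [clean, dirty, clean]

# The two strings render the same but are not equal.
strings_equal = clean == dirty

# An exact-match duplicate check therefore sees two distinct values.
n_unique = len(set(rows))

# After stripping the hidden character, only one distinct value remains.
n_unique_after_cleanup = len({r.replace("\u200e", "") for r in rows})

print(strings_equal, n_unique, n_unique_after_cleanup)  # False 2 1
```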

@badger101 a very strange case indeed. If you want to spot such things, it is best to copy them into an advanced editor like Notepad++ or Visual Studio Code.
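If an editor is not at hand, hidden characters can also be listed programmatically. The following is a rough sketch in Python, not part of the workflow above; the helper name `find_hidden_chars` and the sample string are made up for illustration. It reports every character in the Unicode categories Cf (format) and Cc (control), which includes U+200E but also ordinary tabs and line breaks.

```python
import unicodedata


def find_hidden_chars(text):
    """Return (index, code point, Unicode name) for each format/control character."""
    hits = []
    for i, ch in enumerate(text):
        # Cf = invisible format characters (e.g. U+200E), Cc = control characters
        if unicodedata.category(ch) in ("Cf", "Cc"):
            name = unicodedata.name(ch, "<unnamed>")  # control chars have no name
            hits.append((i, f"U+{ord(ch):04X}", name))
    return hits


sample = "Some text\u200e"  # hypothetical cell value with a trailing hidden character
hits = find_hidden_chars(sample)
print(hits)  # [(9, 'U+200E', 'LEFT-TO-RIGHT MARK')]
```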

You might have to escape the Unicode code point in the regex syntax in KNIME:

regexReplace($$CURRENTCOLUMN$$, "[\\u200E]" , "")
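For comparison, the same replacement can be sketched in Python (for example to try the pattern outside KNIME). The sample value is made up. The doubled backslash in the KNIME expression is there because the pattern is written as a string literal first; in Python a raw string plays the same role, so the regex engine receives `\u200E` and resolves it to the code point itself.

```python
import re

sample = "Some text\u200e"  # hypothetical cell value ending in U+200E

# Raw string: the regex engine sees the escape \u200E and matches that code point.
cleaned = re.sub(r"[\u200E]", "", sample)

print(len(sample), len(cleaned))  # 10 9
```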

Another approach can be to eliminate all strange characters and only leave the ones you explicitly allow (there was a debate about this before):

regexReplace($$CURRENTCOLUMN$$, "[^\\w\\d\\s.,!@#$%&*()=+~-]","" )
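A rough Python sketch of this allow-list approach, with a made-up sample string, shows one side effect to keep in mind. Java's regex engine (which KNIME presumably uses here) treats `\w` as ASCII-only by default, so accented letters are removed along with the hidden character. The `re.ASCII` flag below is meant to mimic that default; without it, Python's `\w` also keeps non-ASCII letters.

```python
import re

# Same character class as in the expression above: keep word characters,
# digits, whitespace and a handful of punctuation marks, drop everything else.
ALLOWED = r"[^\w\d\s.,!@#$%&*()=+~-]"

sample = "Total: 100% (café)\u200e"  # hypothetical value with a trailing U+200E

# ASCII-only \w (assumed to mirror Java's default): ':' , 'é' and U+200E are removed.
ascii_cleaned = re.sub(ALLOWED, "", sample, flags=re.ASCII)

# Unicode-aware \w (Python's default for str): 'é' is kept, ':' and U+200E are removed.
unicode_cleaned = re.sub(ALLOWED, "", sample)

print(ascii_cleaned)    # Total 100% (caf)
print(unicode_cleaned)  # Total 100% (café)
```

If the data contains accented or non-Latin text, removing only the specific hidden character is the safer choice.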

@mlauber71 Thank you, I just needed the [\\u200E], it works! :grin: Thanks for solving this.
