Handle special characters in string value columns

Hi Team,

The two values below look the same visually, but KNIME Analytics Platform doesn't consider them identical.

For example, if I add a Duplicate Row Filter for the values below, I get both values in the output.

260_0250 temp_µc_(aurix)v
260_0250 temp_μc
(aurix)_v

After debugging further, I found that the “µ” character is causing the issue.

How can I filter duplicates for values like these?

Please see the screenshot below. The values above change as soon as this post is saved.

Hi @rajvenkatesh_k,

Welcome to the forum.

The second value appears to have an underscore (_) as the second-to-last character, before the “v”, which would make them different. Is that a typo?


Hi @takbb

Please see the screenshot below. The values seem to change as soon as the post is saved.

Thanks,
Raj

Ah, yes, I saw you’d edited it just after I posted. Regarding the values changing when you post to the forum: that happens because some characters have a “special meaning” to the forum software. In a browser on a PC, you can highlight the text and click the “preformatted text” button.


Hi @takbb

Below are the values after highlighting them and using the preformatted text option.

**260_0250 temp_µc_(aurix)_v**
**260_0250 temp_μc_(aurix)_v**

Thanks,
Raj

@rajvenkatesh_k, could you upload a small sample workflow that contains the problematic rows, or alternatively write those rows to an xlsx file and upload that?

[edit: actually, no need to upload; your last post is sufficient, thanks. If I copy and paste those values into a Table Creator, I also see the duplication issue. Interesting… ;-)]


Hi @rajvenkatesh_k, what is the origin of the two different pieces of text?

I pasted each here:

And, as you can see, they are different Unicode characters.

One is
U+03BC : GREEK SMALL LETTER MU
and the other is
U+00B5 : MICRO SIGN

This explains why KNIME treats them as different values. I think you need to check how the values are being sourced.
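To check values at source without a web lookup, a short Python sketch like the one below (runnable standalone or, for example, in a Python Script node) lists the code point and Unicode name of every non-ASCII character in a string. The helper name `describe_non_ascii` is hypothetical; the sample strings are the two values from this thread.

```python
import unicodedata


def describe_non_ascii(text: str) -> list[str]:
    """Hypothetical helper: return 'U+XXXX : NAME' for each non-ASCII character."""
    return [
        f"U+{ord(ch):04X} : {unicodedata.name(ch, '<unnamed>')}"
        for ch in text
        if ord(ch) > 127
    ]


a = "260_0250 temp_\u00b5c_(aurix)_v"  # contains MICRO SIGN
b = "260_0250 temp_\u03bcc_(aurix)_v"  # contains GREEK SMALL LETTER MU

print(a == b)                 # False: they look identical but differ in one code point
print(describe_non_ascii(a))  # ['U+00B5 : MICRO SIGN']
print(describe_non_ascii(b))  # ['U+03BC : GREEK SMALL LETTER MU']
```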

The attached workflow shows one possible way to resolve this specific issue by string-replacing the problematic character (it also demonstrates the issue for anybody else who might want to suggest options), but ideally I’d say the data needs to be “fixed” at source.

String replace μ unicode characters.knwf (10.2 KB)
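The attached workflow isn't reproduced here. As a rough sketch of the same idea outside KNIME's built-in nodes (for example in a Python Script node), the code below compares values on a cleaned-up key before deduplicating. It shows two options: a targeted replacement of U+00B5 with U+03BC, and Unicode NFKC normalization, which also maps MICRO SIGN to GREEK SMALL LETTER MU (plain NFC does not). The helper names are hypothetical. Note that NFKC also folds other compatibility characters (full-width letters, ligatures and so on), which may or may not be wanted for your data.

```python
import unicodedata
from typing import Callable


def replace_micro_sign(text: str) -> str:
    """Targeted fix: swap MICRO SIGN (U+00B5) for GREEK SMALL LETTER MU (U+03BC)."""
    return text.replace("\u00b5", "\u03bc")


def nfkc_key(text: str) -> str:
    """Broader fix: NFKC folds compatibility characters, including U+00B5 -> U+03BC."""
    return unicodedata.normalize("NFKC", text)


def drop_duplicates(values: list[str], key: Callable[[str], str]) -> list[str]:
    """Keep the first occurrence of each value, comparing on key(value)."""
    seen: set[str] = set()
    unique: list[str] = []
    for v in values:
        k = key(v)
        if k not in seen:
            seen.add(k)
            unique.append(v)
    return unique


values = [
    "260_0250 temp_\u00b5c_(aurix)_v",  # MICRO SIGN
    "260_0250 temp_\u03bcc_(aurix)_v",  # GREEK SMALL LETTER MU
]

print(len(set(values)))                                  # 2: exact comparison keeps both
print(len(drop_duplicates(values, replace_micro_sign)))  # 1
print(len(drop_duplicates(values, nfkc_key)))            # 1
print(unicodedata.normalize("NFC", "\u00b5") == "\u00b5")  # True: NFC leaves MICRO SIGN alone
```

The targeted replacement is the narrower change and matches the approach in the attached workflow. NFKC is broader, so it is worth checking which other characters in your data it would alter before relying on it.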

Do you perhaps have different character encodings on different data sources? I’m not particularly expert on Unicode and encodings, so maybe somebody else has other ideas…

