Data writing to sql database with 1 character changed to questions mark

davidbaker · January 15, 2025, 9:52pm

Hi folks,

I am seeing some very peculiar behavior that I can’t wrap my head around. I’m inserting some data into a SQL database using the DB Insert node. There is a single string value in my table that gets written incorrectly every time I try. For some reason, it converts the string “Aparture_mm” to “Apartur?_mm”.

This is what it looks like before inserting:

And this is what it looks like after:

Any ideas?

mlauber71 · January 15, 2025, 10:00pm

@davidbaker can this be due to different encodings? Knime does use UTF-8. What does the database expect.

What kind of database is this. Does this happen to other strings too?

Can you provide a sample of the data in its original form?

davidbaker · January 15, 2025, 10:17pm

Hi @mlauber71 , good question. It’s a MS SQL database (2019, case insensitive if that matters).

I’ve not noticed it happening with any other strings.

I found that if I change the value to something slightly different, it inserts without any issue, so I am just going to accept this workaround and move on for now.

Thanks for your reply.

takbb · January 16, 2025, 12:36pm

Hi @davidbaker, my guess is similar to @mlauber71 's.

In the attached workflow I have a table containing two rows that I have manually keyed in:

Although the data appears to be the same in both rows, it isn’t. Row 2 contains a regular “e”, but row 1 contains a “Cyrillic e” (item 2 on the below list from my chatGPT conversation)

If it is an “encoding” or “character set” thing, then it is possible that this may appear as unrecognised in SQL Server.

I’ve used a couple of my components to help identify where the “e” characters differ:

This workflow generates the ascii and unicode codes for each character. It may be of assistance if you feed in the offending row in you actual data.

Here you can see that the codes for the “e” character transposed from Row0 are outside of the regular western character set range.

Investigate Characters in data.knwf (113.1 KB)

Of course, if it is this, I’m afraid I don’t have any clues about how it came to be in your data!

davidbaker · January 16, 2025, 3:48pm

Wow, this community is amazing. Thank you both (@mlauber71 & @takbb ) for taking the time to investigate, and provide such detailed replies. I have no doubt that your assumptions are correct, and it is an issue with the encoding (something that I know pretty much nothing about).

Unfortunately I have edited the dataset, to eliminate the offending record, and now I can’t reproduce it. So I can’t confirm it with the components you’ve provided.

Thanks again!

Neha_Kakkar · January 17, 2025, 6:07am

Looks like the issue is with character encoding. Some special character in the string might be causing it to change to a question mark. Since you’ve already fixed it by editing the dataset, it might not happen again. But if it does, you can use the tools shared here to check the character’s encoding.

MartinDDDD · January 18, 2025, 11:20am

@Neha_Kakkar wow I never knew this could happen - can you explain again how the tools like Correlation Filter or Range Slider Widget can help address this?

Neha_Kakkar · January 20, 2025, 5:43am

Hi @MartinDDDD, I think you misunderstood. I was talking about the tools shared by @takbb, like ‘Cell Character Splitter’ and ‘Character ASCII and Unicode.’ These tools can break the string into characters and show their encoding. They help find any unusual characters causing the issue.

MartinDDDD · January 20, 2025, 7:21am

My apologies - I think I mixed up the topic.

How can I use Cell Character Splitter and Character Unicode/ASCII Nodes? I cannot find them when searching for them.

Neha_Kakkar · January 21, 2025, 6:38am

No problem. The ‘Cell Character Splitter’ and ‘Character ASCII and Unicode’ might not be built-in KNIME nodes. They could be custom nodes or part of an extension. You can check the KNIME Hub or forums to find them.

Felipereis50 · January 21, 2025, 9:47am

Could you explain how the adjustment would work?

I understand that there is a string, and for some reason, the letter “e” causes an issue.

Should the correction be made within the database? In other words, in the database column where the information will be stored (by altering the column type)?

MartinDDDD · January 23, 2025, 7:21am

@Neha_Kakkar without being disrespectful - allow me to ask this question - please pick from the three option a - c below:

a) you are a Gen-AI powered bot
b) you are a Gen-AI assisted human
c) none of the above - are you kidding? What makes you think any of that?

Please answer honestly with a, b, or c.

haozi04 · January 23, 2025, 7:44am

Off-topic: now GenAIs are really becoming cyber-garbage generators. You can easily find misleading information on the web. Hilariously, GenAIs are continuously being fed with the garbage they’ve produced.

Disclaimer: not relevant to the reply until he/she make a choice of a or b😉

To be fair, I think the previous reply pointed to the workflows in this reply:

system · April 23, 2025, 7:45am

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.