Hebrew messing up JSON

I receive data from clients that is sometimes in different languages. We have a few Israeli clients who provide some of the data in Hebrew (read from right to left instead of left to right). This is messing up my JSON format, as shown below:
image
Any idea how I can stop this from happening?

Hi @tiaandp , some languages require at least UTF-8 (you may need UTF-16 in some cases), so make sure your encoding is set to at least UTF-8.

If you are still struggling, you can share some data so we can take a look at it.
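
For anyone who wants to check the encoding side outside KNIME, here is a minimal Python sketch (not from the thread; the `record` dictionary is a made-up example). It shows that UTF-8 is enough to carry Hebrew text through a JSON round trip without loss.

```python
import json

# Hypothetical record with a Hebrew column name (illustration only).
column = "יתרה קובעת לתשלום"
record = {column: 3}

# ensure_ascii=False keeps the Hebrew characters; encode explicitly as UTF-8.
payload = json.dumps(record, ensure_ascii=False).encode("utf-8")

# Decoding with the same encoding recovers the record exactly.
decoded = json.loads(payload.decode("utf-8"))
```

If `decoded` equals `record`, the encoding is not what is changing the data.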

Hi @bruno29a. Where can I set the encoding in KNIME? I tried to set it under Preferences > Spelling, but it just says that the spelling service is not installed.

Hi @tiaandp , this has nothing to do with Spelling.

What nodes are you using? Can I see your workflow, so that I can see the source of your data?

@bruno29a The workflow is very big, so it will be hard to share. My data comes from Excel files. I have attached a small workflow as an example. I don’t mind the columns being read backwards in KNIME, but it is important that the JSON is in normal JSON format, as my workflow is an API. If the format is backwards, it will break the code receiving the response from my workflow.
HebrewJson.knwf (11.0 KB)

Hi @tiaandp , this example will not help as the column name is already “corrupted”. I need to see how you are reading the source.

I understand that you can’t share the workflow, but can you at least show me a screenshot of the part where you are reading the source data? The corruption of the data, even if it is only the column header, is probably happening at that point.

@bruno29a This is an example of the raw data as I receive it:
image
It is already “corrupted” in the Excel file.
The problem is that I can’t edit the files, as each file is passed as a URL link to the API (my KNIME workflow), which downloads it from S3.

Hi @tiaandp , I was trying to create some sample data, so I went to get some text in Hebrew via Google Translate.

It appears that the text you are showing is not corrupted; it is actually Hebrew. Take one of the cells and run it through Google Translate to verify.

So, the text is just in Hebrew. I’m not sure what behaviour you are expecting. Is it breaking your JSON?

Hi @bruno29a. Thanks for your reply. Yes, it is breaking the JSON format. When I use the Table to JSON node, the JSON is read from right to left instead of left to right, as in the images below:
image

This incorrectly formatted JSON breaks our API.

Hi @tiaandp , from what I’ve read online, Hebrew is read from right to left. I don’t know if that’s what’s causing this behaviour…

We’ll need someone from the Knime team to confirm this.

Tagging @ScottF , @Iris from the Knime team.

Hi @tiaandp , was just trying to clarify for the Knime team, and found something interesting.

This is ok:

However, if the column name is in Hebrew, the conversion to JSON for variable/value is reversed:

(Knime team, you can try it from @tiaandp 's HebrewJson.knwf file in post #5).

Now, in all honesty, I tried to manually write the result the way we expected it to look, but I was not able to. Whenever I tried to write "<The column name in Hebrew>" : "3", the text got reversed to what we see in Knime, whether in Notepad, Notepad++ or TextPad. You can give it a try, @tiaandp.

It looks like that’s how it’s expected to be written, but it does make it hard to tell by eye which one is the variable and which one is the value.

However, it seems that it’s not an issue, @tiaandp . If you convert the JSON back to a table, the “3” and “6” are detected as values of the column “יתרה קובעת לתשלום”:
image

Based on what happened in the other editors, it’s not Knime’s doing. I think that’s the expected behaviour for Hebrew text, because it reads from right to left.

Here’s the workflow converting back:
HebrewJson - Bruno.knwf (10.9 KB)

So, it’s not breaking anything in the end. The system understands that “3” and “6” are the values and that “יתרה קובעת לתשלום” is the variable. It’s just displayed the way it’s supposed to be read in Hebrew, which should not matter to the system that’s doing the processing.

EDIT: Another trick that I wanted to check was to try to access the variables “יתרה קובעת לתשלום”, or “3” or “6” via JSON Path, and actually it shows you how it really is in the JSON Path’s JSON Preview:

This reinforces the fact that the system does see “יתרה קובעת לתשלום” as variable, and “3” and “6” as values.
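
The same check can be reproduced outside KNIME. The following is a minimal Python sketch (not from the thread) under the assumption that the reversal is purely a display effect of right-to-left text. It parses a JSON object with the Hebrew column name as key, and also writes the key as `\uXXXX` escapes, which any editor shows left to right.

```python
import json

# Column name taken from the thread.
column = "יתרה קובעת לתשלום"

# Serialized with the Hebrew characters kept as-is.
text = json.dumps({column: "3"}, ensure_ascii=False)

# Parsing recovers the Hebrew string as the key and "3" as the value,
# whatever order an editor happens to draw the characters in.
parsed = json.loads(text)
key, value = next(iter(parsed.items()))

# Python's default (ensure_ascii=True) writes non-ASCII characters as
# \uXXXX escapes: pure ASCII text that still parses to the same object.
escaped = json.dumps({column: "3"})
same_object = json.loads(escaped) == parsed
```

The escaped form is only an option if the receiving side can be given ASCII-only JSON; any conforming JSON parser should read both forms identically.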


Thanks so much @bruno29a. It is very weird. We will test a bit and I’ll post here if we find the solution.

No problem @tiaandp , I look forward to seeing what you find. To me it looks like a non-issue: it’s expected behaviour, and it does not break the JSON. It is only a display thing.

@bruno29a Yeah, it doesn’t break the JSON in KNIME, but we had trouble when the JSON strings were received via a GET request to the KNIME server from a Java application.

I see @tiaandp . I would have imagined that it should not break via the Java application either…

Since it’s not an issue in Knime, can’t you retrieve the JSON strings directly from Knime? There’s a Get Request node.
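
The thread does not say what actually went wrong on the Java side. One possibility worth ruling out, and it is only an assumption, is that the client decodes the response bytes with a charset other than UTF-8. A minimal Python sketch of that failure mode (the `body` bytes stand in for a hypothetical HTTP response):

```python
import json

column = "יתרה קובעת לתשלום"  # column name from the thread

# Hypothetical response body: JSON text sent as UTF-8 bytes.
body = json.dumps({column: "3"}, ensure_ascii=False).encode("utf-8")

# Decoding as UTF-8 gives back the Hebrew key.
good = json.loads(body.decode("utf-8"))

# Decoding the same bytes as ISO-8859-1 still yields valid JSON,
# but the key becomes mojibake and no longer matches the column name.
bad = json.loads(body.decode("iso-8859-1"))
```

Here `column in good` is true while `column in bad` is false, even though both parse without error. If the Java application shows a symptom like this, the response charset is the place to look, not the text direction.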
