JSON Missing data issue

Hi,

I have JSON files that often contain missing values. When I extract the values as a list, they end up assigned to the wrong rows wherever something is missing, and the data goes all wrong.

I have looked through the forum and have not been able to figure it out. I have attached a sample workflow. Can someone help?

Thanks

test data.knwf (2.8 MB)

Hi,

could you point me to the actual problem in your workflow? As I’m not familiar with the dataset it’s hard to follow.

Andreas

Hi Andreas, I have attached an image that shows the mismatch. I have also included the data folder, which contains the JSON file.

@ace2131 you will have to carefully examine which element is on which ‘level’ of the JSON file. It seems the main body is again split in two parts. And then you have sub-elements that are cascaded.
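One way to see which element sits on which level, outside KNIME, is to list every path in the file. Below is a minimal Python sketch under stated assumptions: the sample document and its `items`, `id` and `name` keys are made up for illustration and are not taken from the attached data; only `author` appears in this thread.

```python
import json


def iter_paths(node, path="$"):
    """Yield (path, type name) for every node in the document, depth first."""
    yield path, type(node).__name__
    if isinstance(node, dict):
        for key, value in node.items():
            yield from iter_paths(value, f"{path}.{key}")
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from iter_paths(value, f"{path}[{index}]")


# Hypothetical sample: the second item has no "author" block at all.
sample = json.loads(
    '{"items": [{"id": "a", "author": {"name": "x"}}, {"id": "b"}]}'
)

paths = [path for path, _ in iter_paths(sample)]
for path, type_name in iter_paths(sample):
    print(path, type_name)
```

In the printed list, `$.items[1]` has no `author` path below it, which is the kind of gap that later shows up as a shifted value.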

@ace2131 maybe you can take a look at this. First you will have to skip the initial entry, since it seems to have a different structure from the rest.

Then you will have to extract the sub-information like video and author as separate JSON structures and give them their own paths. Among the individual values, ‘hearts’ needs to be a Long integer (for example).

Then in the end you will have to ungroup the values:
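To illustrate why the values can shift, and what extracting per entry changes, here is a small Python sketch. It is not the KNIME workflow itself, only the same idea under assumptions: the sample records, the `id`, `name` and `duration` keys, and the string-typed `hearts` values are hypothetical; `hearts`, `video` and `author` are the names mentioned above.

```python
import json

# Hypothetical sample: a header entry with a different structure,
# followed by three records, one of which lacks "hearts" and "video".
raw = """[
  {"meta": "header entry with a different structure"},
  {"id": "a", "hearts": "10", "author": {"name": "x"}, "video": {"duration": 15}},
  {"id": "b", "author": {"name": "y"}},
  {"id": "c", "hearts": "30", "author": {"name": "z"}, "video": {"duration": 9}}
]"""

entries = json.loads(raw)[1:]  # skip the initial entry

# Column-wise collection: each list only holds the values that exist,
# so pairing the lists by position shifts values onto the wrong record.
ids = [e["id"] for e in entries]
durations = [e["video"]["duration"] for e in entries if "video" in e]
misaligned = list(zip(ids, durations))  # "b" receives the duration of "c"


def to_row(entry):
    """Build one row per entry; missing values stay None in their own row."""
    author = entry.get("author") or {}
    video = entry.get("video") or {}
    hearts = entry.get("hearts")
    return {
        "id": entry.get("id"),
        "hearts": int(hearts) if hearts is not None else None,  # integer type
        "author_name": author.get("name"),
        "video_duration": video.get("duration"),
    }


rows = [to_row(e) for e in entries]
```

Here `misaligned` pairs record "b" with a duration that belongs to "c", while `rows` keeps an empty value for "b" and leaves "c" untouched.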

Thanks mlauber71. I tried your flow but I am still seeing the mismatch. Please refer to the screenshot.

Hi actionandi, there is a mismatch in the data coming from the JSON files due to missing values. The screenshots above and the attached data show the problem.

@ace2131 maybe you can provide us with some ID that shows where to find the data. Following screenshots is a challenge. I will try to locate the entry. Most likely it has to do with the data and not with some problem in KNIME.

@ace2131 something is off, I agree. Not yet sure what it is. The question is whether you could boil it down to the case where the JSON goes wrong. I thought I had the configuration right but something is off (either in the data or in the workings of the node).

@ace2131 I tried another approach where I iterate over the single blocks inside the JSON instead of doing one massive ungroup, which might lead to problems in the IDs. Here the JSON path is created by a Flow Variable and isolates one entry:

Inside the loop you will have to extract the sub-JSON structures about authors. Also rename the sub-columns to identify to which part they belong:

In the end you collect the results and now you can see where the information is. You will have to see if some sub-entries have more than one line (video subtitles) and how to handle that.
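For readers who want to see the loop idea in code, a rough Python sketch follows. It only mirrors the approach described above and makes several assumptions: the sample entries, the `name`, `id` and `subtitles` keys, the `$[index]` path string and the `Row{index}_{n}` row ID scheme are all hypothetical and chosen for illustration.

```python
def flatten(prefix, sub):
    """Prefix the scalar fields of a sub-structure so their origin stays visible."""
    return {
        f"{prefix}_{key}": value
        for key, value in (sub or {}).items()
        if not isinstance(value, (list, dict))
    }


def rows_for_entry(index, entry):
    """Return the rows for one entry, one row per subtitle line."""
    base = {"json_path": f"$[{index}]"}  # path that isolates this entry
    base.update(flatten("author", entry.get("author")))
    video = entry.get("video") or {}
    base.update(flatten("video", video))
    # A missing subtitle list still produces one row with an empty value.
    subtitles = video.get("subtitles") or [None]
    return [
        dict(base, row_id=f"Row{index}_{n}", subtitle=subtitle)
        for n, subtitle in enumerate(subtitles)
    ]


# Hypothetical entries: the second one has no author and no subtitles.
entries = [
    {"author": {"name": "x"}, "video": {"id": "v1", "subtitles": ["en", "de"]}},
    {"video": {"id": "v2"}},
]

# Collect the results of all iterations into one table-like list.
result = [
    row
    for index, entry in enumerate(entries)
    for row in rows_for_entry(index, entry)
]
```

Because every row carries the index of the entry it came from, a missing author or subtitle leaves a gap in that entry only, and the row IDs stay unique even when one entry yields several rows.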

@mlauber71 this works. I had to modify how the row IDs were being generated, but barring that it worked. Thanks.
