In my workflow, I am reading JSON files using a List Files node and then running a Parallel Chunk Start for all the files read from the List Files node, to run the parsing workflow. But before starting the parallel chunks, I want to check whether a JSON file is empty, meaning it has no data, only empty brackets; in that case I don't want to read or parse those files. Do I need to use a Java Snippet? If yes, can you tell me how?
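For what it's worth, the check itself is small. Below is a minimal sketch in plain Java of the logic a Java Snippet could compute per file path; the class and method names here are my own illustration, not KNIME API. A file counts as "empty" when its content is nothing but whitespace or bare brackets.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class EmptyJsonCheck {
    // Returns true if the file contains no actual JSON data:
    // only whitespace, "[]", or "{}".
    public static boolean isEmptyJson(Path file) throws IOException {
        String content = Files.readString(file).trim();
        return content.isEmpty() || content.equals("[]") || content.equals("{}");
    }

    public static void main(String[] args) throws IOException {
        // Print one line per file path passed on the command line.
        for (String name : args) {
            System.out.println(name + " empty: " + isEmptyJson(Path.of(name)));
        }
    }
}
```

In a workflow, the boolean this produces could drive a Row Filter on the file list, so only non-empty files reach the parallel chunk.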
Hi,
I am attaching the workflow with sample files. In these, Sample15 and Sample16 are empty files. If you run it, chunks will start for all the files, so between the List Files node and the Parallel Chunk node I want to filter out the sample15 and sample16 files. KNIME_workingEg1.knwf (29.9 KB) mysample12.json (1.7 KB) mysample13.json (2.1 KB) mysample14.json (2.5 KB) mysample15.json (2 Bytes) mysample15.json (2 Bytes)
If you want to exclude those files, you first need to read all files once before the loop (where the files will be read again), check whether they are empty, and then pass only the non-empty files to the loop. But since empty files have no impact on your output, I think you do not need to do that.
Is there any particular reason to do this which I cannot see?
Yes, actually in some workflows, when I read both empty and non-empty files, I am trying to split some JSON parameters into columns, because of which at the end (collected data) the last node, Parallel Chunk End, throws an error that the columns are different:
"Cell count in row "Row0_1_1_1_1_Row0_1_1_1#0_#2" is not equal to length of column names array: 77 vs. 76"
So I was thinking we could filter the empty files beforehand, or else, instead of the Cell Splitter node, I would need to use a Java Snippet with an if/else to make the output columns fixed.
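If you do go the Java Snippet route, the if/else idea essentially amounts to padding every parsed row out to one fixed set of column names, filling missing columns with null so every chunk produces the same column count. A hedged sketch of that padding step (plain Java, names illustrative, not KNIME API):

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FixedColumns {
    // Pads a parsed row out to a fixed column schema: columns present in
    // the row keep their values, columns absent from the row become null.
    // This makes rows from files with and without "Flags" the same width.
    public static Map<String, Object> pad(Map<String, Object> row, List<String> schema) {
        Map<String, Object> out = new LinkedHashMap<>();
        for (String col : schema) {
            out.put(col, row.getOrDefault(col, null));
        }
        return out;
    }
}
```

The schema list would have to include every column any file can produce, including the Flags-derived ones, so the Parallel Chunk End always sees matching tables.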
The error is not related to the empty JSON files. Check the first chunk. In the JSON to Table Metanode, the Split Collection Column is producing 3 extra columns for the first chunk but the other chunks do not have these 3 columns.
The first file here has "Flags" under "Obserts" but the rest of the files do not have "Flags".
Yes, the first chunk is producing three extra columns but the others do not, as the files are either empty or do not have that Flags parameter. This is the issue: if you run only the file with Flags and the empty files, this error will occur because the file with Flags has data and is split into extra columns, while the empty files have no data, so no column splitting happens for them. This is the reason I want to filter empty files at an early stage, or else I need to put some logic into the splitting. Do you have any suggestions for this?
Correct me if my understanding is wrong.
When you use the files with flags and the empty file, the workflow will execute but the parallel end node throws an error which does not stop the flow.
When you use the files with flags and the files without flags, the workflow will not run at all since the column counts do not match.
Now you want to remove the empty files, which do not stop your flow; what about the other files without the flags, which do stop the flow?
What I can suggest here is to add a label to the rows coming from the file with flags. Then, after the parallel chunk, read those files again and extract the flags. I think this is better than reading all the files, filtering some, and reading the rest of the files again.
Below is your workflow, which I have modified. You can still optimize it; I just wanted to give you a clue. jsonParallelFileEg1.knwf (175.1 KB)
Thanks for the suggestion, it solves the problem :) Just one clarification: in the last Joiner (Node 37), instead of using all the columns to join, can we use just 2-3 columns?
It depends on which attributes are the key values that uniquely identify the records. If there is one unique attribute, you can use that single column instead. Since I did not know your data, I added all the columns to make sure it works.
I have got a different requirement now, in line with the above discussion. We are filtering empty files and labeling the incoming files, but the files are coming with different numbers of flag columns, say one file has 2 flags and another file has 4 flags, because of which the chunks are producing different numbers of columns and hence the workflow is failing with the error that the column numbers are different. You can use the same workflow as above.
The last node, "Parallel Chunk End", fails with the error: Cell count in Row & Row is not equal to the length of the column names array: 77 vs. 76. Flags is coming as null in a few files. I have attached the sample file along with the previous example. mysample13.json (2.3 KB)