In my workflow, I am reading JSON files using a List Files node and then running a Parallel Chunk Start for all the files read from the List Files node, to run the parsing workflow. But before starting the parallel chunks, I want to check whether a JSON file is empty, meaning it has no data, only empty brackets; in that case I don't want to read or parse those files. Do I need to use a Java Snippet? If yes, can you tell me how?
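For what it's worth, the check itself is small. Below is a minimal sketch in plain Java of the logic a Java Snippet could compute per file path; the class and method names here are my own illustration, not KNIME API. A file counts as "empty" when its content is nothing but whitespace or bare brackets.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class EmptyJsonCheck {
    // Returns true if the file contains no actual JSON data:
    // only whitespace, "[]", or "{}".
    public static boolean isEmptyJson(Path file) throws IOException {
        String content = Files.readString(file).trim();
        return content.isEmpty() || content.equals("[]") || content.equals("{}");
    }

    public static void main(String[] args) throws IOException {
        // Print one line per file path passed on the command line.
        for (String name : args) {
            System.out.println(name + " empty: " + isEmptyJson(Path.of(name)));
        }
    }
}
```

In a workflow, the boolean this produces could drive a Row Filter on the file list, so only non-empty files reach the parallel chunk.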
Hi,
I am attaching the workflow with sample files. In these, Sample15 and Sample16 are empty files. If you run it, chunks will start for all the files, so between the List Files node and the Parallel Chunk node I want to filter out the sample15 and sample16 files. KNIME_workingEg1.knwf (29.9 KB) mysample12.json (1.7 KB) mysample13.json (2.1 KB) mysample14.json (2.5 KB) mysample15.json (2 Bytes) mysample15.json (2 Bytes)
If you want to exclude those files, you first need to read all files once before the loop (where the files will be read again), check whether they are empty, and then pass only the non-empty files to the loop. But since empty files have no impact on your output, I think you do not need to do that.
Is there any particular reason to do this which I cannot see?
Yes, actually in some workflows, when I read both empty and non-empty files, I am trying to split some JSON parameters into columns, because of which at the end (collected data) the last node, Parallel Chunk End, throws an error that the columns are different:
"Cell count in row "Row0_1_1_1_1_Row0_1_1_1#0_#2" is not equal to length of column names array: 77 vs. 76"
So I was thinking we could filter the empty files beforehand, or else, instead of the Cell Splitter node, I would need to use a Java Snippet with an if/else to make the output columns fixed.
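If you do go the Java Snippet route, the if/else idea essentially amounts to padding every parsed row out to one fixed set of column names, filling missing columns with null so every chunk produces the same column count. A hedged sketch of that padding step (plain Java, names illustrative, not KNIME API):

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FixedColumns {
    // Pads a parsed row out to a fixed column schema: columns present in
    // the row keep their values, columns absent from the row become null.
    // This makes rows from files with and without "Flags" the same width.
    public static Map<String, Object> pad(Map<String, Object> row, List<String> schema) {
        Map<String, Object> out = new LinkedHashMap<>();
        for (String col : schema) {
            out.put(col, row.getOrDefault(col, null));
        }
        return out;
    }
}
```

The schema list would have to include every column any file can produce, including the Flags-derived ones, so the Parallel Chunk End always sees matching tables.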
The error is not related to the empty JSON files. Check the first chunk. In the JSON to Table Metanode, the Split Collection Column is producing 3 extra columns for the first chunk but the other chunks do not have these 3 columns.
The first file here has "Flags" under "Obserts" but the rest of the files do not have "Flags".
Yes, the first chunk is producing three extra columns but the others do not, as the files are either empty or do not have that Flags parameter. This is the issue: if you run only the file with Flags and the empty files, this error will occur because the file with Flags has data and is split into extra columns, while the empty files have no data, so no column splitting happens for them. This is the reason I want to filter empty files at an early stage, or else I need to put some logic into the splitting. Do you have any suggestions for this?
Correct me if my understanding is wrong.
When you use the files with flags and the empty file, the workflow will execute but the parallel end node throws an error which does not stop the flow.
When you use the files with flags and the files without flags, the workflow will not run at all since the column counts do not match.
Now you want to remove the empty files, which do not stop your flow; what about the other files without the flags, which do stop the flow?
What I can suggest here is to add a label to the rows coming from the file with flags. Then, after the parallel chunk, read those files again and extract the flags. I think this is better than reading all the files, filtering some, and reading the rest of the files again.
Below is your workflow, which I have modified. You can still optimize it; I just wanted to give you a clue. jsonParallelFileEg1.knwf (175.1 KB)
Thanks for the suggestion, it solves the problem :) Just one clarification: in the last Joiner (Node 37), instead of using all the columns to join, can we use just 2-3 columns?
It depends on which attributes are the key values that uniquely identify the records. If there is one unique attribute, you can use that single column instead. Since I did not know your data, I added all the columns to make sure it works.
I have got a different requirement now, in line with the above discussion. We are filtering empty files and labeling the incoming files, but the files are coming with different numbers of flag columns, say one file has 2 flags and another file has 4 flags, because of which the chunks are producing different numbers of columns and hence the workflow is failing with the error that the column numbers are different. You can use the same workflow as above.
The last node, "Parallel Chunk End", fails with the error: Cell count in Row & Row is not equal to the length of the column names array: 77 vs. 76. Flags is coming as null in a few files. I have attached the sample file along with the previous example. mysample13.json (2.3 KB)