Hello,
I am trying to parse a JSON API response into a table using JSON Path. The parsed output has 13 million records, and in Alteryx the workflow runs successfully in 7 minutes. In KNIME, however, the JSON Path node alone takes hours. I have attached a snapshot: the first JSON Path node reads the lists, the second JSON Path node reads the paths from the API, and the subsequent steps parse the values into the table.
Could you please suggest a resolution or an alternative method for this scenario?
Hi @Anjum_Taj,
Thanks for this info. Could you provide some more specifics, such as the KNIME version and the OS you are using?
Also, if you could provide a small version of the dataset for testing (assuming it’s not confidential), along with your workflow in progress, that would be very helpful.
In addition to what Scott wrote:
What input node are the JSON Path nodes connected to?
If you’re reading the JSON from a file system, you could try the JSONPath option of the JSON Reader, which should improve performance if you’re only interested in parts of the document. If your input JSON is an array of objects, it can even ungroup directly via $[*].
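To illustrate conceptually what $[*] does on a top-level array of objects: it yields one row per array element, which is why a separate Ungroup step may become unnecessary. The plain-Python sketch below only mimics that behavior for illustration. It is not how the KNIME JSON Reader is implemented, and the ungroup_array helper and the id/name fields are hypothetical.

```python
import json

def ungroup_array(json_text, fields):
    """Roughly mimic applying $[*] to a top-level JSON array:
    emit one row per array element, keeping only the requested fields."""
    data = json.loads(json_text)
    if not isinstance(data, list):
        raise ValueError("expected a top-level JSON array of objects")
    # One output row per array element; keys missing from an object become None
    return [tuple(obj.get(f) for f in fields) for obj in data]

sample = '[{"id": 1, "name": "a", "extra": {"x": 1}}, {"id": 2, "name": "b"}]'
rows = ungroup_array(sample, ["id", "name"])
print(rows)  # [(1, 'a'), (2, 'b')]
```

Selecting only the needed fields at read time means the large nested parts (like "extra" here) are never carried into later steps.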
For a 4.5GB sample JSON with 13 million records that I generated (using a generator script), I see read speeds of 250MB/s on my local Mac SSD, so the reader itself seems to be quite fast.
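If the real file can’t be shared, a synthetic file of similar size and shape is one way to reproduce the issue without exposing confidential data. The script mentioned above isn’t shown here, so the following is only a hypothetical stand-in: a Python sketch that streams a top-level JSON array of generated objects to disk. The write_sample_json helper and its fields (id, category, value, tags) are assumptions, not the schema of the original API.

```python
import json
import random

def write_sample_json(path, n_records, seed=0):
    """Write a synthetic top-level JSON array of n_records objects.
    The schema (id, category, value, tags) is hypothetical; adapt it
    to mirror the shape of the real API response."""
    rng = random.Random(seed)  # fixed seed so the file is reproducible
    with open(path, "w", encoding="utf-8") as f:
        f.write("[")
        for i in range(n_records):
            if i:
                f.write(",")  # separator between array elements
            record = {
                "id": i,
                "category": rng.choice(["a", "b", "c"]),
                "value": round(rng.random() * 1000, 2),
                "tags": [rng.randint(0, 9) for _ in range(3)],
            }
            # Write one object at a time so memory use stays flat for millions of records
            f.write(json.dumps(record))
        f.write("]")
```

For example, write_sample_json("sample.json", 13_000_000) would produce a file with 13 million records; its size depends on the chosen fields, so it will not necessarily match the 4.5GB mentioned above.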
I am also curious about the Column Filter between the JSON Path and Ungroup nodes. How is it configured? The JSON Path nodes can already remove the source JSON column (which would speed up writing their output), and any other filters should be pushed as far upstream as possible to reduce the number of columns.
Hello,
The KNIME version is 5.4.2 and the OS is Windows 11 version 23H2. The file is confidential. Could you please send me a sample workflow? I can test it on my end and let you know how it goes.