I’m developing a workflow to process the JSON formatted data from the NVD database. This is not a particularly nice set of data. Anyway, I’m seeing very poor performance in this workflow. For example, viewing the results of a Column Filter where the result is 7640 Rows with 5 Columns (output from the Isolate Description leg) takes 15 seconds to display after showing ‘Loading port content…’. Scrolling through the display just does not work. The memory policy for the node is ‘Keep all in memory’. I have about 28GB free on a 32GB Win7 box with 24GB assigned to KNIME. The JSON to Table nodes are also running very slowly.
I’ve attached the workflow: NvdJsonInput.knwf (278.6 KB). The input data is the Gzipped 2018 JSON dataset.
P.S. I know the Unpivoting node needs some rework.
P.P.S the Affects leg is very much a ‘Work in Progress’.
Update: Writing the 7640*5 Description table to CSV takes about 30 minutes. Loading the CSV and processing in a new workflow is basically instantaneous.
24 GB RAM for parsing a 44 MB JSON certainly sounds like overkill.
I’d consider the “JSON to Table” node a convenience feature: you’ll get a nice table without any manual configuration, but high performance is surely not the focus. I’m not aware of the implementation details, but it seems likely that it needs at least two passes over the data to build up the table spec in advance, and the more nested the structure is, the more complicated your table will become in terms of the number of columns.
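To illustrate why a generic flattener tends to need two passes, here is a rough Python sketch. It is not the KNIME implementation (which I have not looked at); `flatten` and `to_table` are hypothetical helper names, and the column naming scheme is made up for the example.

```python
def flatten(value, prefix=""):
    """Flatten nested dicts/lists into one {path: leaf_value} mapping."""
    if isinstance(value, dict):
        children = [(f"{prefix}.{k}" if prefix else str(k), v) for k, v in value.items()]
    elif isinstance(value, list):
        children = [(f"{prefix}[{i}]", v) for i, v in enumerate(value)]
    else:
        return {prefix: value}
    flat = {}
    for path, child in children:
        flat.update(flatten(child, path))
    return flat


def to_table(records):
    """Build (columns, rows) from records whose shapes may differ."""
    flat_records = [flatten(record) for record in records]

    # Pass 1: the full column list is only known after every record was seen.
    columns = []
    seen = set()
    for flat in flat_records:
        for path in flat:
            if path not in seen:
                seen.add(path)
                columns.append(path)

    # Pass 2: fill the rows, using None where a record lacks a column.
    rows = [[flat.get(column) for column in columns] for flat in flat_records]
    return columns, rows


if __name__ == "__main__":
    records = [
        {"id": "A", "refs": [{"url": "u1"}]},
        {"id": "B", "refs": [{"url": "u2"}, {"url": "u3"}]},
    ]
    columns, rows = to_table(records)
    print(columns)
    print(rows)
```

Every extra array element or nested key adds a column, so deeply nested input with variable-length arrays can produce a very wide, mostly empty table.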
So, as a first step if you’re after execution performance, I’d try replacing the “JSON to Table” node by custom configured “JSON Path” nodes.
Cannot give any clear advice about the viewing performance – I’m used to that when viewing tables with large JSON or XML cells.
Thanks for that. I’m looking at alternatives and JSON Path is certainly one of them. JSON to Table allows me to get a better understanding of the data so I can specify the paths required.
I’m still confused by the Viewing and CSV Writing performance when all the columns involved are just plain string values that have been pulled out of the JSON structures.
The solution was to use a “JSON Path” node followed by a “Split Collection Column” node. This gives vastly improved performance. That said, the “JSON to Table” node does perform reasonably on simple cases.
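For anyone who wants to see the idea outside KNIME, here is a minimal Python sketch of the same two steps: query one known path per record to get a collection, then split that collection into fixed columns. The sample data is made up, the field names (`CVE_Items`, `CVE_data_meta`, `description_data`) are my assumption about the feed layout, and the function names are hypothetical.

```python
import json

# Tiny made-up sample; the IDs and field names are assumptions, not real feed data.
SAMPLE = json.loads("""
{
  "CVE_Items": [
    {"cve": {"CVE_data_meta": {"ID": "CVE-0000-0001"},
             "description": {"description_data": [
                 {"lang": "en", "value": "First description"}]}}},
    {"cve": {"CVE_data_meta": {"ID": "CVE-0000-0002"},
             "description": {"description_data": [
                 {"lang": "en", "value": "Second description"},
                 {"lang": "es", "value": "Segunda descripcion"}]}}}
  ]
}
""")


def description_values(item):
    """Path query step: return all description values of one item as a list."""
    data = item.get("cve", {}).get("description", {}).get("description_data", [])
    return [entry.get("value") for entry in data]


def split_collection(collections):
    """Split step: pad every list to the same width so each slot is a column."""
    width = max((len(c) for c in collections), default=0)
    return [list(c) + [None] * (width - len(c)) for c in collections]


def description_table(feed):
    """One row per item: the ID followed by its description columns."""
    items = feed.get("CVE_Items", [])
    ids = [item.get("cve", {}).get("CVE_data_meta", {}).get("ID") for item in items]
    columns = split_collection([description_values(item) for item in items])
    return [[cve_id, *values] for cve_id, values in zip(ids, columns)]


if __name__ == "__main__":
    for row in description_table(SAMPLE):
        print(row)
```

Because only the named path is read, the rest of each record is never turned into columns, which is presumably where most of the saving comes from.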
Great to hear. Thanks for your feedback!