I'm looking to convert a medium-sized JSON document (around 500K rows) to a table. The intention is to produce a table that captures the JSON data to just the first level of depth, with one row for each node at the root level.
I've tried a couple of ways of doing this, each of which works for smaller JSON documents, but so far I haven't found anything that works for a more significant amount of data. Many solutions I've seen online (including on this forum) suggest using the JSON To Table node; however, this node seems to stall with large JSON documents (the progress bar never gets past 0%, even with a reasonably large Java heap allocation). As a workaround, I've tried using a recursive loop to extract one row at a time (with JSON Path) and concatenate it to a final output, but this is extremely inefficient and also fails to complete, even after several hours of runtime.
I'd really appreciate any assistance here, as what I'm trying to do doesn't feel like it should be that difficult. Other tools, like Tableau, can achieve the same thing in a few seconds. I've considered using a Java Snippet node, but I'm reluctant to do that unless absolutely necessary.
Using JSON To Table, were you able to create multiple rows of data without using transpose loops? All I can get from that node is one row with an enormous number of columns (one per cell).
@glenerik Attached is a screenshot of an example workflow that reads the JSON in row by row and outputs a complete table. Because it uses the Concatenate node, it can handle differences in the JSON schema from record to record. However, as mentioned, this only works for smaller JSON files.
I think the path forward is to use a single "JSON Path" node that extracts the array elements, followed by "Ungroup". That turns your single JSON document (which must be GB-sized?) into manageable chunks. Workflow + screenshot attached.
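For anyone who prefers to see the idea outside of KNIME: below is a minimal Python sketch of the same approach, assuming the document's root is a single array of objects. It does the equivalent of a JSON Path `$[*]` query followed by Ungroup: pull out the array, then emit one row per element, tolerating schema differences between records. The sample document and names are illustrative, not from the actual data.

```python
import json

# Hypothetical stand-in for the large document: a root-level array of objects.
doc = '[{"id": 1, "name": "a"}, {"id": 2, "name": "b", "extra": true}]'

# Equivalent of JSON Path "$[*]": one array element per row.
elements = json.loads(doc)

# Flatten each element just one level deep; missing keys become empty cells,
# so schema differences from record to record are tolerated.
columns = sorted({key for element in elements for key in element})
rows = [[element.get(col, "") for col in columns] for element in elements]

print(columns)
for row in rows:
    print(row)
```

The union-of-keys step is what the Concatenate node was doing implicitly in the row-by-row workflow; doing it once over the whole array is what makes this cheap.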
The core problem here seems to be that you have one large document and the JSON parser doesn't support stream reading (such as SAX for XML, if that means anything to you?). To my knowledge there is no (Java) library that can do that.
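To illustrate what stream-style reading would look like, here is a rough Python sketch using the stdlib `json.JSONDecoder.raw_decode`, which decodes one top-level array element at a time instead of materializing the whole document as a single object. This is only a conceptual sketch (and assumes the root is an array); it is not how the KNIME node works internally.

```python
import json

def iter_array_elements(text):
    """Yield the elements of a top-level JSON array one at a time,
    so the document is never parsed into a single giant object."""
    decoder = json.JSONDecoder()
    pos = text.index('[') + 1          # skip the opening bracket
    while True:
        # Skip whitespace and comma separators between elements.
        while pos < len(text) and text[pos] in ' \t\r\n,':
            pos += 1
        if pos >= len(text) or text[pos] == ']':
            return
        element, pos = decoder.raw_decode(text, pos)
        yield element

sample = '[{"id": 1}, {"id": 2}, {"id": 3}]'
for element in iter_array_elements(sample):
    print(element["id"])
```

A true streaming solution would also read the file incrementally rather than holding the full text in memory, but the element-at-a-time decoding above is the part that keeps per-record memory bounded.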
Hope that helps (= works). I tested it with your example, inflated to 500+k array elements (700+ MB).
Thanks Bernd, this is great, very much appreciated.
The source data is only a few MB, but that size was enough to create problems with our initial approach. Suffice it to say, your solution is far more efficient and effective. For the record, we'd tried ungrouping after the JSON Path output but hadn't gotten things in quite the right order. Knowing how to do this will make a real difference to us from here on.