I'm looking to convert a medium-sized JSON document (around 500K rows) to a table. The intention is to produce a table that captures the JSON data to just the first level of depth, with one row for each node at the root level.
I've tried a couple of ways of doing this, each of which works for smaller JSON documents, but so far I haven't found anything that works for a more significant amount of data. Many solutions I've seen online (including on this forum) suggest using the JSON To Table node; however, this node seems to stall with large JSON documents (the progress bar never gets past 0%, even with a reasonably large Java heap allocation). As a workaround, I've tried using a recursive loop to extract one row at a time (with JSON Path) and concatenate it to a final output, but this is extremely inefficient and also fails to complete, even after several hours of runtime.
I'd really appreciate any assistance here, as what I'm trying to do doesn't feel like it should be that difficult. Other tools, like Tableau, can achieve the same thing in a few seconds. I've considered using a Java Snippet node, but I'm reluctant to do that unless absolutely necessary.
Using JSON To Table, were you able to create multiple rows of data without using transpose loops? All I can get from that node is one row with an enormous number of columns (one per cell).
@glenerik Attached is a screenshot of an example workflow that reads the JSON in row by row and outputs a complete table. Because it uses the Concatenate node, it can handle differences in the JSON schema from record to record. However, as mentioned, this only works for smaller JSON files.
I think the path forward is to use a single "JSON Path" node that extracts the array elements, followed by "Ungroup". That turns your single JSON document (which must be GB-sized?) into manageable chunks. Workflow + screenshot attached.
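For anyone who prefers to see the idea outside of KNIME: below is a minimal Python sketch of the same approach, assuming the document's root is a single array of objects. It does the equivalent of a JSON Path `$[*]` query followed by Ungroup: pull out the array, then emit one row per element, tolerating schema differences between records. The sample document and names are illustrative, not from the actual data.

```python
import json

# Hypothetical stand-in for the large document: a root-level array of objects.
doc = '[{"id": 1, "name": "a"}, {"id": 2, "name": "b", "extra": true}]'

# Equivalent of JSON Path "$[*]": one array element per row.
elements = json.loads(doc)

# Flatten each element just one level deep; missing keys become empty cells,
# so schema differences from record to record are tolerated.
columns = sorted({key for element in elements for key in element})
rows = [[element.get(col, "") for col in columns] for element in elements]

print(columns)
for row in rows:
    print(row)
```

The union-of-keys step is what the Concatenate node was doing implicitly in the row-by-row workflow; doing it once over the whole array is what makes this cheap.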
The core problem here seems to be that you have one large document and the JSON parser doesn't support stream reading (such as SAX for XML, if that means anything to you?). To my knowledge there is no (Java) library that can do that.
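To illustrate what stream-style reading would look like, here is a rough Python sketch using the stdlib `json.JSONDecoder.raw_decode`, which decodes one top-level array element at a time instead of materializing the whole document as a single object. This is only a conceptual sketch (and assumes the root is an array); it is not how the KNIME node works internally.

```python
import json

def iter_array_elements(text):
    """Yield the elements of a top-level JSON array one at a time,
    so the document is never parsed into a single giant object."""
    decoder = json.JSONDecoder()
    pos = text.index('[') + 1          # skip the opening bracket
    while True:
        # Skip whitespace and comma separators between elements.
        while pos < len(text) and text[pos] in ' \t\r\n,':
            pos += 1
        if pos >= len(text) or text[pos] == ']':
            return
        element, pos = decoder.raw_decode(text, pos)
        yield element

sample = '[{"id": 1}, {"id": 2}, {"id": 3}]'
for element in iter_array_elements(sample):
    print(element["id"])
```

A true streaming solution would also read the file incrementally rather than holding the full text in memory, but the element-at-a-time decoding above is the part that keeps per-record memory bounded.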
Hope that helps (= works). I tested it with your example, inflated to 500+k array elements (700+ MB).
Thanks Bernd, this is great, very much appreciated.
The source data is only a few MB, but that size was enough to create problems with our initial approach. Suffice it to say, your solution is far more efficient and effective. For the record, we'd tried ungrouping after the JSON Path output but hadn't gotten things in quite the right order. Knowing how to do this will make a real difference to us from here on.