Hi, I’m trying to read this GeoJSON file from a Kaggle dataset here:
link to the dataset
The file is called cbg.geojson. I have tried to read it with the JSON Reader node, and I have also tried the streaming execution option.
But the result is always the same: KNIME freezes and I have to force-close it.
The machine has 16 GB of RAM.
I just want to read the file and convert it to a table/CSV format.
Does anyone have an idea of how to solve it?
Thanks in advance.
Yup, looks like parsing that 3 GB JSON file requires more than 16 GB of memory (on a side note, I tried with 28 GB and it still wasn’t enough). I suppose the JSON Reader node in its current state is not meant to parse files of that size. As a workaround, you could read the file line by line using a Line Reader node and then parse the JSON objects manually. I’ve attached a workflow that should do that legwork, but it won’t exactly be fast. It can probably be heavily optimized.
json_row_by_row.knwf (18.3 KB)
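If you would rather try the same line-by-line idea outside KNIME, here is a rough Python sketch. It is not what the attached workflow does internally, just the same approach in script form. It assumes that every feature object sits on its own line in the file (I have not verified that this holds for every GeoJSON export), and the function name features_to_csv is made up for illustration.

```python
import csv
import json


def features_to_csv(src_path, dst_path):
    """Stream a GeoJSON file line by line and write feature properties to CSV.

    Assumption: each feature object sits on its own line. Lines that do not
    parse as a JSON object of type "Feature" are skipped.
    """
    written = 0
    writer = None
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", newline="", encoding="utf-8") as dst:
        for line in src:
            # Drop surrounding whitespace and the comma that separates features.
            text = line.strip().rstrip(",")
            if not text.startswith("{"):
                continue
            try:
                obj = json.loads(text)
            except json.JSONDecodeError:
                continue  # e.g. the opening line of the FeatureCollection
            if not isinstance(obj, dict) or obj.get("type") != "Feature":
                continue
            row = dict(obj.get("properties") or {})
            # Keep the geometry as a JSON string in a single column.
            row["geometry"] = json.dumps(obj.get("geometry"))
            if writer is None:
                # Columns come from the first feature; keys that only
                # appear in later features are ignored.
                writer = csv.DictWriter(
                    dst, fieldnames=list(row), extrasaction="ignore"
                )
                writer.writeheader()
            writer.writerow(row)
            written += 1
    return written
```

You would call it as features_to_csv("cbg.geojson", "cbg.csv"), where the output name is just an example. Only one line is held in memory at a time, so memory use should stay small regardless of file size. If the file turns out to be a single long line, this will not work and an incremental JSON parser would be needed instead.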
Thank you for this interesting workaround.
It takes several hours to complete the process, but it seems to work well.
I have just tested it with a limited amount of rows.