Reading 20M+ JSON Files

Hello,

I asked this question a while ago but never found a good solution. I have around 20M JSON files on a fast drive (7883.6 MB/s). I need to read them into KNIME to parse them, but I have been unable to find a fast way to do this.

I tried the following:
- List Files -> Table Row to Variable Loop Start -> JSON Reader
- List Files -> turn the path into a local URL -> read via localhost with GET Request (as previously suggested in "Reading 2M local JSON files - Performance Question")
- Load Text Files, but the UI keeps crashing

The best option is still via a local web server, but then the Windows Server becomes the bottleneck.

Any ideas?

Thank you

Hi @nxfxcom

Did you try the solution suggested by @AlexanderFillbrunn in your other post?

I guess this is the way to go. Did it work for you? If not, where did KNIME fail: at reading the JSON files, at the conversion from string to JSON, or somewhere else?

Could you post an example of your JSON files here so we can have a go at it? It would definitely help us to help you :wink:

Best

Ael

Hello,

The challenge with the Load Text Files node is that the files have to be selected via the UI rather than via a data port. Selecting a couple of million files via the UI always leads to a crash.

Hi @nxfxcom

I understand better now. How big is each JSON file? Did you try concatenating them all into a single text file beforehand? This would avoid the UI crash: once the concatenated file is loaded with the -Load Text Files- node, you could split the JSON documents into separate cells of a single-column table and then convert them to JSON using the -String to JSON- node, as suggested by @AlexanderFillbrunn. Would this be a solution?
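To make the concatenation idea concrete, here is a rough sketch of how it could be done outside KNIME with a small Python script. It is only an illustration under assumptions: the function names `iter_json_paths` and `concat_to_jsonl` are made up for this example, the files are assumed to end in `.json` and be UTF-8 encoded, and each file is assumed to hold exactly one JSON document. Every document is re-serialized onto a single line, so the output has one JSON document per line and can be split by line afterwards.

```python
import json
import os


def iter_json_paths(root):
    """Yield paths of *.json files under root, one at a time.

    os.scandir is used so that a folder with millions of entries is
    streamed instead of being loaded into one big list.
    """
    stack = [os.fspath(root)]
    while stack:
        current = stack.pop()
        with os.scandir(current) as entries:
            for entry in entries:
                if entry.is_dir(follow_symlinks=False):
                    stack.append(entry.path)
                elif entry.name.lower().endswith(".json"):
                    yield entry.path


def concat_to_jsonl(root, out_path):
    """Write every JSON file under root as one compact line in out_path.

    Files that cannot be read or parsed are skipped and counted.
    Returns a (written, skipped) tuple.
    """
    written = skipped = 0
    with open(out_path, "w", encoding="utf-8") as out:
        for path in iter_json_paths(root):
            try:
                with open(path, "r", encoding="utf-8") as f:
                    doc = json.load(f)
            except (OSError, ValueError):
                # ValueError covers both invalid JSON and bad encoding.
                skipped += 1
                continue
            # Compact separators keep the document on a single line;
            # newlines inside strings are escaped by json.dumps.
            out.write(json.dumps(doc, separators=(",", ":")))
            out.write("\n")
            written += 1
    return written, skipped
```

The output file should be written outside the source folder (or with an extension other than `.json`) so it is not picked up as an input. Whether this is actually faster than the web server approach would have to be measured on your data; with 20M small files the per-file open cost may still dominate.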

If you could upload one of your JSON files here (if they do not contain confidential information), I would implement an example workflow showing how to deal with this.

Hope it helps.

Best

Ael

Hi,
Maybe the Tika Parser node could be used to read the files? Are they all in one folder?
Kind regards,
Alexander
