I asked this question a while ago but never found a good solution. I have around 20M JSON files on a fast drive (7,883.6 MB/s). I need to read them into KNIME to parse them, and I have not been able to find a fast way to do this.
I tried:
- List Files → Table Row to Variable Loop Start → JSON Reader
- List Files → turn the path into a local URL → read it via localhost with a GET Request, as previously suggested in Reading 2M local JSON files - Performance Question
I also tried the Load Text Files node, but the UI keeps crashing.
The best option so far is still the local web server, but then the Windows server becomes the bottleneck.
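For anyone who wants to reproduce the web-server variant, something like Python's built-in http.server is enough (a minimal sketch; the directory path and port are placeholders, not the actual setup):

```python
import http.server
import socketserver

DIRECTORY = r"D:\json_files"  # hypothetical folder containing the JSON files
PORT = 8000                   # hypothetical free port

class Handler(http.server.SimpleHTTPRequestHandler):
    """Serve files from DIRECTORY instead of the current working directory."""
    def __init__(self, *args, **kwargs):
        super().__init__(*args, directory=DIRECTORY, **kwargs)

# A threading server keeps concurrent GET Requests from queuing behind each other.
with socketserver.ThreadingTCPServer(("127.0.0.1", PORT), Handler) as httpd:
    httpd.serve_forever()
```

The GET Request node would then fetch something like http://127.0.0.1:8000/&lt;filename&gt; for each row.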
Did you try the solution suggested by @AlexanderFillbrunn in your other post?
I guess this is the way to go. Did it work for you? If not, where did KNIME fail: at reading the JSON, at the conversion from string to JSON, or somewhere else?
Could you post an example of your JSON files here so we can have a go at it? It would definitely help us to help you.
The challenge with the Load Text Files node is that the files have to be selected via the UI rather than via a data port, and selecting a couple of million files through the UI always leads to a crash.
I understand better now. How big is each JSON file? Did you try concatenating them all into a single text file beforehand? That would avoid the UI crash: once the concatenated file is loaded with the -Load Text Files- node, you could split the JSON documents into separate cells of a single-column table and then convert them with the -String to JSON- node, as suggested by @AlexanderFillbrunn. Would this be a solution? See the sketch below.
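To illustrate the concatenation step outside of KNIME (a sketch only; the source folder and output path are hypothetical, and it assumes one UTF-8 JSON document per file):

```python
# Concatenate many small JSON files into one newline-delimited file (NDJSON).
import json
from pathlib import Path

SRC = Path(r"D:\json_files")      # hypothetical source folder
DST = Path(r"D:\merged.ndjson")   # hypothetical output file

with DST.open("w", encoding="utf-8") as out:
    for p in SRC.rglob("*.json"):
        with p.open("r", encoding="utf-8") as f:
            doc = json.load(f)                    # validate while merging
        out.write(json.dumps(doc, ensure_ascii=False))
        out.write("\n")                           # one document per line
```

Writing one document per line keeps the later split into table cells a simple line split, and it replaces 20M individual file opens with a single sequential read, which is usually where the time goes with this many small files.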
If you could upload one of your JSON files here (provided it contains no confidential information), I would put together an example workflow showing how to deal with this.