I sent a GET request that returns a large amount of data in CSV format in the response body. I found a way to parse it, but it’s quite slow and memory-inefficient. I’ve included the workflow and data for your inspection. The first Transpose node is particularly slow, and my guess is that it causes the problem. I’d appreciate any insight or suggestions for improving it. The example is 50K rows, but the full response runs to millions of rows and grows over time.
Thanks in advance for your help!
BLOB_PARSE.knwf (3.4 MB)
I ran your workflow with a Timer node just to see what we’re working with.
As configured, your Cell Splitter node produces 61,821 columns, and the Transpose node converts this to 61,821 rows. The Cell Splitter took 1,460 ms and the Transpose node took 217,594 ms.
I compared this with my usual approach, which is to have the Cell Splitter node output a list and then use an Ungroup node to turn that list into rows. With that setup, the Cell Splitter node took 45 ms and the Ungroup node took 29 ms.
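For anyone who wants to see the same idea outside KNIME, here is a rough Python sketch of the "split into a list, then ungroup into rows" pattern. It is only an analogy, not what the nodes do internally. The function name `ungroup_lines` and the sample body are made up for illustration, and I am assuming the body is plain comma-separated text with one record per line.

```python
import csv
import io
from typing import Iterator, List


def ungroup_lines(body: str) -> Iterator[List[str]]:
    """Yield one parsed record per CSV line of the response body.

    Each line becomes a row directly, so no very wide intermediate
    table (one column per value) is ever built and then transposed.
    """
    # csv.reader pulls lines from the file-like object lazily,
    # so records are produced one at a time.
    yield from csv.reader(io.StringIO(body))


# Hypothetical sample body; the real response layout may differ.
sample_body = "id,name\n1,alpha\n2,beta\n"

rows = list(ungroup_lines(sample_body))
header, records = rows[0], rows[1:]
```

The point of the sketch is that going straight from a list of pieces to rows avoids materializing tens of thousands of columns first, which is the expensive step in the original workflow.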
You can check it out here:
Thanks so much for taking the time to look through this example and share your knowledge. I have used the Ungroup node when parsing other JSON outputs, but for some reason I didn’t consider it here. I’m not really sure what a large binary object is or how to handle one, and this was the best I could do by trial and error.
Again thank you!