I want to feel the pulse of the community a bit. The REST nodes (GET, POST, …) are becoming more and more important as APIs become prevalent.
However, I have been avoiding them whenever possible because they make workflows slow, especially when the endpoint requires one call per row and you have thousands of rows. I just assumed that was the price you pay for using APIs.
Until now I never realized that the slowness seems to be entirely due to these nodes, not to the endpoints or the network. In the exact same workflow, calling the same endpoint on the same network, I get responses (and hence workflow execution) that are one or more orders of magnitude faster when I replace the REST node with a Python node that uses aiohttp (async) to make the REST calls.
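For anyone who wants to try the same comparison, here is a minimal sketch of this kind of aiohttp code. It is not the script from the attached workflow. It assumes one GET per row that returns JSON; the function names, the default concurrency of 50, and the example URL are illustrative choices. A single shared `ClientSession` reuses connections, an `asyncio.Semaphore` caps the number of requests in flight, and `asyncio.gather` returns results in input order, so result i still belongs to row i.

```python
import asyncio
from typing import Any, Awaitable, Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


async def run_bounded(
    items: Iterable[T],
    worker: Callable[[T], Awaitable[R]],
    concurrency: int = 50,
) -> List[R]:
    """Await worker(item) for every item with at most `concurrency` calls in flight.

    asyncio.gather preserves input order, so the output lines up with the input rows.
    """
    semaphore = asyncio.Semaphore(concurrency)

    async def guarded(item: T) -> R:
        async with semaphore:  # wait for a free slot before starting the call
            return await worker(item)

    return await asyncio.gather(*(guarded(item) for item in items))


async def fetch_all_json(urls: List[str], concurrency: int = 50) -> List[Any]:
    """One GET per URL over one shared aiohttp session (hypothetical helper)."""
    import aiohttp  # imported lazily so run_bounded works without aiohttp installed

    async with aiohttp.ClientSession() as session:

        async def get_json(url: str) -> Any:
            async with session.get(url) as resp:
                resp.raise_for_status()  # surface HTTP errors instead of hiding them
                return await resp.json()

        return await run_bounded(urls, get_json, concurrency)


# Usage (hypothetical endpoint; assumes no event loop is already running):
# results = asyncio.run(fetch_all_json([f"https://example.com/api/items/{i}" for i in range(1000)]))
```

Note that aiohttp's default connector also limits open connections (100 by default), so raising `concurrency` beyond that would additionally need a session created with `connector=aiohttp.TCPConnector(limit=...)`.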
The effect is most extreme with one call per row (~100x speed-up observed) and less extreme when calls can be sent in batches (~3x speed-up). Even rate-limited APIs benefit greatly: one case allows 1000 requests per minute, and the GET node simply never reaches that limit, even with a high concurrency setting. From my observations the concurrency setting has little or no effect; it's always slow.
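For a limit such as 1000 requests per minute, one simple way to stay just under it is to space request start times evenly (60 s / 1000 = 0.06 s apart) while still keeping many requests in flight. The sketch below assumes that approach; `RateLimiter` and `fetch_rate_limited` are hypothetical helpers, not part of aiohttp, and a token bucket that allows short bursts would be an alternative design.

```python
import asyncio
from typing import Any, List


class RateLimiter:
    """Space call start times at least period / max_calls seconds apart.

    Even spacing (e.g. 1000 per 60 s -> one start every 0.06 s), not a bursty token bucket.
    """

    def __init__(self, max_calls: int, period: float) -> None:
        self.interval = period / max_calls
        self._next_start = float("-inf")

    async def acquire(self) -> None:
        loop = asyncio.get_running_loop()
        now = loop.time()
        # No await between reading and updating _next_start, so this reservation
        # is atomic on the single-threaded event loop.
        start = max(now, self._next_start)
        self._next_start = start + self.interval
        delay = start - now
        if delay > 0:
            await asyncio.sleep(delay)


async def fetch_rate_limited(
    urls: List[str], max_per_minute: int = 1000, concurrency: int = 50
) -> List[Any]:
    import aiohttp  # lazy import; RateLimiter itself only needs asyncio

    limiter = RateLimiter(max_per_minute, 60.0)
    semaphore = asyncio.Semaphore(concurrency)

    async with aiohttp.ClientSession() as session:

        async def get_json(url: str) -> Any:
            async with semaphore:        # cap requests in flight
                await limiter.acquire()  # cap request start rate
                async with session.get(url) as resp:
                    resp.raise_for_status()
                    return await resp.json()

        return await asyncio.gather(*(get_json(u) for u in urls))
```

Acquiring the rate slot inside the semaphore means only requests that are about to run hold a reservation, so a backlog cannot later fire as a burst above the limit.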
Of course, hand-crafting each call with aiohttp is not a sustainable solution. Hence I want to feel the pulse of the community: have you observed the same thing? If so, can we push KNIME to fix the REST nodes ASAP (ideally as a bug fix in the next minor release)?
Or maybe someone in the community has input on why they could be slow, for example KNIME settings that affect REST node performance.
Attached is a minimal example that calls a free mock API returning random company names. The “Timer Info” node shows that the Python node with aiohttp runs about 10x faster, at least for me. (The Python environment needs aiohttp installed.)
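The gap is largest for one call per row because, when each call is mostly waiting on the network and server, issuing calls one after another costs roughly n × latency, whereas keeping k calls in flight costs roughly n / k × latency. The following sketch illustrates this without any network or aiohttp by using `asyncio.sleep` as a stand-in for a request; the timings it produces are illustrative only and say nothing about how the KNIME nodes are implemented internally.

```python
import asyncio
import time


async def fake_request(latency: float) -> None:
    """Stand-in for one REST call whose cost is pure waiting (network + server)."""
    await asyncio.sleep(latency)


async def sequential(n: int, latency: float) -> float:
    """Issue n calls one after another; return elapsed seconds."""
    t0 = time.perf_counter()
    for _ in range(n):
        await fake_request(latency)  # next call starts only after the previous one returns
    return time.perf_counter() - t0


async def concurrent(n: int, latency: float, concurrency: int) -> float:
    """Issue n calls with up to `concurrency` in flight; return elapsed seconds."""
    sem = asyncio.Semaphore(concurrency)

    async def one() -> None:
        async with sem:
            await fake_request(latency)

    t0 = time.perf_counter()
    await asyncio.gather(*(one() for _ in range(n)))
    return time.perf_counter() - t0


# Usage: compare asyncio.run(sequential(100, 0.01)) with asyncio.run(concurrent(100, 0.01, 50)).
```

With fewer, larger batch calls, n shrinks, so there is less waiting to overlap, which is consistent with the smaller speed-up observed for batch-wise calls.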
REST Node Performance.knwf (11.4 KB)
I'd be interested to know whether others also observe that aiohttp is much faster.