The new file handling nodes are really great for working in complex environments, but I am running into a performance issue when recursively listing the contents of an S3 directory compared to the same operation with the AWS CLI.
For example, if you take 1,000 files distributed randomly across, say, 10 folders, each of which has 2-3 subfolders, I see a 10-100x performance difference between the List Files node and the AWS CLI, and the gap gets worse as the directory tree grows.
Does anyone have any insight into why this is the case, and whether anything might be done to boost the performance of this node?
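One possible explanation (an assumption on my part, not confirmed for this node) is the request pattern: `aws s3 ls --recursive` retrieves the whole key space as flat pages of up to 1,000 keys per ListObjectsV2 call, whereas a client that walks the "folder" tree issues at least one request per prefix, each paying a full round trip. A rough model of the request counts for the example above:

```python
# Rough model (assumed behavior, not verified against the node's source):
# flat recursive listing pages through all keys at once, while a
# tree-walking client issues one request per prefix it visits.
import math

def flat_requests(num_keys, page_size=1000):
    """ListObjectsV2 with no delimiter: one request per page of keys."""
    return max(1, math.ceil(num_keys / page_size))

def per_prefix_requests(num_prefixes):
    """Walking the tree folder by folder: at least one request per prefix."""
    return num_prefixes

# ~1,000 files across 10 folders, each with 2-3 subfolders:
prefixes = 1 + 10 + 25            # root + folders + subfolders (approx.)
print(flat_requests(1000))        # 1 paginated call
print(per_prefix_requests(prefixes))  # 36 sequential calls
```

If the calls are sequential, that alone would roughly match the 10-100x gap, and it grows with tree depth rather than object count.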
Just did a measurement from my computer:
time aws s3 ls s3://****** --profile ***** --recursive 2.84s user 0.14s system 39% cpu 7.624 total
time aws s3 ls s3://****** --profile ***** --recursive 2.74s user 0.13s system 39% cpu 7.313 total
time aws s3 ls s3://****** --profile ***** --recursive 2.70s user 0.15s system 38% cpu 7.441 total
Using the List Files/Folders node, the average runtime was 11,323 ms (about 11.3 s).
The bucket I used had 11,174 files and folders; the typical subfolder depth was 3-4.
Is this in line with your experience?
Thanks for the comparison, but that is not at all our experience! Such an operation would never complete with the List Files node in our environment. I wonder what the difference could be?
Maybe a proxy or something else in the network path is adding a lot of latency?
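A quick back-of-envelope check of that hypothesis (all numbers below are assumed for illustration, not measured): if the node issues one request per prefix, total runtime scales linearly with per-request round-trip time, so a slow proxy would hurt it far more than a flat paginated listing.

```python
# Assumed numbers: ~36 sequential requests, 50 ms server-side time per
# request, and either a low-latency direct path or a high-latency proxy.
def total_seconds(requests, rtt_ms, server_ms=50):
    """Total wall-clock time for sequential requests, in seconds."""
    return requests * (rtt_ms + server_ms) / 1000

print(total_seconds(36, rtt_ms=20))   # direct path: ~2.5 s
print(total_seconds(36, rtt_ms=500))  # slow proxy: ~19.8 s
```

Under those assumptions the same workflow could go from a few seconds to apparently "never completing" on a large tree, which would fit the difference between the two environments.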
I am attaching the WF I used to measure the runtime:
aaron.knwf (13.0 KB)
This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.