S3 List Files Performance

The new file handling nodes are great for working in complex environments, but I am running into a performance issue when recursively listing the contents of an S3 directory compared to the same operation using the AWS CLI.

For example, with 1000 files distributed randomly across, say, 10 folders, each of which has 2-3 subfolders, I see a 10-100x performance difference between the List Files node and the AWS CLI, and the gap widens as the directory tree grows.
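One possible explanation (purely a guess, since I don't know how the node is implemented): the AWS CLI fetches a recursive listing with un-delimited ListObjectsV2 calls that return up to 1000 keys each, whereas a client that walks the tree folder by folder pays one round trip per prefix. A rough Python sketch of the request counts under that assumption:

```python
import math

def flat_list_requests(n_objects, page_size=1000):
    """An un-delimited S3 ListObjectsV2 returns up to 1000 keys per
    page, so a flat recursive listing costs ceil(N / page_size) requests."""
    return math.ceil(n_objects / page_size)

def per_prefix_requests(n_folders):
    """A client that walks the tree with one delimited list request
    per 'folder' (prefix) instead pays one round trip per folder."""
    return n_folders

# The example above: 1000 files in 10 folders, each with 2-3 subfolders.
# Counting the root, the top-level folders, and up to 3 subfolders each,
# that is roughly 1 + 10 + 10 * 3 = 41 prefixes.
print(flat_list_requests(1000))   # 1 request for the whole listing
print(per_prefix_requests(41))    # ~41 serial requests for the same keys
```

If the per-prefix pattern is what the node does, the request count (and hence the runtime) grows with the number of folders rather than the number of files, which would match the gap getting worse as the tree grows.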

Does anyone have any insight into why this is the case, and whether anything can be done to boost the performance of this node?

Thanks!

Aaron

Hi Aaron,

Just did a measurement from my computer:
time aws s3 ls s3://****** --profile ***** --recursive 2.84s user 0.14s system 39% cpu 7.624 total
time aws s3 ls s3://****** --profile ***** --recursive 2.74s user 0.13s system 39% cpu 7.313 total
time aws s3 ls s3://****** --profile ***** --recursive 2.70s user 0.15s system 38% cpu 7.441 total

Using the List Files/Folders node, the average runtime was 11,323 ms (about 11.3 s).

The bucket I used had 11,174 files and folders; the typical subfolder depth was 3-4.

Is this in line with your experience?

Norbert


Thanks for the comparison; that is not at all our experience! Such an operation would never complete with the List Files node in our environment. I wonder what the difference could be?

Hi Aaron,

Could a proxy or something similar be adding significant network latency?
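To put a number on the latency idea: if the node issues its list requests serially, total time scales linearly with the per-request round-trip time, so a proxy adding a few hundred milliseconds per request could explain a large gap on its own. A back-of-the-envelope sketch (the request count and latencies are made-up illustrative values):

```python
def total_listing_time(n_requests, rtt_s):
    """Rough serial cost model: each list request pays one full round trip."""
    return n_requests * rtt_s

# Hypothetical numbers: ~40 serial list requests, one per folder.
fast = total_listing_time(40, 0.02)  # direct connection, ~20 ms RTT
slow = total_listing_time(40, 0.50)  # high-latency proxy, ~500 ms RTT
print(f"{fast:.1f} s vs {slow:.1f} s")  # the same work, ~25x slower
```

The model ignores transfer time and any parallelism, but it shows why the same workflow can behave very differently on two networks.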
I am attaching the workflow I used to measure the runtime:
aaron.knwf (13.0 KB)