The new file handling nodes are really great for working in complex environments, but I am running into a performance issue when recursively listing the contents of an S3 directory compared to the same operation with the AWS CLI.
For example, if you take 1,000 files distributed randomly across, say, 10 folders, each of which has 2-3 subfolders, I see a 10-100x performance difference between the List Files node and the AWS CLI, and the gap gets worse as the directory tree grows.
Does anyone have any insight into why this is the case, and whether anything might be done to boost the performance of this node?
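One possible explanation (an assumption on my part, not confirmed for this node) is the request pattern: `aws s3 ls --recursive` retrieves the whole key space as flat pages of up to 1,000 keys per ListObjectsV2 call, whereas a client that walks the "folder" tree issues at least one request per prefix, each paying a full round trip. A rough model of the request counts for the example above:

```python
# Rough model (assumed behavior, not verified against the node's source):
# flat recursive listing pages through all keys at once, while a
# tree-walking client issues one request per prefix it visits.
import math

def flat_requests(num_keys, page_size=1000):
    """ListObjectsV2 with no delimiter: one request per page of keys."""
    return max(1, math.ceil(num_keys / page_size))

def per_prefix_requests(num_prefixes):
    """Walking the tree folder by folder: at least one request per prefix."""
    return num_prefixes

# ~1,000 files across 10 folders, each with 2-3 subfolders:
prefixes = 1 + 10 + 25            # root + folders + subfolders (approx.)
print(flat_requests(1000))        # 1 paginated call
print(per_prefix_requests(prefixes))  # 36 sequential calls
```

If the calls are sequential, that alone would roughly match the 10-100x gap, and it grows with tree depth rather than object count.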
Just did a measurement from my computer:
time aws s3 ls s3://****** --profile ***** --recursive 2.84s user 0.14s system 39% cpu 7.624 total
time aws s3 ls s3://****** --profile ***** --recursive 2.74s user 0.13s system 39% cpu 7.313 total
time aws s3 ls s3://****** --profile ***** --recursive 2.70s user 0.15s system 38% cpu 7.441 total
Using the List Files/Folders node, the average runtime was 11,323 ms (about 11.3 s).
The bucket I used had 11,174 files and folders; the typical subfolder depth was 3-4.
Is this in line with your experience?
Thanks for the comparison, but that is not at all our experience! Such an operation would never complete with the List Files node in our environment. I wonder what the difference could be?
Maybe a proxy or something else in the network path is adding a lot of latency?
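A quick back-of-envelope check of that hypothesis (all numbers below are assumed for illustration, not measured): if the node issues one request per prefix, total runtime scales linearly with per-request round-trip time, so a slow proxy would hurt it far more than a flat paginated listing.

```python
# Assumed numbers: ~36 sequential requests, 50 ms server-side time per
# request, and either a low-latency direct path or a high-latency proxy.
def total_seconds(requests, rtt_ms, server_ms=50):
    """Total wall-clock time for sequential requests, in seconds."""
    return requests * (rtt_ms + server_ms) / 1000

print(total_seconds(36, rtt_ms=20))   # direct path: ~2.5 s
print(total_seconds(36, rtt_ms=500))  # slow proxy: ~19.8 s
```

Under those assumptions the same workflow could go from a few seconds to apparently "never completing" on a large tree, which would fit the difference between the two environments.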
I am attaching the WF I used to measure the runtime:
aaron.knwf (13.0 KB)
This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.