KNIME 3.5.x S3 file picker issue

Since KNIME 3.5.x we have been facing an issue with the S3 file picker when a bucket contains deleted prefixes.

The error can be reproduced as follows: upload a file to S3, then delete the prefix.

aws s3 cp LocalFile s3://bucketname/aaa/RemoteFile
aws s3 rm s3://bucketname/aaa/
aws s3 ls s3://bucketname/aaa/

Note that the rm deletes only the prefix object "aaa/", not the complete structure. (S3 is an object store, that is, there are no directories as in Windows/Linux file systems.)
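To illustrate this, here is a minimal Python sketch (no AWS SDK involved; the bucket is simulated as a plain dict of keys): deleting the zero-byte "aaa/" placeholder object does not touch "aaa/RemoteFile", and a prefix listing still finds it.

```python
# Minimal simulation of S3's flat keyspace: a "directory" is just a
# zero-byte object whose key happens to end in "/".
bucket = {
    "aaa/": b"",                # placeholder object for the prefix
    "aaa/RemoteFile": b"data",  # the actual uploaded file
}

# "aws s3 rm s3://bucketname/aaa/" removes only the single key "aaa/" ...
del bucket["aaa/"]

# ... while "aws s3 ls s3://bucketname/aaa/" is a prefix scan and still
# finds the remaining object:
listing = [key for key in bucket if key.startswith("aaa/")]
print(listing)  # ['aaa/RemoteFile']
```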

Now, in the S3 file picker, as soon as "bucketname" is expanded, the following error appears:

WARN  Amazon S3 File Picker 2:2        Forbidden (Service: Amazon S3; Status Code: 403; Error Code: 403 Forbidden; Request ID: D57EEF4ED3CD8528)

The S3 file picker becomes unusable. We traced it down and found that a HEAD Object request is sent to AWS, to which the AWS API responds with HTTP 403.

The following stack trace is shown in knime.log:

2018-03-02 08:13:13,157 : INFO : SwingWorker-pool-4-thread-10 : S3Connection : Amazon S3 File Picker : 2:2 : Create a new AmazonS3Client in Region "eu-central-1" with connection timeout 30000 milliseconds
2018-03-02 08:13:16,955 : WARN  : SwingWorker-pool-4-thread-2 : RemoteFileChooser : Amazon S3 File Picker : 2:2 : Forbidden (Service: Amazon S3; Status Code: 403; Error Code: 403 Forbidden; Request ID: D57EEF4ED3CD8528)
2018-03-02 08:13:16,956 : DEBUG : SwingWorker-pool-4-thread-2 : RemoteFileChooser : Amazon S3 File Picker : 2:2 : Forbidden (Service: Amazon S3; Status Code: 403; Error Code: 403 Forbidden; Request ID: D57EEF4ED3CD8528)
com.amazonaws.services.s3.model.AmazonS3Exception: Forbidden (Service: Amazon S3; Status Code: 403; Error Code: 403 Forbidden; Request ID: D57EEF4ED3CD8528), S3 Extended Request ID: orSlGF4OByv0GvKNX4zbUeEWuxt6ENb7EAXYWR94scIRQnsUgHqESEAovHMCld2Hmwhl9004Dv4=
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1588)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1258)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1030)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:742)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:716)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:699)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:667)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:649)
        at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:513)
        at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4221)
        at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4168)
        at com.amazonaws.services.s3.AmazonS3Client.getObjectMetadata(AmazonS3Client.java:1249)
        at com.amazonaws.services.s3.AmazonS3Client.getObjectMetadata(AmazonS3Client.java:1224)
        at com.amazonaws.services.s3.AmazonS3Client.doesObjectExist(AmazonS3Client.java:1284)
        at org.knime.cloud.aws.s3.filehandler.S3RemoteFile.doestBlobExist(S3RemoteFile.java:139)
        at org.knime.cloud.core.file.CloudRemoteFile.exists(CloudRemoteFile.java:216)
        at org.knime.base.filehandling.remote.files.RemoteFile.getPath(RemoteFile.java:275)
        at org.knime.base.filehandling.remote.files.RemoteFile.getFullName(RemoteFile.java:252)
        at org.knime.base.filehandling.remote.dialog.RemoteFileChooser$RemoteFileTreeNode.<init>(RemoteFileChooser.java:619)
        at org.knime.base.filehandling.remote.dialog.RemoteFileChooser$RemoteFileTreeNodeWorker.doInBackgroundWithContext(RemoteFileChooser.java:495)
        at org.knime.base.filehandling.remote.dialog.RemoteFileChooser$RemoteFileTreeNodeWorker.doInBackgroundWithContext(RemoteFileChooser.java:1)
        at org.knime.core.util.SwingWorkerWithContext.doInBackground(SwingWorkerWithContext.java:106)
        at javax.swing.SwingWorker$1.call(SwingWorker.java:295)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at javax.swing.SwingWorker.run(SwingWorker.java:334)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)

Hi mweber

I just followed your steps to reproduce the problem, but was not able to reproduce it.

For me the FilePicker node was still working on the RemoteFile after calling rm on the prefix.

I also noticed that, after I run

aws s3 rm s3://bucketname/aaa/

Then when I do an ls on that bucket with

aws s3 ls s3://bucketname/

it still returns “PRE aaa/”. So it does not seem to actually be deleted. Does this behave differently on your end?

Yes, “aws s3 ls” will still return a result if there are objects with that prefix, even if you delete the prefix. This is because S3 is an object store and not a typical file system with folders.

Maybe you need to enable S3 versioning to reproduce the issue. It can then be tracked down as follows:

# Get object metadata (this is similar to what KNIME does)
$ aws s3api head-object --bucket bucketname --key aaa/
{
    "AcceptRanges": "bytes", 
    "ContentType": "binary/octet-stream", 
    "LastModified": "Tue, 30 Jan 2018 14:03:28 GMT", 
    "ContentLength": 0, 
    "VersionId": "xe.G0QpmWJnLCjrrxExj4jO3JkeFguuw", 
    "ETag": "\"d41d8cd98f00b204e9800998ecf8427e\"", 
    "ServerSideEncryption": "AES256", 
    "Metadata": {}
}

# delete prefix
$ aws s3 rm s3://bucketname/aaa/

# now the same will fail
$ aws s3api head-object --bucket bucketname --key aaa/
An error occurred (404) when calling the HeadObject operation: Not Found

# list still works as if nothing happened
$ aws s3 ls s3://bucketname/aaa/

# however, there is a delete marker set - this can be shown with
$ aws s3api list-object-versions --bucket bucketname

# if we remove the delete marker, i.e. restore the prefix, everything works again
$ aws s3api delete-object --bucket bucketname --key aaa/ --version-id x2to8glPuodbv_AoCaFAjJsY5QxDasdf
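The sequence above can be simulated without AWS (a Python sketch; the class and its methods are made up for illustration and are not a boto3 API): with versioning enabled, rm only stacks a delete marker on top of the key, so a HEAD fails while a prefix listing still succeeds, and removing the marker restores the object.

```python
# Toy model of a versioned S3 bucket: each key maps to a stack of
# versions; a delete adds a marker instead of removing data.
class VersionedBucket:
    def __init__(self):
        self.versions = {}  # key -> list of ("data", bytes) or ("marker", None)

    def put(self, key, data=b""):
        self.versions.setdefault(key, []).append(("data", data))

    def delete(self, key):                      # like: aws s3 rm
        self.versions.setdefault(key, []).append(("marker", None))

    def head(self, key):                        # like: aws s3api head-object
        stack = self.versions.get(key)
        if not stack or stack[-1][0] == "marker":
            raise KeyError("404 Not Found")     # latest version is a marker
        return {"ContentLength": len(stack[-1][1])}

    def list_prefix(self, prefix):              # like: aws s3 ls (prefix scan)
        return [k for k, stack in self.versions.items()
                if k.startswith(prefix) and stack]

    def remove_marker(self, key):  # like: aws s3api delete-object --version-id
        assert self.versions[key][-1][0] == "marker"
        self.versions[key].pop()

b = VersionedBucket()
b.put("aaa/")                  # upload the prefix placeholder
b.delete("aaa/")               # rm: head() now raises 404 ...
# ... but the prefix listing still works as if nothing happened:
print(b.list_prefix("aaa/"))   # ['aaa/']
b.remove_marker("aaa/")        # remove the marker: head() works again
print(b.head("aaa/"))          # {'ContentLength': 0}
```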

Hope that helps

Thanks for further explaining. With versioning I can also see the DeleteMarker for “aaa/”. But this still does not interfere with the FilePicker’s functionality. I am still able to create signed URLs for the file “s3://bucketname/aaa/RemoteFile”.

Does the FilePicker work for you before you remove the prefix?

The last test I’ve done was with KNIME 3.5.1. As soon as I remove the delete markers on the prefixes, the KNIME S3 file picker works. With a delete marker (with S3 versioning) or a removed prefix (without S3 versioning), the S3 file picker fails.

By the way, this works with KNIME 3.4.2 without any issues, but with 3.5 it fails.

So I got my hands on a KNIME 3.5.1 installation. However, I am still not able to reproduce the error locally.

Do you have additional permission settings on your bucket that could lead to the “403 Forbidden” response from the AWS S3 API?

It took me the whole day, but the return code obviously depends on the S3 permissions. If you have access rights (s3:ListBucket on the prefix seems to be sufficient), the return code is 404. Without access rights, the return code is 403.
KNIME seems to handle the 404 and continues fetching the contents, but when a 403 is returned, the S3 file picker fails (including the error message and stack trace in the log file).
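The distinction can be sketched as follows (a hypothetical Python version of the existence check; the helper names and the error class are made up and are not KNIME's or boto3's actual code): an exists() check that only treats 404 as "does not exist" blows up on 403, while a tolerant variant treats both status codes as "not visible to us".

```python
class S3Error(Exception):
    """Stand-in for an AWS service exception carrying an HTTP status."""
    def __init__(self, status):
        super().__init__(f"HTTP {status}")
        self.status = status

def head_object(status_from_aws):
    """Simulated HEAD Object call: raises on any non-2xx status."""
    if status_from_aws != 200:
        raise S3Error(status_from_aws)
    return {}

def exists_strict(status):
    """Observed behaviour: only 404 means 'no such object'."""
    try:
        head_object(status)
        return True
    except S3Error as e:
        if e.status == 404:
            return False
        raise  # 403 bubbles up and breaks the file picker

def exists_tolerant(status):
    """Tolerant variant: 403 and 404 are both treated as 'not there'."""
    try:
        head_object(status)
        return True
    except S3Error as e:
        if e.status in (403, 404):
            return False
        raise

# With s3:ListBucket rights AWS answers 404 and both checks agree;
# without them it answers 403 and only the strict check fails:
print(exists_strict(404))    # False
print(exists_tolerant(403))  # False
# exists_strict(403) would raise S3Error: HTTP 403
```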

Example of a ListBucket policy:

{
    "Statement": [
        {
            "Action": [
                "s3:ListBucket"
            ],
            "Condition": {
                "StringLike": {
                    "s3:prefix": [
                        "aaa/*"
                    ]
                }
            },
            "Effect": "Allow",
            "Resource": [
                "arn:aws:s3:::bucketname"
            ]
        }
    ]
}

Hope this helps…

Hi @mweber

Sorry for taking so long. Did you want to include the log file in your last post? As another question: did you make sure that you are in the correct region, as configured in the Amazon S3 Connector dialog?

No, I did not want to include a log file; it is the same output as in my first post. The region is the correct one: “eu-central-1”.
As I wrote, it seems to me that KNIME handles the HTTP 404 response, but not the HTTP 403.

@oole: is there any update on this? Could you reproduce the issue?

Thanks for the responses. They're useful.
