Hi Team,
I am using the Amazon S3 Connection (legacy) node in my workflow. Usually it works fine, but sometimes it gives this error: WARNING: Unable to execute HTTP request
This error is resolved after I restart my workflow manually.
Is there any way to make the Amazon S3 Connection node try to reconnect automatically when the above error occurs, so that I don't need to restart the workflow manually?
As ipazin already mentioned, there is a new Amazon S3 Connector node that uses the new File Handling framework and the new S3 client version behind the scenes. Does the new node fix the problem?
With the new Amazon S3 nodes, you want to use the Transfer Files node. Make sure to click the three dots to enable the dynamic ports, where you can add a source file system connection.
Maybe this blog post about the new file handling system helps clarify things for you? In particular, the list of old nodes vs new nodes:
I have modified my workflow to use the new KNIME nodes. Hopefully the error I was getting (Unable to execute HTTP request: s3.us-west-2.amazonaws.com) will go away.
Hi @ShinagdeS , you should probably also link your Merge Variables node to the Amazon Redshift Connector. That way, the connection will be established only after the Merge Variables is executed.
As it is right now, your Redshift connection is established when the workflow starts. That connection then remains idle until the first DB SQL Executor runs, which happens only after the Merge Variables node has run. Before all of that, a CSV Writer has to execute, and I'm guessing there's a bunch of processing before writing the CSV, then the Transfer Files to S3 has to run, so there is a risk that your Redshift connection expires before you get to the DB SQL Executor.
Similarly, I would connect the CSV Writer to the Amazon Authentication or the Amazon S3 Connector rather than to the Transfer Files node. Again, the S3 connection is established when the workflow starts, but the transfer to S3 only happens once the CSV Writer has completed. It's always best to establish a connection right before you need it, so there is less risk of it expiring. KNIME does not notify you when a connection has expired; the operations that depend on it will simply fail.
There is a new CSV Writer node that replaces the deprecated one and supports an optional file system input port from the S3 Connector (click on the three dots, as you did on the Transfer Files node). This way you don't need the Transfer Files node anymore. In the new file handling framework, connections are created as required by the individual nodes.
I have modified my workflow to use the new CSV Writer node to write files to S3 and then load them into Redshift.
When I execute the workflow manually it works fine, but when I schedule the workflow it gives me this error:
CSV Writer 837:4359 - WARNING: No connection available. Execute the connector node first.
Amazon S3 Connector 837:2789 - ERROR: Execute failed:
software.amazon.awssdk.core.exception.SdkClientException: Received an UnknownHostException when attempting to interact with a service. See cause for the exact endpoint that is failing to resolve. If this is happening on an endpoint that previously worked, there may be a network connectivity issue or your DNS cache could be storing endpoints for too long.
I guess you are trying to schedule the workflow on a KNIME Server? Is this a problem that occurs from time to time, or every time you run the workflow on the server? It sounds like a network/DNS problem. If it happens every time, make sure the server has a working network and DNS connection.
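If it helps with debugging, one quick check is whether the server host can resolve the S3 endpoint at all. A minimal sketch in Python (the hostname is the one from the error message above; the helper name is hypothetical, this is not part of KNIME):

```python
import socket

def can_resolve(hostname: str) -> bool:
    """Return True if the hostname resolves via the host's current DNS setup."""
    try:
        socket.gethostbyname(hostname)
        return True
    except socket.gaierror:
        # UnknownHostException on the Java side corresponds to a
        # failed name resolution like this one.
        return False

if __name__ == "__main__":
    # Run on the KNIME Server machine around the time the workflow fails.
    print(can_resolve("s3.us-west-2.amazonaws.com"))
```

If this intermittently prints False on the server, the problem is in the server's DNS resolution rather than in the KNIME nodes.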
For example, you don’t need to link the Amazon Authentication with the Amazon S3 Connector with the variable port. They’re already connected with the AWS Connection port (the blue square).
Also, you can simply link the CSV Writer to the Amazon Redshift Connector only. Your bottleneck before establishing the Amazon Redshift connection is the CSV Writer.
Regarding the alternative with the loop, you may need to add a couple of things:
Add a Wait node inside the loop (either at the beginning or at the end) to create some delay between retries, so you don't hammer Amazon. This also depends on how long the timeout is set to: if you've already waited 30 seconds because of the timeout, it should be fine to retry right away.
Add a maximum number of retries. This is a common fail-safe so that you don't get stuck in an infinite loop when your connections keep failing. Set a maximum at which point you give it a rest, accept that it is not going to work right now, and either retry at another time (a different time may have different traffic) or fix something before retrying.
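The two points above (a delay between attempts plus a retry cap) can be sketched outside KNIME as plain retry logic. This is only an illustration of the pattern; the names and defaults are hypothetical:

```python
import time

def with_retries(action, max_retries=3, delay_seconds=30):
    """Call `action` until it succeeds; wait between attempts and give up
    after `max_retries` failures instead of looping forever."""
    for attempt in range(1, max_retries + 1):
        try:
            return action()
        except ConnectionError:
            if attempt == max_retries:
                raise  # fail-safe: surface the error after the last attempt
            time.sleep(delay_seconds)  # the Wait node's role: pause between retries
```

In the workflow itself, the loop with the try-catch nodes plays the role of `with_retries`, the Wait node plays the role of `time.sleep`, and the loop's end condition is the retry cap.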
Great workaround with the try-catch nodes, but this should actually not be required, and we should find the underlying problem. Can you explain a little what your setup looks like?
What does re-execute manually mean? Does this mean running the workflow locally on your desktop instead of running it on the KNIME Server?
Does the workflow always fail on the KNIME Server?
Hi @sascha.wolke ,
My setup is: I am using a Call Local Workflow node to call different workflows one by one, which works fine. Once the Call Local Workflow node has executed, I upload the status of each workflow (failed or successful) to S3, and from S3 to Redshift.
While uploading the status of the different workflows to S3, I used to use the legacy nodes, which gave me some network issues, so I started using the new nodes, which are also giving me connection issues.
Re-execute means that if the Amazon S3 Connector node fails, it should re-execute the node. And yes, if the scheduled workflow fails on the server, I execute it manually, locally on my desktop.
Yes, the workflow fails on the server almost 8 out of 10 times.
Is there an opportunity to update the KNIME Server/Executors to a more recent version? Can you post the full error message and stack trace from the logs? I am not sure what goes wrong, but a workflow failing 8 out of 10 times on the server sounds very strange.