Security Concerns With Using KNIME + Py4J Python Library

jude_ran · April 18, 2022, 8:35pm

Greetings!

With the recent release of KNIME 4.5, I was eager to try out the faster “Labs” version of the “Python Script” node. However, I noted that the “Py4J” Python library is a requirement for executing this node. I have some security concerns with this particular library, so I am interested in learning what security measures you have in place to protect the user from the vulnerabilities it may introduce.

Some explanation of this vulnerability: the “Py4J” library allows Java to talk with Python over a socket on the local loopback address (i.e., 127.0.0.1). On Windows, this is usually not an issue since typically only one user is logged into a PC at a time. However, on a Linux/Unix server, this may be an issue since multiple users can be logged onto the server simultaneously.

The question becomes – if multiple users are running KNIME with the “Py4J” library on the same Linux/Unix server, would it be possible for a “different” user to respond to an open KNIME/“Py4J” socket on 127.0.0.1 and execute commands as the “originating” user?
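To make the concern concrete, here is a minimal sketch using only the Python standard library (it does not use Py4J or KNIME code, and the message text is made up). It illustrates that a plain TCP socket bound to 127.0.0.1 accepts a connection from any local process that knows the port, because TCP itself carries no information about which OS user is on the other end.

```python
import socket
import threading


def serve_once(server: socket.socket, received: list) -> None:
    """Accept a single connection and record everything the peer sends."""
    conn, _addr = server.accept()
    with conn:
        chunks = []
        while True:
            data = conn.recv(1024)
            if not data:  # peer closed the connection
                break
            chunks.append(data)
        received.append(b"".join(chunks))


# Bind to the loopback address on an OS-chosen free port, roughly as a
# local gateway process might do.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
server.listen(1)
port = server.getsockname()[1]

received = []
worker = threading.Thread(target=serve_once, args=(server, received))
worker.start()

# This client stands in for "some other local process". Nothing in the
# accept() call above checks which user account it runs under.
with socket.create_connection(("127.0.0.1", port)) as client:
    client.sendall(b"hello from another process")

worker.join()
server.close()
print(received[0])
```

Whether this is exploitable in practice depends on what the listening side does with an unauthenticated peer, which is exactly what the question above is asking.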

What security measures do you have in place to stop this scenario (a “man-in-the-middle” attack)? Do you use something like a TLS certificate to ensure that the response is provided by the same user? If not, would there be any means to enable and configure TLS for the Java/Python socket communication?

A detailed explanation of your security measures would be appreciated.

Thanks!

carstenhaubold · April 20, 2022, 7:50am

Hi @jude_ran,

Welcome to the KNIME forum!

Thanks for this very good question. We have not considered a multi-user setup with malicious socket connection interception yet.

Right now, we are not using TLS for the connection via py4j, but we will look into enabling it, since py4j has that built in. This makes sense in any case to prevent eavesdropping on the communication, but on its own it would not prevent man-in-the-middle attacks that intercept the initial connection setup.

py4j also offers client authentication. Your suggestion of employing a user-specific secret to authenticate the Java and Python processes with each other is a very good hint. We have created an internal development ticket for this topic and will address it shortly.
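As a rough illustration of the user-specific secret idea, the following sketch uses only the Python standard library. It is not Py4J's authentication API and not KNIME's implementation; the function names, the token length, and the use of `socket.socketpair()` in place of a real loopback connection are all assumptions made for the example. The listening side generates a random token, and only a peer that presents the same token as its first message is accepted.

```python
import hmac
import secrets
import socket

TOKEN_BYTES = 32  # 256 bits of randomness; an assumed, illustrative choice


def new_session_token() -> bytes:
    """Create a fresh random token, hex-encoded so it is easy to hand over."""
    return secrets.token_hex(TOKEN_BYTES).encode("ascii")


def check_first_message(conn: socket.socket, expected: bytes) -> bool:
    """Read the peer's first message and compare it in constant time."""
    presented = conn.recv(len(expected))
    return hmac.compare_digest(presented, expected)


token = new_session_token()

# Legitimate peer: it was given the token out of band (for example via
# stdin or a file readable only by the owning user; that is an assumption).
listener_side, trusted_peer = socket.socketpair()
trusted_peer.sendall(token)
accepted = check_first_message(listener_side, token)

# Another process that can reach the socket but has to guess the token.
listener_side2, other_peer = socket.socketpair()
other_peer.sendall(new_session_token())
rejected = not check_first_message(listener_side2, token)

for s in (listener_side, trusted_peer, listener_side2, other_peer):
    s.close()

print(accepted, rejected)
```

The protection rests entirely on how the token reaches the legitimate peer. If other users can read it (for example from a world-readable file), the check adds nothing.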

Is this something preventing you from using KNIME at the moment?

Cheers,
Carsten

jude_ran · April 20, 2022, 6:33pm

Thank you so much for the response! And thank you for adding an internal development ticket to address this vulnerability – I truly appreciate it!

Unfortunately, this security risk will prevent me from having KNIME 4.5 installed on our Linux/Unix system. In the meantime, version 4.4 can be used, though it’s not ideal given how significantly faster Python scripts now execute in the 4.5 version (and I appreciate this improvement, by the way).

I realize that you just created the ticket and probably had no time to properly investigate the issue, but would it be possible to provide some broad estimation as to when you think a “patch” for this would be released?

Thanks again!

kienerj · April 21, 2022, 8:47am

You can still install KNIME 4.5 but simply not install the new Python (labs) nodes and keep using the “old” python nodes. In fact I would not use the new Python (labs) nodes in a production setting anyway.

carstenhaubold · April 21, 2022, 10:00am

Thanks @kienerj, indeed you can use KNIME 4.5 and just not install the extensions that you do not want to use.

@jude_ran if you are really concerned about someone eavesdropping or intercepting the communication between KNIME and the Python process that runs the code of a script node, then the current Python Script nodes are unfortunately also not an option as they do not use an encrypted connection either.

KNIME AP was designed for a single-user setup that runs locally on the user’s machine. There it is unlikely that someone will listen in on socket connections. If you are concerned that multiple users with access to the same machine might take malicious actions against each other, then you might want to sandbox some processes in virtual machines.

Also, note that for scripting nodes there is always the risk factor that the user code inside a script node could open up a network connection and transfer data.

Quote from jude_ran: I realize that you just created the ticket and probably had no time to properly investigate the issue, but would it be possible to provide some broad estimation as to when you think a “patch” for this would be released?

We hope to include a fix in the upcoming KNIME 4.6 release.

Best,
Carsten

jude_ran · April 22, 2022, 2:18pm

Thanks for making me aware that previous Python nodes have an equally insecure connection; I should have anticipated this. I now see that Python extensions cannot be installed for any version of KNIME on our Linux/Unix system.

While I certainly can use KNIME 4.4/4.5 without these Python extensions, most of our planned workflows would require Python scripts in some capacity and therefore not having these extensions available will severely limit the usage of KNIME going forward. Not placing any blame on you; I should have done my due diligence and investigated this topic of socket communication beforehand.

Some more thoughts on securing this connection for the 4.6 version – “Py4J” supports both token authentication and TLS. If using token authentication, keep in mind that a server socket may allow for an unlimited number of connection attempts and therefore the token should be cryptographically strong enough to withstand this scenario. If using TLS, keep in mind that this only authenticates the server by default, whereas client authentication should also be required to fully secure the connection. Furthermore, the certificate used for the connection should be either a “self-signed” certificate or a unique “chain of trust” certificate for each user.
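The point about TLS authenticating only the server by default can be seen in Python's standard `ssl` module. This is a minimal sketch, not Py4J or KNIME configuration: the helper name `make_server_context` and its optional path parameters are hypothetical, and the paths are left unset here so the snippet runs without real certificate files (a context without a loaded certificate cannot complete a handshake).

```python
import ssl
from typing import Optional


def make_server_context(
    cert_file: Optional[str] = None,       # hypothetical path to the server cert
    key_file: Optional[str] = None,        # hypothetical path to its private key
    client_ca_file: Optional[str] = None,  # hypothetical CA that signed the client cert
) -> ssl.SSLContext:
    """Build a server-side TLS context that also demands a client certificate."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    # Without this line the server never asks the client for a certificate.
    ctx.verify_mode = ssl.CERT_REQUIRED
    if cert_file is not None:
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    if client_ca_file is not None:
        # Trust only this CA (or self-signed cert) when verifying clients.
        ctx.load_verify_locations(cafile=client_ca_file)
    return ctx


default_ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
mutual_ctx = make_server_context()

print(default_ctx.verify_mode)  # server-only authentication
print(mutual_ctx.verify_mode)   # client certificate required
```

With a per-user certificate loaded via `client_ca_file`, a peer that cannot present a matching certificate fails the handshake before any application data is exchanged.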

I am sure you are aware of all this since this is “Cybersecurity 101” – in fact, you probably know more than I do on this topic – but I just want to ensure the socket communication is secure enough for my eventual usage. I love KNIME – I think it’s the best data analytics tool in existence – and not being able to fully use it is painful.

carstenhaubold · April 22, 2022, 2:50pm

Thanks for the kind words and the suggestions for improving the security of the py4j connection! Yes, we’re planning to enable both client authentication and TLS.

MarcelW · April 22, 2022, 3:01pm

For the time being, perhaps treating Python as a generic “external tool” and calling it using the External Tool or External Tool (Labs) nodes might be a viable workaround. Both nodes transfer data between KNIME and the external tool only in terms of temporary files (well, and the command line arguments to start the tool). The nodes are more limited and likely more cumbersome to set up compared to scripting directly within KNIME, though, so depending on how tightly Python is integrated with your KNIME workflows, this may or may not be a good fit for you.
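For illustration, a Python script driven this way could look roughly like the sketch below. How the External Tool nodes actually pass file paths is not described in this thread, so the convention used here (input path and output path as the two command-line arguments, CSV as the exchange format) and the added `processed` column are assumptions for the example only.

```python
import csv
import sys


def transform(in_path: str, out_path: str) -> int:
    """Copy rows from in_path to out_path, adding a 'processed' column.

    Returns the number of data rows written.
    """
    with open(in_path, newline="") as src, open(out_path, "w", newline="") as dst:
        reader = csv.DictReader(src)
        fieldnames = list(reader.fieldnames or []) + ["processed"]
        writer = csv.DictWriter(dst, fieldnames=fieldnames)
        writer.writeheader()
        count = 0
        for row in reader:
            row["processed"] = "yes"  # placeholder for the real computation
            writer.writerow(row)
            count += 1
    return count


# Assumed invocation: python tool.py <input.csv> <output.csv>
if __name__ == "__main__" and len(sys.argv) == 3:
    transform(sys.argv[1], sys.argv[2])
```

Because the data travels through files rather than a socket, access is governed by ordinary file permissions, so the temporary directory should be readable only by the user running the workflow.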

Marcel

system · July 21, 2022, 3:02pm

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.