Why is a Python-based KNIME extension so big?

jiqicn · September 16, 2022, 1:55pm

Hi,

I have developed a Python-based KNIME extension and wrapped it up as a local update site. Then I found that the plugin files (the JARs in the plugin folder) are really big. The functions of the extension are all quite simple, but the JARs can easily reach hundreds of MB. Is that normal?

During the process of generating the local update site, I noticed that Maven adds a lot of libraries as dependencies, but I’m not sure if all of them are useful. Is there any way to control that process? Or in other words, is there any way to slim down the plugin JARs?

Best,
Ji

carstenhaubold · September 25, 2022, 12:04pm

Hi @jiqicn,

the update site of a Python-based KNIME extension is so big because it contains the complete conda Python environment for all 3 operating systems. The reason for this is that whoever you share the update site with - even without internet access - will be able to install and run your Python-based nodes.

So yes, the large size is normal. Is this holding you back in any regard?

Best,
Carsten

jiqicn · September 26, 2022, 9:13am

Hi @carstenhaubold,

Thank you for your reply!

In my case, I would prefer to share my extension with co-developers and users on GitHub at the current stage, since it is not mature enough to be published on KNIME Hub. However, as the files are so big, they can easily eat up the storage and bandwidth on GitHub. I could also share just the source code through GitHub and share the update site through Dropbox or other cloud storage, but that strikes me as very cumbersome, as I think version control for both should be done together.

Also, I don’t know if it has anything to do with the size of the update site, but loading workflows that include extensions I’ve developed myself is always very slow, even for very simple workflows. Do you have any idea about that?

Best,
Ji

carstenhaubold · September 26, 2022, 9:48am

During development, I would suggest that you ask your co-developers and alpha-users to clone your GitHub repository, set up the appropriate Python environment themselves, and register the Python-based extension in their knime.ini in the same way as you probably did.

Your co-developers probably have the code checked out anyways, and your users would just have to git pull to get the latest version of your extension. Would that be a suitable workflow for now?
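As a rough sketch of the "register it in knime.ini" step, the helper below adds a line pointing KNIME at the extension's config.yml. It assumes the JVM property is called `-Dknime.python.extension.config`, as I recall it from the KNIME Python extension development docs, so please check it against your KNIME version. The function name and the example path are made up for illustration.

```python
# Assumption: this property name comes from the KNIME Python extension
# development docs; verify it for your KNIME version before relying on it.
CONFIG_PROPERTY = "-Dknime.python.extension.config="


def register_extension(ini_text: str, config_yml: str) -> str:
    """Return knime.ini content with exactly one line pointing at config_yml.

    Any earlier line that sets the same property is dropped, and the new
    line is appended at the end, i.e. after the -vmargs marker.
    """
    kept = [
        line for line in ini_text.splitlines()
        if not line.startswith(CONFIG_PROPERTY)
    ]
    kept.append(CONFIG_PROPERTY + config_yml)
    return "\n".join(kept) + "\n"


# Hypothetical knime.ini content and config.yml location.
ini = "-vmargs\n-Xmx2g\n"
print(register_extension(ini, "/home/me/my_extension/config.yml"))
```

Each co-developer would run this once against their own knime.ini (reading and writing the file, for example with `pathlib.Path.read_text` and `write_text`) and then restart KNIME.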

About the speed of loading the workflow: if there are Python-based nodes in the workflow, we need to use a Python process to configure each node. If you have the development_mode enabled for your extension, this forces KNIME to start a new process every time. This is useful for development because you see your code changes immediately. However, process startup can take some time, especially on Windows. For your users, or if you want to use your nodes rather than develop them, you could disable the development_mode, which allows KNIME to re-use the Python process and speeds up workflow loading.

jiqicn · September 26, 2022, 1:31pm

Can you maybe tell me how to enable/disable the development_mode of my extension? I can only find the debug_mode option in config.yml, is that what you mean?

carstenhaubold · September 26, 2022, 1:47pm

Ah, hehe, yes. Was reciting from memory, debug_mode is what I was talking about. Sorry for the confusion.
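For switching between developing and using the nodes, a small helper like the one below could flip the flag without editing config.yml by hand. This is only a sketch: the thread confirms just the `debug_mode` key, while the extension id and the `src` and `conda_env_path` entries in the example are assumptions about what a typical config.yml looks like. It uses plain text substitution instead of a YAML library so that it runs without extra packages.

```python
import re

# Matches a "debug_mode: <value>" entry at any indentation level.
_DEBUG_MODE = re.compile(r"^([ \t]*debug_mode[ \t]*:[ \t]*)\S+", flags=re.MULTILINE)


def set_debug_mode(config_text: str, enabled: bool) -> str:
    """Return config.yml content with every debug_mode entry set to enabled."""
    value = "true" if enabled else "false"
    return _DEBUG_MODE.sub(lambda m: m.group(1) + value, config_text)


# Hypothetical config.yml; only debug_mode is confirmed by this thread.
example = """\
org.example.my_extension:
  src: /path/to/my_extension
  conda_env_path: /path/to/my_python_env
  debug_mode: true
"""

print(set_debug_mode(example, enabled=False))
```

With `debug_mode` set to false, KNIME can re-use the Python process, which should make workflows with these nodes load faster. A KNIME restart is probably needed for the change to take effect.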
