Portable Python Components

Hello KNIME Community,

TLDR: Is there a way to embed Python module(s) in a component and reference them from the Python Script node?

I recently read this article. It was a great introduction but I don’t understand how the component is portable if the Python Script node is referring to a module that lives inside the workflow and not inside the component itself.

It seems that when you execute the Python Script node, the current working directory is always the workflow directory and not the directory of the current component. Any modules you import into the Python Script node would then have to be placed in the workflow directory and would not exist if someone were to pull the component from the KNIME Hub, etc.

You could put the script in the component directory, use the Extract Context Properties node to get the absolute path to the workflow directory, append the component directory to that, and add the result to sys.path. This isn’t very robust though, because with the way KNIME names nodes you can’t always be sure what the node ID for the current component is. If you choose to match the name regardless of the node ID, it wouldn’t hold up when you have multiple copies of the component in the same workflow. Even if there were a way to make this work, it would not overcome the issue below.
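To make the fragility concrete, here is a minimal sketch of that sys.path approach. It assumes that component folders on disk are named like "<component name> (#<node id>)" and that the workflow's absolute path has already been obtained (for example from the Extract Context Properties node). The helper names find_component_dir and add_to_sys_path are hypothetical, not KNIME APIs.

```python
import glob
import sys
from pathlib import Path


def find_component_dir(workflow_dir, component_name):
    """Return the one folder named '<component_name> (#<id>)' in workflow_dir.

    Assumption: component folders follow this naming pattern on disk.
    """
    pattern = glob.escape(component_name) + " (#*)"
    matches = sorted(p for p in Path(workflow_dir).glob(pattern) if p.is_dir())
    if len(matches) != 1:
        # Zero matches: the component was renamed or lives somewhere else.
        # Several matches: the same component was added more than once.
        raise LookupError(
            f"expected 1 folder matching {pattern!r}, found {len(matches)}"
        )
    return matches[0]


def add_to_sys_path(directory):
    """Put directory at the front of sys.path (once) and return it as a string."""
    directory = str(Path(directory).resolve())
    if directory not in sys.path:
        sys.path.insert(0, directory)
    return directory
```

Matching on the name alone fails as soon as a second copy of the component is in the workflow, which is the ambiguity described above. The sketch raises an error in that case rather than guessing.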

I have also tried sharing a component to my local workspace, then adding the module inside this component folder and editing the component to import the module. This all works fine when editing and testing the component, but when I drag the shared component into a new workflow the module file does not come with it. It is the same when sharing a component: if the component contains files that KNIME does not recognize, they will not be shared to your local workspace or the KNIME Hub.

I am hoping there is something obvious I am missing. I was assuming that components should operate like workflows where they are essentially containers that can be moved around and deployed with their dependencies all contained inside of them but that does not seem to be the case right now.

Any help or pointing me in the right direction would be much appreciated!! Thank you for your time!
Kaegan

Hi Kaegan,

So I am not an expert on this topic, and there are actually a lot of changes currently being made to make shipping Python scripts easier. Have you tried resolving the issue with a Conda Environment Propagation node? Also, are you already running version 4.6?

Best,
Karen

Hey Karen,

Thanks for replying! I am still using KNIME 4.3.4 because that is what our server is using at the moment.

The problem I am describing doesn’t have to do with the Conda Environment Propagation node; it is more about importing local modules in a Python component and how that will work when the component is posted on the KNIME Hub or shared in general. The Conda Environment Propagation node works great for including external modules, but I don’t believe it would solve the problem described above unless I wanted to share my code publicly as a package.

Thank you for your thoughts!
Kaegan

If you have a REST endpoint, you could publish your packages there and then use a request node with an API key (some kind of security layer) to pull the modules from there.
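The same idea could also be done from inside the Python Script node itself instead of a request node. The following is only a sketch under stated assumptions: the endpoint returns the raw module source, it accepts a bearer token in the Authorization header, and download_module is a hypothetical helper name. The opener parameter exists only so the function can be tried without a real server.

```python
import urllib.request
from pathlib import Path


def download_module(url, api_key, dest_dir, filename, opener=urllib.request.urlopen):
    """Fetch a .py file over HTTP and save it into dest_dir.

    Assumptions: the endpoint returns the raw module source and accepts a
    bearer token in the Authorization header; adapt this to your server.
    """
    request = urllib.request.Request(
        url, headers={"Authorization": f"Bearer {api_key}"}
    )
    with opener(request) as response:
        source = response.read()
    target = Path(dest_dir) / filename
    target.write_bytes(source)  # the module can be imported from dest_dir afterwards
    return target
```

Everyone using the component would still need network access to that endpoint and a valid key, so this moves the dependency rather than embedding it.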

Hello @Kaegan,

Yes, you are right, the Python Script nodes use the workflow directory by default. One way to export Python modules with components is to make the component write the Python module (.py) to the workflow directory whenever the component is executed. This makes the module available in the workflow directory, and the Python Script node in the component can then use it when running the user’s script.

Please look at the component that I have created, which allows the user to select the .py file and write it to the workflow directory. You can use this component and embed it in your Python-based components.

I believe this will help you because I used this hack while working with KNIME 4.3. Let me know if you need further assistance.
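In plain Python, the copy-then-import step described above could look roughly like this. It is a sketch, not the code inside the linked component: the helper names are hypothetical, and it assumes the current working directory is the workflow directory, as reported earlier in this thread.

```python
import importlib
import shutil
import sys
from pathlib import Path


def copy_module_to_workflow_dir(module_path, workflow_dir=None):
    """Copy a .py file into the workflow directory and return the new path.

    Assumption: workflow_dir defaults to the current working directory,
    which this thread reports is the workflow directory in the node.
    """
    workflow_dir = Path(workflow_dir) if workflow_dir else Path.cwd()
    target = workflow_dir / Path(module_path).name
    shutil.copyfile(module_path, target)
    return target


def import_copied_module(target):
    """Import the copied file by module name, making sure its folder is searched."""
    folder = str(Path(target).parent.resolve())
    if folder not in sys.path:
        sys.path.insert(0, folder)
    importlib.invalidate_caches()  # pick up files created after interpreter start
    return importlib.import_module(Path(target).stem)
```

Note that module_path still has to point at a file that exists on the machine running the workflow.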

Hey @Mpattadkal,

Thank you for responding!! I have taken a look at this potential solution but I’m not sure it solves my problem. It looks like this component would only work for myself locally if I always have a path to the .py file I want to use. If I were to try to distribute my component via the KNIME hub no one would have access to the module that I pointed to locally. This could maybe work for the KNIME server by storing the module there but if I am making components for other people to use at my company who aren’t logged into the KNIME server they also wouldn’t have access to the python module. It would be ideal if the .py file could be saved inside of the component and then referenced or copied somehow from there but as I mentioned in my earlier post:

Thank you again for your reply! Please let me know if I am mistaken in my understanding of how this component works.

@Kaegan well, well. I came up with this ‘solution’, though it is not a very nice one. The component writes .py file(s) with functions dynamically to a local folder of the workflow.

The code itself is written into a string variable and then exported to a local file, so all logic can be stored within the component … in a way … There is also an init file to mark the module. Adapted from this workflow with individual .py files.

The created module, with its init file and .py file, is then imported via the sys.path idea.
Not sure if this is a solution you would want to roll out. But it does work in general.
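A minimal sketch of this string-to-module idea, for anyone who wants to see the mechanics outside of KNIME. The package name, module name, and helper functions are hypothetical. In the component, the source string would come from a flow variable instead of a constant.

```python
import importlib
import sys
from pathlib import Path

# Assumption: in the component this text would be held in a string flow variable.
MODULE_SOURCE = '''
def add_one(x):
    return x + 1
'''


def write_package(base_dir, package_name, module_name, source):
    """Create <base_dir>/<package_name>/ with an __init__.py and one module file."""
    package_dir = Path(base_dir) / package_name
    package_dir.mkdir(parents=True, exist_ok=True)
    (package_dir / "__init__.py").write_text("", encoding="utf-8")
    (package_dir / f"{module_name}.py").write_text(source, encoding="utf-8")
    return package_dir


def import_from(base_dir, dotted_name):
    """Add base_dir to sys.path if needed and import the given dotted module name."""
    base = str(Path(base_dir).resolve())
    if base not in sys.path:
        sys.path.insert(0, base)
    importlib.invalidate_caches()  # the files were just created
    return importlib.import_module(dotted_name)
```

Because the source travels as a string inside the component, it is shared along with the component. The trade-off is that the code can no longer be edited as a normal file in an IDE.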

Excellent solution by @mlauber71. I also created something along these lines. Please have a look at the workflow, which has the Python script (.py) code in the Python Script node and writes it as a .py file to the workflow directory. Embed this workflow inside your Python component and it will write the Python module you define to the workflow directory.

@Mpattadkal I think the point is that the Python module/code should be within the component (“portable”) and would be there when you share it. If you have a complete workflow, you can always carry the .py file with you and include it with the help of environment variables. That is how I understood @Kaegan’s question, and that is what my ugly ‘solution’ tries to address :)

That is correct. I shared the workflow above to give @Kaegan an idea of how Python files can be written to the workflow directory. That workflow can be embedded in the Python component as a workflow segment, so that when it executes it writes the .py file to the workflow directory and the Python component can then use these supporting modules.

Thank you @mlauber71!!

I haven’t gotten a chance to try this out yet but will as soon as I get some time. That sounds a lot more like what I was looking for. @Mpattadkal thank you for your help as well but I am still not sure if that component solves my problem because of my concerns described above. Please let me know if I am mistaken though.

@Kaegan

Is there any reason you can’t publish your package on PyPI or conda-forge? This would be the obvious route for making your package available to those wanting to use your Python component. You would just need to state that the component has a dependency on published packages.

By separating the package and the component, you can update each independently: if someone has the component, they can pull the latest package with bug fixes and security patches as appropriate.

DiaAzul

Hey @DiaAzul,

Thanks for responding! Some of these components are for internal use only at my company, and we have a private KNIME Hub where people can share them. We run an internal training program with KNIME, so I develop a lot of data-science-related utilities for others to use. In addition, some of the code I’m working on would only work internally on our network, so it wouldn’t make sense to publish it publicly. We do have a private PyPI server, but conda does not make it easy to include these private repositories when deploying a workflow using the Conda Environment Propagation node, so I would have to contact the server admin every time I wanted to add a private dependency or have one installed in an existing conda environment.

I know that use case is specific to me but in general having to publish a package adds a little bit of overhead that might not be necessary when all you want is to be able to edit your module(s) in an IDE externally and share relatively simple components using python for others to work with even on the public KNIME hub. The ideal solution would be if we could just embed modules inside of components and have it come with a component when sharing it. I know that is likely not available now but I would love to see the possibility of something like that in the future.

I would also love to see support for pipenv since that would make including private repos in workflow deployments much easier.

@Kaegan FYI, I don’t use Conda in my workflows. I have a .venv folder and point my KNIME installation at its python.exe within the Python preferences. Does this help you at all? You can then use Poetry or pipenv to manage the environment. The only downside is that KNIME doesn’t publish knime-extension on PyPI… though they have agreed it may be a good idea.
