Is there a way to run packages from https://pypi.org/ or https://github.com/ within KNIME?

Hi,
Google BigQuery’s auto schema detection relies on at most the first 100 rows, so it’s difficult to rely on. However, if I use this Python package, I can increase accuracy and prevent errors or rework. I can run it manually on my machine, but I’m wondering if there is a way to do the same within a KNIME workflow without setting up individual components like writing scripts, for example by calling the package with something like:

```
python3 -m bigquery_schema_generator.generate_schema < file.data.json > file.schema.json
```

Thank you.  

https://pypi.org/
https://pypi.org/project/bigquery-schema-generator/
https://github.com/bxparks/bigquery-schema-generator

Ahh, I’m finding posts now that this is over my head…

Or is it not that bad??

It’s unclear to me what you actually want, especially given your own comment on your question.

For running Python code from KNIME, you need to install the KNIME Python node extension and then configure KNIME correctly. The easiest way to achieve that is to use the Anaconda/Miniconda Python distribution. Then you can install any Python module, be it from PyPI or GitHub, into your Python environment for KNIME. Of course, it requires you to be able to code in Python to actually make use of this.

However, your own reply is about installing a non-official KNIME extension, which is something completely different. So it’s hard to give a good answer.

Can you clarify?

Hi @kienerj,
Thank you for replying and I appreciate your willingness to help. Right now I have the ETL process and the data ready in KNIME, and eventually I’ll need to upload that into BigQuery. I now have about 300 columns, so I’d like to use BigQuery’s auto schema detection, but it only reads the first 100 rows at most, so it’s not reliable. I want to incorporate this package (https://pypi.org/project/bigquery-schema-generator/) into my KNIME workflow. I can feed the CSV into it and it will generate the schema based on the full dataset. I wanted to see if there is a relatively manageable (for someone like me with no coding skills) way to do that. Hope that makes sense. Thank you!!!
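To illustrate why a schema guessed from only the first rows can be wrong, here is a toy sketch in Python. The `infer_type` function and the sample column are hypothetical and much simpler than what BigQuery or the bigquery-schema-generator package actually do; the point is only that a value appearing after the sampled rows can change the type.

```python
def infer_type(values):
    """Toy inference: INTEGER if all values parse as int, FLOAT if all
    parse as float, otherwise STRING. Not BigQuery's real algorithm."""
    def parses(cast, value):
        try:
            cast(value)
            return True
        except ValueError:
            return False

    if all(parses(int, v) for v in values):
        return "INTEGER"
    if all(parses(float, v) for v in values):
        return "FLOAT"
    return "STRING"


# Hypothetical column: 150 numeric-looking rows, then one free-text value.
column = [str(i) for i in range(150)] + ["n/a"]

sampled = infer_type(column[:100])  # conclusion from a 100-row sample
full = infer_type(column)           # conclusion from scanning every row
print(sampled, full)
```

With 300 columns, the chance that at least one column behaves like this grows quickly, which is why scanning the full dataset is attractive.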

I don’t know this tool, so it’s hard to tell what is possible or not. From the tool’s page it’s not clear if it works on Windows at all. You will certainly need a Python installation and be able to follow the install instructions on the page. The good news is that it seems to be a premade script, so to use it you don’t necessarily need coding skills. Still, within KNIME you need to build a workflow that can call the external tool via the command line. For that, have a look at these nodes:

[Screenshot: KNIME nodes for calling external tools]

But I don’t think they actually fit your use case well; you will need to investigate. Making this work will certainly not be simple, even if it might be doable without programming.
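If a Python environment is available to KNIME, one possible route is a few lines of Python that run the command from the question with its input and output redirected to files. This is only a minimal sketch: `run_filter` is a made-up helper name, and the default command assumes the bigquery-schema-generator package is installed in the same Python environment that runs the script.

```python
import subprocess
import sys
from pathlib import Path

# The command from the original question. Assumes the package is installed
# in the Python environment that executes this script.
DEFAULT_COMMAND = [sys.executable, "-m", "bigquery_schema_generator.generate_schema"]


def run_filter(input_path, output_path, command=None):
    """Run `command < input_path > output_path`; raise if the tool fails."""
    command = command or DEFAULT_COMMAND
    with open(input_path, "rb") as src, open(output_path, "wb") as dst:
        # check=True raises CalledProcessError on a non-zero exit code.
        subprocess.run(command, stdin=src, stdout=dst, check=True)
    return Path(output_path)
```

Calling `run_filter("file.data.json", "file.schema.json")` would then mirror the shell command in the first post. Whether this fits into a given workflow still depends on how Python is set up for KNIME.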

Personally, I think the capabilities to call external tools (via CLI) should be improved within KNIME. This is more a comment towards the KNIME team. Your example is good because it shows that sometimes you want to use an external tool to create something completely new, not just new columns for existing rows. And often it’s better to let the user handle reading the generated output, because in this example it’s a JSON file, which none of the nodes in the screenshot above support as far as I know.
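As a rough idea of what "handling the generated output yourself" could look like, here is a short Python sketch that reads a schema in the BigQuery JSON schema layout (a list of fields with `name`, `type` and `mode`). The `summarize_schema` helper and the example content are hypothetical; a real schema file may also contain nested `fields` for RECORD columns, which this sketch ignores.

```python
import json


def summarize_schema(schema_text):
    """Return (name, type, mode) tuples for the top-level schema fields."""
    fields = json.loads(schema_text)
    # BigQuery treats a missing mode as NULLABLE.
    return [(f["name"], f["type"], f.get("mode", "NULLABLE")) for f in fields]


# Hypothetical example content, shaped like a BigQuery JSON schema file.
example = """
[
  {"name": "id", "type": "INTEGER", "mode": "REQUIRED"},
  {"name": "comment", "type": "STRING", "mode": "NULLABLE"}
]
"""

for name, field_type, mode in summarize_schema(example):
    print(f"{name}: {field_type} ({mode})")
```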

Hi @kienerj,
Understood, and what you are describing totally makes sense. The External Tool nodes look interesting. Thank you very much for clarifying the capabilities of KNIME. I really appreciate your time.
