As for the error, changing line 20 from
... input_table.drop['Rating'] to
...input_table.drop('Rating') should resolve the problem (note the round brackets instead of the square brackets).
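As a minimal sketch of the difference, assuming `input_table` is a pandas DataFrame as in the Python scripting nodes (the column names here are illustrative; note that depending on your pandas version, dropping a column may require `axis=1` or the `columns=` keyword):

```python
import pandas as pd

# hypothetical stand-in for the node's input_table
input_table = pd.DataFrame({"Name": ["A", "B"], "Rating": [4, 5]})

# input_table.drop['Rating']  # wrong: drop is a method, not subscriptable
output_table = input_table.drop(columns="Rating")  # call it with parentheses
```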
Regarding your questions:
You can also put the data cleaning code directly into the script. Also note that Python scripting nodes are capable of importing external Jupyter notebooks and using their functionality directly, as explained in this blog post.
I don’t fully get that question. Could you please elaborate or upload an example workflow that illustrates your problem? The input data to a Python scripting node (that is, the content of the train.csv in your case) is made available to the Python script via the
input_table variable (as you can see in the screenshot you provided).
I am attaching my Jupyter notebook and the CSV file used for it. I want to save that model as a pickle file, then load the model and score the test data with the help of nodes. Please help with it. You can see the entire code in the screenshot, from model saving to model loading; please let me know how to implement it with the help of nodes. The dataset is the Iris dataset.
Have you tried adapting the sample workflow I posted? I might have a look at your example if you would attach it to your post.
I am taking your workflow as an example and building my own, but I am facing the error below. How can I share the workflow and dataset so that you can see it?
The dataset I used is the Iris dataset. Please let me know what changes need to be made to make it work. Also, is it important to use the Extract Context Properties and Java Edit Variable nodes? I was getting a path error when saving and loading the pickle file.
Does the “model” directory at “C:\Users\jafarsharifs\knime-workspace\model” exist? If not, you need to create it first as Python’s
open(..) function does not do that automatically.
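A common pattern, as a sketch, is to create the directory tree before opening the file for writing (the base path here is illustrative, standing in for something like the workspace path above):

```python
import os
import pickle
import tempfile

model = {"weights": [0.1, 0.2]}  # placeholder for a real trained model object

# illustrative base directory; in the workflow this would be something like
# the "model" folder inside your KNIME workspace
base = os.path.join(tempfile.gettempdir(), "knime-workspace", "model")
os.makedirs(base, exist_ok=True)  # open() will not create directories for you

model_path = os.path.join(base, "model.pkl")
with open(model_path, "wb") as f:
    pickle.dump(model, f)
```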
A few things: the path separators differ between macOS and Windows, so you end up with a path mixing / and \, which is not good (maybe os.sep is not adequate here). Also, your pickle files have different names (I did not immediately spot that).
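To avoid mixed separators, building paths with `pathlib` (or `os.path.join`) rather than concatenating strings with `os.sep` is usually safer across operating systems; a minimal sketch with an illustrative folder name:

```python
from pathlib import Path

# Path joins components with the separator appropriate for the current OS,
# avoiding hand-built strings that mix "/" and "\"
workspace = Path("knime-workspace")  # illustrative folder name
pickle_path = workspace / "model" / "logreg.pkl"

print(pickle_path)
```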
Attached a new modified workflow that does work.
kn_example_python_logistic_regression_iris.knwf (58.5 KB)
Thank you, I have resolved it without using Extract Context Properties. I am able to save the model as a pickle file. A few more questions:
1) Will KNIME support saving the model using PySpark? In the case of PySpark, the model is saved as a folder.
2) I want to implement models branching one after the other, where one model's output is the input for another model, i.e. a hierarchy of models as shown in the figure below.
3) Will the Python Learner node support Spark MLlib libraries/PySpark models, and can we execute PySpark models using the Python Learner node?
How can we implement the above workflow in KNIME using the same Iris data, or on the customer data available in the sample workflows?
I am not sure I get what you are trying to do, but let me try to answer your questions:
Will KNIME support saving the model using PySpark? In the case of PySpark, the model is saved as a folder.
I assume you are referring to an ml/mllib model that was created in a PySpark snippet node. Currently the PySpark nodes do not support outputting a model via a KNIME port, but you can save the model to the cluster's HDFS. It is as easy as calling model.save(sc, "path in hdfs"). You can then load the same model in another snippet with ModelClass.load(sc, "path in hdfs").
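As a sketch of what that could look like inside the PySpark snippet nodes (not runnable outside a Spark environment; this assumes a trained mllib logistic regression and an existing Spark context `sc`, and the HDFS path is illustrative):

```python
# Inside the "learner" PySpark snippet, after training an mllib model:
from pyspark.mllib.classification import LogisticRegressionModel

model.save(sc, "hdfs:///models/iris_logreg")  # writes a folder in HDFS

# Inside a later "predictor" PySpark snippet:
loaded = LogisticRegressionModel.load(sc, "hdfs:///models/iris_logreg")
# predictions = loaded.predict(features_rdd)  # score new data
```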
Some models can also be saved as PMML https://spark.apache.org/docs/latest/mllib-pmml-model-export.html which could then be used in any KNIME node with PMML input.
I want to implement models branching one after the other, where one model's output is the input for another model, i.e. a hierarchy of models as shown in the figure below.
My first idea here would be to use three Spark Row Filter nodes that filter the needed rows after the first model, and then use them in the corresponding PySpark snippets.
Will the Python Learner node support Spark MLlib libraries/PySpark models, and can we execute PySpark models using the Python Learner node?
I am afraid I do not understand what you want to do here. My best guess is that you would like to use PySpark within the normal Python node and then have a pickled object of the ml/mllib model? What is the use case for this?
best regards Mareike
Can you please share a code snippet that saves the PySpark model to an HDFS path?
Sorry for the late reply, I did not notice your response.
Here is a workflow that saves and loads a model in PySpark, in this case in a local Big Data environment.
You might have to change the path in the save and load nodes to fit your system. The path is the path in HDFS, for local Big Data this is your local Filesystem which might not have a “/tmp” folder.
PySparkModelSave.knwf (36.6 KB)
I hope that helps you.
best regards Mareike
Can a model be converted to pickle format only if it is written in the Python Learner node? Does that mean we have to manually code the entire model in the Python Learner node?
I am not sure I follow. The Python Learner node in KNIME is mainly a code window with some suggestions on how to arrange the code.
You could also just write some model to the disc from the Python node without bothering with other nodes, like MOJO files in this example:
I think the structure is just there to be consistent with other KNIME model handlings.
You could, by the way, pickle other objects from Python too:
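As a generic sketch: pickle can serialize most Python objects, not just trained models (the dictionary here is just an illustrative example):

```python
import pickle

# any picklable Python object, e.g. a dict of preprocessing parameters
preprocessing = {"columns": ["sepal_length", "sepal_width"], "scale": 0.5}

blob = pickle.dumps(preprocessing)  # serialize to bytes
restored = pickle.loads(blob)       # deserialize back to an equal object
```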
In the above example image, what if I don't want to write my model in the Python Learner node and instead want to build it with the Random Forest Learner and Predictor nodes? After doing so, is the model still convertible to pickle format?
Pickle is not a model or data format by itself; it 'wraps' a Python object and stores it so it can be brought back later.
If you want to use models between KNIME and Python, you could for example think about using PMML. Mostly I have used MOJO model files from H2O.ai, which can easily be shared between KNIME and Python (or R), provided you use the correct MOJO version (which is not so obvious, unfortunately), e.g.:
KNIME 4.1* <-> H2O 3.2* (MOJO version 1.3)
KNIME 4.2* <-> H2O 3.3* (MOJO version 1.4)
With other model formats you would have to check if they could be used in a corresponding Python environment/package.
Which is why I like to use
Python or R -> H2O.ai -> MOJO-file -> KNIME (if you want with Sparkling Water on a Big Data cluster)
Actually, I was looking for a format into which KNIME models can be converted so that I can also save them outside of KNIME (like in HDFS) for deploying them somewhere else. Please help.
There are certain model formats supported by generic KNIME models that you would be able to use in other environments, namely PMML and MOJO. I am not sure if there are more. Which one would you like to use?
If you want to save files in HDFS you would have to upload them via the upload node (like at the beginning of this example).
An example of how to use MOJO in a big data environment can be found here: