I managed to run the Guided_Analytics_for_ML_Automation example from the Hub on my server, in the WebPortal. Now I would like to run a scheduled workflow on the server that uses one of the models that can be downloaded at the end of a Guided_Analytics_for_ML_Automation execution. Is this possible? I’m trying to do it, but I’ve encountered a couple of problems.
I can run the best model manually by giving it a CSV file when prompted, but since I want to run it on a schedule, I have to change it and feed it input instead of that CSV file. That part is fine: I’ve tried it in the Analytics Platform, and my Python Source node, which fetches some data from Google Cloud, can pass that data to the model and run it. The problem is that if I try to copy and paste the model’s workflow into mine, alongside the Python Source node, it doesn’t work, because the model workflow can’t find the table of parameters for the model. I’m not sure what path to set for that node so it can find them. Or do I have to save my workflow in a specific folder for it to be found?
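For reference, this is roughly what my Python Source node looks like (the Google Cloud call is stubbed out here with a placeholder function, since the real client code isn’t relevant to the problem; in the legacy Python nodes, whatever pandas DataFrame you assign to `output_table` is what the node emits):

```python
# Sketch of a Python Source node body: fetch rows from a data source
# (stubbed here; the real version calls the Google Cloud client) and
# hand them to KNIME as a pandas DataFrame via `output_table`.
import pandas as pd

def fetch_from_cloud():
    # Placeholder for the real Google Cloud fetch; returns a list of dicts.
    return [{"feature_a": 1.0, "feature_b": "x"},
            {"feature_a": 2.5, "feature_b": "y"}]

rows = fetch_from_cloud()
output_table = pd.DataFrame(rows)
```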
I have another problem relating to the Python Source node. I need to read from a couple of files, and I would like to create a project folder that holds the workflow plus a subfolder containing the files I need, and read them from there, so I would like to use a relative path. But if I give it a relative path, it doesn’t work; it only works with absolute local paths, which is a problem because my server won’t be able to find my local files. My repository looks like this, and my workflow is “knime_and_google”. What kind of relative path would I need to give it to find the JSON in the aux_files folder?
Edit: After some digging I understand relative paths in KNIME a bit better, so I tried loading the JSON in my Python code with knime://knime.workflow/aux_files/knime-project-automl.json, but it doesn’t seem to work. However, if I move the JSON into the workflow, it creates a JSON node, and in that node’s path field this path does work. Is this a Python problem? Do these relative paths only work in nodes, not in Python code? If so, how can I access the path to this JSON on a server? I really need the path, not the JSON object itself.
Edit 2: I also tried knime://knime-server/auto_ml/Google_communication/aux_files/knime-project-automl.json; that didn’t work either.
Hope someone can help me with this.
Concerning your second problem: you’re really close. Both of these forms should resolve:
knime://knime.mountpoint/auto_ml/Google_communication/aux_files/knime-project-automl.json (mount point relative)
knime://knime.workflow/…/aux_files/knime-project-automl.json (workflow relative)
However, I’d assume that Python nodes aren’t able to handle knime:// URLs, because we simply execute the Python script without checking for such URLs. Alternatively, you could use the File Reader node and chain it to the Python Script (1 ⇒ 1) node.
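One workaround sketch, assuming an upstream node (e.g. Extract Context Properties) exposes the workflow’s location as a flow variable reachable in the Python node through the `flow_variables` dict: build a plain filesystem path yourself instead of a knime:// URL. The variable name and sample value below are illustrative, not guaranteed:

```python
# Sketch: knime:// URLs are resolved by KNIME nodes, not by the Python
# interpreter, so construct an ordinary filesystem path instead.
import os

# Stand-in for the dict KNIME injects into the script; the value is
# illustrative only (assumes aux_files sits next to the workflow folder).
flow_variables = {
    "context.workflow.absolute-path":
        "/server/auto_ml/Google_communication/knime_and_google"
}

# Go up one level from the workflow folder, then into aux_files.
json_path = os.path.join(
    os.path.dirname(flow_variables["context.workflow.absolute-path"]),
    "aux_files",
    "knime-project-automl.json",
)
```

Because this yields a real path on the machine executing the workflow, it works the same way locally and on the server, as long as the files are deployed alongside the workflow group.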
Oh alright, thanks. Yeah, I don’t think Python nodes can handle this.
Regarding the first problem, I would do the opposite:
rather than copying the massive guided automation workflow into yours, I would copy your Python workflow into the massive WebPortal application.
Make sure to merge the workflow groups in any case, so the dependencies on files, shared components, and sub-workflows are kept.
Just move the folder into the 01_Guided_Analytics_for_ML_Automation workflow group.
Then copy and paste the nodes and components of your knime_and_google workflow into the workflow where all the automated components are.
If you would rather call your workflow than add its nodes to ours, you can use the Call Workflow nodes (https://kni.me/n/M66aoaj-gqFbGR5j). Just make sure you use a relative path and move your workflow inside the 01_Guided_Analytics_for_ML_Automation workflow group.
Oh, I think I explained myself poorly. I don’t want to join my Python workflow with the guided automation one; I want to join one of the models that can be downloaded upon completion of the guided automation with mine.
My end goal is for a client to run the guided automation, see which model works best with their data, and then use that model in a scheduled job that continuously reads data from the cloud and writes predictions back to the cloud. So I would wrap one of those models between two Python nodes: one for reading and one for writing.
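The “writing” side would look something like the sketch below: a Python Script node body that takes the model’s prediction table (`input_table` in the legacy Python nodes) and serializes it for upload. The upload itself is stubbed with a placeholder function; the real version would use the Google Cloud client:

```python
# Sketch of the write-back node: serialize the prediction table and
# push it to the cloud (upload stubbed out here).
import pandas as pd

# Stand-in for the DataFrame KNIME passes into the node.
input_table = pd.DataFrame({"row_id": [0, 1], "prediction": ["a", "b"]})

def upload_to_cloud(payload: str) -> bool:
    # Placeholder: a real implementation would write to a bucket or table.
    return len(payload) > 0

payload = input_table.to_json(orient="records")
ok = upload_to_cloud(payload)
```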
But I’ve been able to do it anyway: I did have to copy my Python nodes into that best-model workflow, and it worked that way. I would like to do this automatically, using a Call Workflow node from within my Python workflow to use that model, but maybe that isn’t possible; I have to investigate further.
But thanks anyways!
We are investigating this strategy you mention as well, and if we find out how, we will let you know!
Let’s keep in touch!
Do you think it would be easy to use this workflow for regression problems? I see that the nodes used in the machine learning models are for classification. I tried swapping them for their Regression counterparts and removing the nodes in the Select Target metanode that force the exclusion of regression variables, so that I could select a numeric variable as the target, but it seems it is not as simple as that. I suspect this doesn’t work because the new Learner (Regression) node doesn’t have the right target variable set.
ERROR Call Workflow (Table Based) 0:677:358:224 Execution failed in Try-Catch block: Failure, workflow was not executed, current state is IDLE.
Column Splitter 0:168:242: All columns in top partition.
Column Splitter 0:168:378: All columns in top partition.
Empty Table Switch 0:167:167: Node created an empty data table.
CASE Switch Data (End) 0:167:168: Node created an empty data table.
Extract Table Spec 0:167:169: Node created an empty data table.
Column Filter 0:167:171: Node created an empty data table.
Column Appender 0:167:164: Second table is longer than the first table! Missing values have been added to the first table.
H2O Random Forest Learner (Regression) 0:172:146: Selected target column null can not be found in input frame. Please select a valid target column in dialog.
Column Splitter 0:167:159: Some columns are no longer available: “Selected features”; all columns in top partition.
Table Row to Variable 0:240:173: Table has 1254 rows, ignored all rows except the first one
Missing Value 0:168:241: The current settings use missing value handling methods that cannot be represented in PMML 4.2
Missing Value 0:168:388: The current settings use missing value handling methods that cannot be represented in PMML 4.2
Do you have any quick fix to make regression models work, or would I have to heavily modify the original workflow? I would also like something similar for unsupervised problems, using k-means and the like, but I think that one really needs a much more heavily modified workflow.
For unsupervised problems, stay tuned: a new workflow should be released soon.
For regression there is no quick solution available.
You would need to go through the workflow and change all the strategies using the target variable.
The regression learner node probably needs a flow variable stating the numerical column to be predicted. Additionally, everywhere accuracy and other statistics are computed via the Scorer node, you need to replace it with R², RMSE, or whatever metric you like. It will indeed take some time.
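The metric swap itself is straightforward; a minimal sketch of the two statistics mentioned, in plain Python (inside KNIME this could also be done with the Numeric Scorer node instead of code):

```python
# RMSE and R^2 for a regression prediction column, computed from
# parallel lists of actual and predicted values.
import math

def rmse(actual, predicted):
    # Root of the mean squared residual.
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)

def r_squared(actual, predicted):
    # 1 minus residual sum of squares over total sum of squares.
    mean_a = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot

actual = [3.0, 5.0, 7.0, 9.0]
predicted = [2.5, 5.5, 7.0, 8.5]
```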
You could then release it on hub.knime.com and get many kudos from the KNIME community, though!
Regarding your issue with downloading and running the prediction workflows automatically: I hope I understood the problem correctly.
What basically happens when you download such a workflow is that, during the download, a table containing all the (optimized) models and configurations is injected, i.e., copied, into the workflow.
So if you want to use a modified version of the prediction workflow, you need to copy this table into it yourself. I am not sure whether this is possible on the server, and even locally in your Analytics Platform it may not be an easy task.
Let me know if this is not your issue, but either way I think what you are trying to achieve is either not easy and needs some “dirty hacking”, or impossible.
Yeah, I managed to make it work. What I did was import the workflow and add my Python nodes to it; it works fine like this. But the biggest problem is that these workflows come with a manual CSV read, that is, when I run the workflow I have to specify the path to a CSV file. Since I want periodic automatic ingestion on the server, I have to modify the workflow anyway and replace the data source.
Bottom line: I got it to work, and everything is fine. I’m now trying to modify the Guided Automation workflow itself so it can do regression instead of classification, but it’s no easy task.
Oh, that’s good news about the unsupervised workflow. Could you give a rough estimate of how long it will take? I’ll do my best with the regression, but I can see it’s no easy task indeed.
Nice to hear that!
There are indeed several nodes that need to be changed or adapted to make regression work. Let us know if you manage it!