This workflow trains a few data analytics models and automatically selects the best one to predict fatalities in car accidents. The data has been sub-sampled so the workflow can also run on less powerful machines. Sub-sampling happens in the Reading Data/Pre-processing metanode and can be removed to run the workflow on the full dataset.
This is a companion discussion topic for the original entry at https://kni.me/w/dTjnMslOn2UlufAZ
Has this example been updated lately?
I tried it, but some of the databases are not accessible and some of the nodes are deprecated.
Can someone update it, please?
Thank you in advance.
Hi @tw349 -
You’re right, this is an older workflow. We’re slowly going through our Example workflows and updating them, but this one hasn’t made the transition yet.
That said, I was able to run it by executing the Accident Table, Vehicle Table, and Person Table metanodes in series, instead of all at once - these are located in the initial Reading Data metanode. For whatever reason, the R library used to read DBF files doesn’t seem to like parallel execution.
The deprecated nodes shouldn't cause an execution problem - they are included for backwards-compatibility reasons and should still work. That said, when this workflow gets revised, those deprecated nodes will be updated to their most recent versions.
Hope this helps.
I'd like to do something similar to this for regression models that predict numeric outputs. Any suggestions on how to evaluate multiple models? I assume R² is the way to go, but I'd also like to hear ideas for a simple-ish visualization, like the multiple ROC curves here, to compare the performance of the different models.
Maybe just a grouped bar chart displaying different metrics (R², RMSE, or whatever is of interest) for each model would work?
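For the numeric side of that comparison, here is a minimal sketch in plain Python of computing R² and RMSE for several regression models from their predictions (the model names and values below are made up for illustration). Inside KNIME, the Numeric Scorer node reports the same metrics per model, which you could then concatenate into one table and feed to a Bar Chart node:

```python
import math

def r2_and_rmse(y_true, y_pred):
    """Return (R^2, RMSE) for one model's predictions on the same targets."""
    n = len(y_true)
    mean_y = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)             # total sum of squares
    r2 = 1 - ss_res / ss_tot
    rmse = math.sqrt(ss_res / n)
    return r2, rmse

# Hypothetical targets and predictions from two models (illustrative numbers only)
y_true = [3.0, 5.0, 7.0, 9.0]
predictions = {
    "linear regression": [2.8, 5.1, 7.2, 8.9],
    "regression tree":   [3.5, 4.5, 7.5, 8.5],
}

# One row per model - exactly the shape a grouped bar chart wants
for name, y_pred in predictions.items():
    r2, rmse = r2_and_rmse(y_true, y_pred)
    print(f"{name}: R2 = {r2:.3f}, RMSE = {rmse:.3f}")
```

With one row per model and one column per metric, a grouped bar chart (model on the x-axis, one bar per metric) gives a rough visual analogue of the stacked ROC curves used for the classifiers in this workflow.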