The first contains a product ID and process parameters (PARAM 1, PARAM 2, PARAM 3, PARAM 4, PARAM 5, PARAM 6). The values of the process parameters vary between -1 and -36.
The second contains the same ID and the results for 3 quality defects (as counts and as percentages).
I want to combine the two tables in KNIME in order to predict the combination of process parameters (from table 1) that generates the fewest quality defects (by default each defect will be treated separately, not all defects at the same time).
I am enclosing the two tables in the hope that you can help me.
Welcome to the forum and sorry for the delayed response here. Looking at your files, it seems like there is quite a lot of duplicate information in the Process Inputs table. But apart from that, I guess you just need to do a simple join on ID between the two tables?
In that case, your workflow might look something like the screenshot below. Let me know if your problem is more complex than my assumption, and we can work through the details.
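For anyone who wants to check the same logic outside KNIME, here is a rough plain-Python sketch of "remove duplicate rows, then inner join on ID". The column names and values are made up for illustration and are not taken from the attached files, and it assumes there is only one result row per ID.

```python
# Hypothetical sample rows; the real column names in the attached files may differ.
process_inputs = [
    {"ID": "A1", "PARAM 1": -5, "PARAM 2": -12},
    {"ID": "A1", "PARAM 1": -5, "PARAM 2": -12},  # exact duplicate row
    {"ID": "B2", "PARAM 1": -20, "PARAM 2": -3},
    {"ID": "C3", "PARAM 1": -8, "PARAM 2": -30},  # no matching result row
]
quality_results = [
    {"ID": "A1", "DEFECT 1 %": 0.0},
    {"ID": "B2", "DEFECT 1 %": 2.5},
]


def drop_duplicate_rows(rows):
    """Keep only the first occurrence of each identical row."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def inner_join_on_id(left, right, key="ID"):
    """Inner join: keep only IDs present in both tables.

    Assumes at most one row per ID in the right-hand table.
    """
    right_by_id = {row[key]: row for row in right}
    joined_rows = []
    for row in left:
        match = right_by_id.get(row[key])
        if match is not None:
            joined_rows.append({**row, **match})
    return joined_rows


joined = inner_join_on_id(drop_duplicate_rows(process_inputs), quality_results)
for row in joined:
    print(row)
```

The duplicate A1 row collapses to one, and C3 is dropped because it has no entry in the results table.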
I have KNIME 3.4.1, so I don’t have the Duplicate Row Filter node. Anyway, I can filter the Excel sheet before loading it into KNIME.
After joining both files, what’s the most practical way to visualise the best combination of the 6 parameters that generates 0% defects?
I’m not sure, but I suppose it depends on what you mean by “best combination”. After joining the tables, I can see several product IDs that have zero defects, but since I don’t really have any context or domain knowledge about your data, I don’t know which of those you might consider best…
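As a starting point, one could simply list the zero-defect rows and look at the range each parameter covers among them. A minimal sketch in plain Python, with invented rows and column names standing in for the joined table:

```python
# Hypothetical joined rows; names and values are illustrative only.
joined = [
    {"ID": "A1", "PARAM 1": -5, "PARAM 2": -12, "DEFECT 1 %": 0.0},
    {"ID": "B2", "PARAM 1": -20, "PARAM 2": -3, "DEFECT 1 %": 2.5},
    {"ID": "C3", "PARAM 1": -7, "PARAM 2": -15, "DEFECT 1 %": 0.0},
]
param_columns = ["PARAM 1", "PARAM 2"]

# Keep only the products with no defects of this type.
zero_defect = [row for row in joined if row["DEFECT 1 %"] == 0]

# Per-parameter min, max and mean among the zero-defect products.
summary = {}
if zero_defect:
    for col in param_columns:
        values = [row[col] for row in zero_defect]
        summary[col] = {
            "min": min(values),
            "max": max(values),
            "mean": sum(values) / len(values),
        }

for col, stats in summary.items():
    print(col, stats)
```

This only describes where the zero-defect products sit. It does not say whether those settings differ from the ones that produce defects, which is where a model comes in.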
The 5 parameters represent the grinding profiles of rolls, varying between -30 and 0. I want to find the optimal configuration of the cylinder profiles that generates 0% defects.
I hope that is clearer.
@MahmoudR there could be three ways to go forward.
You could state your problem as a classification task with a 1/0 or true/false target. Then all prediction models for classification are open to you. You could start with a decision tree, which is easy to interpret, and take your rules from there, although it might not be the strongest model.
From there you could move on to random forest and XGBoost models, or let an automated machine learning (AutoML) tool do the task.
If models get very complicated, they become harder to interpret. You might be able to use a model but know only so much about its inner workings beyond variable importance. Methods like LIME could help you with that.
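To make the classification framing concrete, here is a small sketch in plain Python. It builds the 1/0 target from a defect percentage and finds the single best split on one parameter using Gini impurity, which is effectively a one-level decision tree (a "stump"). The data and column names are invented, and a real decision tree learner would search all parameters and split repeatedly.

```python
# Hypothetical joined rows: one parameter column and one defect percentage.
rows = [
    {"PARAM 1": -30, "DEFECT 1 %": 3.0},
    {"PARAM 1": -25, "DEFECT 1 %": 1.5},
    {"PARAM 1": -20, "DEFECT 1 %": 0.5},
    {"PARAM 1": -10, "DEFECT 1 %": 0.0},
    {"PARAM 1": -5, "DEFECT 1 %": 0.0},
    {"PARAM 1": -2, "DEFECT 1 %": 0.0},
]

# Binary target: 1 = zero defects, 0 = at least some defects.
xs = [row["PARAM 1"] for row in rows]
ys = [1 if row["DEFECT 1 %"] == 0 else 0 for row in rows]


def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def best_split(xs, ys):
    """Return (threshold, weighted impurity) of the best single split."""
    best = None
    values = sorted(set(xs))
    # Candidate thresholds are midpoints between neighbouring values.
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(ys)
        if best is None or score < best[1]:
            best = (threshold, score)
    return best


threshold, impurity = best_split(xs, ys)
# Decide which side of the split is mostly zero-defect.
above = [y for x, y in zip(xs, ys) if x > threshold]
side = ">" if sum(above) * 2 >= len(above) else "<="
print(f"Rule: PARAM 1 {side} {threshold} -> zero defects expected (impurity {impurity:.2f})")
```

The rule you read off is the same kind of thing you would take from a decision tree in KNIME, just with a single split on a single parameter.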
Another approach might be to use rule induction methods. They might give you chains or sequences of events, sometimes similar to a decision tree, and they can have a target. I put some of them together in a workflow; you would have to put in some work on configuration and interpretation.
A third option would be to formulate your task as a regression problem, with the % of failure or an index as the target. In that case the world of regression models is open to you. I will just reference my recent post about interpreting an AutoML approach again.
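As a very small illustration of the regression framing, the sketch below fits an ordinary least-squares line of defect percentage against a single parameter in plain Python. The numbers are made up and deliberately linear, and a real analysis would use all parameters and probably a more flexible model.

```python
# Hypothetical values: one process parameter and the defect percentage.
xs = [-30, -25, -20, -10, -5]   # e.g. PARAM 1
ys = [3.0, 2.5, 2.0, 1.0, 0.5]  # e.g. DEFECT 1 %

mean_x = sum(xs) / len(xs)
mean_y = sum(ys) / len(ys)

# Ordinary least squares for one predictor: slope = cov(x, y) / var(x).
cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
var_x = sum((x - mean_x) ** 2 for x in xs)
slope = cov_xy / var_x
intercept = mean_y - slope * mean_x


def predict(x):
    """Predicted defect percentage for a given parameter value."""
    return intercept + slope * x


print(f"defect % ~ {intercept:.3f} + {slope:.3f} * PARAM 1")
print(f"predicted defect % at PARAM 1 = -15: {predict(-15):.2f}")
```

The sign and size of the slope give a first hint of the direction in which to move a parameter to reduce the defect rate.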