Algorithm(s) for a continuous output and categorical variables

I am looking for a node to run a statistical / data mining analysis that finds out which operators in certain process steps are responsible for the most breakage.

Output: breakage in % (continuous) or number of defects (discrete). The batch size varies.
The inputs are the operators only (categorical). One production step = one operator, and there are 9 production steps.

Which algorithms are suitable for such a dataset?
H2O Generalized Linear Model for regression?
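A Gaussian GLM (identity link) reduces to ordinary least-squares regression, and categorical inputs enter it as dummy (indicator) columns: one per operator level, minus a reference level per step. The pure-Python sketch below illustrates only that idea on made-up data; it is not H2O's implementation, applies no regularization, and the names `dummy_code`, `solve` and `fit_ols` are hypothetical helpers.

```python
def dummy_code(rows):
    """Dummy-code categorical rows (one operator name per step).

    The first level (alphabetically) of each step is the reference level and
    gets no column, which avoids perfect collinearity with the intercept.
    """
    n_steps = len(rows[0])
    levels = [sorted({r[s] for r in rows}) for s in range(n_steps)]
    names = ["intercept"] + [
        f"I{s + 1}={lv}" for s in range(n_steps) for lv in levels[s][1:]
    ]
    X = [
        [1.0] + [1.0 if r[s] == lv else 0.0
                 for s in range(n_steps) for lv in levels[s][1:]]
        for r in rows
    ]
    return names, X


def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        if abs(M[p][c]) < 1e-12:
            raise ValueError("singular system: some operator level is not estimable")
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * m for a, m in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]


def fit_ols(X, y):
    """Ordinary least squares via the normal equations X'X beta = X'y."""
    k = len(X[0])
    XtX = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    Xty = [sum(x[i] * yi for x, yi in zip(X, y)) for i in range(k)]
    return solve(XtX, Xty)


# Hypothetical toy data: 2 steps, operators per step, breakage in %
rows = [("A", "X"), ("A", "Y"), ("B", "X"), ("B", "Y"), ("B", "Y")]
y = [1.0, 1.5, 3.0, 3.5, 3.5]
names, X = dummy_code(rows)
for name, coef in zip(names, fit_ols(X, y)):
    print(f"{name}: {coef:+.2f}")
```

Each non-intercept coefficient is the estimated change in breakage % relative to the reference operator at the same step. With real data you would also want standard errors or p-values (as in a Minitab GLM table) before attributing breakage to an operator, and batches are not weighted by size here.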

Hi @essegn!

Could you please provide a sample of your dataset? This would make it a lot easier to follow your question!

Also, could you please elaborate a bit on what "operators" are?


As @stelfrich said, maybe you can give us more details about what you want.

To get an idea about models for numeric and categorical data you could read these entries:

Data.xls (42 KB)

Hi guys,

thank you for your replies.
Could you please take a look at the dataset I need to have analysed?

There are 8 inputs (the columns starting with "I") and one output.
The inputs are process steps in manufacturing.
Each process step is performed by one operator (operating personnel).
All inputs are categorical variables.

Output is continuous: breakage in %.

Classification or regression?
I am not sure, because in my experience so far both mostly need continuous variables.

The goal of the analysis is to find out which operators (working staff) produce the most damage, and to build a model that predicts such faulty behaviour of the process. The production step is not important.
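Since the production step is not important and batch sizes vary, a simple screening view is to pool defects and units per operator across all steps they worked, rather than averaging per-batch percentages (which would give a small batch the same weight as a large one). The sketch below assumes hypothetical record fields `operators`, `defects` and `units` and uses made-up data. It charges every defect of a batch to every operator on that batch, so it cannot separate operators who often work together; it is a first look, not a model.

```python
from collections import defaultdict


def breakage_by_operator(batches):
    """Pooled breakage rate per operator across all steps they worked.

    Each batch is a dict with hypothetical keys:
      "operators": operator names, one per production step
      "defects":   number of broken units in the batch
      "units":     batch size
    An operator who worked several steps of one batch is counted once per step.
    """
    defects = defaultdict(float)
    units = defaultdict(float)
    for b in batches:
        for op in b["operators"]:
            defects[op] += b["defects"]
            units[op] += b["units"]
    rates = {op: defects[op] / units[op] for op in units}
    # Highest pooled breakage rate first
    return sorted(rates.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy data: 3 batches, 2 production steps each
batches = [
    {"operators": ["Anna", "Ben"], "defects": 2, "units": 100},
    {"operators": ["Anna", "Carl"], "defects": 12, "units": 200},
    {"operators": ["Dora", "Ben"], "defects": 1, "units": 50},
]
for op, rate in breakage_by_operator(batches):
    print(f"{op}: {rate:.1%}")
```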

So far I have used the general linear model (ANCOVA) in Minitab, but I hope there are other algorithms available in KNIME.

OK, I will have to see when I get a chance to review it.

You should think about what your data is and whether some kind of rule induction would be best (with a target such as broken/unbroken, defined by a predefined rule).
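One simple rule-induction method that fits this suggestion is OneR ("one rule"; Weka ships it as `weka.classifiers.rules.OneR`). The sketch below first turns breakage % into broken/ok with a predefined rule (a hypothetical 5% threshold), then, for each input column, predicts the majority label for each operator and keeps the column whose rules make the fewest errors. Data and threshold are made up for illustration.

```python
from collections import Counter, defaultdict

THRESHOLD = 5.0  # hypothetical cut-off in %: above it a batch counts as "broken"


def label(breakage_pct):
    return "broken" if breakage_pct > THRESHOLD else "ok"


def one_r(rows, labels, column_names):
    """OneR: for each column, map each operator to its majority label and keep
    the column whose operator->label rules misclassify the fewest rows."""
    best = None
    for c, name in enumerate(column_names):
        counts = defaultdict(Counter)
        for r, lab in zip(rows, labels):
            counts[r[c]][lab] += 1
        rules = {v: cnt.most_common(1)[0][0] for v, cnt in counts.items()}
        errors = sum(sum(cnt.values()) - cnt[rules[v]] for v, cnt in counts.items())
        if best is None or errors < best[1]:
            best = (name, errors, rules)
    return best


# Hypothetical toy data: operators at steps I1 and I2, breakage in %
rows = [("A", "X"), ("A", "Y"), ("B", "X"), ("B", "Y"), ("C", "X")]
breakage = [1.2, 0.8, 7.5, 6.1, 2.0]
labels = [label(b) for b in breakage]
name, errors, rules = one_r(rows, labels, ["I1", "I2"])
print(name, errors, rules)
```

The chosen rule set reads directly as "if operator at step I1 is B, then broken", which is the kind of interpretable result the question asks for, at the cost of looking at one step at a time.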

I compiled a collection of rule nodes; you could try whether they help you. Some of them might accept numeric targets. You might want to read the node descriptions.

Hi mlauber71, I have tried the workflow you gave me, but I am somewhat lost.
I tried the Association Rule Learner and found that I am not able to connect an XLS Reader to that node.
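Association rule learning works on transactions (sets of items) rather than on a plain table of columns; the assumption here is that this is why the XLS Reader output could not be used directly and would first need to be turned into one item set per row. The sketch below shows that conversion and computes support and confidence for single-item rules of the form {I1=B} => {broken}. The 5% threshold, `min_support`, `min_confidence` and the data are all hypothetical.

```python
THRESHOLD = 5.0  # hypothetical cut-off in %: above it a batch counts as "broken"


def to_transactions(rows, breakage, column_names):
    """Turn each batch into a set of items such as {"I1=A", "I2=X", "ok"}."""
    tx = []
    for r, b in zip(rows, breakage):
        items = {f"{n}={v}" for n, v in zip(column_names, r)}
        items.add("broken" if b > THRESHOLD else "ok")
        tx.append(items)
    return tx


def rules_to_broken(tx, min_support=0.2, min_confidence=0.6):
    """Single-item rules {item} => {broken}.

    support    = share of all batches containing the item AND "broken"
    confidence = share of batches containing the item that are also "broken"
    """
    n = len(tx)
    items = set().union(*tx) - {"broken", "ok"}
    out = []
    for item in sorted(items):
        with_item = sum(1 for t in tx if item in t)
        with_both = sum(1 for t in tx if item in t and "broken" in t)
        support = with_both / n
        confidence = with_both / with_item
        if support >= min_support and confidence >= min_confidence:
            out.append((item, support, confidence))
    return sorted(out, key=lambda rule: rule[2], reverse=True)


# Hypothetical toy data: operators at steps I1 and I2, breakage in %
rows = [("A", "X"), ("A", "Y"), ("B", "X"), ("B", "Y"), ("C", "X")]
breakage = [1.2, 0.8, 7.5, 6.1, 2.0]
tx = to_transactions(rows, breakage, ["I1", "I2"])
for item, sup, conf in rules_to_broken(tx):
    print(f"{{{item}}} => {{broken}}  support={sup:.2f} confidence={conf:.2f}")
```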

Would you be so kind as to check the data yourself?

@essegn, the task does not seem to be that simple, or you would have to rethink what the result should be. Most rule-induction methods do not seem to be capable of handling numeric targets. I set up a workflow with Weka M5, which handles numeric targets, although configuring it and interpreting the results is a challenge. I have provided some links to relevant discussions; see whether they help you.
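To make the M5 output easier to interpret, it helps to know how it picks splits: M5 model trees (in Weka, `M5P`) choose the split that maximizes standard deviation reduction (SDR) of the numeric target, SDR = sd(T) - sum(|Ti| / |T| * sd(Ti)). The sketch below illustrates only that criterion, using "operator at a step vs. everyone else" binary splits as a simplifying assumption. Real M5 additionally fits linear models in the leaves, prunes and smooths, none of which is shown. Data is made up.

```python
from statistics import pstdev


def sdr(target, groups):
    """Standard deviation reduction: sd(T) - sum(|Ti| / |T| * sd(Ti))."""
    n = len(target)
    return pstdev(target) - sum(len(g) / n * pstdev(g) for g in groups if g)


def best_binary_split(rows, y, column_names):
    """Try every 'column == operator' vs. 'rest' split and return
    (column, operator, SDR) for the split with the largest SDR."""
    best = None
    for c, name in enumerate(column_names):
        for v in sorted({r[c] for r in rows}):
            left = [t for r, t in zip(rows, y) if r[c] == v]
            right = [t for r, t in zip(rows, y) if r[c] != v]
            if not left or not right:
                continue  # a split must leave rows on both sides
            score = sdr(y, [left, right])
            if best is None or score > best[2]:
                best = (name, v, score)
    return best


# Hypothetical toy data: operators at steps I1 and I2, breakage in %
rows = [("A", "X"), ("A", "Y"), ("B", "X"), ("B", "Y"), ("C", "X")]
breakage = [1.2, 0.8, 7.5, 6.1, 2.0]
print(best_binary_split(rows, breakage, ["I1", "I2"]))
```

A split near the top of the tree with a large SDR points at the operator whose batches differ most in breakage from the rest, which is the question asked at the start of the thread.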