Which mining model do you suggest?

I am working on a project that, given information about a newborn infant's father and mother, predicts the color of the infant's skin. (We treat two different people born in the same country as one person; for example, two different Chinese people are both just considered Chinese.)

Our training data format:

father's nationality | mother's nationality | father's age | mother's age | father's skin color | mother's skin color | infant's skin color

An example of a single row in the database:

chinese | ethiopian | 35 | 24 | W | B | B

Suppose the database has more than enough instances.

Now we have this information as input (father's nationality | mother's nationality | father's age | mother's age | father's skin color | mother's skin color) and want to predict the infant's skin color:

German | ethiopian | 24 | 35 | W | B | ?

There are two questions:

First problem: the target class column is not unique, and both the mother's and the father's information should be considered at the same time. Can anybody tell me which mining method can do this task?

Second problem: how can I tell the predictor that the father's nationality, age, and skin color are related and all belong to the same object (person)? (The same goes for the mother's nationality, age, and skin color.)

Please, somebody, just give me a small hint.

There are many possibilities to try in the Mining category in KNIME.

I would start with a Decision Tree Learner and Predictor.

In the node you specify the target class and the columns on which to base predictions.

Simon.

In the Decision Tree Learner, I cannot see the target class (the infant's skin color); I only see the father's and mother's columns. What is wrong? (Maybe the problem is that the father's and mother's columns are strings, while the infant's skin color is 0 or 1: 0 for white, 1 for black.)

In other words, how can I select a target class that is an integer or decimal?

That is exactly your problem. The Decision Tree and other learners in the Mining category require nominal class columns. So any class column must be in String format, not Integer or Double.

This is easy to rectify.

Before the Decision Tree nodes, insert a Number To String node, and in that node specify which columns you wish to convert to String. Simple as that :-)
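As a rough illustration of what that conversion step does to the data, here is a small Python sketch. The helper `number_to_string` and the column names are made up for this example and are not a KNIME API; the point is only that the 0/1 class values become the text labels "0" and "1", which a classifier can treat as nominal.

```python
# Hypothetical rows where the class column is stored as an integer (0 or 1).
rows = [
    {"father_skin": "W", "mother_skin": "B", "infant_skin": 1},
    {"father_skin": "W", "mother_skin": "W", "infant_skin": 0},
]

def number_to_string(rows, columns):
    """Return copies of the rows with the chosen columns converted to str."""
    return [
        {name: (str(value) if name in columns else value)
         for name, value in row.items()}
        for row in rows
    ]

converted = number_to_string(rows, {"infant_skin"})
class_values = sorted({row["infant_skin"] for row in converted})
```

After the conversion, `class_values` holds strings rather than numbers, while the other columns are left untouched.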

Simon.

Thank you, Richard! Now it works perfectly.

One more question:

One part of the predictor guesses the number of children, but its predictions are really bad. The reason is that the predictor should treat the number of children as ordered: 0 or 1 child is very different from 5 or 6 children. When I convert the numbers to text, we lose important information in the data, namely the ordering of the numbers. I think this affects the results.

What do you suggest to solve this problem? Right now the prediction is correct in only 33% of cases!

In that case, since you have numbers like this, leave those columns as Integer or Double, and instead try the Naive Bayes Learner, which can take both nominal (String) and numerical (Integer or Double) data.

This may increase prediction accuracy.
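For readers who want to see how one model can mix nominal and numerical inputs, here is a minimal naive Bayes sketch in plain Python. It is an assumption-laden illustration, not the KNIME node's implementation: nominal columns use smoothed value counts, the numerical column uses a Gaussian per class, and the rows and column layout are invented. The class itself is still a nominal label.

```python
from collections import Counter, defaultdict
from math import exp, pi, sqrt

# Hypothetical rows: (father_skin, mother_skin, mother_age, infant_skin).
TRAIN = [
    ("W", "B", 24, "B"),
    ("W", "W", 31, "W"),
    ("B", "B", 22, "B"),
    ("W", "W", 35, "W"),
    ("B", "W", 27, "B"),
    ("W", "W", 29, "W"),
]
NOMINAL = [0, 1]   # indices of string columns
NUMERIC = [2]      # indices of number columns
LABEL = 3          # index of the (nominal) class column

def gaussian(x, mean, var):
    """Normal density, used as the likelihood of a numeric value."""
    return exp(-((x - mean) ** 2) / (2 * var)) / sqrt(2 * pi * var)

def fit(rows):
    by_class = defaultdict(list)
    for r in rows:
        by_class[r[LABEL]].append(r)
    model = {}
    for label, group in by_class.items():
        stats = {"prior": len(group) / len(rows), "n": len(group)}
        for i in NOMINAL:
            stats[i] = Counter(r[i] for r in group)
        for i in NUMERIC:
            vals = [r[i] for r in group]
            mean = sum(vals) / len(vals)
            # Small constant keeps the variance positive for tiny groups.
            var = sum((v - mean) ** 2 for v in vals) / len(vals) + 1e-6
            stats[i] = (mean, var)
        model[label] = stats
    levels = {i: len({r[i] for r in rows}) for i in NOMINAL}
    return model, levels

def predict(fitted, record):
    model, levels = fitted
    scores = {}
    for label, s in model.items():
        score = s["prior"]
        for i in NOMINAL:
            # Laplace smoothing so unseen values do not zero out the score.
            score *= (s[i][record[i]] + 1) / (s["n"] + levels[i])
        for i in NUMERIC:
            score *= gaussian(record[i], *s[i])
        scores[label] = score
    return max(scores, key=scores.get)

fitted = fit(TRAIN)
guess = predict(fitted, ("B", "B", 23))
```

The numeric column keeps its ordering here (23 is close to 22 and 24, far from 35), which is the information that is lost when numbers are converted to strings.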

thanks

simon.

I cannot select an integer attribute as the class in the Naive Bayes Learner.

no answer?

Hi,

Apologies, I misread your question; I thought you wanted to use integer columns in your data set as inputs for predictions. Make sure that any number columns you wish to base predictions on are in a number format such as Integer or Double, to capture that extra information.

However, back to your actual question. That is difficult to do: a lot of the available models are classification models, which do not take into account that 0 is much different from 4, etc. They generally require string input only.
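A small Python sketch of why this matters, using invented numbers: classification accuracy only asks whether the predicted label matches exactly, so a guess that is off by one child counts the same as a guess that is off by six. A distance-based measure such as mean absolute error does see the difference.

```python
# Hypothetical actual child counts and two sets of predictions.
actual = [0, 1, 5, 6, 2]
near_misses = [1, 0, 6, 5, 3]   # every guess is off by exactly one
far_misses = [6, 5, 0, 0, 6]    # every guess is far from the truth

def accuracy(actual, predicted):
    """Share of exact matches, which is what a classifier is scored on."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)

def mean_absolute_error(actual, predicted):
    """Average distance between guess and truth, which respects ordering."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

acc_near = accuracy(actual, near_misses)
acc_far = accuracy(actual, far_misses)
mae_near = mean_absolute_error(actual, near_misses)
mae_far = mean_absolute_error(actual, far_misses)
```

Both prediction sets get the same accuracy (no exact matches), yet their mean absolute errors are very different, which is the ordering information a pure classification model ignores.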

If you want a non-classification model, you could use the regression models under the Statistics section, which allow integer class columns; however, you are then limited to having no String columns in the prediction set.

You do have a wider range of models available under Weka. You could try the linear regression model under weka/classification/functions, which takes a numerical class column and numerical and/or String columns for predictions, so it should do what you need. There are probably some others in this Weka section that would also meet your needs. I hope this gives you better predictions.
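To show how a linear regression can accept a numerical target together with numerical and string inputs, here is a minimal sketch in plain Python. It is not the Weka implementation: it assumes the string column is dummy-coded (one 0/1 indicator per level, with the first level as the baseline) and fits ordinary least squares via the normal equations. The column names and rows are synthetic, generated from an exact linear rule purely so the fit is easy to check.

```python
# Synthetic rows: (mother_age, mother_nat, number_of_children).
LEVELS = ["chinese", "german", "ethiopian"]  # first level is the baseline
ROWS = [
    (20, "chinese", 0), (30, "chinese", 2),
    (25, "german", 1), (35, "german", 3),
    (20, "ethiopian", 1), (40, "ethiopian", 5),
]

def design_row(age, nat):
    """Intercept, numeric age, then 0/1 indicators for non-baseline levels."""
    return [1.0, float(age)] + [1.0 if nat == lv else 0.0 for lv in LEVELS[1:]]

def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit(rows):
    X = [design_row(age, nat) for age, nat, _ in rows]
    y = [float(target) for _, _, target in rows]
    k = len(X[0])
    # Normal equations: (X^T X) w = X^T y
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * t for x, t in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)

def predict(weights, age, nat):
    return sum(w * x for w, x in zip(weights, design_row(age, nat)))

weights = fit(ROWS)
estimate = predict(weights, 30, "ethiopian")
```

Because the target stays numeric, the prediction is a number on a continuous scale and may need rounding to a whole child count; real data will not be exactly linear like these synthetic rows.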

simon.