Confusion on integers, numbers and strings

I am very new to KNIME and am still learning the basics. I am going through the model designed by Rosaria et al. (KDD Analysis on All Data) to learn how to design an excellent model. However, I am having trouble with some nodes, in particular the "Evaluating all predictions (combine)" 3:1 metanode. When I try to execute the model, I get the following message: “Invalid settings: Please select at least one aggregation method”. The problem is that the GroupBy nodes are not there in their model. What should I do?

On a different note, if I want to predict good and bad clients in credit scoring, how should I treat the following variables: number of accounts, percentages, age, property square metres, monetary balance, ratios, number of properties, gender, and marital status? I understand that KNIME automatically assigns D (double) to most of the variables, and some are strings. I had to assign integers on my own. What are the effects of classifying a variable incorrectly?

Thank you very much, although I know my questions are too general.

Hello @wilbert_1 -

I downloaded the KDD Analysis on All Data workflow from the EXAMPLES server and executed it, but it runs fine for me - I was not able to reproduce your problem with invalid settings. Are you still having trouble?

As to your other question, generally you should assign each variable the type that matches what it represents, unless you have a reason not to. By that I mean counts should generally be integers, monetary and other continuous variables should be doubles, and so on. You want to avoid assigning double types when they are not needed, as they require more memory.

There are exceptions of course - for example, sometimes you might want to treat a number as a string if you intend to do a specific type of binning on it, or perhaps you want to use it as a discrete classifier. But in most cases, if you assign variables in a sensible way you’ll be fine.
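To make the effect of a wrong type concrete, here is a minimal sketch in plain Python (not KNIME itself). The client records and field names are made up for illustration, and the byte sizes assume a 32-bit integer and a 64-bit double, which is how I understand KNIME's integer and double cells to be stored.

```python
import struct
from statistics import mean

# Hypothetical client records; field names and values are invented for illustration.
clients = [
    {"num_accounts": 2, "balance": 1520.75, "marital_status": "single"},
    {"num_accounts": 10, "balance": 310.00, "marital_status": "married"},
    {"num_accounts": 3, "balance": 87.40, "marital_status": "married"},
]

# The same count column, once typed as integer and once (wrongly) as string.
counts_as_int = [c["num_accounts"] for c in clients]
counts_as_str = [str(n) for n in counts_as_int]

# Integers sort and compare by value.
sorted_numeric = sorted(counts_as_int)  # [2, 3, 10]

# Strings sort character by character, so "10" comes before "2".
sorted_as_text = sorted(counts_as_str)  # ['10', '2', '3']

# Numeric columns support arithmetic; a string column would need converting first.
mean_accounts = mean(counts_as_int)

# Storage per value, assuming a 32-bit integer and a 64-bit double.
int_bytes = struct.calcsize("<i")
double_bytes = struct.calcsize("<d")

print(sorted_numeric, sorted_as_text, mean_accounts, int_bytes, double_bytes)
```

The string version still "works", but any sorting, binning, or thresholding on it follows text order rather than numeric order, which is the kind of silent effect a wrong type can have on a model.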

The combine metanode is where I have the problem. To be specific, all four of the models used are invalid.

Please help. I decided to use a different dataset.

Thank you very much.


I am pasting the pictures.

I suspect that using a different dataset is the cause of your problem. Without that dataset, it would be difficult for me to determine the exact cause of the failure. Generally, our example workflows are designed to work with the data supplied, and they will not necessarily extend to other datasets without adjusting the workflows themselves.