Predicting multiple columns that sum up to 100%

bernardors · April 26, 2017, 3:19pm

Hi folks, I am struggling to get a regression to work in KNIME.

I have a data table with 10 input columns (some text, some numeric), and the expected output is 6 other columns.

The catch is that these 6 columns have to sum to 100% for each row.

I am basically allocating a certain spend across 6 different buckets based on 10 input variables. What I want is a model that can predict this percentage split for new data items after training on my dataset of historical allocations.

Can you point me in the right direction? All the examples I could find only predict one column...and they don't mix text and numbers in the input columns.

Any help is appreciated!

Iris · May 1, 2017, 11:18am

Hi,

Here you can find an example workflow in which I implemented a multi-target prediction.

https://www.knime.org/nodeguide/control-structures/loops/looping-for-multiple-target-prediction

Kindest regards, Iris

bernardors · May 3, 2017, 1:18pm

Thanks, Iris.

Any idea how I could ensure that the predicted values in each row sum to 100%? This is a hard requirement of the model.

Iris · May 3, 2017, 1:46pm

Not really a good one...

You could set the last value to 100 minus the sum of the others...
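For illustration, a minimal Python sketch of that idea (written outside KNIME; the function names are hypothetical). The first function derives the sixth bucket as the remainder. The second is an alternative that is not from this thread: clip negative predictions to zero and rescale the whole row, which avoids a negative remainder when the other five predictions add up to more than 100.

```python
def residual_last(others):
    """Predict all buckets but one, then derive the last as the remainder.

    Note: the remainder is negative if the other predictions exceed 100.
    """
    return list(others) + [100.0 - sum(others)]


def rescale_to_100(preds):
    """Assumed alternative: clip negatives to zero, then rescale to sum to 100."""
    clipped = [max(p, 0.0) for p in preds]
    total = sum(clipped)
    if total == 0:
        # No usable signal: fall back to an even split.
        return [100.0 / len(preds)] * len(preds)
    return [100.0 * p / total for p in clipped]


# Five predicted buckets, sixth derived as the remainder.
row_a = residual_last([10.0, 20.0, 30.0, 15.0, 5.0])

# Six independent predictions that do not sum to 100 on their own.
row_b = rescale_to_100([30.0, 30.0, 30.0, 30.0, -5.0, 0.0])
```

Either function would be applied row by row to the output of the multi-target loop, after all six (or five) predictions have been collected.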

bernardors · May 3, 2017, 2:36pm

Ok, that's not good news :-(

One other thing... one of the columns in my regression (and classification, for that matter) contains free-text strings. I would like to use the information in the text to improve my regression performance, alongside the other numerical values.

For instance: my brain knows that there's a high chance that the class for a given row will be "A" if the person wrote "ipsum loren" somewhere in the text.

Is there any way that my trees and neural networks could try to figure that out too? (while still using the other numeric inputs)
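One common way to do this, sketched here in plain Python rather than as a KNIME workflow, is to turn the free text into term-presence columns (a simple bag of words) and append them to the numeric inputs, so that a tree or neural network sees both. The function names and sample rows are hypothetical, and a real dataset would likely need stop-word removal and a minimum term frequency.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_vocabulary(texts, min_count=1):
    """Collect terms that appear in at least `min_count` rows."""
    counts = {}
    for text in texts:
        for term in set(tokenize(text)):
            counts[term] = counts.get(term, 0) + 1
    return sorted(term for term, n in counts.items() if n >= min_count)


def to_feature_row(numeric_values, text, vocabulary):
    """Numeric inputs followed by one 0/1 column per vocabulary term."""
    present = set(tokenize(text))
    flags = [1.0 if term in present else 0.0 for term in vocabulary]
    return list(numeric_values) + flags


# Hypothetical rows: (numeric inputs, free-text column).
rows = [
    ([120.0, 3.0], "ipsum loren dolor"),
    ([80.0, 1.0], "sit amet"),
]
vocab = build_vocabulary(text for _, text in rows)
features = [to_feature_row(nums, text, vocab) for nums, text in rows]
```

With columns like these, a learner can pick up that the presence of "ipsum" and "loren" in a row correlates with class "A", while still using the numeric inputs.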