I have historical, hourly sales data for a particular product covering 2+ years. The quantity sold per hour is mostly 0; some hours have 1 sale, a few have 2, and very few have more than 2. I binned the sales into 5 bins (]-∞ ... 0.0] / ]0.0 ... 1.0] / ]1.0 ... 2.0] / ]2.0 ... 3.0] / ]3.0 ... ∞[).

I also have a number of external factors (e.g. weather, holidays) that we believe have an impact on those sales. Together with some additional features (e.g. month, week, weekday, hour, etc.), these are my feature columns. I binned/normalized all these features.

I then use the DL4J FF Predictor (Regression) to tell me if a certain hour within the test set is likely to have 0 sales, 1 sale, 2 sales, etc.

BUT... how do I interpret the outcome of the regression? Is it a probability? If yes, how come the values are negative (especially for 2 sales, 3 sales)? Of course the probability is lower, but I always thought that there are no negative probabilities?

Is there something wrong with the architecture of my neural net? I tried using the preconfigured DeepMLP and SimpleMLP with various learning rates but the problem is still there.

The outcome of the regression Predictor node should be in the same domain as the target you used for learning. In your case, I assume that is the quantity sold per hour. However, there is no constraint that prevents the net from outputting something negative during testing. Does it output something negative for most of the examples or just for some?

Unfortunately, there is no easy answer. It could be that you need to train your network for longer, that your hyperparameters are off, or that the distribution of your test set differs too much from the distribution of your training set, to name only some possible problems.

In any case, if you only want to train on the five bins you described (and do not need values in between), then your problem sounds more like classification than regression. You could try to use the classification nodes instead. Then you will never get values other than those you trained on.
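To illustrate why a classification setup avoids negative values, here is a minimal Python sketch. It assumes the classifier ends in a softmax output layer, which is the common choice but is not confirmed here for the DL4J classification nodes. The raw scores are made-up numbers for illustration only.

```python
import math

def softmax(scores):
    """Map arbitrary real-valued scores to non-negative values that sum to 1."""
    highest = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw network outputs for the five bins (not taken from the thread).
raw_scores = [2.0, 0.5, -1.0, -1.5, -2.0]
probabilities = softmax(raw_scores)

# Every entry is strictly positive and the entries sum to 1,
# even though some raw scores are negative.
print(probabilities)
print(sum(probabilities))
```

With such an output layer, the values can be read as class probabilities, which is not the case for an unconstrained regression output.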

Thanks for your advice! I was actually expecting the output to be in the same domain as the input but somehow this seems not to be the case.

Below on the left you can see the input (first 5 columns) and on the right the output (predictor columns). One row represents one hour.

If I sum up the predictor columns I get a value close to 1 so I was thinking it might be a probability. But then how can it be that certain probabilities are negative? You see what I mean...?

| 0 sales/h | 1 sale/h | 2 sales/h | 3 sales/h | >3 sales/h | Predictor 0 sales/h | Predictor 1 sale/h | Predictor 2 sales/h | Predictor 3 sales/h | Predictor >3 sales/h |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 0 | 0 | 0 | 0 | 0.846124053 | 0.160469145 | -0.024264246 | 0.005301833 | 0.01336658 |
| 1 | 0 | 0 | 0 | 0 | 0.995525062 | 0.014840186 | -0.02526094 | -0.029157221 | 0.011653632 |
| 1 | 0 | 0 | 0 | 0 | 1.000596523 | 0.006015211 | 0.009776607 | -0.004636049 | -0.004107803 |
| 1 | 0 | 0 | 0 | 0 | 0.986456096 | 0.01387471 | 0.008661777 | -0.011522293 | -0.007086128 |
| 1 | 0 | 0 | 0 | 0 | 1.012033343 | 0.001765192 | 0.004888833 | 0.00268352 | 5.87E-05 |
| 1 | 0 | 0 | 0 | 0 | 0.574314117 | 0.405344278 | -0.027998492 | 0.006599367 | 0.013003767 |
| 1 | 0 | 0 | 0 | 0 | 0.992162645 | 0.013400078 | 0.007141069 | 0.001610398 | -0.007180482 |
| 1 | 0 | 0 | 0 | 0 | 0.979611695 | 0.023313999 | -0.015300542 | -0.023554146 | 0.004515469 |
| 0 | 1 | 0 | 0 | 0 | 0.971259773 | 0.023324847 | 0.005195856 | -0.017358005 | -0.009769022 |
| 1 | 0 | 0 | 0 | 0 | 0.995547235 | 0.013174474 | -4.79E-04 | -3.43E-04 | -0.002910823 |
| 0 | 1 | 0 | 0 | 0 | 1.008205771 | 0.001502842 | 0.006736889 | -0.00775218 | 1.38E-04 |
| 1 | 0 | 0 | 0 | 0 | 0.062569618 | 1.004780293 | -0.053878814 | -0.004566014 | -0.024834469 |
| 1 | 0 | 0 | 0 | 0 | 0.625284314 | 0.378435493 | -0.003024131 | -0.007757306 | -0.007739067 |
| 1 | 0 | 0 | 0 | 0 | 0.999688089 | 0.01005739 | -0.007861093 | -0.013839841 | 0.003765702 |
| 1 | 0 | 0 | 0 | 0 | 0.989498198 | 0.01387009 | 9.06E-04 | -0.013652802 | -0.002847731 |
| 1 | 0 | 0 | 0 | 0 | 1.008190989 | -1.36E-04 | -6.19E-04 | -0.026842415 | 0.006504208 |
| 1 | 0 | 0 | 0 | 0 | 0.979489982 | -0.021096647 | 0.027010486 | 0.008307576 | -0.02417922 |
| 1 | 0 | 0 | 0 | 0 | 0.317614675 | 0.719134331 | -0.034152851 | -0.011794329 | -0.016941816 |
| 1 | 0 | 0 | 0 | 0 | 0.993339479 | 0.009681493 | 0.013650194 | -0.003351748 | -0.008182466 |
| 1 | 0 | 0 | 0 | 0 | 1.008823276 | -0.00116697 | 0.00824292 | -0.016745269 | 0.001544356 |
| 0 | 1 | 0 | 0 | 0 | 1.013825417 | -0.003275812 | 0.014645681 | -0.002040207 | -0.001960754 |
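The observation that the predictor columns sum to roughly 1 while still containing negative entries can be checked with a few lines of Python. This is only a sketch: the three rows are copied from the table above (rows 1, 12 and 18), and the variable names are my own.

```python
# Predictor columns of three rows copied from the table (rows 1, 12 and 18).
predictor_rows = [
    [0.846124053, 0.160469145, -0.024264246, 0.005301833, 0.01336658],
    [0.062569618, 1.004780293, -0.053878814, -0.004566014, -0.024834469],
    [0.317614675, 0.719134331, -0.034152851, -0.011794329, -0.016941816],
]

row_sums = [sum(row) for row in predictor_rows]
row_mins = [min(row) for row in predictor_rows]

for total, smallest in zip(row_sums, row_mins):
    # Each sum is close to 1, but the smallest entry is below 0,
    # so the row cannot be a valid probability distribution.
    print(f"sum = {total:.6f}, smallest entry = {smallest:.6f}")
```

A row that sums to about 1 is necessary for a probability distribution but not sufficient; every entry would also have to lie between 0 and 1.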

I'll try to follow your suggestion and use the classification learner. The problem there is that it only allows one target column, whereas I have 5. Nevertheless, I will transform them and give it a try.
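For reference, the transformation from five 0/1 target columns to a single class column could look roughly like the following Python sketch. It assumes exactly one of the five columns is 1 in every row, as in the table above; the function name and label strings are only illustrative, and inside KNIME the same step would be done with whatever node you prefer.

```python
LABELS = ["0 sales/h", "1 sale/h", "2 sales/h", "3 sales/h", ">3 sales/h"]

def one_hot_to_label(row):
    """Return the class label for a row of five 0/1 target columns."""
    if len(row) != len(LABELS):
        raise ValueError("expected one value per bin")
    if any(v not in (0, 1) for v in row) or sum(row) != 1:
        raise ValueError("expected exactly one column set to 1")
    return LABELS[row.index(1)]

# Two input rows as they appear in the table.
print(one_hot_to_label([1, 0, 0, 0, 0]))
print(one_hot_to_label([0, 1, 0, 0, 0]))
```

The resulting single column can then serve as the one target column the classification learner asks for.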

If anybody else has an idea why it didn't work in this case, or how I should interpret the output of the regression learner, I'd be more than happy to hear it!

As I said before, there is no constraint that prevents the Predictor from outputting negative values. If your training targets are around zero, it tries to fit those values as closely as possible (depending on the error function you used; I assume 'mean squared error'). The result can therefore be slightly negative or slightly positive around zero, which is what the table you posted shows. The outputs are not probabilities. The fact that they sum to about one is only by chance, because the ground truth sums to one in your case.
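The effect can be reproduced on a toy example. The sketch below is not the DL4J network: as a stand-in, it fits one ordinary least-squares line per target column (which minimizes mean squared error) to made-up one-hot targets with a single made-up feature. All data and names are assumptions for illustration.

```python
def fit_line(xs, ys):
    """Closed-form least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Toy data: one feature value per example and its class (0, 1 or 2).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
classes = [0, 0, 0, 1, 2]
n_classes = 3

# One 0/1 target column per class, like the five sales columns in the thread.
targets = [[1.0 if c == k else 0.0 for c in classes] for k in range(n_classes)]
models = [fit_line(xs, column) for column in targets]

def predict(x):
    """Return one unconstrained regression output per class."""
    return [intercept + slope * x for intercept, slope in models]

predictions = [predict(x) for x in xs]
for x, row in zip(xs, predictions):
    print(x, [round(v, 3) for v in row], "sum =", round(sum(row), 3))
```

In this toy fit, some outputs fall below 0 and others above 1, yet each row still sums to 1 because the targets in every row sum to 1. A network trained by gradient descent will only approximately show the same pattern, which matches the table.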