Weak Label Model Predictor - 'Execute failed: The probabilities do not sum up to 1. Consider setting a proper epsilon.'

Hi,
while experimenting with the new ‘Weak Supervision’ nodes I ran into a strange error in the
Weak Label Model Predictor: 'Execute failed: The probabilities do not sum up to 1. Consider setting a proper epsilon.' (stack trace below).
As this node has no corresponding setting, what can I do?

I’m running KNIME 4.1 (with all extensions) on a Windows 10 laptop with 32 GB RAM (24 GB dedicated to KNIME). My workflow is as follows:

  1. I’m using the creditcard.csv data from your Example ‘Keras Autoencoder for Fraud Detection Training’

  2. I implemented 50 weak labeling functions, each adding a column (values 1, 0, ?) to the data (resulting in 80 columns in total)

  3. Feeding this table into the ‘Weak Label Model Learner’ works without problems

  4. But feeding the model and the data into the ‘Weak Label Model Predictor’ produces the error described above!

How can I deal with this issue?
Thx
Erich

2020-01-18 18:18:51,843 : ERROR : KNIME-Worker-17-Weak Label Model Predictor 0:3 : : Node : Weak Label Model Predictor : 0:3 : Execute failed: The probabilities do not sum up to 1. Consider setting a proper epsilon.
java.lang.IllegalArgumentException: The probabilities do not sum up to 1. Consider setting a proper epsilon.
at org.knime.core.node.util.CheckUtils.checkArgument(CheckUtils.java:255)
at org.knime.core.node.util.CheckUtils.checkArgument(CheckUtils.java:116)
at org.knime.core.data.probability.nominal.NominalDistributionCellFactory.createCell(NominalDistributionCellFactory.java:119)
at org.knime.wsl.weaklabelmodel.predictor.WeakLabelModelPredictor.createCells(WeakLabelModelPredictor.java:143)
at org.knime.wsl.weaklabelmodel.predictor.WeakLabelModelPredictor.access$1(WeakLabelModelPredictor.java:137)
at org.knime.wsl.weaklabelmodel.predictor.WeakLabelModelPredictor$1.getCells(WeakLabelModelPredictor.java:132)
at org.knime.core.data.container.RearrangeColumnsTable.calcNewCellsForRow(RearrangeColumnsTable.java:541)
at org.knime.core.data.container.RearrangeColumnsTable$ConcurrentNewColCalculator.compute(RearrangeColumnsTable.java:769)
at org.knime.core.data.container.RearrangeColumnsTable$ConcurrentNewColCalculator.compute(RearrangeColumnsTable.java:1)
at org.knime.core.util.MultiThreadWorker$ComputationTask$1.call(MultiThreadWorker.java:442)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at org.knime.core.util.ThreadUtils$RunnableWithContextImpl.runWithContext(ThreadUtils.java:334)
at org.knime.core.util.ThreadUtils$RunnableWithContext.run(ThreadUtils.java:210)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at org.knime.core.util.ThreadPool$MyFuture.run(ThreadPool.java:123)
at org.knime.core.util.ThreadPool$Worker.run(ThreadPool.java:246)
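
For readers who hit the same message: the trace shows the exception being raised by an argument check inside NominalDistributionCellFactory.createCell, and the wording suggests that the predicted class probabilities are required to sum to 1 within a tolerance epsilon. Below is a minimal Python sketch of what such a check might look like. The function names and the default epsilon are hypothetical and are not taken from the KNIME source.

```python
import math


def check_distribution(probs, epsilon=1e-4):
    """Raise ValueError unless probs sums to 1 within epsilon.

    Hypothetical helper for illustration only; not KNIME's implementation.
    """
    total = math.fsum(probs)
    # NaN compares false with everything, so it is tested explicitly.
    if math.isnan(total) or abs(total - 1.0) > epsilon:
        raise ValueError(
            "The probabilities do not sum up to 1 "
            "(sum=%r, epsilon=%r)" % (total, epsilon)
        )
    return total


def normalize(weights):
    """Rescale non-negative weights so that they sum to 1."""
    total = math.fsum(weights)
    # Rejects zero, negative, NaN and infinite sums.
    if not total > 0 or math.isinf(total):
        raise ValueError("cannot normalize weights with sum %r" % total)
    return [w / total for w in weights]


# Unnormalized scores fail the check; the normalized ones pass it.
scores = [2.0, 6.0]
check_distribution(normalize(scores))
```

A check like this fails whenever the values are not normalized or contain NaN, which is why the message alone does not reveal the root cause.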

Hello @Erich_Gstrein,

unfortunately, there is nothing you can do right now except try a different number of labeling functions.
Thanks to your stack trace I was able to identify the issue, but to verify that the fix actually resolves it, having the workflow would be a great help. Is there any chance that you could share it with me?

Sorry for the trouble and best regards,

Adrian

Hi Adrian,
please find attached (a simplified version of) my workflow for reproducing the error. You’ll find some explanations inside.

Best
Erich
weakSupervisionBug.knwf (294.1 KB)

Thank you Erich,
I’ll have a look as soon as possible and let you know what I find.

Cheers,

Adrian

Hello Erich,

I did some more digging and took a closer look at the labeling functions.
The problem might be related to the extreme imbalance of the labeling functions: all of them predict 0 for all rows except two.
From this data the learner isn’t able to derive a sensible model.
This is not to say that the observed behavior is correct; in any event, a node shouldn’t fail with an incoherent error message.
However, you’ll probably still have to investigate the data once we have fixed the issue, because even if the node no longer fails with an exception, the model still won’t be able to learn much from the data.
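
One way to spot this kind of imbalance before training is to tabulate, for each labeling function, how often it votes 1, votes 0, or abstains. The following is a minimal sketch in plain Python, assuming the label matrix is available as a list of rows with the values 1, 0 or None (for an abstain). The function name and the toy data are hypothetical, and this is not part of the KNIME nodes.

```python
from collections import Counter


def label_summary(label_matrix):
    """Return one Counter of votes per labeling function (column).

    label_matrix: list of rows, each a list of 1, 0 or None (abstain).
    """
    if not label_matrix:
        return []
    n_cols = len(label_matrix[0])
    summaries = [Counter() for _ in range(n_cols)]
    for row in label_matrix:
        for j, vote in enumerate(row):
            summaries[j][vote] += 1
    return summaries


# Toy data: 3 labeling functions (columns), 6 rows.
matrix = [
    [0, 0, None],
    [0, 0, 0],
    [0, 1, None],
    [0, 0, 1],
    [0, 0, None],
    [0, 1, 0],
]
for j, counts in enumerate(label_summary(matrix)):
    print(f"LF {j}: ones={counts[1]} zeros={counts[0]} abstains={counts[None]}")
```

A labeling function that almost never votes 1, like the first column in the toy data, carries very little information about the minority class.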

Cheers,

Adrian

Hi Adrian,
thx for your quick replies and your suggestions.

Background: I was experimenting to see whether weak supervision can be applied to very unbalanced data sets (e.g. your credit card data set) and outperform classical approaches such as Random Forests (RF). To do so:

1. I looped 50 times over a training sample of 200,000 records containing 300 frauds, learning 50 RF models, each on a more balanced basis of all frauds + 3000 randomly picked non-frauds.
2. In a second loop I applied these 50 models to the data set again (200k+300), but set all predictions with confidence < 0.95 to NULL, the idea being to use only those labels the models are very sure about.
3. With the resulting label matrix (~200k x 50) containing the labels ‘0’, ‘1’ and NULL, I used the Weak Label Model Learner and Predictor, which resulted in the reported error when using more than 16 labelers.
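
Step 2 above can be sketched as follows: a model's predicted fraud probability becomes a label only if the model is at least 95% sure of either class, and an abstain (None, i.e. NULL) otherwise. This is a minimal illustration in plain Python, not the actual workflow. The function names are hypothetical, and it assumes that "confidence" means the probability of the predicted class.

```python
def to_weak_label(p_fraud, threshold=0.95):
    """Map a predicted fraud probability to 1, 0 or None (abstain)."""
    if p_fraud >= threshold:
        return 1
    if 1.0 - p_fraud >= threshold:
        return 0
    return None


def build_label_matrix(prob_columns, threshold=0.95):
    """Build the label matrix from per-model probability lists.

    prob_columns: one list of fraud probabilities per model.
    Returns a list of rows with one entry (1, 0 or None) per model.
    """
    return [
        [to_weak_label(p, threshold) for p in row]
        for row in zip(*prob_columns)
    ]


# Toy example with 2 models and 3 records.
probs = [
    [0.99, 0.50, 0.01],  # model 1
    [0.97, 0.02, 0.20],  # model 2
]
labels = build_label_matrix(probs)
```

With a threshold this strict on data this unbalanced, most confident votes will be 0, so it is worth checking how many 1 votes survive before feeding the matrix to the learner.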

Best
Erich
