Make a random decision using the probability from a column

Hi all

I have used a logistic regression predictor to assign a probability value to each case in my dataset. The dataset has 42,000 rows.

Each case has its own assigned probability, specific to that case, in a column called "Probability".

I now want to simulate the decision made for each case based on that individual case's probability.

The node I have found for this is the Random Boolean Assigner. To use it for each case, though, it appears that I have to use a chunk loop to select each row, one at a time, convert the Probability column value to a flow variable, and then inject that variable into the cg_probability cell of the Random Boolean Assigner.

The approach works, but it is very computationally expensive, taking several minutes for the 42k cases, and I will have to do at least 36 of these.

Is there any node or method where a decision can be made on each case referencing the Probability column for each case, but not having to do this with a loop? Is there a node I have perhaps missed?!

Many thanks for your help with this - could save a considerable amount of time!!



This is how I would do it in one pass.

1) Let's assume that your Probability column is defined as the probability of X having outcome 1 (TRUE), so that P(Xi=1) = pi. Obviously P(Xi=0) = 1 - pi.

2) Use the Random Number Assigner node to add a column to your table with pseudo-random draws from a Uniform distribution with values in [0,1]. You will get on each row a value qi with 0 <= qi <= 1. Leave the dependency column set to None. Name this column P(Uniform).

3) Use the Math Formula node to add a third column with the simulated outcome. Each row is a single Bernoulli trial (a binomial draw with one trial): the outcome will be 1 (TRUE) whenever qi <= pi, 0 (FALSE) otherwise. You can use this formula:

if($P(Uniform)$ <= $Probability$, 1, 0)

This is the equivalent of going row by row, assigning to each row i an outcome of 1 (TRUE) with probability pi and 0 (FALSE) with probability 1 - pi.
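
For anyone who wants to sanity-check the logic outside KNIME, here is a minimal sketch of the same three steps in plain Python. It is only an illustration, not what the KNIME nodes do internally: the function name simulate_decisions, the seed argument, and the sample probabilities are all made up for this example. The idea is that for a uniform draw q on [0,1], the chance that q <= p is p itself, so the comparison gives 1 with probability p.

```python
import random


def simulate_decisions(probabilities, seed=None):
    """Return one 0/1 outcome per input probability.

    Hypothetical helper for illustration; it mirrors the formula
    if(q <= p, 1, 0) applied to every row.
    """
    rng = random.Random(seed)  # seed is optional, only for repeatable runs
    outcomes = []
    for p in probabilities:
        q = rng.random()  # uniform draw in [0, 1), the "P(Uniform)" column
        outcomes.append(1 if q <= p else 0)  # 1 (TRUE) when q <= p
    return outcomes


# Illustrative values standing in for the "Probability" column.
probabilities = [0.05, 0.25, 0.5, 0.75, 1.0]
decisions = simulate_decisions(probabilities, seed=42)
print(list(zip(probabilities, decisions)))
```

Running this over many rows that share the same probability should give a share of 1s close to that probability, which is a quick way to check that the comparison is the right way round.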

Hope this helps.


Thanks very much, Marco! This is exactly what I needed.

It will probably save about 3 hours of waiting around today!