Hi @armingrudd

I have one, hopefully final, question about your method.

You are binning the 500 rows into 10 bins and, as you mention, get strong negative and positive correlations.

In our full set we have around 3500 rows. We tried several different numbers of bins, and the results differ a lot. The results were as follows:

72 bins:

0.42 positive correlation (date&time diff (binned) - canceled)

-0.57 negative correlation (date&time diff (binned) - not canceled)

40 bins:

0.56 positive correlation (date&time diff (binned) - canceled)

-0.58 negative correlation (date&time diff (binned) - not canceled)

10 bins:

0.64 positive correlation (date&time diff (binned) - canceled)

-0.61 negative correlation (date&time diff (binned) - not canceled)

5 bins:

0.78 positive correlation (date&time diff (binned) - canceled)

-0.77 negative correlation (date&time diff (binned) - not canceled)
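For concreteness, here is a minimal sketch of the kind of comparison I mean. It is not our actual pipeline: the data is synthetic, the names `time_diff`, `cancelled` and `binned_correlations` are made up for illustration, and I am assuming the correlation is taken between each bin's midpoint and the number of canceled (or not canceled) rows in that bin, using equal-width bins.

```python
import numpy as np

def binned_correlations(time_diff, cancelled, n_bins):
    """Correlate bin midpoints with per-bin counts of cancelled / not-cancelled rows."""
    # Equal-width bin edges over the full range of time differences.
    edges = np.histogram_bin_edges(time_diff, bins=n_bins)
    midpoints = (edges[:-1] + edges[1:]) / 2

    # Count rows per bin, separately for cancelled and not-cancelled bookings.
    n_cancelled, _ = np.histogram(time_diff[cancelled], bins=edges)
    n_kept, _ = np.histogram(time_diff[~cancelled], bins=edges)

    # Pearson correlation between bin position and bin count.
    r_cancelled = np.corrcoef(midpoints, n_cancelled)[0, 1]
    r_kept = np.corrcoef(midpoints, n_kept)[0, 1]
    return r_cancelled, r_kept

# Synthetic stand-in data (assumption: NOT the real bookings).
rng = np.random.default_rng(0)
time_diff = rng.exponential(scale=30.0, size=3500)
cancelled = rng.random(3500) < 0.3  # boolean flag per row

results = {}
for n_bins in (72, 40, 10, 5):
    results[n_bins] = binned_correlations(time_diff, cancelled, n_bins)
    print(n_bins, results[n_bins])
```

Running the same function over several bin counts makes it easy to see how much the coefficient moves with the binning alone, independent of the underlying rows.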

Different numbers of bins obviously affect the outcome, and my question to you is: how do you/we justify the chosen number of bins? There seem to be many different rules for choosing it. What would be a sensible answer when asked why we chose the number of bins that showed the best correlation, or vice versa, if we view this critically…

If this type of question is not up your alley I fully understand; I just wanted to try.

PS. Just to be totally clear:
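To illustrate what I mean by different rules, here is a small sketch using the named bin-count rules that NumPy's `np.histogram_bin_edges` accepts (`"sqrt"`, `"sturges"`, `"rice"`, `"scott"`, `"fd"`). The data is again synthetic and only has the same row count as our set, so the numbers it prints say nothing about our bookings; it only shows that the rules disagree with each other.

```python
import numpy as np

# Synthetic stand-in for the time-difference column (assumption: not real data).
rng = np.random.default_rng(0)
time_diff = rng.exponential(scale=30.0, size=3500)

# Ask each rule of thumb how many equal-width bins it would use.
bin_counts = {}
for rule in ("sqrt", "sturges", "rice", "scott", "fd"):
    edges = np.histogram_bin_edges(time_diff, bins=rule)
    bin_counts[rule] = len(edges) - 1
    print(f"{rule:8s} -> {bin_counts[rule]} bins")
```

Since each rule depends only on the sample size or the spread of the data, picking one of them in advance would at least be independent of which correlation comes out best.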

A positive correlation between time difference and cancelled bookings shows that as the time difference approaches zero, the number of cancellations decreases? (Since a positive correlation means that time difference and number of cancellations move together.)

A negative correlation between time difference and not-cancelled bookings shows that there are more bookings as we get closer to 0 and fewer as time goes on? (Since a negative correlation means that as one increases the other decreases, and vice versa.)