Counting occurrences that fit in certain criteria

robvdo · May 31, 2016, 9:42pm

Hello,

Is there a way to count the number of occurrencies that a certain value compare to values in certain columns?

I have a list o telephone call records with start time and call duration. For each call, I want to know the number of calls that were running concurrently. For this matter, I would count the number of rows where start time is earliear than the start time of the reference call and also where the finish time is later than the start time of this reference call.

Thanks

marco_ghislanzoni · June 1, 2016, 5:26pm

Conceptually this is an interesting problem to address.

I came up with a solution that uses a custom Java distance function. If two phone calls, expressed as the pair [starting time, duration], are somehow overlapping their distance will be 0, otherwise it will be greater than 0 and will correspond to the time between the end of one and the beginning of the other.

By the way, two perfectly adjacent phone calls will also have a 0 distance between them, possibly mistaking them for concurrent. Not sure how likely this case is to happen in real life.

Please see the attached workflow for a working example. To make it function properly with your data set you will likely have to convert your starting time to a long integer first (using something like Unix epoch).

Give it a try with a real dataset, I am curious to know whether it works as expected.

Cheers,
Marco.

concurrent_phone_calls.zip