Divide rows into 7 even groups based on sum of values in a column

MapperKnime · October 31, 2019, 9:01pm

We have a list of work items for staff that have a field with the number of minutes it should take to work each item, we would like to divide the workload by 7 to give each person an equal number of minutes to work. Attached is a jpg of the table, The UNIT_TOTAL column is the number of minutes for each line (which represents one work item) we need to assign 1/7 of the sum of “UNIT_TOTAL” to each staff member so they have equal work load. Could you tell me how we might accomplish that?

Thank you!

mwiegand · October 31, 2019, 10:00pm

Hi @MapperKnime,

ahh, a mathematical problem. I love crunching numbers …

I’ve done a quick test and must say it is a very interesting challenge. I try to formulate it first as abstract as possible and a possible workflow.

Challenge
You’d like to equally distribute something but not have equal amounts of everything. So it is an approximation challenge.

Optimal scenario
Though, we first assume, the best scenario: Sum of minutes and count of work items can be perfectly divided by number of employees. Then doing simple math helps as you might imagine.

Real live
Now let’s assume a more real world scenario. Which it is actually. We build the sum of all minutes and divide it by the amount of employees. That gives us the best possible distribution, the point of approximation.

Now that we have a threshold to get as close to as possible we want to try to distribute each work item only once. We should start with the big ones first, that enables us to better match to the approximation point. So a recursive loop is required or some fancy row loop with reading & writing table as a cache.

Here comes the next challenge. Assuming you have three employees but only one work item of seven minutes, then you’d like to assign more work packages to the other two until they too have seven minutes. So the list of work packages, sorted descending by minutes required, is assigned to each to an employee until the sum of assigned minutes is equal or greater the amount currently in distribution based on the first assignment.

Here again, a new challenge emerged. You’ve got three employees, one work page of, the next of five and three minutes and a few more. The first gets seven minutes assigned, the second would get eighth. So after each assignment round the list of employees has to been resorted descending by the sum of assigned minutes and the difference to the maximum must been resolved first.

In theory this should evenly distribute the minutes until all work packages have been assigned. Though, it’s very complex, but approximation / or optimizations problems aren’t easy in the first place.

Would be keen to know if others have a better idea.

Cheers
Mike

mwiegand · November 2, 2019, 3:38pm

Hi @MapperKnime,

alrighty, after playing around with various kind of approaches how to work with the data and different ones to assign the work minutes as evenly distributed as possible, here’s one nice attempt.

It does not use recursive loops as one, like I did, will face chicken-egg-problems. It does not use some fancy clustering or statistical methods making it difficult to comprehend or adjust. It’s linear, easy and elegant …

Work packages

Employee list before / after

https://hub.knime.com/mw/spaces/Public/latest/Knime%20Forum%20Support/Equally%20distribute%20work%20package%20minutes%20by%20number%20of%20employees

Let me know what you think …

Happy Kniming
Mike

HansS · November 2, 2019, 5:00pm

Hi @MapperKnime, Mike

Hi

This workflow divide_rows.knwf (138.3 KB) runs multiple times (10x in this example). Every run it divides the workload in 7 parts. In every loop the data is shuffled and the standard deviation is calculated. So after 10 loops the wf selects the run with the smallest standard deviation as the best solution. The result is a table with the workload sorted by batch. And there is a GroupBy to examine the total time per batch. Hope this helps.

gr. Hans

mwiegand · November 2, 2019, 6:40pm

Hi Hans,

I had a new idea based on your workflow @HansS. For each iteration the standard deviation could be used to pick work packages that are as close together as possible approaching an an equal distribution.

In addition it would be required to assign the work packages so the employee with the most minutes to work on already gets the ticket with least work minutes.

Edit: Aaaand here come the 2nd approach … but no statistics involved. Just picking the work packages by the amount of employees. Work packages are assign in descending order of work package minutes while employees are sort ascending.

This ensures the employee with the least amount of minutes gets the work package with the most amount of minutes.

Kind regards
Mike

MapperKnime · November 3, 2019, 1:08pm

Thank you so much Mike, I will give this a try, and I appreciate the input, this has several nodes I’ve never used before so I’m learning a lot of useful tools!

MapperKnime · November 3, 2019, 1:10pm

Thank you so much Hans, this is great! This is perfect and I learned nodes I had never used before so I appreciate so much the help with this project and future projects as well.

mwiegand · November 3, 2019, 1:10pm

You are welcome @MapperKnime. Everything is documented. Have fun playing with it! If the last approach is a feasible solution for you I’d welcome if you mark it as a solution.

Cheers
Mike

system · May 4, 2020, 1:10am

This topic was automatically closed 182 days after the last reply. New replies are no longer allowed.