Clustering Items Together For Shipping

Folks, I have a problem I can’t conceptualize in KNIME.  In one table, I have a list of all products that I know can be shipped together.  The table has two columns, and each row holds a pair of items that can share a box.

I have a second table listing all the products for a new order.  Each product in the order is a single line with the ITEMID as the lone column. 

The challenge is to determine how many boxes I need to ship those items. If every item in the order is linked to the others through rows in the first table, I could ship in a single box.  If none match, I would need one box per item (10 boxes for a 10-item order).

Looking for some guidance on how to tackle this problem conceptually in KNIME.  Say the allowed pairing table has three rows:




In my new sales order table, I see 4 lines for the products on this order:





Since A and B are matched they can ship together.  And since B and D are matched, they can ship together.  So by the transitive property A and D can ship together.  So I can ship ITEMA, ITEMB and ITEMD in the same box.  ITEMF must package separately.
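To make the transitive reasoning concrete, here is a minimal Python sketch (outside KNIME) that merges items into boxes with a union-find structure. It assumes that only pairs with both items on the order count toward a shared box. The function name `pack_order` and the third pair (ITEMC, ITEME) are made up for illustration, since the original pairing table is not shown here.

```python
def pack_order(pairs, order_items):
    """Group order items linked, directly or transitively, by an allowed pair.

    Assumption: a pair only joins two items if BOTH are on the order.
    """
    items = list(dict.fromkeys(order_items))  # drop duplicates, keep order
    parent = {item: item for item in items}   # each item starts in its own box

    def find(item):
        # Walk up to the representative of the item's box, shortening the path.
        while parent[item] != item:
            parent[item] = parent[parent[item]]
            item = parent[item]
        return item

    for left, right in pairs:
        if left in parent and right in parent:
            parent[find(left)] = find(right)  # merge the two boxes

    boxes = {}
    for item in items:
        boxes.setdefault(find(item), []).append(item)
    return list(boxes.values())


# Hypothetical data: the first two pairs come from the example in the text,
# the third pair is invented.
pairs = [("ITEMA", "ITEMB"), ("ITEMB", "ITEMD"), ("ITEMC", "ITEME")]
order = ["ITEMA", "ITEMB", "ITEMD", "ITEMF"]

boxes = pack_order(pairs, order)
print(len(boxes), boxes)  # 2 [['ITEMA', 'ITEMB', 'ITEMD'], ['ITEMF']]
```

The number of boxes is simply the number of groups returned, so an order with no matching pairs yields one box per item.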

Could someone toss me a few concepts for a way to tackle this in KNIME?


I made you a workflow. Might be easier than explaining.

My trick was to use the set operator, which can compute the intersection of two columns.

The workflow is attached, hope this helps.

Cheers, Iris 

Hi Iris and smcleod,

Using the set operator is a smart idea to solve the given example!

An alternative approach would be to use the network analysis features of KNIME. The advantage is that this solution can also solve more complex examples than the one provided (i.e. with more than two packaging groups).

The key idea is that the Network to Row node has the option "Split-up unconnected components". See the updated workflow attached.
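For readers who want to see the idea behind splitting a network into unconnected components, here is a rough Python sketch. It is not what the KNIME node does internally, just the same concept: items are nodes, allowed pairs are edges, and each connected component becomes one box. The function name `shipping_groups` and the items ITEMG and ITEMH are hypothetical, added only to show a case with more than two groups. As an assumption, edges are kept only when both items are on the order.

```python
from collections import defaultdict, deque


def shipping_groups(pairs, order_items):
    """Return the connected components of the order's pairing graph."""
    wanted = set(order_items)

    # Build an undirected graph restricted to items on the order (assumption).
    neighbours = defaultdict(set)
    for left, right in pairs:
        if left in wanted and right in wanted:
            neighbours[left].add(right)
            neighbours[right].add(left)

    seen = set()
    groups = []
    for start in order_items:
        if start in seen:
            continue
        # Breadth-first search collects everything reachable from `start`.
        seen.add(start)
        queue = deque([start])
        group = []
        while queue:
            item = queue.popleft()
            group.append(item)
            for nxt in sorted(neighbours[item]):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        groups.append(group)
    return groups


# Hypothetical order that needs three boxes.
pairs = [("ITEMA", "ITEMB"), ("ITEMB", "ITEMD"), ("ITEMG", "ITEMH")]
order = ["ITEMA", "ITEMB", "ITEMD", "ITEMF", "ITEMG", "ITEMH"]

for box_number, group in enumerate(shipping_groups(pairs, order), start=1):
    print(box_number, group)
```

Each isolated item, such as ITEMF here, ends up as a component of size one and therefore gets its own box.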

Best regards


These are both excellent ideas.  Thank you.  I'm going to work with each a little bit and see which type of model will get me the greatest accuracy at the lowest computational cost.

Thank you so much.  These are both great ideas.  This gives me a good direction.  I'm going to experiment with both and see how the computational requirements compare to the accuracy for my model.