Row filter based on multi column

Hi all -

I am trying to do some analytics on shopify data. I looked for some workflows but didnt see any so as I complete some happy to share when done (calculating Life time value, analytics on shopper behaviors etc).

Right now I am trying to do some data clean up and have been researching and playing around with but cant figure out how to filter the following scenario. I have a snippet of an export below - how would I do the following:

I want to filter out all the returns from the orders to a separate workflow. So first I need to group by Order ID or sort by order ID - as that is the unique identifier to the order. Then if Sale Kind has a Return I want to select it be part of the separate workflow as that indicates the return. I think the answer is some sort of Loop but I have not been able to get it right.

I appreciate any help in advance!

All the best,
Brendan

Hi @BSFL89

Welcome to the KNIME community!

Just to be clear, if an order has a return, do you want the entire order to be filtered out (the 4 rows in your example) or only the rows where the sale kind is return?

Thanks for the welcome! In my example I would want to filter our all 4 rows to a separate workflow.

I think I got the loop to work, but trying to get the column expressions coding right. I want to test if the column sale kind has a value of “return” for the given order ID in the loop, if it does then append a new column Sale Type to say Return for each row. Then I can easily filter it. But it seems like this operates on each row individually.

Hi @BSFL89

You can approach this in multiple ways, something like this should get you going.

Tip: try to include an Excel sheet, csv file or a workflow in your post next time. People are more inclined to help you when if you provide it so that we don’t have to manually re-write that table into KNIME based on the screenshot :wink:

Assuming I have one order with a return (#1) and one without (#2). If you then use the groupBy node with the grouping column being the OrderID and the aggregation set to unique concatenate for Sale kind, you can quickly evaluate which OrderID’s have a return row in them.

Since you are looking for the return ones, you can find those with a Row Filter on *return* (don’t forget the checkbox for contains wild cards)

If you then apply a left outer join to the original table based on the OrderID., this information is now associated to each row of the same OrderID. (In this case, 1 is returned and 2 is not)

With a row splitter that excludes null values, you will have the returned orders in output port 0 and the not-returned orders in port 1.

Port 1
image

See WF:
Row filter based on multi column.knwf (28.3 KB)

Hope this provides some inspiration!

2 Likes

That seems to have worked! Thank you so much and noted, next time I will post the excel!

1 Like

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.