Filter rows of data based on looped groupby input

Below is a partial sample of data. I have various channels with associated spend, engagement, and revenue data that comes out of System A, and I want to join to the various media platforms in Systems B, C, D, E, etc. It's probably not the cleanest way, but I want to split the rows of data from System A by their channel grouping and have each channel grouping branch out and join to its respective system (B, C, D, E, etc.).

I started by building out each filter using a rule filter. When the rule is satisfied, I push the data out of the top output and can join it to the data input of the corresponding media platform (see image below). However, I was wondering if a loop could be used instead. My thought was a loop start, passing into a GroupBy that groups all of the values in the field I want to filter by, and then passing each group into its own stream.

This image reflects my file input: each rule-based row splitter passes the data that satisfies the rule into a workstream, and the data that doesn't satisfy it moves to the next filter, where a new rule is applied, and so forth. In this example, I used a Column Resorter simply to show that I would have four paths, but in reality I would have a Joiner to join to the different feeds, then concatenate the feeds together into one overarching report.

The data, in its most simple form, looks like the following:

date channel metric 1 metric 2 etc.
2023-01-01 seo_google 5 6 7
2023-01-02 seo_google 6 7 8
2023-01-03 seo_google 43 52 62
2023-01-04 seo_google 5 6 7
2023-01-05 seo_google 3 4 5
2023-01-06 seo_google 23 28 34
2023-01-07 seo_other 4 5 6
2023-01-08 seo_other 5 6 7
2023-01-09 seo_other 4 5 6
2023-01-10 seo_other 3 4 5
2023-01-11 seo_other 35 42 50
2023-01-01 social_paid 5 6 7
2023-01-02 social_paid 6 7 8
2023-01-03 social_paid 7 8 10
2023-01-04 social_paid 786 943 1132
2023-01-05 social_paid 4 5 6
2023-01-06 social_paid 4 5 6
2023-01-07 social_paid 3 4 5
2023-01-08 social_paid 35 42 50
2023-01-09 social_paid 5 6 7
2023-01-10 social_paid 6 7 8
2023-01-01 social_organic 7 8 10
2023-01-02 social_organic 786 943 1132
2023-01-03 social_organic 4 5 6
2023-01-04 social_organic 4 5 6
2023-01-05 social_organic 3 4 5
2023-01-06 social_organic 35 42 50
2023-01-07 social_organic 5 6 7
2023-01-08 social_organic 6 7 8
2023-01-01 text_marketing 7 8 10
2023-01-02 text_marketing 786 943 1132
2023-01-03 text_marketing 4 5 6
2023-01-04 text_marketing 3 4 5
2023-01-05 text_marketing 35 42 50
2023-01-06 text_marketing 5 6 7
2023-01-07 text_marketing 6 7 8
2023-01-08 text_marketing 7 8 10

How would I structure a loop (if possible) so that seo_google (field value 1) gets filtered and pushed through to workstream 1, seo_other gets pushed through to the next workstream, social_paid through the next, and so on, using a loop node, a GroupBy, a rule filter, and multiple outputs? Or is this not feasible?
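(In case it helps to show what I mean outside of KNIME, the split I'm picturing is essentially a group-by split on the channel column; a rough pandas sketch, using the column names from the sample above, would be:)

```python
import pandas as pd

# A minimal pandas sketch (not KNIME) of the split I'm after: one sub-table
# per channel, each of which would then join to its own system's feed.
# Column names are taken from the sample above; only a few rows are shown.
df = pd.DataFrame({
    "date":     ["2023-01-01", "2023-01-02", "2023-01-07", "2023-01-01"],
    "channel":  ["seo_google", "seo_google", "seo_other", "social_paid"],
    "metric 1": [5, 6, 4, 5],
    "metric 2": [6, 7, 5, 6],
})

# One group per distinct channel value -- the "groupby feeds each stream" idea.
streams = {channel: group for channel, group in df.groupby("channel")}

for channel, group in streams.items():
    print(channel, "->", len(group), "rows")
```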

Thank you!

Perhaps you could explore a setup that uses CASE/SWITCH nodes:

I used a placeholder metanode for Processing, but it could be anything you want.


This is great - thank you! Do you have the workflow by any chance? I'm still fairly new.

Also, instead of the Rule Engine, could a GroupBy be used to feed the downstream nodes, so that if I add another channel the Rule Engine doesn't have to be hardcoded and the loop stays dynamic?

The workflow is here:

The Rule Engine node is being used here to set an explicit tag (in the form of a variable), which activates the relevant branch of the case switch.
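If it helps to see the idea outside of KNIME, the logic is roughly this (a minimal Python sketch; the tag names and branch functions are placeholders I made up, not anything from the workflow):

```python
# Rough sketch of the tag-and-dispatch idea: the Rule Engine derives a tag
# from the channel value, and the CASE Switch routes the data to the branch
# matching that tag. Tags and branch functions below are illustrative only.
def tag_for(channel: str) -> str:
    if channel.startswith("seo_"):
        return "seo"
    if channel.startswith("social_"):
        return "social"
    return "other"

def process_seo(rows):    print("seo branch:", len(rows), "rows")
def process_social(rows): print("social branch:", len(rows), "rows")
def process_other(rows):  print("other branch:", len(rows), "rows")

branches = {"seo": process_seo, "social": process_social, "other": process_other}

def dispatch(channel, rows):
    # The tag (a variable in KNIME) selects which branch is active.
    branches[tag_for(channel)](rows)

dispatch("seo_google", [1, 2, 3])    # seo branch: 3 rows
dispatch("text_marketing", [1, 2])   # other branch: 2 rows
```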

I’m not sure I see how the GroupBy would help things, but it might be possible, depending on what processing you’ll actually be doing.


Thank you. My thought was that a GroupBy could make it dynamic: if another channel were added, the group-by on channel upstream could feed the filter.

Hi @ebarr, I think I see where you are coming from with the "Group By" idea, but like @elsamuel said, I don't think it can help you in this case. I'm assuming that you mean to use the Group By to provide a distinct list of channels, so that when another channel is added, it is automatically added to the list.

If that is what you were thinking, then it isn't going to work, because you have different processing paths for each specific channel. You will always need to know in advance which path a given channel should take, and of course you will have to modify the workflow to add those paths too, which is the bulk of the effort.

However, if you want to make the configuration slightly simpler, then in KNIME 5.1 you could replace the Rule Engine with a Table Creator and a Value Lookup:

Then instead of writing the rules out, you simply manually add an additional row to the Table when there is a new channel to be included:

(screenshot of the Table Creator rows)

This can work because of the new "hidden" advanced feature of the Value Lookup node, which allows wildcard matching. You need to set "String matching" to Wildcard (other pattern-matching options are also available if you prefer).

So why do I call it "hidden"? Well, for some reason that I really do not understand, this panel is only visible if you scroll right to the bottom and click "Show advanced settings".

No idea why that is an “advanced setting”. To me that is the best thing about the Value Lookup node, and sets it apart from the “Joiner” node. I’m all for “discoverability”, but it feels like it was done that way simply to give us something to discuss here on the forum :wink:

While I’m on my “soap box”, I’ll add that to me the worst part of that UI design is that even with an advanced option (e.g. wildcards) chosen, when you go back into the configuration, there is not even a hint that it’s been chosen unless you know to open advanced settings again. Hoping that will change in a future release.
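Coming back to the lookup idea, here is a rough Python sketch of what the Table Creator plus wildcard Value Lookup combination is doing conceptually (purely illustrative; the patterns and tags are not taken from the attached workflow):

```python
import fnmatch

# The Table Creator supplies pattern -> tag rows; wildcard matching returns
# the first pattern that the channel value satisfies. Adding a new channel
# then just means adding a row to the lookup list (or Table Creator).
lookup = [
    ("seo_*",    "seo_path"),
    ("social_*", "social_path"),
    ("text_*",   "text_path"),
]

def lookup_tag(channel: str):
    for pattern, tag in lookup:
        if fnmatch.fnmatch(channel, pattern):
            return tag
    return None  # a brand-new channel with no row yet -> no match

print(lookup_tag("seo_google"))   # seo_path
print(lookup_tag("podcast_ads"))  # None, until a row is added for it
```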

For a demo, here is @elsamuel's flow with the Value Lookup.

Case Switch example - with value lookup.knwf (32.6 KB)

(If your question has been answered, please remember to mark @elsamuel's solution as the solution, as that was where the hard work was done!)

@takbb it really improved a lot with the wildcard, regex, and smaller/larger matching options compared to the original Cell Replacer node (except for the new name, of course).
Any reason why a Group Loop wasn't applied after the cases were set? Maybe I overlooked something?
br


Hi @Daniel_Weikert,

Certainly, I think "Value Lookup" was clearly aimed at appealing to those looking for the Excel VLOOKUP functionality :wink:

Do you mean a Group Loop in place of the Table Row to Variable Loop?
e.g.

I confess I hadn’t looked that far ahead, and only tried to find an option for replacing Rule Engine, but on the face of it, I agree that would appear to be more efficient. But I may have overlooked something too :slight_smile:
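For what it's worth, here's the difference in plain terms (a quick Python illustration only, nothing to do with how the actual nodes are implemented):

```python
import pandas as pd

# Row-based vs group-based iteration over the same channel column.
df = pd.DataFrame({"channel":  ["seo_google", "seo_google", "seo_other"],
                   "metric 1": [5, 6, 4]})

# Like a Table Row to Variable Loop: one iteration per row.
for _, row in df.iterrows():
    pass  # process a single row at a time

# Like a Group Loop: one iteration per channel, whole sub-table at once.
for channel, group in df.groupby("channel"):
    pass  # process all rows for this channel together
```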

@takbb
OK, thanks for the detailed explanations (as always)!
I was just curious whether it was necessary to iterate row by row (it could have been the case and I missed it).

Yes indeed, but if you want to go from Excel to KNIME, then why use the same formulas/nodes? As the great Apple said: "Think different."
:sweat_smile:
br
