I am trying to replicate a sumifs statement from excel where I match a distinct query parameter from a table against a url column in another and them sum the pageviews.
I have the tables in knime but no primary key so I am not sure how to replicate this. In excel I would use a sum if with a wildcard either side to check the page column
and would expect an output like this
any advice appreciated as I am new to Knime.
You could use string manipulation with substring formula or regex etract expression to extract the word after the ? and then join them based on this newly created column
regex would be sth like this
but I think you need to escape the backlashes in string manipulation so try double backslashes
Hope this solution helps (see attached file as well).
- Use ‘Cross Joiner’ to join each row of the top table with each row of the bottom table.
- Use ‘Column Expressions’ to check if a page contains the parameter (true or false)
- Use ‘Rule-based Row Splitter’ where you only keep rows that are ‘true’
- Use ‘GroupBy’ to sum over the page views for each parameter
wildcard_sumifs_two_tables.knwf (14.4 KB)
@Daniel_Weikert Issue with this approach is that some URL’s contain more than one parameter. If you write it out in full, you’ll get
which requires additional processing to see which parameters are actually in the URL.
I would change it slightly and use
(?:&[a-z]+)|(?:\?[a-z]+). This is able to capture all groups within the URL.
In KNIME, something like this should get you going.
First, use the Regex Extractor node and use the beforementioned code. Set the output to Rows.
In case you don’t have this node, it’s available here: Regex Extractor — NodePit
Next, apply a small clean-up to remove the HTML characters.
Finally, use a groupBy node where by the Full Match is the grouping column and the Page Views is the Manual Aggregation of type Sum.
If you want to control the list of parameters selectively, a Joiner node with an inner join after the parameter extraction would be sufficient.
Wildcard sum ifs between 2 tables.knwf (21.5 KB)
Hope this provides some inspiration!
@ArjenEX all good points that you raised, and very good solution.
However, keep in mind that query strings are case sensitive and are not always lower case. Your regex is limited to lower case query strings only.
But for this situation, it should be ok, since the parameter are pre-defined (colour, shape, etc, which are all lowercase) - and perhaps that is why you did it that way on purpose
Good spot! Luckily quickly solvable with
@ArjenEX Thank you so much for the response and explaining it so clearly.
That was much easier than the way I was trying round the houses with the mapping back between the two tables.
I have tweaked it for the real dataset and its behaving as expected
You are right,
sorry for the confusion. My eyes probably were too tired to spot that.
Great that you helped with your solution
This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.