I am trying to replicate a sumifs statement from excel where I match a distinct query parameter from a table against a url column in another and them sum the pageviews.
I have the tables in knime but no primary key so I am not sure how to replicate this. In excel I would use a sum if with a wildcard either side to check the page column
You could use string manipulation with substring formula or regex etract expression to extract the word after the ? and then join them based on this newly created column
regex would be sth like this
but I think you need to escape the backlashes in string manipulation so try double backslashes
br
@Daniel_Weikert Issue with this approach is that some URL’s contain more than one parameter. If you write it out in full, you’ll get
which requires additional processing to see which parameters are actually in the URL.
@Jake120F
I would change it slightly and use (?:&[a-z]+)|(?:\?[a-z]+). This is able to capture all groups within the URL.
@ArjenEX all good points that you raised, and very good solution.
However, keep in mind that query strings are case sensitive and are not always lower case. Your regex is limited to lower case query strings only.
But for this situation, it should be ok, since the parameter are pre-defined (colour, shape, etc, which are all lowercase) - and perhaps that is why you did it that way on purpose
@ArjenEX Thank you so much for the response and explaining it so clearly.
That was much easier than the way I was trying round the houses with the mapping back between the two tables.
I have tweaked it for the real dataset and its behaving as expected