Hi!
I’m having trouble with a dataset I’m currently working with. It contains many rows with URL parameters that I’m interested in, for example:
/cityname/blablabla/blablabla?query=:relevance:allCategories:blablabla:facet-brand:Bosch
/cityname/blablabla/blablabla?query=:relevance:allCategories:blablabla:isSeasonality:true
/cityname/blablabla/catalog?query=:relevance:allCategories:catalog:isNovelty:true
So, I need to extract some parameters from those URLs, like :facet-brand, :isSeasonality and some more.
But the problem is that some URLs have multiple parameters, like this one:
/cityname/blablabla/blablabla?query=:price-asc:allCategories:blablabla:facet-brand:Weber.Vetonit:facet-brand:Ceresit:facet-brand:Unis:facet-brand:Knauf
Of course, I have other metrics in other columns, like users, sessions, bounce rate, etc.
Is there a way I could extract those parts and then group by them?
P.S. I tried pattern-based aggregation, but the results looked strange, and I have no way to verify them, even in Excel.
HansS
October 11, 2022, 9:57am
Hi @luckyenough
The first step is to extract the relevant information from the URLs. Try the Cell Splitter node (KNIME Hub), possibly several times, with an appropriate delimiter (e.g. : or facet-brand). If you provide a sample dataset, it is much easier for the forum community to help you.
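For readers who prefer a script (for example in a Python Script node), here is a minimal sketch of the same splitting idea: split the part after `query=` on `:` and pair each known parameter name with the token that follows it. The function name `split_facets` and the `KNOWN_PARAMS` set are made up for illustration, and the sketch assumes that values never contain a colon.

```python
# Parameter names taken from this thread; extend the set as needed.
KNOWN_PARAMS = {"facet-brand", "isSeasonality", "isNovelty"}


def split_facets(url, known_params=KNOWN_PARAMS):
    """Return (parameter, value) pairs found in the colon-delimited query."""
    # Keep only the part after "query="; URLs without it yield no pairs.
    _, sep, query = url.partition("query=")
    if not sep:
        return []
    tokens = query.split(":")
    pairs = []
    i = 0
    while i < len(tokens):
        if tokens[i] in known_params and i + 1 < len(tokens):
            pairs.append((tokens[i], tokens[i + 1]))
            i += 2  # skip the value that was just consumed
        else:
            i += 1
    return pairs


url = ("/cityname/blablabla/blablabla?query=:price-asc:allCategories:blablabla"
       ":facet-brand:Weber.Vetonit:facet-brand:Ceresit")
print(split_facets(url))
# [('facet-brand', 'Weber.Vetonit'), ('facet-brand', 'Ceresit')]
```

Because the result is a list of pairs rather than a fixed number of columns, a URL with many repeated `facet-brand` entries does not widen the table.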
gr. Hans
Hello @luckyenough and welcome to the KNIME community forum
I assume that you are talking about retrieving data from a statistical server. Is that right?
Have you tried sending your parametrized rows into a loop, starting with (1.) the ‘Table Row To Variable Loop Start’ node?
Then:
GET Request (retrieve data from the web service)
JSON Path (query values from JSON)
Ungroup (ungroup column lists)
Process your desired output (do your transformations …)
Loop End
I am just guessing
BR
PS: You can pre-classify your paths with a ‘Logical Indexing Matrix’ for the parameters that you are interested in.
Hello @PLS_KN
I couldn’t resist giving this a try. It is based on a logical indexing methodology. It doesn’t use the Regex Extractor component (I cannot download it from my network today).
[image]
With the logical indexing matrix mentioned above, you can put this workflow to other LOGICAL uses, such as: when does a word occur when some other word is present before it…
This is an example of the excluded words in your final filter:
[image]
I hope you find this workflow useful. BR
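Since the workflow itself is only visible in the screenshots, here is a rough sketch of what a logical indexing matrix could look like in plain Python. It assumes the term means a boolean table with one row per URL and one column per parameter. The helper names `presence_row` and `occurs_after` are hypothetical and not taken from the workflow.

```python
PARAMS = ["facet-brand", "isSeasonality", "isNovelty"]

urls = [
    "/cityname/blablabla/blablabla?query=:relevance:allCategories:blablabla:facet-brand:Bosch",
    "/cityname/blablabla/blablabla?query=:relevance:allCategories:blablabla:isSeasonality:true",
    "/cityname/blablabla/catalog?query=:relevance:allCategories:catalog:isNovelty:true",
]


def presence_row(url, params=PARAMS):
    """One boolean per parameter: does ':<param>:' occur in the URL?"""
    return [f":{p}:" in url for p in params]


def occurs_after(url, first, second):
    """True if ':<second>:' appears somewhere after ':<first>:'."""
    start = url.find(f":{first}:")
    return start != -1 and url.find(f":{second}:", start + 1) != -1


# The matrix: one row per URL, one column per parameter.
matrix = [presence_row(u) for u in urls]

# Use one column of the matrix as a mask to select rows.
brand_col = PARAMS.index("facet-brand")
with_brand = [u for u, row in zip(urls, matrix) if row[brand_col]]
print(len(with_brand))  # 1
```

The `occurs_after` helper covers the "word present before another word" kind of question; the masks can be combined with `and` / `or` for other filters.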
facets — sample.xlsx (9.1 KB)
Here’s some sample data, where every parameter I need is contained:
:facet-brand
:hasDiscount
:isNovelty
:outOfStockFlag
:personalPrice
:isSeasonality
:badgeNames
I’ve tried the Cell Splitter, but it splits into up to 126 rows (because some URLs are very long), which makes it pretty useless, I think. Or maybe I just can’t find the right way to solve this problem.
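One way to avoid the very wide split is to pull out only the seven parameters listed above and keep one (parameter, value) pair per match, then group on those pairs. The sketch below is only an illustration under assumptions: the `rows` data and the session counts are made up, and values are assumed not to contain `:` or `&`.

```python
import re
from collections import defaultdict

PARAMS = ["facet-brand", "hasDiscount", "isNovelty", "outOfStockFlag",
          "personalPrice", "isSeasonality", "badgeNames"]

# ":<param>:<value>", where the value runs up to the next ":" or "&".
FACET_RE = re.compile(
    r":(" + "|".join(re.escape(p) for p in PARAMS) + r"):([^:&]*)"
)

# Hypothetical rows: (url, sessions). The numbers are made up.
rows = [
    ("/cityname/a/b?query=:relevance:allCategories:b:facet-brand:Bosch", 10),
    ("/cityname/a/b?query=:relevance:allCategories:b:isSeasonality:true", 5),
    ("/cityname/a/b?query=:price-asc:allCategories:b"
     ":facet-brand:Bosch:facet-brand:Knauf", 3),
]

sessions_by_facet = defaultdict(int)
for url, sessions in rows:
    # set() so a facet repeated within one URL is counted once for that row
    for param, value in set(FACET_RE.findall(url)):
        sessions_by_facet[(param, value)] += sessions

for key, total in sorted(sessions_by_facet.items()):
    print(key, total)
```

Note that a URL with several parameters adds its sessions to every matching group, so the group totals can sum to more than the overall total. Ratios such as bounce rate would have to be recomputed from the underlying counts rather than summed.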
Hello!
Yeah, you are correct: I’m retrieving data from Google Analytics via the API, using R scripts, but the problem is at the “last” step, so to speak.
ArjenEX
October 11, 2022, 11:11am
A very similar problem was presented last week.
@Daniel_Weikert The issue with this approach is that some URLs contain more than one parameter. If you write it out in full, you’ll get
[image]
which requires additional processing to see which parameters are actually in the URL.
@Jake120F
I would change it slightly and use (?:&[a-z]+)|(?:\?[a-z]+). This is able to capture all groups within the URL.
[image]
In KNIME, something like this should get you going.
[image]
First, use the Regex Extractor node with the aforementioned pattern. Set …
With the Regex Extractor, you have better control over the output in terms of rows, creating a list, etc.
I don’t have the time right now to fully work it out, but it looks like you might get close with that approach. After extracting all parameters, you can more easily filter on those that you would like to keep.
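To see what the quoted pattern actually captures, here is a small sketch using Python’s `re` module instead of the Regex Extractor. The first URL is hypothetical and uses conventional `?key=value&key=value` parameters; the second has the shape used in this thread.

```python
import re

# Pattern quoted in the post above: "?" or "&" followed by lowercase letters.
PARAM_RE = re.compile(r"(?:&[a-z]+)|(?:\?[a-z]+)")

# Hypothetical URL with conventional ?key=value&key=value parameters.
classic = "/shop/tools?brand=Bosch&color=red&page=2"
# URL shape from this thread: one "query" parameter holding ":"-separated facets.
facet = ("/cityname/blablabla/blablabla"
         "?query=:relevance:allCategories:blablabla:facet-brand:Bosch")

# Strip the leading "?" or "&" to keep only the parameter names.
classic_names = [m.lstrip("?&") for m in PARAM_RE.findall(classic)]
facet_names = [m.lstrip("?&") for m in PARAM_RE.findall(facet)]

print(classic_names)  # ['brand', 'color', 'page']
print(facet_names)    # ['query']
```

On the colon-delimited URLs the pattern as written only finds `query`, so it would probably need to be adapted to use `:` as the delimiter before it can pick up entries such as `facet-brand`.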
Oh, that’s great news, this problem looks just like mine, thank you so much!
I can’t find the words to say how perfectly your solution fits!
Thank you so much!