Grouping a column by cells containing specific text

Hi!

I’m having some trouble with the dataset I’m currently working with. It contains a lot of rows with URL parameters that I’m interested in, for example:

/cityname/blablabla/blablabla?query=:relevance:allCategories:blablabla:facet-brand:Bosch
/cityname/blablabla/blablabla?query=:relevance:allCategories:blablabla:isSeasonality:true
/cityname/blablabla/catalog?query=:relevance:allCategories:catalog:isNovelty:true

So, I need to extract some parameters from those URLs, like :facet-brand, :isSeasonality and some more.

But the problem is that some URLs have multiple parameters, like this one:

/cityname/blablabla/blablabla?query=:price-asc:allCategories:blablabla:facet-brand:Weber.Vetonit:facet-brand:Ceresit:facet-brand:Unis:facet-brand:Knauf

Of course, I have other metrics in other columns, like users, sessions, bounce rate, etc.

Is there a way I could extract those parts and then group by them?

P.S. I tried pattern-based aggregation, but the results looked strange, and I have no way to check them, even in Excel.

Hi @luckyenough

The first step is to extract the relevant information from the URLs. Try the Cell Splitter node (maybe multiple times) with an appropriate delimiter (e.g. : or facet-brand). If you provide a (sample) dataset, it is much easier for the forum community to help you.
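To illustrate what splitting on a delimiter produces, here is a minimal Python sketch (not a KNIME node configuration). The helper name `split_query` is hypothetical, and it assumes everything of interest follows `?query=` as in the example URLs above.

```python
def split_query(url, delimiter=":"):
    """Return the non-empty pieces of the part after '?query='."""
    # partition() gives an empty query string if '?query=' is absent
    _, _, query = url.partition("?query=")
    return [piece for piece in query.split(delimiter) if piece]


single = "/cityname/blablabla/blablabla?query=:relevance:allCategories:blablabla:facet-brand:Bosch"
multi = (
    "/cityname/blablabla/blablabla?query=:price-asc:allCategories:blablabla"
    ":facet-brand:Weber.Vetonit:facet-brand:Ceresit:facet-brand:Unis:facet-brand:Knauf"
)

print(split_query(single))
print(len(split_query(multi)))
```

The number of pieces grows with the number of parameters in the URL, so a plain split gives a different number of output cells per row.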

gr. Hans

Hello @luckyenough and welcome to the KNIME community forum

I assume that you are talking about retrieving data from a statistical server. Is that right?

Have you tried sending your parametrized rows into a loop via (1.) the ‘Table Row To Variable Loop Start’ node?

Then:

  1. GET Request (retrieve data from web service)
  2. JSON Path (query values from JSON)
  3. Ungroup (ungroup column lists)
  4. Process your desired output (do your transformations …)
  5. Loop End

I am just guessing :thinking:

BR

P.S. You can pre-classify your paths with a ‘Logical Indexing Matrix’ for the parameters that you are interested in.
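One way to read the pre-classification idea is as a table of true/false flags, one column per parameter. The Python sketch below is only an interpretation of that suggestion, not a KNIME feature; `indicator_row` is a hypothetical helper, and the parameter names are the ones mentioned in the question.

```python
PARAMETERS = ["facet-brand", "isSeasonality", "isNovelty"]

urls = [
    "/cityname/blablabla/blablabla?query=:relevance:allCategories:blablabla:facet-brand:Bosch",
    "/cityname/blablabla/blablabla?query=:relevance:allCategories:blablabla:isSeasonality:true",
    "/cityname/blablabla/catalog?query=:relevance:allCategories:catalog:isNovelty:true",
]


def indicator_row(url, parameters):
    """Flag which parameters occur in the URL as ':name:'."""
    return {name: f":{name}:" in url for name in parameters}


# One dict of boolean flags per URL
matrix = [indicator_row(url, PARAMETERS) for url in urls]
for row in matrix:
    print(row)
```

Rows can then be filtered or grouped on these flags, although the flags alone do not keep the parameter values (such as the brand name).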

facets — sample.xlsx (9.1 KB)

Here’s some sample data, where every parameter I need is contained:

:facet-brand
:hasDiscount
:isNovelty
:outOfStockFlag
:personalPrice
:isSeasonality
:badgeNames

I’ve tried the Cell Splitter, but it splits into up to 126 rows (because some URLs are long), which makes it pretty useless, I think. Or maybe I just can’t find the right way to solve this problem :slight_smile:

Hello!

Yeah, you are correct: I’m retrieving data from Google Analytics via the API, using R scripts, but the problem is at the “last” step, so to speak :slight_smile:

A very similar problem was presented last week.

With the RegexExtractor, you can better control the output in terms of rows, creating a list, etc.
I don’t have the time right now to fully work it out, but it looks like you might get close with that approach. After extracting all parameters, you could more easily filter on those you would like to keep.
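As a rough illustration of the regex idea outside KNIME, the Python sketch below pulls every `:parameter:value` pair for the parameters listed earlier in the thread and sums a metric per pair. The session counts are made-up illustrative values, and `WANTED`, `PAIR_PATTERN` and `totals` are hypothetical names.

```python
import re
from collections import defaultdict

WANTED = [
    "facet-brand", "hasDiscount", "isNovelty", "outOfStockFlag",
    "personalPrice", "isSeasonality", "badgeNames",
]

# Matches ':<wanted name>:<value>' where the value runs up to the next ':'
PAIR_PATTERN = re.compile(
    ":(" + "|".join(re.escape(name) for name in WANTED) + "):([^:]+)"
)

# (url, sessions) rows; the session numbers are invented for this example
rows = [
    ("/cityname/blablabla/blablabla?query=:relevance:allCategories:blablabla:facet-brand:Bosch", 10),
    ("/cityname/blablabla/blablabla?query=:relevance:allCategories:blablabla:isSeasonality:true", 5),
    ("/cityname/blablabla/catalog?query=:relevance:allCategories:catalog:isNovelty:true", 7),
    ("/cityname/blablabla/blablabla?query=:price-asc:allCategories:blablabla"
     ":facet-brand:Weber.Vetonit:facet-brand:Ceresit:facet-brand:Unis:facet-brand:Knauf", 3),
]

totals = defaultdict(int)
for url, sessions in rows:
    # set() so a pair repeated within one URL is counted once for that row
    for pair in set(PAIR_PATTERN.findall(url)):
        totals[pair] += sessions

for (name, value), sessions in sorted(totals.items()):
    print(name, value, sessions)
```

Note that a row with several brands counts toward each brand's group, so the group totals can add up to more than the overall total. This is roughly what expanding the extracted list into one row per value and then grouping would do.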

Ow, that’s great news, this problem looks like mine, thank you so much!

Hi @luckyenough
This is the idea:

20221011_grouping_column_using _wildcards.knwf (28.7 KB)

BR


I don’t have kind enough words to say how perfectly your solution fits!

Thank you so much!
