How to filter rows for each visitor id for specific date range and page name column?

Hi I am trying to create a row filter for each visitor id, a kb article is viewed (kb :s : article : number) then within 1 day (same date or 1 day prior) a support request is raised (file_SR : comfirmation).

The example below shows for one visitor the pages they visited in order of visit number and the SRs they raised. The green rows are the ones I would want to keep as they have the kbs and the type of SR (technical : system management). Also the time between the first kb viewed 22 Nov and the SR raised 23 Nov is 1 day.

The red rows would be excluded because one kb is viewed on 17 Nov, which falls outside the date range from when it is viewed to the SR being raised, and the last 3 rows because the gap between kb and SR is 14 Dec to 23 Dec.

I would like to do this for each of the multiple visitor ids as this example is just one visitor. Some visitors would not view a kb at all and only raise SRs and some visitors would view only kbs without any SRs being raised.

Any advice on how to create a filter for this would be much appreciated.

date visitor id page name visit number SR confirmation number
17/11/2019 100122545660601728_5534802519303637938 kb : s : article : 2148582 53  
22/11/2019 100122545660601728_5534802519303637938 kb : s : article : 2050273 56  
22/11/2019 100122545660601728_5534802519303637938 kb : s : article : 2050273 56  
22/11/2019 100122545660601728_5534802519303637938 kb : s : article : 2109887 56  
23/11/2019 100122545660601728_5534802519303637938 kb : s : article : 2050273 57  
23/11/2019 100122545660601728_5534802519303637938 kb : s : article : 2050273 57  
23/11/2019 100122545660601728_5534802519303637938 my : group : get-support : file_SR : confirmation 57 19083213711
23/11/2019 100122545660601728_5534802519303637938 my : group : get-support : technical : system management 57  
14/12/2019 100122545660601728_5534802519303637938 kb : s : article : 2148582 65  
23/12/2019 100122545660601728_5534802519303637938 my : group : get-support : file_SR : confirmation 66 19090461112
23/12/2019 100122545660601728_5534802519303637938 my : group : get-support : technical : installation/upgrades 66  

hi @Qualm1984,

I tried to build your logic in 2 Loops within one workflow.
Please have a look at the attached workflow and try it out with some more data.
Z_015_row filter.knwf (31.9 KB)

Hope this can be used as a starting point, Greetz, Tommy

Hi @tommy

Thanks for sending across the workflow, I just tested on a larger sheet and got a an error at the Java IF (table node):

WARN Rule-based Row Filter 3:22 Node created an empty data table.
WARN End IF 3:20 Node created an empty data table.
WARN Loop End 3:14 Node created an empty data table.
WARN Row Filter 3:5 Node created an empty data table.
WARN Date&Time Shift 3:6 Node created an empty data table.
WARN Rule-based Row Filter 3:11 Node created an empty data table.
WARN Java Snippet 3:18 Node created an empty data table.
ERROR Java IF (Table) 3:19 Execute failed: Switch index out of range [0: 1]: 2

I’ve attached the sheet i fed in:

example-before.xls (41 KB)

This would be the input:

and this would be resulting output:

Hi, after some minor changes, please have a look at the following WF.
The error was caused because of an empty table as input of the Table row to variable loop start Node.

Z_015_row filter.knwf (38.9 KB)

Now I get exact the same output as you show in your table.
Hope this helps again - step by step :slight_smile:
Tommy

3 Likes

This is great works perfectly! thanks for this!

Hi @tommy using your workflow with a larger dataset, its almost there but im getting a few issues with the output.

For example for this visitor id the input is:


and the output (duplicate SRs in red)

  • In this example the corrrect SRs and KBs are included but they are being duplicated for the same visitor id.

  • There is one SR - 19075985810 - that is being included that should be filtered out as there is no kb viewed prior.

I have a vague idea of how you put this together but still learning :slight_smile:
I have attached the input sheet im using vs the output sheets you can see what i mean (they are initially CSV but cant upload that here so saved them as xls)

Input:

example dataset test2.xls (192.5 KB)

output:

SSL articles and SRs within 24hrs - Knime.xls (53.5 KB)

Thanks,
Rob

1 Like

hi @Qualm1984,

usual problem with an external point of view. I haven’t thought of all possible variants and your rules.
I just edited the workflow, implemented your rules and some SR have now been excluded - hopefully not too many. No the column “visit number” is also used for filtering.
A cross check with your red highlighted output wasn’t done.

Please have a look and check…
Z_015_row filter.knwf (48.2 KB)

Greetz, tommy

Hi @tommy

Ha yeah this is working really well considering youre only just putting this together from looking at it from the outside.

I tested your latest workflow vs previous one, ill run through what the initial problem i saw with that one visitor id as an example:

Original data:

the green row groups outlined in black would be what I would want to keep.

Previous workflow output:

  • the first SR rows (in red - SR 19075985810) without a kb article viewed previously was included and then repeated twice more, i tried to look at the loop what was causing it but unable to determine why that is. this should be excluded which does happen on latest workflow

  • SR 19076120910 also repeats twice, resulting in those 3 sets of rows being duplicated when only one set should remain. This should be included in the output as there is a kb article viewed on within same day as SR.

Latest workflow output:

As you can see this has removed the rows for SR 19076120910

Desired output:

Including SR 19076120910 as mentioned above:

Again, your help on this is massively appreciated.

I have included just the excel with the original visitor example for your convenience. visitor example.xls (37.5 KB)

Thanks,
Rob

Also I shouldve mentioned that since you were using visit number, within the timeframe of 1 day prior or same day, a visitor could have viewed a kb article in the same visit or subsequent visits.

In above example you can see one kb article viewed then SR raised all in visit number 92, and the following example 2 kb articles viewed in visit number 95 followed by raising the SR 2 visits later.

ok, @Qualm1984.

reason for exclusion of SR 19076120910 was that I filtered on visit number < SR visit number. no it’s “<=”.
Thus we have again more SR, but each is unique, which should be fine.
Z_015_row filter.knwf (48.3 KB)

Only issue I can now imagine, is the case that a SR is raised and afterwards a kb article is viewed within the same visit no.

greetz, Tommy

3 Likes

This is working now on a large dataset, many thanks for this @tommy !

1 Like

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.