For each row, count number of previous occurances

This should be simple but I can not seem to get the right outcome by myself so hopefully someone one can point me in the right direction.

I have a dataset which is composed of events, with each event having a date and a person attending. I would like to count the number of times in the previous 12 months a person has attended any other events.

Event Ref | Date | PersonRef
1 | 01/03/2019 | A1
2 | 01/04/2019 | A1
3 | 01/04/2019 | B2
4 | 01/04/2019 | C1

So ideally I would like and end column stating 0s expect for the second line which would have 1. I need to do this across ~1million rows.

Hi @StevenFrancis

See this post about a smilair problem Repeating a Moving Aggregation (What are the options).
gr. Hans

1 Like

@HansS
Thanks for that pointer - it part does what I need, but what I had failed to mention is that my dataset goes back further than a year so I have ones that should not be counted, but as I need to check back for each entry I can not trim it down as I need to get the repeat attendance within a year for the events (e.i. how many events in a set period were repeated attendances based on the previous 12 months).

If I was doing this in Excel I would do a countifs(EventRef, DateCo>=Date-365,DateCo<Date,PersonRefCol=PersonRef) but over my dataset that would kill Excel and it is only part of my analysis.

Hi there @StevenFrancis,

it would help if you can provide a larger input file with more rows that would cover all cases with expected result. Maybe Excel file?

Br,
Ivan

2 Likes

@ipazin - I have drawn a random sample of 1000 lines - there are defiantly some in here that “break the rule” of a previous visit within 12 months of the first by the same person. The basic business logic here is to identify when there is waste in the processes by having to go back to redo work within a window. if that happens I want to see if it is with the same people and look at the quality of the work (based on other data).

Event Sample.xls (907.5 KB)

1 Like

I find it hard to conceive of an efficient way to do this without using a scripting node (e.g. Java Snippet).

Java Snippet way:
First sort table by Date (before Java Snippet).
In Java Snippet: Declare a global Map variable where key = Person and value = list of Dates
In body of Java Snippet look up value in map of previous dates for current row’s Person and iterate through them and count number within your window.
Remove any dates from current Person that are earlier than your window (to save space).
Add current row’s Date to Map (under current Person key).

If you don’t want to use a Java Snippet node, you could implement this (less efficiently) using GroupBy and Joiner nodes but you’ll still have to do a little programming (e.g. using a Column Expressions node). Examples of both approaches are given in the attached workflow (I didn’t finish the Column Expressions node but you should get the idea). Sorry I didn’t use your input table, but instead I created a new input table using the Create Date and Time Range and Random Label Assigner nodes.

Hope this helps
-Don


count prev occurances within year.knwf (16.9 KB)

2 Likes

Thanks for the feedback. I ended up with a solution that works for me but is possibly not the most elegant way - it came while making boxes for board game components so the totally different task I think ley my brain come up with a solution.

What I have done is taken the current months data out and then joined, based on the Person Ref. I have then calculated the difference between the event dates and if this is between 1 and 365 put in a 1, else 0. I then groupby the Event Ref and sum the 0/1s to count how many previous events in the last 12 months - I can join this back up with my original data. It works, and is quick enough for my needs currently so will try and keep things simple for myself for now.

Thanks for the feedback and ideas.

2 Likes

Hi @StevenFrancis,

How about modifying the solution suggested by @HansS which gives you this:

count_event_staff.knwf (1.1 MB)

:blush:

1 Like

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.