Count of customers by day

Hi community!

as an excercise, i wanted to create a report with the count of customers due the start date they sign in. i have no problem with that because is just a group by, but i also tried to count them based on the days between they were logged in and i didn’t know how.

My table is something like the following:
custNumber, Date_in, Date_out.
2,10/1/2017,13/1/2017
3,11/1/2017,13/1/2017
4,13/1/2017,13/1/2017
5,10/1/2017,10/1/2017

my report should show:
date, count of custmer
10/1/2017,3
11/1/2017,3
12/1/2017,1
13/1/2017,1

do you think this is possible?

Thanks and best regards!

To achieve this a few additional steps to your groupBy are needed:

  1. Create the time range of all possible dates: Use a groupBy node on your origin data set. Do not select any grouping column, but use min aggregation function on loginData and max on logoutDate column. Next use a Table Row to Variable node and inject these values as ranges in a Create Date&Time Range node.
  2. Use a Cross Joiner node to join your origin data set with the just created time range data set
  3. Use a Rule Engine to check whether the date from your time range is in between the login and logout date (e.g. $login$ <= $Date$ AND $logout$ >= $Date$ => TRUE)
  4. Use a groupBy node with date as grouping column and choose sum as aggregation function for your prediction column from the Rule Engine.

This should give you the desired output.

Hi @Marten_Pfannenschmidt!
thank you very much for your help, but i think that i am doing something wrong. The workflow i made don’t give me the correct values.

CustomersGroup.zip (283.8 KB)

after all the workflow, if i group by for the new range field is giving me always 299 :frowning_face:

You were almost there. I applied some changes to your workflow that should do the trick for you. As your data set is quite long and a cross join is a really expensive operation, I’ve made another attempt with a chunk loop: This way the time range of every users activity is created and afterwards a GroupBy node is used to count the occurrence of every date. Both subworkflows produce the same result table, but the chunk loop should be faster.

CustomersGroup.knwf (88.6 KB)