Is my groupby with Knime nodes incorrect or is the joiner setup incorrect? Do I need to do smaller successive groupbys joining each with the original table?
Well it’s not unique. After the group, the calculations are done on the groupings (summarize) and then the results are appended to all weeks of the same number. Like all week_4 would get the same max date, calculations etc.
actual_earing ← retention %>%
group_by(week_instal) %>% ## group retention by week_instal
mutate(max_date=max(cohort_day)) %>% ## add a column max_date which is the max value of cohort_day for each of the groups
filter(max_date==cohort_day) %>% ## take only those rows where max_date = cohort_day
group_by(week_instal,max_date) %>% ## Group by week_instal, then by max_date
summarise(max_days=max(day_snc_install)) %>% ## find the max of day_snc_install and set for max_days for each of the groups
ungroup() %>%
inner_join(retention) %>% ## join the original retention table with the results. Wil join on week_instal
filter(ifelse(week_instal==max(week_instal),day_snc_install<6 ,day_snc_install<=max_days)) %>%
filter(!(app_id==‘Android’ & week_instal<4)) %>%
group_by( day_snc_install,
media_source,
cohort_day,
week_instal,
name,
app_id) %>%
summarise(tot_day_earn=sum(value),
users=sum(users),
per_user=(tot_day_earn/users))
After the
summarise(max_days=max(day_snc_install))
it looks like:
That is then joined with the original table “retention”