I need to perform a complex transformation of this data:
We have 3 column: ID of the person, Date, Result. It is data related to Covid Swabs test, with the ID of the person taking the test, the date and the result.
This Data should be transformed like this:
So to aggregate for each ID, and then perform calculations with dates and counts.
Could you help me please?
I think the easiest way is with a multi-step approach using the group by node. It looks like if you group by ID and “Esito” (positive vs negative) first, then you will be able to calculate things like last date of a positive test. Then you can do a second group by on just the ID and calculate things like “first date of test” which include both positive and negative.
Those calcs look to be easily done in the GroupBy node using Count, Min, Max. “Days in between” is the only thing that will require a different approach. Use “Date Time Difference” for that one with the granularity set to days.
thank you for your quick answer.
Two issues arise from your solution.
- I have multiple groupby that needs to be merged somehow:
moreover, some groupby have multiple rows for each ID as they split negative and positive
- With groupby there is no way to count only Positive or Negative, just the general count.
Just group by ID and Pos / Negative on round 1. Then when you do your counts and calcs it will separate the 2 for you. You should be able to drop the second GroupBy (ID only) downstream of the first. You can always join them together by ID if you like.
You can do the same thing with a pivot for the first step and a group by for the second step. If you want to upload a little bit of the sample data that I can show you how it might work.
I am sorry but I am not able to make it work. The second groupby loses the first.
So, if I understood correctly:
Step 1: Groupby ID and Result. This gives me a table with as many rows as the ID had a result.
Step 2. Another GroupBy (following the previous), where I do calculation by dropping a level.
However, I lose the division between results. So the next groupby table will not divide anymore between Negative and Positive
At first glance I would assume duplicating the ID column first and then using 1 groupby node where you group by the relevant fields with count, max,… could work
I found a solution. Is not easy or elegant.
I will post it in the next days.
Sample Workflow.knwf (2.7 MB)
Here is a simple basic setup that you should be able to customize to get to your results. I did a pivot for the ID / Positive vs Negative calcs and a Group By for the combined total calcs, Joined them by ID and dropped a Date Time Difference afterwards to show how to calculate the difference. I just entered some quick dummy data on mine so it won’t match your table or column names, but the process should work for you.
Yes it works.
It is a similar solution to mine.
Thank you so much for your help!
This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.