I need to perform a complex transformation of this data:
We have 3 column: ID of the person, Date, Result. It is data related to Covid Swabs test, with the ID of the person taking the test, the date and the result.
I think the easiest way is with a multi-step approach using the group by node. It looks like if you group by ID and “Esito” (positive vs negative) first, then you will be able to calculate things like last date of a positive test. Then you can do a second group by on just the ID and calculate things like “first date of test” which include both positive and negative.
Those calcs look to be easily done in the GroupBy node using Count, Min, Max. “Days in between” is the only thing that will require a different approach. Use “Date Time Difference” for that one with the granularity set to days.
Just group by ID and Pos / Negative on round 1. Then when you do your counts and calcs it will separate the 2 for you. You should be able to drop the second GroupBy (ID only) downstream of the first. You can always join them together by ID if you like.
You can do the same thing with a pivot for the first step and a group by for the second step. If you want to upload a little bit of the sample data that I can show you how it might work.
At first glance I would assume duplicating the ID column first and then using 1 groupby node where you group by the relevant fields with count, max,… could work
br
Here is a simple basic setup that you should be able to customize to get to your results. I did a pivot for the ID / Positive vs Negative calcs and a Group By for the combined total calcs, Joined them by ID and dropped a Date Time Difference afterwards to show how to calculate the difference. I just entered some quick dummy data on mine so it won’t match your table or column names, but the process should work for you.