I have an Excel database with columns:
Name | Start Date | End Date
A person can have more than one period of start or end dates. So, for these cases, I would need to “consolidate” these lines, identifying the start and end dates to plot a timeline.
For example, I have two lines for Raul. The first line starts on 01/01/2020 until 05/31/2020. And the second line starts 12/1/2020 until 12/31/2020.
My goal is to consolidate these two lines into a single line to create a timeline, informing the respective person’s start and end periods.
I’m not sure I fully understand your question. Are you trying to identify the earliest start date and latest end date for a particular individual across different rows? In that case you could use a GroupBy node, grouping on name, and then using minimum aggregation for the start date and maximum aggregation for the end date.
If I’m not interpreting this correctly, could you post an example of what your expected output dataset would look like?
You can try this. It was a bit trickier than I thought it would be at first. I made some assumptions about your data input format so if that’s different (likely) you would need to tweak this. But maybe it gives you an idea of how it could be done. As always, there’s probably a more efficient solution, this was just my first stab at it.