Remove the lowest value for a given circumstance

I have a dataframe with the ID of each sportsman, the test performed, the timepoint, and the value for that measure.

Something like:

ID  |  Test  |  Timepoint  |  Value
01       A        Hour2        434
01       A        Hour4        234
01       A        Hour8        5472
02       A        Hour2        1.8
02       A        Hour2        2342
02       A        Hour4        452
02       A        Hour8        234
03       A        Hour2        457
03       A        Hour4        429
03       A        Hour8        4985

As you can see, there are two measures for Sportsman 02 at Hour2.
That’s because the first measure was very low, so it was repeated.

Is there a way to create a new dataframe that keeps only the highest value for a repeated timepoint for a given sportsman and test?
So in the example above I would get:

ID  |  Test  |  Timepoint  |  Value
01       A        Hour2        434
01       A        Hour4        234
01       A        Hour8        5472
02       A        Hour2        2342
02       A        Hour4        452
02       A        Hour8        234
03       A        Hour2        457
03       A        Hour4        429
03       A        Hour8        4985

Use the GroupBy node.

Group by ID, Test, and Timepoint. Aggregate by Value using Maximum.
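
For anyone who wants to see the same logic outside KNIME, here is a minimal sketch in Python with pandas. This is only an analogue of the GroupBy node, not what the node runs internally, and the variable names `df` and `result` are made up for illustration. The data is copied from the question.

```python
import pandas as pd

# Sample data copied from the question (ID kept as text to preserve "01").
df = pd.DataFrame({
    "ID": ["01", "01", "01", "02", "02", "02", "02", "03", "03", "03"],
    "Test": ["A"] * 10,
    "Timepoint": ["Hour2", "Hour4", "Hour8", "Hour2", "Hour2",
                  "Hour4", "Hour8", "Hour2", "Hour4", "Hour8"],
    "Value": [434, 234, 5472, 1.8, 2342, 452, 234, 457, 429, 4985],
})

# Group by ID, Test, and Timepoint, then aggregate Value with the maximum.
# as_index=False keeps the grouping columns as ordinary columns.
result = df.groupby(["ID", "Test", "Timepoint"], as_index=False)["Value"].max()

print(result)
```

The result has one row per ID, Test, and Timepoint combination, with a fresh 0-based index, much like the regenerated RowIDs in KNIME.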

Hi @RoyBatty296, another way to do this, apart from the GroupBy suggested by @elsamuel, is to use the Duplicate Row Filter: detect duplicates on ID, Test, and Timepoint, and keep the row with the maximum Value.

The difference is that GroupBy rebuilds the entire table, which is why the RowIDs are regenerated. The Duplicate Row Filter only filters out the rows you don’t need, so the original RowIDs are kept.
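
To make the RowID point concrete, here is a rough pandas sketch of the filtering approach. It is an analogy under my own assumptions, not the Duplicate Row Filter's actual implementation: the pandas index stands in for the KNIME RowID, and `df` and `kept` are illustrative names.

```python
import pandas as pd

# Same sample data as in the question; the default index 0..9 plays
# the role of the original RowIDs.
df = pd.DataFrame({
    "ID": ["01", "01", "01", "02", "02", "02", "02", "03", "03", "03"],
    "Test": ["A"] * 10,
    "Timepoint": ["Hour2", "Hour4", "Hour8", "Hour2", "Hour2",
                  "Hour4", "Hour8", "Hour2", "Hour4", "Hour8"],
    "Value": [434, 234, 5472, 1.8, 2342, 452, 234, 457, 429, 4985],
})

# For each (ID, Test, Timepoint) group, find the index label of the row
# holding the maximum Value, then select those rows from the original table.
best_rows = df.groupby(["ID", "Test", "Timepoint"])["Value"].idxmax()
kept = df.loc[best_rows].sort_index()

print(kept)
```

Here the row with index 3 (the 1.8 measurement) is dropped and every other row keeps its original index. If two rows tie for the maximum, `idxmax` keeps the first one.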

Here it is with the Duplicate Row Filter:

[Image: input table (same as yours)]

[Image: results]

Also, while your title says “Remove the lowest”, your description (“where I get only the highest value”) means you actually want to keep only the highest value and remove everything else. For example, if Sportsman 02, Test A, Hour2 also had a value of 2.2 on top of 1.8 and 2342, you would not remove only 1.8; you would remove both 1.8 and 2.2. In both scenarios you are effectively keeping only 2342, the highest value.

And so, my workflow is called “Keep the highest value…” instead of “Remove the lowest value…”
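
The difference between "remove the lowest" and "keep the highest" only shows up with three or more repeated measurements. A tiny pandas sketch with the hypothetical 2.2 value from the example (it is not in the original data):

```python
import pandas as pd

# Hypothetical repeated measurements for one ID, test, and timepoint.
values = pd.Series([1.8, 2.2, 2342.0], name="Value")

# "Remove the lowest": drops only the minimum, so 2.2 survives.
remove_lowest = values[values != values.min()]

# "Keep the highest": drops everything except the maximum.
keep_highest = values[values == values.max()]

print(remove_lowest.tolist())
print(keep_highest.tolist())
```

With only two measurements, both approaches return the same single value.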

Here’s the workflow:
Keep the highest value based for a given circumstance.knwf (7.5 KB)
