Simultaneously selecting the last date when event = NO, and first date when event =YES

I am again (AGAIN!) looking for help. I am able to do this (I have done it before but the process is highly convoluted) but I suspect that there is an easier way.

My OIS has many similar circumstances where multiple assessments are entered. Extraction of these values comes in a list as shown here.

PatientID Date Radionecrosis
78997 2021-05-19T15:49:53 None
78997 2021-07-22T15:52:10 Imaging changes only
78997 2021-08-15T15:53:10 Steroid Use
78997 2021-10-21T15:55:02 None
78997 2022-01-24T15:56:16 None
78997 2022-03-16T15:57:48 None
78997 2022-10-27T15:59:19 None
77105 2020-11-06T12:32:54 None
77105 2020-11-06T12:34:10 None
77105 2021-02-12T12:36 None
77105 2021-02-12T12:36:25 None
77105 2021-05-24T12:38:09 None
77105 2022-01-22T12:30:27 None

In cases where the entries for a single patient are all “None” (i.e., the event ‘Radionecrosis’ has not occurred), I want to know the last date. [this matches patient #77105]

In cases where the entries for a single patient are not all “None” (i.e., the event ‘Radionecrosis’ has occurred), I want to know the first date. [this describes patient #78997]

The output of the manipulation for the event “Imaging changes only” would be:

PatientID Date Radionecrosis
78997 2021-07-22T15:52:10 Imaging changes only
77105 2022-01-22T12:30:27 None

GroupBy seems a logical choice but as an oncologist, it all still looks so counterintuitve (I know that that is my problem).

Again, thanks for reading and contemplating.

Hello @AAM ,
I think GroupBy would be my choice as well since in the “Manual Aggregation” tab of the GroupBy node you can aggregate by “First” and “Last”.

Thanks,
Sanket

3 Likes

Hi @AAM

As my colleague @sanket_2012 mentioned, using the GroupBy node and aggregating by “First” and “Last” is a good solution.

However there are probably many different ways to solve this problem. For example with Duplicate Row Filter. Here is an example (attached workflow).

Note that for both the GroupBy and Duplicate Row Filter solutions, you must keep the data sorted by Date&Time in ascending order.

All the best,

Jose.

Duplicate row first and last.knwf (88.8 KB)

5 Likes

Hi @AAM

The Duplicate Row filter is a nice way to fix this @josegallardo .
But in the RowFilter “Imaging changes only” you filter only on one specific word.
But there can be more words in the column “Radionecrosis”


So instead of using two RowFilter nodes you can also use one RowSplitter node, to split the column “Radionecrosis” on the value “None”.

gr. Hans

3 Likes

Thanks for the help so far. I have struck a problem though. The problem is common and relates to the fact that follow up data can be added many times as “NED” and many times as “Recurrent”, but that the report should report just one value.

At the end of Duplicate row first and last.knwf when run on my real data gives the following output:

PatientID Date Kowledge source LocalStatus ICD10site
10162 2007-12-12T00:00 BDM check NED C61
10165 2021-10-24T00:00 EMR Review NED C18.3
10165 2021-10-25T00:00 Consultation NED C44.3 08
10169 2002-08-07T00:00 EMR Review NED C18.2
10169 2003-04-23T00:00 EMR Review Recurrent C18.2
10179 2016-02-20T00:00 EMR Review NED C25.0
10187 2011-08-26T00:00 EMR Review NED C20
10187 2011-08-27T00:00 BDM check NED C15.5
10188 2008-07-27T00:00 Newspaper NED C32.0

You can see that the oldest NED has been report AND the first Recurrence has been reported, however what I want is:

the first Recurrence value, but if there is no Recurrence then the last NED value.

If the report functions correctly, I should have only the second line with the “Recurrent” remaining.

Here is more data to use (with correct lines bolded):

PatientID Date Kowledge source LocalStatus ICD10site
70876 2014-04-07T00:00 Histology Report NED C20
70876 2015-02-02T00:00 Imaging Report Recurrent C20
70876 2016-10-19T00:00 Imaging Report NED C20
70876 2018-03-19T00:00 Imaging Report NED C20
70876 2018-12-17T00:00 Imaging Report NED C20
71031 2020-04-15T00:00 Imaging Report Recurrent C44.3
71031 2021-07-26T00:00 Consultation NED C00.1
71031 2021-07-27T00:00 Consultation NED C44.3
71321 1999-05-31T00:00 Histology Report Recurrent C67
71321 2019-05-07T00:00 Consultation NED C67
71321 2020-06-19T00:00 EMR Review NED C67
71321 2020-06-20T00:00 BDM check NED C61
71324 2019-09-03T00:00 Histology Report Recurrent C21.1
71324 2023-04-18T00:00 Histology Report NED C21.1
71419 2019-06-13T00:00 Consultation NED C20
71419 2020-06-17T00:00 Imaging Report Recurrent C20
71419 2020-07-01T00:00 Letter Recurrent C20
71419 2021-01-11T00:00 Histology Report NED C20

@AAM this data now looks different than the one in the first posting. There are more columns.

You will have to determine the number of None per patient and category and maybe compare them with the overall number of entries per this group. Then have two groups one where you extract the first date and one where it is the last.

Joins and reference filters should help.

But maybe you might have to check your data and definition again. Maybe provide more data and a result in the form of files.

2 Likes

I think that I have solved it!

By first finding all of the Recurrent episodes, I can then use that population to exclude all of their NED values. The remainder have only NED values and so there is no overlap between the groups that are finally concatenated.

Thanks for the help though

4 Likes

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.