Finding the PSA value closest to date of prostate cancer diagnosis

I have two reports from an Oncology Information System.

The first lists all of the patients with a prostate cancer diagnosis and includes the DATE_OF_DIAGNOSIS and PATIENT_ID.

The second lists all of the PSAs taken on all of these patients and includes the DATE_OF_PSA_TEST and PATIENT_ID.

What I want to do is select for each PATIENT_ID, the PSA test this the DATE_OF_PSA_TEST which is closest but before the DATE_OF_DIAGNOSIS (it can be the same day).

I want to be able to select out all of the PSA_AT_DIAGNOSIS values and PATIENT_ID values into a single file.

I am grateful for any help.

Hi @AAM,

if I understand correctly, this is a simple inner join operation on the PATIENT_ID column, as explained in this video:

This should give you a table with the PATIENT_IDs that are present in both lists (i.e. those who are diagnosed & their respective PSA test dates), which you can then save as a file.

Hop that helps, best


For the mechanics of doing the date comparisons, you could do something like the attached workflow. It joins the diagnosis and test data for each patient, determines the days difference between the two, filters out where the test was after the diagnosis, so keeps only those that are ON or BEFORE, and then finds the maximum test date of those for each patient.



The result is hopefully what you are after. If a patient isn’t included in the test file, then the LEFT OUTER join ensures they still appear on the output, but with no test date. If that is not required, you would either filter out missing rows, or make the join an INNER join.


Workflow attached:

KNIME_PSA_Closest_Not_Later_Than.knwf (17.8 KB)

As @LukasS has said, you can then take the output and write to a file using an Excel Writer, CSV Writer or whichever you require.

Hope that helps.

EDIT: Just noticed your additional line about taking through the test value at date as well, so a modification to achieve this could be as follows:

I’ve changed the test table to the following:

So this now joins back on the patient test table, once it has a chosen test date for each patient to retrieve the test value. This then flows through to the output table


Updated flow:

KNIME_PSA_Closest_Not_Later_Than 2.knwf (21.2 KB)


Thanks LukasS, that doesn’t address the problem of the date comparisons.


Thank you @takbb the logic looks correct at first glance, I’ll put it through its paces and let you know if it works as I intended to describe.

Most grateful!


1 Like

Yep @takbb , it works! You’re a genius!

Once again, many thanks. The next step is more interesting.

How do you take the last 3 PSA values before the diagnosis and calculate the PSA doubling time? Feel free to pass on the answer as I am no where near needing it.




How is PSA doubling time defined? Is it the difference in days between two PSA tests for a given patient where the second test has at least twice the psa value of the first?

The “last 3 PSA Values before diagnosis”… if there are only 2 values, for a given patient, then presumably return those 2 (i.e return “up to” 3 PSA Values).

And does “before” mean “up to and including diagnosis date” as it did last time?


Hi @AAM , I just realised I’d produced a workflow for these later questions of yours but hadn’t uploaded it, and then time moved on.

I took a guess at the basis for definitions for doubling time. I can only test this against my own understanding, so if you need to use it, feel free to come back with questions if it doesn’t quite work as you expected.

KNIME_PSA_Closest_Not_Later_Than 3.knwf (58.6 KB)


Thanks @takbb

I’ll have a look at that. Certainly there are two periods where the doubling time is relevant - the PSAs in the 1-2 years before the diagnosis is established (i.e., finishing with the one I originally requested), which is fairly easy to establish because the final value is selected.

The second period is much more difficult to determine. After treatment the PSA will fall to a minimum value (the “nadir PSA”), this is usually within 12-18 months. It may then stay low permanently, or start to rise. When the PSA reaches ‘nadir PSA + 2’, the patient is said to have suffered a “PSA failure”. Across this period the PSA doubling time is relevant. It is unfortunate that even rising PSA values can fluctuate so that occasionally there appears to be a dip in the value. This is prone to occur if the PSA values are collected frequently (less than 6 monthly intervals).

A rising PSA may be observed for a short or long time, but at some point, other therapies (hormone therapy, chemotherapy, radiotherapy) may be instituted [I can identify these dates], with a resultant quick drop in the PSA to a new nadir. The PSA may then subsequently rise again with a new (and usually shorter doubling time) peak, and so other therapies may be used.

The PSA values extracted in the list have no correlation with therapies except by date, in the same way that you know the relevant PSA by comparing the date of the test and the date of diagnosis.

So the relevant parameters and values to be calculated are:

  • PSA at diagnosis (value) **done
  • PSA doubling time in the 12-24 months before diagnosis (months)
  • nadir PSA after initial therapy (value)
  • nadir PSA + 2 after initial therapy (date)
  • PSA doubling time after initial therapy (months)
  • PSA doubling time after second therapy (months)
  • PSA doubling time after third therapy (months)

I’ll try to get some sample CSV files together soon to assist in definition.

Thanks for discussing and helping,


1 Like

This topic was automatically closed 182 days after the last reply. New replies are no longer allowed.