MOSAIQ - Manipulating PSA values (Column to Row)

AAM · July 27, 2023, 2:05am

Greetings! I’m back with a problem … again.

I have pulled PSA values from inside my OI (MOSAIQ), and now wish to manipulate the PSA values from these Prostate Cancer patients.

Here is a sample spreadsheet where I have pulled out all of the PSA values (PSAvalue) after the Date of Diagnosis (DateOfDx) with the last PSA preceding the date of diagnosis (PSAatDx), and listed them in temporal order (Rank) with the number of days since diagnosis (Time2PSA) for each patient (PatientID).

PatientID	DateofDx	PSAatDx	DateOfPSA	PSAvalue	Time2PSA	Rank
0AEB3	18/11/2020	18	3/12/2020	19	15	1
0AEB3	18/11/2020	18	15/01/2021	1.2	58	2
0AEB3	18/11/2020	18	17/05/2021	0.07	180	3
0AEB3	18/11/2020	18	12/07/2021	0.04	236	4
0AEB3	18/11/2020	18	23/08/2021	0.03	278	5
0AEB3	18/11/2020	18	24/11/2021	0.02	371	6
0AEB3	18/11/2020	18	21/03/2022	0.01	488	7
0AEB3	18/11/2020	18	30/05/2022	0.01	558	8
0AEB3	18/11/2020	18	23/11/2022	0	735	9

I want these is a spreadsheet where there is one line for each patient (as below):

PatientID	PSAvalue_1	Time2PSA_1	PSAvalue_2	Time2PSA_2	PSAvalue_3	Time2PSA_3	PSAvalue_4	Time2PSA_4	PSAvalue_5	Time2PSA_5	PSAvalue_6	Time2PSA_6	PSAvalue_7	Time2PSA_7	PSAvalue_8	Time2PSA_8	PSAvalue_9	Time2PSA_9
0AEB3	19	15	1.2	58	0.07	180	0.04	236	0.02	371	0.01	488	0.01	558	0	735

I can do this by using RowSplitter and Joiner, however the maximum number of PSA values and hence iterations of Row Splitter / Joiner pairs is 95!

I’m thinking that there has to be an easier way to do this, hence my request.

Once again , many thanks for the past help.

A

MOSAIQ; OIS; Oncology Information System; PSA

takbb · July 27, 2023, 2:51am

Hi @AAM ,

As you are wanting to take row-based data and tabulate it across columns, this is the job of the Pivoting node.

You need to specify the “Grouping” column or columns, which is the column that acts as the “identifier” for the original rows that are being grouped together into a single row, and in your case will be PatientID

Then specify the Pivots, which is used in the creation of the individual columns, and provides a unique identifier or sequence that will be applied to the new column names. Your Rank column is ideally suited for this:

And finally specify the “Manual Aggregation” which is how the data is aggregated according to the Rank for each of the Patients.

As there is only one row per “rank” for each patient in your original table, you could arbitrarily specify various aggregations (sum, mean, first, last) and they would all give the same result, so here I have just used “First”.

In the “advanced settings”, setting the “Aggregation name” to “Keep original name(s)” means to use the original column names for the items being tabulated (“PSAvalue”, “Time2PSA”), and then by setting the “Column name” to “Aggregation name+Pivot Name”, it will append the pivot name (the value of rank) onto the end of each of these column names which will give you “PSAvalue+1”, “Time2PSA+1”, “PSAvalue+2”, “Time2PSA+2”, … which is (luckily ) very similar to the column names you require:

Finally a Column Rename (regex) can fix your column names with underscores instead of “+”

Here is a sample workflow. I have used your data and added a couple of extra dummy rows for a second patient to prove that it works…

Tabulating rows into columns.knwf (39.9 KB)

AAM · August 3, 2023, 7:29am

Thank you @takbb,

I will get that solution in place very soon. It was much easier than what I was doing and will help in other places too.

A

AAM · August 3, 2023, 8:01am

Follow up Question (@takbb I suspect it’s very similar to the last thing!).

I am able to calculate a PSA Doubling Time for each PSAvalue/PSAdate set (1/2, 2/3, 3/4, etc) using this formula.

ln(2)/(($PSAvalue_2$-$PSAvalue_1$)/(($Time2PSA_2$-$Time2PSA_1$)/28))

But once again it is repetitive, so will the multicolumn Math Formula do this for all pairings?

Once again, many thanks.

@AAM

takbb · August 4, 2023, 1:49am

hi @AAM , the Math Formula multi column wouldn’t be able to perform what you need here, as there is no facility for dealing with columns by offset position or dynamic column naming that you have in this example.

As your calculation is effectively a calculation with the values that come immediately prior, this would indicate that the Lag Column node is likely to be useful, and the calculation can be perfomed prior to the Pivoting node in the previously uploaded workflow.

Since Rank is providing the ascending value for each patient, it makes sense to use Rank as the Lag Column.

A joiner can then join each row on Patient ID and Rank = Rank (-1) (the rank of the previous row). Here I have told it to label the columns retrieved from the “previous” row as (prev), and I also told it to keep the rows where there was no rank match (because it was the first row for a patient).

From the (prev) row, we are just bringing across the PSAValue and Time2PSA columns, as these are the ones required by the formula

The Math Formula node (the single column version) can then perform the calculation using:

ln(2)/(($PSAvalue$ - $PSAvalue (prev)$) / (($Time2PSA$ - $Time2PSA (prev)$) / 28))

Finally, the data can then be pivoted as before but with the addition of the “PSA Doubling Time”

I’ve essentially put your formula in as given. I don’t know if the calcs are right, but hopefully this will help.

Tabulating rows into columns with lag calculation.knwf (53.5 KB)

AAM · August 6, 2023, 3:38am

@takbb, so I was wrong again! But thank you for the solution. You are right about the equation, I will have to some manual calculations to verify that it is correct.
PSA values are a real problem in my data as they are not linked to any significant event. The only way to analyse them is to divide them up as pre- & post- by date or assessment like nadir.
yet another instance of poorly defined information system.

Thanks again,

A

system · November 4, 2023, 3:39am

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.