# Naive Newbie asking for help

I apologise for asking what may seem like basic questions, but I am a doctor and not trained in computer code and advanced software use, it is very high “price of entry”; exactly as high as the price of entry into working in the medical area and being confused by all the medical stuff even at a basic level.

Here is my problem. When I see a patient at each follow up clinic visit, I enter a contemporaneous evaluation of the clinical disease status for each individual cancer diagnosis. So each visit gets one entry per diagnosis. I end up with a spreadsheet with this follow up data:

I need to output from this spreadsheet a summary line for each unique patient/disease combination in a new spreadsheet. The logic for the returned entry is provided on the following image. To repeat it, each individual patient/diagnosis combination will have single entry for DATE and DISEASE_STATUS. The DISEASE_STATUS will be either NED (if there has been no recurrence, and the most recent DATE), or Recurrent (and therefore the DATE of the first Recurrence which is the censoring event).

My difficulty is that unless I see examples of how to do this, the software is so opaque to me that I can’t develop it myself. I understand that you may find this problem laughably basic, but it where I am stuck, and I am a sole medical practitioner working at the Informatics interface needing software help

(I feel like Julia Roberts in Notting Hill!! “I’m just a girl, …”)

Hi Dr. Julia Roberts ,
there are probably a variety of different ways to solve your problem.
I came up with a pretty short variant, which in case i understood your problem, should
be a possible solution.
A brief explanation: First of all, i convert your Date into the according format (otherwise you won’t be able to accurately sort/group on the column). Given the binary problem (NED or Recurrent) i split all NED and Recurrent cases into two different tables. Because NED cases are not relevant for patients where a recurrent case has been diagnosed at least once (is that the correct assumption?) i filter all patients from the NED table that also exist in the Recurrent table. Last but not least i group on the patient id to get the latest (Maximum) date. I attached the workflow, i hope it helps!

Patients_Analysis.knwf (14.8 KB)

2 Likes

Dear MH (timid, slightly dopey bookstore owner…)

Thank you for the help. Yes, using your method, I have been able to produce the summary line required for each patient and disease.

I remain in your debt, and am grateful for your time. Until next time (there will be a next time!),

AM (aka Dr Julia Roberts)

Dear MH,

It’s taken this long to get back to using your suggestion! But it is working to give me separate lists of ID, date, status and site, one for local/NED, local/Recurrence, regional/NED, regional/Recurrence, … etc.

I’m now having difficulty merging them back into a single table with one ID, one site and then the local (date, status), regional (date, status), etc.

AAM

You could the Joiner node - possibly a series of them? - to merge your information back together.

I have been trying to use Joiner, but it doesn’t give the output I want.

Here is my workflow (based on the original suggestion with slight modifications with the real data):

I am trying to put the two GroupBy results together (the first is anyone with ONLY ‘NED’ results, the second is anyone with ANY "Recurrent’ results. For each Patient/Diagnosis pair there is only one row.

Joiner requires me to do a join (obviously! I used Full Outer Join), and so I get what you see below, and I have lost the Pat_ID1 from the right table:
I

But this is not what I want. I want to end up with 4 columns only, Pat_ID1, Local_Fail, Diagnosis and FU_Date. That is the rows ?_Row0, …. etc would be moved across 3 columns and also have a Pat_ID1.

I have tried to find a concatenator function unsuccessfully.

A

1 Like

I am fairly sure that I have solved my problem!

Here is my workflow:

The problem was at the Joiner step. When I did the Full Outer Join on the two GroupBy outputs using Pat_ID1, I lost one of the Pat_ID1 fields from the resultant table, hence the left column missing value in ?_Row0 of the previous posts screenshot. I solved this by going back to my original database and altering the DB Query Reader to include a redundant SELECT for the field Ident.IDA.

In the Joiner module I now do the join using the two MRN values (MRN_NED, MRN_REC).

I now lose the MRN_REC column after the join, and I chose to not display the MRN_NED column in the output.

This leaves me with 8 columns, 2 for Pat_ID1, 2 for Local_Fail, 2 for FU_Date and 2 for Diagnosis.

The Column Aggregator function is configured focus on just 2 columns at a time ….

and I concatenate the entries, give the column a new name, and then remove the aggregation columns ….

to produce 7 columns! And I still have the same number of entries!

At the end of this process I have the following output:

Which is what I wanted, though I need to verify numbers independently. Now, I can do the same for Regional and Distant control!

[Apologies for the long post, but I am personally pleased. I can’t include the workflow file as the first connection is to our database and includes log in details. I can help with any personal request from a site that uses MOSAIQ 2.64 as its OIS.]

2 Likes

Hi @AAM,

glad you made it! Was just thinking would Concatenate node make this process a bit simpler cause it seems to me after 2 GroupBy nodes there are no matching PAT_ID values in your two data sets?

Br,
Ivan

I did try Concatenate, but there were problems. That’s why I used the redundant SELECT statement in the DB Query step.

A

Hi @AAM,

ok. I see.

Br,
Ivan

I discovered this need by observing that Joiner removes the right name in the join operation. I can’t remember why Concatenate didn’t work.

A

Hi @AAM,

think I know what you are talking about and there is a ticket to improve it

Br,
Ivan

Ivan,

If you look at the post starting with “I have been trying to use Joiner , but it doesn’t give the output I want.”, you can see the screenshot of the spreadsheet. There should be 2 Pat_ID1 columns but the right join column is routinely stripped out by the function even if you deliberately include it in the column selection.

A

Hi @AAM,

maybe you have option Remove joining column from bottom input checked - that is by default.

If not can you share workflow example and I can check.

Br,
Ivan

1 Like

Thank you Ivan!

It shows you that I have Rumsfeld Syndrome! Perhaps there should be another law describing how the more complex easy-to-use software becomes, the harder it is to use due to extinction by options!

A

1 Like

Hi @AAM,

you are welcome. You are a doctor so you should know

Br,
Ivan