I need to apologise to @Alice_Krebs. I lost track of my responsibility to provide you with the requested data. I have no defence; I just got busy and forgot about it (the previous post has since been closed).
I have included the file here and the headings below:
The Kaplan-Meier plot will use:
- one of 4 columns for time (Survival_Duration; LC_Duration; RC.Duration; DC_Duration) matched to
- one of 5 censoring columns (Death or Cancer Death; LocalControl, RegionalControl, DistantControl), while using
- any combination of grouping variables (Age, Gender, Birth_Place, DxDate, Histology, ICD10, T, N, M, Stage, P16, Tx_Intent, Surgery, Rad, Chemo, Immuno, Brachy, Hormone, Modality_Type)
Apologies once again for my tardiness.
CLINICALData.xls (14.5 KB)
No worries! Thanks a lot for the data, it makes understanding, testing and building workarounds much easier.
And also thanks a lot for your feedback, getting this from our community is very appreciated and valuable! I have added your feature requests to internal open tickets on that node, so in case we decide to re-write it, your feedback is taken into consideration. Unfortunately the Kaplan-Meier Estimator isn’t a very frequently used node, so tbh chances are low this will get a re-touch any time soon. Not because we don’t care or don’t want to, but simply because we don’t have the manpower to do so. I’m sorry!
On a positive note, there are workarounds for many of your requests. I attached a workflow with some ideas on how to do this. I hope that helps a bit.
Kaplan_Meier_AAM.knwf (43.2 KB)
Detailed answers to your earlier post:
include an option to output to PNG file with a definable name and location
A: see workaround in workflow
when a group is chosen for analysis, permit the choice of 1 or more of the values
A: The node allows several groups within the analysis, so I am not sure I understand what you mean?
permit the selection of groups.
a. For example, cancers with the ICD10 codes of C00-C14 and C30-C32 are all grouped into “Head & Neck Cancers”
b. For example, I may wish to compare a group of cancers (i.e., only C01, C08, C09) with a particular biological parameter (i.e., was p16+ or p16-) and a particular treatment (i.e., had chemotherapy or immunotherapy). This is a frequently used analysis type in medicine.
A: As these kinds of comparisons are highly specific to individual data sets and not standardized, it is close to impossible to integrate them into the configuration of the node. The workaround is to prepare the data accordingly (e.g. with the Rule Engine node). If your data sets look pretty similar, you could create a component to do that.
where there is no selection of groups (see #2), the null/blank entries should be ignored. This could also be added as an option (Ignore blanks/Ignore nulls)
A: The workaround is to prepare the data accordingly (with a Row Filter node).
Adding in a Cox Log-Rank statistic would be very very useful.
Adding in the 95% Confidence Interval would also be useful.
A: Maybe the Math Formula node or the Single sample t-test node can help?
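For anyone who wants to cross-check the two statistics requested above outside the node, here is a minimal Python sketch. It is not what the Kaplan-Meier node does internally; it assumes the "Cox Log-Rank statistic" means the standard two-group log-rank (Mantel-Cox) test and that the 95% confidence interval is the plain Greenwood interval clipped to [0, 1]. The function names `kaplan_meier` and `log_rank` are made up for illustration, and events are assumed to be coded 1 = event observed, 0 = censored.

```python
import math
from collections import Counter


def kaplan_meier(durations, events, z=1.96):
    """Return [(time, survival, lower, upper)] at each distinct event time.

    durations: follow-up times; events: 1 = event observed, 0 = censored.
    The interval is survival +/- z * Greenwood standard error, clipped to [0, 1].
    """
    deaths = Counter(t for t, e in zip(durations, events) if e)
    leaving = Counter(durations)  # every subject leaves the risk set at its own time
    at_risk = len(durations)
    surv, greenwood, rows = 1.0, 0.0, []
    for t in sorted(leaving):
        d = deaths.get(t, 0)
        if d:
            surv *= 1 - d / at_risk
            if at_risk > d:  # term is undefined once nobody is left at risk
                greenwood += d / (at_risk * (at_risk - d))
            se = surv * math.sqrt(greenwood)
            rows.append((t, surv, max(0.0, surv - z * se), min(1.0, surv + z * se)))
        at_risk -= leaving[t]
    return rows


def log_rank(durations_a, events_a, durations_b, events_b):
    """Two-group log-rank test; returns (chi_square, p_value) with 1 degree of freedom."""
    all_t = list(durations_a) + list(durations_b)
    all_e = list(events_a) + list(events_b)
    event_times = sorted({t for t, e in zip(all_t, all_e) if e})
    obs_minus_exp, var = 0.0, 0.0
    for t in event_times:
        n_a = sum(1 for x in durations_a if x >= t)  # at risk in group A
        n_b = sum(1 for x in durations_b if x >= t)  # at risk in group B
        d_a = sum(1 for x, e in zip(durations_a, events_a) if e and x == t)
        d_b = sum(1 for x, e in zip(durations_b, events_b) if e and x == t)
        n, d = n_a + n_b, d_a + d_b
        obs_minus_exp += d_a - d * n_a / n
        if n > 1:
            var += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    if var == 0:
        return 0.0, 1.0
    chi2 = obs_minus_exp ** 2 / var
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    return chi2, math.erfc(math.sqrt(chi2 / 2))
```

With the attached data, the durations would come from one of the time columns (e.g. Survival_Duration) and the events from the matching censoring column, after the grouping and blank-row filtering described in the answers above. Other statistical packages may use a different interval by default (for example a log-log transform), so small differences in the confidence bounds are to be expected.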
Thanks for your work @Alice_Krebs. So true that there are more issues to deal with than people to produce solutions. I have so far done some really useful stuff with KNIME. Currently struggling with ML in medical data and getting lost in the issue of data curation for each technique. This is typically done manually, but my experience of this is that if the data is from outside your expert domain (i.e., data mining) and you are doing anything more than removing spaces or changing ‘_’ to ‘.’, then you are probably destroying the expert domain data through ignorance since ‘I don’t understand’ goes in the bin! But that is a whole other set of posts.
I shall do some work on discovering whether the KNIME KM plots are the same as other statistical software too.
While writing nodes sounds like a good challenge, I know that at present it is an insurmountable challenge as the time for the learning curve is not present yet.
The Rule Engine work was very useful. Which end-of-line character can be used in the Rule Engine to shorten lines for easier review?
$ICD10$ LIKE "C00-C14" OR $ICD10$ LIKE "C30-32" => "head and neck cancer"
is not the reality! It is actually:
$ICD10$ LIKE "C00*" OR $ICD10$ LIKE "C01*" OR $ICD10$ LIKE "C02*" OR $ICD10$ LIKE "C03*" .... $ICD10$ LIKE "C31*" OR $ICD10$ LIKE "C32*" => "head and neck cancer"
So it turns out to be a very long ‘sentence’ with 17 definitions that runs far off the window.
Getting a line break in the view of a node configuration window goes deep, deep into the weeds, so here again is a workaround. Please open the interactive view of the component and give it a go (and of course fine-tune and adapt it as needed; I didn’t exactly polish or thoroughly test it):
Kaplan_Meier_AAM2.knwf (95.4 KB)
Please note that it includes automatic re-execution of the widget nodes, which requires a reasonably new KNIME version.
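A separate way to shorten the long ICD10 rule quoted earlier is to express both code ranges as one regular expression instead of a chain of LIKE conditions. The Python sketch below shows the pattern and checks it against a few codes; the names `HEAD_AND_NECK` and `icd10_group` are made up for illustration. The Rule Engine also has a MATCHES operator that takes a regular expression, so a single rule along the lines of $ICD10$ MATCHES "C(0[0-9]|1[0-4]|3[0-2]).*" => "head and neck cancer" might do the same job, but that rule is an untested assumption and should be checked against the real data.

```python
import re

# C00-C14 and C30-C32, optionally followed by a subcode such as ".9".
HEAD_AND_NECK = re.compile(r"C(0[0-9]|1[0-4]|3[0-2])")


def icd10_group(code):
    """Return the group label for a head and neck ICD10 code, else None."""
    if code and HEAD_AND_NECK.match(code.strip().upper()):
        return "head and neck cancer"
    return None


if __name__ == "__main__":
    for sample in ["C01.9", "C14", "C15", "C29", "C32.1", ""]:
        print(repr(sample), "->", icd10_group(sample))
```

Like the LIKE "C00*" version, this matches on the start of the code only, so subcodes are included automatically.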