You are a medical scientist researching heart failure, and one of your objectives is to better understand what factors likely have a high impact on it. Given an unbalanced dataset on heart failure cases, your initial goal is to train a model for its prediction and then use xAI techniques to identify the top three most important features influencing the model’s predictions. What patterns can you uncover in this study?
Here is the challenge. Let’s use this thread to post our solutions to it, which should be uploaded to your public KNIME Hub spaces with the tag JKISeason4-6.
Need help with tags? To add tag JKISeason4-6 to your workflow, go to the description panel in KNIME Analytics Platform, click the pencil to edit it, and you will see the option for adding tags right there. Let us know if you have any problems!
In this solution, I did some checks with the Statistics node and also some data cleaning; not the most thorough, but it’s a start.
It was interesting to use the optimization and X-Partitioner nodes together!
I also built a visualization showing the Scorer output and the AUC of the optimized ML model.
So interesting to use the Global Feature Importance component.
Not the most optimized ML Model, but a good basis to predict and analyze the results!
Below are a screenshot of the workflow and the resulting Data App.
While profiling, I spotted clinically impossible zeros in RestingBP and Cholesterol.
I flagged them as missing and imputed them with the most frequent value to keep the distribution intact.
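For anyone curious what that cleaning step looks like outside of KNIME, here is a minimal Python/pandas sketch of the same idea; the file name and column handling are my assumptions, since in the workflow itself this is done with nodes rather than code:

```python
import numpy as np
import pandas as pd

df = pd.read_csv("heart.csv")  # hypothetical file name for the challenge dataset

# Treat the clinically impossible zeros as missing, then impute each column
# with its most frequent value so the overall distribution barely shifts
for col in ["RestingBP", "Cholesterol"]:
    df[col] = df[col].replace(0, np.nan)
    df[col] = df[col].fillna(df[col].mode().iloc[0])
```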
The brief asked for Naive Bayes (NB), so I discretised each numeric feature and let a Parameter Optimisation Loop + 5-fold Cross Validation search for the best settings: AUC ≈ 0.92.
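For reference, the same discretise-then-NB-with-cross-validated-tuning logic can be sketched in scikit-learn. The column names, grid values, and tuned parameters below are illustrative assumptions, not what the actual Parameter Optimisation Loop searched:

```python
from sklearn.compose import ColumnTransformer
from sklearn.model_selection import GridSearchCV
from sklearn.naive_bayes import CategoricalNB
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import KBinsDiscretizer, OrdinalEncoder

# Column names follow the common heart-failure dataset; adjust to your table
numeric_cols = ["Age", "RestingBP", "Cholesterol", "MaxHR", "Oldpeak"]
categorical_cols = ["Sex", "ChestPainType", "RestingECG", "ExerciseAngina", "ST_Slope"]

# Discretise numeric features into quantile bins and ordinal-encode the
# categorical ones, then feed everything to a categorical Naive Bayes learner
preprocess = ColumnTransformer([
    ("bins", KBinsDiscretizer(encode="ordinal", strategy="quantile"), numeric_cols),
    ("cats", OrdinalEncoder(), categorical_cols),
])
pipe = Pipeline([("prep", preprocess), ("nb", CategoricalNB())])

# 5-fold cross-validated search over the bin count and the NB smoothing term
search = GridSearchCV(
    pipe,
    param_grid={"prep__bins__n_bins": [3, 5, 7, 10], "nb__alpha": [0.1, 0.5, 1.0]},
    scoring="roc_auc",
    cv=5,
)
# search.fit(X, y)  # X = feature table, y = HeartDisease; then inspect search.best_score_
```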
I dropped both final models into the Global Feature Importance (Surrogate RF) component. Top drivers: ST_Slope (both the Up and Flat categories), ChestPainType, and MaxHR.
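If you are wondering what “Surrogate RF” means under the hood, the general idea (a simplified sketch, not the component’s exact implementation) is to fit an interpretable random forest on the black-box model’s predictions and read off its feature importances. The function and variable names here are hypothetical:

```python
from sklearn.ensemble import RandomForestClassifier

def surrogate_importance(black_box, X, feature_names, random_state=0):
    # Fit the surrogate on the black-box predictions, not the true labels,
    # so the forest learns to mimic the model we want to explain
    y_hat = black_box.predict(X)
    surrogate = RandomForestClassifier(n_estimators=500, random_state=random_state)
    surrogate.fit(X, y_hat)
    # Impurity-based importances of the surrogate serve as a global proxy
    ranking = sorted(zip(feature_names, surrogate.feature_importances_),
                     key=lambda t: t[1], reverse=True)
    return ranking

# top_features = surrogate_importance(fitted_model, X, list(X.columns))[:5]
```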
That result matches cardiology literature – always encouraging when the machine agrees with humans.
Feedback, alternate tricks, or “why-didn’t-you-try-XYZ” comments are most welcome – learning never ends.
@Arief Rama Syarif thanks for the tip on “component eye-candy”!
That little trick instantly leveled up my dashboard from clinical to clickable.
Appreciate you sharing the KNIME magic—will definitely keep experimenting!
(Now back to convincing my dataset that blood pressure can’t be zero…)
You turned a 10K fun run into a surprise triathlon and labeled it “medium.”
My algorithm is now asking for an ice bath, but its cardiology IQ just hit a new personal best, so I’d call that a win.
Sneaky workout, well played; can’t wait for the next mystery stage!
Like PVergati above, I noted the issues with resting BP & Cholesterol.
I made the call that the patient with resting BP = 0 was probably already dead and removed that row.
Cholesterol = 0 was probably a stand-in for missing values, but I was interested in seeing the effects of retaining all the rows, removing only the Cholesterol = 0 rows, or removing the entire column.
I found this challenge to be very interesting, and it helped broaden my knowledge of machine learning within KNIME. I also have to agree with @rfeigel: I would consider this challenge to be harder than medium difficulty! I had to reference some KNIME experts here to complete the challenge.
I hadn’t seen any other methods for doing the feature importance, so I went with the AutoML and Global Feature Importance community components. I was hoping there would be a simpler method for solving this part (I’m sure there is, but I could not find one).
I noticed (as others mentioned earlier) the “0” values in Cholesterol and RestingBP columns, which are physiologically impossible. I handled these by substituting them with the mean values of their respective columns.
For the class imbalance, I used SMOTE to oversample the minority class (HeartDisease=1), then let the AutoML component do the heavy lifting with parameter optimization and cross-validation.
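In case it helps anyone reading along, here is a rough Python sketch of the impute-then-SMOTE preprocessing using the imbalanced-learn package; the file name, encoding, and split are my assumptions, and the AutoML part itself is not reproduced here:

```python
import numpy as np
import pandas as pd
from imblearn.over_sampling import SMOTE
from sklearn.model_selection import train_test_split

df = pd.read_csv("heart.csv")  # hypothetical file name for the challenge dataset

# Mean imputation for the physiologically impossible zeros, as described above
for col in ["RestingBP", "Cholesterol"]:
    df[col] = df[col].replace(0, np.nan)
    df[col] = df[col].fillna(df[col].mean())

# One-hot encode categorical columns so SMOTE can interpolate numerically
X = pd.get_dummies(df.drop(columns="HeartDisease"))
y = df["HeartDisease"]

# Oversample the minority class on the training portion only, so the synthetic
# rows never leak into the evaluation data
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=42)
X_train_bal, y_train_bal = SMOTE(random_state=42).fit_resample(X_train, y_train)
```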
The top three features responsible for the model’s predictions are (based on surrogate random forest):
ST_Slope
ExerciseAngina
ChestPainType
These results make medical sense, as ST segment changes and exercise-induced symptoms are classic indicators of cardiac issues! (I’m not nearly as smart as this insight suggests; my AI companion helped me out.)
For me it was a really engaging challenge! It combined the technical side of modeling and optimization with the practical need for explainable AI in healthcare. A great learning experience for someone like me who’s more on the BI side!
PS: As I’m participating in the hacking days, I built my solution in KNIME 5.5; I hope it can be opened by everyone.
This was very hard.
I spent a lot of time trying to do it myself and debugging my various attempts. For me, this is the most valuable outcome of the weekly challenges: it makes you research, study, try, fail, and learn.
I did have to refer to some of the models submitted by the colleagues above, and even then I’m not sure I fully understand what I did.
This was a difficult challenge for me. It’s always fun to try to solve it yourself, but I definitely had to lean on other workflows submitted here just to understand the solution. I don’t use KNIME for statistics and machine learning, so this is a good push to help me move into the data science area.
I did a quick clinical “reality check” on that same row:
Normal ECG + MaxHR 155 + stress-test metrics… all while resting BP shows 0 mmHg. Either we just logged the fittest ghost in cardiology history or it’s a placeholder gone rogue.
I chose to tag it as a data-entry slip (0 → missing) and imputed it, but since it’s a single record, the difference between dropping and fixing it is basically statistical background noise. Your approach is perfectly fine.
Same spirit for the cholesterol zeros: keep, drop, or impute; there’s a good chance the results move less than our heart rate does while reading this thread.
Always great to cross-check assumptions rather than let the model quietly hallucinate.
Happy to compare notes anytime — robust pipelines are built on great peer exchanges.
Let’s keep the exchange going — always learn something new from fellow knimers.
Honestly, I was planning to leave the challenge for the weekend because it looked a bit overwhelming at first. But after reading your detailed solution and seeing how you approached it, I got really inspired and ended up diving in and finishing my workflow in one go!
What I really appreciate about this challenge and the forum is how openly everyone shares their ideas, insights, and feedback. It’s incredibly helpful for beginners like me for learning new techniques and best practices. Thank you all!
Fingers crossed I can submit my solution before the next challenge kicks off!
Thank you for the very kind tag.
Your notebook is so polished it made my own workflow run back to the draft folder and ask for a makeover — which, for a heart-failure project, feels appropriately ironic.
I’m glad my crumbs of insight were useful; they were scattered while I was sprint-reading papers during lunch break at work, so please handle with care. Looking forward to swapping more “why-does-this-even-work” moments in the next challenge!
Please find my revised submission; I struggled to take a screenshot due to an error when I reran the flow.
Help: I keep getting an error on the Capture Workflow End node. It shows a “partial execution” error if I execute it a second time, and this happens even if I change the cache setting to write to disk. What could be the reason, and how can I overcome it?