Solutions to “Just KNIME It!” Challenge 6 - Season 4

:sun_with_face: A new Just KNIME It! challenge is out! :partly_sunny:

:stethoscope: You are a medical scientist researching heart failure, and one of your objectives is to better understand what factors likely have a high impact on it. :microscope: Given an unbalanced dataset on heart failure cases, your initial goal is to train a model for its prediction and then use xAI techniques to identify the top three most important features influencing the model’s predictions. :mag: What patterns can you uncover in this study?

Here is the challenge. Let’s use this thread to post our solutions to it, which should be uploaded to your public KNIME Hub spaces with tag JKISeason4-6.

:sos: Need help with tags? To add tag JKISeason4-6 to your workflow, go to the description panel in KNIME Analytics Platform, click the pencil to edit it, and you will see the option for adding tags right there. :blush: Let us know if you have any problems!

2 Likes

@k10shetty1 may think this was medium difficulty, but for dummies like me it was pretty difficult. My workflow includes three separate models.

5 Likes

Hi All,

Interesting one!

My solution is here: JKISeason 4-6 - Heart Failure Prediction – KNIME Community Hub

In this solution, I ran some checks with the stats node and did some data cleaning. Not the best, but it is a start.
It was interesting to use the optimization node and the X-Partitioner nodes together!

I also built a visualization showing the scorer results and AUC of the optimized ML model.
It was also interesting to use the Global Feature Importance component.

Not the most optimized ML Model, but a good basis to predict and analyze the results!

A screenshot of the workflow below and the resulting Data App.

Enjoy the Challenge!

Cheers
Jerome

6 Likes

:stethoscope: Heart-Failure Challenge – My KNIME Walk-Through :rocket:

Hi everyone,

here’s a short recap of my approach before the deadline closes:
The full workflow is annotated node-by-node on the Hub (link below).

My solution to JKISeason4-6 (Heart Failure Prediction_01 – KNIME Community Hub)

:one: Data preparation

  • While profiling I spotted clinically impossible zeros in RestingBP and Cholesterol.
    → flagged them as missing and imputed with the most-frequent value to keep the distribution intact.

  • Applied Z-score scaling to every continuous variable; categoricals went through one-hot encoding.
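For anyone who wants to see these preparation steps outside of KNIME, here is a minimal plain-Python sketch. It is not taken from the workflow: the function names and the toy values are made up for illustration, and only the general steps (zeros flagged as missing, most-frequent imputation, Z-score scaling, one-hot encoding) come from the description above.

```python
from collections import Counter
from statistics import mean, pstdev


def zeros_to_missing(values):
    """Treat clinically impossible zeros as missing (None)."""
    return [None if v == 0 else v for v in values]


def impute_most_frequent(values):
    """Fill missing entries with the most frequent observed value."""
    observed = [v for v in values if v is not None]
    mode = Counter(observed).most_common(1)[0][0]
    return [mode if v is None else v for v in values]


def z_score(values):
    """Scale to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def one_hot(values):
    """Map each category to its own 0/1 indicator column."""
    categories = sorted(set(values))
    return {c: [1 if v == c else 0 for v in values] for c in categories}


# Toy values, invented for illustration only
resting_bp = [120, 0, 130, 120, 140]
cleaned = impute_most_frequent(zeros_to_missing(resting_bp))
scaled = z_score(cleaned)
slope = one_hot(["Up", "Flat", "Up", "Down", "Flat"])
```

In a real pipeline the imputation value and the scaling statistics would normally be computed on the training partition only and then applied to the test partition.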

:two: Baseline model – Naïve Bayes

The brief asked for NB, so I discretised each numeric feature and let a
Parameter Optimisation Loop + 5-fold Cross Validation search for the best settings: AUC ≈ 0.92.
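The idea of wrapping cross validation inside a parameter search can be sketched in a few lines of plain Python. This is only an illustration of the loop structure, not the KNIME nodes themselves: `fit_and_score` is a hypothetical callback that would train a model on the training rows and return a score (for example AUC) on the validation rows, and the folds here are not stratified.

```python
import random
from statistics import mean


def k_fold_indices(n_rows, k=5, seed=0):
    """Shuffle row indices and deal them into k roughly equal folds."""
    idx = list(range(n_rows))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validated_score(n_rows, params, fit_and_score, k=5):
    """Average the validation score over k train/validation splits."""
    folds = k_fold_indices(n_rows, k)
    scores = []
    for i, valid in enumerate(folds):
        # Every fold except the i-th one forms the training set
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        scores.append(fit_and_score(train, valid, params))
    return mean(scores)


def grid_search(n_rows, grid, fit_and_score, k=5):
    """Return the parameter set with the best cross-validated score."""
    return max(
        grid,
        key=lambda p: cross_validated_score(n_rows, p, fit_and_score, k),
    )
```

With an unbalanced target, a stratified split that keeps the class ratio similar in every fold is usually the safer choice.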

:three: Quick detour – XGBoost ensemble :deciduous_tree:

Couldn’t resist comparing: same loop, tree/learning-rate grid.
Best run: AUC 0.93, Accuracy 0.866. Not the surprise of the century, but a nice benchmark.
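Since AUC is the number being compared here, a small reminder of what it measures may help: the probability that a randomly chosen positive case gets a higher score than a randomly chosen negative one. The sketch below computes it by brute force over all positive/negative pairs. The function name and the toy scores are illustrative and not part of the workflow.

```python
def auc(labels, scores):
    """Share of positive/negative pairs where the positive case is
    scored higher; ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example: three of the four positive/negative pairs are ordered correctly
example_auc = auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

The pairwise loop is quadratic, which is fine for a dataset of this size; rank-based formulas give the same value faster on large data.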

:four: Explainability



Dropped both final models into the Global Feature Importance (Surrogate RF) component:
Top 5 drivers: St_Slope, ChestPain, UP, MaxHR, and flat category.
That result matches cardiology literature – always encouraging when the machine agrees with humans.
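For readers curious about how a global importance score can be obtained at all, here is a sketch of a different, simpler technique than the Surrogate RF used above: permutation importance. It shuffles one feature at a time and measures how much the accuracy drops. Everything in it (function name, toy rows, toy model) is assumed for illustration.

```python
import random


def permutation_importance(predict, rows, labels, n_repeats=10, seed=0):
    """Mean accuracy drop when a single feature column is shuffled."""
    rng = random.Random(seed)

    def acc(data):
        return sum(predict(r) == y for r, y in zip(data, labels)) / len(labels)

    baseline = acc(rows)
    importance = {}
    for feature in rows[0]:
        drops = []
        for _ in range(n_repeats):
            column = [r[feature] for r in rows]
            rng.shuffle(column)
            # Same rows, but with this one feature's values scrambled
            shuffled = [{**r, feature: v} for r, v in zip(rows, column)]
            drops.append(baseline - acc(shuffled))
        importance[feature] = sum(drops) / n_repeats
    return importance


# Toy data and a toy "model" that only looks at ST_Slope (both invented)
rows = [
    {"ST_Slope": s, "MaxHR": hr}
    for s, hr in zip(["Flat", "Up"] * 4, [110, 150, 120, 160, 115, 155, 125, 165])
]
toy_model = lambda r: 1 if r["ST_Slope"] == "Flat" else 0
labels = [toy_model(r) for r in rows]
scores = permutation_importance(toy_model, rows, labels)
```

A feature the model ignores (MaxHR in this toy) gets an importance of zero, while shuffling the feature the model depends on costs accuracy.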


Feedback, alternate tricks, or “why-didn’t-you-try-XYZ” comments are most welcome – learning never ends.

@Arief Rama Syarif thanks for the tip on “component eye-candy”! :framed_picture:
That little trick instantly leveled-up my dashboard from clinical to clickable.
Appreciate you sharing the KNIME magic—will definitely keep experimenting!

(Now back to convincing my dataset that blood pressure can’t be zero…) :sweat_smile:

9 Likes

:wave::grin: … Happy KNIME’ing @PVergati

1 Like

Haha, fair point! @rfeigel :sweat_smile: The medium challenge might have accidentally included a mini ML marathon. Really nice to see you solve it! :slight_smile:

2 Likes

You turned a 10 K fun-run into a surprise triathlon and labeled it “medium.” :man_running::biking_woman::man_swimming:
My algorithm is now asking for an ice bath :ice_cube::bathtub:—but its cardiology IQ just hit a new personal best, so I’d call that a win. :anatomical_heart::chart_with_upwards_trend:
Sneaky workout, well played; can’t wait for the next mystery stage! :brain::1st_place_medal:

1 Like

Like PVergati above, I noted the issues with RestingBP and Cholesterol.
I made the call that the patient with RestingBP = 0 was probably already dead and removed that row.
Cholesterol = 0 was probably related to missing values, but I was interested in seeing the effects of retaining all the rows, removing only the Cholesterol = 0 rows, or removing the entire column.
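The three variants described above can be built side by side so that the same model is trained on each and the scores compared. A minimal sketch, assuming the data is available as a list of dictionaries; the function name and the toy rows are invented for illustration.

```python
def cholesterol_variants(rows):
    """Build three versions of the data: keep everything, drop rows with
    Cholesterol = 0, or drop the Cholesterol column altogether."""
    return {
        "keep_all": [dict(r) for r in rows],
        "drop_zero_rows": [dict(r) for r in rows if r["Cholesterol"] != 0],
        "drop_column": [
            {k: v for k, v in r.items() if k != "Cholesterol"} for r in rows
        ],
    }


# Toy rows, invented for illustration only
rows = [
    {"RestingBP": 140, "Cholesterol": 289, "HeartDisease": 0},
    {"RestingBP": 0, "Cholesterol": 0, "HeartDisease": 1},
    {"RestingBP": 130, "Cholesterol": 0, "HeartDisease": 1},
    {"RestingBP": 120, "Cholesterol": 204, "HeartDisease": 0},
]

# First remove the RestingBP = 0 record, then branch into the three variants
without_bp_zero = [r for r in rows if r["RestingBP"] != 0]
variants = cholesterol_variants(without_bp_zero)
```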

3 Likes

Hello Knime It Challengers,

I found this challenge very interesting, and it helped broaden my knowledge of machine learning within KNIME. I also have to agree with @rfeigel: I would consider this challenge harder than medium difficulty! I had to reference some KNIME experts here to complete the challenge.

I hadn’t seen any other methods for doing the feature importance so I went with the AutoML and Global Feature Importance community components. I was hoping that there would be a simpler method for solving this part (I’m sure there is, but I could not find one).

My Solution:

3 Likes

Phew, this was quite a challenging task (especially for someone who’s not a core DS guy! :slight_smile: )

My solution: JKISeason4-6_berti093 – KNIME Community Hub

I noticed (as others mentioned earlier) the “0” values in Cholesterol and RestingBP columns, which are physiologically impossible. I handled these by substituting them with the mean values of their respective columns.

For the class imbalance, I used SMOTE to oversample the minority class (HeartDisease=1), then let the AutoML component do the heavy lifting with parameter optimization and cross-validation.
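For anyone wondering what SMOTE does under the hood, the core idea is to create synthetic minority samples by interpolating between a minority point and one of its nearest minority neighbours. The sketch below shows that idea for purely numeric features. It is a simplified illustration with made-up names and toy points, not the KNIME SMOTE node.

```python
import math
import random


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


# Toy minority-class points, invented for illustration only
minority = [(1.0, 1.0), (2.0, 1.0), (1.0, 2.0), (2.0, 2.0)]
new_points = smote(minority, n_new=6)
```

Oversampling is generally applied to the training partition only, so that the test data still reflects the original class balance.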

The Naive Bayes model came out victorious:

  • F-measure: 0.90
  • Accuracy: 0.90
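As a reminder of how these two numbers relate to the confusion matrix, here is a small sketch. The function name and the toy labels are illustrative only and do not reproduce the reported scores.

```python
def classification_metrics(actual, predicted, positive=1):
    """Accuracy and F-measure for the positive class."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    tn = sum(a != positive and p != positive for a, p in pairs)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / len(pairs),
        # F-measure is the harmonic mean of precision and recall
        "f_measure": 2 * precision * recall / (precision + recall),
    }


# Toy labels, invented for illustration only
metrics = classification_metrics(
    actual=[1, 1, 1, 0, 0, 0, 1, 0],
    predicted=[1, 1, 0, 0, 0, 1, 1, 0],
)
```

On unbalanced data the F-measure is often more informative than accuracy, because it ignores the true negatives that a majority-class guesser gets for free.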


The top three features responsible for the model’s predictions are (based on surrogate random forest):

  1. ST_Slope
  2. ExerciseAngina
  3. ChestPainType

These results make medical sense, as ST segment changes and exercise-induced symptoms are classic indicators of cardiac issues! (I’m not nearly as smart as this insight suggests; my AI companion helped me out :smiley: )

For me it was a really engaging challenge! It combined the technical side of modeling and optimization with the practical need for explainable AI in healthcare. Great learning experience for someone like me who’s more on the BI side! :dart:

PS: As I’m participating in the hacking days, I built my solution in 5.5. I hope it can be opened by everyone.

6 Likes

This was very hard.
I spent a lot of time trying to do it myself and debugging the various attempts I made. For me this is the most valuable outcome of the weekly challenges: they make you research, study, try, fail, and learn.

I did have to refer to some of the models submitted by the colleagues above, and even then I’m not sure I fully understand what I did.

Here is the workflow: JKISeason4-6 – KNIME Community Hub

4 Likes

Hello everyone…here is my solution to this challenge.
JKISeason4-6_garcbcpa

This was a difficult challenge for me. It’s always fun to try to solve it yourself, but I definitely had to lean on other workflows submitted here just to understand the solution. I don’t use KNIME for statistics and machine learning, so this is a good push to help me move into the data science area.

3 Likes

Hi Bro — loved your take on the mystery-patient!

I did a quick clinical “reality check” on that same row:

  • Normal ECG + MaxHR 155 + stress-test metrics… all while resting BP shows 0 mmHg. Either we just logged the fittest ghost in cardiology history :ghost: :muscle: or it’s a placeholder gone rogue.

I chose to tag it as a data-entry slip (0 → missing) and imputed, but since it’s a single record the difference between drop or fix is basically statistical background noise. Your approach is perfectly fine.

Same spirit for the cholesterol zeros: keep, drop, or impute — good chance results move less than our heart-rate reading the thread. :sweat_smile:

Always great to cross-check assumptions rather than let the model quietly hallucinate.

:point_right: Happy to compare notes anytime — robust pipelines are built on great peer exchanges.
Let’s keep the exchange going — always learn something new from fellow knimers.

Cheers,

Wow Bert, what a ride — and what a delivery, as always! :rocket:
You and your AI companion clearly make a solid diagnostic team. :smile:

There’s always something to learn from your approach. Too strong, man.
Looking forward to the next round of KNIME kung fu!

Cheers

1 Like

Thank you so much!

Honestly, I was planning to leave the challenge for the weekend because it looked a bit overwhelming at first. But after reading your detailed solution and seeing how you approached it, I got really inspired and ended up diving in and finishing my workflow in one go! :smiley:

Next week, same place, same time :smiley:

1 Like

What I really appreciate about this challenge and the forum is how openly everyone shares their ideas, insights, and feedback. It’s incredibly helpful for beginners like me for learning new techniques and best practices. Thank you all!

Fingers crossed I can submit my solution before the next challenge kicks off!

Have a great day everyone :blush:!

3 Likes

Hi everyone,

this is my solution for this week’s challenge: JKISeason4-6 – KNIME Community Hub

I’m with my fellow KNIMErs on this one — the challenge was definitely a tough one! :wink:

That said, I learned a ton from the documentation, forums, and YouTube videos. Huge thanks to @PVergati for the inspiration!

Understand the input data

A description of a similar dataset can be found here:

Output




Excited for the next challenge — fingers crossed it’s a little more beginner-friendly! :grin:

Have a great day everyone!

3 Likes

Thank you for the very kind tag.
Your notebook is so polished it made my own workflow run back to the draft folder and ask for a makeover — which, for a heart-failure project, feels appropriately ironic. :anatomical_heart::sweat_smile:

I’m glad my crumbs of insight were useful; they were scattered while I was sprint-reading papers during lunch break at work, so please handle with care. Looking forward to swapping more “why-does-this-even-work” moments in the next challenge!

Cheers

1 Like

Please find my revised submission. I struggled to take a screenshot because of an error when I reran the flow.

[Screenshots: feature importance 1, feature importance 2]

Help: I regularly get an error on the Capture Workflow End node. It reports “Partial execution” whenever I execute the workflow a second time, and this happens even if I change the cache setting to write to disk. What could be the reason, and how can I overcome it?

3 Likes

@PVergati Hope you enjoyed the challenge. Sounds like your algorithm got a solid workout there :slight_smile:

1 Like