You are a medical scientist researching heart failure, and one of your objectives is to better understand what factors likely have a high impact on it. Given an unbalanced dataset on heart failure cases, your initial goal is to train a model for its prediction and then use xAI techniques to identify the top three most important features influencing the model’s predictions. What patterns can you uncover in this study?
Here is the challenge. Let’s use this thread to post our solutions to it, which should be uploaded to your public KNIME Hub spaces with the tag JKISeason4-6.
Need help with tags? To add tag JKISeason4-6 to your workflow, go to the description panel in KNIME Analytics Platform, click the pencil to edit it, and you will see the option for adding tags right there. Let us know if you have any problems!
In this solution, I did some checks with the Statistics node and also some data cleaning; not the most thorough, but it’s a start.
It was interesting to use the optimization and X-Partitioner nodes together!
I also built a visualization showing the Scorer output and the AUC of the optimized ML model.
So interesting to use the Global Feature Importance component.
Not the most optimized ML Model, but a good basis to predict and analyze the results!
Below are a screenshot of the workflow and the resulting Data App.
While profiling, I spotted clinically impossible zeros in RestingBP and Cholesterol.
I flagged them as missing and imputed them with the most frequent value to keep the distribution intact.
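For anyone curious what that cleaning step looks like outside of KNIME, here is a minimal Python/pandas sketch of the same idea; the file name and column handling are my assumptions, since in the workflow itself this is done with nodes rather than code:

```python
import numpy as np
import pandas as pd

df = pd.read_csv("heart.csv")  # hypothetical file name for the challenge dataset

# Treat the clinically impossible zeros as missing, then impute each column
# with its most frequent value so the overall distribution barely shifts
for col in ["RestingBP", "Cholesterol"]:
    df[col] = df[col].replace(0, np.nan)
    df[col] = df[col].fillna(df[col].mode().iloc[0])
```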
The brief asked for Naive Bayes (NB), so I discretised each numeric feature and let a Parameter Optimisation Loop + 5-fold Cross Validation search for the best settings: AUC ≈ 0.92.
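For reference, the same discretise-then-NB-with-cross-validated-tuning logic can be sketched in scikit-learn. The column names, grid values, and tuned parameters below are illustrative assumptions, not what the actual Parameter Optimisation Loop searched:

```python
from sklearn.compose import ColumnTransformer
from sklearn.model_selection import GridSearchCV
from sklearn.naive_bayes import CategoricalNB
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import KBinsDiscretizer, OrdinalEncoder

# Column names follow the common heart-failure dataset; adjust to your table
numeric_cols = ["Age", "RestingBP", "Cholesterol", "MaxHR", "Oldpeak"]
categorical_cols = ["Sex", "ChestPainType", "RestingECG", "ExerciseAngina", "ST_Slope"]

# Discretise numeric features into quantile bins and ordinal-encode the
# categorical ones, then feed everything to a categorical Naive Bayes learner
preprocess = ColumnTransformer([
    ("bins", KBinsDiscretizer(encode="ordinal", strategy="quantile"), numeric_cols),
    ("cats", OrdinalEncoder(), categorical_cols),
])
pipe = Pipeline([("prep", preprocess), ("nb", CategoricalNB())])

# 5-fold cross-validated search over the bin count and the NB smoothing term
search = GridSearchCV(
    pipe,
    param_grid={"prep__bins__n_bins": [3, 5, 7, 10], "nb__alpha": [0.1, 0.5, 1.0]},
    scoring="roc_auc",
    cv=5,
)
# search.fit(X, y)  # X = feature table, y = HeartDisease; then inspect search.best_score_
```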
I dropped both final models into the Global Feature Importance (Surrogate RF) component. Top drivers: ST_Slope (both the Up and Flat categories), ChestPainType, and MaxHR.
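If you are wondering what “Surrogate RF” means under the hood, the general idea (a simplified sketch, not the component’s exact implementation) is to fit an interpretable random forest on the black-box model’s predictions and read off its feature importances. The function and variable names here are hypothetical:

```python
from sklearn.ensemble import RandomForestClassifier

def surrogate_importance(black_box, X, feature_names, random_state=0):
    # Fit the surrogate on the black-box predictions, not the true labels,
    # so the forest learns to mimic the model we want to explain
    y_hat = black_box.predict(X)
    surrogate = RandomForestClassifier(n_estimators=500, random_state=random_state)
    surrogate.fit(X, y_hat)
    # Impurity-based importances of the surrogate serve as a global proxy
    ranking = sorted(zip(feature_names, surrogate.feature_importances_),
                     key=lambda t: t[1], reverse=True)
    return ranking

# top_features = surrogate_importance(fitted_model, X, list(X.columns))[:5]
```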
That result matches cardiology literature – always encouraging when the machine agrees with humans.
Feedback, alternate tricks, or “why-didn’t-you-try-XYZ” comments are most welcome – learning never ends.
@Arief Rama Syarif thanks for the tip on “component eye-candy”!
That little trick instantly leveled up my dashboard from clinical to clickable.
Appreciate you sharing the KNIME magic—will definitely keep experimenting!
(Now back to convincing my dataset that blood pressure can’t be zero…)
You turned a 10K fun run into a surprise triathlon and labeled it “medium.”
My algorithm is now asking for an ice bath, but its cardiology IQ just hit a new personal best, so I’d call that a win.
Sneaky workout, well played; can’t wait for the next mystery stage!
Like PVergati above, I noted the issues with resting BP & Cholesterol.
I made the call that the patient with resting BP = 0 was probably already dead and removed that row.
Cholesterol = 0 was probably a stand-in for missing values, but I was interested in seeing the effects of retaining all the rows, removing only the Cholesterol = 0 rows, or removing the entire column.
I found this challenge to be very interesting, and it helped broaden my knowledge of machine learning within KNIME. I also have to agree with @rfeigel: I would consider this challenge to be harder than medium difficulty! I had to reference some KNIME experts here to complete the challenge.
I hadn’t seen any other methods for doing the feature importance, so I went with the AutoML and Global Feature Importance community components. I was hoping there would be a simpler method for solving this part (I’m sure there is, but I could not find one).
I noticed (as others mentioned earlier) the “0” values in Cholesterol and RestingBP columns, which are physiologically impossible. I handled these by substituting them with the mean values of their respective columns.
For the class imbalance, I used SMOTE to oversample the minority class (HeartDisease=1), then let the AutoML component do the heavy lifting with parameter optimization and cross-validation.
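In case it helps anyone reading along, here is a rough Python sketch of the impute-then-SMOTE preprocessing using the imbalanced-learn package; the file name, encoding, and split are my assumptions, and the AutoML part itself is not reproduced here:

```python
import numpy as np
import pandas as pd
from imblearn.over_sampling import SMOTE
from sklearn.model_selection import train_test_split

df = pd.read_csv("heart.csv")  # hypothetical file name for the challenge dataset

# Mean imputation for the physiologically impossible zeros, as described above
for col in ["RestingBP", "Cholesterol"]:
    df[col] = df[col].replace(0, np.nan)
    df[col] = df[col].fillna(df[col].mean())

# One-hot encode categorical columns so SMOTE can interpolate numerically
X = pd.get_dummies(df.drop(columns="HeartDisease"))
y = df["HeartDisease"]

# Oversample the minority class on the training portion only, so the synthetic
# rows never leak into the evaluation data
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=42)
X_train_bal, y_train_bal = SMOTE(random_state=42).fit_resample(X_train, y_train)
```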
The top three features responsible for the model’s predictions are (based on surrogate random forest):
ST_Slope
ExerciseAngina
ChestPainType
These results make medical sense, as ST segment changes and exercise-induced symptoms are classic indicators of cardiac issues! (I’m not nearly as smart as this insight suggests; my AI companion helped me out.)
For me it was a really engaging challenge! It combined the technical side of modeling and optimization with the practical need for explainable AI in healthcare. A great learning experience for someone like me who’s more on the BI side!
PS: As I’m participating in the hacking days, I built my solution in KNIME 5.5; I hope it can be opened by everyone.
This was very hard.
I spent a lot of time trying to do it myself and debugging my various attempts. For me, this is the most valuable outcome of the weekly challenges: it makes you research, study, try, fail, and learn.
I did have to refer to some of the models submitted by the colleagues above, and even then I’m not sure I fully understand what I did.
This was a difficult challenge for me. It’s always fun to try to solve it yourself, but I definitely had to lean on other workflows submitted here just to understand the solution. I don’t use KNIME for statistics and machine learning, so this is a good push to help me move into the data science area.
I did a quick clinical “reality check” on that same row:
Normal ECG + MaxHR 155 + stress-test metrics… all while resting BP shows 0 mmHg. Either we just logged the fittest ghost in cardiology history or it’s a placeholder gone rogue.
I chose to tag it as a data-entry slip (0 → missing) and imputed it, but since it’s a single record, the difference between dropping and fixing it is basically statistical background noise. Your approach is perfectly fine.
Same spirit for the cholesterol zeros: keep, drop, or impute; there’s a good chance the results move less than our heart rate does while reading this thread.
Always great to cross-check assumptions rather than let the model quietly hallucinate.
Happy to compare notes anytime — robust pipelines are built on great peer exchanges.
Let’s keep the exchange going — always learn something new from fellow knimers.
Honestly, I was planning to leave the challenge for the weekend because it looked a bit overwhelming at first. But after reading your detailed solution and seeing how you approached it, I got really inspired and ended up diving in and finishing my workflow in one go!
What I really appreciate about this challenge and the forum is how openly everyone shares their ideas, insights, and feedback. It’s incredibly helpful for beginners like me for learning new techniques and best practices. Thank you all!
Fingers crossed I can submit my solution before the next challenge kicks off!
Thank you for the very kind tag.
Your notebook is so polished it made my own workflow run back to the draft folder and ask for a makeover — which, for a heart-failure project, feels appropriately ironic.
I’m glad my crumbs of insight were useful; they were scattered while I was sprint-reading papers during lunch break at work, so please handle with care. Looking forward to swapping more “why-does-this-even-work” moments in the next challenge!
Please find my revised submission; I struggled to take a screenshot due to an error when I reran the flow.
Help: I keep getting an error on the Capture Workflow End node. It shows a “partial execution” error if I execute it a second time, and this happens even if I change the cache setting to write to disk. What could be the reason, and how can I overcome it?