Help needed

Good afternoon everyone,

I am a student and I’m trying to create a prediction model using the global_cancer_patients_2015_2024 dataset from Kaggle, but I’m not able to explore the data properly and I was wondering what I’m missing. Could someone help me, please?
I have never used KNIME before. I’ve watched tutorials on multiple platforms, but I still can’t manage to analyze a dataset and apply an ML algorithm to build the model. My assignment is due in one week, so any help would be a blessing.

Thanks to everyone who will reply!

Hi @Federica_1 and welcome to the forum.

First of all: what do you want to predict? :slight_smile:

Given that this is a student assignment, the community is going to be reluctant to provide you with a complete solution. Probably the best thing to do is post what you’ve tried already - your current workflow in progress. Then folks can give you pointers about new things to try, or indicate where you might have gone wrong.

Hi, I thought I could try to predict the survival rate from the cost of treatment and the age of the patients. I don’t really know if that’s possible without supplementing the chosen dataset with another one. I’m not confident with KNIME and I’ve been struggling for the past few weeks.
This exercise is driving me crazy.

Here’s what I’ve done so far:

PROVA.knwf (93.6 KB)

Hi @Federica_1 .

Take a look at this article.

You can always use Python with KNIME.

Best regards.

@Federica_1, to learn about machine learning you could take a course, for example.

In more general terms, you can read my article.

There you will also find some more resources linked.

Could you please tell me which node I should use to get the prediction with KNIME?

Hi @Federica_1.

There is no one-node-fits-all solution.
It is a complex task that requires a holistic approach and depends on the goal.

Have you taken the [L4-ML] (Machine Learning Algorithms) specialization, as @mlauber71 recommended?

To make a prediction with machine learning, you generally follow phases like these:

  1. Define the Problem: Clearly state what you want to predict and understand the type of problem (e.g., classification or regression).

  2. Collect and Explore the Data: Gather relevant data and explore it to understand its structure, types, quality, and any issues (like missing values).

  3. Prepare the Data: Clean the data (handle missing values, convert types, remove duplicates), transform features (encode categories, scale numbers), and split into training and test sets.

  4. Select a Model: Choose a suitable algorithm based on your problem and data type (e.g., decision tree, logistic regression).

  5. Train the Model: Use the training data to teach the model to recognize patterns.

  6. Evaluate the Model: Test the model on unseen data (the test set) and measure its performance using appropriate metrics (accuracy, precision, etc.).

  7. Tune and Optimize: Adjust model parameters or try different algorithms to improve results.

  8. Deploy and Predict: Use the final model to make predictions on new, real-world data.

This process is often iterative: if your model doesn’t perform well, you may need to revisit earlier steps and refine your approach.
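
To make phases 1 to 6 above more concrete, here is a minimal Python sketch that uses only the standard library. It is an illustration under stated assumptions, not a solution for the Kaggle dataset: the data is synthetic, the column names `age` and `survival_years` are hypothetical, and the helper functions (`make_synthetic_rows`, `fit_line`, `run_workflow`) are made up for this example. The problem is framed as a regression with a single numeric feature.

```python
import random


def make_synthetic_rows(n, seed=0):
    """Build made-up rows; 'age' and 'survival_years' are hypothetical columns."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        age = rng.uniform(20, 90)
        # Invented relationship plus noise, only so there is a pattern to learn.
        survival_years = 12.0 - 0.08 * age + rng.gauss(0, 0.5)
        if rng.random() < 0.1:
            age = None  # imitate a missing value in roughly 10% of rows
        rows.append({"age": age, "survival_years": survival_years})
    return rows


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mean_absolute_error(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def run_workflow(seed=0):
    # Phase 1: the problem is a regression (numeric target: survival_years).
    rows = make_synthetic_rows(500, seed)

    # Phase 2: explore the data, here just by counting missing ages.
    n_missing = sum(1 for r in rows if r["age"] is None)

    # Phase 3: prepare the data (drop incomplete rows, shuffle, 80/20 split).
    clean = [r for r in rows if r["age"] is not None]
    random.Random(seed).shuffle(clean)
    cut = int(0.8 * len(clean))
    train, test = clean[:cut], clean[cut:]

    # Phases 4 and 5: select a simple linear model and train it on the training set.
    slope, intercept = fit_line(
        [r["age"] for r in train],
        [r["survival_years"] for r in train],
    )

    # Phase 6: evaluate on the unseen test set against a "predict the mean" baseline.
    actual = [r["survival_years"] for r in test]
    predicted = [slope * r["age"] + intercept for r in test]
    train_mean = sum(r["survival_years"] for r in train) / len(train)
    baseline = [train_mean] * len(test)

    return {
        "n_missing": n_missing,
        "n_train": len(train),
        "n_test": len(test),
        "slope": slope,
        "intercept": intercept,
        "model_mae": mean_absolute_error(actual, predicted),
        "baseline_mae": mean_absolute_error(actual, baseline),
    }


if __name__ == "__main__":
    for name, value in run_workflow().items():
        print(name, value)
```

The sketch stops at evaluation; tuning and deployment (phases 7 and 8) are left out. Comparing the model error with a trivial baseline is a quick way to check whether the model learned anything at all. In a KNIME workflow, the same phases are typically covered by nodes such as Missing Value, Partitioning, a learner and predictor pair (for example Linear Regression Learner and Regression Predictor), and Numeric Scorer; treat that mapping as a starting point to check against your own workflow, not as a recipe.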

There are several examples on the KNIME Community Hub; search for them there.

I recommend taking the [L4-ML] (Machine Learning Algorithms) specialization, which will help you identify and use the relevant KNIME nodes for building and evaluating machine learning models.
It’s a must-have.

Best regards
