Dear KNIME professionals, please help me. Which model (node) should I choose for the following problem? I have a dataset with various student characteristics (age at enrollment, country of origin, admission year, graduation year, number of failed attempts in other courses, marks for other courses… -> mark for my course). I need to predict the mark in my course for a new student, ideally together with the probability/likelihood of that mark.
P.S. The “marks for other courses” are stored as many rows for the same student:
Alex, 19, Lithuania, 2016, ?, 1, Professional Practice, 10, 7
Alex, 19, Lithuania, 2016, ?, 1, Electrical Engineering, 5, 7
Alex, 19, Lithuania, 2016, ?, 1, Electronic Devices, 5, 7
Hi there @AlexVilnius,
welcome to KNIME Community Forum!
Well, this sounds like a classification problem, so I believe a good approach is to start with a Decision Tree and then move on to more complex methods if necessary.
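Since the question asks for a probability per mark, here is a rough sketch of how a tree-style classifier can produce one: each leaf reports the share of training students with each mark. This is plain Python, not a KNIME node. The data, the single feature (mean mark in other courses) and the `THRESHOLD` split point are all invented for illustration.

```python
from collections import Counter

# Toy training data, invented for illustration:
# (mean mark in other courses, mark in my course).
train = [(4.0, 5), (5.0, 5), (5.5, 6), (6.0, 6),
         (7.5, 8), (8.0, 8), (8.5, 9), (9.0, 8)]

def fit_stump(rows, threshold):
    """Split rows at `threshold` and store the class counts on each side."""
    leaves = {"low": Counter(), "high": Counter()}
    for feature, mark in rows:
        side = "low" if feature < threshold else "high"
        leaves[side][str(mark)] += 1  # marks are treated as string class labels
    return leaves

def predict_proba(leaves, threshold, feature):
    """Return {class label: probability} for the leaf that `feature` falls into."""
    counts = leaves["low" if feature < threshold else "high"]
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}

THRESHOLD = 7.0  # hypothetical split point; a real learner picks it from the data
stump = fit_stump(train, THRESHOLD)
probs = predict_proba(stump, THRESHOLD, 8.2)
print(probs)
```

A real decision tree learner chooses the splits itself and uses many features, but the idea of reading class probabilities from the leaf counts is the same.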
As a start you need to prepare your data, and one thing is for sure: you need one student per row. You can achieve this with the Pivoting node.
I recommend taking KNIME's free online Introductory Course to Data Science, which will help you with your project.
Additionally, you can share part of your data (20-30 rows), which can be dummy data, so that someone can give it a go and share an example with you.
Hope this helps.
Br,
Ivan
Dear Ivan,
Thank you for your reply.
Could I clarify: I have students from different study programs, including ERASMUS students, so the list of other courses may vary, but I think the marks in other courses may influence the future mark in my course. To get “one student in one row” I found the “One to Many” node, which converts the COURSE column into many new columns. Following your suggestion, I prepared a short sample of the data without personal information. Questions:
- Where should I upload this data?
- How do I convert the COURSE column into many new columns holding the marks for those courses?
- The Decision Tree Learner needs a target column of type String, but GRADE is of type Integer. May I use the Simple Regression Tree Learner or the Gradient Boosted Trees Learner (Regression) instead?
1.xlsx (12.2 KB)
Hi @AlexVilnius
When I read your question, I would start by reorganizing the data as follows: the_analytical_node.knwf (31.0 KB)
It makes your dataset unique on ST_NUMBER and uses the Pivot node (as @ipazin suggested) to get the GRADE for every COURSE per ST_NUMBER.
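For readers without the workflow at hand, the reshaping step can be sketched in plain Python. This is only an illustration of the idea, not what the KNIME node does internally. The column names ST_NUMBER, COURSE and GRADE come from this thread, while the rows and the `pivot` helper are made up.

```python
# Long format: one row per (student, course). Values are invented for illustration.
rows = [
    {"ST_NUMBER": 1, "COURSE": "Professional Practice", "GRADE": 10},
    {"ST_NUMBER": 1, "COURSE": "Electrical Engineering", "GRADE": 5},
    {"ST_NUMBER": 1, "COURSE": "Electronic Devices", "GRADE": 5},
    {"ST_NUMBER": 2, "COURSE": "Electrical Engineering", "GRADE": 8},
]

def pivot(rows):
    """Return {ST_NUMBER: {COURSE: GRADE}} with one entry per student."""
    courses = sorted({r["COURSE"] for r in rows})
    wide = {}
    for r in rows:
        # Every student gets a slot for every course; None marks a course not taken.
        student = wide.setdefault(r["ST_NUMBER"], {c: None for c in courses})
        student[r["COURSE"]] = r["GRADE"]  # a repeated course keeps the last grade seen
    return wide

wide = pivot(rows)
for st_number, grades in wide.items():
    print(st_number, grades)
```

Because students from different programs take different courses, the wide table will contain missing values (`None` here), and these have to be handled before or inside the model.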
gr. Hans
Thanks a lot, @HansS.
Your workflow solves the problem of getting “one student in one row”.
Could you recommend an analytic node for predicting the mark in my course for a new student (preferably with the probability/likelihood of that mark)?
Hi @AlexVilnius
I think you can start with a simple linear regression and see how your model performs. Take this as a baseline and improve from there.
Try another algorithm. Tune your model parameters. And maybe you can improve your model with some feature engineering, e.g. taking into account the order in which the courses were completed and/or the time between courses.
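To illustrate what a baseline means here, below is a minimal sketch of a one-feature least-squares fit in plain Python, with a mean absolute error to compare later models against. The data and the single feature (mean mark in other courses) are invented for illustration. In KNIME the corresponding nodes would do this on the real table.

```python
# Toy data invented for illustration:
# xs = mean mark in other courses, ys = mark in my course.
xs = [4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
ys = [5.0, 5.0, 6.0, 7.0, 8.0, 9.0]

def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def mean_absolute_error(xs, ys, slope, intercept):
    """Average absolute gap between predicted and actual marks."""
    return sum(abs(y - (slope * x + intercept)) for x, y in zip(xs, ys)) / len(xs)

slope, intercept = fit_line(xs, ys)
baseline_mae = mean_absolute_error(xs, ys, slope, intercept)
predicted = slope * 7.5 + intercept  # predicted mark for a new student with mean 7.5
print(slope, intercept, baseline_mae, predicted)
```

The error above is measured on the training data, so it is optimistic. For a fair comparison between the baseline and any later model, the error should be computed on students that were held out from training.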
gr. Hans