Supervised learning


I am trying to mine some data that I have extracted from a forum. I want to use supervised learning to train a classifier that separates the relevant posts from the irrelevant ones and puts them in two different tables.


I am struggling to do this in KNIME; I don't have a clue where to start or what to do. I have attached the CSV file with the data and would really appreciate it if someone could have a look and let me know how I can do this.


I would really appreciate some feedback.


Kind regards, Maariya.

Hello Maariya,

Supervised learning is the machine learning task of inferring a function from labeled training data.

Your data does not contain labels like "relevant" or "irrelevant", so what you need to do is create training data: a set of posts/comments that each have a label assigned. You could, for example, do this by reading them and typing the label manually. Or, if the forum has something like upvotes, you could include that data and try to classify each post according to its number of upvotes.
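
To make the upvote idea concrete, here is a minimal sketch in plain Python (not a KNIME workflow). The example posts, the field names `text` and `upvotes`, and the threshold of 2 are all assumptions for illustration; the real CSV may look different, and a sensible threshold depends on the forum.

```python
# Hypothetical posts; in practice these would come from the forum CSV.
posts = [
    {"text": "How do I connect a CSV Reader node?", "upvotes": 5},
    {"text": "lol", "upvotes": 0},
    {"text": "Use the Decision Tree Learner node.", "upvotes": 3},
]


def label_from_upvotes(post, threshold=2):
    """Label a post 'relevant' if it has at least `threshold` upvotes.

    The threshold is an assumption and should be tuned on real data.
    """
    return "relevant" if post["upvotes"] >= threshold else "irrelevant"


# Pair each post text with its derived label to form the training set.
labeled = [(p["text"], label_from_upvotes(p)) for p in posts]

for text, label in labeled:
    print(label, "|", text)
```

Labels derived this way are only a proxy for relevance, so it may still be worth checking a sample of them by hand.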

Now you have a target (the label), but you need some input. The raw post itself is not a useful input. You need to derive features from it that are understandable and informative. I am no text miner, so there are probably better ideas, but what comes to my mind is: keyword counts, length of the post, sentiment analysis, etc.
Maybe there are papers out there which could give you some guidance in this project.
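
As a rough illustration of the feature ideas above, the following Python sketch turns one post into a few numeric features: its length in characters, its word count, and how many of its words are in a keyword list. The keyword list and the function name `extract_features` are hypothetical, and sentiment analysis is left out because it would need an additional library or lexicon.

```python
import re

# Hypothetical keyword list; choose words that signal relevance in your forum.
KEYWORDS = {"knime", "classifier", "csv"}


def extract_features(text):
    """Derive simple numeric features from the text of one post."""
    # Lowercase the text and split it into alphabetic word tokens.
    words = re.findall(r"[a-z']+", text.lower())
    return {
        "length_chars": len(text),
        "word_count": len(words),
        "keyword_count": sum(1 for w in words if w in KEYWORDS),
    }


print(extract_features("KNIME reads the CSV file"))
```

Each post then becomes one row of numbers, which together with its label is the kind of table a classifier can be trained on.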


Hey ferry,

Thank you for your feedback, it is really helpful. I was talking to one of my lecturers today, as I am in my final year of university, and he gave me similar advice.


Thank you, Maariya :)