Finding probability

Hi everyone, I’m new to KNIME, and as part of my uni project I would like to find out how likely a person is to click a link based on their job title, using historical data.
My data set looks something like this:
Job title    Historical clicks
Professor    5
Director     30
Janitor      2
and so on.

My data set has around 40000 different job titles. I was trying to use the Linear Regression Learner, but when I try to process everything at once it says that there are too many unique job titles. As far as I understand, one column is a string and the other is an integer, and that combination doesn’t work in the learner node or in the Linear Correlation node. Could you please give me a hint in the right direction?

Hi @2137

Welcome to the KNIME Forum. If this is what your data looks like, I wouldn’t overcomplicate it. I would use a GroupBy on job title and average the number of clicks as an indicator of how likely a given job title is to click.
It may still be necessary to pay attention to records with extremely high click counts, since they can skew the average.
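
For readers who want to see the grouping logic spelled out, here is a minimal sketch of the same idea in plain Python rather than KNIME. The sample rows and the function name `clicks_by_title` are made up for illustration, and it assumes one row per person with a job title and a click count. The median is included next to the mean only as one possible way to spot titles where a few very high values dominate.

```python
from collections import defaultdict
from statistics import mean, median

# Hypothetical sample rows: (job title, historical clicks), one row per person.
rows = [
    ("Professor", 5),
    ("Professor", 7),
    ("Director", 30),
    ("Director", 10),
    ("Janitor", 2),
]

def clicks_by_title(rows):
    """Group rows by job title and aggregate the click counts per group."""
    groups = defaultdict(list)
    for title, clicks in rows:
        groups[title].append(clicks)
    # One summary per job title: average, median and number of rows.
    return {
        title: {"mean": mean(values), "median": median(values), "n": len(values)}
        for title, values in groups.items()
    }

summary = clicks_by_title(rows)
for title, stats in sorted(summary.items()):
    print(title, stats)
```

A large gap between mean and median for a title, or a very small `n`, suggests that the average for that title should be read with caution.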

gr. Hans

Thank you HansS! As always, I wanted to go the most complicated way possible. GroupBy works like a charm!
