Regression Model for Salary Prediction

mlauber71 · January 7, 2024, 8:00pm

@Karim_Amarouche these things come to mind:

Actually, there is not that much data to predict an exact salary amount. Depending on your goal, you might be better off rounding the numbers and perhaps grouping them into salary bands.
Also, you rely heavily on One to Many (one-hot) transformations for the categorical data. There might be other options *1). Some data can also be interpreted as numeric, such as age or years of experience. A model might benefit from having a real number (or at least an ordinal rank) instead of an unordered category: 10 years of experience is significantly more than 2, and that difference carries meaning. You could, for example, use the midpoint of each range in a 'categorical' column (2-10 years becomes 6 or so).
There might be more information in the role descriptions. You could try to extract topics or industries from them, or extract a set of keywords that you can standardize and assign to each case.
Also, from what I saw, you left out the additional benefits. Especially for managers, these might form a relevant part of their compensation, so leaving them out could mislead the model into thinking that someone with 20+ years of experience in a senior position earns less, when in this dataset the rest of their pay sits in the extras. (Edit: just saw you used the whole number.)
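One way to act on the rounding and grouping suggestion, sketched in plain Python: round each salary to a coarser step and, if you want to treat the task as classification rather than regression, map it to a salary band. The step of 5,000 and the band edges are arbitrary assumptions; choose them from your own data. (Inside KNIME, the Numeric Binner node does something similar to `salary_band`.)

```python
from typing import List


def round_salary(salary: float, step: int = 5000) -> int:
    """Round a salary to the nearest multiple of `step` (step size is an assumption).

    Note: Python's round() sends exact halves to the nearest even number.
    """
    return int(round(salary / step) * step)


def salary_band(salary: float, edges: List[int]) -> str:
    """Return a label for the half-open band [lo, hi) that contains `salary`.

    `edges` must be sorted ascending; values outside the edges get open labels.
    """
    for lo, hi in zip(edges, edges[1:]):
        if lo <= salary < hi:
            return f"{lo}-{hi}"
    if salary < edges[0]:
        return f"<{edges[0]}"
    return f">={edges[-1]}"


if __name__ == "__main__":
    edges = [0, 40000, 60000, 80000]  # hypothetical band edges
    for s in (53210, 47400, 95000):
        print(s, round_salary(s), salary_band(s, edges))
```

With these edges, 53210 rounds to 55000 and falls into the band "40000-60000"; anything at or above 80000 lands in ">=80000".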
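For the years-of-experience point, a small sketch that turns range labels into numbers by taking the midpoint of each range. The label formats ("2-10 years", "20+ years", "5 years") are assumptions about how such a column might look; for open-ended ranges it simply uses the lower bound, which is a crude, hypothetical choice.

```python
import re
from typing import Optional

NUM = r"(\d+(?:\.\d+)?)"  # an integer or decimal number


def experience_to_number(label: str) -> Optional[float]:
    """Convert an experience category label into a single number.

    '2-10 years' -> 6.0 (midpoint), '20+ years' -> 20.0 (lower bound),
    '5 years' -> 5.0; returns None if the label starts with no number.
    """
    text = label.strip().lower()
    m = re.match(NUM + r"\s*-\s*" + NUM, text)
    if m:
        lo, hi = float(m.group(1)), float(m.group(2))
        return (lo + hi) / 2  # midpoint of a closed range
    m = re.match(NUM + r"\s*\+", text)
    if m:
        return float(m.group(1))  # open-ended range: lower bound (assumption)
    m = re.match(NUM, text)
    if m:
        return float(m.group(1))  # single value
    return None


if __name__ == "__main__":
    for lab in ("2-10 years", "20+ years", "5 years", "unknown"):
        print(lab, "->", experience_to_number(lab))
```

The resulting column is numeric, so a regression model can use the fact that 20 is larger than 6, which a one-hot encoding of the same labels hides.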
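For the role descriptions, a deliberately simple keyword approach: match each description against a hand-made list of terms and emit one 0/1 flag per standardized tag. The tags and terms in `KEYWORDS` are hypothetical placeholders; topic modeling would be a more advanced alternative for the "extract topics" idea.

```python
import re
from typing import Dict, List

# Hypothetical keyword map: standardized tag -> terms that indicate it.
KEYWORDS: Dict[str, List[str]] = {
    "data_science": ["data scientist", "machine learning", "statistics"],
    "engineering": ["engineer", "developer", "software"],
    "management": ["manager", "head of", "director", "lead"],
    "finance": ["finance", "banking", "accounting"],
}


def keyword_flags(description: str,
                  keywords: Dict[str, List[str]] = KEYWORDS) -> Dict[str, int]:
    """Return {tag: 1 if any of its terms appears as a whole word, else 0}."""
    text = description.lower()
    flags: Dict[str, int] = {}
    for tag, terms in keywords.items():
        # \b word boundaries keep 'lead' from matching inside 'leading'
        found = any(re.search(r"\b" + re.escape(term) + r"\b", text)
                    for term in terms)
        flags[tag] = int(found)
    return flags


if __name__ == "__main__":
    print(keyword_flags("Senior Software Engineer leading a machine learning team"))
```

Because of the word boundaries, "leading" does not trigger the "lead" term, so that example is flagged for data_science and engineering but not management. Whether that is desirable depends on your data; stemming or a longer term list would change it.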

More examples of how to deal with regression models here:

*1) Some more advanced data preparation can be done, for example, with vtreat. I have code and an article about that:
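Independently of that code and article, here is a rough plain-Python sketch of one alternative to one-hot encoding: smoothed target (mean) encoding, which replaces each category with a shrunk average of the target. This is a swapped-in illustration in the spirit of vtreat's impact coding, not vtreat's API; vtreat additionally uses cross-validation ("cross-frames") when preparing training data so the encoding does not leak the target. The `smoothing` value is an assumption.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple


def fit_target_encoding(categories: Sequence[Hashable],
                        targets: Sequence[float],
                        smoothing: float = 10.0) -> Tuple[Dict[Hashable, float], float]:
    """Map each category to a mean of its targets shrunk toward the grand mean.

    encoded = (sum_y + smoothing * grand_mean) / (count + smoothing),
    so rare categories stay close to the overall average.
    """
    grand_mean = sum(targets) / len(targets)
    sums: Dict[Hashable, float] = defaultdict(float)
    counts: Dict[Hashable, int] = defaultdict(int)
    for c, y in zip(categories, targets):
        sums[c] += y
        counts[c] += 1
    encoding = {c: (sums[c] + smoothing * grand_mean) / (counts[c] + smoothing)
                for c in counts}
    return encoding, grand_mean


def apply_target_encoding(categories: Sequence[Hashable],
                          encoding: Dict[Hashable, float],
                          default: float) -> List[float]:
    """Encode categories; unseen ones fall back to `default` (e.g. the grand mean)."""
    return [encoding.get(c, default) for c in categories]


if __name__ == "__main__":
    enc, gm = fit_target_encoding(["a", "a", "b"], [10.0, 20.0, 60.0], smoothing=1.0)
    print(apply_target_encoding(["a", "b", "c"], enc, gm))
```

To limit leakage in a real workflow, fit the encoding on training folds only and apply it to the held-out rows, which is roughly what vtreat's cross-frames automate.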

If you want to learn more about machine learning, there are some great KNIME resources out there: