Predictive model assumptions

Imagine you’re creating a predictive model, for example, one that predicts the price of used cars. You have all possible variables except one that seems important: whether the car has a mechanical issue. The person in charge of the inventory confirms that 70% of the cars have no issues, 20% have minor problems of little significance, and 10% have a serious problem.
1st question: Can this information be included in the predictive model?
2nd question: Can we identify which cars are part of the 10% with serious issues?
All good ideas are welcome.

Hi @Brain,
I think that is going to be difficult. In Bayesian statistics it might be possible to adjust the prior probability according to that knowledge, but I am not familiar enough with it to give advice here. Generally, though, if you have no idea and also no data about how the mechanical issue relates to the price or any other feature, it is going to be tough.
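As a rough sketch of what adjusting the prior could look like: the 70/20/10 split from the question serves as the prior, and Bayes' rule updates it once some observable signal is available. The signal here (a car selling far below its predicted price) and all the likelihood values are hypothetical, invented purely for illustration; in practice they would have to be estimated from data.

```python
# Prior shares reported by the inventory manager (taken from the question).
PRIOR = {"none": 0.70, "minor": 0.20, "serious": 0.10}

# HYPOTHETICAL likelihoods: P(car sold far below predicted price | condition).
# These values are made up for illustration and are not from any real data.
LIKELIHOOD_BIG_DISCOUNT = {"none": 0.05, "minor": 0.15, "serious": 0.60}


def posterior(prior, likelihood):
    """Apply Bayes' rule: posterior is proportional to prior * likelihood."""
    unnormalized = {k: prior[k] * likelihood[k] for k in prior}
    total = sum(unnormalized.values())
    return {k: v / total for k, v in unnormalized.items()}


post = posterior(PRIOR, LIKELIHOOD_BIG_DISCOUNT)
for condition, prob in post.items():
    print(f"{condition}: {prob:.2f}")
```

Under these assumed likelihoods, the probability of a serious issue rises well above the 10% prior for a heavily discounted car. Without any such signal, the prior alone says nothing about which individual cars are affected.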
But maybe you have labeled data about cars where you know the features and whether each car has a mechanical issue? In that case, you can first build a model to predict whether a car has a mechanical issue, and then use that “predicted” feature as an input when predicting the price.
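A minimal sketch of that two-stage idea, using only the Python standard library. Everything here is an assumption for illustration: the toy data, the `km_per_year` feature, the nearest-centroid classifier for stage 1, and the per-group straight line for stage 2. A real project would more likely use a library such as scikit-learn and proper validation.

```python
from statistics import mean

# HYPOTHETICAL labeled inventory: (age_years, km_thousands, has_issue, price)
LABELED = [
    (2, 30, False, 18000), (4, 60, False, 14000), (6, 90, False, 10000),
    (3, 90, True, 11000), (5, 150, True, 7000), (7, 210, True, 3000),
]


def km_per_year(age, km):
    return km / age


def fit_issue_classifier(rows):
    """Stage 1: nearest-centroid classifier on km driven per year."""
    centroids = {
        label: mean(km_per_year(a, k) for a, k, issue, _ in rows if issue == label)
        for label in (False, True)
    }

    def predict(age, km):
        x = km_per_year(age, km)
        return min(centroids, key=lambda label: abs(x - centroids[label]))

    return predict


def fit_line(xs, ys):
    """Simple least-squares line; returns (slope, intercept)."""
    mx, my = mean(xs), mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return slope, my - slope * mx


def fit_price_model(rows):
    """Stage 2: one price-versus-age line per issue status."""
    lines = {}
    for label in (False, True):
        group = [(a, p) for a, _, issue, p in rows if issue == label]
        lines[label] = fit_line([a for a, _ in group], [p for _, p in group])

    def predict(age, has_issue):
        slope, intercept = lines[has_issue]
        return intercept + slope * age

    return predict


predict_issue = fit_issue_classifier(LABELED)
predict_price = fit_price_model(LABELED)


def estimate_price(age, km):
    """Feed the stage-1 prediction into the stage-2 price model."""
    return predict_price(age, predict_issue(age, km))


print(estimate_price(4, 120))  # high km per year, flagged as having an issue
print(estimate_price(4, 56))   # low km per year, flagged as issue-free
```

Keep in mind that any mistake made in the first stage carries over into the price prediction, so the predicted feature is only as useful as the classifier behind it.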
Kind regards,
