Tweet Sentiment analysis with ML

ScottF · May 17, 2022, 4:57pm

Regarding stemming, it’s an optional preprocessing step, most often used to reduce complexity in the feature space, as you have noted. If your model’s run time is acceptable without stemming and your accuracy is better without it, there’s no need to implement it. Whether that’s a worthwhile tradeoff is for you to judge.
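To make the feature-space point concrete, here is a minimal sketch that counts distinct tokens with and without stemming. The `toy_stem` function, the suffix list, and the three sample tweets are all made up for illustration; a real workflow would use an established stemmer rather than this crude suffix stripper.

```python
import re

# Toy suffix stripper for illustration only (an assumption, not a real
# stemming algorithm); a real project would use an established stemmer.
SUFFIXES = ("ing", "ed", "es", "s")


def toy_stem(token: str) -> str:
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def vocabulary(texts, stem=False):
    """Return the set of distinct (optionally stemmed) lowercase tokens."""
    vocab = set()
    for text in texts:
        for token in re.findall(r"[a-z']+", text.lower()):
            vocab.add(toy_stem(token) if stem else token)
    return vocab


# Made-up sample tweets.
tweets = [
    "Loving the new update, it loads faster",
    "I loved the update but loading still fails",
    "The update failed to load and I love nothing about it",
]

raw_vocab = vocabulary(tweets)
stemmed_vocab = vocabulary(tweets, stem=True)
print("features without stemming:", len(raw_vocab))
print("features with stemming:   ", len(stemmed_vocab))
```

Note that the toy stemmer also merges "loving" and "loved" into "lov" while leaving "love" alone, which is the kind of information loss or inconsistency that can cost accuracy.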

On cross validation, you would usually use it when you’re not sure how “stable” your model is - that is, whether its performance stays roughly constant across folds. If it does, you can be fairly sure that regular partitioning is going to work fine. If it doesn’t, you might want to look more closely at the distributions of key features in your dataset.
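As a rough sketch of what checking stability means, the snippet below runs a plain k-fold split and reports the per-fold accuracy together with its spread. Everything in it is an assumption for illustration: the data is synthetic, and `fit_threshold` is a hypothetical stand-in for whatever model you are actually training.

```python
import random
import statistics


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle the sample indices and deal them into k disjoint folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def fit_threshold(xs, ys):
    """Hypothetical stand-in model: the midpoint between the two class means."""
    pos = [x for x, y in zip(xs, ys) if y == 1]
    neg = [x for x, y in zip(xs, ys) if y == 0]
    return (statistics.mean(pos) + statistics.mean(neg)) / 2


def accuracy(threshold, xs, ys):
    """Share of samples whose side of the threshold matches their label."""
    hits = sum((x > threshold) == bool(y) for x, y in zip(xs, ys))
    return hits / len(ys)


def cross_val_accuracy(xs, ys, k=5, seed=0):
    """Train on k-1 folds, score on the held-out fold, repeat for each fold."""
    scores = []
    for held_out in k_fold_indices(len(xs), k, seed):
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        threshold = fit_threshold([xs[i] for i in train], [ys[i] for i in train])
        scores.append(
            accuracy(threshold, [xs[i] for i in held_out], [ys[i] for i in held_out])
        )
    return scores


# Synthetic one-feature data: two overlapping classes, 50 samples each.
rng = random.Random(42)
ys = [i % 2 for i in range(100)]
xs = [rng.gauss(1.0 if y else -1.0, 1.0) for y in ys]

scores = cross_val_accuracy(xs, ys, k=5)
print("per-fold accuracy:", [round(s, 2) for s in scores])
print("mean:", round(statistics.mean(scores), 3))
print("std dev:", round(statistics.stdev(scores), 3))
```

A small standard deviation relative to the mean suggests a single train/test partition would give a similar picture; a large one is the cue to look at how key features are distributed across folds.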

About F1, you can read more at this older forum post or at our blog.
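For quick reference, F1 is the harmonic mean of precision and recall for the class you treat as positive. Here is a minimal sketch of that standard definition; the function name, the `positive` parameter, and the label lists are made up for illustration.

```python
def f1_score(y_true, y_pred, positive=1):
    """F1 = 2 * precision * recall / (precision + recall) for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        # No true positives: precision or recall is zero (or undefined),
        # so report 0.0 by convention.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Made-up labels: 3 positive tweets out of 8.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
print("F1:", round(f1_score(y_true, y_pred), 3))

# Predicting "negative" for everything still gets 5 of 8 right,
# but F1 for the positive class drops to zero.
print("F1 (all negative):", f1_score(y_true, [0] * len(y_true)))
```

This is why F1 is often preferred over plain accuracy when one sentiment class is much rarer than the other.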

Hope that helps!