Comparing Text Datasets


I have textual responses from people in the US and Canada to questions about how they feel about their experience using a certain product. I've used the Topic Extractor and can't really see any differences. I'm wondering if there is a way to statistically compare the male and female responses to see what the big topic differences are, rather than speculating based on the topics.

Thank you!

Hi @crohoman,

One thing you could try is to train a predictive model that distinguishes between male and female comments. If the model's accuracy (measured on comments it was not trained on) is good, e.g. >80%, then this is a clear indicator that these comments are different and can be separated well. If you use a predictive model that you can read, e.g. a decision tree, you can even check the features (terms) that are used for the separation. These features can give you an indication of the topics.
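The idea above can be sketched in plain Python. This is a minimal, dependency-free illustration under stated assumptions, not a finished workflow: each comment is reduced to the set of lowercase terms it contains (binary term presence), a small decision tree is grown by information gain, and leave-one-out cross-validation estimates accuracy on unseen comments. The comments and the `F`/`M` labels are made-up placeholders, and all function names are hypothetical. With real data you would more likely use a library such as scikit-learn (`CountVectorizer` plus `DecisionTreeClassifier`) and many more comments.

```python
import math
import re
from collections import Counter

# Hypothetical toy comments, NOT real survey data; replace with your own.
COMMENTS = [
    "the battery drains too fast",
    "battery life is disappointing",
    "I love it but the battery dies by noon",
    "great product, battery could be better",
    "the screen scratches too easily",
    "screen is bright and sharp",
    "I love it but the screen cracked",
    "great product, screen could be bigger",
]
LABELS = ["F"] * 4 + ["M"] * 4  # placeholder gender labels


def tokenize(text):
    """Lowercase and return the set of word terms (a deliberately simple tokenizer)."""
    return set(re.findall(r"[a-z']+", text.lower()))


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values()) if n else 0.0


def best_split(docs, labels, vocab):
    """Return the term whose presence/absence yields the largest information gain."""
    base, n = entropy(labels), len(labels)
    best_term, best_gain = None, 0.0
    for term in sorted(vocab):  # sorted for deterministic tie-breaking
        left = [y for d, y in zip(docs, labels) if term in d]
        right = [y for d, y in zip(docs, labels) if term not in d]
        if not left or not right:
            continue
        gain = base - len(left) / n * entropy(left) - len(right) / n * entropy(right)
        if gain > best_gain:
            best_term, best_gain = term, gain
    return best_term, best_gain


def build_tree(docs, labels, vocab, depth=0, max_depth=3):
    """Grow a readable tree: internal nodes test one term, leaves hold a label."""
    majority = Counter(labels).most_common(1)[0][0]
    if depth >= max_depth or len(set(labels)) == 1:
        return majority
    term, gain = best_split(docs, labels, vocab)
    if term is None:
        return majority
    yes = [(d, y) for d, y in zip(docs, labels) if term in d]
    no = [(d, y) for d, y in zip(docs, labels) if term not in d]
    return {
        "term": term,
        "gain": gain,
        "present": build_tree([d for d, _ in yes], [y for _, y in yes], vocab, depth + 1, max_depth),
        "absent": build_tree([d for d, _ in no], [y for _, y in no], vocab, depth + 1, max_depth),
    }


def predict(tree, doc):
    """Walk the tree until a leaf label is reached."""
    while isinstance(tree, dict):
        tree = tree["present"] if tree["term"] in doc else tree["absent"]
    return tree


def split_terms(tree):
    """List (term, gain) for every split node: the terms that separate the groups."""
    if not isinstance(tree, dict):
        return []
    return [(tree["term"], tree["gain"])] + split_terms(tree["present"]) + split_terms(tree["absent"])


def leave_one_out_accuracy(docs, labels, max_depth=3):
    """Train on all comments but one, predict the held-out one, repeat for each."""
    hits = 0
    for i in range(len(docs)):
        train_docs, train_labels = docs[:i] + docs[i + 1:], labels[:i] + labels[i + 1:]
        tree = build_tree(train_docs, train_labels, set().union(*train_docs), max_depth=max_depth)
        hits += predict(tree, docs[i]) == labels[i]
    return hits / len(docs)


if __name__ == "__main__":
    docs = [tokenize(c) for c in COMMENTS]
    print("leave-one-out accuracy:", leave_one_out_accuracy(docs, LABELS))
    tree = build_tree(docs, LABELS, set().union(*docs))
    for term, gain in split_terms(tree):
        print(f"split on {term!r} (information gain {gain:.2f})")
```

The terms at the split nodes are the ones to read as topic hints. On real data, also compare the accuracy against always guessing the larger group: 80% means little if 80% of the respondents are of one gender.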

I hope this helps.

Cheers, Kilian

Thanks Kilian. I will do some research on this approach and test it out.