Hi all, I have the following problem and hope someone can shed some light on it.
I have a list of about 10k nicknames from an old forum. About half of them have a gender attached (set by the users themselves when they registered), and the other half have none.
I want to do some analysis based on gender, so having ~5k users without one is not helping. I’ve been playing with the Palladian Text Classifier Learner, without much success.
As training sources I’m using the labeled half of my own list (i.e. without the unclassified nicknames) and a very large list of gendered English names I found on the internet.
The results are pretty poor. Many nicknames look something like [^-J0hn-^], if you know what I mean, but the classifier even misses many of the ones I already provided in the sources.
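To make the nickname problem concrete, here is a minimal Python sketch of the kind of cleanup I suspect would be needed before classification. The function name `normalize_nickname` and the leetspeak table are my own assumptions for illustration. This is not something Palladian provides, and the mapping is far from complete.

```python
import re

# Assumed leetspeak substitutions (hypothetical, incomplete).
LEET_MAP = str.maketrans({"0": "o", "1": "i", "3": "e", "4": "a", "5": "s", "7": "t"})

def normalize_nickname(nick: str) -> str:
    """Reduce a decorated nickname to a plain lowercase letter string."""
    s = nick.lower()
    # Drop decoration such as brackets, carets and dashes.
    s = re.sub(r"[^a-z0-9]", "", s)
    # Drop a trailing run of two or more digits (birth years and the like).
    s = re.sub(r"\d{2,}$", "", s)
    # Undo common digit-for-letter substitutions.
    s = s.translate(LEET_MAP)
    # Remove any digits that are still left.
    return re.sub(r"[^a-z]", "", s)

print(normalize_nickname("[^-J0hn-^]"))  # john
```

The cleaned string could then be matched against the name list or fed to the classifier instead of the raw nickname. It is only a heuristic: a nickname ending in several leetspeak digits would lose them at the trailing-digits step, for example.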
I don’t really know how to solve this. I’ve been thinking about using the users’ forum messages as an additional signal, but I don’t really know how to go about it.