I have data containing demographic information about candidates: first name, last name, father’s name, mother’s name, pincode, etc. I want to group these candidates by certain rules to find out which two, three or more candidates belong to the same family. So far I have tried simply grouping them by last name and father’s name, OR by last name and pincode. Is there a better approach?
@pujappathak the first question would be: what constitutes a “family”, and does your data contain the information to (possibly) get you there? Is it the name? Depending on how many records you have and how common a name is, this might be a challenge. Then there could be an address combined with (maybe) a (last) name. Do you have the father’s name in the same data set, and is that information reliable?
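To make the two rules from the question concrete, here is a minimal Python sketch, not a definitive solution. It assumes each candidate is a dict with the hypothetical field names `last_name`, `father_name` and `pincode`, normalizes the values with a simple "fingerprint" (lowercase, no punctuation, sorted tokens) so that spelling variants such as "R. K. Sharma" and "Sharma R K" match, and then links records that agree on any one rule using a union-find structure. The helper names `fingerprint` and `group_families` and the sample records are made up for illustration.

```python
import re
import unicodedata
from collections import defaultdict


def fingerprint(text):
    """Normalize a value: strip accents, lowercase, drop punctuation, sort unique tokens."""
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    tokens = re.sub(r"[^\w\s]", " ", text.lower()).split()
    return " ".join(sorted(set(tokens)))


def group_families(records, rules):
    """Link records that share the same fingerprinted key under ANY rule."""
    parent = list(range(len(records)))

    def find(i):
        # Follow parent pointers to the root, shortening the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(i, j):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[max(ri, rj)] = min(ri, rj)

    for rule in rules:
        first_seen = {}
        for idx, rec in enumerate(records):
            key = tuple(fingerprint(str(rec.get(field, ""))) for field in rule)
            if not all(key):
                continue  # a missing value must never count as a match
            if key in first_seen:
                union(idx, first_seen[key])
            else:
                first_seen[key] = idx

    groups = defaultdict(list)
    for idx in range(len(records)):
        groups[find(idx)].append(idx)
    return sorted(groups.values())


# Hypothetical sample data, only to show the mechanics.
records = [
    {"first_name": "Asha", "last_name": "Sharma", "father_name": "R. K. Sharma", "pincode": "411001"},
    {"first_name": "Vikram", "last_name": "sharma", "father_name": "Sharma R K", "pincode": "411002"},
    {"first_name": "Neha", "last_name": "Sharma ", "father_name": "", "pincode": "411002"},
    {"first_name": "Rohan", "last_name": "Patil", "father_name": "S. Patil", "pincode": "411001"},
]

# Rule 1: last name AND father's name; rule 2: last name AND pincode.
RULES = [("last_name", "father_name"), ("last_name", "pincode")]

for group in group_families(records, RULES):
    print([records[i]["first_name"] for i in group])
```

Be aware that the linking is transitive: Asha and Neha end up in the same group only because each of them matches Vikram under a different rule. With a very common last name inside one pincode this will merge unrelated people, which is exactly why the question of what counts as a “family” has to be settled first.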
Here is an example of automatically grouping similar addresses without a ‘ground truth’, though the question is whether it would be suitable for your task. Probably not:
For further inspiration on addresses, fingerprinting and so on, you might want to explore this collection (not saying you will find an answer there without first answering the questions mentioned):