Scramble Data


I would like to use KNIME to scramble some sensitive data extracted from CSV/Excel files or a database.

For example:

ID, FirstName, LastName, DOB, Gender, PassportNo, LicenseNo, Medical History
1, Homer, Simpsons, 01/01/1960, M, K123456, D9089112, Asthma
2, Lisa, Simpsons, 01/01/1988, F, K123457, D9089113, HIV
3, Bart, Simpsons, 01/01/1985, M, K123458, D9089114, Mental Health
...
4000001, Maggie, Simpsons, 01/01/1964, F, K123459, D9089111, HIV


I want to scramble the data so that it looks more like dummy data. I can of course remove, filter, or encrypt the identification fields (PassportNo and LicenseNo), but I need to keep Medical History so that I can compute some statistics on it. I would also like to use the scrambled data as representative, "real-life" test data in the future.

My record set will contain 3 to 6 million records. What would you recommend to accomplish such a task?
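For readers who want to see the masking step from the question outside KNIME, here is a minimal Python sketch under stated assumptions: rows are dicts keyed by the column names from the example, and the identification fields are replaced by a keyed hash (HMAC-SHA256). Note that this is a one-way pseudonym, not reversible encryption. `SECRET_KEY`, `pseudonym`, and `mask_ids` are hypothetical names. Medical History is left untouched, so its statistics can still be computed.

```python
import hashlib
import hmac
from collections import Counter

# Hypothetical secret; it should be stored separately from the data set.
SECRET_KEY = b"replace-with-a-real-secret"
ID_FIELDS = ("PassportNo", "LicenseNo")

def pseudonym(value: str) -> str:
    """Keyed one-way hash: the same input always maps to the same token."""
    digest = hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:12]

def mask_ids(rows):
    """Return copies of the rows with the identification fields pseudonymized."""
    masked = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows stay unchanged
        for field in ID_FIELDS:
            new_row[field] = pseudonym(row[field])
        masked.append(new_row)
    return masked

rows = [
    {"ID": "1", "PassportNo": "K123456", "LicenseNo": "D9089112", "Medical History": "Asthma"},
    {"ID": "2", "PassportNo": "K123457", "LicenseNo": "D9089113", "Medical History": "HIV"},
    {"ID": "3", "PassportNo": "K123458", "LicenseNo": "D9089114", "Medical History": "Mental Health"},
    {"ID": "4000001", "PassportNo": "K123459", "LicenseNo": "D9089111", "Medical History": "HIV"},
]

masked = mask_ids(rows)
# Medical History is not modified, so counts per condition are preserved.
condition_counts = Counter(r["Medical History"] for r in masked)
print(condition_counts)
```

Because the hash is deterministic for a given key, a passport number that appears in several rows maps to the same token, so records can still be linked to each other without exposing the original number.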





Have you tried the Stresser node?


With KNIME 2.11 we released the Target Shuffling node. Maybe you could apply it to all columns?
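To illustrate the idea of shuffling every column, here is a minimal Python sketch. It is not the Target Shuffling node itself, whose options are not described here. Each column is permuted independently with `random.shuffle`. The column-oriented layout and the name `shuffle_columns` are assumptions for illustration.

```python
import random
from collections import Counter

def shuffle_columns(columns, seed=None):
    """Permute each column independently; returns new lists, inputs untouched."""
    rng = random.Random(seed)
    shuffled = {}
    for name, values in columns.items():
        copy = list(values)
        rng.shuffle(copy)  # in-place permutation of the copy
        shuffled[name] = copy
    return shuffled

columns = {
    "FirstName": ["Homer", "Lisa", "Bart", "Maggie"],
    "DOB": ["01/01/1960", "01/01/1988", "01/01/1985", "01/01/1964"],
    "Gender": ["M", "F", "M", "F"],
    "Medical History": ["Asthma", "HIV", "Mental Health", "HIV"],
}

scrambled = shuffle_columns(columns, seed=42)
for name in columns:
    # Same multiset of values per column, only the row order differs.
    assert Counter(scrambled[name]) == Counter(columns[name])
```

Per-column value counts, such as the Medical History frequencies, stay exactly the same. However, the link between a person and their values is broken, and so are correlations between columns. A statistic like "HIV cases by gender" is therefore not preserved.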

Also take a look at the ModularDataGeneration workflows, where we generated an artificial customer data set. Basically, I would split the data and rejoin it with the Random Matcher node.
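The split-and-rejoin approach can be sketched in Python as follows. This sketch does not reproduce the Random Matcher node. A random permutation per column group stands in for the node's matching, and the node's actual behavior may differ. The function name `split_and_rematch` and the particular grouping of columns are hypothetical choices for illustration.

```python
import random
from collections import Counter

def split_and_rematch(rows, column_groups, seed=None):
    """Split rows into column groups and rejoin the groups in random order.

    Values that share a group stay together on the same output row;
    the link between different groups is randomized.
    """
    rng = random.Random(seed)
    n = len(rows)
    output = [{} for _ in range(n)]
    for group in column_groups:
        order = list(range(n))
        rng.shuffle(order)  # random pairing of source rows to output rows
        for out_idx, src_idx in enumerate(order):
            for col in group:
                output[out_idx][col] = rows[src_idx][col]
    return output

rows = [
    {"FirstName": "Homer", "LastName": "Simpsons", "DOB": "01/01/1960", "Gender": "M", "Medical History": "Asthma"},
    {"FirstName": "Lisa", "LastName": "Simpsons", "DOB": "01/01/1988", "Gender": "F", "Medical History": "HIV"},
    {"FirstName": "Bart", "LastName": "Simpsons", "DOB": "01/01/1985", "Gender": "M", "Medical History": "Mental Health"},
    {"FirstName": "Maggie", "LastName": "Simpsons", "DOB": "01/01/1964", "Gender": "F", "Medical History": "HIV"},
]

# Assumed grouping: name columns, demographics, and the medical column.
groups = [("FirstName", "LastName"), ("DOB", "Gender"), ("Medical History",)]
fake = split_and_rematch(rows, groups, seed=7)
print(Counter(r["Medical History"] for r in fake))
```

Unlike shuffling every column separately, this keeps combinations inside a group intact. For example, each DOB still appears with its original gender. Only the pairing between groups is random. The choice of groups therefore decides which relationships survive in the test data.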

Does this help?


Wow, that is great. Thank you for the pointers, richards99 and iris!



Modular data generation is a great tool for this. Another option is to create an anonymized key for each unique value in a column (GroupBy > String Manipulation) and use a Cell Replacer node to replace each value with its anonymized key. This has the advantage that none of the relationships in the data change. In the future you can also undo the anonymization with the same workflow you used to create it, simply by switching the column references in the Cell Replacer.
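The key-replacement idea can be mirrored in a few lines of Python. In this sketch, collecting unique values plays the role of GroupBy, building the key string plays the role of String Manipulation, and the dictionary lookup plays the role of the Cell Replacer. The function names and the `Condition_n` key format are hypothetical. Reversing the dictionary corresponds to switching the Cell Replacer column references.

```python
def build_key_map(values, prefix):
    """Assign a key to each unique value, in order of first appearance."""
    key_map = {}
    for value in values:
        if value not in key_map:
            key_map[value] = f"{prefix}_{len(key_map) + 1}"
    return key_map

def replace_values(values, mapping):
    """Replace every value via the lookup table (like a cell replacer)."""
    return [mapping[v] for v in values]

medical = ["Asthma", "HIV", "Mental Health", "HIV"]

key_map = build_key_map(medical, "Condition")
anonymized = replace_values(medical, key_map)

# Undo: swap the roles of the two columns of the lookup table.
reverse_map = {key: value for value, key in key_map.items()}
restored = replace_values(anonymized, reverse_map)
```

Because the mapping is one-to-one, equal values stay equal and counts per group are unchanged. For the same reason, anyone holding the mapping table can reverse the anonymization, so it needs to be kept as carefully as the original data.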