Scramble Data

Hi 

I would like to use KNIME to scramble some sensitive data extracted from CSV/Excel files or a database.

For example:

ID, FirstName, LastName, DOB, Gender, PassportNo, LicenseNo, Medical History
1, Homer, Simpsons, 01/01/1960, M, K123456, D9089112, Asthma
2, Lisa, Simpsons, 01/01/1988, F, K123457, D9089113, HIV
3, Bart, Simpsons, 01/01/1985, M, K123458, D9089114, Mental Health
...
...
4000001, Maggie, Simpsons, 01/01/1964, F, K123459, D9089111, HIV

I want to scramble the data so that it looks more like dummy data. I can of course remove, filter, or encrypt the identifying fields (PassportNo and LicenseNo), but I need to keep Medical History so that I can compute statistics on it. I would also like to use the scrambled data as representative, "real life" test data in the future.

My record set will contain 3 to 6 million records. What would you recommend for this task?

Regards


Have you tried the Stresser node?

simon.

With 2.11 we released the Target Shuffling node; maybe you could apply it to all columns?
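
To illustrate what shuffling every column means, here is a minimal Python sketch of the idea (not the KNIME node itself): each column is permuted independently using the standard library's `random` module. The function name `shuffle_all_columns` and the sample rows are illustrative assumptions. Each column keeps exactly the same values, so counts per Medical History value are unchanged, but the links between values from the same original row are broken.

```python
import random

def shuffle_all_columns(rows, columns, seed=None):
    """Return new rows in which every listed column is permuted independently.

    Each column keeps the same multiset of values (so per-column statistics
    such as value counts are unchanged), but values that originally sat in
    the same row are no longer linked to each other.
    """
    rng = random.Random(seed)
    shuffled = [dict(r) for r in rows]  # copy so the input rows are not modified
    for col in columns:
        values = [r[col] for r in rows]
        rng.shuffle(values)  # independent permutation for this column only
        for r, v in zip(shuffled, values):
            r[col] = v
    return shuffled

# Hypothetical sample rows based on the example above.
rows = [
    {"ID": 1, "FirstName": "Homer", "Gender": "M", "MedicalHistory": "Asthma"},
    {"ID": 2, "FirstName": "Lisa", "Gender": "F", "MedicalHistory": "HIV"},
    {"ID": 3, "FirstName": "Bart", "Gender": "M", "MedicalHistory": "Mental Health"},
]
scrambled = shuffle_all_columns(rows, ["FirstName", "Gender", "MedicalHistory"], seed=42)
```

Because every column is permuted on its own, statistics that combine columns (for example Medical History by Gender) are no longer meaningful in the scrambled output; only single-column distributions survive.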

Also take a look at the ModularDataGeneration workflows, where we generated an artificial customer data set. Basically, I would split the data and rejoin it with the Random Matcher node.
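
A rough Python sketch of the split-and-rejoin idea, assuming a random one-to-one matching done with a single permutation (this is a stand-in, not necessarily how the Random Matcher node pairs rows). The column grouping and the function name `split_and_rematch` are hypothetical. Columns inside the same group stay together, so their relationships survive, while the pairing between groups is randomized.

```python
import random

def split_and_rematch(rows, group_a, group_b, seed=None):
    """Split each row into two column groups and randomly re-pair them.

    Columns inside group_a stay together and columns inside group_b stay
    together, so correlations within a group are kept. Which group_b block
    is attached to which group_a block is decided by a random permutation.
    """
    rng = random.Random(seed)
    part_a = [{c: r[c] for c in group_a} for r in rows]
    part_b = [{c: r[c] for c in group_b} for r in rows]
    rng.shuffle(part_b)  # random one-to-one matching between the two halves
    return [{**a, **b} for a, b in zip(part_a, part_b)]

# Hypothetical sample rows based on the example above.
rows = [
    {"ID": 1, "FirstName": "Homer", "LastName": "Simpsons", "DOB": "01/01/1960", "Gender": "M", "MedicalHistory": "Asthma"},
    {"ID": 2, "FirstName": "Lisa", "LastName": "Simpsons", "DOB": "01/01/1988", "Gender": "F", "MedicalHistory": "HIV"},
    {"ID": 3, "FirstName": "Bart", "LastName": "Simpsons", "DOB": "01/01/1985", "Gender": "M", "MedicalHistory": "Mental Health"},
    {"ID": 4, "FirstName": "Maggie", "LastName": "Simpsons", "DOB": "01/01/1964", "Gender": "F", "MedicalHistory": "HIV"},
]
rematched = split_and_rematch(
    rows,
    group_a=["ID", "FirstName", "LastName"],
    group_b=["DOB", "Gender", "MedicalHistory"],
    seed=7,
)
```

Choosing the groups is the main design decision: columns you want to analyse together (here DOB, Gender and Medical History) should stay in the same group. Keeping quasi-identifiers such as DOB next to the medical data still carries some re-identification risk.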

Does this help?

Iris

Wow, that is great, thank you for your directions, richards99 and Iris.

Thanks

Modular data generation is a great tool for this. Another option would be to create an anonymized key for each unique value in a column (GroupBy > String Manipulation) and then use a Cell Replacer to replace each value with its anonymized key. This has the advantage that none of the relationships in the data change, and you can later undo the anonymization with the same workflow you used to create it, simply by switching the Cell Replacer column references.
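
A minimal Python sketch of this key-replacement approach, assuming keys of the form `<column>_<n>` numbered in order of first appearance (an illustrative choice; the GroupBy > String Manipulation > Cell Replacer chain in KNIME could build keys differently). The hypothetical `build_key_map` plays the role of GroupBy plus String Manipulation, and the hypothetical `replace_values` plays the role of the cell replacer. Undoing the anonymization is the same replacement with the dictionary's lookup direction reversed.

```python
def build_key_map(rows, column):
    """Assign an anonymized key to each unique value of `column`.

    One entry per unique value (the GroupBy step), with a key string built
    from the column name and a counter (the String Manipulation step).
    """
    mapping = {}
    for r in rows:
        value = r[column]
        if value not in mapping:
            mapping[value] = f"{column}_{len(mapping) + 1}"
    return mapping

def replace_values(rows, column, mapping):
    """Replace every value in `column` via a dictionary lookup (like a cell replacer)."""
    return [{**r, column: mapping[r[column]]} for r in rows]

# Hypothetical sample rows based on the example above.
rows = [
    {"ID": 1, "FirstName": "Homer", "LastName": "Simpsons", "MedicalHistory": "Asthma"},
    {"ID": 2, "FirstName": "Lisa", "LastName": "Simpsons", "MedicalHistory": "HIV"},
    {"ID": 3, "FirstName": "Bart", "LastName": "Simpsons", "MedicalHistory": "Mental Health"},
]

# Anonymize the name columns; Medical History is left as-is for statistics.
anonymized = rows
key_maps = {}
for col in ["FirstName", "LastName"]:
    key_maps[col] = build_key_map(rows, col)
    anonymized = replace_values(anonymized, col, key_maps[col])

# Undo: run the same replacement with each dictionary inverted (key -> value).
restored = anonymized
for col, mapping in key_maps.items():
    inverse = {key: value for value, key in mapping.items()}
    restored = replace_values(restored, col, inverse)
```

Equal values always receive the same key (here every "Simpsons" becomes "LastName_1"), so joins and group counts still line up. Anyone holding the `key_maps` dictionaries can reverse the anonymization, so they need to be stored as carefully as the original data.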