
CODA vs Data Scrubbing

Posted: Fri Jun 03, 2011 7:11 pm
by ctrpapa
While working at Microsoft I helped write a script to scrub all personally identifiable info from a database: free-text strings, phone numbers, credit info, etc.

The data wasn't scrambled or encrypted; the script actually replaced each string with whatever was desired, in no way related to the original.

The database was at least 50GB, but we were still able to rewrite all the info to another server, minus anything that would be personally identifiable, in a reasonable amount of time (less than a day).
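To give a rough idea of what I mean by replacing rather than scrambling, here is a minimal sketch in Python. It is not the script we used; the regular expressions, the scrub() function and the placeholder values are all made up for illustration, and real data would need patterns tuned to its own formats.

```python
import re

# Hypothetical patterns, for illustration only.
PHONE_RE = re.compile(r"\b\d{3}[-. ]\d{3}[-. ]\d{4}\b")
CARD_RE = re.compile(r"\b\d{4}[- ]?\d{4}[- ]?\d{4}[- ]?\d{4}\b")


def scrub(text, phone="555-555-0100", card="0000-0000-0000-0000"):
    """Replace phone and card numbers with fixed placeholders.

    The replacements are constants, so nothing about the original
    value can be recovered from the output.
    """
    # Handle card numbers first so the longer pattern is consumed
    # before the phone pattern runs over the text.
    text = CARD_RE.sub(card, text)
    text = PHONE_RE.sub(phone, text)
    return text


print(scrub("Call 425-123-4567, card 4111 1111 1111 1111."))
```

The same idea extends to names and addresses by swapping each column for a value picked from a replacement list instead of a fixed constant.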

Without knowing much about the live data, would it be possible to do something similar instead of using CODA?

Posted: Thu Jun 09, 2011 10:25 am
by TimRiker
The FORG initiative is looking into tools like this to clean live data.

CODA uses a few tables to generate data. These include things like male and female first names, last names, etc. It's pretty basic at present. CODA does not have access to any live data.
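For anyone curious what table-driven generation looks like in general, here is a small Python sketch. This is not CODA's code; the table contents, the make_person() function and the record layout are assumptions made up for the example.

```python
import random

# Hypothetical lookup tables; a real generator would load much larger lists.
MALE_FIRST = ["James", "Robert", "John"]
FEMALE_FIRST = ["Mary", "Linda", "Susan"]
LAST = ["Smith", "Jones", "Brown"]


def make_person(rng):
    """Build one synthetic person record from the lookup tables."""
    sex = rng.choice(["M", "F"])
    # Pick the first name from the table that matches the chosen sex.
    first = rng.choice(MALE_FIRST if sex == "M" else FEMALE_FIRST)
    return {"sex": sex, "first": first, "last": rng.choice(LAST)}


rng = random.Random(2011)  # a fixed seed makes the output repeatable
for _ in range(3):
    print(make_person(rng))
```

Because every value comes from the tables and a random number generator, no live data is needed at any point.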