Personally Identifiable Information (PII) redaction

When a data set is obtained from a government entity in the US, it is very likely that Personally Identifiable Information (PII) has already been redacted and has vanished without a trace.  If you search for a data set from a government entity you are, for all practical purposes, searching for a Topic (Classification Token), not doing a Full Text search as you could easily assume.

PII is generally redacted at the Agency level, so hope springs eternal in Data Miners' hearts that mistakes will be made with mash-ups of raw data.  Good luck with that.  Access to the raw data breeds corruption and favoritism; good luck not getting caught at that.

Leaks of PII are much more likely with mash-ups of Third Party data with government releases.  Although the process is straightforward, I wrote a "formalism" which might help State, County and Municipal Governments duplicate the Federal Government classification and declassification of PII, or at least better understand the issues.  The classification and declassification of PII has no effect on the RDF content.  The scope of the tokens has no impact either.  The example includes G8, Data.gov, Next.gov and FACA schemes.
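
To make the mash-up risk concrete, here is a minimal sketch in Python of how a release with names removed can still leak identities when it is joined with Third Party data.  Everything in it is hypothetical and invented for illustration: the records, the field names (zip, birth_year, sex, benefit) and the helper reidentify are my assumptions, not part of the formalism or of any real government release.

```python
# Hypothetical "redacted" release: names removed, other attributes kept.
redacted_release = [
    {"zip": "20500", "birth_year": 1961, "sex": "M", "benefit": "Program A"},
    {"zip": "20500", "birth_year": 1975, "sex": "F", "benefit": "Program B"},
    {"zip": "10001", "birth_year": 1975, "sex": "F", "benefit": "Program A"},
]

# Hypothetical Third Party data that still carries names.
third_party = [
    {"name": "Pat Example", "zip": "20500", "birth_year": 1961, "sex": "M"},
    {"name": "Lee Sample", "zip": "10001", "birth_year": 1975, "sex": "F"},
    {"name": "Kim Sample", "zip": "10001", "birth_year": 1975, "sex": "F"},
]

# Fields assumed (for this sketch only) to be present in both sources.
QUASI_IDENTIFIERS = ("zip", "birth_year", "sex")


def key(record):
    """Build the join key from the shared fields."""
    return tuple(record[field] for field in QUASI_IDENTIFIERS)


def reidentify(release, outside):
    """Return (name, release_row) pairs where the join key matches
    exactly one person in the outside data."""
    names_by_key = {}
    for person in outside:
        names_by_key.setdefault(key(person), []).append(person["name"])

    matches = []
    for row in release:
        names = names_by_key.get(key(row), [])
        if len(names) == 1:  # a unique match means the row is re-identified
            matches.append((names[0], row))
    return matches


leaks = reidentify(redacted_release, third_party)
for name, row in leaks:
    print(name, "->", row["benefit"])
```

In this toy data only the first row is re-identified: the second row has no counterpart in the Third Party data, and the third matches two people, so it stays ambiguous.  The point is only that the leak comes from the join, not from the release by itself.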


The example is here: http://www.rustprivacy.org/2013/egov/pii/

The software and data base are free, as is the use of the PII Namespace.  Write me off-line.

--Gannon

Received on Tuesday, 30 July 2013 22:43:20 UTC