Data anonymization techniques
Important terminology
- De-identification: General term for any process of removing the association between a set of identifying data and the data subject.
- Pseudonymization: Particular type of data de-identification that both removes the association with a data subject and adds an association between a particular set of characteristics relating to the data subject and one or more pseudonyms.
- Anonymization: Process that removes the association between the identifying dataset and the data subject. Anonymization is another subcategory of de-identification. Unlike pseudonymization, it does not provide a means by which the information may be linked to the same person across multiple data records or information systems. Hence reidentification of anonymized data is not possible.
Note: As defined in ISO/TS 25237:2008.
Anonymization models
k-anonymity: K-anonymity can be described as a “hiding in the crowd”. Each quasi-identifier tuple occurs in at least k records for a dataset with k-anonymity. Definition: if each individual is part of a larger group, then any of the records in this group could correspond to a single person.
l-diversity: The l-diversity model is an extension of the k-anonymity and adds the promotion of intra-group diversity for sensitive values in the anonymization mechanism. The l-diversity model handles some of the weaknesses in the k-anonymity model where protected identities to the level of k-individuals is not equivalent to protecting the corresponding sensitive values that were generalized or suppressed, especially when the sensitive values within a group exhibit homogeneity.
t-closeness: t-closeness is a further refinement of l-diversity. The t-closeness model extends the l-diversity model by treating the values of an attribute distinctly by taking into account the distribution of data values for that attribute.
Feedback
Was this page helpful?