Data Anonymization Techniques

This section describes the privacy models used as techniques for anonymizing data.

Important terminology

  • De-identification: A general term for any process of removing the association between a set of identifying data and the data subject.
  • Pseudonymization: A particular type of data de‑identification that removes the direct association with a data subject. It replaces that association by linking a specific set of characteristics to one or more pseudonyms.
  • Anonymization: A process that removes the association between the identifying dataset and the data subject. Like pseudonymization, it is a subcategory of de-identification. Unlike pseudonymization, it provides no means of linking information to the same person across multiple data records or information systems. Hence, re-identification of anonymized data is not possible.

Note: As defined in ISO/TS 25237:2008.
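The difference between pseudonymization and anonymization can be illustrated with a minimal Python sketch. It is not Protegrity's implementation. The key, the field names, and the helper functions `pseudonymize` and `drop_direct_identifiers` are hypothetical, and keyed hashing (HMAC) is only one assumed way to derive a pseudonym. Removing direct identifiers is also only the first step toward anonymization, because quasi-identifiers may remain in the data.

```python
import hashlib
import hmac

# Hypothetical key for illustration only; a real key would be managed securely.
SECRET_KEY = b"example-key-not-for-production"


def pseudonymize(identifier, key=SECRET_KEY):
    """Replace a direct identifier with a stable keyed pseudonym."""
    digest = hmac.new(key, identifier.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def drop_direct_identifiers(record, direct_identifiers):
    """Remove direct identifiers so that no per-person link remains."""
    return {k: v for k, v in record.items() if k not in direct_identifiers}


records = [
    {"name": "Alice", "visit": "2024-01-03", "diagnosis": "flu"},
    {"name": "Alice", "visit": "2024-02-10", "diagnosis": "asthma"},
    {"name": "Bob", "visit": "2024-01-05", "diagnosis": "flu"},
]

# Pseudonymized: the same person maps to the same pseudonym in every record.
pseudonymized = [{**r, "name": pseudonymize(r["name"])} for r in records]

# Identifier removed: records can no longer be linked through the name field.
without_names = [drop_direct_identifiers(r, {"name"}) for r in records]
```

In the pseudonymized output, both of Alice's records carry the same pseudonym, so they can still be linked to each other. In the second output, that link is gone.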

Protegrity Anonymization Models

  • k-anonymity: k-anonymity can be described as “hiding in the crowd”: each quasi-identifier tuple occurs in at least k records within the dataset. As a result, each individual is part of a group of at least k records, and any record in that group could correspond to that individual.

  • l-diversity: The l-diversity model extends k-anonymity by promoting diversity of sensitive values within each group. It addresses a weakness of k-anonymity: protecting identities to the level of k individuals does not by itself protect the corresponding sensitive values in the generalized or suppressed records, especially when the sensitive values within a group are homogeneous. If every record in a group shares the same sensitive value, that value is disclosed for everyone in the group even though no single record can be attributed to an individual.

  • t-closeness: t-closeness is a further refinement of l-diversity. Rather than only counting distinct sensitive values, it takes into account the distribution of values for the sensitive attribute: the distribution within each group must be within a distance t of the distribution in the whole dataset.
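The three models can be checked on a small table with the following Python sketch. It only verifies the properties and does not perform anonymization, and it is not Protegrity's implementation. The function names, column names, and sample rows are hypothetical. The l-diversity check uses the simplest variant (distinct values per group). The t-closeness check assumes total variation distance as the distance measure, a simple stand-in for categorical attributes where other measures are also used.

```python
from collections import Counter, defaultdict


def group_rows(rows, quasi_identifiers):
    """Group rows that share the same quasi-identifier tuple."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[q] for q in quasi_identifiers)].append(row)
    return groups


def is_k_anonymous(rows, quasi_identifiers, k):
    """Every quasi-identifier tuple occurs in at least k rows."""
    groups = group_rows(rows, quasi_identifiers)
    return all(len(g) >= k for g in groups.values())


def is_l_diverse(rows, quasi_identifiers, sensitive, l):
    """Every group holds at least l distinct sensitive values."""
    groups = group_rows(rows, quasi_identifiers)
    return all(len({r[sensitive] for r in g}) >= l for g in groups.values())


def distribution(rows, sensitive):
    """Relative frequency of each sensitive value."""
    counts = Counter(r[sensitive] for r in rows)
    return {value: n / len(rows) for value, n in counts.items()}


def max_group_distance(rows, quasi_identifiers, sensitive):
    """Largest total variation distance between a group and the whole table."""
    overall = distribution(rows, sensitive)
    worst = 0.0
    for g in group_rows(rows, quasi_identifiers).values():
        local = distribution(g, sensitive)
        d = 0.5 * sum(abs(local.get(v, 0.0) - p) for v, p in overall.items())
        worst = max(worst, d)
    return worst


def is_t_close(rows, quasi_identifiers, sensitive, t):
    """Every group's distribution is within distance t of the overall one."""
    return max_group_distance(rows, quasi_identifiers, sensitive) <= t


rows = [
    {"age": "30-39", "zip": "130**", "diagnosis": "flu"},
    {"age": "30-39", "zip": "130**", "diagnosis": "flu"},
    {"age": "40-49", "zip": "148**", "diagnosis": "flu"},
    {"age": "40-49", "zip": "148**", "diagnosis": "asthma"},
]
qi = ["age", "zip"]

k_ok = is_k_anonymous(rows, qi, k=2)              # both groups have 2 rows
l_ok = is_l_diverse(rows, qi, "diagnosis", l=2)   # first group has only "flu"
t_ok = is_t_close(rows, qi, "diagnosis", t=0.3)
```

The sample table is 2-anonymous but not 2-diverse: everyone in the first group has the same diagnosis, which is the homogeneity weakness that l-diversity is meant to address.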


Last modified : March 24, 2026