<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Introduction on</title><link>https://docs.protegrity.com/anonymization/1.4.1/docs/introduction/</link><description>Recent content in Introduction on</description><generator>Hugo</generator><language>en</language><lastBuildDate>Mon, 06 Apr 2026 11:24:23 +0000</lastBuildDate><atom:link href="https://docs.protegrity.com/anonymization/1.4.1/docs/introduction/index.xml" rel="self" type="application/rss+xml"/><item><title>Business cases</title><link>https://docs.protegrity.com/anonymization/1.4.1/docs/introduction/hide_intro_buscase/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://docs.protegrity.com/anonymization/1.4.1/docs/introduction/hide_intro_buscase/</guid><description>&lt;p>Consider the following business cases:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Case 1&lt;/strong>: A hospital wants to share patient data with a third-party research lab. The privacy of the patients, however, must be preserved.&lt;/li>
&lt;li>&lt;strong>Case 2&lt;/strong>: An organization requires customer data from several credit unions to create training data. The data will be used to train machine learning models in search of new insights. The customers, however, have not consented to the use of their data.&lt;/li>
&lt;li>&lt;strong>Case 3&lt;/strong>: An organization that must comply with GDPR, CCPA, or other privacy regulations needs to retain some information beyond the period that those regulations permit.&lt;/li>
&lt;li>&lt;strong>Case 4&lt;/strong>: An organization requires raw data to train its machine learning software.&lt;/li>
&lt;/ul>
&lt;p>In all these cases, data forms an integral part of the source for continuing the business process or analysis. Moreover, only &lt;em>what was done&lt;/em> is required in all the cases; &lt;em>who did it&lt;/em> adds no value to the data. Personal information about individual users can therefore be removed from the dataset. This removes the personal factor from the data while retaining its value from the business point of view. Because the resulting data no longer contains any private information, it also falls outside the scope of the legal requirements governing personal data.&lt;/p></description></item><item><title>Data security and data privacy</title><link>https://docs.protegrity.com/anonymization/1.4.1/docs/introduction/hide_data_priv/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://docs.protegrity.com/anonymization/1.4.1/docs/introduction/hide_data_priv/</guid><description>&lt;p>Most organizations understand the need to secure access to personally identifiable information. Sensitive values in records are often protected at rest (storage), in transit (network), and in use (fine-grained access control) through a process known as &lt;em>de-identification&lt;/em>. De-identification is a spectrum along which data security and data privacy must be balanced with data usability.&lt;/p>
&lt;p>&lt;img src="https://docs.protegrity.com/anonymization/1.4.1/docs/images/hide/hide_intromap.jpg" alt="" title="Data Protection Spectrum">&lt;/p>
&lt;h2 id="pseudonymization">Pseudonymization&lt;/h2>
&lt;p>Pseudonymization is the process of de-identification by substituting sensitive values with a consistent, non-sensitive value. This is most often accomplished through encryption, tokenization, or dynamic data masking. Access to the process for re-identification (decryption, detokenization, unmasking) is controlled, so that only users with a business requirement can see the sensitive values.&lt;/p></description></item><item><title>Importance and types of data</title><link>https://docs.protegrity.com/anonymization/1.4.1/docs/introduction/hide_intro_datatypes/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://docs.protegrity.com/anonymization/1.4.1/docs/introduction/hide_intro_datatypes/</guid><description>&lt;p>A record consists of all the information pertaining to a user. It contains different fields of information, such as first name, last name, address, telephone number, and age. These records might be linked with other records, such as income statements or medical records, to provide valuable information. While the record as a whole is private and user-centric, the individual fields may or may not be personal. Accordingly, based on the privacy level, the following data classifications are available:&lt;/p></description></item><item><title>Data Anonymization Techniques</title><link>https://docs.protegrity.com/anonymization/1.4.1/docs/introduction/hide_priv_types/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://docs.protegrity.com/anonymization/1.4.1/docs/introduction/hide_priv_types/</guid><description>&lt;h2 id="important-terminology">Important terminology&lt;/h2>
&lt;ul>
&lt;li>&lt;strong>De-identification&lt;/strong>:
A general term for any process of removing the association between a set of identifying data and the data subject.&lt;/li>
&lt;li>&lt;strong>Pseudonymization&lt;/strong>:
A particular type of data de‑identification that removes the direct association with a data subject. It replaces that association by linking a specific set of characteristics to one or more pseudonyms.&lt;/li>
&lt;li>&lt;strong>Anonymization&lt;/strong>:
A process that removes the association between the identifying dataset and the data subject.
Anonymization is another subcategory of de-identification. Unlike pseudonymization, it does not provide a means by which the information may be linked to the same person across multiple data records or information systems.
Hence, re-identification of anonymized data is not possible.&lt;/li>
&lt;/ul>
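&lt;p>A minimal Python sketch can illustrate the difference between these terms. The function names, key, and generalization rule below are assumptions made for the example, not part of any Protegrity API.&lt;/p>

```python
import hashlib
import hmac

# Illustrative sketch only; the key and helper names are assumptions.
SECRET_KEY = b"example-key"  # held by the data controller

def pseudonymize(name: str) -> str:
    """Replace a direct identifier with a consistent pseudonym.

    The same input always yields the same token, so records for one
    person stay linkable across systems -- the property that separates
    pseudonymization from anonymization.
    """
    return hmac.new(SECRET_KEY, name.encode(), hashlib.sha256).hexdigest()[:12]

def anonymize(record: dict) -> dict:
    """Drop the direct identifier and generalize the quasi-identifier,
    leaving nothing that links the record back to one person."""
    low = record["age"] // 10 * 10
    return {"age_range": f"{low}-{low + 9}", "diagnosis": record["diagnosis"]}

record = {"name": "Alice Smith", "age": 34, "diagnosis": "flu"}

# Pseudonyms are stable, so linkage across records remains possible.
assert pseudonymize("Alice Smith") == pseudonymize("Alice Smith")

print(anonymize(record))  # {'age_range': '30-39', 'diagnosis': 'flu'}
```

&lt;p>Re-identification of the pseudonymized value stays possible for whoever holds the key, whereas the anonymized record retains no link to the individual.&lt;/p>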
&lt;p>&lt;strong>Note&lt;/strong>: As defined in ISO/TS 25237:2008.&lt;/p></description></item><item><title>How Protegrity Anonymization Works</title><link>https://docs.protegrity.com/anonymization/1.4.1/docs/introduction/hide_work_how/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://docs.protegrity.com/anonymization/1.4.1/docs/introduction/hide_work_how/</guid><description>&lt;p>Protegrity Anonymization is a software solution that processes data by removing personal information and transforming the remaining details to protect privacy.&lt;/p>
&lt;p>In simple terms, it takes raw data as input, applies techniques like generalization and summarization, and outputs anonymized data that can still be used for analysis—without revealing individual identities. The following figure illustrates this process.&lt;/p>
&lt;p>&lt;img src="https://docs.protegrity.com/anonymization/1.4.1/docs/images/hide/hide_intro_sampleprocess.jpg" alt="" title="Basics of Protegrity Anonymization">&lt;/p>
&lt;p>As shown in the above image, a sample table is fed as input into Protegrity Anonymization. The private data that can be used to identify a particular individual is removed from the table. The final table, containing anonymized information, is provided as output. The output table shows data loss due to the column and row removals performed during Protegrity Anonymization. This data loss is necessary to mitigate the risk of re-identification.&lt;/p></description></item></channel></rss>