Anonymization SDK

Make your data unidentifiable

Anonymization is the foundation of secure data sharing. It is the irreversible destruction of identifiable information: direct identifiers and indirect (quasi-) identifiers are removed or transformed so that the data subject can no longer be identified. The process applies data privacy models to redact and transform direct and quasi-identifiers, generalizing the dataset, and returns risk and utility metrics for the result.
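The data privacy models mentioned above are easiest to understand in terms of equivalence classes: groups of records that share identical quasi-identifier values after transformation. The plain-Python sketch below is not part of the SDK. It only illustrates the two properties that the SDK's K and LDiv settings (described under Usage) target. k-anonymity requires every equivalence class to contain at least k records. The l-diversity check shown is the simplest (distinct-values) variant, which may differ from the variant the SDK applies. The function names and toy records are hypothetical.

```python
from collections import defaultdict


def equivalence_classes(records, quasi_identifiers):
    """Group records that share the same combination of quasi-identifier values."""
    classes = defaultdict(list)
    for rec in records:
        key = tuple(rec[qi] for qi in quasi_identifiers)
        classes[key].append(rec)
    return dict(classes)


def is_k_anonymous(records, quasi_identifiers, k):
    """True if every equivalence class contains at least k records."""
    classes = equivalence_classes(records, quasi_identifiers)
    return all(len(group) >= k for group in classes.values())


def is_l_diverse(records, quasi_identifiers, sensitive, l_factor):
    """Distinct l-diversity: every class holds at least l_factor distinct sensitive values."""
    classes = equivalence_classes(records, quasi_identifiers)
    return all(len({rec[sensitive] for rec in group}) >= l_factor
               for group in classes.values())


# Hypothetical, already-generalized toy records: age band and ZIP prefix are
# the quasi-identifiers, loan_status is the sensitive attribute.
customers = [
    {"age": "30-39", "zip": "021**", "loan_status": "current"},
    {"age": "30-39", "zip": "021**", "loan_status": "default"},
    {"age": "40-49", "zip": "021**", "loan_status": "current"},
    {"age": "40-49", "zip": "021**", "loan_status": "current"},
]
QIS = ["age", "zip"]

print(is_k_anonymous(customers, QIS, k=2))                       # True
print(is_l_diverse(customers, QIS, "loan_status", l_factor=2))   # False: the 40-49 class has one status
```

The toy data is 2-anonymous yet fails 2-diversity: anyone who knows a customer is in the 40-49 class learns their loan status, which is why l-diversity is layered on top of k.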

This section showcases the Protegrity Anonymization SDK: its transformation methods, available privacy models, and statistical features. Note that the Anonymization SDK has been adapted for public use and is therefore not an exact representation of the generally available (GA) product. The SDK is a Python library that you can import into your data science environment, such as Jupyter Notebook. The modified SDK requires a valid connection to the API Playground. For information about requesting access to the Private Endpoints, refer to our documentation.

Usage

SDK: Email us at playground@protegrity.com to request the SDK.

METHOD: POST, GET

ENDPOINT: https://anon.playground.protegrity.com

FUNCTIONS:

[lib].Connection(url, api_key, jwt_token) Establish a connection to the Anonymization cluster. Set url to the Anonymization endpoint, api_key to your group API key, and jwt_token to your token.

[lib].AnonElement(cluster, data, pty_storage=False) A dataframe object that represents the dataset to transform. In the job configuration, you call transformations on each attribute of the dataframe. Set cluster to the Anonymization cluster you authenticated to, set data to your dataframe, and keep pty_storage as False.

[lib].K(num) assign to the dataframe to set the k value.

[lib].Redact() assign to an attribute to redact it.

[lib].Preserve() assign to an attribute to preserve it.

[lib].Gen_Interval([level, [level]], [importance = num]) specify the interval levels to which continuous data is generalized. Assign it to an attribute to transform that attribute. Optionally, specify its importance: set it high to instruct the algorithm to maintain the data granularity where possible, or low to instruct the opposite.

[lib].Gen_Tree(pd.DataFrame(data=tree), [importance = num]) specify the hierarchy to which categorical data is generalized. Assign it to an attribute to transform that attribute. Optionally, specify its importance: set it high to instruct the algorithm to maintain the data granularity where possible, or low to instruct the opposite.

[lib].MicroAgg(askdk.AggregateFunction.formula, [importance = num]) specify the aggregate function used to summarize each group of records. Assign it to an attribute to transform that attribute. Use mean or median for numeric data types (integer and decimal), or mode for any data type. Optionally, specify its importance: set it high to instruct the algorithm to maintain the data granularity where possible, or low to instruct the opposite.

[lib].Gen_Mask(maskchar=mask, [importance = num]) specify the mask used to obfuscate data. Assign it to an attribute to transform that attribute, and provide the masking character as mask. Optionally, specify its importance: set it high to instruct the algorithm to maintain the data granularity where possible, or low to instruct the opposite.

[lib].Gen_Rounding([level, [level]], [importance = num]) specify the groups to round the data to. Both date-based and number-based rounding are supported. Optionally, specify its importance: set it high to instruct the algorithm to maintain the data granularity where possible, or low to instruct the opposite.

[lib].LDiv(lfactor=num) set the l value of the sensitive attribute as lfactor to maintain l-diversity within each equivalence class, so that every class contains at least l well-represented values of the sensitive attribute.

[obj].config.k the k-value setting of the dataframe (update it with [lib].K(num)).

[obj].config['maxSuppression'] specify the maximum fraction of records that may be removed (suppressed) from the dataset to achieve the set privacy goals.

[obj].assign(list, [lib].function) apply the same [lib].function transformation to every attribute passed in list.

[obj].describe list the characteristics and assigned transformations of each attribute in the dataframe.

[lib].anonymize(obj, pty_storage=False) run the anonymization job. Set obj to your annotated dataframe and keep pty_storage as False. Assign the return value to a variable to reference it later as a job object.

[job].status() monitor the status of the anonymization job.

[job].result() return the result of the anonymization job.

[job].riskStat() return the risk and utility statistics associated with the anonymization job.
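The transformation functions listed above run on the Anonymization cluster, so their exact behavior is defined by the SDK. As a rough mental model only, the following plain-Python sketch shows what each family of transformation does to individual values: interval generalization (Gen_Interval), hierarchy generalization (Gen_Tree), masking (Gen_Mask), number and date rounding (Gen_Rounding), and micro-aggregation (MicroAgg). All function names, the city hierarchy, the right-to-left masking order, the nearest-multiple rounding rule, and the grouping strategy are assumptions made for illustration.

```python
import statistics
from datetime import date

# Hypothetical stand-ins for the SDK transformations; the real Gen_Interval,
# Gen_Tree, Gen_Mask, Gen_Rounding, and MicroAgg may behave differently.


def gen_interval(value, width):
    """Generalize a number to the half-open interval [low, low + width) containing it."""
    low = (value // width) * width
    return f"[{low}, {low + width})"


# Hypothetical hierarchy: each leaf lists its ancestors from level 0 (the value
# itself) up to the most general level.
CITY_HIERARCHY = {
    "Boston": ["Boston", "Massachusetts", "USA"],
    "Worcester": ["Worcester", "Massachusetts", "USA"],
    "Toronto": ["Toronto", "Ontario", "Canada"],
}


def gen_tree(value, level, hierarchy=CITY_HIERARCHY):
    """Replace a categorical value with its ancestor at the given level."""
    return hierarchy[value][level]


def gen_mask(value, level, maskchar="*"):
    """Mask the last `level` characters (assumed order), e.g. reducing a ZIP code to a prefix."""
    s = str(value)
    level = min(level, len(s))
    return s[: len(s) - level] + maskchar * level


def gen_round_number(value, base):
    """Round to the nearest multiple of base (Python's round sends ties to the even multiple)."""
    return base * round(value / base)


def gen_round_date(d, unit):
    """Truncate a date to the first day of its month or year."""
    if unit == "month":
        return date(d.year, d.month, 1)
    if unit == "year":
        return date(d.year, 1, 1)
    raise ValueError(f"unsupported unit: {unit}")


def micro_aggregate(values, group_size, func=statistics.mean):
    """Replace each value with the aggregate (mean, median, or mode) of its group.

    Values are sorted and cut into consecutive groups of group_size; an undersized
    final group is merged into the previous one so every group has at least
    group_size members. The original order of values is preserved.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    groups = [order[s:s + group_size] for s in range(0, len(order), group_size)]
    if len(groups) > 1 and len(groups[-1]) < group_size:
        tail = groups.pop()
        groups[-1].extend(tail)
    result = [None] * len(values)
    for idx in groups:
        agg = func([values[i] for i in idx])
        for i in idx:
            result[i] = agg
    return result


print(gen_interval(37, 10))                         # [30, 40)
print(gen_tree("Boston", 1))                        # Massachusetts
print(gen_mask("02139", 2))                         # 021**
print(gen_round_number(47, 5))                      # 45
print(gen_round_date(date(2024, 7, 19), "month"))   # 2024-07-01
print(micro_aggregate([20000, 52000, 31000, 49000, 25000], 2))  # [22500, 44000, 44000, 44000, 22500]
```

In the SDK you do not call such helpers yourself. Instead, you assign the corresponding [lib] transformation to each attribute, and the levels and importance values you specify guide the algorithm toward the configured k, l, and maxSuppression goals.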

Example

You can use the provided examples to get a feel for how the Anonymization SDK works with data. The sample processes a dataset of synthetic banking customers, constructs an anonymization job, and measures the utility and re-identification risk of the resulting dataset.

Download the source dataset

Download the preconfigured Jupyter Notebook

View the Jupyter Notebook (HTML)
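The example notebook reports re-identification risk through [job].riskStat(). The exact metrics that method returns are defined by the SDK. The sketch below shows only two widely used measures under the prosecutor attacker model, in which a record's risk is 1 divided by the size of its equivalence class. It also shows suppression of undersized classes, capped by a maximum fraction in the spirit of config['maxSuppression']. The helper names and toy records are hypothetical.

```python
from collections import Counter


def class_sizes(records, quasi_identifiers):
    """Count how many records fall into each equivalence class."""
    return Counter(tuple(r[q] for q in quasi_identifiers) for r in records)


def prosecutor_risk(records, quasi_identifiers):
    """Return (max_risk, avg_risk), where a record's risk is 1 / size of its class."""
    sizes = class_sizes(records, quasi_identifiers)
    max_risk = 1 / min(sizes.values())
    # Averaging 1/|class| over all records simplifies to (#classes / #records).
    avg_risk = len(sizes) / len(records)
    return max_risk, avg_risk


def suppress_small_classes(records, quasi_identifiers, k, max_suppression):
    """Drop records in classes smaller than k; fail if that exceeds max_suppression."""
    sizes = class_sizes(records, quasi_identifiers)
    kept = [r for r in records
            if sizes[tuple(r[q] for q in quasi_identifiers)] >= k]
    fraction = 1 - len(kept) / len(records)
    if fraction > max_suppression:
        raise ValueError(
            f"suppressing {fraction:.1%} of records exceeds the {max_suppression:.1%} limit")
    return kept, fraction


# Hypothetical generalized records; only the quasi-identifiers are shown.
customers = [
    {"age": "30-39", "state": "MA"},
    {"age": "30-39", "state": "MA"},
    {"age": "30-39", "state": "MA"},
    {"age": "40-49", "state": "NY"},
    {"age": "40-49", "state": "NY"},
    {"age": "50-59", "state": "NY"},
]
QIS = ["age", "state"]

print(prosecutor_risk(customers, QIS))   # (1.0, 0.5): the 50-59 record is unique
kept, fraction = suppress_small_classes(customers, QIS, k=2, max_suppression=0.2)
print(len(kept), round(fraction, 3))     # 5 0.167
print(prosecutor_risk(kept, QIS))        # (0.5, 0.4)
```

Suppressing the single unique record halves the maximum risk, at the cost of about 17% of the rows; a lower suppression limit would force more generalization instead.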



Last modified June 27, 2025