Anonymization SDK
Anonymization is the foundation of secure data sharing. It is the irreversible destruction of identifiable data: direct identifiers and indirect (quasi-) identifiers are removed so that the data subject is no longer identifiable. The process uses data privacy models to redact and transform direct and quasi-identifiers, generalizing the dataset, and produces risk and utility metrics in return.
This section showcases the Protegrity Anonymization SDK, its transformation methods, the available privacy models, and its statistical features. Note that the Anonymization SDK has been adjusted for public use, so it is not an exact representation of the GA product. The SDK is a Python library that can be imported into your data science environment, such as Jupyter Notebook. The modified SDK requires a valid connection to the API Playground. For more information about requesting access to the Private Endpoints, please refer to our documentation.
Usage
SDK: Email us at playground@protegrity.com to request the SDK.
METHOD: POST, GET
ENDPOINT: https://anon.playground.protegrity.com
FUNCTIONS:
[lib].Connection(url, api_key, jwt_token)
Establish a connection to the Anonymization cluster. Set url to the Anonymization endpoint, api_key to your group API Key, and jwt_token to your token.
[lib].AnonElement(cluster, data, pty_storage=False)
A dataframe object that represents the dataset to transform. In the job configuration, you call transformations on each attribute of the dataframe. Set cluster to the Anonymization cluster you authenticated to and data to your dataframe, and keep pty_storage as False.
[lib].K(num)
assign to the dataframe to set the k-value.
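To make the k-value concrete, here is a minimal sketch, independent of the SDK, of what k-anonymity means: every combination of quasi-identifier values must be shared by at least k records. The function names and the sample rows are hypothetical and are not part of the Anonymization SDK.

```python
from collections import Counter


def smallest_class_size(rows, quasi_identifiers):
    """Return the size of the smallest equivalence class.

    An equivalence class is the set of rows that share the same values
    for every quasi-identifier. Assumes rows is not empty.
    """
    classes = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return min(classes.values())


def is_k_anonymous(rows, quasi_identifiers, k):
    """A dataset is k-anonymous if every equivalence class has at least k rows."""
    return smallest_class_size(rows, quasi_identifiers) >= k


# Hypothetical, already generalized sample data.
rows = [
    {"age": "30-39", "zip": "100**"},
    {"age": "30-39", "zip": "100**"},
    {"age": "40-49", "zip": "100**"},
    {"age": "40-49", "zip": "100**"},
    {"age": "40-49", "zip": "100**"},
]

print(smallest_class_size(rows, ["age", "zip"]))  # 2
print(is_k_anonymous(rows, ["age", "zip"], k=2))  # True
print(is_k_anonymous(rows, ["age", "zip"], k=3))  # False
```

A higher k forces larger groups of indistinguishable records, which usually lowers re-identification risk at the cost of data utility.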
[lib].Redact()
assign to an attribute to redact it.
[lib].Preserve()
assign to an attribute to preserve it.
[lib].Gen_Interval([level, [level]], [importance = num])
specify the interval levels to generalize continuous data to; assign it to an attribute to transform. Optionally, specify its importance: when set high, the algorithm maintains the data granularity where possible; when set low, it does the opposite.
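As an illustration of interval generalization in general, the sketch below maps numbers to progressively wider intervals. It assumes each level is an interval width, which may not match how the SDK interprets its level arguments; the helper name is hypothetical.

```python
def generalize_to_interval(value, width):
    """Map a number to the half-open interval [lower, lower + width) that contains it."""
    lower = (value // width) * width
    return f"[{lower}, {lower + width})"


ages = [23, 37, 41, 58]

# Wider intervals hide more detail: each width acts as one generalization level.
for width in (5, 10, 20):
    print(width, [generalize_to_interval(age, width) for age in ages])
```

With a width of 10, for example, the age 37 becomes "[30, 40)", so it can no longer be told apart from any other age in that range.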
[lib].Gen_Tree(pd.DataFrame(data=tree), [importance = num])
specify the hierarchy to generalize categorical data to; assign it to an attribute to transform. Optionally, specify its importance: when set high, the algorithm maintains the data granularity where possible; when set low, it does the opposite.
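The idea of a generalization hierarchy can be sketched as a table in which each column is one level, from the original value to the most general one. The layout below (column names level0, level1, level2) is an assumption made for illustration; the exact structure the SDK expects for the tree is not described here.

```python
# Hypothetical hierarchy: each position across the lists is one path
# from a leaf value (level0) up to the most general value (level2).
tree = {
    "level0": ["Paris", "Lyon", "Berlin", "Munich"],
    "level1": ["France", "France", "Germany", "Germany"],
    "level2": ["Europe", "Europe", "Europe", "Europe"],
}


def generalize(value, tree, level):
    """Replace a leaf value with its ancestor at the requested level."""
    index = tree["level0"].index(value)  # raises ValueError for unknown values
    return tree[f"level{level}"][index]


for level in range(3):
    print(level, [generalize(city, tree, level) for city in tree["level0"]])
```

A dictionary of equal-length lists like this one is also a valid input for pd.DataFrame(data=tree), which is the form shown in the function signature above.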
[lib].MicroAgg(askdk.AggregateFunction.formula, , [importance = num])
specify a mathematical formula to group the data; assign it to an attribute to transform. Choose between mean and median for numeric data types (integer and decimal), or mode for all data types. Optionally, specify its importance: when set high, the algorithm maintains the data granularity where possible; when set low, it does the opposite.
[lib].Gen_Mask(maskchar=mask, [importance = num])
specify the mask to obfuscate data; assign it to an attribute to transform. Provide the masking character as mask. Optionally, specify its importance: when set high, the algorithm maintains the data granularity where possible; when set low, it does the opposite.
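Masking as a generalization technique can be sketched as replacing more and more characters with the masking character. The sketch below masks from the end of the value, which is an assumption; which characters the SDK masks, and in what order, is not described here. The function name is hypothetical.

```python
def mask_suffix(value, masked_chars, maskchar="*"):
    """Replace the last masked_chars characters of value with maskchar."""
    if masked_chars <= 0:
        return value
    keep = max(len(value) - masked_chars, 0)
    return value[:keep] + maskchar * (len(value) - keep)


zip_code = "10027"

# Each additional masked character is one step up in generalization.
for masked in range(len(zip_code) + 1):
    print(masked, mask_suffix(zip_code, masked))
```

Masking two characters turns "10027" into "100**", so all ZIP codes that start with 100 fall into the same group.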
[lib].Gen_Rounding([level, [level]], [importance = num])
specify the groups to round the data to. Both date-based and number-based rounding are supported. Optionally, specify its importance: when set high, the algorithm maintains the data granularity where possible; when set low, it does the opposite.
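The two kinds of rounding can be sketched with the standard library. This is an illustration of the general idea only: the step sizes, the unit names, and the choice to truncate dates to the start of a period are assumptions, not the SDK's documented behavior.

```python
from datetime import date


def round_number(value, step):
    """Round value to the nearest multiple of step.

    Note: Python's round() resolves exact ties to the nearest even number.
    """
    return round(value / step) * step


def round_date(day, unit):
    """Truncate a date to the first day of its month or year."""
    if unit == "month":
        return day.replace(day=1)
    if unit == "year":
        return day.replace(month=1, day=1)
    raise ValueError(f"unsupported unit: {unit}")


print(round_number(1234, 100))                  # 1200
print(round_number(1278, 100))                  # 1300
print(round_date(date(2025, 6, 27), "month"))   # 2025-06-01
print(round_date(date(2025, 6, 27), "year"))    # 2025-01-01
```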
[lib].LDiv(lfactor=num)
configure the l value of the sensitive attribute as l to maintain l-diversity within each equivalence class.
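To illustrate what l-diversity requires, the sketch below checks the simplest variant, distinct l-diversity: each equivalence class must contain at least l different values of the sensitive attribute. Whether the SDK uses this variant or another one is not stated here, and the function names and sample rows are hypothetical.

```python
from collections import defaultdict


def min_distinct_sensitive(rows, quasi_identifiers, sensitive):
    """Return the smallest number of distinct sensitive values in any equivalence class."""
    classes = defaultdict(set)
    for row in rows:
        key = tuple(row[q] for q in quasi_identifiers)
        classes[key].add(row[sensitive])
    return min(len(values) for values in classes.values())


def is_l_diverse(rows, quasi_identifiers, sensitive, l):
    """Distinct l-diversity: every class has at least l distinct sensitive values."""
    return min_distinct_sensitive(rows, quasi_identifiers, sensitive) >= l


# Hypothetical sample data with one quasi-identifier and one sensitive attribute.
rows = [
    {"age": "30-39", "diagnosis": "flu"},
    {"age": "30-39", "diagnosis": "asthma"},
    {"age": "40-49", "diagnosis": "flu"},
    {"age": "40-49", "diagnosis": "flu"},
    {"age": "40-49", "diagnosis": "diabetes"},
]

print(min_distinct_sensitive(rows, ["age"], "diagnosis"))  # 2
print(is_l_diverse(rows, ["age"], "diagnosis", l=2))       # True
print(is_l_diverse(rows, ["age"], "diagnosis", l=3))       # False
```

Without this check, a k-anonymous class in which every record has the same diagnosis would still reveal that diagnosis for everyone in the class.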
[obj].config.k
k-value setting of the dataframe (needs to be updated with [lib].K(num)).
[obj].config['maxSuppression']
specify the maximum fraction of records allowed to be removed from the dataset to achieve the set privacy goals.
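The role of a suppression limit can be sketched as follows: records in equivalence classes smaller than k are removed, but only if the removed fraction stays within the allowed maximum. This is a simplified illustration under that assumption, not the SDK's algorithm; the function name and sample data are hypothetical.

```python
from collections import Counter


def suppress_small_classes(rows, quasi_identifiers, k, max_suppression):
    """Drop rows from classes smaller than k if the suppression limit allows it.

    Returns (kept_rows, suppressed_fraction). kept_rows is None when the
    fraction of rows that would have to be removed exceeds max_suppression.
    Assumes rows is not empty.
    """
    def class_key(row):
        return tuple(row[q] for q in quasi_identifiers)

    sizes = Counter(class_key(row) for row in rows)
    kept = [row for row in rows if sizes[class_key(row)] >= k]
    fraction = (len(rows) - len(kept)) / len(rows)
    if fraction > max_suppression:
        return None, fraction
    return kept, fraction


# Hypothetical data: classes of size 4, 5, and 1 (10 rows in total).
rows = (
    [{"age": "30-39"}] * 4
    + [{"age": "40-49"}] * 5
    + [{"age": "90-99"}]
)

kept, fraction = suppress_small_classes(rows, ["age"], k=2, max_suppression=0.1)
print(len(kept), fraction)  # 9 0.1

kept, fraction = suppress_small_classes(rows, ["age"], k=2, max_suppression=0.05)
print(kept, fraction)       # None 0.1
```

When the limit is too strict to reach the privacy goal by suppression alone, the remaining option is to generalize the data further.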
[obj].assign(list, [lib].function)
apply the same function, given as [lib].function, to multiple attributes passed in a list.
[obj].describe
list the characteristics and assigned transformations of each attribute in the dataframe.
[lib].anonymize(obj, pty_storage=False)
run the anonymization job. Set obj to your annotated dataframe, and keep pty_storage as False. When the return value is assigned to a variable, it can be referenced later as a job object.
[job].status()
monitor the status of the anonymization job.
[job].result()
return the result of the anonymization job.
[job].riskStat()
return the risk and utility statistics associated with the anonymization job.
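As background for reading risk statistics, a common way to estimate re-identification risk is to take 1 divided by the size of a record's equivalence class. The sketch below computes the maximum and average of that value. It is an assumption that this resembles the SDK's metrics, since the exact statistics returned by riskStat() are not listed here; the function name and result keys are hypothetical.

```python
from collections import Counter


def reidentification_risk(rows, quasi_identifiers):
    """Estimate per-record risk as 1 / (size of the record's equivalence class)."""
    keys = [tuple(row[q] for q in quasi_identifiers) for row in rows]
    sizes = Counter(keys)
    risks = [1 / sizes[key] for key in keys]
    return {"max_risk": max(risks), "avg_risk": sum(risks) / len(risks)}


# Hypothetical data: one class of 2 records and one class of 4 records.
rows = [{"age": "30-39"}] * 2 + [{"age": "40-49"}] * 4

stats = reidentification_risk(rows, ["age"])
print(stats["max_risk"])            # 0.5
print(round(stats["avg_risk"], 3))  # 0.333
```

Under this estimate, a k-anonymous dataset has a maximum risk of at most 1/k, which is one way to relate the k-value to the reported risk.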
Example
You can use the provided examples to get a feel for how the Anonymization SDK works with data. The sample processes a dataset of synthetic banking customers, constructs an anonymization job, and measures the utility and re-identification risk of the resulting dataset.
Download the preconfigured Jupyter Notebook
View the Jupyter Notebook (HTML)
Last modified June 27, 2025