This section documents private endpoints made available on the Protegrity API Playground. The private endpoints are the low-level APIs that closely resemble the product’s functionality. The endpoints allow for more flexibility and customization than the rest of the set. There are no performance limitations imposed (other than the network lag).
Access to the endpoints can be requested through your Protegrity contact (Sales Engineer, Field Engineer, or Customer Success Manager). Running the endpoints requires an additional authentication key that will be set up specifically for your organization and project.
Protect any data. Specify the data element and user name to apply during the transformation.
ATTRIBUTES:
data(required) Input data to transform.
data_element(required) Data element to apply when transforming data. Consult the Policy Definition section for the list of supported data elements and their characteristics.
user(required) Choose a user to impersonate from the list of pre-configured roles.
encoding(optional) Type of encoding used for input (and output) data. Note that all encodings are supported for all data elements with the exception of utf8 that cannot be used with encryption data elements. Accepts: [ hex | base64 | utf8 ]. Defaults to utf8.
external_iv(optional) Provide your custom initialization vector to introduce additional variance in the output. The IV may be a number, letter, special character, or a combination of those. Learn more about the IVs in the Key Concepts section. Note that to unprotect the data back to its original value, the external_iv has to be provided in the payload.
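A minimal sketch of how a protect request body could be assembled from the attributes above. The helper names `build_protect_payload` and `to_base64`, the `<user>` placeholder, and the sample values are assumptions made for illustration. This section does not state whether data is a single value or a list, so the sketch passes it through unchanged, and it only builds the body without sending it.

```python
import base64
import json

ALLOWED_ENCODINGS = {"hex", "base64", "utf8"}


def build_protect_payload(data, data_element, user, encoding="utf8", external_iv=None):
    """Assemble a request body from the documented attributes (hypothetical helper)."""
    if encoding not in ALLOWED_ENCODINGS:
        raise ValueError(f"encoding must be one of {sorted(ALLOWED_ENCODINGS)}")
    payload = {"data": data, "data_element": data_element, "user": user}
    # utf8 is the documented default, so it is only sent when overridden.
    if encoding != "utf8":
        payload["encoding"] = encoding
    if external_iv is not None:
        payload["external_iv"] = external_iv
    return payload


def to_base64(text):
    """Encode text as base64, since utf8 cannot be used with encryption data elements."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


# Tokenization data element with an external IV (illustrative values).
name_request = build_protect_payload("Peregrine Grey", "name", "<user>", external_iv="abc123")

# Encryption data element: the input is base64-encoded before sending.
comment_request = build_protect_payload(
    to_base64("Please call me back."), "long_text", "<user>", encoding="base64"
)

print(json.dumps(name_request, indent=2))
print(json.dumps(comment_request, indent=2))
```

Keeping the encoding step next to the payload builder makes it harder to send utf8 input to an encryption data element by accident.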
Unprotect previously de-identified data, i.e. reverse it to its original state and clear any transformations. Specify the data element and user name to apply during the transformation.
ATTRIBUTES:
data(required) Input data to transform.
data_element(required) Data element to apply when transforming data. Consult the Policy Definition section for the list of supported data elements and their characteristics.
user(required) Choose a user to impersonate from the list of pre-configured roles.
encoding(optional) Type of encoding used for input (and output) data. Note that all encodings are supported for all data elements with the exception of utf8 that cannot be used with encryption data elements. Accepts: [ hex | base64 | utf8 ]. Defaults to utf8.
external_iv(optional) Provide the custom initialization vector used to protect the data. Learn more about the IVs in the Key Concepts section.
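Because the external_iv used for protection must be supplied again to recover the original value, one way to avoid a mismatch is to derive the unprotect body from the protect body. The following is a sketch under assumptions: `build_unprotect_payload` is a hypothetical helper, and the user and token values are placeholders rather than real output.

```python
import json


def build_unprotect_payload(protect_payload, protected_data, user=None):
    """Reuse data_element, encoding, and external_iv from the original protect request."""
    payload = dict(protect_payload)   # copy, so the original request is left untouched
    payload["data"] = protected_data  # the token returned by the protect call
    if user is not None:
        payload["user"] = user        # unprotecting may require a different role
    return payload


protect_request = {
    "data": "Peregrine Grey",
    "data_element": "name",
    "user": "<protecting_user>",
    "external_iv": "abc123",
}

unprotect_request = build_unprotect_payload(
    protect_request, "<protected_value>", user="<unprotecting_user>"
)
print(json.dumps(unprotect_request, indent=2))
```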
Detect and classify sensitive data in a given input. The endpoint returns a classification and a confidence score for every sensitive attribute found, alongside its location: a column name or a start and end index. The confidence score ranges from 0.1 to 1. The higher the score, the more confident the product is in the PII classification it produced. We recommend using the confidence score to prioritize inspection of the sensitive data found.
The endpoint does not take any attributes and accepts plain text as payload.
HEADERS:
Content-Type(required) Format of data sent in the payload. Set to text/plain for processing unstructured text. Accepts: [ text/plain ].
SAMPLE REQUEST
curl --location 'https://api.playground.protegrity.com/v1/private/classify' \
--header 'x-api-key: <Group_API_Key>' \
--header 'Content-Type: text/plain' \
--header 'Authorization: <JWT_TOKEN>' \
--data 'Hello, this is Peregrine Grey from Air Industries, could you give me a call back to my mobile number 212-456-7890. Have a lovely day!'
import requests

JWT_Token = "<JWT_TOKEN>"
API_Key = "<Group_API_Key>"
url = 'https://api.playground.protegrity.com/v1/private/classify'
headers = {
    'x-api-key': API_Key,
    'Content-Type': 'text/plain',
    'Authorization': JWT_Token
}
# The endpoint accepts plain text, so the payload is sent as a raw string.
data = "Hello, this is Peregrine Grey from Air Industries, could you give me a call back to my mobile number 212-456-7890. Have a lovely day!"
response = requests.post(url, headers=headers, data=data.encode('utf-8'))
print(response.text)
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URL;
import java.io.OutputStream;
import java.io.InputStreamReader;
import java.io.BufferedReader;

public class APIRequest {
    public static void main(String[] args) {
        try {
            String JWT_Token = "<JWT_TOKEN>";
            String API_Key = "<Group_API_Key>";
            URI uri = new URI("https://api.playground.protegrity.com/v1/private/classify");
            URL url = uri.toURL();
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            conn.setRequestMethod("POST");
            conn.setRequestProperty("x-api-key", API_Key);
            conn.setRequestProperty("Content-Type", "text/plain");
            conn.setRequestProperty("Authorization", JWT_Token);
            conn.setDoOutput(true);
            // The endpoint accepts plain text, so the payload is sent as a raw string.
            String inputText = "Hello, this is Peregrine Grey from Air Industries, could you give me a call back to my mobile number 212-456-7890. Have a lovely day!";
            try (OutputStream os = conn.getOutputStream()) {
                byte[] input = inputText.getBytes("utf-8");
                os.write(input, 0, input.length);
            }
            // Read the response body line by line.
            try (BufferedReader br = new BufferedReader(new InputStreamReader(conn.getInputStream(), "utf-8"))) {
                StringBuilder response = new StringBuilder();
                String responseLine = null;
                while ((responseLine = br.readLine()) != null) {
                    response.append(responseLine.trim());
                }
                System.out.println(response.toString());
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
fetch('https://api.playground.protegrity.com/v1/private/classify', {
  method: 'POST',
  headers: {
    'x-api-key': '<Group_API_Key>',
    'Content-Type': 'text/plain',
    'Authorization': '<JWT_TOKEN>'
  },
  body: "Hello, this is Peregrine Grey from Air Industries, could you give me a call back to my mobile number 212-456-7890. Have a lovely day!"
})
  .then(response => response.json())
  .then(data => console.log(data))
  .catch(error => console.error('Error:', error));
package main

import (
	"fmt"
	"io"
	"net/http"
	"strings"
)

func main() {
	JWT_Token := "<JWT_TOKEN>"
	API_Key := "<Group_API_Key>"
	url := "https://api.playground.protegrity.com/v1/private/classify"
	// The endpoint accepts plain text, so the payload is sent as a raw string.
	data := strings.NewReader("Hello, this is Peregrine Grey from Air Industries, could you give me a call back to my mobile number 212-456-7890. Have a lovely day!")

	req, err := http.NewRequest("POST", url, data)
	if err != nil {
		fmt.Println(err)
		return
	}
	req.Header.Set("x-api-key", API_Key)
	req.Header.Set("Content-Type", "text/plain")
	req.Header.Set("Authorization", JWT_Token)

	client := &http.Client{}
	resp, err := client.Do(req)
	if err != nil {
		fmt.Println(err)
		return
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(string(body))
}
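The recommendation to prioritize inspection by confidence score could be applied along the following lines. This is only a sketch: the response schema is not shown in this section, so the field names (`label`, `score`, `start`, `end`), the labels, the scores, and the `review_threshold` value are all invented for illustration and would need to be mapped from the actual response.

```python
text = ("Hello, this is Peregrine Grey from Air Industries, could you give me a call "
        "back to my mobile number 212-456-7890. Have a lovely day!")


def located(label, value, score):
    """Build one illustrative finding with start/end indexes computed from the text."""
    start = text.index(value)
    return {"label": label, "score": score, "start": start, "end": start + len(value)}


# Hypothetical findings; the scores are made up, not real classifier output.
findings = [
    located("ORGANIZATION", "Air Industries", 0.45),
    located("PHONE_NUMBER", "212-456-7890", 0.95),
    located("PERSON", "Peregrine Grey", 0.85),
]


def prioritize(findings, review_threshold=0.8):
    """Sort findings by descending score and split them at the review threshold."""
    ordered = sorted(findings, key=lambda f: f["score"], reverse=True)
    urgent = [f for f in ordered if f["score"] >= review_threshold]
    later = [f for f in ordered if f["score"] < review_threshold]
    return urgent, later


urgent, later = prioritize(findings)
for finding in urgent:
    print(finding["label"], finding["score"], text[finding["start"]:finding["end"]])
```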
You can pass multiple payloads within your request to the protect and unprotect endpoints, mixing together data formats and transformation types. In order to do that, your requests need to be structured in the following manner:
TOP LEVEL ATTRIBUTES
user(required) Choose a user to impersonate from the list of pre-configured roles.
arguments(optional) A nesting element used for passing multiple protection requests as an array.
NESTED ATTRIBUTES
The following attributes have to be provided for every payload sent:
id(optional) ID or label of a single payload, used for logging purposes. It will be appended to the query_id field.
data(required) Input data to transform.
data_element(required) Data element to apply when transforming data. Consult the Policy Definition section for the list of supported data elements and their characteristics.
encoding(optional) Type of encoding used for input (and output) data. Note that all encodings are supported for all data elements with the exception of utf8 that cannot be used with encryption data elements. Accepts: [ hex | base64 | utf8 ]. Defaults to utf8.
external_iv(optional) Provide your custom initialization vector to introduce additional variance in the output. The IV may be a number, letter, special character, or a combination of those. Learn more about the IVs in the Key Concepts section. Note that to unprotect the data back to its original value, the external_iv has to be provided in the payload.
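The structure described above, with user at the top level and one nested entry per payload under arguments, might be built as follows. This is a sketch under assumptions: `build_bulk_payload` is a hypothetical helper, the `<user>` placeholder and sample values are illustrative, and the body is only constructed, not sent.

```python
import json

REQUIRED_NESTED = {"data", "data_element"}


def build_bulk_payload(user, items):
    """Wrap several protection requests in one body (hypothetical helper)."""
    arguments = []
    for position, item in enumerate(items):
        missing = REQUIRED_NESTED - set(item)
        if missing:
            raise ValueError(f"payload {position} is missing {sorted(missing)}")
        arguments.append(dict(item))
    return {"user": user, "arguments": arguments}


bulk_request = build_bulk_payload(
    "<user>",
    [
        # Tokenization data elements, default utf8 encoding.
        {"id": "customer-name", "data": "Peregrine Grey", "data_element": "name"},
        {"id": "customer-city", "data": "London", "data_element": "city", "external_iv": "abc123"},
        # Encryption data element: input is hex-encoded.
        {
            "id": "customer-note",
            "data": "Please call me back.".encode("utf-8").hex(),
            "data_element": "long_text",
            "encoding": "hex",
        },
    ],
)
print(json.dumps(bulk_request, indent=2))
```

Validating the required nested attributes before sending gives a clearer error than a rejected request would.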
This section describes the Policy configuration used by the API Playground. All listed data elements can be used when calling the private endpoints.
Policy Definition
Generic Data Elements
| Data Element | Method | Use Case | UTF Set | LP | PP | eIV | Admin P | Admin U | Finance P | Finance U | Marketing P | Marketing U | HR P | HR U |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| datetime | Tokenization | A date or datetime string. Formats accepted: YYYY/MM/DD HH:MM:SS and YYYY/MM/DD. Delimiters accepted: /, - (required). | N/A | N/A | N/A | No | ✓ | X | X | X | X | ✓ | X | X |
| datetime_yc | Tokenization | A date or datetime string. Formats accepted: YYYY/MM/DD HH:MM:SS and YYYY/MM/DD. Delimiters accepted: /, - (required). Leaves the year in the clear. | N/A | N/A | N/A | No | ✓ | X | X | X | X | ✓ | X | X |
| int | Tokenization | An integer string (4 bytes). | Numeric | No | No | Yes | ✓ | X | X | X | X | ✓ | X | X |
| number | Tokenization | A numeric string. May produce leading zeroes. | Numeric | No | No | Yes | ✓ | X | X | X | X | ✓ | X | * |
| string | Tokenization | An alphanumeric string. | Latin + Numeric | No | No | Yes | ✓ | X | X | X | X | ✓ | X | X |
| long_text | Encryption | A long string (e.g., a comment field) using any character set. Use hex or base64 encoding to utilize. | All | No | No | Yes | ✓ | X | X | X | X | ✓ | X | X |
*The data is returned as masked.
PCI DSS Data Elements
| Data Element | Method | Use Case | UTF Set | LP | PP | eIV | Admin P | Admin U | Finance P | Finance U | Marketing P | Marketing U | HR P | HR U |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ccn | Tokenization | Credit card numbers. | Numeric | No | No | Yes | ✓ | X | X | ✓ | X | X | X | * |
| ccn_bin | Tokenization | Credit card numbers. Leaves 8-digit BIN in the clear. | Numeric | No | No | Yes | ✓ | X | X | ✓ | X | X | X | * |
| iban | Tokenization | IBAN numbers. Preserves the length, case, and position of the input characters but may create invalid IBAN codes. | Latin + Numeric | Yes | Yes | No | ✓ | X | X | ✓ | X | X | X | * |
| iban_cc | Tokenization | IBAN numbers. Leaves letters in the clear. | Latin + Numeric | Yes | Yes | Yes | ✓ | X | X | ✓ | X | X | X | * |
*The data is returned as masked.
PII Data Elements
| Data Element | Method | Use Case | UTF Set | LP | PP | eIV | Admin P | Admin U | Finance P | Finance U | Marketing P | Marketing U | HR P | HR U |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| address | Tokenization | Street names | Latin + Numeric | No | No | Yes | ✓ | X | X | ✓ | X | X | X | ✓ |
| city | Tokenization | Town or city name | Latin | No | No | Yes | ✓ | X | X | ✓ | X | ✓ | X | ✓ |
| dob | Tokenization | A date string. Format accepted: YYYY/MM/DD. Delimiters accepted: /, - (required). | N/A | N/A | No | No | ✓ | X | X | ✓ | X | ✓ | X | ✓ |
| dob_yc | Tokenization | A date string. Format accepted: YYYY/MM/DD. Delimiters accepted: /, - (required). Leaves the year in the clear. | N/A | N/A | No | No | ✓ | X | X | ✓ | X | ✓ | X | ✓ |
| email | Tokenization | Email address. Leaves the domain in the clear. | Latin + Numeric | No | No | Yes | ✓ | X | X | ✓ | X | ✓ | X | ✓ |
| nin | Tokenization | National Insurance Number. Preserves the length, case, and position of the input characters but may create invalid NIN codes. | Latin + Numeric | Yes | Yes | No | ✓ | X | X | X | X | X | X | X |
| name | Tokenization | Person's name | Latin | No | No | Yes | ✓ | X | X | ✓ | X | ✓ | X | ✓ |
| passport | Tokenization | Passport codes. Preserves the length, case, and position of the input characters but may create invalid passport numbers. | Latin + Numeric | Yes | Yes | No | ✓ | X | X | X | X | X | X | X |
| phone | Tokenization | Phone number. May produce leading zeroes. | Latin + Numeric | Yes | No | Yes | ✓ | X | X | X | X | X | X | X |
| postcode | Tokenization | Postal codes with digits and characters. Preserves the length, case, and position of the input characters but may create invalid post codes. | Latin + Numeric | Yes | Yes | No | ✓ | X | X | ✓ | X | ✓ | X | ✓ |
| ssn | Tokenization | Social Security Number (US) | Latin + Numeric | Yes | No | Yes | ✓ | X | X | X | X | X | X | X |
| zipcode | Tokenization | Zip codes with digits only. May produce leading zeroes. | Numeric | Yes | No | Yes | ✓ | X | X | ✓ | X | ✓ | X | ✓ |
PII Data Elements
| Data Element | Method | Use Case | UTF Set | LP | PP | eIV | Admin P | Admin U | Finance P | Finance U | Marketing P | Marketing U | HR P | HR U |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| address_de | Tokenization | Street names (German) | Latin + German + Numeric | No | No | Yes | ✓ | X | X | ✓ | X | X | X | ✓ |
| address_fr | Tokenization | Street names (French) | Latin + French + Numeric | No | No | Yes | ✓ | X | X | ✓ | X | X | X | ✓ |
| city_de | Tokenization | Town or city name (German) | Latin + German | No | No | Yes | ✓ | X | X | ✓ | X | ✓ | X | ✓ |
| city_fr | Tokenization | Town or city name (French) | Latin + French | No | No | Yes | ✓ | X | X | ✓ | X | ✓ | X | ✓ |
| name_de | Tokenization | Person's name (German) | Latin + German | No | No | Yes | ✓ | X | X | ✓ | X | ✓ | X | ✓ |
| name_fr | Tokenization | Person's name (French) | Latin + French | No | No | Yes | ✓ | X | X | ✓ | X | ✓ | X | ✓ |
LEGEND
eIV External IV
LP Length Preservation
PP Position Preservation
P User group can protect data
U User group can unprotect data
6 - Anonymization SDK
Make your data unidentifiable
Anonymization is the underlying foundation of secure data sharing. It is the irreversible destruction of identifiable data through the removal of direct and indirect (quasi-) identifiers, in such a manner that the data subject is no longer identifiable. The process uses data privacy models, risk models, and key metrics to redact and transform direct and quasi-identifiers in order to generalize a dataset.
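To make the distinction between direct and quasi-identifiers concrete, here is a small pure-Python illustration. It does not use the Anonymization SDK; the records, field names, and generalization rules (10-year age bands, 3-digit zip prefixes) are assumptions chosen only to show what redaction and generalization do to a record.

```python
def generalize_age(age, width=10):
    """Replace an exact age with the interval it falls into, e.g. 34 -> '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def generalize_zip(zipcode, keep=3):
    """Keep the leading digits of a zip code and mask the rest."""
    return zipcode[:keep] + "*" * (len(zipcode) - keep)


def transform(record):
    """Redact the direct identifier, generalize quasi-identifiers, keep the rest."""
    return {
        "name": "*",                                   # direct identifier: redacted
        "age": generalize_age(record["age"]),          # quasi-identifier: interval
        "zipcode": generalize_zip(record["zipcode"]),  # quasi-identifier: masked
        "balance": record["balance"],                  # analytic value: preserved
    }


# Synthetic records, invented for this example.
records = [
    {"name": "A. Smith", "age": 34, "zipcode": "10025", "balance": 1200},
    {"name": "B. Jones", "age": 38, "zipcode": "10027", "balance": 560},
]

generalized = [transform(r) for r in records]
for row in generalized:
    print(row)
```

After the transformation both records share the same age band and zip prefix, so neither can be singled out by those attributes alone.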
This section showcases the Protegrity Anonymization SDK, its transformation methods, privacy models available, and the statistical features. Note that the Anonymization SDK has been adjusted for public use and as such it is not an exact representation of the GA product. The SDK is a Python library that can be imported to your Data Science environment, like Jupyter Notebook. The modified SDK requires a valid connection to the API Playground. For more information about requesting access to the Private Endpoints, please refer to our documentation.
[lib].Connection(url, api_key, jwt_token) Establish a connection to the Anonymization cluster. Set url to the Anonymization endpoint. Set the api_key to your group API Key, and the jwt_token to your token.
[lib].AnonElement(cluster, data, pty_storage=False) A dataframe object that represents the dataset to transform. In the job configuration, you will be calling transformations on each attribute of the dataframe. Set cluster to the Anonymization cluster you authenticated to. Set the data to your dataframe, and keep the pty_storage as False.
[lib].K(num) assign to the dataframe to set the k value.
[lib].Redact() assign to an attribute to redact it.
[lib].Preserve() assign to an attribute to preserve it.
[lib].Gen_Interval([ level, [level] ], [ importance = num ]) specify the interval levels to generalize continuous data to; assign it to an attribute to transform. Specify its importance to tell the algorithm whether to maintain the data granularity where possible (high importance) or to do the opposite (low importance).
[lib].Gen_Tree(pd.DataFrame(data=tree)) specify the hierarchy to generalize categorical data to; assign it to an attribute to transform. Specify its importance to tell the algorithm whether to maintain the data granularity where possible (high importance) or to do the opposite (low importance).
[lib].MicroAgg([lib].AggregateFunction.formula) specify a mathematical formula to group the data; assign it to an attribute to transform. Choose between mean and median for numeric data types (integer and decimal) or mode for all data types. Specify its importance to tell the algorithm whether to maintain the data granularity where possible (high importance) or to do the opposite (low importance).
[lib].Gen_Mask(maskchar=mask) specify the mask to obfuscate data; assign it to an attribute to transform. Provide the masking character as mask. Specify its importance to tell the algorithm whether to maintain the data granularity where possible (high importance) or to do the opposite (low importance).
[lib].Gen_Rounding([ level, [level] ], [ importance = num ]) specify the groups to round the data to. Both date-based and number-based rounding are supported. Specify its importance to tell the algorithm whether to maintain the data granularity where possible (high importance) or to do the opposite (low importance).
[lib].LDiv(lfactor=num) configure the l value of the sensitive attribute to maintain l-diversity within each equivalence class.
[obj].config.k k-value setting of the dataframe (needs to be updated with [lib].K(num)).
[obj].config['maxSuppression'] specify the maximum fraction of records allowed to be removed from the dataset to achieve the set privacy goals.
[obj].assign assign multiple settings to the dataframe.
[obj].describe list the characteristics and assigned transformations of each attribute in the dataframe.
[lib].anonymize(obj, pty_storage=False) run the anonymization job. Set the obj to your annotated dataframe, and keep the pty_storage as False. When the return value is assigned to a variable, it can be referenced later as a job object.
[job].status() monitor the status of the anonymization job.
[job].result() return the result of the anonymization job.
[job].riskStat() return the risk and utility statistics associated with the anonymization job.
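The k value, l value, and maxSuppression setting listed above interact in a way that is easier to see on a toy dataset. The following pure-Python sketch does not call the SDK and is not its algorithm; it only illustrates the definitions: records sharing the same quasi-identifier values form an equivalence class, classes smaller than k or with fewer than l distinct sensitive values are suppressed, and the suppressed fraction must stay within the allowed maximum. All names and data are invented for illustration.

```python
from collections import defaultdict


def equivalence_classes(rows, quasi_identifiers):
    """Group rows that share the same values for every quasi-identifier."""
    classes = defaultdict(list)
    for row in rows:
        classes[tuple(row[q] for q in quasi_identifiers)].append(row)
    return classes


def enforce(rows, quasi_identifiers, sensitive, k, l, max_suppression):
    """Drop classes violating k-anonymity or l-diversity, within the suppression limit."""
    kept, suppressed = [], 0
    for members in equivalence_classes(rows, quasi_identifiers).values():
        distinct_sensitive = {m[sensitive] for m in members}
        if len(members) >= k and len(distinct_sensitive) >= l:
            kept.extend(members)
        else:
            suppressed += len(members)
    fraction = suppressed / len(rows)
    if fraction > max_suppression:
        raise ValueError(f"suppression {fraction:.2f} exceeds limit {max_suppression:.2f}")
    return kept, fraction


# Synthetic, already-generalized records.
rows = [
    {"age": "30-39", "zipcode": "100**", "balance_band": "low"},
    {"age": "30-39", "zipcode": "100**", "balance_band": "high"},
    {"age": "40-49", "zipcode": "100**", "balance_band": "low"},
    {"age": "40-49", "zipcode": "100**", "balance_band": "mid"},
    {"age": "50-59", "zipcode": "941**", "balance_band": "low"},
]

kept, fraction = enforce(rows, ["age", "zipcode"], "balance_band", k=2, l=2, max_suppression=0.25)
print(len(kept), "rows kept;", fraction, "of the dataset suppressed")
```

Here the single record in the 50-59 class cannot satisfy k=2, so it is removed; lowering max_suppression below 0.2 would make the same call fail instead.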
Example
You can use the provided examples to get a feel for how the Anonymization SDK works with data. The sample processes a dataset of synthetic banking customers, constructs an anonymization job, and measures the utility and re-identification risk of the produced dataset.