Code Samples
Example library
Feel like it’s time to try things out? Check out our code samples for examples of how the Playground can be called from your environment. We are constantly updating this section and adding more content, so stay tuned.
Please note that the samples included in this section include product functionality that has been simplified for demonstration purposes and will vary from production-grade implementations.
1 - AWS Glue Integration
Automated data protection using Glue and the Playground
Author: Dilraj Singh
About this Sample
In this scenario we will explore the use case of automatic data discovery and protection of an unstructured text file. As data lands in S3, it will be de-identified before it is made available for future processing.
This code sample combines the powers of AWS Glue and Protegrity’s API Playground. AWS Glue is a serverless data integration service that is widely used as an ETL tool to move data to and from AWS and non-AWS data sources. We will use Glue to orchestrate an ETL job that picks up data received in an S3 bucket, sends it to the Playground for automatic data classification and protection, and writes the result to a target output bucket.
Disclaimer
Non-GA Functionality: for demonstration purposes only. Note that Protegrity GenAI Security features are currently in Preview. Protegrity releases and supports an official product, the S3 Cloud Storage Protector, to protect data in S3. The product is optimized for best performance, scalability, and security. We advise using this sample only to demonstrate the functionality.
Prerequisites
- Protegrity API Playground activated account
- Access to an AWS non-production account
- Access to Glue, S3, IAM, Lambda, and CloudWatch
IAM Role Setup
- Create 2 new IAM roles:
LambdaInvokeGlue for the Lambda service. The role must be able to run Glue jobs and create logs.
GlueS3ReadWrite for the Glue service. The role must be able to read from and write to S3 and create logs. You can use the AWSGlueServiceRole policy and tune it as necessary.
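As a rough illustration of what "run Glue jobs and create logs" could translate to, the sketch below builds an identity policy document for the LambdaInvokeGlue role. The function name, the account ID, and the region in the ARN are hypothetical placeholders, and the exact set of actions you need may differ in your account.

```python
import json


def lambda_invoke_glue_policy(job_arn: str) -> dict:
    """Build a policy document that allows starting one Glue job and writing logs."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "StartGlueJob",
                "Effect": "Allow",
                "Action": ["glue:StartJobRun"],
                "Resource": job_arn,
            },
            {
                "Sid": "WriteLogs",
                "Effect": "Allow",
                "Action": [
                    "logs:CreateLogGroup",
                    "logs:CreateLogStream",
                    "logs:PutLogEvents",
                ],
                "Resource": "*",
            },
        ],
    }


# Hypothetical account ID and region: replace with your own values.
policy = lambda_invoke_glue_policy(
    "arn:aws:glue:us-east-1:123456789012:job/GlueClassifyProtectWrite"
)
print(json.dumps(policy, indent=2))
```

The printed JSON can be pasted into the IAM console's policy editor and attached to the role.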
Lambda Setup
- Go to AWS Lambda and create a new Lambda function. Call it Glue Trigger. Set the runtime to Python 3.13. Attach the LambdaInvokeGlue IAM role as the function’s execution role.
- Once the serverless function is created, paste the following into the code source and click Deploy:
import boto3
from urllib.parse import unquote_plus

def lambda_handler(event, context):
    JobName = 'GlueClassifyProtectWrite'
    # Adjust the region to match the region of your Glue job
    glue_client = boto3.client('glue', region_name='us-east-1')
    print(event)
    source_bucket = event['Records'][0]['s3']['bucket']['name']
    # S3 URL-encodes object keys in event notifications, so decode before use
    source_key = unquote_plus(event['Records'][0]['s3']['object']['key'])

    if source_key.endswith('.txt'):
        # Pass bucket and object key to the Glue job
        response = glue_client.start_job_run(
            JobName=JobName,
            Arguments={
                '--JobName': JobName,
                '--source_bucket': source_bucket,
                '--source_key': source_key
            }
        )
        return {
            'statusCode': 200,
            'body': f"Started Glue job with ID {response['JobRunId']}"
        }
    else:
        return {
            'statusCode': 200,
            'body': "Unsupported file format provided, ignoring."
        }
This is the trigger definition that will start a Glue job when a text file is found in the S3 bucket.
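The event-parsing part of the handler can be tried locally without AWS access. The sketch below is a minimal stand-alone version of that logic; the helper name and the sample event are illustrative only, and the event is trimmed to the fields the handler actually reads. It also decodes the object key, because S3 URL-encodes keys in event notifications.

```python
from urllib.parse import unquote_plus


def extract_text_object(event: dict):
    """Return (bucket, key) for the first record if it is a .txt object, else None."""
    record = event["Records"][0]["s3"]
    bucket = record["bucket"]["name"]
    # Keys arrive URL-encoded in S3 notifications (a space is sent as '+').
    key = unquote_plus(record["object"]["key"])
    if key.endswith(".txt"):
        return bucket, key
    return None


# Hypothetical, trimmed-down S3 event used only for local testing.
sample_event = {
    "Records": [
        {
            "s3": {
                "bucket": {"name": "demo-input"},
                "object": {"key": "letters/dispute+letter.txt"},
            }
        }
    ]
}

print(extract_text_object(sample_event))
# → ('demo-input', 'letters/dispute letter.txt')
```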
S3 Setup
- Create 2 buckets for this exercise. Call them anything you’d like, and to distinguish between the two, append -input and -output to their names. The policy of the input bucket should allow GlueS3ReadWrite to read files with s3:GetObject, and the policy of the output bucket should allow the role to write files with s3:PutObject.
- Set up a new event notification on the -input bucket. Call it TextTrigger and filter the suffix to only pick up files ending with .txt. Set the destination as the Glue Trigger Lambda function.
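If you prefer to script the event notification rather than click through the console, the configuration can be expressed as a plain dictionary in the shape that boto3's `put_bucket_notification_configuration` accepts. This is only a sketch: the helper name and the Lambda ARN are hypothetical, and the `s3:ObjectCreated:*` event type is an assumption, since the steps above do not name the event types to subscribe to.

```python
def text_trigger_notification(lambda_arn: str) -> dict:
    """Build an S3 notification configuration that fires for new .txt objects."""
    return {
        "LambdaFunctionConfigurations": [
            {
                "Id": "TextTrigger",
                "LambdaFunctionArn": lambda_arn,
                # Assumption: react to every kind of object creation.
                "Events": ["s3:ObjectCreated:*"],
                "Filter": {
                    "Key": {
                        "FilterRules": [{"Name": "suffix", "Value": ".txt"}]
                    }
                },
            }
        ]
    }


# Hypothetical ARN: replace with the ARN of your trigger function.
config = text_trigger_notification(
    "arn:aws:lambda:us-east-1:123456789012:function:GlueTrigger"
)
print(config)

# With AWS credentials available, the dictionary could be applied like this:
#   import boto3
#   boto3.client("s3").put_bucket_notification_configuration(
#       Bucket="your-bucket-input", NotificationConfiguration=config)
```

Note that S3 also needs permission to invoke the Lambda function; the console adds that permission for you, whereas a scripted setup has to grant it separately.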
Glue Setup
- The final step is creating the Glue job. Open AWS Glue and create a new ETL job. Let’s call it GlueClassifyProtectWrite (if you choose a different name, make sure to update the Lambda function). Set the type to Spark, version to Glue 5.0 and the language to Python 3. The execution role should be set to the Glue service role, GlueS3ReadWrite. You can keep the other defaults.
- In the script, paste the following code. In the next step we will update the sample to match your own environment.
import sys

import boto3
import requests
from awsglue.utils import getResolvedOptions

args = getResolvedOptions(sys.argv,
                          ['JobName',
                           'source_bucket',
                           'source_key'])

# S3 Bucket and File configuration
source_bucket = args['source_bucket']
source_key = args['source_key']
target_bucket = "api-playground-glue-output"  # replace with the name of your -output bucket
target_key = source_key.replace(".txt", "-protected.txt")

# Connection to S3
s3 = boto3.client('s3', region_name="")

# Playground Login
# Note: verify=False disables TLS certificate verification; use it for demos only
logon_response = requests.post('https://api.playground.protegrity.com/auth/login',
                               headers={'Content-Type': 'application/json'},
                               verify=False,
                               json={"email": "",
                                     "password": ""})

# Retrieve JWT Token to authenticate requests
response_data = logon_response.json()
JWT_TOKEN = response_data['jwt_token']

# API Playground URL
API_URL = "https://api.playground.protegrity.com/v1/ai"
API_KEY = ""
API_VERSION = "v1"

# Request headers
headers = {
    'Content-Type': 'application/json',
    'x-api-key': API_KEY,
    'Authorization': f"Bearer {JWT_TOKEN}"
}

# Read function
def read_bucket(source_bucket, source_key):
    try:
        # Get the file from the source bucket
        response = s3.get_object(Bucket=source_bucket, Key=source_key)
        # Get the file content (binary)
        file_content = response['Body'].read()
        # Assuming it's a text file, decode it to a string
        file_text = file_content.decode('utf-8')
        print(f"Successfully read the file {source_bucket}/{source_key}")
    except Exception as e:
        print(f"Error: {e}")
        raise  # nothing to protect if the read failed, so fail the job
    return file_text

# Write function
def write_bucket(target_bucket, target_key, file_text):
    try:
        s3.put_object(Bucket=target_bucket, Key=target_key, Body=file_text)
        print(f"Successfully copied {source_key} from {source_bucket} to {target_bucket}/{target_key}")
    except Exception as e:
        print(f"Error: {e}")
        raise  # surface the failure in the Glue job status

# Classify and Protect function - calls the API Playground
def classify_protect(file_text):
    data_json = {"operation": "protect",
                 "options": {"type": "mask", "tags": False, "threshold": 0.6},
                 "data": [file_text]}
    response = requests.post(API_URL, json=data_json, headers=headers, verify=False)
    return response.json()

# Run the job
file_text = read_bucket(source_bucket, source_key)
api_response = classify_protect(file_text)
response_data = api_response["results"]
write_bucket(target_bucket, target_key, response_data)
-
Adjust the script with your environment details, specifically the lines:
- within the Connection to S3 section, provide the region of your input and output buckets, e.g.
s3 = boto3.client('s3', region_name="us-east-1")
- within the Playground Login section, specify your email and password used to authenticate with the Playground
- within the API Playground URL section, provide your API Key used to authorize your Playground requests
- Save your Glue job. We’re ready to roll!
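One detail worth checking in your own environment is the type of the `results` value that the script writes to S3, because `put_object` expects a string or bytes as the object body. The exact response shape is not documented in this sample, so the helper below is a defensive sketch under that uncertainty; its name and the handled cases are assumptions rather than part of the Playground API.

```python
import json


def to_s3_body(results) -> bytes:
    """Convert an API `results` value into bytes suitable for an S3 object body."""
    if isinstance(results, bytes):
        return results
    if isinstance(results, str):
        return results.encode("utf-8")
    if isinstance(results, list) and all(isinstance(item, str) for item in results):
        # One protected string per input element: join them into one document.
        return "\n".join(results).encode("utf-8")
    # Fallback: keep whatever structure came back as JSON text.
    return json.dumps(results, ensure_ascii=False).encode("utf-8")


print(to_s3_body("#### ####"))
print(to_s3_body(["line one", "line two"]))
print(to_s3_body({"text": "####"}))
```

In the Glue script, the helper would wrap the value passed as the object body, for example `write_bucket(target_bucket, target_key, to_s3_body(response_data))`.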
Automatic Data Classification and Protection of Unstructured Files
This scenario showcases processing of an unstructured text file. The file contents are classified and protected entirely by the Protegrity API Playground.
- Go to the -input bucket and drop a text file there that you wish to de-identify. You can use our sample or provide your own.
Alexandra Rivera
Address: 1258 Maplewood Drive
Springfield, IL 62704
Phone: (217) 555-3927
Email: arivera82@email.com
Date: March 21, 2025
To:
Customer Disputes Department
First Horizon Credit Bank
4801 Westlake Blvd
Austin, TX 73301
Subject: Dispute of Unauthorized Credit Card Charge
Dear Customer Disputes Department,
I am writing to formally dispute a charge on my credit card account that I did not authorize.
Cardholder Name: Alexandra Rivera
Credit Card Number: 3709888761001982
Date of Charge: March 17, 2025
Amount: $198.76
Merchant Name: "TechMart Online – NY"
I did not authorize this transaction and have never conducted business with the above-mentioned merchant. I became aware of this charge after reviewing my recent statement and immediately verified that neither I nor anyone with authorized access to my account made this purchase.
In accordance with the Fair Credit Billing Act, I am requesting that this charge be removed from my account, that any related interest or fees be reversed, and that a corrected statement be issued as soon as possible. Please investigate this matter and notify me of the outcome.
Enclosed with this letter is a copy of my most recent statement highlighting the disputed charge. I have also taken the precaution of temporarily suspending the card to prevent any further unauthorized use.
Please confirm receipt of this letter and provide a timeline for the resolution of this issue. Should you require any additional information or documentation, feel free to contact me at the number or email address listed above.
Thank you for your prompt attention to this matter.
Sincerely,
Alexandra Rivera
-
The processing will take up to a minute to complete. Monitor the Glue logs to see the progress.
- Once the job finishes, you will see a de-identified file in your -output bucket. It will look similar to this:
################
Address: 1258 ###############
###########, ## 62704
Phone: ##############
Email: ###################
Date: March 21, 2025
To:
Customer Disputes Department
First Horizon Credit Bank
4801 #############
######, ## 73301
Subject: Dispute of Unauthorized Credit Card Charge
Dear Customer Disputes Department,
I am writing to formally dispute a charge on my credit card account that I did not authorize.
Cardholder Name: ################
Account Number: ################
Credit Card Number: ################
Date of Charge: March 17, 2025
Amount: $198.76
Merchant Name: "TechMart Online – NY"
I did not authorize this transaction and have never conducted business with the above-mentioned merchant. I became aware of this charge after reviewing my recent statement and immediately verified that neither I nor anyone with authorized access to my account made this purchase.
In accordance with the Fair Credit Billing Act, I am requesting that this charge be removed from my account, that any related interest or fees be reversed, and that a corrected statement be issued as soon as possible. Please investigate this matter and notify me of the outcome.
Enclosed with this letter is a copy of my most recent statement highlighting the disputed charge. I have also taken the precaution of temporarily suspending the card to prevent any further unauthorized use.
Please confirm receipt of this letter and provide a timeline for the resolution of this issue. Should you require any additional information or documentation, feel free to contact me at the number or email address listed above.
Thank you for your prompt attention to this matter.
Sincerely,
################
Feel free to adjust the configuration of your Glue job: set your own risk tolerance by lowering or raising the threshold, add classification tags, or choose another protection type.
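These three settings correspond to the `options` object in the request body that the Glue script sends. The sketch below wraps the body in a small helper so the options are easy to vary. The helper name is hypothetical, the 0 to 1 range check on the threshold is an assumption based on the sample value of 0.6, and only the `mask` type appears in this sample; consult the Playground documentation for other supported values.

```python
def build_protect_request(text, protection_type="mask", tags=False, threshold=0.6):
    """Build the JSON body for a classify-and-protect call with adjustable options."""
    # Assumption: the threshold is a confidence score between 0 and 1.
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold is expected to be between 0 and 1")
    return {
        "operation": "protect",
        "options": {
            "type": protection_type,
            "tags": tags,
            "threshold": threshold,
        },
        "data": [text],
    }


# Same settings as the Glue script.
default_body = build_protect_request("Alexandra Rivera, Springfield, IL")

# A variant with classification tags enabled and a different threshold.
# How the threshold affects the amount of masked text is not stated in this
# sample, so compare the outputs on test data before settling on a value.
tagged_body = build_protect_request(
    "Alexandra Rivera, Springfield, IL", tags=True, threshold=0.4
)
print(default_body["options"])
print(tagged_body["options"])
```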
Summary
Pairing AWS Glue with Protegrity is a powerful value proposition: the combination facilitates custom file uploads to the Cloud whilst achieving high security and control over what is being shared. With the flexibility of Glue and the API-first approach of the Playground (and, by extension, Protegrity), the job is automated and scalable, ensuring that the best practices of data security are embedded in your cloud platform from the get-go.
This sample can be further extended to other file formats, other target systems, and platforms. You may want to leverage a data catalog to scan and store the file metadata (recommended for structured files). You may also decide to combine this example with our other samples (such as data unprotection in Snowflake).
2 - Snowflake Integration
Call the Playground from your Snowflake instance
Authors: Mark Tritaris, Iwona Rajca
About this Sample
The Playground can be easily leveraged from your Snowflake environment using Snowflake’s External Network Access functionality. An external access integration allows users to write their own UDFs that access external locations, such as the Playground APIs.
Disclaimer
Non-GA Functionality: for demonstration purposes only. Protegrity’s recommended and supported integration pattern is Snowflake External Functions, which aligns with InfoSec best practices. Protegrity releases an official product, the Protegrity Snowflake Protector, for protecting data within Snowflake. The product is optimized for best performance, scalability, and security. We advise using this sample only to demonstrate the functionality.
Prerequisites
- Protegrity API Playground activated account
- Access to a Snowflake (test) instance
- Ability to make external calls from Snowflake
- Snowflake grants to: CREATE SECRET, REPLACE SECRET, CREATE FUNCTION, REPLACE FUNCTION, CREATE EXTERNAL ACCESS INTEGRATION, REPLACE EXTERNAL ACCESS INTEGRATION, CREATE NETWORK RULE, REPLACE NETWORK RULE
- A sample table with some sensitive (test) data that needs protecting
Snowflake Setup
- Open a new worksheet.
- Choose the authorized role to run this sample. In our case we are using accountadmin.
- Choose the warehouse and the database to use, e.g. USE DATABASE PROTEGRITY.
Playground Setup
-
To get any responses from the Playground, we’ll need to log in to it first. It’s recommended to create a mechanism to retrieve the JWT token from the Playground. This is not required for the integration itself but it will keep the code reusable: the Playground’s authentication token expires after 24 hours. Paste the following to your worksheet, then run it:
-- Generate a new JWT Token for the Playground
CREATE OR REPLACE FUNCTION pty_login(email text, password text)
RETURNS STRING
LANGUAGE PYTHON
RUNTIME_VERSION = 3.8
HANDLER = 'pty_login'
EXTERNAL_ACCESS_INTEGRATIONS = (pty_external_access_integration)
PACKAGES = ('requests', 'simplejson')
AS
$$
import _snowflake
import simplejson as json
import requests

def pty_login(email, password):
    body = {
        "email": email,
        "password": password
    }
    url = "https://api.playground.protegrity.com/auth/login"
    headers = {
        "Content-type": "application/json"
    }
    session = requests.Session()
    response = session.post(url, json=body, headers=headers)
    response_as_json = json.loads(response.text)
    return response_as_json['jwt_token']
$$;
-
Log in to the Playground by providing your email and password:
SELECT pty_login('YOUR_EMAIL','YOUR_PASSWORD') as jwt;
- Construct a new secret to hold all your account information. The secret will be very handy for authenticating your requests to the API: our function will simply retrieve your details when needed. Provide your API Key in the api_token field and replace the jwt_token value with the token returned from the previous query.
CREATE OR REPLACE SECRET pty_playground_login
TYPE = GENERIC_STRING
SECRET_STRING = '{
"clientName":"API Playground User",
"api_token":"YOUR_API_KEY",
"jwt_token": "YOUR_JWT_TOKEN"}';
- Create a network rule that points to the Protegrity API host, and create an external access integration that allows usage of the network rule and the secret we previously created.
-- Create a network rule to Protegrity API
CREATE OR REPLACE NETWORK RULE pty_network_rule
MODE = EGRESS
TYPE = HOST_PORT
VALUE_LIST = ('api.playground.protegrity.com');
-- Create an integration using the network rule and secret
CREATE OR REPLACE EXTERNAL ACCESS INTEGRATION pty_external_access_integration
ALLOWED_NETWORK_RULES = (pty_network_rule)
ALLOWED_AUTHENTICATION_SECRETS = (pty_playground_login)
ENABLED = true;
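Because the Playground token expires after 24 hours, the secret has to be re-created with a fresh token from time to time. As a convenience, the statement can be generated rather than edited by hand. The sketch below is one way to do that; the helper name is hypothetical, and the output should be treated as sensitive because it contains your credentials.

```python
import json


def build_secret_sql(api_key: str, jwt_token: str,
                     secret_name: str = "pty_playground_login") -> str:
    """Return a CREATE OR REPLACE SECRET statement carrying the Playground credentials."""
    payload = json.dumps({
        "clientName": "API Playground User",
        "api_token": api_key,
        "jwt_token": jwt_token,
    })
    # Escape backslashes and single quotes for a single-quoted SQL string literal.
    escaped = payload.replace("\\", "\\\\").replace("'", "''")
    return (
        f"CREATE OR REPLACE SECRET {secret_name}\n"
        f"  TYPE = GENERIC_STRING\n"
        f"  SECRET_STRING = '{escaped}';"
    )


# Placeholder values: substitute your API key and the token returned by pty_login.
print(build_secret_sql("YOUR_API_KEY", "YOUR_JWT_TOKEN"))
```

Paste the printed statement into your worksheet and run it to refresh the secret.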
Data Protection in Action
It’s time we put everything together and start protecting some data.
-
Create a universal UDF to call Protegrity endpoints for protecting data. The function will accept the name of the endpoint to call, and the data to protect.
-- Create a UDF to de-identify a single value
CREATE OR REPLACE FUNCTION pty_protect(endpoint text, data text)
RETURNS STRING
LANGUAGE PYTHON
RUNTIME_VERSION = 3.8
HANDLER = 'pty_protect'
EXTERNAL_ACCESS_INTEGRATIONS = (pty_external_access_integration)
PACKAGES = ('requests', 'simplejson')
SECRETS = ('cred' = pty_playground_login)
AS
$$
import _snowflake
import simplejson as json
import requests

def pty_protect(endpoint, data):
    credentials = json.loads(_snowflake.get_generic_secret_string('cred'), strict=False)
    body = {
        "operation": "protect",
        "data": [
            data
        ]
    }
    url = "https://api.playground.protegrity.com/v1/" + endpoint
    headers = {
        "x-api-key": credentials["api_token"],
        "Authorization": credentials["jwt_token"],
        "Content-type": "application/json"
    }
    session = requests.Session()
    response = session.post(url, json=body, headers=headers)
    response_as_json = json.loads(response.text)
    return response_as_json[0]
$$;
- Run the UDF to make sure it’s working as expected. To protect data, you can choose any of our published data protection endpoints, for example name:
SELECT pty_protect('name','Mickey Mouse') as name;
You should receive a tokenized version of Mickey’s name: oWfepC TGEdC. At last his privacy is protected.
- You can now start protecting your data stored in Snowflake tables. Choose your test table and run a select statement, wrapping the column names in your function calls. Make sure to match the type of the data you are protecting with the appropriate tokenization type: for example, choose name for first and last names, or ssn for Social Security Numbers. Try the query on a limited sample size first: you wouldn’t want to run out of your Playground credits!
Example SQL query:
select
pty_protect('name',first_name) as protected_first_name,
pty_protect('name',last_name) as protected_last_name,
pty_protect('email',email) as protected_email,
pty_protect('ssn',ssn) as protected_ssn,
pty_protect('iban',iban) as protected_iban,
pty_protect('dob',birthday) as protected_dob
from my_table limit 5;
Note that the query processes each field separately, which adds to the total processing time. In a production scenario, batch processing would be recommended for achieving best performance.
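To give a feel for what batching could look like, the sketch below groups values into request bodies that each carry several items in the `data` array. It assumes that an endpoint accepts more than one value per request and returns results in the same order, which this sample does not confirm; the helper name and the batch size are arbitrary choices for illustration.

```python
from typing import Iterable, Iterator, List


def batched_requests(values: Iterable[str], batch_size: int = 100) -> Iterator[dict]:
    """Yield protect request bodies, each carrying up to batch_size values."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batch: List[str] = []
    for value in values:
        batch.append(value)
        if len(batch) == batch_size:
            yield {"operation": "protect", "data": batch}
            batch = []
    if batch:
        # Send the remaining values that did not fill a whole batch.
        yield {"operation": "protect", "data": batch}


names = ["Mickey Mouse", "Minnie Mouse", "Donald Duck"]
for body in batched_requests(names, batch_size=2):
    print(body)
# → {'operation': 'protect', 'data': ['Mickey Mouse', 'Minnie Mouse']}
# → {'operation': 'protect', 'data': ['Donald Duck']}
```

Each body would replace the single-value body built inside the UDF, turning one HTTP call per field into one call per batch.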
Summary
That’s it! This example builds a sample integration that allows Snowflake users to protect data. You can extend this sample by building an unprotect function and incorporating different user roles. You could also create a function that accepts options, such as a dictionary for French and German.