Code Samples

Example library

Feel like it’s time to try things out? Check out our code samples for examples of how the Playground can be called from your environment. We are constantly updating this section and adding more content, so stay tuned.

Please note that the samples included in this section include product functionality that has been simplified for demonstration purposes and will vary from production-grade implementations.

1 - AWS Glue Integration

Automated data protection using Glue and the Playground

Author: Dilraj Singh

About this Sample

In this scenario we will explore the use case of automatic data discovery and protection of an unstructured text file. As data lands in S3, it will be de-identified before it is made available for future processing.

This code sample combines the power of AWS Glue and Protegrity’s API Playground. AWS Glue is a serverless data integration service that is widely used as an ETL tool to move data to and from AWS and non-AWS data sources. We will use Glue to orchestrate an ETL job that picks up data received in an S3 bucket, sends it to the Playground for automatic data classification and protection, and writes it to a target output directory.

Prerequisites

  • Protegrity API Playground activated account
  • Access to an AWS non-production account
  • Access to Glue, S3, IAM, Lambda, and CloudWatch

IAM Role Setup

  1. Create 2 new IAM roles:
  • LambdaInvokeGlue for the Lambda service. The role must be able to run Glue jobs and create logs.
  • GlueS3ReadWrite for the Glue service. The role must be able to write and read from S3 and create logs. You can use the AWSGlueServiceRole policy and tune it as necessary.
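To make the LambdaInvokeGlue requirement concrete, here is a minimal sketch of a policy document that lets the role start one Glue job and write logs. The helper name, the statement IDs, and the wildcarded account and region in the ARN are our own illustrative choices, not something prescribed by this sample; tighten them to match your account.

```python
import json


def lambda_invoke_glue_policy(job_name: str = "GlueClassifyProtectWrite") -> dict:
    """Build an illustrative policy document for the Lambda execution role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "StartGlueJob",
                "Effect": "Allow",
                "Action": ["glue:StartJobRun"],
                # Region and account ID are wildcarded here; narrow them as needed.
                "Resource": f"arn:aws:glue:*:*:job/{job_name}",
            },
            {
                "Sid": "WriteLogs",
                "Effect": "Allow",
                "Action": [
                    "logs:CreateLogGroup",
                    "logs:CreateLogStream",
                    "logs:PutLogEvents",
                ],
                "Resource": "*",
            },
        ],
    }


# Print the JSON so it can be pasted into the IAM policy editor.
print(json.dumps(lambda_invoke_glue_policy(), indent=2))
```

The same pattern applies to GlueS3ReadWrite, with s3:GetObject on the input bucket and s3:PutObject on the output bucket.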

Lambda Setup

  1. Go to AWS Lambda and create a new Lambda function. Call it GlueTrigger (Lambda function names cannot contain spaces). Set the runtime to Python 3.13. Select the LambdaInvokeGlue IAM role as the function’s execution role.

  2. Once the serverless function is created, paste the following into the code source and click Deploy:

    
    import boto3
    
    def lambda_handler(event, context):
        # Must match the name of the Glue job created in the Glue Setup section
        job_name = 'GlueClassifyProtectWrite'
        # Adjust the region to the one where your Glue job runs
        glue_client = boto3.client('glue', region_name='us-east-1')
        print(event)
        source_bucket = event['Records'][0]['s3']['bucket']['name']
        source_key = event['Records'][0]['s3']['object']['key']
     
        if source_key.endswith('.txt'):
            # Pass bucket and object key to the Glue job
            response = glue_client.start_job_run(
                JobName=job_name,
                Arguments={
                    '--JobName' : job_name,
                    '--source_bucket': source_bucket,
                    '--source_key': source_key
                }
            )
            
            return {
                'statusCode': 200,
                'body': f"Started Glue job with ID {response['JobRunId']}"
            }
        else:
            return {
                'statusCode': 200,
                'body': "Unsupported file format provided, ignoring."
            }
        

    This is the trigger definition that will start a Glue job when a text file is found in the S3 bucket.
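The handler above reads only the first record and uses the object key exactly as delivered. S3 URL-encodes object keys in event notifications (a space arrives as a plus sign), and a single event can carry more than one record. The sketch below shows one way to handle both; the helper name extract_text_objects and the sample event are hypothetical and exist only for illustration.

```python
from urllib.parse import unquote_plus


def extract_text_objects(event: dict) -> list:
    """Return (bucket, key) pairs for every .txt object in an S3 event."""
    objects = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Keys are URL-encoded in S3 notifications; decode before using them.
        key = unquote_plus(record["s3"]["object"]["key"])
        if key.endswith(".txt"):
            objects.append((bucket, key))
    return objects


# Hypothetical event with one matching and one non-matching object.
sample_event = {
    "Records": [
        {"s3": {"bucket": {"name": "my-bucket-input"},
                "object": {"key": "letters/dispute+letter.txt"}}},
        {"s3": {"bucket": {"name": "my-bucket-input"},
                "object": {"key": "images/logo.png"}}},
    ]
}

print(extract_text_objects(sample_event))
# [('my-bucket-input', 'letters/dispute letter.txt')]
```

Inside the Lambda function, each returned pair could be passed to start_job_run in place of the single source_bucket and source_key values.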

S3 Setup

  1. Create 2 buckets for this exercise. Call them anything you’d like, and to distinguish between the two, append -input and -output to their names. The policy of the input bucket should allow GlueS3ReadWrite to read files with s3:GetObject, and the policy of the output bucket should allow the role to write files with s3:PutObject.

  2. Set up a new event notification on the -input bucket. Call it TextTrigger, select the object creation event types, and filter the suffix to only pick up files ending with .txt. Set the destination to the Lambda function created in the Lambda Setup section.
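If you prefer to script the event notification rather than click through the console, the configuration can be expressed as a plain dictionary and applied with the boto3 S3 client. This is a sketch under assumptions: the helper name and the example ARN are made up, and the s3:ObjectCreated:* event type is our choice for "a file landed in the bucket".

```python
def text_trigger_notification(lambda_arn: str) -> dict:
    """Build the notification configuration for the -input bucket."""
    return {
        "LambdaFunctionConfigurations": [
            {
                "Id": "TextTrigger",
                "LambdaFunctionArn": lambda_arn,
                # Fire on any kind of object creation (PUT, POST, copy, multipart).
                "Events": ["s3:ObjectCreated:*"],
                # Only notify for keys ending in .txt
                "Filter": {
                    "Key": {"FilterRules": [{"Name": "suffix", "Value": ".txt"}]}
                },
            }
        ]
    }


# Hypothetical ARN, for illustration only.
config = text_trigger_notification(
    "arn:aws:lambda:us-east-1:123456789012:function:GlueTrigger"
)
print(config)

# Applying it requires boto3 and AWS credentials, so it is left as a comment:
# import boto3
# boto3.client("s3").put_bucket_notification_configuration(
#     Bucket="my-bucket-input", NotificationConfiguration=config
# )
```

When the notification is created outside the console, the Lambda function also needs a resource-based permission that allows S3 to invoke it; the console normally adds this for you.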

Glue Setup

  1. The final step is creating the Glue job. Open AWS Glue and create a new ETL job. Let’s call it GlueClassifyProtectWrite (if you choose a different name, make sure to update the Lambda function). Set the type to Spark, version to Glue 5.0 and the language to Python 3. The execution role should be set to the Glue service role, GlueS3ReadWrite. You can keep the other defaults.

  2. In the script, paste the following code. In the next step we will update the sample to match your own environment.

    
    import requests
    import boto3
    import sys
    from awsglue.utils import getResolvedOptions
    
    args = getResolvedOptions(sys.argv,
                              ['JobName',
                               'source_bucket',
                               'source_key'])
    
    # S3 Bucket and File configuration
    source_bucket = args['source_bucket']
    source_key = args['source_key']
    # Replace with the name of your own -output bucket
    target_bucket = "api-playground-glue-output"
    target_key = source_key.replace(".txt", "-protected.txt")
    
    # Connection to S3
    s3 = boto3.client('s3', region_name = "")
    
    # Playground Login
    logon_response = requests.post('https://api.playground.protegrity.com/auth/login',
                                  headers={'Content-Type': 'application/json'},
                                  verify= False,
                                  json={ "email": "",
                                         "password": ""})
    
    # Retrieve JWT Token to authenticate requests
    response_data = logon_response.json()
    JWT_TOKEN = response_data['jwt_token']
    
    # API Playground URL
    API_URL     = "https://api.playground.protegrity.com/v1/ai"
    API_KEY   = ""
    API_VERSION = "v1"
    
    # Request headers
    headers = {
        'Content-Type': 'application/json',
        'x-api-key': API_KEY,
        'Authorization': f"Bearer {JWT_TOKEN}"
    }
    
    # Read function
    def read_bucket(source_bucket, source_key):
        try:
            # Get the file from the source bucket
            response = s3.get_object(Bucket=source_bucket, Key=source_key)
            # Get the file content (binary)
            file_content = response['Body'].read()
            # Assuming it's a UTF-8 text file, decode it to a string
            file_text = file_content.decode('utf-8')
            # Log only the location: the raw content is still unprotected
            print(f"Successfully read the file {source_bucket}/{source_key}")
            return file_text
        except Exception as e:
            print(f"Error: {e}")
            # Re-raise so the job fails instead of continuing without data
            raise
    
    # Write function
    def write_bucket(target_bucket, target_key, file_text):
        try:
            s3.put_object(Bucket = target_bucket, Key = target_key, Body=file_text)
            print(f"Successfully copied {source_key} from {source_bucket} to {target_bucket}/{target_key}")
    
        except Exception as e:
            print(f"Error: {e}")
    
    # Classify and Protect function – Calls API Playground
    def classify_protect(file_text):
        data_json = {"operation": "protect", "options": {"type": "mask", "tags": False, "threshold": 0.6}, "data": [file_text]}
        response = requests.post(API_URL, json = data_json, headers=headers, verify= False)
    
        return response.json()
    
    # Run the job
    file_text = read_bucket(source_bucket, source_key)
    api_response = classify_protect(file_text)
    response_data = api_response["results"]
    write_bucket(target_bucket, target_key, response_data)
        
  3. Adjust the script with your environment details, specifically the lines:

  • within the S3 Bucket and File configuration section, set target_bucket to the name of your -output bucket
  • within the Connection to S3 section, provide the region of your input and output buckets, e.g. s3 = boto3.client('s3', region_name = "us-east-1")
  • within the Playground Login section, specify the email and password you use to authenticate with the Playground
  • within the API Playground URL section, provide the API Key used to authorize your Playground requests
  4. Save your Glue job. We’re ready to roll!
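One detail worth knowing about the script: it derives the output key with source_key.replace(".txt", "-protected.txt"), which rewrites every occurrence of .txt in the key, not just the extension. If your keys may contain .txt elsewhere, a suffix-only variant along these lines may be safer. The function name protected_key is hypothetical.

```python
def protected_key(source_key: str, marker: str = "-protected") -> str:
    """Insert the marker before a trailing .txt extension only."""
    if source_key.endswith(".txt"):
        return source_key[: -len(".txt")] + marker + ".txt"
    # No .txt extension: append the marker to the whole key.
    return source_key + marker


key = "inbox/report.txt.backup.txt"
print(key.replace(".txt", "-protected.txt"))
# inbox/report-protected.txt.backup-protected.txt
print(protected_key(key))
# inbox/report.txt.backup-protected.txt
```

For simple keys such as letter.txt both approaches give the same result, letter-protected.txt.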

Automatic Data Classification and Protection of Unstructured Files

This scenario showcases processing of an unstructured text file. The file contents are classified and protected entirely by the Protegrity API Playground.

  1. Go to the -input bucket and drop in a text file that you wish to de-identify. You can use our sample or provide your own.

    
    Alexandra Rivera
    Address: 1258 Maplewood Drive
    Springfield, IL 62704
    Phone: (217) 555-3927
    Email: arivera82@email.com
    
    Date: March 21, 2025
    
    To:
    Customer Disputes Department
    First Horizon Credit Bank
    4801 Westlake Blvd
    Austin, TX 73301
    
    Subject: Dispute of Unauthorized Credit Card Charge
    
    Dear Customer Disputes Department,
    
    I am writing to formally dispute a charge on my credit card account that I did not authorize.
    
    Cardholder Name: Alexandra Rivera
    Credit Card Number: 3709888761001982
    Date of Charge: March 17, 2025
    Amount: $198.76
    Merchant Name: "TechMart Online – NY"
    
    I did not authorize this transaction and have never conducted business with the above-mentioned merchant. I became aware of this charge after reviewing my recent statement and immediately verified that neither I nor anyone with authorized access to my account made this purchase.
    
    In accordance with the Fair Credit Billing Act, I am requesting that this charge be removed from my account, that any related interest or fees be reversed, and that a corrected statement be issued as soon as possible. Please investigate this matter and notify me of the outcome.
    
    Enclosed with this letter is a copy of my most recent statement highlighting the disputed charge. I have also taken the precaution of temporarily suspending the card to prevent any further unauthorized use.
    
    Please confirm receipt of this letter and provide a timeline for the resolution of this issue. Should you require any additional information or documentation, feel free to contact me at the number or email address listed above.
    
    Thank you for your prompt attention to this matter.
    
    Sincerely,
    Alexandra Rivera
        
  2. The processing will take up to a minute to complete. Monitor the Glue logs to see the progress.

  3. Once the job finishes, you will see a de-identified file in your -output bucket. It will look similar to this:

    
    ################
    Address: 1258 ###############
    ###########, ## 62704
    Phone: ##############
    Email: ###################
    
    Date: March 21, 2025
    
    To:
    Customer Disputes Department
    First Horizon Credit Bank
    4801 #############
    ######, ## 73301
    
    Subject: Dispute of Unauthorized Credit Card Charge
    
    Dear Customer Disputes Department,
    
    I am writing to formally dispute a charge on my credit card account that I did not authorize.
    
    Cardholder Name: ################
    Credit Card Number: ################
    Date of Charge: March 17, 2025
    Amount: $198.76
    Merchant Name: "TechMart Online – NY"
    
    I did not authorize this transaction and have never conducted business with the above-mentioned merchant. I became aware of this charge after reviewing my recent statement and immediately verified that neither I nor anyone with authorized access to my account made this purchase.
    
    In accordance with the Fair Credit Billing Act, I am requesting that this charge be removed from my account, that any related interest or fees be reversed, and that a corrected statement be issued as soon as possible. Please investigate this matter and notify me of the outcome.
    
    Enclosed with this letter is a copy of my most recent statement highlighting the disputed charge. I have also taken the precaution of temporarily suspending the card to prevent any further unauthorized use.
    
    Please confirm receipt of this letter and provide a timeline for the resolution of this issue. Should you require any additional information or documentation, feel free to contact me at the number or email address listed above.
    
    Thank you for your prompt attention to this matter.
    
    Sincerely,
    ################
        

    Feel free to adjust the configuration of your Glue job by setting your own risk tolerance (a lower or higher threshold), adding classification tags, or choosing another protection type.
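The options mentioned above all live in the request body that classify_protect sends to the Playground. As a rough sketch, the body can be built by a small helper so that the threshold, tags, and protection type are easy to vary between runs. The helper name is hypothetical; the field names and defaults are copied from the Glue script, and the set of accepted values for each option should be taken from the Playground documentation.

```python
def build_protect_request(text: str,
                          protection_type: str = "mask",
                          tags: bool = False,
                          threshold: float = 0.6) -> dict:
    """Build the request body used by classify_protect in the Glue script."""
    return {
        "operation": "protect",
        "options": {
            "type": protection_type,  # protection type, "mask" in the sample
            "tags": tags,             # whether to add classification tags
            "threshold": threshold,   # classification confidence cut-off
        },
        "data": [text],
    }


# Same defaults as the sample script, but with tags switched on
# and a different threshold.
body = build_protect_request("Phone: (217) 555-3927", tags=True, threshold=0.4)
print(body)
```

In the Glue script, the returned dictionary would take the place of data_json in the requests.post call.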

Summary

Pairing AWS Glue with Protegrity is a powerful value proposition: the combination facilitates custom file uploads to the Cloud whilst achieving high security and control over what is being shared. With the flexibility of Glue and the API-first approach of the Playground (and, by extension, Protegrity), the job is automated and scalable, ensuring that the best practices of data security are embedded in your cloud platform from the get-go.

This sample can be further extended to other file formats, other target systems, and platforms. You may want to leverage a data catalog to scan and store the file metadata (recommended for structured files). You may also decide to combine this example with our other samples (such as data unprotection in Snowflake).

2 - Snowflake Integration

Call the Playground from your Snowflake instance

Authors: Mark Tritaris, Iwona Rajca

About this Sample

The Playground can be easily leveraged from your Snowflake environment using Snowflake’s External Network Access functionality. An external access integration allows users to write their own UDFs that access external locations, such as the Playground APIs.

Prerequisites

  • Protegrity API Playground activated account
  • Access to a Snowflake (test) instance
  • Ability to make external calls from Snowflake
  • Snowflake grants for: CREATE SECRET, REPLACE SECRET, CREATE FUNCTION, REPLACE FUNCTION, CREATE EXTERNAL ACCESS INTEGRATION, REPLACE EXTERNAL ACCESS INTEGRATION, CREATE NETWORK RULE, REPLACE NETWORK RULE
  • A sample table with some sensitive (test) data that needs protecting

Snowflake Setup

  1. Open a new worksheet.
  2. Choose the authorized role to run this sample. In our case we are using the ACCOUNTADMIN role.
  3. Choose the warehouse and the database to use, e.g. USE DATABASE PROTEGRITY.

Playground Setup

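  1. To get any responses from the Playground, we’ll need to log in to it first. We recommend creating a function that retrieves the JWT token from the Playground. This is not required for the integration itself, but it keeps the code reusable: the Playground’s authentication token expires after 24 hours. Note that the function references pty_external_access_integration, which is created in step 4; if Snowflake reports that the integration does not exist, run the network rule and integration statements from step 4 first, leaving out the ALLOWED_AUTHENTICATION_SECRETS line until the secret from step 3 exists. Paste the following into your worksheet, then run it: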

    
        -- Generate a new JWT Token for the Playground
        CREATE OR REPLACE FUNCTION pty_login(email text, password text)
        RETURNS STRING
        LANGUAGE PYTHON
        RUNTIME_VERSION = 3.8
        HANDLER = 'pty_login'
        EXTERNAL_ACCESS_INTEGRATIONS = (pty_external_access_integration)
        PACKAGES = ('requests', 'simplejson')
        AS
        $$
        import _snowflake
        import simplejson as json
        import requests 
    
        def pty_login(email, password):
    
            body = {
              "email": email,
              "password": password
            }
    
            url = "https://api.playground.protegrity.com/auth/login"
            headers = {
                "Content-type": "application/json"
                
            }
    
            session = requests.Session()
            response = session.post(url, json=body, headers=headers)
            
            response_as_json = json.loads(response.text)
            
            return response_as_json['jwt_token']
        $$;
        
  2. Log in to the Playground by providing your email and password:

    
        SELECT pty_login('YOUR_EMAIL','YOUR_PASSWORD') as jwt;
        
  3. Construct a new secret to hold all your account information. The secret will be very handy for authenticating your requests to the API: our function will simply retrieve your details when needed. Provide your API Key in the api_token field and replace the jwt_token value with the token returned from the previous query.

    
        CREATE OR REPLACE SECRET pty_playground_login
            TYPE = GENERIC_STRING
            SECRET_STRING = '{
              "clientName":"API Playground User",
              "api_token":"YOUR_API_KEY",
              "jwt_token": "YOUR_JWT_TOKEN"}';
        
  4. Create a network rule that points to the Protegrity API host, and create an external access integration that allows usage of the network rule and the secret we previously created.

    
        -- Create a network rule to Protegrity API 
        CREATE OR REPLACE NETWORK RULE pty_network_rule
        MODE = EGRESS
        TYPE = HOST_PORT
        VALUE_LIST = ('api.playground.protegrity.com');
    
        -- Create an integration using the network rule and secret
        CREATE OR REPLACE EXTERNAL ACCESS INTEGRATION pty_external_access_integration
        ALLOWED_NETWORK_RULES = (pty_network_rule)
        ALLOWED_AUTHENTICATION_SECRETS = (pty_playground_login)
        ENABLED = true;
        

Data Protection in Action

It’s time we put everything together and start protecting some data.

  1. Create a universal UDF to call Protegrity endpoints for protecting data. The function will accept the name of the endpoint to call, and the data to protect.

    
        -- Create a UDF to de-identify a single value
        CREATE OR REPLACE FUNCTION pty_protect(endpoint text, data text)
        RETURNS STRING
        LANGUAGE PYTHON
        RUNTIME_VERSION = 3.8
        HANDLER = 'pty_protect'
        EXTERNAL_ACCESS_INTEGRATIONS = (pty_external_access_integration)
        PACKAGES = ('requests', 'simplejson')
        SECRETS = ('cred' = pty_playground_login)
        AS
        $$
        import _snowflake
        import simplejson as json
        import requests 
    
        def pty_protect(endpoint, data):
            credentials = json.loads(_snowflake.get_generic_secret_string('cred'), strict=False)
            
            body = {
              "operation": "protect",
              "data": [
                  data
              ]
            }
    
            url = "https://api.playground.protegrity.com/v1/" + endpoint
            headers = {
                "x-api-key": credentials["api_token"],
                "Authorization": credentials["jwt_token"],
                "Content-type": "application/json"
            }
    
            session = requests.Session()
            response = session.post(url, json=body, headers=headers)
            
            response_as_json = json.loads(response.text)
            
            return response_as_json[0]
        $$;
        
  2. Run the UDF to make sure it’s working as expected. To protect data, you can choose any of our published data protection endpoints, for example name:

    
        SELECT pty_protect('name','Mickey Mouse') as name;
        

    You should receive a tokenized version of Mickey’s name: oWfepC TGEdC. At last his privacy is protected.

  3. You can now start protecting your data stored in Snowflake tables. Choose your test table and run a select statement, wrapping the column names in your function calls. Make sure to match the type of the data you are protecting with the appropriate tokenization type: for example, choose name for first and last names, or ssn for Social Security Numbers. Try the query on a limited sample size first – you wouldn’t want to run out of your Playground credits!

    Example SQL query:

    
        select 
        pty_protect('name',first_name) as protected_first_name, 
        pty_protect('name',last_name) as protected_last_name, 
        pty_protect('email',email) as protected_email, 
        pty_protect('ssn',ssn) as protected_ssn, 
        pty_protect('iban',iban) as protected_iban, 
        pty_protect('dob',birthday)  as protected_dob
        from my_table limit 5;
        

    Note that the query processes each field separately, which adds to the total processing time. In a production scenario, batch processing would be recommended for achieving best performance.
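To illustrate what batching could look like, the sketch below groups values into chunks and builds one request body per chunk, so that a single call carries several values in the data list. It assumes the endpoint accepts more than one value per request and returns results in the same order, which this page does not state; the helper names and the batch size are hypothetical.

```python
from typing import Iterable, Iterator, List


def batched(values: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive chunks of at most `size` values."""
    if size < 1:
        raise ValueError("size must be at least 1")
    batch: List[str] = []
    for value in values:
        batch.append(value)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        # Emit the final, possibly shorter, chunk.
        yield batch


def build_batch_bodies(values: Iterable[str], size: int = 100) -> List[dict]:
    """Build one 'protect' request body per chunk of values."""
    return [{"operation": "protect", "data": chunk} for chunk in batched(values, size)]


names = ["Mickey Mouse", "Minnie Mouse", "Donald Duck"]
for body in build_batch_bodies(names, size=2):
    print(body)
# {'operation': 'protect', 'data': ['Mickey Mouse', 'Minnie Mouse']}
# {'operation': 'protect', 'data': ['Donald Duck']}
```

Three values with a batch size of two result in two requests instead of three; with larger tables the saving in round trips grows accordingly.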

Summary

That’s it! This example builds a sample integration that allows Snowflake users to protect data. You can extend this sample by building an unprotect function and incorporating different user roles, or by creating a function that accepts options, such as a dictionary for French and German.