Discover

Detect and classify sensitive data

METHOD: POST

ENDPOINT: https://api.playground.protegrity.com/v1/discover

DESCRIPTION:

Detects and classifies sensitive data in a given input. For every sensitive attribute found, the endpoint returns a classification and a confidence score, alongside its location: a column name and index for csv input, or a start and end index for text input. The confidence score ranges from 0 to 1. The higher the score, the more confident the product is in the classification it produced. We recommend using the confidence score to prioritize inspection of the sensitive data found.
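
One way to act on the recommendation above is to sort findings by score and drop the weakest ones before manual review. The following is a minimal sketch, not part of the API: the function name prioritize and the 0.5 threshold are illustrative assumptions, and the findings are the two entries from the text sample response on this page.

```python
from typing import Any


def prioritize(findings: list[dict[str, Any]], min_score: float = 0.5) -> list[dict[str, Any]]:
    """Keep findings at or above min_score, highest score first.

    min_score is a hypothetical cut-off chosen for illustration only.
    """
    kept = [f for f in findings if f["score"] >= min_score]
    return sorted(kept, key=lambda f: f["score"], reverse=True)


# Findings copied from the text sample response.
findings = [
    {"score": 0.85, "classification": "PHONE_NUMBER", "start_index": 101, "end_index": 113},
    {"score": 0.99, "classification": "PERSON", "start_index": 15, "end_index": 29},
]

for finding in prioritize(findings):
    print(finding["classification"], finding["score"])
# PERSON 0.99
# PHONE_NUMBER 0.85
```

A suitable threshold depends on the data. Low-scoring findings may still deserve a look when the cost of missing sensitive data is high.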

ATTRIBUTES:

data (required) Input data to classify, passed as an array of strings.

OPTIONS:

format (required) Format of the input data. Use text for unstructured text and csv for comma-separated values. Accepts: [ text | csv ].

header_line (optional) Applies to csv content only. Set to true if the first row of the data is a header line; otherwise set to false. Accepts: [ true | false ]. Defaults to false.

raw (optional) Returns verbose output of the classification job when set to true. Accepts: [ true | false ]. Defaults to false.
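
The attributes and options above can be assembled into a request body as sketched below. This is a hedged illustration: build_discover_payload is a hypothetical helper, the CSV string is made-up sample data, and the client-side validation only mirrors the accepted values listed above rather than anything the service is known to enforce.

```python
import json


def build_discover_payload(data, fmt, header_line=False, raw=False):
    """Build the JSON body for the discover endpoint (hypothetical helper)."""
    if fmt not in ("text", "csv"):
        raise ValueError("format must be 'text' or 'csv'")
    options = {"format": fmt, "raw": raw}
    # header_line is documented as specific to csv content, so it is
    # only included for csv input.
    if fmt == "csv":
        options["header_line"] = header_line
    return {"options": options, "data": list(data)}


# Illustrative CSV input with a header row.
payload = build_discover_payload(
    ["Name,Phone\nPeregrine Grey,212-456-7890"], "csv", header_line=True
)
print(json.dumps(payload, indent=4))
```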

SAMPLE REQUEST – TEXT

curl --location 'https://api.playground.protegrity.com/v1/discover' \
--header 'x-api-key: <API_Key>' \
--header 'Content-Type: application/json' \
--header 'Authorization: <JWT_TOKEN>' \
--data '{
    "options": {
        "format": "text",
        "raw": false
    },
    "data": ["Hello, this is Peregrine Grey from Air Industries, could you give me a call back to my mobile number 212-456-7890. Have a lovely day!"]
}'
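
For callers who prefer Python to curl, the same request can be prepared with the standard library. This is a sketch under assumptions: make_request is a hypothetical helper, the header names and placeholders are copied from the curl sample, and the request is only constructed here, not sent.

```python
import json
import urllib.request

URL = "https://api.playground.protegrity.com/v1/discover"


def make_request(api_key, jwt_token, payload):
    """Prepare a POST request that mirrors the curl sample (hypothetical helper)."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        URL,
        data=body,
        method="POST",
        headers={
            "x-api-key": api_key,
            "Content-Type": "application/json",
            "Authorization": jwt_token,
        },
    )


request = make_request(
    "<API_Key>",
    "<JWT_TOKEN>",
    {
        "options": {"format": "text", "raw": False},
        "data": ["Hello, this is Peregrine Grey from Air Industries, could you give me a call back to my mobile number 212-456-7890. Have a lovely day!"],
    },
)

# To send it with real credentials:
# with urllib.request.urlopen(request) as response:
#     findings = json.load(response)
```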

SAMPLE RESPONSE – TEXT


[
    {
        "score": 0.85,
        "classification": "PHONE_NUMBER",
        "start_index": 101,
        "end_index": 113
    },
    {
        "score": 0.99,
        "classification": "PERSON",
        "start_index": 15,
        "end_index": 29
    }
]
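
The start_index and end_index values can be used to pull the matched text out of the input. The sketch below assumes, based only on this sample, that indices are zero-based with an inclusive start and an exclusive end, which matches Python slicing. The function name extract_spans is illustrative.

```python
text = (
    "Hello, this is Peregrine Grey from Air Industries, could you give me "
    "a call back to my mobile number 212-456-7890. Have a lovely day!"
)

# Findings copied from the sample response above.
findings = [
    {"score": 0.85, "classification": "PHONE_NUMBER", "start_index": 101, "end_index": 113},
    {"score": 0.99, "classification": "PERSON", "start_index": 15, "end_index": 29},
]


def extract_spans(text, findings):
    """Return (classification, matched substring) pairs.

    Assumes zero-based, end-exclusive indices, as the sample suggests.
    """
    return [
        (f["classification"], text[f["start_index"]:f["end_index"]])
        for f in findings
    ]


for classification, value in extract_spans(text, findings):
    print(classification, "->", value)
# PHONE_NUMBER -> 212-456-7890
# PERSON -> Peregrine Grey
```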

SAMPLE REQUEST – CSV

curl --location 'https://api.playground.protegrity.com/v1/discover' \
--header 'x-api-key: <API_Key>' \
--header 'Content-Type: application/json' \
--header 'Authorization: <JWT_TOKEN>' \
--data '{
    "options": {
        "format": "csv",
        "header_line": true,
        "raw": false
    },
    "data": ["Social Security Number,Credit Card Number,IBAN,Phone Number\n589-25-1068,349384370543801,FR43 9255 4858 47BG 3EBG U4OK O18,(483) 9440301\n636-36-3077,4041594844904,AL50 8947 4215 KAEY GAPM NLYC FNZG,(113) 5143119\n748-82-2375,3558175715821800,AT34 4082 9269 0841 5702,(763) 5136237\n516-62-9861,560221027976015000,FR22 0068 7181 11FB UG8H ECEM 306,(726) 6031636\n121-49-9409,374283320982549,DK37 5687 8459 8060 79,(624) 9205200\n838-73-3299,5558216060144900,CR54 8952 8144 6403 4765 0,(356) 9479541\n439-11-5310,5048376143641900,RS76 6213 4824 0184 8983 74,(544) 5623326\n564-06-8466,3543299511845640,EE51 6882 3443 7863 4703,(702) 6093849\n518-54-5443,3543019452249540,IT65 D000 3874 2801 Z15I LNLL OOX,(584) 8618371"]
}'

SAMPLE RESPONSE – CSV


[
    {
        "score": 0.85,
        "classification": "US_SSN",
        "column_name": "Social Security Number",
        "column_index": 0
    },
    {
        "score": 0.11,
        "classification": "CREDIT_CARD",
        "column_name": "Credit Card Number",
        "column_index": 1
    },
    {
        "score": 0.04,
        "classification": "US_BANK_NUMBER",
        "column_name": "Credit Card Number",
        "column_index": 1
    },
    {
        "score": 0.0,
        "classification": "US_DRIVER_LICENSE",
        "column_name": "Credit Card Number",
        "column_index": 1
    },
    {
        "score": 0.01,
        "classification": "US_DRIVER_LICENSE",
        "column_name": "IBAN",
        "column_index": 2
    },
    {
        "score": 0.01,
        "classification": "US_DRIVER_LICENSE",
        "column_name": "Phone Number",
        "column_index": 3
    },
    {
        "score": 0.79,
        "classification": "IBAN_CODE",
        "column_name": "IBAN",
        "column_index": 2
    },
    {
        "score": 0.09,
        "classification": "DATE_TIME",
        "column_name": "IBAN",
        "column_index": 2
    },
    {
        "score": 0.75,
        "classification": "PHONE_NUMBER",
        "column_name": "Phone Number",
        "column_index": 3
    },
    {
        "score": 0.01,
        "classification": "PERSON",
        "column_name": "Phone Number",
        "column_index": 3
    }
]
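
Because a csv response can list several candidate classifications for the same column, it can help to keep only the highest-scoring one per column. The following is a minimal sketch using the findings from the sample response above. The helper name best_per_column is an assumption, not part of the API.

```python
# Findings copied from the CSV sample response.
findings = [
    {"score": 0.85, "classification": "US_SSN", "column_name": "Social Security Number", "column_index": 0},
    {"score": 0.11, "classification": "CREDIT_CARD", "column_name": "Credit Card Number", "column_index": 1},
    {"score": 0.04, "classification": "US_BANK_NUMBER", "column_name": "Credit Card Number", "column_index": 1},
    {"score": 0.0, "classification": "US_DRIVER_LICENSE", "column_name": "Credit Card Number", "column_index": 1},
    {"score": 0.01, "classification": "US_DRIVER_LICENSE", "column_name": "IBAN", "column_index": 2},
    {"score": 0.01, "classification": "US_DRIVER_LICENSE", "column_name": "Phone Number", "column_index": 3},
    {"score": 0.79, "classification": "IBAN_CODE", "column_name": "IBAN", "column_index": 2},
    {"score": 0.09, "classification": "DATE_TIME", "column_name": "IBAN", "column_index": 2},
    {"score": 0.75, "classification": "PHONE_NUMBER", "column_name": "Phone Number", "column_index": 3},
    {"score": 0.01, "classification": "PERSON", "column_name": "Phone Number", "column_index": 3},
]


def best_per_column(findings):
    """Map column_index to (column_name, classification, score) of the top finding."""
    best = {}
    for f in findings:
        index = f["column_index"]
        if index not in best or f["score"] > best[index]["score"]:
            best[index] = f
    return {
        index: (f["column_name"], f["classification"], f["score"])
        for index, f in best.items()
    }


for index, (name, classification, score) in sorted(best_per_column(findings).items()):
    print(index, name, classification, score)
# 0 Social Security Number US_SSN 0.85
# 1 Credit Card Number CREDIT_CARD 0.11
# 2 IBAN IBAN_CODE 0.79
# 3 Phone Number PHONE_NUMBER 0.75
```

In this sample the top classification for the Credit Card Number column scores only 0.11, so that column is a candidate for manual inspection rather than automatic handling.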



Last modified January 16, 2025