How to Redact Social Security Numbers from OpenAI API Calls (Python)

May 29, 2026 · 6 min read · security

If your application sends user-generated text to the OpenAI API, it will eventually send a Social Security number. Support tickets, form fields, chat messages — SSNs show up in production data constantly. This guide shows three ways to catch and redact them before they reach any LLM provider.

The Problem

When you call client.chat.completions.create(), the entire prompt, including any PII embedded in it, is sent to OpenAI's servers. Zero Data Retention is not the default for API traffic, and even where it is enabled the data still transits OpenAI's infrastructure, which may violate HIPAA, CCPA, GDPR, or your internal data classification policies.

The risk: raw PII in prompts

from openai import OpenAI
client = OpenAI()

# This sends "John Smith, SSN 123-45-6789" to OpenAI
response = client.chat.completions.create(
    model="gpt-4.1",
    messages=[{
        "role": "user",
        "content": "Summarize this support ticket: Customer John Smith "
                   "(SSN 123-45-6789) called about billing issue #4521."
    }]
)

Approach 1: Regex Pattern Matching

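The fastest approach. The pattern below catches SSNs written as XXX-XX-XXXX, XXX XX XXXX, or XXXXXXXXX, and rejects values that are never issued (area 000, 666, or 900-999; group 00; serial 0000). It cannot see free-text PII such as names or addresses, and it will also flag any other nine-digit number that passes those checks.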

Regex-based SSN redaction

import re

SSN_PATTERN = re.compile(
    r'\b(?!000|666|9\d{2})\d{3}[- ]?(?!00)\d{2}[- ]?(?!0000)\d{4}\b'
)

def redact_ssn(text: str) -> str:
    return SSN_PATTERN.sub("[SSN_REDACTED]", text)

# Usage
prompt = "Customer SSN is 123-45-6789 and their card is 4111-1111-1111-1111"
safe_prompt = redact_ssn(prompt)
# "Customer SSN is [SSN_REDACTED] and their card is 4111-1111-1111-1111"
#  ^ SSN caught, but credit card missed
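
A quick check like the following can make the pattern's behavior concrete. It is a sketch: the sample strings are made up for illustration, and the pattern is repeated from above only so the snippet runs on its own.

```python
import re

# Same pattern as above, repeated so this snippet is self-contained.
SSN_PATTERN = re.compile(
    r'\b(?!000|666|9\d{2})\d{3}[- ]?(?!00)\d{2}[- ]?(?!0000)\d{4}\b'
)

# Illustrative inputs mapped to whether the pattern should match.
samples = {
    "123-45-6789": True,      # dashed
    "123 45 6789": True,      # spaced
    "123456789": True,        # no separators
    "000-45-6789": False,     # area 000 is rejected
    "666-45-6789": False,     # area 666 is rejected
    "912-45-6789": False,     # areas 900-999 are rejected
    "123-00-6789": False,     # group 00 is rejected
    "123-45-0000": False,     # serial 0000 is rejected
    "order 482910375": True,  # false positive: a plain nine-digit ID also matches
}

for text, expected in samples.items():
    matched = SSN_PATTERN.search(text) is not None
    assert matched == expected, text
```

The last sample is the one to watch in production: order IDs, account numbers, and other nine-digit values will be redacted as if they were SSNs.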

Limitation: This pattern only catches SSNs. You'd need separate patterns for credit cards, phone numbers, email addresses, driver's licenses, passport numbers, IBAN codes, and every other PII type. Maintaining dozens of regex patterns is error-prone, and no pattern will catch context-dependent PII like names.
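
To illustrate what "separate patterns" looks like in practice, here is a minimal sketch of a pattern registry. The names PII_PATTERNS and redact_patterns are hypothetical, and the email and phone patterns are deliberately simple assumptions that will not cover every real-world format.

```python
import re

# Hypothetical registry: one compiled pattern per PII type.
# The EMAIL and US_PHONE patterns are simplified for illustration.
PII_PATTERNS = {
    "SSN": re.compile(
        r'\b(?!000|666|9\d{2})\d{3}[- ]?(?!00)\d{2}[- ]?(?!0000)\d{4}\b'
    ),
    "EMAIL": re.compile(r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b'),
    "US_PHONE": re.compile(r'(?<!\d)\(?\d{3}\)?[-. ]\d{3}[-. ]\d{4}(?!\d)'),
}

def redact_patterns(text: str) -> str:
    """Replace every match of each registered pattern with a [LABEL_REDACTED] tag."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}_REDACTED]", text)
    return text

print(redact_patterns("Reach jane@example.com or 555-867-5309; SSN 123-45-6789."))
# Reach [EMAIL_REDACTED] or [US_PHONE_REDACTED]; SSN [SSN_REDACTED].
```

Every new entity type means another entry, another set of edge cases, and another chance for two patterns to overlap, which is the maintenance burden described above.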

Approach 2: NLP-Based Detection (Presidio)

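Microsoft Presidio combines NLP models with pattern matching to detect 30+ entity types including SSNs, credit cards, names, addresses, and more. Coverage is much broader than a single regex. The example below assumes the presidio-analyzer and presidio-anonymizer packages and a spaCy English model (for example en_core_web_lg) are installed.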

Presidio-based PII redaction

from presidio_analyzer import AnalyzerEngine
from presidio_anonymizer import AnonymizerEngine

analyzer = AnalyzerEngine()
anonymizer = AnonymizerEngine()

def redact_pii(text: str) -> str:
    results = analyzer.analyze(
        text=text,
        language="en",
        entities=[
            "US_SSN", "CREDIT_CARD", "PHONE_NUMBER",
            "EMAIL_ADDRESS", "PERSON", "US_DRIVER_LICENSE",
        ],
    )
    anonymized = anonymizer.anonymize(text=text, analyzer_results=results)
    return anonymized.text

# Usage
prompt = "Customer John Smith (SSN 123-45-6789) called about billing."
safe = redact_pii(prompt)
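# Expected shape: "Customer <PERSON> (SSN <US_SSN>) called about billing."
# Exact output depends on the Presidio version and NLP model. Some versions of the
# built-in US_SSN recognizer reject well-known placeholder numbers such as 123-45-6789.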

Trade-off: Presidio adds roughly 30-200 ms of latency per request, depending on text length and model size. You also need to deploy and maintain the NLP models. It works well for batch processing but can be tight for real-time chat.
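
Because the added latency depends on text length and model, it may be worth measuring on your own data. The following is a minimal timing sketch using only the standard library. measure_latency_ms is a hypothetical helper, and the lambda is a stand-in so the snippet runs without Presidio installed.

```python
import statistics
import time
from typing import Callable, Iterable

def measure_latency_ms(redactor: Callable[[str], str],
                       samples: Iterable[str],
                       repeats: int = 5) -> dict:
    """Time `redactor` on each sample and return median and max latency in ms."""
    timings = []
    for text in samples:
        for _ in range(repeats):
            start = time.perf_counter()
            redactor(text)
            timings.append((time.perf_counter() - start) * 1000)
    return {"median_ms": statistics.median(timings), "max_ms": max(timings)}

# Stand-in redactor (an assumption for illustration); pass redact_pii or
# redact_ssn from the examples above to measure a real redactor.
stats = measure_latency_ms(
    lambda t: t.replace("123-45-6789", "<US_SSN>"),
    ["short ticket", "SSN 123-45-6789 " * 200],
)
print(stats)
```

Including both short and long samples matters here, since NLP-based detection tends to scale with input length.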

Approach 3: Gateway-Level Redaction (Two-Line Change)

Instead of adding redaction code to every API call, route requests through an AI gateway that automatically detects and redacts PII before forwarding to OpenAI. Change two lines of code; get 30+ entity type protection including SSNs, credit cards, names, addresses, and more.

Gateway-level redaction with AI Security Gateway

from openai import OpenAI

# Change these two lines — everything else stays the same
client = OpenAI(
    base_url="https://api.aisecuritygateway.ai/v1",
    api_key="aisg_your_key_here",
)

# SSNs, credit cards, names, addresses — all auto-redacted
response = client.chat.completions.create(
    model="gpt-4.1",
    messages=[{
        "role": "user",
        "content": "Customer John Smith (SSN 123-45-6789) called about "
                   "billing issue #4521. Card ending 4111-1111-1111-1111."
    }]
)
# What OpenAI sees: "Customer [PERSON] (SSN [US_SSN]) called about
#   billing issue #4521. Card ending [CREDIT_CARD]."

Which Approach Should You Use?

Criteria	Regex	Presidio NLP	Gateway
Entity coverage	SSN only (per pattern)	30+ entity types	30+ entity types
Setup time	5 minutes	30-60 minutes	2 minutes
Added latency	< 1ms	30-200ms	< 50ms
Catches names/addresses	No	Yes	Yes
Code changes required	Per API call	Per API call	2 lines total
Maintenance	High (pattern updates)	Medium (model updates)	None
Works with all providers	Manual per provider	Manual per provider	Automatic
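
For the regex and Presidio columns, the "Per API call" cost can be reduced by routing every request through one helper. This is a sketch under assumptions: redact_messages is a hypothetical name, it accepts any str -> str redactor, and it only handles messages whose content is a plain string (multi-part content would need extra handling).

```python
from typing import Callable

def redact_messages(messages: list[dict],
                    redactor: Callable[[str], str]) -> list[dict]:
    """Return a copy of `messages` with every string `content` passed through `redactor`."""
    cleaned = []
    for message in messages:
        content = message.get("content")
        if isinstance(content, str):
            # Build a new dict so the caller's original message is not mutated.
            message = {**message, "content": redactor(content)}
        cleaned.append(message)
    return cleaned

# Stand-in redactor for illustration; in practice pass redact_ssn or redact_pii.
messages = [{"role": "user", "content": "SSN 123-45-6789 on file"}]
safe_messages = redact_messages(
    messages, lambda t: t.replace("123-45-6789", "[SSN_REDACTED]")
)
# safe_messages[0]["content"] == "SSN [SSN_REDACTED] on file"
# response = client.chat.completions.create(model="gpt-4.1", messages=safe_messages)
```

Keeping the redactor as a parameter lets you start with regex and swap in Presidio later without touching call sites.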

Beyond SSNs: Other PII You Should Redact

SSNs are the most obvious, but production LLM traffic contains much more PII:

Credit/debit card numbers — Luhn-validated, all major networks
Phone numbers — US, UK, international formats
Email addresses — including corporate domains
Person names — NLP-based, handles "John Smith" and "Dr. Jane Doe"
Physical addresses — street, city, state, ZIP
Driver's license numbers — state-specific formats
Medical record numbers — HIPAA-relevant
IBAN / bank account numbers — EU banking identifiers
Passport numbers — multi-country formats
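
As an example of what "Luhn-validated" means for card numbers, here is a minimal sketch that only redacts digit runs passing the Luhn checksum. CARD_CANDIDATE, luhn_valid, and redact_cards are hypothetical names, and the candidate pattern (13 to 19 digits with optional single spaces or dashes) is a simplifying assumption.

```python
import re

# Candidate card numbers: 13-19 digits, optionally separated by single spaces or dashes.
CARD_CANDIDATE = re.compile(r'(?<!\d)\d(?:[ -]?\d){12,18}(?!\d)')

def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(c) for c in number if c.isdigit()]
    total = 0
    # Walk right to left, doubling every second digit.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

def redact_cards(text: str) -> str:
    """Redact digit runs that look like card numbers and pass the Luhn check."""
    return CARD_CANDIDATE.sub(
        lambda m: "[CARD_REDACTED]" if luhn_valid(m.group()) else m.group(),
        text,
    )

print(redact_cards("Card 4111-1111-1111-1111, order 1234567890123456"))
# Card [CARD_REDACTED], order 1234567890123456
```

The checksum filters out most random 16-digit identifiers, which is why the order number above is left alone while the test card number is redacted.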

Stop writing PII regex patterns

AI Security Gateway auto-redacts 30+ entity types from every API call. Two lines of code, under 50ms latency. Works with OpenAI, Anthropic, Google, Meta, and 8+ more providers.
