ralenjor/py-validator

py-validator

A comprehensive, zero-dependency Python input validation library designed for web applications, APIs, and data processing pipelines.

Python 3.10+ License: MIT

Features

  • Simple API - Function-based validators that return ValidationResult named tuples of (is_valid, error)
  • Fully Typed - Complete type hints for IDE autocomplete and static analysis
  • Zero Dependencies - Uses only Python standard library
  • Practical Validators - Email, URL, phone, UUID, IP addresses, credit cards, and more
  • File Security - Path traversal prevention, filename validation, MIME type checking
  • Sanitizers - Clean and normalize user input safely

Important: Security Model

This library provides input validation, not security controls. It helps you verify that data matches expected formats.

What This Library Does

  • Validates data formats (email, phone, UUID, IP address, etc.)
  • Prevents path traversal attacks (../ sequences)
  • Validates and sanitizes filenames
  • Provides HTML escaping for safe output
  • Enforces business rules (length limits, allowed characters, etc.)

What This Library Does NOT Do

For injection attacks, you must use proper architectural defenses:

| Attack | Proper defense | Why validation alone fails |
|---|---|---|
| SQL injection | Parameterized queries | Attackers can encode and obfuscate payloads in endless ways |
| Command injection | subprocess.run([...], shell=False) | Shell metacharacters vary by shell and context |
| XSS | Context-aware output encoding | Input validation can't predict the output context |

This library intentionally does not include SQL/command/XSS "detection" validators because they create a false sense of security. A blocklist approach cannot catch all attack variants and may block legitimate input.

Installation

Copy validator.py to your project:

# Copy directly
curl -O https://raw.githubusercontent.com/yourusername/py-validator/main/validator.py

# Or clone the repo
git clone https://github.com/yourusername/py-validator.git

Quick Start

from validator import (
    validate_email,
    validate_path_safe,
    validate_filename,
    validate_ip_address,
    sanitize_html,
)

# Validate an email
result = validate_email("user@example.com")
if result.is_valid:
    print("Email is valid!")
else:
    print(f"Error: {result.error}")

# Prevent path traversal attacks
result = validate_path_safe("../../../etc/passwd")
# ValidationResult(is_valid=False, error="Path contains traversal sequences")

# Validate uploaded filenames
result = validate_filename("report.pdf", allowed_extensions={".pdf", ".docx"})
# ValidationResult(is_valid=True, error=None)

# Validate IP addresses
result = validate_ip_address("192.168.1.1")
# ValidationResult(is_valid=True, error=None)

# Escape user input for HTML display (XSS defense for HTML element content)
safe_html = sanitize_html("<script>alert('xss')</script>")
# "&lt;script&gt;alert(&#x27;xss&#x27;)&lt;/script&gt;"

API Reference

Validation Result

All validators return a ValidationResult named tuple:

from validator import ValidationResult

result: ValidationResult = validate_email("user@example.com")
result.is_valid  # bool - True if validation passed
result.error     # str | None - Error message if validation failed
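For orientation, here is a minimal sketch of the return-type convention, assuming ValidationResult is a typing.NamedTuple with exactly the two fields listed above. validate_not_blank is a hypothetical validator written for illustration; it is not the library's source.

```python
from typing import NamedTuple, Optional


class ValidationResult(NamedTuple):
    """Assumed shape: (is_valid, error), as described above."""
    is_valid: bool
    error: Optional[str] = None


def validate_not_blank(value: str) -> ValidationResult:
    """Hypothetical validator following the same convention."""
    if not value.strip():
        return ValidationResult(False, "Value must not be empty")
    return ValidationResult(True, None)


# Attribute access...
result = validate_not_blank("hello")
print(result.is_valid, result.error)  # True None

# ...or plain tuple unpacking, since a NamedTuple is a tuple subclass
ok, error = validate_not_blank("   ")
print(ok, error)  # False Value must not be empty
```

Because the result is a real tuple, callers can choose attribute access or unpacking without any extra API.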

String Validators

validate_email(email: str) -> ValidationResult

Validates email addresses against a simplified subset of RFC 5322.

validate_email("user@example.com")       # ✓ Valid
validate_email("user.name@example.org")  # ✓ Valid
validate_email("invalid-email")          # ✗ Invalid email format

validate_url(url: str, allowed_schemes: list[str] = None, require_tld: bool = True) -> ValidationResult

Validates URL format with configurable scheme restrictions.

validate_url("https://example.com/path")           # ✓ Valid
validate_url("ftp://files.example.com",
             allowed_schemes=["ftp", "sftp"])       # ✓ Valid
validate_url("javascript:alert(1)")                 # ✗ Scheme not allowed
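A minimal sketch of scheme allowlisting with the standard library's urllib.parse. It assumes the default allowed schemes are http and https, which the README does not state, and it uses a crude "host contains a dot" rule for require_tld. is_url_allowed is a hypothetical name.

```python
from typing import Optional
from urllib.parse import urlsplit


def is_url_allowed(url: str, allowed_schemes: Optional[list[str]] = None,
                   require_tld: bool = True) -> bool:
    # Assumed default; the library's actual default is not documented here
    schemes = allowed_schemes if allowed_schemes is not None else ["http", "https"]
    try:
        parts = urlsplit(url)
    except ValueError:  # e.g. an unbalanced "[" in an IPv6 host
        return False
    if parts.scheme.lower() not in schemes:
        return False
    host = parts.hostname  # None when there is no network location
    if not host:
        return False
    # Crude TLD check: require at least one dot in the host name
    return "." in host if require_tld else True


print(is_url_allowed("https://example.com/path"))  # True
print(is_url_allowed("javascript:alert(1)"))       # False
```

An allowlist like this rejects javascript:, data: and any other unexpected scheme by default, which is safer than trying to enumerate dangerous schemes.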

validate_phone(phone: str, allow_extensions: bool = False) -> ValidationResult

Validates phone numbers in common international and national formats.

validate_phone("+1-555-123-4567")        # ✓ Valid
validate_phone("+44 20 7123 4567")       # ✓ Valid
validate_phone("(123) 456-7890")         # ✓ Valid

validate_username(username: str, min_length: int = 3, max_length: int = 32, ...) -> ValidationResult

Validates usernames with configurable rules.

validate_username("john_doe")            # ✓ Valid
validate_username("ab")                  # ✗ Too short (min 3)
validate_username("_invalid")            # ✗ Must start with alphanumeric

validate_password_strength(password: str, min_length: int = 8, ...) -> ValidationResult

Validates password complexity requirements.

validate_password_strength("SecureP@ss123")  # ✓ Valid
validate_password_strength("weak")            # ✗ Too short, missing requirements

validate_uuid(uuid_string: str, version: int = None) -> ValidationResult

Validates UUID format (versions 1-5).

validate_uuid("550e8400-e29b-41d4-a716-446655440000")  # ✓ Valid
validate_uuid("not-a-uuid")                             # ✗ Invalid format
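A sketch of UUID checking with the standard uuid module, assuming you want only the canonical hyphenated spelling. uuid.UUID() on its own also accepts braces, a urn:uuid: prefix, and bare hex, so the sketch compares the parsed value back to the input. is_uuid is a hypothetical name.

```python
import uuid
from typing import Optional


def is_uuid(value: str, version: Optional[int] = None) -> bool:
    try:
        parsed = uuid.UUID(value)
    except ValueError:
        return False
    # Require the canonical 8-4-4-4-12 hyphenated form (case-insensitive)
    if str(parsed) != value.lower():
        return False
    # Optionally pin the version nibble (the "4" in "41d4" below)
    return version is None or parsed.version == version


print(is_uuid("550e8400-e29b-41d4-a716-446655440000", version=4))  # True
```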

validate_length(value: str, min_length: int = None, max_length: int = None) -> ValidationResult

Validates string length constraints.

validate_length("hello", min_length=3, max_length=10)  # ✓ Valid
validate_length("hi", min_length=3)                     # ✗ Too short

validate_not_empty(value: str, strip: bool = True) -> ValidationResult

Validates that a string is not empty or whitespace-only.

validate_not_empty("hello")   # ✓ Valid
validate_not_empty("   ")     # ✗ Empty after stripping

validate_alphanumeric(value: str, allow_spaces: bool = False, allow_underscores: bool = False) -> ValidationResult

Validates alphanumeric strings.

validate_alphanumeric("Hello123")                    # ✓ Valid
validate_alphanumeric("Hello World", allow_spaces=True)  # ✓ Valid
validate_alphanumeric("Hello!")                      # ✗ Invalid character

validate_slug(value: str) -> ValidationResult

Validates URL-friendly slugs (lowercase letters, digits, and hyphens).

validate_slug("my-blog-post")    # ✓ Valid
validate_slug("my-post-123")     # ✓ Valid
validate_slug("My Blog Post")    # ✗ Invalid format
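One plausible slug rule expressed as a regular expression. It assumes hyphens may only separate runs of letters and digits (no leading, trailing, or doubled hyphens); the library's exact rule may differ. is_slug is hypothetical.

```python
import re

# Lowercase ASCII letters/digits, with single hyphens between runs
SLUG_RE = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def is_slug(value: str) -> bool:
    # fullmatch anchors both ends, so no ^...$ is needed
    return SLUG_RE.fullmatch(value) is not None
```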

validate_hex_string(value: str, expected_length: int = None) -> ValidationResult

Validates hexadecimal strings.

validate_hex_string("deadbeef")              # ✓ Valid
validate_hex_string("abc123", expected_length=6)  # ✓ Valid
validate_hex_string("0xDEADBEEF")            # ✓ Valid (0x prefix allowed)

validate_ascii(value: str) -> ValidationResult

Validates that a string contains only ASCII characters.

validate_ascii("Hello, World!")  # ✓ Valid
validate_ascii("Héllo")          # ✗ Contains non-ASCII

validate_printable(value: str, allow_newlines: bool = True) -> ValidationResult

Validates that a string contains only printable characters.

validate_printable("Hello\nWorld")       # ✓ Valid
validate_printable("Hello\x00World")     # ✗ Contains control character

validate_choice(value: Any, choices: Sequence[Any]) -> ValidationResult

Validates that a value is one of the allowed choices.

validate_choice("red", ["red", "green", "blue"])     # ✓ Valid
validate_choice("purple", ["red", "green", "blue"])  # ✗ Not in choices

validate_contains_only(value: str, allowed_chars: str) -> ValidationResult

Validates that a string contains only specified characters.

validate_contains_only("123-456", "0123456789-")  # ✓ Valid
validate_contains_only("abc", "0123456789")       # ✗ Invalid characters

validate_json(value: str) -> ValidationResult

Validates that a string is valid JSON.

validate_json('{"name": "John"}')  # ✓ Valid
validate_json('{invalid}')         # ✗ Invalid JSON

validate_base64(value: str, urlsafe: bool = False) -> ValidationResult

Validates Base64 encoded strings.

validate_base64("SGVsbG8gV29ybGQ=")  # ✓ Valid
validate_base64("not-valid!")        # ✗ Invalid Base64
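A sketch of Base64 checking with the standard base64 module, assuming "valid" means "decodes cleanly". Passing validate=True makes b64decode reject characters outside the alphabet instead of silently discarding them. is_base64 is a hypothetical name, and this version still requires correct padding, so unpadded URL-safe tokens (common in JWTs) are rejected.

```python
import base64
import binascii


def is_base64(value: str, urlsafe: bool = False) -> bool:
    if urlsafe:
        # Map the URL-safe alphabet ("-", "_") back to the standard one
        value = value.replace("-", "+").replace("_", "/")
    try:
        base64.b64decode(value, validate=True)
    except (binascii.Error, ValueError):  # ValueError: non-ASCII input
        return False
    return True
```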

validate_semver(value: str) -> ValidationResult

Validates semantic version strings (SemVer 2.0.0).

validate_semver("1.0.0")                  # ✓ Valid
validate_semver("1.2.3-alpha.1+build.123")  # ✓ Valid
validate_semver("1.2")                    # ✗ Invalid format
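SemVer 2.0.0 forbids leading zeros in numeric identifiers and allows optional pre-release and build metadata. The sketch below adapts the regular expression published alongside the SemVer specification, with [0-9] in place of \d so non-ASCII digits are rejected. It is an illustration, not necessarily how validate_semver is implemented; is_semver is hypothetical.

```python
import re

SEMVER_RE = re.compile(
    # MAJOR.MINOR.PATCH, no leading zeros
    r"(0|[1-9][0-9]*)\.(0|[1-9][0-9]*)\.(0|[1-9][0-9]*)"
    # optional -pre.release identifiers
    r"(?:-((?:0|[1-9][0-9]*|[0-9]*[a-zA-Z-][0-9a-zA-Z-]*)"
    r"(?:\.(?:0|[1-9][0-9]*|[0-9]*[a-zA-Z-][0-9a-zA-Z-]*))*))?"
    # optional +build.metadata
    r"(?:\+([0-9a-zA-Z-]+(?:\.[0-9a-zA-Z-]+)*))?"
)


def is_semver(value: str) -> bool:
    return SEMVER_RE.fullmatch(value) is not None
```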

validate_credit_card(number: str) -> ValidationResult

Validates credit card numbers with the Luhn checksum. This catches typos but does not confirm that a card exists or is active.

validate_credit_card("4532015112830366")   # ✓ Valid
validate_credit_card("4532-0151-1283-0366")  # ✓ Valid (separators OK)
validate_credit_card("1234567890123456")   # ✗ Invalid (fails Luhn)
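The Luhn check itself is short. This sketch strips the spaces and hyphens shown above, then doubles every second digit from the right. The 12 to 19 digit length bound is an assumption based on common card-number lengths, and passes_luhn is a hypothetical name.

```python
def passes_luhn(number: str) -> bool:
    # Strip the separators shown above (spaces and hyphens)
    digits = number.replace(" ", "").replace("-", "")
    # isascii() guards against Unicode digits such as "²" that int() rejects
    if not (digits.isascii() and digits.isdigit()) or not 12 <= len(digits) <= 19:
        return False
    total = 0
    # Walk right to left, doubling every second digit
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9  # same as summing the two digits of d
        total += d
    return total % 10 == 0
```

For 4532015112830366 the digit sum comes to 50, a multiple of 10, so the check passes; 1234567890123456 sums to 64 and fails.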

validate_regex(value: str, pattern: str, flags: int = 0) -> ValidationResult

Validates against a custom regex pattern.

validate_regex("ABC123", r"^[A-Z]+\d+$")  # ✓ Valid

Number Validators

validate_integer(value: int | str, min_value: int = None, max_value: int = None) -> ValidationResult

Validates integers with optional range constraints.

validate_integer(42)                              # ✓ Valid
validate_integer("42")                            # ✓ Valid (string coercion)
validate_integer(42, min_value=0, max_value=100)  # ✓ Valid
validate_integer(-5, min_value=0)                 # ✗ Below minimum

validate_float(value: float | str, min_value: float = None, max_value: float = None, max_decimals: int = None) -> ValidationResult

Validates floats with precision control.

validate_float(3.14)                     # ✓ Valid
validate_float(3.14, max_decimals=2)     # ✓ Valid
validate_float(3.14159, max_decimals=2)  # ✗ Too many decimals
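Counting decimal places on a binary float is easy to get wrong. One approach is to go through decimal.Decimal via str(), which yields the shortest representation that round-trips (so 3.14 stays "3.14" rather than a long binary expansion). This is a sketch of the idea, not the library's implementation; decimal_places is hypothetical.

```python
from decimal import Decimal, InvalidOperation
from typing import Optional, Union


def decimal_places(value: Union[float, int, str]) -> Optional[int]:
    """Digits after the decimal point, or None if not a finite number."""
    try:
        d = Decimal(str(value))
    except InvalidOperation:
        return None
    if not d.is_finite():  # NaN and Infinity have no meaningful precision
        return None
    # A negative exponent means that many digits after the point
    return max(0, -d.as_tuple().exponent)


places = decimal_places(3.14159)
print(places is not None and places <= 2)  # False: too many decimals
```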

validate_positive(value: float | int | str, allow_zero: bool = False) -> ValidationResult

Validates positive numbers.

validate_positive(42)                  # ✓ Valid
validate_positive(0, allow_zero=True)  # ✓ Valid
validate_positive(-5)                  # ✗ Not positive

validate_range(value: float | int | str, min_value: float, max_value: float, inclusive: bool = True) -> ValidationResult

Validates numeric range boundaries.

validate_range(50, 0, 100)                    # ✓ Valid
validate_range(100, 0, 100, inclusive=False)  # ✗ Bounds are exclusive, so 100 is out of range

validate_port(port: int | str) -> ValidationResult

Validates network port numbers (1-65535).

validate_port(8080)   # ✓ Valid
validate_port(70000)  # ✗ Out of range

Network Validators

validate_ipv4(ip: str) -> ValidationResult

Validates IPv4 addresses.

validate_ipv4("192.168.1.1")  # ✓ Valid
validate_ipv4("256.1.1.1")    # ✗ Invalid octet

validate_ipv6(ip: str) -> ValidationResult

Validates IPv6 addresses.

validate_ipv6("2001:0db8:85a3:0000:0000:8a2e:0370:7334")  # ✓ Valid
validate_ipv6("::1")                                       # ✓ Valid

validate_ip_address(ip: str) -> ValidationResult

Validates any IP address (IPv4 or IPv6).

validate_ip_address("192.168.1.1")  # ✓ Valid
validate_ip_address("::1")          # ✓ Valid

validate_ip_network(network: str, strict: bool = True) -> ValidationResult

Validates IP networks in CIDR notation.

validate_ip_network("192.168.1.0/24")  # ✓ Valid
validate_ip_network("10.0.0.0/8")      # ✓ Valid
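Python's standard ipaddress module covers both address and CIDR parsing, so a zero-dependency library could plausibly build on it (an assumption; the README does not say). The strict flag below is ipaddress's own: with strict=True, a network with host bits set, such as 192.168.1.1/24, is rejected. The function names are hypothetical.

```python
import ipaddress


def is_ip_address(value: str) -> bool:
    try:
        ipaddress.ip_address(value)  # accepts IPv4 or IPv6
    except ValueError:
        return False
    return True


def is_ip_network(value: str, strict: bool = True) -> bool:
    try:
        ipaddress.ip_network(value, strict=strict)
    except ValueError:
        return False
    return True


print(is_ip_network("192.168.1.1/24"))                # False: host bits set
print(is_ip_network("192.168.1.1/24", strict=False))  # True
```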

validate_mac_address(mac: str) -> ValidationResult

Validates MAC addresses.

validate_mac_address("00:11:22:33:44:55")  # ✓ Valid
validate_mac_address("00-11-22-33-44-55")  # ✓ Valid
validate_mac_address("0011.2233.4455")     # ✓ Valid
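A regex sketch covering the three notations above. It assumes mixed separators such as 00:11-22:... should be rejected, which the backreference \1 enforces. is_mac_address is a hypothetical name.

```python
import re

HEX2 = "[0-9A-Fa-f]{2}"
# Colon or hyphen form; \1 forces one consistent separator throughout
COLON_OR_HYPHEN = re.compile(rf"{HEX2}([:-])(?:{HEX2}\1){{4}}{HEX2}")
# Cisco dotted form: three groups of four hex digits
CISCO_DOTTED = re.compile(r"[0-9A-Fa-f]{4}\.[0-9A-Fa-f]{4}\.[0-9A-Fa-f]{4}")


def is_mac_address(value: str) -> bool:
    return bool(COLON_OR_HYPHEN.fullmatch(value) or CISCO_DOTTED.fullmatch(value))
```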

Date/Time Validators

validate_date(date_string: str, format: str = "%Y-%m-%d") -> ValidationResult

Validates date strings against a format.

validate_date("2024-01-15")                         # ✓ Valid
validate_date("15/01/2024", format="%d/%m/%Y")      # ✓ Valid

validate_datetime(datetime_string: str, format: str = "%Y-%m-%d %H:%M:%S") -> ValidationResult

Validates datetime strings.

validate_datetime("2024-01-15 14:30:00")  # ✓ Valid

validate_date_range(date_string: str, min_date: str = None, max_date: str = None, format: str = "%Y-%m-%d") -> ValidationResult

Validates dates within a range.

validate_date_range("2024-06-15",
                    min_date="2024-01-01",
                    max_date="2024-12-31")  # ✓ Valid
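A sketch of date-range checking with datetime.strptime, assuming the bounds are inclusive and use the same format string as the value. strptime also rejects impossible dates such as 2024-02-30. The parameter is called fmt here to avoid shadowing the built-in format(); is_date_in_range is hypothetical.

```python
from datetime import datetime
from typing import Optional


def is_date_in_range(date_string: str, min_date: Optional[str] = None,
                     max_date: Optional[str] = None, fmt: str = "%Y-%m-%d") -> bool:
    try:
        value = datetime.strptime(date_string, fmt).date()
    except ValueError:
        return False  # not a real date in this format, e.g. "2024-02-30"
    # Bounds are parsed with the same format and treated as inclusive (assumption)
    if min_date is not None and value < datetime.strptime(min_date, fmt).date():
        return False
    if max_date is not None and value > datetime.strptime(max_date, fmt).date():
        return False
    return True
```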

validate_iso8601(datetime_string: str) -> ValidationResult

Validates ISO 8601 datetime formats.

validate_iso8601("2024-01-15")                   # ✓ Valid
validate_iso8601("2024-01-15T14:30:00Z")         # ✓ Valid
validate_iso8601("2024-01-15T14:30:00+00:00")    # ✓ Valid
validate_iso8601("2024-01-15T14:30:00.123456Z")  # ✓ Valid
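Since the library targets Python 3.10+, note that datetime.fromisoformat() only accepts a trailing "Z" from Python 3.11 onward. This sketch rewrites "Z" as the equivalent "+00:00" first. fromisoformat() is also more lenient than strict ISO 8601 in places (in Python 3.10, any single character may separate date and time), so treat this as an approximation. is_iso8601 is a hypothetical name.

```python
from datetime import datetime


def is_iso8601(value: str) -> bool:
    # Python < 3.11 rejects "Z"; "+00:00" means the same thing (UTC)
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    try:
        datetime.fromisoformat(value)
    except ValueError:
        return False
    return True
```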

File & Path Security

These validators have direct security value: they block common file-system attack patterns such as path traversal and dangerous file uploads.

validate_path_safe(path: str, allow_absolute: bool = False) -> ValidationResult

Rejects paths that contain traversal sequences (including URL-encoded ones) and, unless allow_absolute=True, absolute paths. Use it on every user-provided file path.

validate_path_safe("data/file.txt")        # ✓ Safe
validate_path_safe("../etc/passwd")        # ✗ Traversal detected
validate_path_safe("%2e%2e%2fpasswd")      # ✗ Encoded traversal
validate_path_safe("/etc/passwd")          # ✗ Absolute path
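A defensive sketch of the checks listed above, written against the standard library only. It assumes URL decoding should be applied repeatedly (to catch double encoding such as %252e) and that backslashes count as separators. looks_path_safe is hypothetical, and a lexical check like this is best paired with a containment check on the final path, such as sanitize_path.

```python
from pathlib import PurePosixPath, PureWindowsPath
from urllib.parse import unquote


def looks_path_safe(path: str, allow_absolute: bool = False) -> bool:
    # Decode repeatedly so "%2e%2e%2f" and "%252e%252e%252f" both become "../"
    # (three rounds is an arbitrary cap)
    decoded = path
    for _ in range(3):
        next_value = unquote(decoded)
        if next_value == decoded:
            break
        decoded = next_value
    if "\x00" in decoded:
        return False  # NUL bytes can truncate paths in lower-level APIs
    # Treat backslashes as separators too, so "..\\secret" is caught
    normalized = decoded.replace("\\", "/")
    is_absolute = normalized.startswith("/") or bool(PureWindowsPath(decoded).drive)
    if is_absolute and not allow_absolute:
        return False
    # Pure paths keep ".." as a component rather than collapsing it
    return ".." not in PurePosixPath(normalized).parts
```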

validate_filename(filename: str, allowed_extensions: set[str] = None, block_dangerous: bool = True) -> ValidationResult

Validates filenames for safe filesystem use.

validate_filename("document.pdf")                    # ✓ Safe
validate_filename("script.exe")                      # ✗ Dangerous extension
validate_filename("photo.jpg",
                  allowed_extensions={".jpg", ".png"})  # ✓ Safe
validate_filename("../passwd")                       # ✗ Path separator

validate_file_extension(filename: str, allowed_extensions: set[str]) -> ValidationResult

Validates a filename against an allowlist of extensions.

validate_file_extension("photo.jpg", {".jpg", ".png", ".gif"})  # ✓ Valid
validate_file_extension("script.php", {".jpg", ".png"})          # ✗ Not allowed
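An allowlist check only needs the last suffix, compared case-insensitively. This sketch uses os.path.splitext, which returns only the final extension (so photo.jpg.php yields .php) and treats a leading-dot name like .bashrc as having no extension. has_allowed_extension is a hypothetical name.

```python
import os


def has_allowed_extension(filename: str, allowed_extensions: set[str]) -> bool:
    # "archive.tar.gz" -> ".gz"; ".bashrc" -> ""
    _, ext = os.path.splitext(filename)
    # Compare case-insensitively so "PHOTO.JPG" matches ".jpg"
    return ext.lower() in {e.lower() for e in allowed_extensions}
```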

validate_mime_type(mime_type: str, filename: str, strict: bool = True) -> ValidationResult

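Validates that a MIME type matches the file extension. Both values usually come from the client, so this is a consistency check, not proof of the file's actual contents.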

validate_mime_type("image/jpeg", "photo.jpg")        # ✓ Valid
validate_mime_type("application/php", "image.jpg")  # ✗ Mismatch
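A sketch of the consistency check using the standard mimetypes module, assuming an exact match against the type guessed from the extension. guess_type() never opens the file, which is why this only compares two client-supplied values. mime_matches_extension is hypothetical.

```python
import mimetypes


def mime_matches_extension(mime_type: str, filename: str) -> bool:
    # Maps the extension to a type, e.g. ".jpg" -> "image/jpeg"
    guessed, _encoding = mimetypes.guess_type(filename)
    return guessed is not None and guessed == mime_type.strip().lower()
```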

Sanitizers

sanitize_string(value: str, trim: bool = True, normalize_whitespace: bool = True, ...) -> str

Cleans and normalizes strings.

sanitize_string("  Hello   World  ")       # "Hello World"
sanitize_string("text", max_length=3)      # "tex"

sanitize_html(value: str) -> str

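Escapes HTML special characters (&, <, >, " and '). This is the primary XSS defense for user input placed in HTML element content or quoted attribute values.

Use it whenever you display user input in an HTML context. It does not make values safe inside <script> blocks, CSS, or URL attributes: an href of javascript:alert(1) contains no special characters and passes through unchanged.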

sanitize_html("<script>alert('xss')</script>")
# "&lt;script&gt;alert(&#x27;xss&#x27;)&lt;/script&gt;"

sanitize_html("5 > 3 && 2 < 4")
# "5 &gt; 3 &amp;&amp; 2 &lt; 4"

sanitize_filename(filename: str, replacement: str = "_", max_length: int = 255) -> str

Makes filenames safe to use on common filesystems, including Windows.

sanitize_filename("my<file>:name.txt")      # "my_file__name.txt"
sanitize_filename("../../../etc/passwd")    # "passwd"
sanitize_filename("CON.txt")                # "_CON.txt" (Windows reserved)
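A sketch that reproduces the three examples above. It assumes the rules are: keep only the last path component, replace characters Windows forbids (plus control characters), drop trailing dots and spaces, and prefix Windows reserved device names. clean_filename is a hypothetical name, not the library's function.

```python
import re

# Characters Windows forbids in file names, plus ASCII control characters
_INVALID_CHARS = re.compile(r'[<>:"/\\|?*\x00-\x1f]')
_RESERVED = ({"CON", "PRN", "AUX", "NUL"}
             | {f"COM{i}" for i in range(1, 10)}
             | {f"LPT{i}" for i in range(1, 10)})


def clean_filename(filename: str, replacement: str = "_", max_length: int = 255) -> str:
    # Keep only the final path component, splitting on "/" and "\"
    name = re.split(r"[\\/]", filename)[-1]
    name = _INVALID_CHARS.sub(replacement, name)
    # Windows silently drops trailing dots and spaces; this also empties ".."
    name = name.rstrip(". ")
    # Device names are reserved with any extension, e.g. "CON.txt"
    if name.split(".")[0].upper() in _RESERVED:
        name = replacement + name
    # Never return an empty name
    return name[:max_length] or replacement
```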

sanitize_path(path: str, base_dir: str = None, allow_absolute: bool = False) -> str | None

Normalizes and constrains paths to a base directory.

sanitize_path("data/file.txt")                         # "data/file.txt"
sanitize_path("config/db.json", base_dir="/app")       # "/app/config/db.json"
sanitize_path("../../../etc/passwd", base_dir="/app")  # None (escapes base)
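A lexical sketch of base-directory containment, assuming POSIX paths. posixpath.commonpath compares whole path components, so /app2 is not mistaken for a child of /app. This does not resolve symlinks; production code should also apply os.path.realpath() (or Path.resolve()) to both sides before comparing. constrain_to_base is hypothetical.

```python
import posixpath
from typing import Optional


def constrain_to_base(path: str, base_dir: str) -> Optional[str]:
    base = posixpath.normpath(base_dir)
    # join() discards base when path is absolute, so "/etc/passwd" is caught too
    candidate = posixpath.normpath(posixpath.join(base, path))
    try:
        inside = posixpath.commonpath([base, candidate]) == base
    except ValueError:  # mixing absolute and relative paths
        return None
    return candidate if inside else None
```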

Utilities

validate_all(value: str, validators: list) -> ValidationResult

Run multiple validators and collect all errors.

result = validate_all("ab@x", [
    validate_email,
    (validate_length, {"min_length": 10}),
])
# ValidationResult(is_valid=False, error="Invalid email format; Value must be at least 10 characters")

create_validator(*validators) -> ValidatorFunc

Create a reusable composite validator.

email_validator = create_validator(
    validate_email,
    (validate_length, {"max_length": 100}),
)

result = email_validator("user@example.com")
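To show how the (validator, kwargs) pairs above might be handled, here is a self-contained sketch of both utilities. run_all, compose, has_at_sign and check_min_length are hypothetical stand-ins for validate_all, create_validator and the real validators; only the "; "-joined error format is taken from the example above.

```python
from typing import Any, Callable, NamedTuple, Optional, Union


class ValidationResult(NamedTuple):
    is_valid: bool
    error: Optional[str] = None


Validator = Callable[..., ValidationResult]
# Each entry is either a bare validator or a (validator, kwargs) pair
ValidatorSpec = Union[Validator, tuple[Validator, dict[str, Any]]]


def run_all(value: Any, validators: list[ValidatorSpec]) -> ValidationResult:
    errors: list[str] = []
    for spec in validators:
        func, kwargs = spec if isinstance(spec, tuple) else (spec, {})
        result = func(value, **kwargs)
        if not result.is_valid:
            errors.append(result.error or "Invalid value")
    # Collect every failure instead of stopping at the first one
    return ValidationResult(not errors, "; ".join(errors) if errors else None)


def compose(*validators: ValidatorSpec) -> Callable[[Any], ValidationResult]:
    # Bind the list once; the returned function takes only the value
    def validator(value: Any) -> ValidationResult:
        return run_all(value, list(validators))
    return validator


def has_at_sign(value: str) -> ValidationResult:
    return ValidationResult("@" in value, None if "@" in value else "Missing @")


def check_min_length(value: str, min_length: int) -> ValidationResult:
    if len(value) < min_length:
        return ValidationResult(False, f"Value must be at least {min_length} characters")
    return ValidationResult(True)


check = compose(has_at_sign, (check_min_length, {"min_length": 10}))
print(check("abx"))
# ValidationResult(is_valid=False, error='Missing @; Value must be at least 10 characters')
```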

Real-World Examples

File Upload Validation

from validator import (
    validate_filename,
    validate_mime_type,
    sanitize_filename,
    sanitize_path,
)

ALLOWED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif", ".pdf"}

def validate_upload(filename: str, mime_type: str, save_dir: str) -> tuple[bool, str | None, str | None]:
    """
    Validate an uploaded file before saving.
    Returns (is_valid, error_message, safe_filename).
    """
    # 1. Validate that the filename is safe and has an allowed extension
    result = validate_filename(filename, allowed_extensions=ALLOWED_EXTENSIONS)
    if not result.is_valid:
        return False, result.error, None

    # 2. Verify the client-supplied MIME type matches the extension
    result = validate_mime_type(mime_type, filename)
    if not result.is_valid:
        return False, result.error, None

    # 3. Generate a safe filename
    safe_name = sanitize_filename(filename)

    # 4. Confirm the final path stays inside the upload directory.
    #    save_dir is server-controlled (and usually absolute, which
    #    validate_path_safe rejects by default); the filename is what needs checking.
    if sanitize_path(safe_name, base_dir=save_dir) is None:
        return False, "Upload path escapes the upload directory", None

    return True, None, safe_name

API Input Validation

from validator import (
    validate_email,
    validate_username,
    validate_password_strength,
    validate_choice,
    sanitize_string,
    sanitize_html,
)

def validate_registration(data: dict) -> dict[str, str]:
    """Validate user registration data. Returns dict of field errors."""
    errors = {}

    # Email
    email = sanitize_string(data.get("email", ""))
    result = validate_email(email)
    if not result.is_valid:
        errors["email"] = result.error

    # Username
    username = sanitize_string(data.get("username", ""))
    result = validate_username(username)
    if not result.is_valid:
        errors["username"] = result.error

    # Password
    result = validate_password_strength(data.get("password", ""))
    if not result.is_valid:
        errors["password"] = result.error

    # Role (must be a valid choice). Grant privileged roles such as "admin"
    # server-side; never accept them from a signup form.
    role = data.get("role", "")
    result = validate_choice(role, ["user", "moderator"])
    if not result.is_valid:
        errors["role"] = result.error

    return errors

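def display_user_profile(user: dict) -> dict:
    """Prepare user data for HTML display.

    Escape here only if your templates do not auto-escape; otherwise values
    are escaped twice and "&" shows up on the page as "&amp;".
    """
    return {
        "name": sanitize_html(user["name"]),  # XSS prevention
        "bio": sanitize_html(user["bio"]),
        "email": sanitize_html(user["email"]),  # A valid format is not the same as HTML-safe
    }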

Data Import Pipeline

from pathlib import Path

from validator import (
    validate_path_safe,
    validate_filename,
    sanitize_path,
)

IMPORT_DIR = Path("/app/data/imports")

def process_import_request(user_path: str) -> Path:
    """
    Safely resolve a user-provided file path for import.
    Raises ValueError if path is unsafe.
    """
    # 1. Validate no path traversal
    result = validate_path_safe(user_path)
    if not result.is_valid:
        raise ValueError(f"Invalid path: {result.error}")

    # 2. Validate filename
    filename = Path(user_path).name
    result = validate_filename(filename, allowed_extensions={".csv", ".json", ".xml"})
    if not result.is_valid:
        raise ValueError(f"Invalid file: {result.error}")

    # 3. Constrain to import directory
    safe_path = sanitize_path(user_path, base_dir=str(IMPORT_DIR))  # base_dir is documented as str
    if safe_path is None:
        raise ValueError("Path escapes allowed directory")

    return Path(safe_path)

Network Configuration Validation

from validator import (
    validate_ip_address,
    validate_ip_network,
    validate_port,
    validate_mac_address,
)

def validate_network_config(config: dict) -> dict[str, str]:
    """Validate network configuration. Returns dict of errors."""
    errors = {}

    # Server IP
    result = validate_ip_address(config.get("server_ip", ""))
    if not result.is_valid:
        errors["server_ip"] = result.error

    # Port
    result = validate_port(config.get("port", ""))
    if not result.is_valid:
        errors["port"] = result.error

    # Allowed network (CIDR)
    if "allowed_network" in config:
        result = validate_ip_network(config["allowed_network"])
        if not result.is_valid:
            errors["allowed_network"] = result.error

    # Device MAC
    if "device_mac" in config:
        result = validate_mac_address(config["device_mac"])
        if not result.is_valid:
            errors["device_mac"] = result.error

    return errors

Secure Coding Practices

This library is one part of a secure application. Here's how to handle common security concerns:

SQL Injection

# WRONG - Never do this
query = f"SELECT * FROM users WHERE name = '{username}'"

# RIGHT - Use parameterized queries
cursor.execute("SELECT * FROM users WHERE name = ?", (username,))
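A self-contained demonstration using the standard sqlite3 module and an in-memory database. The table and payload are illustrative. Placeholder syntax varies by driver: sqlite3 uses "?", while drivers such as psycopg use "%s" (each DB-API module reports its style in its paramstyle attribute).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.executemany("INSERT INTO users VALUES (?)", [("alice",), ("bob",)])

payload = "' OR '1'='1"

# String formatting splices the payload into the SQL text itself:
# ... WHERE name = '' OR '1'='1'  -> matches every row
unsafe_rows = conn.execute(f"SELECT name FROM users WHERE name = '{payload}'").fetchall()

# A bound parameter is sent as data and compared as a literal string
safe_rows = conn.execute("SELECT name FROM users WHERE name = ?", (payload,)).fetchall()

print(len(unsafe_rows), len(safe_rows))  # 2 0
```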

Command Injection

# WRONG - Never do this
os.system(f"convert {filename} output.png")

# RIGHT - Use subprocess with shell=False
import subprocess
subprocess.run(["convert", filename, "output.png"], shell=False)
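To see what the argument list buys you, this sketch uses the current Python interpreter as a stand-in for convert and passes a hostile filename. With a list (and the default shell=False), no shell parses the string, so the child receives it as one literal argument. A value starting with "-" could still be read as an option by the target program; for programs that follow POSIX conventions, placing "--" before file arguments ends option parsing.

```python
import subprocess
import sys

# A hostile "filename" full of shell metacharacters
filename = "photo.jpg; echo INJECTED"

result = subprocess.run(
    [sys.executable, "-c", "import sys; print(sys.argv[1])", filename],
    capture_output=True,
    text=True,
    check=True,
)
print(result.stdout.strip())  # photo.jpg; echo INJECTED
```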

XSS (Cross-Site Scripting)

# Use sanitize_html when displaying user content
from validator import sanitize_html

user_comment = "<script>alert('xss')</script>"
safe_comment = sanitize_html(user_comment)
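# Now safe to place in HTML element content: &lt;script&gt;...

# Or use a template engine with auto-escaping. Django templates escape by
# default; Jinja2 escapes only when autoescape is enabled (e.g. via
# select_autoescape()). Don't also call sanitize_html, or output is escaped twice.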
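Escaping is context-specific. This sketch shows a value that HTML escaping leaves untouched yet is still dangerous inside an href attribute, and a hypothetical is_safe_link helper that allowlists http and https. That helper is an illustration, not part of the library, and it also rejects relative URLs.

```python
import html
from urllib.parse import urlsplit


def is_safe_link(url: str) -> bool:
    # Allowlist the schemes a link may use; relative URLs are rejected too
    return urlsplit(url).scheme.lower() in {"http", "https"}


user_url = "javascript:alert(1)"

# No &, <, >, " or ' in the string, so escaping changes nothing
print(html.escape(user_url) == user_url)  # True

# URL contexts need a scheme check in addition to escaping
print(is_safe_link(user_url), is_safe_link("https://example.com"))  # False True
```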

Path Traversal

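# Assumes a Flask app: request.args is Flask's query-string accessor
from flask import Flask, request, send_file
from validator import validate_path_safe, sanitize_path

app = Flask(__name__)

@app.route("/download")
def download():
    user_file = request.args.get("file", "")

    # Validate first
    result = validate_path_safe(user_file)
    if not result.is_valid:
        return "Invalid path", 400

    # Constrain to allowed directory
    safe_path = sanitize_path(user_file, base_dir="/app/uploads")
    if safe_path is None:
        return "Access denied", 403

    return send_file(safe_path)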

Requirements

  • Python 3.10+
  • No external dependencies

License

MIT License - see LICENSE for details.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

Design Principles

When contributing, please keep these principles in mind:

  1. Validators validate, they don't secure - Input validation checks format, not intent
  2. No blocklist-based security - Blocklists can always be bypassed
  3. Clear naming - Function names should describe exactly what they check
  4. Honest documentation - Don't oversell security properties
