# WordPress Hooks Crawler

A fast, fully-typed Python crawler that collects every hook documented at developer.wordpress.org/reference/hooks and exports the data in five formats: JSON, YAML, Markdown, HTML, and plain text.

Pre-built output (all hooks, ready to use) is published separately at github.com/BaseMax/wordpress-hooks.


## Features

- Crawls all 49+ paginated listing pages automatically
- Extracts per-hook: name, URL, type, inferred kind (action / filter), description, used-by count, uses count, source file & line, GitHub source link, since-version(s), packages
- SQLite-backed HTTP cache via requests-cache; re-runs are near-instant, and the polite delay is skipped for cache hits
- Automatic retry with exponential back-off on network errors
- Five export formats out of the box
- Strict Python type annotations throughout (mypy --strict clean)
- Single dependency group, managed with uv
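The retry behavior listed above can be sketched as follows. This is an illustrative, stdlib-only sketch, not the crawler's actual code; the function name `fetch_with_backoff` and its parameters are hypothetical:

```python
import time

def fetch_with_backoff(fetch, max_retries=3, base_delay=0.5):
    """Call fetch(), retrying with exponential back-off on network errors.

    Waits base_delay * 2**attempt seconds between attempts, so delays
    grow 0.5 s, 1 s, 2 s, ... before giving up and re-raising.
    """
    for attempt in range(max_retries + 1):
        try:
            return fetch()
        except OSError:
            if attempt == max_retries:
                raise
            time.sleep(base_delay * 2 ** attempt)
```

The real crawler delegates caching to requests-cache, which handles the "skip the polite delay on cache hits" behavior at the session level.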

## Output formats

| File | Format | Best for |
| --- | --- | --- |
| `hooks.json` | JSON (pretty-printed) | Programmatic processing, APIs |
| `hooks.yaml` | YAML | Config files, readable diffs |
| `hooks.md` | Markdown | GitHub rendering, wikis |
| `hooks.html` | Self-contained HTML | Browser viewing, searchable table |
| `hooks.txt` | Plain text | Terminal paging, grep |

## Requirements

- Python ≥ 3.11
- uv (recommended) or pip

## Installation

### With uv (recommended)

```bash
git clone https://github.com/BaseMax/wordpress-crawler-hooks.git
cd wordpress-crawler-hooks
uv sync
```

### With pip

```bash
git clone https://github.com/BaseMax/wordpress-crawler-hooks.git
cd wordpress-crawler-hooks
pip install -r requirements.txt
```

## Usage

```bash
# uv
uv run python crawler.py

# plain Python
python crawler.py
```

Output files are written to `./output/` by default.

### All CLI options

```text
usage: crawler.py [-h] [--output-dir DIR] [--delay SECONDS]
                  [--cache-dir DIR] [--cache-ttl SECONDS] [--no-cache]

options:
  --output-dir DIR      Directory where output files are written. (default: output)
  --delay SECONDS       Pause between HTTP requests. (default: 0.5)
  --cache-dir DIR       Directory for the SQLite HTTP cache database. (default: .cache)
  --cache-ttl SECONDS   How long a cached response stays fresh. (default: 86400 = 24 h)
  --no-cache            Disable the HTTP cache and always fetch live pages.
```

### Examples

```bash
# Custom output directory
python crawler.py --output-dir data/

# Shorter cache lifetime (1 hour)
python crawler.py --cache-ttl 3600

# Always fetch live, no cache
python crawler.py --no-cache

# Faster scraping (smaller delay) with custom cache location
python crawler.py --delay 0.25 --cache-dir /tmp/wp-cache
```

## Project structure

```text
wordpress-crawler-hooks/
├── crawler.py          # main script
├── pyproject.toml      # project metadata & dependencies (PEP 621)
├── requirements.txt    # pip-compatible pin file
├── output/             # generated output (git-ignored)
│   ├── hooks.json
│   ├── hooks.yaml
│   ├── hooks.md
│   ├── hooks.html
│   └── hooks.txt
└── .cache/             # SQLite HTTP cache (git-ignored)
```

## Data schema

Each hook object contains:

| Field | Type | Description |
| --- | --- | --- |
| `post_id` | `int` | WordPress post ID |
| `name` | `str` | Hook name, e.g. `admin_init` |
| `url` | `str` | Full URL on developer.wordpress.org |
| `hook_type` | `str` | Label from the site (`hook`) |
| `hook_kind` | `str` | Inferred: `action`, `filter`, or `unknown` |
| `description` | `str` | Short description |
| `used_by_count` | `int` | Number of functions that use this hook |
| `uses_count` | `int` | Number of functions this hook calls |
| `source_file` | `str` | WordPress source file path |
| `source_line` | `str` | Line number in that file |
| `source_github_url` | `str` | Direct GitHub link to the line |
| `since_versions` | `list[str]` | WordPress version(s) in which this hook was introduced |
| `packages` | `list[str]` | WordPress package(s) the hook belongs to |
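As a consumer-side sketch, the JSON export can be tallied by inferred kind. This assumes `hooks.json` holds a flat list of objects with the fields above; check the actual file layout before relying on it:

```python
import json
from collections import Counter

def kind_counts(hooks: list[dict]) -> Counter:
    """Tally hook objects by their inferred kind (action / filter / unknown)."""
    return Counter(h["hook_kind"] for h in hooks)

# Typical use, with the default --output-dir:
# with open("output/hooks.json", encoding="utf-8") as f:
#     print(kind_counts(json.load(f)))
```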

## Development

```bash
# Install dev dependencies
uv sync --group dev

# Lint & format
uv run ruff check crawler.py
uv run ruff format crawler.py

# Type-check
uv run mypy crawler.py
```

## Pre-built hook data

The crawled output (JSON, YAML, Markdown, HTML, TXT) for all WordPress hooks is published and kept up to date in the companion repository:

github.com/BaseMax/wordpress-hooks


## License

MIT License

Copyright © 2026 Seyyed Ali Mohammadiyeh (Max Base)
