A fast, fully typed Python crawler that collects every hook documented at developer.wordpress.org/reference/hooks and exports the data in five formats: JSON, YAML, Markdown, HTML, and plain text.
Pre-built output (all hooks, ready to use) is published separately at github.com/BaseMax/wordpress-hooks.
- Crawls all 49+ paginated listing pages automatically
- Extracts per-hook: name, URL, type, inferred kind (action / filter), description, used-by count, uses count, source file & line, GitHub source link, since-version(s), packages
- SQLite-backed HTTP cache via `requests-cache`; re-runs are near-instant, and the polite delay is skipped for cache hits
- Automatic retry with exponential back-off on network errors
- Five export formats out of the box
- Strict Python type annotations throughout (`mypy --strict` clean)
- Single dependency group, managed with `uv`
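
The retry behaviour listed above can be illustrated with a small, standard-library-only sketch. This is not the crawler's actual code: the helper name `retry_with_backoff`, its parameters, and the default delays are all assumptions made for illustration. The `sleep` parameter is injectable so the delay schedule can be inspected without waiting.

```python
import time
from collections.abc import Callable
from typing import TypeVar

T = TypeVar("T")


def retry_with_backoff(
    func: Callable[[], T],
    *,
    retries: int = 3,  # hypothetical default, not taken from crawler.py
    base_delay: float = 1.0,  # hypothetical default, not taken from crawler.py
    factor: float = 2.0,
    retry_on: tuple[type[Exception], ...] = (OSError,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func(), retrying on the given errors with exponentially growing waits."""
    for attempt in range(retries + 1):
        try:
            return func()
        except retry_on:
            if attempt == retries:
                raise  # out of retries: surface the last error
            # Wait base_delay, then base_delay * factor, then * factor**2, ...
            sleep(base_delay * factor**attempt)
    raise AssertionError("unreachable")
```

With the defaults shown, a call that fails twice and then succeeds would wait 1 second and then 2 seconds before returning its result.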
| File | Format | Best for |
|---|---|---|
| `hooks.json` | JSON (pretty-printed) | Programmatic processing, APIs |
| `hooks.yaml` | YAML | Config files, readable diffs |
| `hooks.md` | Markdown | GitHub rendering, wikis |
| `hooks.html` | Self-contained HTML | Browser viewing, searchable table |
| `hooks.txt` | Plain text | Terminal paging, grep |
- Python ≥ 3.11
- `uv` (recommended) or pip
# uv
git clone https://github.com/BaseMax/wordpress-crawler-hooks.git
cd wordpress-crawler-hooks
uv sync

# pip
git clone https://github.com/BaseMax/wordpress-crawler-hooks.git
cd wordpress-crawler-hooks
pip install -r requirements.txt

# uv
uv run python crawler.py
# plain Python
python crawler.py

Output files are written to ./output/ by default.
usage: crawler.py [-h] [--output-dir DIR] [--delay SECONDS]
[--cache-dir DIR] [--cache-ttl SECONDS] [--no-cache]
options:
--output-dir DIR Directory where output files are written. (default: output)
--delay SECONDS Pause between HTTP requests. (default: 0.5)
--cache-dir DIR Directory for the SQLite HTTP cache database. (default: .cache)
--cache-ttl SECONDS How long a cached response stays fresh. (default: 86400 = 24 h)
--no-cache Disable the HTTP cache and always fetch live pages.
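
For readers who want to script against the same options, the help text above can be approximated with `argparse`. This is a reconstruction from the usage message only, not the parser in `crawler.py`; the function name `build_parser` and the choice of `float` and `int` types for the numeric options are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented options (a reconstruction, not the original)."""
    parser = argparse.ArgumentParser(prog="crawler.py")
    parser.add_argument("--output-dir", metavar="DIR", default="output",
                        help="Directory where output files are written.")
    parser.add_argument("--delay", metavar="SECONDS", type=float, default=0.5,
                        help="Pause between HTTP requests.")
    parser.add_argument("--cache-dir", metavar="DIR", default=".cache",
                        help="Directory for the SQLite HTTP cache database.")
    parser.add_argument("--cache-ttl", metavar="SECONDS", type=int, default=86400,
                        help="How long a cached response stays fresh.")
    parser.add_argument("--no-cache", action="store_true",
                        help="Disable the HTTP cache and always fetch live pages.")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch is self-contained.
args = build_parser().parse_args(["--delay", "0.25", "--cache-dir", "/tmp/wp-cache"])
print(args.output_dir, args.delay, args.cache_dir, args.cache_ttl, args.no_cache)
```

Options that are not passed fall back to the documented defaults, so only the overridden values need to appear on the command line.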
# Custom output directory
python crawler.py --output-dir data/
# Shorter cache lifetime (1 hour)
python crawler.py --cache-ttl 3600
# Always fetch live, no cache
python crawler.py --no-cache
# Faster scraping (smaller delay) with custom cache location
python crawler.py --delay 0.25 --cache-dir /tmp/wp-cache

wordpress-crawler-hooks/
├── crawler.py # main script
├── pyproject.toml # project metadata & dependencies (PEP 621)
├── requirements.txt # pip-compatible pin file
├── output/ # generated output (git-ignored)
│ ├── hooks.json
│ ├── hooks.yaml
│ ├── hooks.md
│ ├── hooks.html
│ └── hooks.txt
└── .cache/ # SQLite HTTP cache (git-ignored)
Each hook object contains:
| Field | Type | Description |
|---|---|---|
| `post_id` | `int` | WordPress post ID |
| `name` | `str` | Hook name, e.g. `admin_init` |
| `url` | `str` | Full URL on developer.wordpress.org |
| `hook_type` | `str` | Label from the site (`hook`) |
| `hook_kind` | `str` | Inferred: `action`, `filter`, or `unknown` |
| `description` | `str` | Short description |
| `used_by_count` | `int` | Number of functions that use this hook |
| `uses_count` | `int` | Number of functions this hook calls |
| `source_file` | `str` | WordPress source file path |
| `source_line` | `str` | Line number in that file |
| `source_github_url` | `str` | Direct GitHub link to the line |
| `since_versions` | `list[str]` | WordPress version(s) in which this hook was introduced |
| `packages` | `list[str]` | WordPress package(s) the hook belongs to |
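
The schema above maps naturally onto a `TypedDict`, which is handy when consuming `hooks.json` from typed code. The sketch below is an illustration rather than the crawler's own model: it assumes the file is a top-level JSON array of hook objects (the layout is not documented here), the names `Hook`, `load_hooks`, and `names_by_kind` are hypothetical, and every value in the sample data is a placeholder, not real crawled output.

```python
import json
from typing import TypedDict, cast


class Hook(TypedDict):
    """One hook record, with fields and types taken from the schema table."""
    post_id: int
    name: str
    url: str
    hook_type: str
    hook_kind: str
    description: str
    used_by_count: int
    uses_count: int
    source_file: str
    source_line: str
    source_github_url: str
    since_versions: list[str]
    packages: list[str]


def load_hooks(text: str) -> list[Hook]:
    """Parse JSON text, assuming a top-level array of hook objects."""
    data = json.loads(text)
    if not isinstance(data, list):
        raise ValueError("expected a top-level JSON array of hook objects")
    return [cast(Hook, item) for item in data]


def names_by_kind(hooks: list[Hook], kind: str) -> list[str]:
    """Return the sorted names of hooks whose inferred kind matches."""
    return sorted(h["name"] for h in hooks if h["hook_kind"] == kind)


# Placeholder records for illustration only; the values are not real crawled data.
_blank = {"url": "", "hook_type": "hook", "description": "", "used_by_count": 0,
          "uses_count": 0, "source_file": "", "source_line": "",
          "source_github_url": "", "since_versions": [], "packages": []}
SAMPLE_JSON = json.dumps([
    {"post_id": 0, "name": "admin_init", "hook_kind": "action", **_blank},
    {"post_id": 0, "name": "example_filter", "hook_kind": "filter", **_blank},
])

hooks = load_hooks(SAMPLE_JSON)
print(names_by_kind(hooks, "action"))
```

To read a real export, the same function could be fed the contents of `output/hooks.json`, provided the file really has the array layout assumed here.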
# Install dev dependencies
uv sync --group dev
# Lint & format
uv run ruff check crawler.py
uv run ruff format crawler.py
# Type-check
uv run mypy crawler.py

The crawled output (JSON, YAML, Markdown, HTML, TXT) for all WordPress hooks is published and kept up to date in the companion repository:
github.com/BaseMax/wordpress-hooks
MIT License
Copyright © 2026 Seyyed Ali Mohammadiyeh (Max Base)