Author: Fabio Dossi | Version: 1.5.0 | 📖 User Manual (Wiki)
ZotFetch is a Zotero 8 plugin that automatically downloads PDFs for multiple library items in a single operation. It uses a modular two-stage pipeline — source resolution then PDF extraction — so that landing pages from publishers, proxies, and repositories are correctly resolved to their actual PDF URL before import.
- Two-stage pipeline: SourceResolver → PDFResolver → AttachmentImporter. Landing pages are never accidentally imported as PDFs.
- Multi-source download: Native OA → Unpaywall → Semantic Scholar → OpenAlex → Europe PMC → CORE → OA Repository (+ arXiv direct-pdf) → Zotero Native DOI → DOI Landing → Institutional Proxy → CAPES
- PMCID support: extracts
PMCID: PMCxxxxxxxfrom item metadata for an instant Europe PMC fast path — no API call needed - HTML-to-PDF extraction: publisher-specific rules (Springer, Nature, Wiley, Taylor & Francis, ACS, Royal Society of Chemistry, SAGE, Oxford University Press, IEEE, MDPI, Frontiers, Elsevier/ScienceDirect, SciELO.br), a generic HTML extractor, or a Gecko hidden browser that automatically solves Cloudflare/Akamai JS challenges
- Fast Mode: Two-pass strategy — lightweight open-access sources first, institutional fallbacks only for unresolved items
- Ultra Fast Mode: Single-pass, maximum speed, prioritises open-access sources
- Adaptive rate limiting: Per-domain request spacing with exponential backoff on failures
- Anti-captcha protection: Tracks consecutive captcha hits per domain; blocks domain after 3 hits, resets on success
- DOI lookup: Automatically resolves missing DOIs via CrossRef before attempting downloads
- Retry Failed Items / Retry After Auth: Targeted retry flows for failed or auth-blocked items
- Live progress: Color-coded status (🔵/🟢/🟡/🔴) with
PDFs X/Y (Z%)counter - Institutional proxy: Configurable EZproxy/Shibboleth URL for legal paywalled access; Semantic Scholar links are also routed through the proxy when applicable
- Extensible: New SourceResolvers and PDFResolvers can be added without touching the orchestration code
- Download
zotfetch-X.Y.Z.xpifrom the Releases page - In Zotero, go to Tools → Add-ons
- Click the gear icon (⚙️) → Install Add-on From File…
- Select the downloaded
.xpifile - Restart Zotero when prompted
Requirements: Python 3.8+
git clone https://github.com/fdossi/zotfetch.git
cd autoPDFdownloader
python build.py --cleanThis creates zotfetch-X.Y.Z.xpi. Install it via the steps above.
Select one or more items in your Zotero library, then right-click and hover over ZotFetch ▶ to open the submenu:
| Command | Description |
|---|---|
| Batch Download | Downloads PDFs for all selected items using all sources (two-pass if Fast Mode is on) |
| Ultra Fast | Single-pass, fastest mode — open-access sources only. May download fewer PDFs than Batch Download. |
| Retry Failed | Re-attempts only items that failed in the most recent batch |
| Retry After Auth | Opens up to 3 DOI URLs in your browser for manual login/captcha, then retries blocked items |
| Preferences | Opens the graphical Preferences dialog |
ZotFetch tries sources in this order of priority:
| Priority | Source | Notes |
|---|---|---|
| 110 | Native OA | Zotero's built-in open-access finder |
| 100 | Unpaywall | Free legal OA PDF lookup (requires email). Collects direct PDF URLs and OA landing pages across all OA locations. |
| 95 | Semantic Scholar | Graph API openAccessPdf field; falls back to arXiv direct-pdf when S2 knows the arXiv ID |
| 90 | OpenAlex | best_oa_location PDF URL or OA landing page |
| 89 | Europe PMC | Biomedical papers. Instant direct URL when PMCID is known; API search via PMID or DOI otherwise |
| 88 | CORE | core.ac.uk — 234M+ institutional repository records. Free key recommended. |
| 86–85 | OA Repository | Direct arXiv PDF when arXiv ID is known (86); item URL when host is in the safe OA list (85) |
| 83 | Zotero Native (DOI) | Zotero's translator-based DOI resolver. On institutional IP/VPN, resolves paywalled PDFs automatically. |
| 80 | DOI Landing | Publisher page via doi.org (HTML→PDF extraction) |
| 75–72 | Institutional Proxy | DOI routed through your proxy; also routes Semantic Scholar PDF URLs through the proxy |
| 70 | CAPES | Brazilian CAPES portal or custom DOI proxy URL |
In Fast Mode, sources 110–75 run in Pass 1; unresolved items proceed to Pass 2 (CAPES and institutional fallbacks). In Ultra Fast Mode, all sources run in a single pass.
Note: Additional fallback sources can be enabled in
about:config. See the Configuration section for available options.
When a source returns a landing page URL (not a direct PDF), ZotFetch tries to find the real PDF link in this order:
- Publisher-specific rule (Springer, Nature, Wiley, Taylor & Francis, ACS, Royal Society of Chemistry, SAGE, Oxford University Press, IEEE, MDPI, Frontiers, Elsevier/ScienceDirect, SciELO.br)
- Generic HTML extractor:
<meta name="citation_pdf_url">,<link rel="alternate" type="application/pdf">,<iframe>/<embed>/<object>with PDF src,<a>links to.pdfor with "Download PDF" text - Gecko hidden browser (
Zotero.HTTP.processDocuments): for URLs where steps 1–2 failed, loads the page in a full hidden Firefox engine. Automatically passes Cloudflare "Just a moment…" and Akamai Bot Manager JS challenges. Publisher-specific rules and generic extraction are then re-applied on the fully rendered DOM.
Each candidate URL is validated by a HEAD request confirming Content-Type: application/pdf before the file is imported.
Open the graphical settings dialog via right-click → ZotFetch ▶ → Preferences. It covers all options grouped into sections — no need to use about:config.
Click ? Help in the dialog to open the User Manual with detailed setup instructions for institutional proxy and CAPES.
All preferences are also editable directly in about:config (Zotero's advanced config editor, Edit → Settings → Advanced → Config Editor), filtering by extensions.zotfetch.
Security note (v1.4.3+): Sensitive credentials — email, API key, and proxy URLs — are stored in Zotero's encrypted Login Manager, not in
about:config. Use the Preferences dialog to set or change them; editing theextensions.zotfetch.*pref keys for these fields inabout:confighas no effect.
| Preference key | Default | Description |
|---|---|---|
extensions.zotfetch.fastMode |
true |
Enables two-pass Fast Mode |
extensions.zotfetch.batchSize |
30 |
Max items to process per batch run |
extensions.zotfetch.requestDelayMs |
900 |
Base delay between requests (ms), ±60% jitter applied |
extensions.zotfetch.domainGapMs |
1500 |
Minimum gap between requests to the same domain (ms) |
extensions.zotfetch.requestTimeoutMs |
15000 |
Per-request HTTP timeout (ms) |
extensions.zotfetch.antiCaptchaMode |
true |
Skip domains that are currently in cooldown |
extensions.zotfetch.enableCapesFallback |
true |
Enable CAPES/DOI proxy as a fallback source |
extensions.zotfetch.unpaywallTimeoutMs |
12000 |
Timeout for Unpaywall API requests (ms) |
extensions.zotfetch.crossrefTimeoutMs |
10000 |
Timeout for CrossRef DOI lookup (ms) |
Credentials (managed via Preferences dialog only — stored in encrypted Login Manager):
| Field | Description |
|---|---|
| Email address | Your email for Unpaywall/OpenAlex/CrossRef API access. Required for most sources. |
| Institutional Proxy URL | Your institution's EZproxy URL, e.g. https://proxy.myuniversity.edu/login?url= |
| CAPES Gateway URL | CAPES authenticated gateway URL |
| CORE API Key | Free key from core.ac.uk/services/api (optional) |
ZotFetch supports all common proxy URL patterns:
| Format | Example |
|---|---|
| Template placeholder | https://proxy.myuniv.edu/login?url={url} |
| Direct query parameter | https://proxy.myuniv.edu/?url= |
| Query string | https://proxy.myuniv.edu/proxy?url= |
| EZproxy bare base | https://proxy.myuniv.edu |
processItem()
├─ IdentifierExtractor.fromItem() (identifiers.mjs)
│ extracts DOI, arXiv ID, PMID, PMCID, URL, title, year, firstAuthor
├─ SourceResolver[].buildCandidates() (source-resolvers.mjs)
│ returns SourceCandidate[] sorted by priority
└─ for each candidate:
PDFResolver[].resolve() (pdf-resolvers.mjs)
├─ DirectPDFResolver HEAD/GET, confirm Content-Type: application/pdf
├─ PublisherPatternResolver publisher-specific DOM rules
├─ HtmlLandingPDFResolver generic meta/link/iframe/anchor extraction
└─ HiddenBrowserPDFResolver Gecko JS engine (Cloudflare/Akamai challenges)
AttachmentImporter.importResolvedPdf() (importer.mjs)
- Create a class in
source-resolvers.mjsthat implements:id— unique string identifierenabled()— returnsbooleanbuildCandidates(item, ids)— returnsPromise<SourceCandidate[]>
- Add it to
_buildSourceResolvers()infetch.mjsat the appropriate priority position.
- Create a class in
pdf-resolvers.mjsthat implements:canResolve(candidate)— returnsbooleanresolve(candidate, ctx)— returnsPromise<PDFResolutionResult>
- Add it to
_buildPdfResolvers()infetch.mjs.
ZotFetch tracks request history per domain:
- Domain gap: Waits at least
domainGapMsmilliseconds between requests to the same domain - Adaptive backoff: Exponential extra wait on consecutive non-captcha failures (+1 s, +2 s, +4 s… capped at +15 s). Resets on any success.
- Captcha threshold: After 3 consecutive captcha responses from the same domain, that domain is blocked for 30 minutes
- Failure classification:
cloudflare→captcha→auth→blocked→timeout→nopdf→network
- Zotero 8.0.4 or later (not compatible with Zotero 7 or earlier)
- A registered email address for Unpaywall (free, no sign-up required — just a valid email in the preferences)
- Captcha-blocked sessions: When a publisher serves a captcha, use ZotFetch ▶ Retry After Auth to open the URL in your browser, solve the captcha, then retry.
- JavaScript-rendered publisher pages: ZotFetch now includes a Gecko hidden browser (
HiddenBrowserPDFResolver) as the last-resort resolver. It runs a full Firefox engine, automatically passing Cloudflare "Just a moment…" and Akamai Bot Manager JS challenges. Only human-verification CAPTCHAs (reCAPTCHA checkbox, hCaptcha image grids) still require manual intervention via Retry After Auth. - Paywalled content without proxy: Without an institutional proxy configured, ZotFetch can only retrieve freely available (OA) versions of paywalled articles.
Released under the MIT License.
Contributions via pull requests are welcome.
THIS SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, AND NON-INFRINGEMENT. IN NO EVENT SHALL THE AUTHOR OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES, OR OTHER LIABILITY — WHETHER IN AN ACTION OF CONTRACT, TORT, OR OTHERWISE — ARISING FROM, OUT OF, OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
The user assumes all risks and sole responsibility for any consequences arising from the use of this plugin. This includes, but is not limited to, any legal, institutional, or technical consequences related to the access, download, or storage of copyrighted materials. The author makes no representations regarding the legality of any particular use of this plugin in any given jurisdiction. It is the user's responsibility to ensure compliance with all applicable laws, institutional policies, and the terms of service of any platform or content provider accessed through this plugin.
- Security: Sensitive credentials (email, API key, institutional proxy URL, CAPES URL) are now stored in Zotero's encrypted Login Manager (
nsILoginManager) instead of plain-text SQLite prefs. Credentials are protected by the OS keychain or Zotero master password. - Migration: On first startup after upgrading, any existing plain-text pref values are automatically migrated to SecureStorage and wiped from
extensions.zotfetch.*prefs — no user action required. - New: After a batch run, if items failed due to subscription walls and no institutional access method is configured, ZotFetch now shows a clear alert listing affected titles with actionable next-steps (configure proxy / run Retry Failed on VPN).
- Fix: SemanticScholar API User-Agent now reports the current plugin version (
ZotFetch/1.4.3) instead of a hardcoded stale string.
- Fix:
NativeDoiSourceResolverwas unconditionally skipping protected publishers (Elsevier, Wiley, Springer, ACS), which prevented downloads on VPN even when institutional access was active. The early-abort guard is now correctly limited toNativeSourceResolver(Zotero's built-in OA finder), which reports itself with Zotero's real UA.
- Fix: Elsevier
nopdfresults (article not in the subscribed collection) no longer trigger a domain captcha cooldown. The cascade that was blocking subsequent items in the same batch from any Elsevier-hosted source has been resolved. - Fix:
ScienceDirectpublisher routing improved — publisher-pattern resolver correctly handles linkinghub.elsevier.com redirect targets.
- New: Graphical Preferences dialog (
Tools → ZotFetch → Preferencesor right-click submenu). All settings configurable in a clean UI — noabout:configrequired. - New: ? Help button in the Preferences dialog opens the User Manual wiki.
- New: Cross-item domain cooldown — the first captcha from a publisher during a batch blocks all subsequent items from hitting that same publisher in the same run.
- New:
NativeSourceResolverandNativeDoiSourceResolvernow skip known bot-detection publishers (Elsevier, Wiley, Springer, ACS) entirely — preventing Zotero's native UA from triggering captcha floods. - New:
native-doisentinel now checksnegativeCacheand_localHostAbortsbefore firing, preventing redundant requests. - Removed: Sci-Hub integration removed from the plugin entirely.
- Fix:
OaRepositorySourceResolvertests corrected to useitemUrlidentifier key. - Updated: Browser User-Agent strings refreshed — Chrome 132→135, Edge 132→135, Firefox 135→136, Safari 17→18.
- License: Changed from GPLv3 to MIT.
- Author: Updated author from "Fabio" to "Fabio Dossi".
- Breaking internal change: Discovery and import are now decoupled.
fetchPDF()is a thin compatibility wrapper; the real work is done bySourceResolver → PDFResolver → AttachmentImporter. - New modules:
identifiers.mjs,source-resolvers.mjs,pdf-resolvers.mjs,importer.mjs - New
IdentifierExtractorscans DOI field, URL field, Extra field, archiveID, and DOI-in-URL patterns - New
HtmlLandingPDFResolvercorrectly extracts PDF links from publisher landing pages usingresponseType: "document"+ DOM parsing. Strategies:citation_pdf_urlmeta,<link rel="alternate">,<iframe>/<embed>, and PDF anchors - New
PublisherPatternResolverwith host-specific DOM rules for: Springer, Nature, Wiley, Taylor & Francis, ACS, IEEE, MDPI, Frontiers, Elsevier/ScienceDirect, SciELO.br InstitutionalProxySourceResolvernow also routes Semantic Scholar PDF URLs through the proxy- New
DoiLandingSourceResolverfor explicit DOI landing page resolution requestTimeoutMspreference added (extensions.zotfetch.requestTimeoutMs, default 15 000 ms)- All failure reasons now logged with structured context (source, host, method, elapsed ms)
tests/pipeline.test.mjsadded covering 20+ scenarios
- All menu items consolidated under a single ZotFetch ▶ submenu
- Separators added between download actions, retry actions, and preferences
- Added institutional proxy support with 4 URL pattern formats
- Fast Mode and Ultra Fast Mode with two-pass download strategy
- Live color-coded progress display
- Retry Failed Items and Retry After Auth commands
- Failure classification pipeline
- Per-domain adaptive backoff
- DOI lookup via CrossRef
- Initial release: Native OA, Unpaywall, CAPES fallback