
Commit cff0ae1

For browser build, set the background web worker explicitly. (#37)
* Add TypeDoc config, docs renames, and LoadParameters type
  Introduces a TypeDoc configuration file and updates .gitignore for typedoc output. Renames documentation files for clarity and updates references in README.md. Adds a new LoadParameters type definition for PDF loading options. Adds typedoc as a dev dependency and a build script. Includes new test data and a test for HTML PDF parsing. Updates Vite config to use the correct entry point.
* Rename DocumentInitParameters to LoadParameters
  Replaces all usage and documentation of DocumentInitParameters with LoadParameters for clarity and consistency. Updates type exports, API docs, examples, and internal references. Also improves TypeDoc config, adds type documentation link to reports, and fixes a typo in the report:build script.
* Update API extractor configs and demo usage
  Changed API extractor config files to use 'undocumented' report filenames and disabled doc model and TSDoc metadata generation. Removed generated API docs. Updated demo HTML files to explicitly set the worker path for PDFParse and improved CDN import examples. Bumped package version to 2.4.5.
* Refactor API Extractor configs and update build scripts
  Renamed API Extractor config files to the 'configs/' directory and updated their paths and token usage for improved maintainability. Updated build scripts in package.json to reference the new config locations. Added generated API documentation files for node, pdf-parse, and worker builds.
* Add example test script and update test workflow
  Introduces scripts/example.test.mjs to run example scripts for testing. Updates package.json with new test:e and test:all commands, adds tsx as a dev dependency, and modifies the GitHub Actions workflow to use npm run test:all. Also updates example HTML files to set the correct worker path and improves exception handling in exception-handling.ts.
* Remove extra logging and update test scripts
  Eliminated unnecessary console output in example and integration test scripts for cleaner logs. Updated 'test:u' npm script to use the 'dot' reporter for unsupported tests. Commented out a log in the worker and clarified error output for PDF worker loading.
* Update CDN links, add funding info, and improve pack script
  Updated README to use new CDN links and worker configuration for pdf-parse v2.4.5. Added .github/FUNDING.yml and funding field in package.json for GitHub sponsorship. Modified npm pack script to check for outdated packages before packing, and corrected unpkg field to use UMD build.
* Refactor worker config docs and add troubleshooting guide
  Streamlined worker configuration instructions in README.md and moved detailed troubleshooting steps to a new docs/troubleshooting.md file. The new guide covers common errors, platform-specific setup, Node.js version compatibility, and manual worker configuration for custom environments.
1 parent 54937b6 commit cff0ae1

36 files changed

Lines changed: 668 additions & 168 deletions

.github/FUNDING.yml

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+github: mehmet-kozan

.github/workflows/test.yml

Lines changed: 1 addition & 1 deletion
@@ -32,4 +32,4 @@ jobs:
 run: npm i --prefer-offline --no-audit --no-fund --silent

 - name: Unit tests
-run: npm test
+run: npm run test:all

.gitignore

Lines changed: 1 addition & 0 deletions
@@ -157,4 +157,5 @@ reports/test
 reports/coverage
 reports/benchmark
 reports/api
+reports/typedoc
 reports/demo/dist-web

.sonarcloud.properties

Lines changed: 2 additions & 2 deletions
@@ -4,12 +4,12 @@ sonar.organization=pdf-parse

 # Path to sources (root of TypeScript sources)
 sonar.sources=src
-sonar.inclusions=src/**
+sonar.inclusions=src/**/*
 sonar.exclusions=**/dist/**,**/node_modules/**,**/tests/**,**/examples/**,**/reports/**

 # Path to tests
 sonar.tests=test
-sonar.test.inclusions=tests/**
+sonar.test.inclusions=tests/**/*
 sonar.test.exclusions=**/*.txt,**/*_images/**

 # Language and encoding

.vscode/settings.json

Lines changed: 2 additions & 1 deletion
@@ -60,6 +60,7 @@
 "api-extractor.node.json": "jsonc",
 "api-extractor.worker.json": "jsonc",
 "node-tsdoc-metadata.json": "jsonc",
-"worker-tsdoc-metadata.json": "jsonc"
+"worker-tsdoc-metadata.json": "jsonc",
+"typedoc.json": "jsonc"
 }
 }

README.md

Lines changed: 35 additions & 75 deletions
@@ -16,7 +16,6 @@
 [![biome](https://img.shields.io/badge/code_style-biome-60a5fa?logo=biome)](https://biomejs.dev)
 [![vitest](https://img.shields.io/badge/tested_with-vitest-6E9F18?logo=vitest)](https://vitest.dev)
 [![codecov](https://codecov.io/github/mehmet-kozan/pdf-parse/graph/badge.svg?token=FZL3G8KNZ8)](https://codecov.io/github/mehmet-kozan/pdf-parse)
-[![socket badge](https://socket.dev/api/badge/npm/package/pdf-parse)](https://socket.dev/npm/package/pdf-parse)
 [![test & coverage reports](https://img.shields.io/badge/reports-view-brightgreen.svg)](https://mehmet-kozan.github.io/pdf-parse/)

 </div>
@@ -47,7 +46,7 @@ run();

 - CJS, ESM, Node.js, and browser support.
 - Can be integrated with `React`, `Vue`, `Angular`, or any other web framework.
-- **Command-line interface** for quick PDF processing: [`CLI Documentation`](./docs/README.cli.md)
+- **Command-line interface** for quick PDF processing: [`CLI Documentation`](./docs/command-line.md)
 - [`Security Policy`](https://github.com/mehmet-kozan/pdf-parse?tab=security-ov-file#security-policy)
 - Retrieve headers and validate PDF : [`getHeader`](#getheader--node-utility-pdf-header-retrieval-and-validation)
 - Extract document info : [`getInfo`](#getinfo--extract-metadata-and-document-information)
@@ -57,7 +56,7 @@ run();
 - Detect and extract tabular data : [`getTable`](#gettable--extract-tabular-data)
 - Well-covered with [`unit tests`](./tests)
 - [`Integration tests`](./tests/integration) to validate end-to-end behavior across environments.
-- See [DocumentInitParameters](./docs/README.options.md#documentinitparameters) and [ParseParameters](./docs/README.options.md#parseparameters) for all available options.
+- See [LoadParameters](./docs/options.md#load-parameters) and [ParseParameters](./docs/options.md#parse-parameters) for all available options.
 - Examples: [`live demo`](./reports/demo/), [`examples`](./examples/), [`tests`](./tests/unit/) and [`tests example`](./tests/unit/test-example/) folders.
 - Supports: [`Next.js + Vercel`](https://github.com/mehmet-kozan/vercel-next-app-demo), Netlify, AWS Lambda, Cloudflare Workers.

@@ -88,7 +87,7 @@ Or use it directly with npx:
 npx pdf-parse --help
 ```

-For detailed CLI documentation and usage examples, see: [CLI Documentation](./docs/README.cli.md)
+For detailed CLI documentation and usage examples, see: [CLI Documentation](./docs/command-line.md)

 ## Usage

@@ -155,8 +154,8 @@ console.log(result.text);
 ```
 For a complete list of configuration options, see:

-- [DocumentInitParameters](./docs/README.options.md#documentinitparameters) - document initialization options
-- [ParseParameters](./docs/README.options.md#parseparameters) - parse options
+- [LoadParameters](./docs/options.md#load-parameters)
+- [ParseParameters](./docs/options.md#parse-parameters)


 Usage Examples:
@@ -189,7 +188,7 @@ await writeFile('bitcoin.png', result.pages[0].data);
 ```

 Usage Examples:
-- Limit output resolution or specific pages using [ParseParameters](./docs/README.options.md#parseparameters)
+- Limit output resolution or specific pages using [ParseParameters](./docs/options.md#parse-parameters)
 - `getScreenshot({scale:1.5})` — Increase rendering scale (higher DPI / larger image)
 - `getScreenshot({desiredWidth:1024})` — Request a target width in pixels; height scales to keep aspect ratio
 - `imageDataUrl` (default: `true`) — include base64 data URL string in the result.
@@ -251,10 +250,10 @@ for (const row of result.pages[0].tables[0]) {
 ## Exception Handling & Type Usage

 ```ts
-import type { DocumentInitParameters, ParseParameters, TextResult } from 'pdf-parse';
+import type { LoadParameters, ParseParameters, TextResult } from 'pdf-parse';
 import { PasswordException, PDFParse, VerbosityLevel } from 'pdf-parse';

-const initParams: DocumentInitParameters = {
+const loadParams: LoadParameters = {
 url: 'https://mehmet-kozan.github.io/pdf-parse/pdf/password-123456.pdf',
 verbosity: VerbosityLevel.WARNINGS,
 password: 'abcdef',
@@ -265,7 +264,7 @@ const parseParams: ParseParameters = {
 };

 // Initialize the parser class without executing any code yet
-const parser = new PDFParse(initParams);
+const parser = new PDFParse(loadParams);

 function handleResult(result: TextResult) {
 console.log(result.text);
@@ -283,12 +282,13 @@ try {
 // UnknownErrorException
 if (error instanceof PasswordException) {
 console.error('Password must be 123456\n', error);
+} else {
+throw error;
 }
 } finally {
 // Always call destroy() to free memory
 await parser.destroy();
 }
-
 ```

 ## Web / Browser <a href="https://www.jsdelivr.com/package/npm/pdf-parse" target="_blank"><img align="right" src="https://img.shields.io/jsdelivr/npm/hm/pdf-parse"></a>
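
The `else { throw error; }` branch added in this hunk means only the expected password failure is handled locally, while any other error still reaches the caller, and `finally` releases the parser either way. A minimal self-contained sketch of that pattern follows. `FakePasswordException`, `FakeParser`, and `readText` are hypothetical stand-ins for illustration and are not part of pdf-parse.

```typescript
// Stand-ins for illustration only; these are NOT pdf-parse classes.
class FakePasswordException extends Error {}

class FakeParser {
  destroyed = false;
  private failure: Error | null;

  constructor(failure: Error | null) {
    this.failure = failure;
  }

  async getText(): Promise<{ text: string }> {
    if (this.failure) throw this.failure;
    return { text: "hello" };
  }

  async destroy(): Promise<void> {
    this.destroyed = true;
  }
}

// Handle the one expected error, rethrow everything else, always clean up.
async function readText(parser: FakeParser): Promise<string | null> {
  try {
    const result = await parser.getText();
    return result.text;
  } catch (error) {
    if (error instanceof FakePasswordException) {
      return null; // expected failure: the caller can ask for another password
    } else {
      throw error; // unexpected failure: do not swallow it
    }
  } finally {
    await parser.destroy(); // runs on success, handled error, and rethrow
  }
}
```

Without the rethrow, an unrelated failure (for example a network error) would be silently dropped by the `catch` block.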
@@ -297,71 +297,51 @@ try {
 - **Live Demo:** [`https://mehmet-kozan.github.io/pdf-parse/`](https://mehmet-kozan.github.io/pdf-parse/)
 - **Demo Source:** [`reports/demo`](reports/demo)
 - **ES Module**: `pdf-parse.es.js` **UMD/Global**: `pdf-parse.umd.js`
+- For browser build, set the [`background web worker`](https://developer.mozilla.org/en-US/docs/Web/API/Web_Workers_API/Using_web_workers) explicitly.

 ### CDN Usage

 ```html
 <!-- ES Module -->
 <script type="module">
-import {PDFParse} from 'https://cdn.jsdelivr.net/npm/pdf-parse@latest/+esm';
+
+import {PDFParse} from 'https://cdn.jsdelivr.net/npm/pdf-parse@latest/dist/pdf-parse/web/pdf-parse.es.js';
+//// Available Worker Files
+// pdf.worker.mjs
+// pdf.worker.min.mjs
+// If you use a custom build or host pdf.worker.mjs yourself, configure worker accordingly.
+PDFParse.setWorker('https://cdn.jsdelivr.net/npm/pdf-parse@latest/dist/pdf-parse/web/pdf.worker.mjs');
+
 const parser = new PDFParse({url:'https://mehmet-kozan.github.io/pdf-parse/pdf/bitcoin.pdf'});
-const result = await parser.getText()
+const result = await parser.getText();
+
 console.log(result.text)
 </script>
 ```

 **CDN Options: https://www.jsdelivr.com/package/npm/pdf-parse**

 - `https://cdn.jsdelivr.net/npm/pdf-parse@latest/dist/pdf-parse/web/pdf-parse.es.js`
-- `https://cdn.jsdelivr.net/npm/pdf-parse@2.4.4/dist/pdf-parse/web/pdf-parse.es.js`
+- `https://cdn.jsdelivr.net/npm/pdf-parse@2.4.5/dist/pdf-parse/web/pdf-parse.es.js`
 - `https://cdn.jsdelivr.net/npm/pdf-parse@latest/dist/pdf-parse/web/pdf-parse.umd.js`
-- `https://cdn.jsdelivr.net/npm/[email protected]/dist/pdf-parse/web/pdf-parse.umd.js`
-
-
-## Worker Configuration (Node / Serverless Platforms)
-
-Next.js & Vercel, Edge Functions, Serverless Functions, AWS Lambda, Netlify Functions, or Cloudflare Workers may require additional worker configuration.
-
-This will most likely resolve all worker-related issues.
-```js
-import 'pdf-parse/worker'; // Import this before importing "pdf-parse"
-import {PDFParse} from 'pdf-parse';
-
-// or CommonJS
-require ('pdf-parse/worker'); // Import this before importing "pdf-parse"
-const {PDFParse} = require('pdf-parse');
-```
-
-To ensure `pdf-parse` works correctly with Next.js (especially on serverless platforms like Vercel), add the following configuration to your `next.config.ts` file. This allows Next.js to include `pdf-parse` as an external package for server-side usage:
-
-```js
-// next.config.ts
-import type { NextConfig } from "next";
-
-const nextConfig: NextConfig = {
-serverExternalPackages: ["pdf-parse"],
-};
-
-export default nextConfig;
-```
+- `https://cdn.jsdelivr.net/npm/[email protected]/dist/pdf-parse/web/pdf-parse.umd.js`

-> **Note:** Similar configuration may be required for other serverless platforms (such as AWS Lambda, Netlify, or Cloudflare Workers) to ensure that `pdf-parse` and its worker files are properly included and executed in your deployment environment.
+**Worker Options:**

-Custom builds, Electron/NW.js, or specific deployment environments—you may need to manually configure the worker source.
+- `https://cdn.jsdelivr.net/npm/pdf-parse@latest/dist/pdf-parse/web/pdf.worker.mjs`
+- `https://cdn.jsdelivr.net/npm/pdf-parse@latest/dist/pdf-parse/web/pdf.worker.min.mjs`

-```js
-// Import this before importing "pdf-parse"
-import {getPath, getData} from "pdf-parse/worker";
-import {PDFParse} from "pdf-parse";

-// CommonJS
-// const {getWorkerSource, getWorkerPath} = require('pdf-parse/worker');
+## Worker Configuration & Troubleshooting

-PDFParse.setWorker(getPath());
-// or PDFParse.setWorker(getData());
+See [docs/troubleshooting.md](./docs/troubleshooting.md) for detailed troubleshooting steps and worker configuration for Node.js and serverless environments.

-```
+- Worker setup for Node.js, Next.js, Vercel, AWS Lambda, Netlify, Cloudflare Workers
+- Common error messages and solutions
+- Manual worker configuration for custom builds and Electron/NW.js
+- Node.js version compatibility

+If you encounter issues, please refer to the [Troubleshooting Guide](./docs/troubleshooting.md).

 ## Similar Packages

@@ -384,27 +364,7 @@ Integration tests run on Node.js 20–24, see [`test_integration.yml`](./.github

 ### Unsupported Node.js Versions (18.x, 19.x, 21.x)

-Requires additional setup — import and configure a compatible CanvasFactory or worker implementation before initializing pdf-parse; see the examples below.
-
-ESM
-```js
-// Import this before importing "pdf-parse"
-import { CanvasFactory } from 'pdf-parse/worker';
-import { PDFParse } from 'pdf-parse';
-
-const parser = new PDFParse({ data: buffer, CanvasFactory });
-// then use parser
-```
-
-CJS
-```js
-// Import this before importing "pdf-parse"
-const { CanvasFactory } = require('pdf-parse/worker');
-const { PDFParse } = require('pdf-parse');
-
-const parser = new PDFParse({ data: buffer, CanvasFactory });
-// then use parser
-```
+Requires additional setup see [docs/troubleshooting.md](./docs/troubleshooting.md).

 ## Contributing

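
The README changes above point the browser build at an explicit module URL and an explicit worker URL under the same `dist/pdf-parse/web/` folder on jsDelivr. One way to keep the two in step is to derive both from a single version string. The sketch below assumes that URL layout holds for the chosen version. `pdfParseCdnUrls` and `CdnUrls` are hypothetical names, not part of pdf-parse.

```typescript
// Hypothetical helper (not part of pdf-parse): builds a matching pair of
// jsDelivr URLs, following the path layout shown in the README diff above.
const CDN_BASE = "https://cdn.jsdelivr.net/npm";

interface CdnUrls {
  module: string; // ES module build of the library
  worker: string; // background worker file to pass to PDFParse.setWorker
}

function pdfParseCdnUrls(version: string = "latest", minifiedWorker: boolean = false): CdnUrls {
  const root = `${CDN_BASE}/pdf-parse@${version}/dist/pdf-parse/web`;
  const workerFile = minifiedWorker ? "pdf.worker.min.mjs" : "pdf.worker.mjs";
  return {
    module: `${root}/pdf-parse.es.js`,
    worker: `${root}/${workerFile}`,
  };
}

// Pin both files to the release this commit bumps to.
const pinned = pdfParseCdnUrls("2.4.5", true);
console.log(pinned.module);
console.log(pinned.worker);
```

In a browser module script, the result would then be used the same way the README example does: import `PDFParse` from `pinned.module` and call `PDFParse.setWorker(pinned.worker)` before constructing a parser.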
Lines changed: 7 additions & 7 deletions
@@ -32,7 +32,7 @@
 * SUPPORTED TOKENS: <lookup>
 * DEFAULT VALUE: "<lookup>"
 */
-//"projectFolder": "..",
+"projectFolder": "..",

 /**
 * (REQUIRED) Specifies the .d.ts file to be used as the starting point for analysis. API Extractor
@@ -45,7 +45,7 @@
 *
 * SUPPORTED TOKENS: <projectFolder>, <packageName>, <unscopedPackageName>
 */
-"mainEntryPointFilePath": "dist/node/esm/index.d.ts",
+"mainEntryPointFilePath": "<projectFolder>/dist/node/esm/index.d.ts",

 /**
 * A list of NPM package names whose exports should be treated as part of this package.
@@ -110,7 +110,7 @@
 * SUPPORTED TOKENS: <projectFolder>, <packageName>, <unscopedPackageName>
 * DEFAULT VALUE: "<projectFolder>/tsconfig.json"
 */
-"tsconfigFilePath": "tsconfig.node.json"
+"tsconfigFilePath": "<projectFolder>/tsconfig.node.json"
 /**
 * Provides a compiler configuration that will be used instead of reading the tsconfig.json file from disk.
 * The object must conform to the TypeScript tsconfig schema:
@@ -184,7 +184,7 @@
 * SUPPORTED TOKENS: <projectFolder>, <packageName>, <unscopedPackageName>
 * DEFAULT VALUE: "<projectFolder>/etc/"
 */
-"reportFolder": "docs/"
+"reportFolder": "<projectFolder>/docs/"

 /**
 * Specifies the folder where the temporary report file is written. The file name portion is determined by
@@ -218,7 +218,7 @@
 /**
 * (REQUIRED) Whether to generate a doc model file.
 */
-"enabled": true,
+"enabled": false,

 /**
 * The output path for the doc model file. The file extension should be ".api.json".
@@ -278,7 +278,7 @@
 * SUPPORTED TOKENS: <projectFolder>, <packageName>, <unscopedPackageName>
 * DEFAULT VALUE: "<projectFolder>/dist/<unscopedPackageName>.d.ts"
 */
-"untrimmedFilePath": "dist/node/cjs/index.d.cts"
+"untrimmedFilePath": "<projectFolder>/dist/node/cjs/index.d.cts"

 /**
 * Specifies the output path for a .d.ts rollup file to be generated with trimming for an "alpha" release.
@@ -341,7 +341,7 @@
 *
 * DEFAULT VALUE: true
 */
-"enabled": true,
+"enabled": false,
 /**
 * Specifies where the TSDoc metadata file should be written.
 *
Lines changed: 7 additions & 7 deletions
@@ -32,7 +32,7 @@
 * SUPPORTED TOKENS: <lookup>
 * DEFAULT VALUE: "<lookup>"
 */
-// "projectFolder": "..",
+"projectFolder": "..",

 /**
 * (REQUIRED) Specifies the .d.ts file to be used as the starting point for analysis. API Extractor
@@ -45,7 +45,7 @@
 *
 * SUPPORTED TOKENS: <projectFolder>, <packageName>, <unscopedPackageName>
 */
-"mainEntryPointFilePath": "dist/pdf-parse/esm/index.d.ts",
+"mainEntryPointFilePath": "<projectFolder>/dist/pdf-parse/esm/index.d.ts",

 /**
 * A list of NPM package names whose exports should be treated as part of this package.
@@ -110,7 +110,7 @@
 * SUPPORTED TOKENS: <projectFolder>, <packageName>, <unscopedPackageName>
 * DEFAULT VALUE: "<projectFolder>/tsconfig.json"
 */
-"tsconfigFilePath": "tsconfig.json"
+"tsconfigFilePath": "<projectFolder>/tsconfig.json"
 /**
 * Provides a compiler configuration that will be used instead of reading the tsconfig.json file from disk.
 * The object must conform to the TypeScript tsconfig schema:
@@ -184,7 +184,7 @@
 * SUPPORTED TOKENS: <projectFolder>, <packageName>, <unscopedPackageName>
 * DEFAULT VALUE: "<projectFolder>/etc/"
 */
-"reportFolder": "docs/",
+"reportFolder": "<projectFolder>/docs/",

 /**
 * Specifies the folder where the temporary report file is written. The file name portion is determined by
@@ -218,7 +218,7 @@
 /**
 * (REQUIRED) Whether to generate a doc model file.
 */
-"enabled": true,
+"enabled": false,

 /**
 * The output path for the doc model file. The file extension should be ".api.json".
@@ -278,7 +278,7 @@
 * SUPPORTED TOKENS: <projectFolder>, <packageName>, <unscopedPackageName>
 * DEFAULT VALUE: "<projectFolder>/dist/<unscopedPackageName>.d.ts"
 */
-"untrimmedFilePath": "dist/pdf-parse/cjs/index.d.cts"
+"untrimmedFilePath": "<projectFolder>/dist/pdf-parse/cjs/index.d.cts"

 /**
 * Specifies the output path for a .d.ts rollup file to be generated with trimming for an "alpha" release.
@@ -341,7 +341,7 @@
 *
 * DEFAULT VALUE: true
 */
-"enabled": true,
+"enabled": false,
 /**
 * Specifies where the TSDoc metadata file should be written.
 *
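
Both config diffs enable `"projectFolder": ".."` and prefix every path with `<projectFolder>`, which fits the commit message's note that the config files now live in `configs/`. API Extractor resolves `projectFolder` relative to the folder containing the config file and then substitutes the token. The sketch below mimics that resolution to show where the paths end up. It is an illustration under stated assumptions, not API Extractor's own code. The helper `resolveConfigPath` and the `/repo/configs/...` location are hypothetical.

```typescript
import * as path from "node:path";

// Hypothetical helper: mimics how a "<projectFolder>" token is expanded.
// POSIX path rules are used so the result is the same on every platform.
function resolveConfigPath(configFile: string, projectFolder: string, value: string): string {
  // projectFolder is interpreted relative to the folder holding the config file.
  const projectRoot = path.posix.resolve(path.posix.dirname(configFile), projectFolder);
  // Substitute the token, then normalize the resulting path.
  return path.posix.normalize(value.replace("<projectFolder>", () => projectRoot));
}

// Assumed layout: the config sits one level below the repository root.
const entryPoint = resolveConfigPath(
  "/repo/configs/api-extractor.node.json",
  "..",
  "<projectFolder>/dist/node/esm/index.d.ts",
);
console.log(entryPoint);
```

With the config one level below the repository root, `..` maps the token back to the root, so `dist/`, `docs/`, and the tsconfig files keep resolving to the same locations as before the move.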
