Releases: dotcommander/defuddle
v0.11.0
Defuddle Go v0.11.0
Web content extraction library and CLI tool for Go.
📦 Installation
Download Pre-built Binaries
Download the appropriate binary for your platform from the assets below.
Install with Go
go install github.com/dotcommander/defuddle/cmd/defuddle@v0.11.0
Install from Source
git clone https://github.com/dotcommander/defuddle.git
cd defuddle
make build-cli
Changelog
Features
- 521f3ac: feat(cli): add --render-settle post-load delay for slow SPAs
- 470c703: feat(cli): add --render-wait-for to block on a selector before snapshot
- 9bc53be: feat(cli): add --tables-json flag to parse command
- 6bfe932: feat(tables): add ExtractTables for structured table extraction
Others
- 9a32c5c: docs(changelog): add v0.11.0 entry
🔍 Usage Examples
# Extract content from URL
defuddle parse https://example.com/article
# Convert to markdown
defuddle parse https://example.com/article --markdown
# Get JSON output with metadata
defuddle parse https://example.com/article --json
# Extract specific property
defuddle parse https://example.com/article --property title
Full Changelog: cmd/defuddle/v0.10.0...v0.11.0
v0.10.0
Defuddle Go v0.10.0
Web content extraction library and CLI tool for Go.
📦 Installation
Download Pre-built Binaries
Download the appropriate binary for your platform from the assets below.
Install with Go
go install github.com/dotcommander/defuddle/cmd/defuddle@v0.10.0
Install from Source
git clone https://github.com/dotcommander/defuddle.git
cd defuddle
make build-cli
Changelog
Features
- a591a5b: feat(defuddle): treat HTTP 304 as ErrNotModified in fetch path
Bug fixes
- de3ad8b: fix(sanitize): block data:image/svg+xml URIs (XSS bypass vector)
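As a rough illustration of the check this fix describes, the sketch below rejects data: URIs whose media type is image/svg+xml. The helper name isBlockedDataURI is hypothetical, and the library's sanitizer may apply different or additional rules.

```go
package main

import (
	"fmt"
	"strings"
)

// isBlockedDataURI is a hypothetical helper: it reports whether a URL
// attribute value is a data: URI that carries SVG, which can embed script.
func isBlockedDataURI(raw string) bool {
	// Scheme and media type are case-insensitive, and surrounding
	// whitespace is ignored, so normalize before comparing.
	s := strings.ToLower(strings.TrimSpace(raw))
	if !strings.HasPrefix(s, "data:") {
		return false
	}
	// The media type is everything between "data:" and the first ';' or ','.
	mediaType := s[len("data:"):]
	if i := strings.IndexAny(mediaType, ";,"); i >= 0 {
		mediaType = mediaType[:i]
	}
	return strings.TrimSpace(mediaType) == "image/svg+xml"
}

func main() {
	for _, u := range []string{
		"data:image/svg+xml;base64,PHN2Zz48L3N2Zz4=",
		"DATA:image/SVG+XML,<svg onload=alert(1)>",
		"data:image/png;base64,iVBORw0KGgo=",
		"https://example.com/logo.svg",
	} {
		fmt.Printf("blocked=%v  %s\n", isBlockedDataURI(u), u)
	}
}
```

This sketch only normalizes case and surrounding whitespace; a production sanitizer would also need to account for control characters embedded inside the URL.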
Refactors
- 2b58925: refactor(defuddle-cli): replace cobra command singletons with per-invocation factories
Others
- 5b28cf0: chore(release): tidy cli go.mod, gofmt youtube.go, doc-comment Discourse.Extract
- 3f456ce: docs(changelog): add v0.10.0 entry (ErrNotModified, data:svg sanitizer fix)
- 110101b: test(defuddle-cli): build fresh commands per test via factories, enable t.Parallel()
🔍 Usage Examples
# Extract content from URL
defuddle parse https://example.com/article
# Convert to markdown
defuddle parse https://example.com/article --markdown
# Get JSON output with metadata
defuddle parse https://example.com/article --json
# Extract specific property
defuddle parse https://example.com/article --property title
Full Changelog: v0.9.0...v0.10.0
v0.9.0
Defuddle Go v0.9.0
Web content extraction library and CLI tool for Go.
📦 Installation
Download Pre-built Binaries
Download the appropriate binary for your platform from the assets below.
Install with Go
go install github.com/dotcommander/defuddle/cmd/defuddle@v0.9.0
Install from Source
git clone https://github.com/dotcommander/defuddle.git
cd defuddle
make build-cli
Changelog
Features
- e4a7e0a: feat(defuddle): fall back to ReadBuildInfo version on go-install builds
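A version fallback of this kind might look like the sketch below. It assumes the release build injects a version string through -ldflags and that go-install builds leave it empty; the variable and the helper resolveVersion are hypothetical, while debug.ReadBuildInfo is the standard library call named in the entry.

```go
package main

import (
	"fmt"
	"runtime/debug"
)

// version is assumed to be injected at release time, for example with
// -ldflags "-X main.version=v1.2.3". It stays empty for plain `go install` builds.
var version string

// resolveVersion is a hypothetical helper: it prefers the injected value and
// otherwise falls back to the module version recorded in the binary.
func resolveVersion(injected string) string {
	if injected != "" {
		return injected
	}
	if info, ok := debug.ReadBuildInfo(); ok {
		// "(devel)" is what local, untagged builds report.
		if v := info.Main.Version; v != "" && v != "(devel)" {
			return v
		}
	}
	return "dev"
}

func main() {
	fmt.Println(resolveVersion(version))
}
```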
Bug fixes
- 20f311f: fix(defuddle): suppress non-fatal CDP unmarshal warnings in render
- 4051601: fix(defuddle): sync go.work.sum to unblock goreleaser dirty-tree check
Refactors
- 72485dd: refactor(chatgpt): collapse duplicated OuterHtml-append into appendOuterHTML
- 674c5e9: refactor(chatgpt): extract footnoteIndexForURL from processFootnotes
- 07a2dfd: refactor(claude): extract claudeMessageRoleContent from ExtractMessages
- bd20f54: refactor(cli): extract encodeBatchResults from runBatch
- 6d686b6: refactor(conversation): extract conversationFootnotesHTML from CreateContentHTML
- 9e3bae1: refactor(defuddle): collapse dimension parsing into parseDimensionAttr
- 5ca10d9: refactor(defuddle): collapse extractor-variable overrides via setIfPresent
- 814b635: refactor(defuddle): extract isContentLayoutTable from findTableBasedContent
- 2a62e3d: refactor(defuddle): extract isHiddenElement predicate from removeHiddenElements
- 26ef3c4: refactor(defuddle): extract isProtectedFromPartialRemoval predicate
- 106c358: refactor(defuddle): extract preferSpecificChild from findMainContent
- 3018be5: refactor(defuddle): extract render-cmd wiring out of parse.go
- 143d5e9: refactor(defuddle): extract schemaItemTypes from countSchemaTypes
- fc37c7a: refactor(defuddle): extract schemaItemsFromScript from extractSchemaOrgData
- 21effb9: refactor(defuddle): extract selectMainContent from parseInternal
- cdcf68c: refactor(defuddle): split removeBySelector into removeExactSelectors/removePartialSelectors
- 6321bfa: refactor(discourse): extract opDescription from Extract
- 7187c0d: refactor(elements): collapse duplicated source-picking into bestSourceSrcset
- 1066b1c: refactor(elements): collapse math detection blocks into mathMLData/laTeXData helpers
- ea86d04: refactor(elements): extract promoteImageURLAttrs from transformLazyImages
- 7b8b354: refactor(elements): extract transformPicture from transformPictures loop
- 89d562e: refactor(elements): extract uniImageFigcaption from transformUniImages
- e96b567: refactor(elements): split findCaption into named caption-strategy helpers
- b90a9e1: refactor(extractors): collapse conversation CanExtract into canExtractFromSelection helper
- 047da5c: refactor(extractors): collapse conversation GetMetadata into conversationMetadata helper
- c79293e: refactor(extractors): collapse github date formatting into formatGitHubDate
- e65beb9: refactor(extractors): collapse nytimes heading-block cases via nytHeadingTags map
- 69dbf3a: refactor(extractors): collapse registry closure boilerplate into register helpers
- 87ab4f6: refactor(extractors): drop vestigial attr param from mastodonMetaAttr
- 44f2cae: refactor(extractors): extract commentHTML from renderCommentThread
- 966a3db: refactor(extractors): extract gemini user/model message builders from ExtractMessages
- a2fa203: refactor(extractors): extract geminiSourceFootnote from extractSources
- 46bcf81: refactor(extractors): extract quotedPostParagraphs from threads extractQuotedPost
- d1bffe4: refactor(extractors): extract reddit fallbackPostContent from getPostContent
- 05d9987: refactor(extractors): extract relativeTimeDatetime in github extractIssue
- 04acdc3: refactor(extractors): extract resolvePostPermalink shared by bluesky/threads
- 112c795: refactor(extractors): extract simplifyLinkedInLinks from cleanTextContent
- 9406087: refactor(extractors): extract tweetImageHTML from twitter extractImages
- cf3f0d2: refactor(extractors): extract videoDataFromOGMeta from getVideoDataFromLDJSONScripts
- 9087736: refactor(github): extract appendIssueComments from extractIssue
- e672c96: refactor(github): flatten extractAuthor nesting with early-continue guards
- 4d9b449: refactor(grok): extract footnoteIndexForURL from processFootnotes
- 3b7a6b7: refactor(grok): extract grokMessageRoleContent from ExtractMessages
- e3eeb5a: refactor(hackernews): collapse 3-way date extraction into hnDateFromAge
- de75429: refactor(hackernews): extract commentPageContent from getPostContent
- bdd8a7e: refactor(lwn): extract isStructuralCommentChild from getCommentContent
- 3d5b791: refactor(markdown): extract bestSrcsetURL from getBestImageSrc
- d4feee2: refactor(markdown): extract collectArXivEquations from renderArXivEquationTable
- 4c0984d: refactor(markdown): extract extractKaTeXLatex from renderKaTeX
- cfa5f6a: refactor(markdown): extract figureImgAndCaption from renderFigure
- de5fa3e: refactor(markdown): extract footnoteDefinition from renderFootnotesList
- b78f078: refactor(markdown): extract isBlockMath from renderKaTeX
- 041adef: refactor(markdown): extract liIndexInParent for renderListItem position counting
- ca742c3: refactor(medium): extract hasWordChar predicate from getDescription
- 25cd440: refactor(medium): extract removeUITextNoise from cleanArticle
- 723f126: refactor(metadata): collapse Extract documentURL fallback into cmp.Or
- f09824b: refactor(metadata): collapse author dedup-cap-join into joinUniqueAuthors
- ef8988c: refactor(metadata): extract resolveDocumentURL from Extract
- 928a638: refactor(metadata): extract resolveFaviconURL from getFavicon
- 02259c0: refactor(metadata): hoist searchSchema recursion out of getSchemaProperty
- 8534071: refactor(removals): extract linksLookLikeBreadcrumb from isBreadcrumbList
- e0a4c2f: refactor(scoring): extract hasNavigationHeading/hasSocialProfileLink from isLikelyContent
- 84fdb29: refactor(scoring): reuse hasSocialProfileLink in scoreNonContentBlock
- 5af2c36: refactor(standardize): collapse leading/trailing <hr> removal into removeEdgeHRs
- 79cfb58: refactor(standardize): extract convertLiteYouTube from standardizeElements
- 6230576: refactor(standardize): extract ensureInlineSpacing from cleanupEmptyElements
- ec9d1cf: refactor(standardize): extract isConsecutiveBr from stripExtraBrElements
- 76612b8: refactor(standardize): extract isRemovableEmptyElement predicate from removeEmptyElements
- 8745eda: refactor(standardize): extract restructureHeadingLink from unwrapSpecialLinks
- 3f9aec7: refactor(standardize): extract writeAllowedAttributes shared by element/heading copy
- 72e6fa0: refactor(standardize): hoist hasContentAfter out of removeTrailingHeadings
- c2b780e: refactor(standardize): hoist isWrapperElement out of flattenWrapperElements
- 1f287c6: refactor(standardize): hoist pure predicates out of flattenWrapperElements
- 6fc2d99: refactor(standardize): hoist removeEmptyLines passes to package funcs
- e8a3f00: refactor(standardize): hoist standardizeSpaces recursion to standardizeSpacesNode
- e173ba1: refactor(standardize): hoist stripElementAttributes out of stripUnwantedAttributes
- c40f822: refactor(standardize): split ARIA list transforms into list_elements.go
- bbabaf7: refactor(threads): extract spanParagraph from extractPostConte...
v0.8.0
Defuddle Go v0.8.0
Web content extraction library and CLI tool for Go.
📦 Installation
Download Pre-built Binaries
Download the appropriate binary for your platform from the assets below.
Install with Go
go install github.com/dotcommander/defuddle/cmd/defuddle@v0.8.0
Install from Source
git clone https://github.com/dotcommander/defuddle.git
cd defuddle
make build-cli
Changelog
Features
- d7636ef: feat(cli): add opt-in --render JS-rendering path via chromedp
Refactors
- da6b04f: refactor(cli): collapse property name/accessor maps into one source of truth
Others
- 406c433: docs(cli): rewrite CLI reference to cover all surfaces
🔍 Usage Examples
# Extract content from URL
defuddle parse https://example.com/article
# Convert to markdown
defuddle parse https://example.com/article --markdown
# Get JSON output with metadata
defuddle parse https://example.com/article --json
# Extract specific property
defuddle parse https://example.com/article --property title
Full Changelog: v0.7.3...v0.8.0
v0.7.3
[v0.7.3] — 2026-06-10
Fixed
- Sanitize site-specific extractor output before returning Result.Content, matching the generic parser sanitizer path.
- Honor ProcessCode, ProcessImages, ProcessHeadings, ProcessMath, ProcessFootnotes, and ProcessRoles options during standardization.
- Cap ParseFromURL response reads before buffering the body, returning ErrTooLarge for oversized responses.
- Return structured ErrHTTPStatus/HTTPStatusError for non-2xx URL fetches instead of parsing error pages.
- Resolve implicit metadata URLs against the final redirect target while preserving an explicit caller-supplied Options.URL.
- Sync selected upstream parser fixes from kepano/defuddle: ChatGPT split assistant messages, YouTube JSON-LD video metadata selection, markdown link destinations with spaces, and weekday-aware byline cleanup.
Changed
- task verify now runs govulncheck ./... through the new task vuln gate.
v0.7.2
v0.7.2
Fixed
- fix(extractors/grok): extract body inner HTML instead of full document wrapper
Changed
- refactor(scoring): single-pass anchor metrics in scoreNonContentBlock
v0.7.1
Defuddle Go v0.7.1
Web content extraction library and CLI tool for Go.
📦 Installation
Download Pre-built Binaries
Download the appropriate binary for your platform from the assets below.
Install with Go
go install github.com/dotcommander/defuddle/cmd/defuddle@v0.7.1
Install from Source
git clone https://github.com/dotcommander/defuddle.git
cd defuddle
make build-cli
Changelog
Bug fixes
- 6a60544: fix(lint): errcheck on test pipes and tidy go.mod
Others
- 1223a0a: chore(taskfile): gate tag target on verify
- 0bb9f37: test(cli): add RunE integration tests for all subcommands
🔍 Usage Examples
# Extract content from URL
defuddle parse https://example.com/article
# Convert to markdown
defuddle parse https://example.com/article --markdown
# Get JSON output with metadata
defuddle parse https://example.com/article --json
# Extract specific property
defuddle parse https://example.com/article --property title
Full Changelog: v0.7.0...v0.7.1
v0.6.0
Full Changelog: v0.5.3...v0.6.0
v0.5.3
Full Changelog: v0.5.2...v0.5.3
v0.5.2
Defuddle Go v0.5.2
Web content extraction library and CLI tool for Go.
📦 Installation
Download Pre-built Binaries
Download the appropriate binary for your platform from the assets below.
Install with Go
go install github.com/dotcommander/defuddle/cmd/defuddle@v0.5.2
Install from Source
git clone https://github.com/dotcommander/defuddle.git
cd defuddle
make build-cli
Changelog
Bug fixes
- c748dab: fix(removals): subdomain-aware same-site hostname matching
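One possible reading of subdomain-aware hostname matching is sketched below: two hosts count as the same site when they are equal or one is a subdomain of the other. The helper sameSite is hypothetical, and the library's actual rule may differ.

```go
package main

import (
	"fmt"
	"strings"
)

// sameSite is a hypothetical helper: it reports whether two hostnames are
// equal or one is a subdomain of the other.
func sameSite(a, b string) bool {
	// Hostnames are case-insensitive and may carry a trailing root dot.
	a = strings.ToLower(strings.TrimSuffix(a, "."))
	b = strings.ToLower(strings.TrimSuffix(b, "."))
	if a == "" || b == "" {
		return false
	}
	// The leading "." keeps "notexample.com" from matching "example.com".
	return a == b || strings.HasSuffix(a, "."+b) || strings.HasSuffix(b, "."+a)
}

func main() {
	fmt.Println(sameSite("blog.example.com", "example.com"))
	fmt.Println(sameSite("notexample.com", "example.com"))
}
```

This suffix test is deliberately simple. It does not treat sibling subdomains such as blog.example.com and www.example.com as the same site, and it would accept a bare suffix like com; comparing registrable domains with a public suffix list would address both.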
Performance improvements
- 6ee7089: perf: pre-compiled CSS selectors, regex fast-path, and avoid re-parse on word count
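The general pattern behind this entry, compiling a pattern once and skipping the regex when a cheap check shows it cannot match, can be sketched as follows. The function collapseSpaces and its pattern are illustrative assumptions and are not taken from the library.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// multiSpace is compiled once at package init rather than on every call.
var multiSpace = regexp.MustCompile(`\s{2,}`)

// collapseSpaces is a hypothetical example: it replaces runs of two or more
// whitespace characters with a single space.
func collapseSpaces(s string) string {
	// Fast path: the characters listed are exactly what \s matches in Go's
	// regexp syntax, so if none are present the pattern cannot match.
	if !strings.ContainsAny(s, " \t\n\f\r") {
		return s
	}
	return multiSpace.ReplaceAllString(s, " ")
}

func main() {
	fmt.Printf("%q\n", collapseSpaces("a  b\n\n c"))
	fmt.Printf("%q\n", collapseSpaces("abc"))
}
```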
🔍 Usage Examples
# Extract content from URL
defuddle parse https://example.com/article
# Convert to markdown
defuddle parse https://example.com/article --markdown
# Get JSON output with metadata
defuddle parse https://example.com/article --json
# Extract specific property
defuddle parse https://example.com/article --property titleFull Changelog: v0.5.1...v0.5.2