Releases: dotcommander/defuddle

v0.11.0

25 Jun 03:20

Defuddle Go v0.11.0

Web content extraction library and CLI tool for Go.

📦 Installation

Download Pre-built Binaries

Download the appropriate binary for your platform from the assets below.

Install with Go

go install github.com/dotcommander/defuddle/cmd/defuddle@v0.11.0

Install from Source

git clone https://github.com/dotcommander/defuddle.git
cd defuddle
make build-cli

Changelog

Features

  • 521f3ac: feat(cli): add --render-settle post-load delay for slow SPAs
  • 470c703: feat(cli): add --render-wait-for to block on a selector before snapshot
  • 9bc53be: feat(cli): add --tables-json flag to parse command
  • 6bfe932: feat(tables): add ExtractTables for structured table extraction

Others

  • 9a32c5c: docs(changelog): add v0.11.0 entry

🔍 Usage Examples

# Extract content from URL
defuddle parse https://example.com/article

# Convert to markdown
defuddle parse https://example.com/article --markdown

# Get JSON output with metadata
defuddle parse https://example.com/article --json

# Extract specific property
defuddle parse https://example.com/article --property title

Full Changelog: cmd/defuddle/v0.10.0...v0.11.0

v0.10.0

24 Jun 21:31

Defuddle Go v0.10.0

Web content extraction library and CLI tool for Go.

📦 Installation

Download Pre-built Binaries

Download the appropriate binary for your platform from the assets below.

Install with Go

go install github.com/dotcommander/defuddle/cmd/defuddle@v0.10.0

Install from Source

git clone https://github.com/dotcommander/defuddle.git
cd defuddle
make build-cli

Changelog

Features

  • a591a5b: feat(defuddle): treat HTTP 304 as ErrNotModified in fetch path
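
As a rough illustration of the 304 handling described above, the following sketch maps a conditional GET that comes back "not modified" to a sentinel error. Only the name ErrNotModified comes from the changelog; its definition, the helper fetchIfChanged, and the stub transport are assumptions made for this example and are not the library's actual API.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"net/http"
	"strings"
)

// ErrNotModified reuses the name from the changelog entry; this definition
// is an assumption for illustration only.
var ErrNotModified = errors.New("not modified")

// fetchIfChanged is a hypothetical helper: it sends a conditional GET and
// returns ErrNotModified on HTTP 304 instead of handing back a body to parse.
func fetchIfChanged(client *http.Client, url, etag string) (*http.Response, error) {
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return nil, err
	}
	if etag != "" {
		req.Header.Set("If-None-Match", etag)
	}
	resp, err := client.Do(req)
	if err != nil {
		return nil, err
	}
	if resp.StatusCode == http.StatusNotModified {
		resp.Body.Close()
		return nil, ErrNotModified
	}
	return resp, nil
}

// roundTripFunc adapts a function to http.RoundTripper so the example runs
// without touching the network.
type roundTripFunc func(*http.Request) (*http.Response, error)

func (f roundTripFunc) RoundTrip(r *http.Request) (*http.Response, error) { return f(r) }

// stubClient answers 304 when If-None-Match equals etag, and 200 otherwise.
func stubClient(etag string) *http.Client {
	return &http.Client{Transport: roundTripFunc(func(r *http.Request) (*http.Response, error) {
		status := http.StatusOK
		if r.Header.Get("If-None-Match") == etag {
			status = http.StatusNotModified
		}
		return &http.Response{
			StatusCode: status,
			Header:     make(http.Header),
			Body:       io.NopCloser(strings.NewReader("<html></html>")),
			Request:    r,
		}, nil
	})}
}

func main() {
	client := stubClient(`"v1"`)
	_, err := fetchIfChanged(client, "https://example.com/article", `"v1"`)
	fmt.Println(errors.Is(err, ErrNotModified))
}
```

A sentinel error lets callers tell "nothing changed, reuse the cached copy" apart from real failures with errors.Is.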

Bug fixes

  • de3ad8b: fix(sanitize): block data:image/svg+xml URIs (XSS bypass vector)
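
To make the sanitizer fix above concrete, here is a minimal sketch of a check that rejects data: URIs whose media type is image/svg+xml, since SVG can carry script. The function name isBlockedDataURI and the normalization steps are assumptions for illustration; the actual sanitizer may apply different rules.

```go
package main

import (
	"fmt"
	"strings"
)

// isBlockedDataURI is a hypothetical predicate: it reports whether a URL
// attribute value is a data: URI with the image/svg+xml media type.
func isBlockedDataURI(raw string) bool {
	// Normalize before matching: drop tab and newline characters, trim
	// surrounding whitespace, and lowercase the value.
	cleaned := strings.Map(func(r rune) rune {
		if r == '\t' || r == '\n' || r == '\r' {
			return -1
		}
		return r
	}, raw)
	cleaned = strings.ToLower(strings.TrimSpace(cleaned))
	if !strings.HasPrefix(cleaned, "data:") {
		return false
	}
	rest := strings.TrimPrefix(cleaned, "data:")
	// The media type is everything before the first ',' and before any
	// ';' parameter such as ";base64".
	mediaType, _, _ := strings.Cut(rest, ",")
	mediaType, _, _ = strings.Cut(mediaType, ";")
	return strings.TrimSpace(mediaType) == "image/svg+xml"
}

func main() {
	fmt.Println(isBlockedDataURI("data:image/svg+xml;base64,PHN2Zz4="))
	fmt.Println(isBlockedDataURI("data:image/png;base64,AAAA"))
}
```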

Refactors

  • 2b58925: refactor(defuddle-cli): replace cobra command singletons with per-invocation factories

Others

  • 5b28cf0: chore(release): tidy cli go.mod, gofmt youtube.go, doc-comment Discourse.Extract
  • 3f456ce: docs(changelog): add v0.10.0 entry (ErrNotModified, data:svg sanitizer fix)
  • 110101b: test(defuddle-cli): build fresh commands per test via factories, enable t.Parallel()

Full Changelog: v0.9.0...v0.10.0

v0.9.0

23 Jun 13:55

Defuddle Go v0.9.0

Web content extraction library and CLI tool for Go.

📦 Installation

Download Pre-built Binaries

Download the appropriate binary for your platform from the assets below.

Install with Go

go install github.com/dotcommander/defuddle/cmd/defuddle@v0.9.0

Install from Source

git clone https://github.com/dotcommander/defuddle.git
cd defuddle
make build-cli

Changelog

Features

  • e4a7e0a: feat(defuddle): fall back to ReadBuildInfo version on go-install builds
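
The fallback above can be sketched as follows: binaries built with go install receive no linker-injected version, but the Go toolchain records the module version, which runtime/debug.ReadBuildInfo exposes. The names version, moduleVersion, and resolveVersion, and the "dev" placeholder, are assumptions for this example, not the project's actual code.

```go
package main

import (
	"fmt"
	"runtime/debug"
)

// version would normally be set at release time with something like
// -ldflags "-X main.version=v0.9.0"; the variable name is an assumption.
var version = ""

// moduleVersion returns the main module version recorded by the Go
// toolchain, if build information is available.
func moduleVersion() (string, bool) {
	info, ok := debug.ReadBuildInfo()
	if !ok {
		return "", false
	}
	return info.Main.Version, true
}

// resolveVersion is a hypothetical helper: prefer the injected value, then
// the module version, then a placeholder. "(devel)" is what the toolchain
// may report for builds from a local checkout, so it is treated as unknown.
func resolveVersion(injected string, lookup func() (string, bool)) string {
	if injected != "" {
		return injected
	}
	if v, ok := lookup(); ok && v != "" && v != "(devel)" {
		return v
	}
	return "dev"
}

func main() {
	fmt.Println(resolveVersion(version, moduleVersion))
}
```

Passing the lookup as a function keeps the fallback order testable without rebuilding the binary.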

Bug fixes

  • 20f311f: fix(defuddle): suppress non-fatal CDP unmarshal warnings in render
  • 4051601: fix(defuddle): sync go.work.sum to unblock goreleaser dirty-tree check

Refactors

  • 72485dd: refactor(chatgpt): collapse duplicated OuterHtml-append into appendOuterHTML
  • 674c5e9: refactor(chatgpt): extract footnoteIndexForURL from processFootnotes
  • 07a2dfd: refactor(claude): extract claudeMessageRoleContent from ExtractMessages
  • bd20f54: refactor(cli): extract encodeBatchResults from runBatch
  • 6d686b6: refactor(conversation): extract conversationFootnotesHTML from CreateContentHTML
  • 9e3bae1: refactor(defuddle): collapse dimension parsing into parseDimensionAttr
  • 5ca10d9: refactor(defuddle): collapse extractor-variable overrides via setIfPresent
  • 814b635: refactor(defuddle): extract isContentLayoutTable from findTableBasedContent
  • 2a62e3d: refactor(defuddle): extract isHiddenElement predicate from removeHiddenElements
  • 26ef3c4: refactor(defuddle): extract isProtectedFromPartialRemoval predicate
  • 106c358: refactor(defuddle): extract preferSpecificChild from findMainContent
  • 3018be5: refactor(defuddle): extract render-cmd wiring out of parse.go
  • 143d5e9: refactor(defuddle): extract schemaItemTypes from countSchemaTypes
  • fc37c7a: refactor(defuddle): extract schemaItemsFromScript from extractSchemaOrgData
  • 21effb9: refactor(defuddle): extract selectMainContent from parseInternal
  • cdcf68c: refactor(defuddle): split removeBySelector into removeExactSelectors/removePartialSelectors
  • 6321bfa: refactor(discourse): extract opDescription from Extract
  • 7187c0d: refactor(elements): collapse duplicated source-picking into bestSourceSrcset
  • 1066b1c: refactor(elements): collapse math detection blocks into mathMLData/laTeXData helpers
  • ea86d04: refactor(elements): extract promoteImageURLAttrs from transformLazyImages
  • 7b8b354: refactor(elements): extract transformPicture from transformPictures loop
  • 89d562e: refactor(elements): extract uniImageFigcaption from transformUniImages
  • e96b567: refactor(elements): split findCaption into named caption-strategy helpers
  • b90a9e1: refactor(extractors): collapse conversation CanExtract into canExtractFromSelection helper
  • 047da5c: refactor(extractors): collapse conversation GetMetadata into conversationMetadata helper
  • c79293e: refactor(extractors): collapse github date formatting into formatGitHubDate
  • e65beb9: refactor(extractors): collapse nytimes heading-block cases via nytHeadingTags map
  • 69dbf3a: refactor(extractors): collapse registry closure boilerplate into register helpers
  • 87ab4f6: refactor(extractors): drop vestigial attr param from mastodonMetaAttr
  • 44f2cae: refactor(extractors): extract commentHTML from renderCommentThread
  • 966a3db: refactor(extractors): extract gemini user/model message builders from ExtractMessages
  • a2fa203: refactor(extractors): extract geminiSourceFootnote from extractSources
  • 46bcf81: refactor(extractors): extract quotedPostParagraphs from threads extractQuotedPost
  • d1bffe4: refactor(extractors): extract reddit fallbackPostContent from getPostContent
  • 05d9987: refactor(extractors): extract relativeTimeDatetime in github extractIssue
  • 04acdc3: refactor(extractors): extract resolvePostPermalink shared by bluesky/threads
  • 112c795: refactor(extractors): extract simplifyLinkedInLinks from cleanTextContent
  • 9406087: refactor(extractors): extract tweetImageHTML from twitter extractImages
  • cf3f0d2: refactor(extractors): extract videoDataFromOGMeta from getVideoDataFromLDJSONScripts
  • 9087736: refactor(github): extract appendIssueComments from extractIssue
  • e672c96: refactor(github): flatten extractAuthor nesting with early-continue guards
  • 4d9b449: refactor(grok): extract footnoteIndexForURL from processFootnotes
  • 3b7a6b7: refactor(grok): extract grokMessageRoleContent from ExtractMessages
  • e3eeb5a: refactor(hackernews): collapse 3-way date extraction into hnDateFromAge
  • de75429: refactor(hackernews): extract commentPageContent from getPostContent
  • bdd8a7e: refactor(lwn): extract isStructuralCommentChild from getCommentContent
  • 3d5b791: refactor(markdown): extract bestSrcsetURL from getBestImageSrc
  • d4feee2: refactor(markdown): extract collectArXivEquations from renderArXivEquationTable
  • 4c0984d: refactor(markdown): extract extractKaTeXLatex from renderKaTeX
  • cfa5f6a: refactor(markdown): extract figureImgAndCaption from renderFigure
  • de5fa3e: refactor(markdown): extract footnoteDefinition from renderFootnotesList
  • b78f078: refactor(markdown): extract isBlockMath from renderKaTeX
  • 041adef: refactor(markdown): extract liIndexInParent for renderListItem position counting
  • ca742c3: refactor(medium): extract hasWordChar predicate from getDescription
  • 25cd440: refactor(medium): extract removeUITextNoise from cleanArticle
  • 723f126: refactor(metadata): collapse Extract documentURL fallback into cmp.Or
  • f09824b: refactor(metadata): collapse author dedup-cap-join into joinUniqueAuthors
  • ef8988c: refactor(metadata): extract resolveDocumentURL from Extract
  • 928a638: refactor(metadata): extract resolveFaviconURL from getFavicon
  • 02259c0: refactor(metadata): hoist searchSchema recursion out of getSchemaProperty
  • 8534071: refactor(removals): extract linksLookLikeBreadcrumb from isBreadcrumbList
  • e0a4c2f: refactor(scoring): extract hasNavigationHeading/hasSocialProfileLink from isLikelyContent
  • 84fdb29: refactor(scoring): reuse hasSocialProfileLink in scoreNonContentBlock
  • 5af2c36: refactor(standardize): collapse leading/trailing <hr> removal into removeEdgeHRs
  • 79cfb58: refactor(standardize): extract convertLiteYouTube from standardizeElements
  • 6230576: refactor(standardize): extract ensureInlineSpacing from cleanupEmptyElements
  • ec9d1cf: refactor(standardize): extract isConsecutiveBr from stripExtraBrElements
  • 76612b8: refactor(standardize): extract isRemovableEmptyElement predicate from removeEmptyElements
  • 8745eda: refactor(standardize): extract restructureHeadingLink from unwrapSpecialLinks
  • 3f9aec7: refactor(standardize): extract writeAllowedAttributes shared by element/heading copy
  • 72e6fa0: refactor(standardize): hoist hasContentAfter out of removeTrailingHeadings
  • c2b780e: refactor(standardize): hoist isWrapperElement out of flattenWrapperElements
  • 1f287c6: refactor(standardize): hoist pure predicates out of flattenWrapperElements
  • 6fc2d99: refactor(standardize): hoist removeEmptyLines passes to package funcs
  • e8a3f00: refactor(standardize): hoist standardizeSpaces recursion to standardizeSpacesNode
  • e173ba1: refactor(standardize): hoist stripElementAttributes out of stripUnwantedAttributes
  • c40f822: refactor(standardize): split ARIA list transforms into list_elements.go
  • bbabaf7: refactor(threads): extract spanParagraph from extractPostConte...

v0.8.0

16 Jun 22:33

Defuddle Go v0.8.0

Web content extraction library and CLI tool for Go.

📦 Installation

Download Pre-built Binaries

Download the appropriate binary for your platform from the assets below.

Install with Go

go install github.com/dotcommander/defuddle/cmd/defuddle@v0.8.0

Install from Source

git clone https://github.com/dotcommander/defuddle.git
cd defuddle
make build-cli

Changelog

Features

  • d7636ef: feat(cli): add opt-in --render JS-rendering path via chromedp

Refactors

  • da6b04f: refactor(cli): collapse property name/accessor maps into one source of truth

Others

  • 406c433: docs(cli): rewrite CLI reference to cover all surfaces

Full Changelog: v0.7.3...v0.8.0

v0.7.3

10 Jun 23:56

[v0.7.3] — 2026-06-10

Fixed

  • Sanitize site-specific extractor output before returning Result.Content, matching the generic parser sanitizer path.
  • Honor ProcessCode, ProcessImages, ProcessHeadings, ProcessMath, ProcessFootnotes, and ProcessRoles options during standardization.
  • Cap ParseFromURL response reads before buffering the body, returning ErrTooLarge for oversized responses.
  • Return structured ErrHTTPStatus / HTTPStatusError for non-2xx URL fetches instead of parsing error pages.
  • Resolve implicit metadata URLs against the final redirect target while preserving an explicit caller-supplied Options.URL.
  • Sync selected upstream parser fixes from kepano/defuddle: ChatGPT split assistant messages, YouTube JSON-LD video metadata selection, markdown link destinations with spaces, and weekday-aware byline cleanup.
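
Two of the fixes above, the capped body read and the structured status error, can be sketched together. The names ErrTooLarge and HTTPStatusError come from the release notes, but their definitions here, along with the helpers checkStatus, readCapped, and readFromString, are assumptions for illustration rather than the library's real signatures.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"strings"
)

// ErrTooLarge and HTTPStatusError reuse names from the release notes; their
// shapes here are assumed for this sketch.
var ErrTooLarge = errors.New("response too large")

type HTTPStatusError struct{ StatusCode int }

func (e *HTTPStatusError) Error() string {
	return fmt.Sprintf("unexpected HTTP status %d", e.StatusCode)
}

// checkStatus rejects non-2xx responses so an error page is never parsed.
func checkStatus(code int) error {
	if code < 200 || code > 299 {
		return &HTTPStatusError{StatusCode: code}
	}
	return nil
}

// readCapped reads at most limit bytes. It requests one extra byte so an
// oversized body is detected without buffering the whole thing.
func readCapped(r io.Reader, limit int64) ([]byte, error) {
	data, err := io.ReadAll(io.LimitReader(r, limit+1))
	if err != nil {
		return nil, err
	}
	if int64(len(data)) > limit {
		return nil, ErrTooLarge
	}
	return data, nil
}

// readFromString is a hypothetical stand-in for the fetch path, using an
// in-memory body: check the status first, then read under the cap.
func readFromString(status int, body string, limit int64) ([]byte, error) {
	if err := checkStatus(status); err != nil {
		return nil, err
	}
	return readCapped(strings.NewReader(body), limit)
}

func main() {
	_, err := readFromString(200, "0123456789", 4)
	fmt.Println(errors.Is(err, ErrTooLarge))

	_, err = readFromString(404, "not found", 1024)
	var statusErr *HTTPStatusError
	if errors.As(err, &statusErr) {
		fmt.Println("status:", statusErr.StatusCode)
	}
}
```

A typed error lets callers recover the status code with errors.As, while the sentinel works with errors.Is.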

Changed

  • task verify now runs govulncheck ./... through the new task vuln gate.

v0.7.2

29 May 21:20

v0.7.2

Fixed

  • fix(extractors/grok): extract body inner HTML instead of full document wrapper

Changed

  • refactor(scoring): single-pass anchor metrics in scoreNonContentBlock

v0.7.1

25 May 01:46

Defuddle Go v0.7.1

Web content extraction library and CLI tool for Go.

📦 Installation

Download Pre-built Binaries

Download the appropriate binary for your platform from the assets below.

Install with Go

go install github.com/dotcommander/defuddle/cmd/defuddle@v0.7.1

Install from Source

git clone https://github.com/dotcommander/defuddle.git
cd defuddle
make build-cli

Changelog

Bug fixes

  • 6a60544: fix(lint): errcheck on test pipes and tidy go.mod

Others

  • 1223a0a: chore(taskfile): gate tag target on verify
  • 0bb9f37: test(cli): add RunE integration tests for all subcommands

Full Changelog: v0.7.0...v0.7.1

v0.6.0

25 Apr 18:34

Full Changelog: v0.5.3...v0.6.0

v0.5.3

25 Apr 18:05

Full Changelog: v0.5.2...v0.5.3

v0.5.2

25 Apr 17:36

Defuddle Go v0.5.2

Web content extraction library and CLI tool for Go.

📦 Installation

Download Pre-built Binaries

Download the appropriate binary for your platform from the assets below.

Install with Go

go install github.com/dotcommander/defuddle/cmd/defuddle@v0.5.2

Install from Source

git clone https://github.com/dotcommander/defuddle.git
cd defuddle
make build-cli

Changelog

Bug fixes

  • c748dab: fix(removals): subdomain-aware same-site hostname matching
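
One plausible reading of "subdomain-aware same-site matching" is sketched below: two hostnames count as the same site when they are equal or one is a subdomain of the other. The functions sameSite and normalizeHost are hypothetical, and this simplified rule does not consult a public suffix list or match sibling subdomains, so the actual fix may behave differently.

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeHost lowercases a hostname and strips surrounding whitespace and
// a trailing root dot.
func normalizeHost(h string) string {
	return strings.TrimSuffix(strings.ToLower(strings.TrimSpace(h)), ".")
}

// sameSite is a hypothetical, simplified check: hosts match when they are
// equal or when one ends with "." plus the other. Requiring the dot keeps
// "notexample.com" from matching "example.com".
func sameSite(a, b string) bool {
	a, b = normalizeHost(a), normalizeHost(b)
	if a == "" || b == "" {
		return false
	}
	return a == b || strings.HasSuffix(a, "."+b) || strings.HasSuffix(b, "."+a)
}

func main() {
	fmt.Println(sameSite("blog.example.com", "example.com"))
	fmt.Println(sameSite("notexample.com", "example.com"))
}
```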

Performance improvements

  • 6ee7089: perf: pre-compiled CSS selectors, regex fast-path, and avoid re-parse on word count

Full Changelog: v0.5.1...v0.5.2