
feat: complete TODO parity with microposts and local rewriter#15

Merged: ronaldtse merged 3 commits into main from feat/final-parity-gap-fill on May 13, 2026


@ronaldtse
Contributor

Summary

Completes the final two remaining TODO parity items:

1. Page#microposts (TODO 13.6 — Content Extraction Pipeline)

  • Extracts article/blog post content from HTML pages
  • Detects article containers via semantic selectors (article, [role=article], .post, .entry, etc.)
  • Extracts title, body text, date, and author from each container
  • Falls back to body content when no article containers are found
  • 10 specs
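
The extraction steps listed above can be sketched roughly as follows. This is a minimal illustration, not the code merged in this PR: it uses REXML so that it needs no extra gems, which means it only handles well-formed markup, whereas the real `Page#microposts` presumably relies on a proper HTML parser. The `Micropost` struct, the helper names, and the choice of `h1`-`h3`, `time`, and `.author` as the title, date, and author sources are all assumptions made for the example.

```ruby
require "rexml/document"

# Hypothetical value object; the real return type of Page#microposts
# is not shown in this PR.
Micropost = Struct.new(:title, :body, :date, :author, keyword_init: true)

# Class names treated as article containers (from the selector list above).
CONTAINER_CLASSES = %w[post entry].freeze

# All element descendants of a node, in document order.
def descendants(node)
  found = []
  node.each_recursive { |el| found << el }
  found
end

# Concatenated text of an element and its descendants, whitespace-normalized.
def deep_text(element)
  parts = element.children.map do |child|
    case child
    when REXML::Text then child.value
    when REXML::Element then deep_text(child)
    else ""
    end
  end
  parts.join(" ").gsub(/\s+/, " ").strip
end

def class_list(element)
  element.attributes["class"].to_s.split
end

# Mirrors the selectors: article, [role=article], .post, .entry
def article_container?(element)
  element.name == "article" ||
    element.attributes["role"] == "article" ||
    (class_list(element) & CONTAINER_CLASSES).any?
end

def extract_microposts(html)
  all = descendants(REXML::Document.new(html))
  containers = all.select { |el| article_container?(el) }
  # Fall back to the whole body when no article container matches.
  containers = [all.find { |el| el.name == "body" }].compact if containers.empty?

  containers.map do |container|
    inner  = descendants(container)
    title  = inner.find { |el| %w[h1 h2 h3].include?(el.name) }
    time   = inner.find { |el| el.name == "time" }
    author = inner.find { |el| class_list(el).include?("author") }
    Micropost.new(
      title: title && deep_text(title),
      body: deep_text(container), # assumption: body includes heading text too
      date: time && (time.attributes["datetime"] || deep_text(time)),
      author: author && deep_text(author)
    )
  end
end
```

Calling `extract_microposts` on a page with one `<article>` yields a single `Micropost`. A page with no matching container yields one entry built from `<body>`, with `title`, `date`, and `author` set to `nil` unless the body happens to contain matching elements. Nested containers are not de-duplicated in this sketch.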

2. LocalRewriter + rewrite-local CLI (TODO 11.5 — Local-Only Rewrite Mode)

  • Rewrites previously downloaded files without fetching from the internet
  • Handles HTML, CSS, and JS files with appropriate rewriting strategies
  • Supports in-place rewriting or output to a separate directory
  • CLI: archaeo rewrite-local INPUT_DIR [--output DIR] [--prefix PREFIX]
  • 10 specs
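
As a rough sketch of what a local-only rewrite pass might look like, the snippet below walks an input directory, rewrites Wayback Machine URLs in HTML, CSS, and JS files, and writes the result either in place or into a separate directory. It is not the `LocalRewriter` from this PR. The regular expression, the `localize` and `rewrite_local` names, and the mapping from original URL to `PREFIX + path` (dropping the host) are assumptions chosen for illustration. The actual class also applies file-type-specific strategies that are not reproduced here.

```ruby
require "fileutils"
require "pathname"

# Matches Wayback Machine URLs such as
#   https://web.archive.org/web/20200101000000/http://example.com/a.html
# including the two-letter resource modifiers (im_, cs_, js_, ...).
WAYBACK_URL = %r{https?://web\.archive\.org/web/\d+(?:[a-z][a-z]_)?/(https?://[^\s"'()<>]+)}

REWRITABLE_EXTENSIONS = %w[.html .htm .css .js].freeze

# Replace each archive URL with a local path built from the original URL's
# path component. Assumption: the host is dropped and `prefix` is prepended.
def localize(content, prefix: "")
  content.gsub(WAYBACK_URL) do
    path = Regexp.last_match(1).sub(%r{\Ahttps?://[^/]+}, "")
    path = "/" if path.empty?
    "#{prefix}#{path}"
  end
end

# Rewrite files under input_dir without any network access.
# With output_dir nil the files are rewritten in place.
# Returns the number of files written.
def rewrite_local(input_dir, output_dir: nil, prefix: "")
  root = Pathname.new(input_dir)
  rewritten = 0

  Dir.glob(File.join(input_dir, "**", "*")).each do |source|
    next unless File.file?(source)
    next unless REWRITABLE_EXTENSIONS.include?(File.extname(source).downcase)

    content = File.read(source)
    # Cheap pre-check, in the spirit of rewrite_candidate? from the diff.
    next unless content.include?("web.archive.org")

    relative = Pathname.new(source).relative_path_from(root).to_s
    target = output_dir ? File.join(output_dir, relative) : source
    FileUtils.mkdir_p(File.dirname(target))
    File.write(target, localize(content, prefix: prefix))
    rewritten += 1
  end

  rewritten
end
```

In this sketch, files that contain no archive URLs are skipped rather than copied, so an output directory only receives files that were actually rewritten. Whether the merged implementation copies untouched files as well is not stated in the PR.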

Changes

  • lib/archaeo/page.rb — microposts method with article extraction
  • lib/archaeo/local_rewriter.rb — new LocalRewriter class
  • lib/archaeo/cli.rb — rewrite-local command
  • lib/archaeo.rb — autoloads for LocalRewriter, LocalRewriteSummary

Test Results

561 examples, 0 failures

ronaldtse added 3 commits on May 13, 2026 16:35:

Extracts structured content from HTML pages including title, body text,
date, and author from common article markup patterns (article elements,
role=article, blog post classes). Falls back to body content when no
article containers are found.

Adds LocalRewriter class that rewrites previously downloaded files
by converting archive URLs to local paths without fetching. The
rewrite-local CLI command processes HTML, CSS, and JS files from
an input directory and writes rewritten output to an output directory.
].freeze

def rewrite_candidate?(content)
  content.include?("web.archive.org")
ronaldtse merged commit 9dbffd1 into main on May 13, 2026
13 of 14 checks passed
