
Refactor metadata parsing and update publication workflow #62

Merged
fradav merged 11 commits into master from master on May 5, 2026


@fradav (Contributor) commented on May 5, 2026

Important

From now on, the publications list is refreshed ONLY by a manual workflow dispatch with "true" entered in the input field, as shown in the screenshot below.

[Screenshot: the manual workflow dispatch form and its input field]

This PR introduces significant refactoring to the publication metadata collection pipeline:

Key Changes

1. New PublicationUpdater CLI Tool
- Created src/PublicationUpdater/, a new F# library, with a companion CLI in src/PublicationUpdater.Cli/
- Added the PublicationUpdater.fs module for centralized publication metadata processing
- Added unit tests in src/PublicationUpdater.Tests/

2. Quarto Inspect Integration
- Replaced manual metadata extraction with quarto inspect logic (from src/QuartoInspect/)
- More robust and maintainable approach for extracting document metadata
- Added schema definitions for Quarto inspect JSON outputs
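The quarto inspect step boils down to a JSON lookup. This is a minimal sketch, not the code in src/QuartoInspect/: it uses System.Text.Json instead of the PR's type providers and assumes the document output nests per-format metadata as formats.<format>.metadata.title. The helper names tryProp and tryGetTitle are hypothetical.

```fsharp
open System.Text.Json

// Look up a property only on JSON objects (TryGetProperty throws otherwise).
let tryProp (name: string) (el: JsonElement) : JsonElement option =
    let mutable value = Unchecked.defaultof<JsonElement>
    if el.ValueKind = JsonValueKind.Object && el.TryGetProperty(name, &value) then Some value
    else None

// Hypothetical helper: first string found at formats.<format>.metadata.title
// in the JSON written by `quarto inspect <file>` (layout assumed).
let tryGetTitle (inspectJson: string) : string option =
    use doc = JsonDocument.Parse(inspectJson)
    match tryProp "formats" doc.RootElement with
    | Some formats when formats.ValueKind = JsonValueKind.Object ->
        // Seq.tryPick is eager, so every lookup happens before `doc` is disposed.
        formats.EnumerateObject()
        |> Seq.tryPick (fun (fmt: JsonProperty) ->
            tryProp "metadata" fmt.Value
            |> Option.bind (tryProp "title")
            |> Option.filter (fun (t: JsonElement) -> t.ValueKind = JsonValueKind.String)
            |> Option.map (fun (t: JsonElement) -> t.GetString()))
    | _ -> None

printfn "%A" (tryGetTitle """{"formats":{"html":{"metadata":{"title":"A Paper"}}}}""")
// prints: Some "A Paper"
```

Returning a plain string option keeps the JsonDocument lifetime inside the function, so callers never hold elements backed by disposed memory.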

3. Main Script Refactoring
- src/getcomputo-pub.fsx now uses the new refactored approach
- RSS feed generation moved to the main publication script
- Simplified repository parsing logic
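Since the feed is now produced by the publication script rather than by Quarto, here is a minimal sketch of writing an RSS 2.0 feed from a list of publications with System.Xml.Linq. The Publication record, its fields, the toRss name, and the channel values are hypothetical; the real generator lives in PublicationUpdater.fs and writes site/published.xml.

```fsharp
open System
open System.Xml.Linq

// Hypothetical record for illustration; PublicationUpdater's real types may differ.
type Publication =
    { Title: string
      Url: string
      Date: DateTime
      Authors: string list }

// Build an RSS 2.0 document with one <item> per publication, newest first.
let toRss (channelTitle: string) (channelLink: string) (pubs: Publication list) : XDocument =
    let item (p: Publication) =
        XElement(
            XName.Get("item"),
            XElement(XName.Get("title"), p.Title),
            XElement(XName.Get("link"), p.Url),
            XElement(XName.Get("guid"), p.Url),
            // RSS 2.0 expects RFC 822 dates; the "R" format emits the RFC 1123 form.
            XElement(XName.Get("pubDate"), p.Date.ToUniversalTime().ToString("R")),
            XElement(XName.Get("description"), String.concat ", " p.Authors))

    XDocument(
        XElement(
            XName.Get("rss"),
            XAttribute(XName.Get("version"), "2.0"),
            XElement(
                XName.Get("channel"),
                XElement(XName.Get("title"), channelTitle),
                XElement(XName.Get("link"), channelLink),
                XElement(XName.Get("description"), "Published articles"),
                // A list of XElement passed as content is added element by element.
                (pubs |> List.sortByDescending (fun p -> p.Date) |> List.map item))))

let feed =
    toRss "Example feed" "https://example.org"
        [ { Title = "A paper"; Url = "https://example.org/a"; Date = DateTime(2026, 5, 5, 0, 0, 0, DateTimeKind.Utc); Authors = [ "A. Author"; "B. Author" ] } ]

printfn "%s" (feed.ToString())
```

Sorting newest first is a common feed convention; whether the PR orders items the same way is an assumption.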

4. Documentation
- Added docs/DEV-REPORTS/ with comprehensive development reports
- Includes implementation details, quick reference, and schema documentation

5. Build System Updates
- Updated Build.fs with new targets for the PublicationUpdater project
- Added .NET tool configuration in .config/dotnet-tools.json
- Updated workflow to stage both YAML and XML published files

6. Infrastructure
- Added comprehensive .gitignore
- Updated dependencies in paket.dependencies and paket.lock

Related Development Reports

docs/DEV-REPORTS/QUARTO_PROVIDER_IMPLEMENTATION.md
docs/DEV-REPORTS/SCHEMA_BASED_PROVIDERS.md
docs/DEV-REPORTS/QUICK_REFERENCE.md

fradav and others added 11 commits January 19, 2026 13:14
…` logic instead of manual metadata extraction/parsing.

- Removed a small redundant subsection title in the authors guide.
- RSS feed is now generated by the main publication script (Quarto cannot generate RSS feeds directly from an existing YAML file).
Steps are now properly nested under the jobs definition. Remove
the UpdatePublications target argument from the refresh step.
Track .config/dotnet-tools.json by updating gitignore.
Remove UpdatePublications target from default chain
Copilot AI review requested due to automatic review settings May 5, 2026 08:56
fradav merged commit 2c58ef9 into computorg:master on May 5, 2026
2 checks passed

Copilot AI left a comment


Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Refactors the publication metadata pipeline to rely on quarto inspect, introduces a dedicated PublicationUpdater CLI, and updates the build and workflow to generate and commit publication artifacts deterministically.

Changes:

  • Added QuartoInspect + tests and introduced Quarto inspect JSON schemas/samples for structured parsing/validation.
  • Added PublicationUpdater library + CLI + tests and integrated it into FAKE build targets and GitHub Actions.
  • Updated site templates/content and docs to reflect the new publication refresh workflow and RSS generation.

Reviewed changes

Copilot reviewed 51 out of 55 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| src/quarto-inspect-project-json-schema.json | Adds JSON schema for Quarto project inspect output. |
| src/quarto-inspect-document-json-schema.json | Adds JSON schema for Quarto document inspect output. |
| src/paket.references | Adds build-group Paket references for FAKE build project. |
| src/getcomputo-pub.fsx | New refactored script that downloads repo files and runs quarto inspect, then writes YAML + RSS. |
| src/QuartoInspect/sample-project.json | Adds sample project JSON payload for type provider / docs. |
| src/QuartoInspect/sample-document.json | Adds sample document JSON payload for type provider / docs. |
| src/QuartoInspect/paket.references | Declares Paket dependencies for QuartoInspect library. |
| src/QuartoInspect/README.md | Documents QuartoInspect library usage and tests. |
| src/QuartoInspect/QuartoTypes.fs | Introduces JSON type providers and parse helpers. |
| src/QuartoInspect/QuartoInspect.fsproj | Adds QuartoInspect library project definition. |
| src/QuartoInspect/QuartoClient.fs | Adds client for executing quarto inspect and validation helpers. |
| src/QuartoInspect.Tests/paket.references | Declares Paket dependencies for QuartoInspect test project. |
| src/QuartoInspect.Tests/QuartoInspectTests.fs | Adds Expecto tests for GitHub API, Quarto availability, schemas, and integration. |
| src/QuartoInspect.Tests/QuartoInspect.Tests.fsproj | Adds QuartoInspect test project definition. |
| src/PublicationUpdater/paket.references | Declares Paket dependencies for PublicationUpdater library. |
| src/PublicationUpdater/PublicationUpdater.fsproj | Adds PublicationUpdater library project definition. |
| src/PublicationUpdater/PublicationUpdater.fs | Implements publication collection, YAML/RSS generation, and GitHub fetching. |
| src/PublicationUpdater/AssemblyInfo.fs | Exposes internals to PublicationUpdater.Tests. |
| src/PublicationUpdater.Tests/paket.references | Declares Paket dependencies for PublicationUpdater test project. |
| src/PublicationUpdater.Tests/PublicationUpdaterTests.fs | Adds unit tests for citation extraction helpers and serializers. |
| src/PublicationUpdater.Tests/PublicationUpdater.Tests.fsproj | Adds PublicationUpdater test project definition. |
| src/PublicationUpdater.Cli/paket.references | Declares Paket dependencies for PublicationUpdater CLI. |
| src/PublicationUpdater.Cli/PublicationUpdater.Cli.fsproj | Adds CLI executable project definition. |
| src/PublicationUpdater.Cli/Program.fs | CLI entrypoint to run publication refresh. |
| src/Build.fsproj | Adds FAKE build runner project definition. |
| src/Build.fs | Adds FAKE targets for UpdatePublications/Test/RenderSite. |
| site/published.xml | Updates generated RSS output artifact committed to repo. |
| site/news.ejs | Updates “recent publications” filtering logic. |
| site/mock-papers.yml | Updates generated mock-papers content format. |
| site/guidelines-authors.qmd | Adjusts author guidelines heading/anchor. |
| paket.lock | Introduces locked dependencies for new projects/groups and target framework restriction. |
| paket.dependencies | Adds dependency groups and pins versions/framework. |
| index.qmd | Splits homepage into “Headlines” and “Recent Publications” listings. |
| getcomputo-pub.fsx | Removes old publication refresh script. |
| docs/DEV-REPORTS/archive/VISUAL_OVERVIEW.md | Adds archived visual implementation overview. |
| docs/DEV-REPORTS/archive/SCHEMA_MODE_CORRECTION.md | Adds archived notes about schema-mode correction. |
| docs/DEV-REPORTS/archive/README.md | Adds archive readme. |
| docs/DEV-REPORTS/archive/MANIFEST.md | Adds archived manifest describing implementation artifacts. |
| docs/DEV-REPORTS/archive/IMPLEMENTATION_COMPLETE.md | Adds archived implementation summary. |
| docs/DEV-REPORTS/archive/FSDATA_SCHEMA_LIMITATIONS.md | Adds archived notes on FSharp.Data schema limitations. |
| docs/DEV-REPORTS/archive/CORRECTION_VERIFIED.md | Adds archived verification log. |
| docs/DEV-REPORTS/archive/CORRECTION_SUMMARY.md | Adds archived correction summary. |
| docs/DEV-REPORTS/archive/CORRECTION_INDEX.md | Adds archived index for correction docs. |
| docs/DEV-REPORTS/SCHEMA_BASED_PROVIDERS.md | Adds active doc describing schema-based providers. |
| docs/DEV-REPORTS/QUICK_REFERENCE.md | Adds quick reference for build/test/run paths. |
| docs/DEV-REPORTS/QUARTO_PROVIDER_IMPLEMENTATION.md | Adds implementation notes for providers/tests. |
| docs/DEV-REPORTS/INDEX.md | Adds DEV reports landing page + conventions. |
| docs/DEV-REPORTS/00-START-HERE.md | Adds onboarding/start-here doc. |
| _quarto.yml | Restricts render globs and updates RSS link to site/published.xml. |
| README.md | Updates .NET SDK instructions and documents publication refresh targets. |
| .gitignore | Adds comprehensive ignores for .NET/VS tooling and repo-specific files. |
| .github/workflows/build.yml | Updates CI to set up .NET, restore Paket, run tests, and refresh publications conditionally. |
| .config/dotnet-tools.json | Adds tool manifest for Fantomas + Paket. |
Comments suppressed due to low confidence (9)

src/quarto-inspect-document-json-schema.json:1 (raised 3 times)
  • This schema currently forbids any keys inside formats and fileInformation because additionalProperties is false while properties is empty. That contradicts both the description and the sample JSON (which includes keys like "html" and "document.qmd"). If this schema is intended for validation/type-provider schema mode, change these to allow arbitrary keys (e.g., use additionalProperties: { ... } with a suitable schema, or set additionalProperties: true).
src/QuartoInspect/QuartoTypes.fs:1 (raised 2 times)
  • Multiple docs in this PR state the type providers use JSON Schema mode (JsonProvider<Schema=...>), but the implementation uses sample-based mode (JsonProvider<sample>). Either update the implementation to use Schema= (and ensure schemas are compatible with FSharp.Data), or adjust the docs to reflect that sample JSON drives the provider.
src/QuartoInspect/QuartoClient.fs:1
  • JsonDocument isn't disposed in validateDocumentSchema/validateProjectSchema, so its pooled memory is never returned to the pool. Also, if you switch to use doc = ..., you must return a cloned JsonElement (RootElement.Clone()) to avoid returning an element backed by a disposed document. Consider either (a) returning unit/boolean for validation or (b) cloning the root element before disposing.
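A minimal sketch of both options the comment describes, assuming System.Text.Json: option (b) parses inside a use binding and returns RootElement.Clone(), and option (a) returns only a boolean so no element escapes. parseRootDetached and isJsonObject are hypothetical names; isJsonObject only checks well-formedness and the root value kind, not the JSON schema that validateDocumentSchema/validateProjectSchema target.

```fsharp
open System.Text.Json

// Option (b), hypothetical helper: parse JSON and hand back a root element
// that stays valid after the JsonDocument has been disposed.
let parseRootDetached (json: string) : JsonElement =
    use doc = JsonDocument.Parse(json)
    // Clone() copies the element so it no longer depends on `doc`.
    doc.RootElement.Clone()

// Option (a), hypothetical helper: check and return a boolean,
// so no JsonElement escapes the `use` scope at all.
let isJsonObject (json: string) : bool =
    try
        use doc = JsonDocument.Parse(json)
        doc.RootElement.ValueKind = JsonValueKind.Object
    with :? JsonException ->
        false

let root = parseRootDetached """{"title":"A Paper"}"""
printfn "%s" (root.GetProperty("title").GetString()) // prints: A Paper
printfn "%b" (isJsonObject "{")                     // prints: false
```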
src/QuartoInspect/QuartoClient.fs:1 (raised 2 times)
  • The temp output file is only deleted on the success path. If quarto inspect fails (or succeeds but parsing/reading throws), the temp file can be left behind and accumulate over time. Use a finally cleanup (or try ... finally) that attempts to delete outputFile regardless of exit code.
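A minimal sketch of the suggested cleanup: the temp path is allocated up front and deleted in a finally block, so it is removed after success, after a non-zero exit code, and after a read or parse failure. withTempOutputFile and inspectToJson are hypothetical helpers; only the quarto inspect <input> <output> command form comes from the Quarto CLI, and the real QuartoClient.fs code may be structured differently.

```fsharp
open System.Diagnostics
open System.IO

// Hypothetical helper: run `work` against a fresh temp file path and always
// attempt to delete that file afterwards, on success or on any exception.
let withTempOutputFile (work: string -> 'T) : 'T =
    let outputFile = Path.GetTempFileName()
    try
        work outputFile
    finally
        // Best effort: a failed delete must not mask the original exception.
        try
            File.Delete(outputFile)
        with _ ->
            ()

// Hypothetical wrapper around `quarto inspect <input> <output>`; a non-zero
// exit code raises, and the finally block above still removes the temp file.
let inspectToJson (input: string) : string =
    withTempOutputFile (fun outputFile ->
        let psi = ProcessStartInfo("quarto", UseShellExecute = false)
        psi.ArgumentList.Add("inspect")
        psi.ArgumentList.Add(input)
        psi.ArgumentList.Add(outputFile)
        use proc = Process.Start(psi)
        proc.WaitForExit()
        if proc.ExitCode <> 0 then
            failwithf "quarto inspect exited with code %d" proc.ExitCode
        File.ReadAllText(outputFile))
```

Keeping the cleanup in a generic helper means the quarto-specific code never has to remember to delete the file on each error branch.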
src/PublicationUpdater/PublicationUpdater.fs:1
  • Author extraction here only checks authors, but the rest of the codebase (and Quarto metadata conventions) often use author (singular) or an array of author objects. This can lead to empty authors in output for valid metadata. Mirror the fallback used elsewhere: try author first, then authors.
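A minimal sketch of the suggested fallback, assuming the metadata arrives as a System.Text.Json element: author is tried first, then authors, and each entry may be a plain string or an object with a string name. Quarto also allows richer author forms (for example a name split into parts), which this sketch skips. tryProp, authorName, and extractAuthors are hypothetical names, not functions from PublicationUpdater.fs.

```fsharp
open System.Text.Json

// Look up a property only on JSON objects (TryGetProperty throws otherwise).
let tryProp (name: string) (el: JsonElement) : JsonElement option =
    let mutable value = Unchecked.defaultof<JsonElement>
    if el.ValueKind = JsonValueKind.Object && el.TryGetProperty(name, &value) then Some value
    else None

// One author entry: either a plain string or an object with a string "name".
let authorName (entry: JsonElement) : string option =
    match entry.ValueKind with
    | JsonValueKind.String -> Some(entry.GetString())
    | JsonValueKind.Object ->
        tryProp "name" entry
        |> Option.filter (fun (n: JsonElement) -> n.ValueKind = JsonValueKind.String)
        |> Option.map (fun (n: JsonElement) -> n.GetString())
    | _ -> None

// Hypothetical helper: prefer `author`, fall back to `authors`, and accept
// either a single entry or an array of entries.
let extractAuthors (metadata: JsonElement) : string list =
    let present (key: string) =
        tryProp key metadata
        |> Option.filter (fun (v: JsonElement) -> v.ValueKind <> JsonValueKind.Null)
    match present "author" |> Option.orElse (present "authors") with
    | Some v when v.ValueKind = JsonValueKind.Array ->
        v.EnumerateArray() |> Seq.choose authorName |> List.ofSeq
    | Some v -> authorName v |> Option.toList
    | None -> []
```

For example, metadata {"author":"Dan","authors":["Eve"]} yields ["Dan"], because the singular key wins when both are present.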


Comment on lines +46 to +49
- name: refresh publications and commit changes
  if: ${{ github.event_name == 'workflow_dispatch' || github.event.inputs.force == 'true' }}
  env:
    API_GITHUB_TOKEN: ${{ secrets.API_GITHUB_TOKEN }}
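The description says a refresh should run only for a manual dispatch with "true" in the input field. The sketch below models the if: expression above as a plain F# predicate alongside that stated rule. The modelling assumptions are that a missing input is None and that == compares case-insensitively, as in GitHub Actions expressions; the function names are hypothetical. Because a manual dispatch already satisfies the first operand of ||, the input value does not change the result as written.

```fsharp
open System

// GitHub Actions expressions compare strings case-insensitively with `==`.
let eqIgnoreCase (a: string) (b: string) =
    String.Equals(a, b, StringComparison.OrdinalIgnoreCase)

// Model of the `if:` shown above:
// github.event_name == 'workflow_dispatch' || github.event.inputs.force == 'true'
// `force` is None when the triggering event carries no inputs (e.g. a push).
let conditionAsWritten (eventName: string) (force: string option) : bool =
    eqIgnoreCase eventName "workflow_dispatch"
    || (force |> Option.exists (eqIgnoreCase "true"))

// Model of the rule stated in the PR description: manual dispatch AND "true".
let documentedRule (eventName: string) (force: string option) : bool =
    eqIgnoreCase eventName "workflow_dispatch"
    && (force |> Option.exists (eqIgnoreCase "true"))

// A manual dispatch that leaves the input at "false" separates the two.
let asWritten = conditionAsWritten "workflow_dispatch" (Some "false")
let asDocumented = documentedRule "workflow_dispatch" (Some "false")
printfn "%b %b" asWritten asDocumented // prints: true false
```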
Comment thread src/Build.fs

open Fake.Core.TargetOperators

"RenderSite" ==> "Default" |> ignore
