paper-knowledge-workflow is a public Codex skill for building a traceable research-paper workspace from sources, notes, and drafts.
It is designed for one paper project at a time. It creates a lightweight knowledge base, keeps a source manifest, maps paragraph-level claims to evidence, and runs structure and evidence gates before manuscript revision or export.
This repository is suitable for public beta use. The deterministic gates and project initializer are tested, but the workflow is intentionally not a one-click paper generator.
The scripts automate:
- project initialization
- PDF registration and processed text extraction
- source wiki note generation and matrix row creation
- structure validation
- manifest field validation
- source wiki coverage and local wikilink validation
- paragraph-level claim evidence validation
- maintenance and backlog reporting
The agent or human researcher still owns:
- source ingestion decisions
- literature synthesis
- research-question refinement
- wiki note quality and human-checked summaries
- manuscript argument quality
- scholarly review and final judgment
Key features:
- First-use guided intake: starts with four plain questions, then infers whether the project is in preflight, materials, or draft state.
- Project-level schema: every paper workspace gets its own SCHEMA.md so the file structure and evidence rules are explicit.
- Direct source tracking: separates durable raw materials from processed reading aids.
- PDF processing adapter: registers PDFs, extracts processed text with pdftotext or optional pypdf, and records parser status in the manifest.
- Manifest-first evidence registry: uses index/source_manifest.json as the source of truth for access, reading status, evidence level, and citation metadata.
- Source wiki generation: creates 10_Wiki/Sources/<source_id>.md notes from manifest sources and processed text excerpts.
- Literature matrix: creates 30_Matrix/literature_matrix.md for comparison and synthesis instead of leaving notes scattered.
- Paragraph-level claim map: maps manuscript claims to source IDs in 40_Paper/claim_evidence_map.md.
- Hard gates: ships standard-library Python scripts for structure validation, wiki/link validation, evidence validation, and maintenance reporting.
- Soft academic review: keeps scholarly judgment as a review report, not a fake automatic pass.
- Optional ARS routing: can use academic-research-suite as a specialist paper-writing companion when it is installed, without vendoring it.
- Public-safe examples: includes a small synthetic example project and purity tests to avoid private paths or project-specific material.
Use this skill for:
- Starting a paper project from a topic, existing materials, or an existing draft.
- Turning PDFs, notes, web sources, and bibliographic metadata into a traceable writing workspace.
- Maintaining source_manifest.json, wiki pages, literature matrices, paper drafts, and claim-evidence maps.
- Checking whether wiki notes cover manifest sources and local wikilinks resolve.
- Checking whether manuscript claims are backed by usable, read sources.
- Routing paper-specific thinking to academic-research-suite when that skill is available.
Do not use it for:
- Grant proposal writing.
- Cross-project knowledge-base governance.
- One-off copyediting without source tracking.
- A replacement for human scholarly judgment.
- A full document-ingestion engine for every source format. PDF support is a practical adapter, not a universal parser for scanned or protected documents.
Clone the repository, then copy the skill contents into your Codex skills directory.
git clone https://github.com/hideaway007/paper-knowledge-workflow.git
cd paper-knowledge-workflow
mkdir -p ~/.codex/skills/paper-knowledge-workflow
rsync -a skill/ ~/.codex/skills/paper-knowledge-workflow/
Restart or refresh Codex so it can discover the new skill.
If you already cloned the repository and only want to install the skill:
mkdir -p ~/.codex/skills/paper-knowledge-workflow
rsync -a skill/ ~/.codex/skills/paper-knowledge-workflow/
Ask Codex:
Use $paper-knowledge-workflow to start a paper project.
The skill should ask four short questions:
- What is the paper topic or working title?
- What discipline or research type is this?
- What materials do you already have?
- What do you want from this session?
The skill then infers the entry state:
- preflight: topic or direction only.
- materials: sources are available.
- draft: a manuscript draft already exists.
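The inference step can be pictured as a small decision rule. The sketch below is illustrative only: the function name `infer_entry_state`, its two boolean inputs, and the rule that an existing draft takes precedence over sources are assumptions, not the skill's documented interface.

```python
def infer_entry_state(has_sources: bool, has_draft: bool) -> str:
    """Hypothetical helper: map intake answers to one of the three entry states."""
    if has_draft:
        # Assumption: an existing manuscript draft outranks everything else.
        return "draft"
    if has_sources:
        # Sources are available, but there is no draft yet.
        return "materials"
    # Only a topic or direction is known.
    return "preflight"
```

Under these assumptions, a user who brings PDFs but no manuscript would land in the materials state.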
The skill should not draft a full paper directly from a vague topic. It first creates or checks the research question checkpoint.
The initialization script can create the project structure directly:
python3 skill/scripts/init_paper_project.py \
--root ./my-paper-project \
--title "Working Paper Title" \
--discipline "urban studies" \
--materials "PDFs and notes" \
--goal "literature review"
Then process the sources and run the hard gates:
python3 skill/scripts/process_pdfs.py --root ./my-paper-project --source-dir ./my-paper-project/00_Inbox
python3 skill/scripts/generate_wiki.py --root ./my-paper-project
python3 skill/scripts/validate_structure.py --root ./my-paper-project
python3 skill/scripts/validate_wiki.py --root ./my-paper-project
python3 skill/scripts/validate_evidence.py --root ./my-paper-project
python3 skill/scripts/generate_maintenance_report.py --root ./my-paper-project
validate_structure.py checks that the paper workspace matches the schema contract and that manifest entries are well-formed.
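To make the idea of a structure gate concrete, here is a minimal sketch that checks a workspace for expected paths. The directory and file names come from this README, but treating exactly this set as required, and the helper name `missing_paths`, are assumptions; the real script may check more or differently.

```python
from pathlib import Path

# Paths mentioned in this README; treating exactly these as "required"
# is an assumption of this sketch, not the script's documented contract.
REQUIRED_DIRS = [
    "00_Inbox",
    "10_Wiki/Sources",
    "20_Sources/raw",
    "20_Sources/processed",
    "30_Matrix",
    "40_Paper",
    "50_Review",
    "index",
]
REQUIRED_FILES = ["SCHEMA.md", "index/source_manifest.json"]


def missing_paths(root: Path) -> list[str]:
    """Hypothetical check: return required paths that do not exist under root."""
    missing = [d for d in REQUIRED_DIRS if not (root / d).is_dir()]
    missing += [f for f in REQUIRED_FILES if not (root / f).is_file()]
    return missing
```

An empty list means the layout matches the assumed contract; anything else names what to create or restore.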
process_pdfs.py registers PDF files, copies them into 20_Sources/raw/, extracts text into 20_Sources/processed/, and records parser metadata under each source's custom field. It prefers system pdftotext and can fall back to optional pypdf.
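The "prefer pdftotext, fall back to pypdf" behavior could look roughly like the following. This is a sketch under stated assumptions: the function names `choose_pdf_parser` and `extract_text` are hypothetical, and the actual script may pass different options or record more metadata.

```python
import importlib.util
import shutil
import subprocess
from pathlib import Path
from typing import Optional


def choose_pdf_parser() -> Optional[str]:
    """Pick a text-extraction backend, preferring the system pdftotext binary."""
    if shutil.which("pdftotext"):
        return "pdftotext"
    if importlib.util.find_spec("pypdf") is not None:
        return "pypdf"
    return None  # no parser available


def extract_text(pdf_path: Path, out_path: Path) -> Optional[str]:
    """Write extracted text to out_path and return the parser used (or None)."""
    parser = choose_pdf_parser()
    if parser == "pdftotext":
        # -layout asks pdftotext to keep the original physical layout.
        subprocess.run(
            ["pdftotext", "-layout", str(pdf_path), str(out_path)], check=True
        )
    elif parser == "pypdf":
        from pypdf import PdfReader  # optional dependency, imported lazily

        pages = PdfReader(str(pdf_path)).pages
        text = "\n".join(page.extract_text() or "" for page in pages)
        out_path.write_text(text, encoding="utf-8")
    return parser
```

Returning the parser name lets the caller record which backend ran, or that none was available, alongside the source entry.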
generate_wiki.py creates source wiki notes in 10_Wiki/Sources/ and appends missing source rows to 30_Matrix/literature_matrix.md.
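Appending only the missing rows keeps the matrix safe to regenerate. The sketch below assumes the matrix is a Markdown pipe table whose first column holds the source ID; that layout and the name `append_missing_rows` are illustrative assumptions, not the script's actual format.

```python
def append_missing_rows(matrix_text: str, source_ids: list[str]) -> str:
    """Hypothetical helper: add an empty row for each source ID not yet in the table."""
    rows = [line for line in matrix_text.splitlines() if line.startswith("|")]
    if not rows:
        raise ValueError("no Markdown table found in matrix text")
    # Column count is taken from the header row.
    n_cols = rows[0].strip().strip("|").count("|") + 1
    # First cell of every table row (header and separator included, harmlessly).
    present = {row.strip().strip("|").split("|")[0].strip() for row in rows}
    new_rows = [
        "| " + " | ".join([sid] + [""] * (n_cols - 1)) + " |"
        for sid in source_ids
        if sid not in present
    ]
    return matrix_text.rstrip("\n") + "".join("\n" + r for r in new_rows) + "\n"
```

Because existing IDs are skipped, running it twice with the same IDs leaves the table unchanged, so hand-written cells are not overwritten.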
validate_wiki.py checks that manifest sources have source wiki notes, generated notes link raw and processed artifacts, source wiki IDs still exist in the manifest, and local Obsidian-style wikilinks resolve.
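Resolving Obsidian-style wikilinks comes down to extracting each link target and checking that a note with that name exists. A minimal sketch, assuming links resolve by note file name anywhere under the wiki folder; the name `unresolved_wikilinks` is hypothetical and the real validator may apply stricter rules.

```python
import re
from pathlib import Path

# Matches [[Target]], [[Target|alias]] and [[Target#heading]]; captures the target only.
WIKILINK = re.compile(r"\[\[([^\]|#]+)(?:#[^\]|]*)?(?:\|[^\]]*)?\]\]")


def unresolved_wikilinks(note_text: str, wiki_root: Path) -> list[str]:
    """Return link targets that have no matching .md file under wiki_root."""
    known = {p.stem for p in wiki_root.rglob("*.md")}
    targets = [m.group(1).strip() for m in WIKILINK.finditer(note_text)]
    # Compare by final path component so "Sources/src-001" matches src-001.md.
    return [t for t in targets if Path(t).name not in known]
```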
validate_evidence.py checks that paragraph-level claims cite manifest sources and that supported claims do not rely on unread, rejected, pending, or metadata-only sources.
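The evidence rule can be illustrated with plain dictionaries. The blocking status values mirror the wording above, but the field names (`id`, `status`, `sources`, `reading_status`) and the in-memory shapes are assumptions made for this sketch, not the real manifest or claim-map schema.

```python
# Assumed status vocabulary, taken from the prose description of the gate.
BLOCKING_STATUSES = {"unread", "rejected", "pending", "metadata-only"}


def evidence_problems(claims: list[dict], manifest: dict[str, dict]) -> list[str]:
    """Hypothetical gate: list every claim/source pairing that should fail."""
    problems = []
    for claim in claims:
        for source_id in claim["sources"]:
            source = manifest.get(source_id)
            if source is None:
                # Every cited ID must exist in the manifest.
                problems.append(f"{claim['id']}: unknown source {source_id}")
            elif (
                claim["status"] == "supported"
                and source["reading_status"] in BLOCKING_STATUSES
            ):
                # A supported claim may not lean on an unusable source.
                problems.append(
                    f"{claim['id']}: {source_id} is {source['reading_status']}"
                )
    return problems
```

An empty result corresponds to a passing gate in this sketch, which is also why a project with no claims passes trivially.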
generate_maintenance_report.py summarizes source access, reading status, claim status, and backlog items.
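A maintenance summary of this kind is mostly counting. The following sketch assumes each source carries `id`, `access`, and `reading_status` fields and each claim a `status` field; those names, and the definition of backlog as "anything not yet read", are illustrative assumptions.

```python
from collections import Counter


def summarize(sources: list[dict], claims: list[dict]) -> dict:
    """Hypothetical report: tally statuses and list sources still to be read."""
    return {
        "access": dict(Counter(s["access"] for s in sources)),
        "reading_status": dict(Counter(s["reading_status"] for s in sources)),
        "claim_status": dict(Counter(c["status"] for c in claims)),
        # Assumption: the backlog is every source whose status is not "read".
        "backlog": [s["id"] for s in sources if s["reading_status"] != "read"],
    }
```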
Template examples inside fenced Markdown code blocks are ignored by the claim parser. A freshly initialized project can pass the evidence gate because it contains no real claims yet; it still needs the research question checkpoint and real claim mapping before drafting or export. Parsed PDFs and generated wiki notes are reading aids; they do not by themselves mark a source as read or verified.
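One way a parser can ignore template examples is to drop fenced lines before looking for claims. This is a sketch of that technique only; the name `strip_fenced_blocks` is hypothetical and the actual claim parser may handle fences differently.

```python
# Fence markers are built from single characters to keep this example easy to embed.
FENCE_MARKERS = ("`" * 3, "~" * 3)


def strip_fenced_blocks(markdown: str) -> str:
    """Return the text with every fenced code block (and its fences) removed."""
    kept, fence = [], None
    for line in markdown.splitlines():
        stripped = line.lstrip()
        if fence is None and stripped.startswith(FENCE_MARKERS):
            fence = stripped[:3]  # remember which marker opened the block
            continue
        if fence is not None:
            if stripped.startswith(fence):
                fence = None  # closing fence: resume normal parsing
            continue
        kept.append(line)
    return "\n".join(kept)
```

Claims would then be parsed from the returned text, so placeholder rows in a template never count as real evidence mappings.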
00_Inbox/
20_Sources/
  raw/
  processed/
10_Wiki/
  Sources/
30_Matrix/
40_Paper/
50_Review/
index/
SCHEMA.md
This skill is the project owner: it controls the schema, source registry, wiki/matrix artifacts, claim-evidence map, validation gates, and maintenance reports.
academic-research-suite is treated as an optional specialist companion for research-question refinement, literature synthesis, outlines, drafting, citation checks, revision coaching, and reviewer simulation.
This repository does not vendor or copy academic-research-suite. If it is not installed, the workflow still works with the local scripts and references.
The repository uses Python standard-library tests:
python3 -m unittest discover -s tests
python3 tests/smoke_public_workflow.py
The smoke workflow initializes a temporary project, processes a synthetic PDF, generates source wiki notes, validates wiki links, validates a real synthetic source/claim pair, and checks that an unread source fails the evidence gate.
This repository intentionally avoids private research cases, local machine paths, grant-writing assumptions, and project-specific terminology. Keep examples small, synthetic, and neutral.
See NOTICE.md for the reference model, inspirations, and attribution boundaries, including academic-research-skills, awesome-ai-research-writing, Karpathy's llm-wiki.md, Agent Skills, and Obsidian.