Add prompt override support and workflow helper by DavidLMS · Pull Request #5 · DavidLMS/aphra

DavidLMS · 2026-03-24T19:57:29Z

Introduce prompt override mechanism via a prompts_dir and a workflow-level helper.

Implement robust prompt loading in aphra/core/prompts.py: supports full overrides ({prompts_dir}/{file}), prepend ({base}_prepend.txt) and append ({base}_append.txt), with formatting, fallbacks, and logging.
Add AbstractWorkflow.get_prompt wrapper and store prompts_dir from workflow config so workflows can call self.get_prompt(...) to automatically respect user overrides.
Update ShortArticleWorkflow to use self.get_prompt instead of importing get_prompt directly.
Update docs (CONTRIBUTING.md, README.md) and config.example.toml to document prompts_dir usage and encourage implementing execute() and using self.get_prompt().
Add comprehensive tests (tests/test_core_prompts.py) for override, prepend/append behavior, placeholder formatting, and fallback cases.

These changes allow users to customize or extend workflow prompts without modifying package code and make prompt usage simpler and backwards-compatible.

Introduce prompt override mechanism via a prompts_dir and a workflow-level helper. - Implement robust prompt loading in aphra/core/prompts.py: supports full overrides ({prompts_dir}/{file}), prepend ({base}_prepend.txt) and append ({base}_append.txt), with formatting, fallbacks, and logging. - Add AbstractWorkflow.get_prompt wrapper and store prompts_dir from workflow config so workflows can call self.get_prompt(...) to automatically respect user overrides. - Update ShortArticleWorkflow to use self.get_prompt instead of importing get_prompt directly. - Update docs (CONTRIBUTING.md, README.md) and config.example.toml to document prompts_dir usage and encourage implementing execute() and using self.get_prompt(). - Add comprehensive tests (tests/test_core_prompts.py) for override, prepend/append behavior, placeholder formatting, and fallback cases. These changes allow users to customize or extend workflow prompts without modifying package code and make prompt usage simpler and backwards-compatible.

Copilot

Pull request overview

Adds a configurable prompt override mechanism (prompts_dir) to the core prompt loader and exposes it ergonomically at the workflow level via AbstractWorkflow.get_prompt(), so workflows automatically honor user-provided prompt replacements/extensions without editing package code.

Changes:

Implement prompt override resolution in aphra/core/prompts.py (full override + prepend/append extensions) with formatting and debug logging.
Add AbstractWorkflow.get_prompt() and wire prompts_dir from workflow config during run(), then update ShortArticleWorkflow to use self.get_prompt(...).
Document prompts_dir usage in README/CONTRIBUTING/config example and add unit tests covering override behaviors.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 5 comments.

Show a summary per file

File	Description
`aphra/core/prompts.py`	Adds prompt override/extend resolution and formatting helpers.
`aphra/core/workflow.py`	Adds `self.get_prompt()` wrapper and stores `prompts_dir` from config during `run()`.
`aphra/workflows/short_article/short_article_workflow.py`	Switches prompt loading to `self.get_prompt()` so overrides apply automatically.
`tests/test_core_prompts.py`	Adds tests for full override and prepend/append behavior via `prompts_dir`.
`config.example.toml`	Documents `prompts_dir` and expected override file naming.
`README.md`	Documents prompt override mechanism and recommends `execute()` + `self.get_prompt()`.
`CONTRIBUTING.md`	Updates workflow development guidance to use `execute()` and `self.get_prompt()`.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-03-24T20:01:11Z

+def _read_file_if_exists(file_path: str) -> str:
+    """
+    Read a file and return its content, or return None if it doesn't exist.
+
+    Args:
+        file_path: Absolute or relative path to the file
+
+    Returns:
+        str or None: File content, or None if the file doesn't exist
+    """
+    if os.path.isfile(file_path):
+        with open(file_path, 'r', encoding="utf-8") as file:
+            return file.read()
+    return None


_read_file_if_exists is annotated as returning str but it returns None when the file is absent. This makes the type contract incorrect and can break static typing; please change the return type to Optional[str] (and consider updating get_prompt(..., prompts_dir=...) to accept Optional[str] as well).

Copilot · 2026-03-24T20:01:11Z

+    _prompts_dir = None
+
+    def get_prompt(self, file_name: str, **kwargs) -> str:
+        """
+        Load a prompt template for this workflow, with override support.
+
+        This is a convenience wrapper around core.prompts.get_prompt() that
+        automatically provides the workflow name and prompts_dir from config.
+
+        Args:
+            file_name: Name of the prompt file (e.g., 'step1_system.txt')
+            **kwargs: Format placeholders for the prompt template
+
+        Returns:
+            str: The formatted prompt content
+        """
+        prompts_dir = getattr(self, '_prompts_dir', None)
+        return _get_prompt(self.get_workflow_name(), file_name, prompts_dir=prompts_dir, **kwargs)


Defining _prompts_dir as a class attribute makes it look like shared mutable state across workflow instances, even though run() later sets an instance attribute. Consider initializing it as an instance attribute (e.g., in __init__) and/or adding an explicit Optional[str] type annotation to clarify per-instance state.

Copilot · 2026-03-24T20:01:11Z

        # Load workflow configuration and set it in context
        workflow_config = self.load_config()
        context.set_workflow_config(workflow_config)

+        # Store prompts_dir for use by self.get_prompt()
+        self._prompts_dir = workflow_config.get('prompts_dir')
+


New behavior is introduced here by reading prompts_dir from config and wiring it into self.get_prompt(), but there’s no unit test that exercises AbstractWorkflow.run() + self.get_prompt() together. Adding a small dummy workflow test would prevent regressions (e.g., ensuring prompts_dir is honored when calling self.get_prompt() inside execute()).

Copilot · 2026-03-24T20:01:11Z

+    def tearDown(self):
+        """Clean up temporary files."""
+        for f in os.listdir(self.tmpdir):
+            os.remove(os.path.join(self.tmpdir, f))
+        os.rmdir(self.tmpdir)


The cleanup logic only removes files directly under self.tmpdir and then calls os.rmdir. If a future test creates subdirectories (or leaves behind nested paths), tearDown will fail and leak temp dirs. Consider using tempfile.TemporaryDirectory() (store the object and call .cleanup()) or shutil.rmtree(self.tmpdir) for more robust cleanup.

Copilot · 2026-03-24T20:01:12Z

+        append_path = os.path.join(self.tmpdir, 'step1_system_append.txt')
+        with open(append_path, 'w', encoding='utf-8') as f:
+            f.write("SHOULD NOT APPEAR")
+


test_override_takes_priority_over_append says it verifies override ignores both append and prepend files, but the test only creates an _append file. Either add a _prepend file as well or adjust the docstring/comment so it matches what’s actually being tested.

Suggested change

prepend_path = os.path.join(self.tmpdir, 'step1_system_prepend.txt')

with open(prepend_path, 'w', encoding='utf-8') as f:

f.write("SHOULD NOT APPEAR")

Add Optional typing annotations for prompt helpers and prompts_dir, initialize _prompts_dir on AbstractWorkflow, and simplify get_prompt to use the instance attribute. Update _read_file_if_exists to return Optional[str] and adjust get_prompt signature accordingly. Tests: switch to TemporaryDirectory for cleanup, add imports/mocks, refine existing prompt override tests, and add integration tests that verify AbstractWorkflow.run wires prompts_dir from load_workflow_config into self.get_prompt.

Copilot

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 4 comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-03-24T20:19:30Z


-    return formatted_prompt

 def list_workflow_prompts(workflow_name: str) -> list[str]:


The return type annotation -> list[str] is not valid on Python 3.8 without from __future__ import annotations (it raises TypeError: 'type' object is not subscriptable at import time). Since the project declares python_requires >= 3.8 in packaging config, this needs to be made 3.8-compatible (e.g., use typing.List[str] / List[str], add postponed evaluation of annotations, or bump the minimum supported Python version).

Copilot · 2026-03-24T20:19:30Z

+    base_name = os.path.splitext(file_name)[0]
+
+    if prompts_dir:
+        # Check for full override
+        override_path = os.path.join(prompts_dir, file_name)
+        override_content = _read_file_if_exists(override_path)
+


prompts_dir is joined with file_name directly, which allows path traversal if a caller passes something like ../... in file_name, potentially reading arbitrary local files. Even if current workflows pass constants, get_prompt() is a public utility, so it should defensively reject path separators/absolute paths and normalize to a basename before joining.

Copilot · 2026-03-24T20:19:30Z

    try:
        # Try using importlib.resources first (works in packaged installations)
        ref = resources.files('aphra.workflows') / workflow_name / 'prompts' / file_name
        with ref.open('r', encoding="utf-8") as file:
-            content = file.read()
+            return file.read()
    except (AttributeError, FileNotFoundError) as exc:
        # Fallback to direct file access (works in development)
-        workflows_path = os.path.dirname(os.path.dirname(__file__))  # Go up to aphra/
+        workflows_path = os.path.dirname(os.path.dirname(__file__))
        file_path = os.path.join(workflows_path, 'workflows', workflow_name, 'prompts', file_name)

        if not os.path.exists(file_path):
            raise FileNotFoundError(f"Prompt file not found: {file_path}") from exc

        with open(file_path, 'r', encoding="utf-8") as file:
-            content = file.read()
+            return file.read()


Default prompt loading relies on prompt .txt files being present in the installed package (via importlib.resources or direct filesystem access). In this repo, the packaging metadata appears to include only aphra/prompts/*.txt, but the actual prompts live under aphra/workflows/*/prompts/ (and aphra/prompts/ doesn’t exist). This can make packaged installs fail to find default prompts; please ensure the workflow prompt files are included as package data in the distribution metadata.

Co-authored-by: Copilot <[email protected]>

Add validation in get_prompt to reject filenames containing path separators or absolute paths and raise ValueError to prevent path traversal. Import typing.List and update list_workflow_prompts to use List[str]. Update packaging to include workflow-level resources: add aphra/workflows/*/prompts/*.txt and aphra/workflows/*/config/*.toml to pyproject.toml and setup.py package_data so workflow prompts and config files are packaged.

Copilot

Pull request overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated 2 comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-03-24T20:28:56Z

+        ValueError: If file_name contains path separators or is an absolute path
+    """
+    # Reject path traversal attempts
+    if os.path.basename(file_name) != file_name or os.path.isabs(file_name):


The file name validation can be bypassed on Windows with drive-relative paths like C:foo.txt. os.path.isabs('C:foo.txt') is false and basename is unchanged, but os.path.join(prompts_dir, file_name) will ignore prompts_dir when file_name includes a drive, potentially allowing reads outside the intended override directory. Consider additionally rejecting any drive component via os.path.splitdrive(file_name)[0] (or pathlib.PurePath(file_name).drive/root) and/or disallowing ':' in filenames.

Suggested change

if os.path.basename(file_name) != file_name or os.path.isabs(file_name):

drive, _ = os.path.splitdrive(file_name)

if os.path.basename(file_name) != file_name or os.path.isabs(file_name) or drive:

Copilot · 2026-03-24T20:28:57Z

+        ValueError: If file_name contains path separators or is an absolute path
+    """
+    # Reject path traversal attempts
+    if os.path.basename(file_name) != file_name or os.path.isabs(file_name):
+        raise ValueError(f"Invalid prompt file name: '{file_name}'. Must be a plain filename without path separators.")


New behavior raises ValueError for invalid prompt filenames, but there are no unit tests covering the validation branch (e.g., ../x, a/b, or Windows drive-relative inputs like C:foo.txt). Adding tests for these cases would prevent regressions and verify the path-traversal protections.

Suggested change

ValueError: If file_name contains path separators or is an absolute path

"""

# Reject path traversal attempts

if os.path.basename(file_name) != file_name or os.path.isabs(file_name):

raise ValueError(f"Invalid prompt file name: '{file_name}'. Must be a plain filename without path separators.")

ValueError: If file_name contains path separators, is an absolute path, or has a drive component

"""

# Reject path traversal attempts and non-plain filenames (including Windows drive-relative paths)

drive, _ = os.path.splitdrive(file_name)

if os.path.basename(file_name) != file_name or os.path.isabs(file_name) or drive:

raise ValueError(

f"Invalid prompt file name: '{file_name}'. Must be a plain filename without path separators or drive prefixes."

)

Add ntpath-based drive detection and strengthen filename validation in get_prompt to reject path traversal attempts, absolute paths, subdirectory names, and Windows drive-relative paths (e.g. "C:foo.txt"). Update raises ValueError for invalid filenames. Add TestPathTraversalValidation unit tests covering relative traversal, subdirectory, absolute, Windows drive-relative cases, and a plain filename acceptance check.

Copilot

Pull request overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated 4 comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-03-24T20:40:15Z

+    def test_rejects_relative_path_traversal(self):
+        """
+        Test that '../' path traversal attempts are rejected.
+        """
+        with self.assertRaises(ValueError):
+            get_prompt('short_article', '../etc/passwd')
+
+    def test_rejects_subdirectory_path(self):
+        """
+        Test that file names containing subdirectory separators are rejected.
+        """
+        with self.assertRaises(ValueError):
+            get_prompt('short_article', 'subdir/step1_system.txt')
+


Path traversal tests cover malicious file_name values, but there is no corresponding test for malicious workflow_name values (e.g., ../core), even though get_prompt() and list_workflow_prompts() both incorporate workflow_name into filesystem/resource paths. If you add workflow name validation, add tests here to lock in that behavior.

Copilot · 2026-03-24T20:40:15Z

+    # Reject path traversal attempts (including Windows drive-relative paths like "C:foo.txt")
+    if (os.path.basename(file_name) != file_name
+            or os.path.isabs(file_name)
+            or ntpath.splitdrive(file_name)[0]):
+        raise ValueError(f"Invalid prompt file name: '{file_name}'. Must be a plain filename without path separators.")
+


get_prompt() validates file_name against path traversal, but workflow_name is still used as a path component in _load_default_prompt() without validation. Since workflow_name is part of the public get_prompt(workflow_name, ...) API, callers can pass values like ../... to traverse outside aphra/workflows/<workflow>/prompts in the filesystem fallback (and potentially via importlib.resources), enabling unintended file reads. Consider validating workflow_name to be a plain directory/package name (no separators, no .., no drive/absolute path) before constructing paths.

Copilot · 2026-03-24T20:40:15Z

+def list_workflow_prompts(workflow_name: str) -> List[str]:
    """
    List all available prompt files for a workflow.



list_workflow_prompts() includes workflow_name as a path segment (via importlib.resources and the filesystem fallback). Similar to get_prompt(), this should validate workflow_name to prevent path traversal (e.g., ../...) from escaping the intended aphra/workflows/<workflow>/prompts directory when callers pass untrusted values.

Copilot · 2026-03-24T20:40:16Z

+def list_workflow_prompts(workflow_name: str) -> List[str]:
    """
    List all available prompt files for a workflow.



list_workflow_prompts() is documented/used as a way to list prompt templates, but its implementation returns all files in the prompts directory. To avoid surfacing non-prompt artifacts (and to match the expectation that prompts are .txt templates), filter the returned filenames to .txt only in both branches.

Introduce _validate_name to reject path separators, absolute paths, and Windows drive-relative names and use it to validate workflow_name and file_name in get_prompt and list_workflow_prompts. Also restrict listing to .txt prompt files (both via importlib.resources and filesystem fallback). Update tests to cover workflow_name validation and filename traversal cases.

Copilot

Pull request overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated 1 comment.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-03-26T18:29:36Z

+    if (os.path.basename(value) != value
+            or os.path.isabs(value)
+            or ntpath.splitdrive(value)[0]):
+        raise ValueError(
+            f"Invalid {label}: '{value}'. "
+            f"Must be a plain name without path separators."
+        )


_validate_name() still allows values like '' / '.' / '..' because os.path.basename(value) == value for those inputs. That means workflow_name='..' (and potentially file_name='..') can traverse out of the intended workflows/prompts directory in the fallback os.path.join path, undermining the path-traversal protection. Explicitly reject empty, '.' and '..' (and consider also rejecting any value containing os.curdir/os.pardir path components) before proceeding.

Enhance _validate_name to treat empty strings and '.'/'..' (as well as os.curdir/os.pardir) as invalid names in addition to existing basename/isabs/drive checks. Add unit tests verifying that empty, '.' and '..' values are rejected for both file_name and workflow_name to prevent path traversal and ambiguous names.

Copilot

Pull request overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated 1 comment.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-03-26T18:43:45Z

+    """
+    if (not value
+            or value in (os.curdir, os.pardir, '.', '..')
+            or os.path.basename(value) != value


_validate_name() relies on os.path.basename(...) to detect path separators, but on POSIX a Windows-style separator ("\") is not treated as a separator. That means values like "subdir\step1_system.txt" pass validation despite the docstring saying separators are disallowed. Consider explicitly rejecting both '/' and '\' (e.g., also compare against ntpath.basename(value) / check for '\' in value) and add a test for this case to keep path-traversal protection consistent across platforms.

Suggested change

or os.path.basename(value) != value

or os.path.basename(value) != value

or ntpath.basename(value) != value

Tighten name validation to reject Windows-style backslash separators on all platforms by checking ntpath.basename in addition to os.path.basename and existing absolute/drive checks. Refactors the predicate into is_empty_or_special, has_separators, and is_absolute_or_drive for clarity. Adds two tests to ensure backslash separators are rejected for both prompt file_name and workflow_name inputs.

Copilot

Pull request overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated 2 comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-03-26T18:50:11Z

+    def __init__(self):
+        """Initialize the workflow with default state."""
+        self._prompts_dir: Optional[str] = None
+
+    def get_prompt(self, file_name: str, **kwargs) -> str:
+        """
+        Load a prompt template for this workflow, with override support.
+
+        This is a convenience wrapper around core.prompts.get_prompt() that
+        automatically provides the workflow name and prompts_dir from config.
+
+        Args:
+            file_name: Name of the prompt file (e.g., 'step1_system.txt')
+            **kwargs: Format placeholders for the prompt template
+
+        Returns:
+            str: The formatted prompt content
+        """
+        return _get_prompt(self.get_workflow_name(), file_name, prompts_dir=self._prompts_dir, **kwargs)


AbstractWorkflow.get_prompt() relies on instance attribute _prompts_dir. If a subclass defines __init__ and forgets to call super().__init__(), calling self.get_prompt() will raise AttributeError. To keep the new helper backwards-compatible for downstream subclasses, consider making _prompts_dir a class attribute defaulting to None and/or using getattr(self, '_prompts_dir', None) when passing prompts_dir.

Copilot · 2026-03-26T18:50:11Z

+                                        source_language=context.source_language,
+                                        target_language=context.target_language)
+
        # Your complete workflow logic here


In the README’s advanced workflow example, execute() returns translated_text, but that variable is never defined in the snippet. This makes the example non-runnable and confusing; consider returning the result of an actual translation call (or using a clearly marked placeholder like return result) and, if relevant, show how writer_model/context.model_client.call_model(...) is used.

Suggested change

# Your complete workflow logic here

# Your complete workflow logic here — for example, call the writer model:

translated_text = context.model_client.call_model(

model=writer_model,

system_prompt=system_prompt,

input=text,

)

Move _prompts_dir to a class-level attribute and simplify __init__ initialization. Use getattr(self, '_prompts_dir', None) when calling _get_prompt to avoid attribute errors if subclasses set the prompts dir. Also update the README workflow example to call context.model_client.call_model and return the model result instead of a placeholder translated_text.

Copilot

Pull request overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated 1 comment.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-03-26T19:05:48Z

+            model=writer_model,
+            system=system_prompt,
+            prompt=text


The README example calls context.model_client.call_model() using keyword arguments (model=, system=, prompt=), but LLMModelClient.call_model is defined as call_model(system_prompt, user_prompt, model_name, ...) (positional args or matching keyword names). As written, the example will raise TypeError and misleads users; update the snippet to use the actual parameter names/order (and include log_call=context.log_calls if desired).

Suggested change

model=writer_model,

system=system_prompt,

prompt=text

system_prompt=system_prompt,

user_prompt=text,

model_name=writer_model,

log_call=context.log_calls,

Copilot

Pull request overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated no new comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot

Pull request overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated no new comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Add prompt override support and workflow helper

DavidLMS requested a review from Copilot March 24, 2026 19:57

Copilot started reviewing on behalf of DavidLMS March 24, 2026 19:58 View session

Copilot AI reviewed Mar 24, 2026

View reviewed changes

DavidLMS requested a review from Copilot March 24, 2026 20:13

Copilot started reviewing on behalf of DavidLMS March 24, 2026 20:14 View session

Copilot AI reviewed Mar 24, 2026

View reviewed changes

DavidLMS and others added 2 commits March 24, 2026 21:20

Update tests/test_core_prompts.py

b9f675e

Co-authored-by: Copilot <[email protected]>

DavidLMS requested a review from Copilot March 24, 2026 20:25

Copilot started reviewing on behalf of DavidLMS March 24, 2026 20:25 View session

Copilot AI reviewed Mar 24, 2026

View reviewed changes

DavidLMS requested a review from Copilot March 24, 2026 20:33

Copilot started reviewing on behalf of DavidLMS March 24, 2026 20:34 View session

Copilot AI reviewed Mar 24, 2026

View reviewed changes

DavidLMS requested a review from Copilot March 26, 2026 18:25

Copilot started reviewing on behalf of DavidLMS March 26, 2026 18:25 View session