Add prompt override support and workflow helper#5
Conversation
Introduce prompt override mechanism via a prompts_dir and a workflow-level helper.
- Implement robust prompt loading in aphra/core/prompts.py: supports full overrides ({prompts_dir}/{file}), prepend ({base}_prepend.txt) and append ({base}_append.txt), with formatting, fallbacks, and logging.
- Add AbstractWorkflow.get_prompt wrapper and store prompts_dir from workflow config so workflows can call self.get_prompt(...) to automatically respect user overrides.
- Update ShortArticleWorkflow to use self.get_prompt instead of importing get_prompt directly.
- Update docs (CONTRIBUTING.md, README.md) and config.example.toml to document prompts_dir usage and encourage implementing execute() and using self.get_prompt().
- Add comprehensive tests (tests/test_core_prompts.py) for override, prepend/append behavior, placeholder formatting, and fallback cases.
These changes allow users to customize or extend workflow prompts without modifying package code and make prompt usage simpler and backwards-compatible.
There was a problem hiding this comment.
Pull request overview
Adds a configurable prompt override mechanism (prompts_dir) to the core prompt loader and exposes it ergonomically at the workflow level via AbstractWorkflow.get_prompt(), so workflows automatically honor user-provided prompt replacements/extensions without editing package code.
Changes:
- Implement prompt override resolution in
aphra/core/prompts.py(full override + prepend/append extensions) with formatting and debug logging. - Add
AbstractWorkflow.get_prompt()and wireprompts_dirfrom workflow config duringrun(), then updateShortArticleWorkflowto useself.get_prompt(...). - Document
prompts_dirusage in README/CONTRIBUTING/config example and add unit tests covering override behaviors.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
aphra/core/prompts.py |
Adds prompt override/extend resolution and formatting helpers. |
aphra/core/workflow.py |
Adds self.get_prompt() wrapper and stores prompts_dir from config during run(). |
aphra/workflows/short_article/short_article_workflow.py |
Switches prompt loading to self.get_prompt() so overrides apply automatically. |
tests/test_core_prompts.py |
Adds tests for full override and prepend/append behavior via prompts_dir. |
config.example.toml |
Documents prompts_dir and expected override file naming. |
README.md |
Documents prompt override mechanism and recommends execute() + self.get_prompt(). |
CONTRIBUTING.md |
Updates workflow development guidance to use execute() and self.get_prompt(). |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| def _read_file_if_exists(file_path: str) -> str: | ||
| """ | ||
| Read a file and return its content, or return None if it doesn't exist. | ||
|
|
||
| Args: | ||
| file_path: Absolute or relative path to the file | ||
|
|
||
| Returns: | ||
| str or None: File content, or None if the file doesn't exist | ||
| """ | ||
| if os.path.isfile(file_path): | ||
| with open(file_path, 'r', encoding="utf-8") as file: | ||
| return file.read() | ||
| return None |
There was a problem hiding this comment.
_read_file_if_exists is annotated as returning str but it returns None when the file is absent. This makes the type contract incorrect and can break static typing; please change the return type to Optional[str] (and consider updating get_prompt(..., prompts_dir=...) to accept Optional[str] as well).
| _prompts_dir = None | ||
|
|
||
| def get_prompt(self, file_name: str, **kwargs) -> str: | ||
| """ | ||
| Load a prompt template for this workflow, with override support. | ||
|
|
||
| This is a convenience wrapper around core.prompts.get_prompt() that | ||
| automatically provides the workflow name and prompts_dir from config. | ||
|
|
||
| Args: | ||
| file_name: Name of the prompt file (e.g., 'step1_system.txt') | ||
| **kwargs: Format placeholders for the prompt template | ||
|
|
||
| Returns: | ||
| str: The formatted prompt content | ||
| """ | ||
| prompts_dir = getattr(self, '_prompts_dir', None) | ||
| return _get_prompt(self.get_workflow_name(), file_name, prompts_dir=prompts_dir, **kwargs) |
There was a problem hiding this comment.
Defining _prompts_dir as a class attribute makes it look like shared mutable state across workflow instances, even though run() later sets an instance attribute. Consider initializing it as an instance attribute (e.g., in __init__) and/or adding an explicit Optional[str] type annotation to clarify per-instance state.
| # Load workflow configuration and set it in context | ||
| workflow_config = self.load_config() | ||
| context.set_workflow_config(workflow_config) | ||
|
|
||
| # Store prompts_dir for use by self.get_prompt() | ||
| self._prompts_dir = workflow_config.get('prompts_dir') | ||
|
|
There was a problem hiding this comment.
New behavior is introduced here by reading prompts_dir from config and wiring it into self.get_prompt(), but there’s no unit test that exercises AbstractWorkflow.run() + self.get_prompt() together. Adding a small dummy workflow test would prevent regressions (e.g., ensuring prompts_dir is honored when calling self.get_prompt() inside execute()).
| def tearDown(self): | ||
| """Clean up temporary files.""" | ||
| for f in os.listdir(self.tmpdir): | ||
| os.remove(os.path.join(self.tmpdir, f)) | ||
| os.rmdir(self.tmpdir) |
There was a problem hiding this comment.
The cleanup logic only removes files directly under self.tmpdir and then calls os.rmdir. If a future test creates subdirectories (or leaves behind nested paths), tearDown will fail and leak temp dirs. Consider using tempfile.TemporaryDirectory() (store the object and call .cleanup()) or shutil.rmtree(self.tmpdir) for more robust cleanup.
| append_path = os.path.join(self.tmpdir, 'step1_system_append.txt') | ||
| with open(append_path, 'w', encoding='utf-8') as f: | ||
| f.write("SHOULD NOT APPEAR") | ||
|
|
There was a problem hiding this comment.
test_override_takes_priority_over_append says it verifies override ignores both append and prepend files, but the test only creates an _append file. Either add a _prepend file as well or adjust the docstring/comment so it matches what’s actually being tested.
| prepend_path = os.path.join(self.tmpdir, 'step1_system_prepend.txt') | |
| with open(prepend_path, 'w', encoding='utf-8') as f: | |
| f.write("SHOULD NOT APPEAR") |
Add Optional typing annotations for prompt helpers and prompts_dir, initialize _prompts_dir on AbstractWorkflow, and simplify get_prompt to use the instance attribute. Update _read_file_if_exists to return Optional[str] and adjust get_prompt signature accordingly. Tests: switch to TemporaryDirectory for cleanup, add imports/mocks, refine existing prompt override tests, and add integration tests that verify AbstractWorkflow.run wires prompts_dir from load_workflow_config into self.get_prompt.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 7 out of 7 changed files in this pull request and generated 4 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
|
||
| return formatted_prompt | ||
|
|
||
| def list_workflow_prompts(workflow_name: str) -> list[str]: |
There was a problem hiding this comment.
The return type annotation -> list[str] is not valid on Python 3.8 without from __future__ import annotations (it raises TypeError: 'type' object is not subscriptable at import time). Since the project declares python_requires >= 3.8 in packaging config, this needs to be made 3.8-compatible (e.g., use typing.List[str] / List[str], add postponed evaluation of annotations, or bump the minimum supported Python version).
| base_name = os.path.splitext(file_name)[0] | ||
|
|
||
| if prompts_dir: | ||
| # Check for full override | ||
| override_path = os.path.join(prompts_dir, file_name) | ||
| override_content = _read_file_if_exists(override_path) | ||
|
|
There was a problem hiding this comment.
prompts_dir is joined with file_name directly, which allows path traversal if a caller passes something like ../... in file_name, potentially reading arbitrary local files. Even if current workflows pass constants, get_prompt() is a public utility, so it should defensively reject path separators/absolute paths and normalize to a basename before joining.
| try: | ||
| # Try using importlib.resources first (works in packaged installations) | ||
| ref = resources.files('aphra.workflows') / workflow_name / 'prompts' / file_name | ||
| with ref.open('r', encoding="utf-8") as file: | ||
| content = file.read() | ||
| return file.read() | ||
| except (AttributeError, FileNotFoundError) as exc: | ||
| # Fallback to direct file access (works in development) | ||
| workflows_path = os.path.dirname(os.path.dirname(__file__)) # Go up to aphra/ | ||
| workflows_path = os.path.dirname(os.path.dirname(__file__)) | ||
| file_path = os.path.join(workflows_path, 'workflows', workflow_name, 'prompts', file_name) | ||
|
|
||
| if not os.path.exists(file_path): | ||
| raise FileNotFoundError(f"Prompt file not found: {file_path}") from exc | ||
|
|
||
| with open(file_path, 'r', encoding="utf-8") as file: | ||
| content = file.read() | ||
| return file.read() |
There was a problem hiding this comment.
Default prompt loading relies on prompt .txt files being present in the installed package (via importlib.resources or direct filesystem access). In this repo, the packaging metadata appears to include only aphra/prompts/*.txt, but the actual prompts live under aphra/workflows/*/prompts/ (and aphra/prompts/ doesn’t exist). This can make packaged installs fail to find default prompts; please ensure the workflow prompt files are included as package data in the distribution metadata.
Co-authored-by: Copilot <[email protected]>
Add validation in get_prompt to reject filenames containing path separators or absolute paths and raise ValueError to prevent path traversal. Import typing.List and update list_workflow_prompts to use List[str]. Update packaging to include workflow-level resources: add aphra/workflows/*/prompts/*.txt and aphra/workflows/*/config/*.toml to pyproject.toml and setup.py package_data so workflow prompts and config files are packaged.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 9 out of 9 changed files in this pull request and generated 2 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| ValueError: If file_name contains path separators or is an absolute path | ||
| """ | ||
| # Reject path traversal attempts | ||
| if os.path.basename(file_name) != file_name or os.path.isabs(file_name): |
There was a problem hiding this comment.
The file name validation can be bypassed on Windows with drive-relative paths like C:foo.txt. os.path.isabs('C:foo.txt') is false and basename is unchanged, but os.path.join(prompts_dir, file_name) will ignore prompts_dir when file_name includes a drive, potentially allowing reads outside the intended override directory. Consider additionally rejecting any drive component via os.path.splitdrive(file_name)[0] (or pathlib.PurePath(file_name).drive/root) and/or disallowing ':' in filenames.
| if os.path.basename(file_name) != file_name or os.path.isabs(file_name): | |
| drive, _ = os.path.splitdrive(file_name) | |
| if os.path.basename(file_name) != file_name or os.path.isabs(file_name) or drive: |
| ValueError: If file_name contains path separators or is an absolute path | ||
| """ | ||
| # Reject path traversal attempts | ||
| if os.path.basename(file_name) != file_name or os.path.isabs(file_name): | ||
| raise ValueError(f"Invalid prompt file name: '{file_name}'. Must be a plain filename without path separators.") |
There was a problem hiding this comment.
New behavior raises ValueError for invalid prompt filenames, but there are no unit tests covering the validation branch (e.g., ../x, a/b, or Windows drive-relative inputs like C:foo.txt). Adding tests for these cases would prevent regressions and verify the path-traversal protections.
| ValueError: If file_name contains path separators or is an absolute path | |
| """ | |
| # Reject path traversal attempts | |
| if os.path.basename(file_name) != file_name or os.path.isabs(file_name): | |
| raise ValueError(f"Invalid prompt file name: '{file_name}'. Must be a plain filename without path separators.") | |
| ValueError: If file_name contains path separators, is an absolute path, or has a drive component | |
| """ | |
| # Reject path traversal attempts and non-plain filenames (including Windows drive-relative paths) | |
| drive, _ = os.path.splitdrive(file_name) | |
| if os.path.basename(file_name) != file_name or os.path.isabs(file_name) or drive: | |
| raise ValueError( | |
| f"Invalid prompt file name: '{file_name}'. Must be a plain filename without path separators or drive prefixes." | |
| ) |
Add ntpath-based drive detection and strengthen filename validation in get_prompt to reject path traversal attempts, absolute paths, subdirectory names, and Windows drive-relative paths (e.g. "C:foo.txt"). Update raises ValueError for invalid filenames. Add TestPathTraversalValidation unit tests covering relative traversal, subdirectory, absolute, Windows drive-relative cases, and a plain filename acceptance check.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 9 out of 9 changed files in this pull request and generated 4 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| def test_rejects_relative_path_traversal(self): | ||
| """ | ||
| Test that '../' path traversal attempts are rejected. | ||
| """ | ||
| with self.assertRaises(ValueError): | ||
| get_prompt('short_article', '../etc/passwd') | ||
|
|
||
| def test_rejects_subdirectory_path(self): | ||
| """ | ||
| Test that file names containing subdirectory separators are rejected. | ||
| """ | ||
| with self.assertRaises(ValueError): | ||
| get_prompt('short_article', 'subdir/step1_system.txt') | ||
|
|
There was a problem hiding this comment.
Path traversal tests cover malicious file_name values, but there is no corresponding test for malicious workflow_name values (e.g., ../core), even though get_prompt() and list_workflow_prompts() both incorporate workflow_name into filesystem/resource paths. If you add workflow name validation, add tests here to lock in that behavior.
| # Reject path traversal attempts (including Windows drive-relative paths like "C:foo.txt") | ||
| if (os.path.basename(file_name) != file_name | ||
| or os.path.isabs(file_name) | ||
| or ntpath.splitdrive(file_name)[0]): | ||
| raise ValueError(f"Invalid prompt file name: '{file_name}'. Must be a plain filename without path separators.") | ||
|
|
There was a problem hiding this comment.
get_prompt() validates file_name against path traversal, but workflow_name is still used as a path component in _load_default_prompt() without validation. Since workflow_name is part of the public get_prompt(workflow_name, ...) API, callers can pass values like ../... to traverse outside aphra/workflows/<workflow>/prompts in the filesystem fallback (and potentially via importlib.resources), enabling unintended file reads. Consider validating workflow_name to be a plain directory/package name (no separators, no .., no drive/absolute path) before constructing paths.
| def list_workflow_prompts(workflow_name: str) -> List[str]: | ||
| """ | ||
| List all available prompt files for a workflow. | ||
|
|
There was a problem hiding this comment.
list_workflow_prompts() includes workflow_name as a path segment (via importlib.resources and the filesystem fallback). Similar to get_prompt(), this should validate workflow_name to prevent path traversal (e.g., ../...) from escaping the intended aphra/workflows/<workflow>/prompts directory when callers pass untrusted values.
| def list_workflow_prompts(workflow_name: str) -> List[str]: | ||
| """ | ||
| List all available prompt files for a workflow. | ||
|
|
There was a problem hiding this comment.
list_workflow_prompts() is documented/used as a way to list prompt templates, but its implementation returns all files in the prompts directory. To avoid surfacing non-prompt artifacts (and to match the expectation that prompts are .txt templates), filter the returned filenames to .txt only in both branches.
Introduce _validate_name to reject path separators, absolute paths, and Windows drive-relative names and use it to validate workflow_name and file_name in get_prompt and list_workflow_prompts. Also restrict listing to .txt prompt files (both via importlib.resources and filesystem fallback). Update tests to cover workflow_name validation and filename traversal cases.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 9 out of 9 changed files in this pull request and generated 1 comment.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| if (os.path.basename(value) != value | ||
| or os.path.isabs(value) | ||
| or ntpath.splitdrive(value)[0]): | ||
| raise ValueError( | ||
| f"Invalid {label}: '{value}'. " | ||
| f"Must be a plain name without path separators." | ||
| ) |
There was a problem hiding this comment.
_validate_name() still allows values like '' / '.' / '..' because os.path.basename(value) == value for those inputs. That means workflow_name='..' (and potentially file_name='..') can traverse out of the intended workflows/prompts directory in the fallback os.path.join path, undermining the path-traversal protection. Explicitly reject empty, '.' and '..' (and consider also rejecting any value containing os.curdir/os.pardir path components) before proceeding.
Enhance _validate_name to treat empty strings and '.'/'..' (as well as os.curdir/os.pardir) as invalid names in addition to existing basename/isabs/drive checks. Add unit tests verifying that empty, '.' and '..' values are rejected for both file_name and workflow_name to prevent path traversal and ambiguous names.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 9 out of 9 changed files in this pull request and generated 1 comment.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| """ | ||
| if (not value | ||
| or value in (os.curdir, os.pardir, '.', '..') | ||
| or os.path.basename(value) != value |
There was a problem hiding this comment.
_validate_name() relies on os.path.basename(...) to detect path separators, but on POSIX a Windows-style separator ("\") is not treated as a separator. That means values like "subdir\step1_system.txt" pass validation despite the docstring saying separators are disallowed. Consider explicitly rejecting both '/' and '\' (e.g., also compare against ntpath.basename(value) / check for '\' in value) and add a test for this case to keep path-traversal protection consistent across platforms.
| or os.path.basename(value) != value | |
| or os.path.basename(value) != value | |
| or ntpath.basename(value) != value |
Tighten name validation to reject Windows-style backslash separators on all platforms by checking ntpath.basename in addition to os.path.basename and existing absolute/drive checks. Refactors the predicate into is_empty_or_special, has_separators, and is_absolute_or_drive for clarity. Adds two tests to ensure backslash separators are rejected for both prompt file_name and workflow_name inputs.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 9 out of 9 changed files in this pull request and generated 2 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| def __init__(self): | ||
| """Initialize the workflow with default state.""" | ||
| self._prompts_dir: Optional[str] = None | ||
|
|
||
| def get_prompt(self, file_name: str, **kwargs) -> str: | ||
| """ | ||
| Load a prompt template for this workflow, with override support. | ||
|
|
||
| This is a convenience wrapper around core.prompts.get_prompt() that | ||
| automatically provides the workflow name and prompts_dir from config. | ||
|
|
||
| Args: | ||
| file_name: Name of the prompt file (e.g., 'step1_system.txt') | ||
| **kwargs: Format placeholders for the prompt template | ||
|
|
||
| Returns: | ||
| str: The formatted prompt content | ||
| """ | ||
| return _get_prompt(self.get_workflow_name(), file_name, prompts_dir=self._prompts_dir, **kwargs) |
There was a problem hiding this comment.
AbstractWorkflow.get_prompt() relies on instance attribute _prompts_dir. If a subclass defines __init__ and forgets to call super().__init__(), calling self.get_prompt() will raise AttributeError. To keep the new helper backwards-compatible for downstream subclasses, consider making _prompts_dir a class attribute defaulting to None and/or using getattr(self, '_prompts_dir', None) when passing prompts_dir.
| source_language=context.source_language, | ||
| target_language=context.target_language) | ||
|
|
||
| # Your complete workflow logic here |
There was a problem hiding this comment.
In the README’s advanced workflow example, execute() returns translated_text, but that variable is never defined in the snippet. This makes the example non-runnable and confusing; consider returning the result of an actual translation call (or using a clearly marked placeholder like return result) and, if relevant, show how writer_model/context.model_client.call_model(...) is used.
| # Your complete workflow logic here | |
| # Your complete workflow logic here — for example, call the writer model: | |
| translated_text = context.model_client.call_model( | |
| model=writer_model, | |
| system_prompt=system_prompt, | |
| input=text, | |
| ) |
Move _prompts_dir to a class-level attribute and simplify __init__ initialization. Use getattr(self, '_prompts_dir', None) when calling _get_prompt to avoid attribute errors if subclasses set the prompts dir. Also update the README workflow example to call context.model_client.call_model and return the model result instead of a placeholder translated_text.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 9 out of 9 changed files in this pull request and generated 1 comment.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| model=writer_model, | ||
| system=system_prompt, | ||
| prompt=text |
There was a problem hiding this comment.
The README example calls context.model_client.call_model() using keyword arguments (model=, system=, prompt=), but LLMModelClient.call_model is defined as call_model(system_prompt, user_prompt, model_name, ...) (positional args or matching keyword names). As written, the example will raise TypeError and misleads users; update the snippet to use the actual parameter names/order (and include log_call=context.log_calls if desired).
| model=writer_model, | |
| system=system_prompt, | |
| prompt=text | |
| system_prompt=system_prompt, | |
| user_prompt=text, | |
| model_name=writer_model, | |
| log_call=context.log_calls, |
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 9 out of 9 changed files in this pull request and generated no new comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 9 out of 9 changed files in this pull request and generated no new comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Add prompt override support and workflow helper
Introduce prompt override mechanism via a prompts_dir and a workflow-level helper.
These changes allow users to customize or extend workflow prompts without modifying package code and make prompt usage simpler and backwards-compatible.