Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions docs.json
Original file line number Diff line number Diff line change
Expand Up @@ -223,6 +223,7 @@
]
},
"openhands/usage/agent-canvas/customize-and-settings",
"openhands/usage/agent-canvas/critic",
"openhands/usage/agent-canvas/acp-agents",
"openhands/usage/agent-canvas/development",
"openhands/usage/agent-canvas/troubleshooting"
Expand Down
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
133 changes: 133 additions & 0 deletions openhands/usage/agent-canvas/critic.mdx
Original file line number Diff line number Diff line change
@@ -0,0 +1,133 @@
---
title: Critic
description: Configure critic evaluation and iterative refinement in Agent Canvas.
---

<Warning>
The critic feature is experimental. Configuration options, scoring behavior,
and default models may change as the feature evolves.
</Warning>

The critic is an additional evaluation pass that reviews the agent's work and
predicts how likely the task is to succeed. In Agent Canvas, critic results can
appear in the conversation timeline as a success-likelihood score with detected
issue labels.

Use the critic when you want extra feedback for an OpenHands agent
conversation, or when you want the agent to automatically refine its work after

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider "Use the OpenHands LLM provider" instead of "Prefer" to better convey automatic critic handling.

a low critic score.

<Note>
The default OpenHands-hosted critic is currently free to use. For background
on the critic model and evaluation methodology, read
[SOTA on SWE-Bench Verified with Inference-Time Scaling and Critic Model](https://openhands.dev/blog/sota-on-swe-bench-verified-with-inference-time-scaling-and-critic-model)
and
[A Rubric-Supervised Critic from Sparse Real-World Outcomes](https://arxiv.org/abs/2603.03800).
The critic applies to OpenHands agent conversations. Agent Canvas can also
run third-party ACP agents, but those agents manage their own execution loop
and may not expose the same critic evaluation path.
</Note>

## Prerequisites

Before enabling the critic:

1. Configure your active LLM in `Settings > LLM`.
2. Prefer the `OpenHands` LLM provider when you want the default hosted critic
path.
3. Start a new conversation after saving critic settings. Existing
conversations keep the settings they were created with.

## Enable the Critic

1. Open `Settings > Verification`.
2. Toggle on `Enable Critic`.
3. Configure the `Critic API Key` field:
- If `OpenHands` is selected as your active LLM provider, leave this field
empty. The critic reuses the active provider's OpenHands Provider LLM Key.
- If you are not using the `OpenHands` LLM provider, paste an OpenHands
Provider LLM Key into `Critic API Key`, or provide the API key required by
your custom critic service.
4. Save the settings.
5. Start a new conversation.

![Agent Canvas Verification settings with Enable Critic, iterative refinement, critic threshold, and Critic API Key guidance](/openhands/static/img/agent-canvas-critic-settings.png)

The Critic API Key and the OpenHands Provider LLM Key are the same credential
when you use the default OpenHands-hosted critic service. You can find that key
in the `API Keys` tab of [OpenHands Cloud](https://app.all-hands.dev/settings/api-keys).

The default hosted critic is free today; the key authenticates access to the
service.

<Note>
A dedicated `Critic API Key` overrides the active LLM key for critic calls
only. Your main LLM configuration continues to use the key from
`Settings > LLM`.
</Note>

## Enable Iterative Refinement

Iterative refinement lets the critic send the agent back to improve its work
when the predicted success score is too low.

1. Open `Settings > Verification`.
2. Toggle on `Enable Critic`.
3. Toggle on `Enable Iterative Refinement`.
4. Optionally switch the settings detail view to `Advanced` or `All`.
5. Adjust:
- `Critic Threshold` - the success score required to stop refining. The
default is `0.6`.
- `Max Refinement Iterations` - the maximum number of retry attempts. The
default is `3`.
6. Save the settings and start a new conversation.

When refinement is enabled, Agent Canvas will let the conversation continue
after a low critic score until the score passes the threshold or the maximum
iteration count is reached.

## View Critic Results

When the critic runs, Agent Canvas shows the result below the agent message or
finish action that was evaluated. The compact view shows the predicted success
likelihood score. You can expand the result to inspect detected issue
categories and probabilities, such as incomplete changes, missing validation,
infrastructure issues, or likely user follow-up patterns.

![Agent Canvas conversation showing a critic success likelihood score below an agent message](/openhands/static/img/agent-canvas-critic-result.png)

## Troubleshooting

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider rephrasing to "Use an OpenHands agent conversation" or "Use the OpenHands agent type" instead of "agent path".

### Critic Results Do Not Appear

- Confirm `Enable Critic` is on in `Settings > Verification`.
- Start a new conversation after saving the setting.
- Use the OpenHands agent path. Third-party ACP agents may not expose critic
results.
- Wait until the agent sends a message or finishes a task. With the default
`finish_and_message` mode, the critic does not run after every tool call.

### Authentication Errors

If the critic request fails with an API key or authentication error:

- If `OpenHands` is the active LLM provider, leave `Critic API Key` empty and
confirm the active LLM profile has a saved OpenHands Provider LLM Key.
- If another LLM provider is active, enter an OpenHands Provider LLM Key in
`Critic API Key`.

### Conversations Become Slow

- Keep `Critic Mode` set to `finish_and_message` unless you need per-action
feedback.
- Disable `Enable Iterative Refinement` if you only want passive critic scores.
- Lower `Max Refinement Iterations` if repeated refinement loops are too costly.

## Related Guides

- [Customize and Settings](/openhands/usage/agent-canvas/customize-and-settings)
- [LLM Profiles and Model Configuration](/openhands/usage/agent-canvas/llm-profiles)
- [OpenHands LLMs](/openhands/usage/llms/openhands-llms)
- [SDK Critic Guide](/sdk/guides/critic)
- [Critic Model Blog Post](https://openhands.dev/blog/sota-on-swe-bench-verified-with-inference-time-scaling-and-critic-model)
- [Critic Research Paper](https://arxiv.org/abs/2603.03800)
3 changes: 2 additions & 1 deletion openhands/usage/agent-canvas/customize-and-settings.mdx
Original file line number Diff line number Diff line change
Expand Up @@ -30,7 +30,7 @@ The `Settings` area currently includes the following sections:
| `Agent` | Agent behavior and agent-specific capabilities |
| `LLM` | Provider, model, API key, and profile configuration |
| `Condenser` | Context compression and summarization behavior |
| `Verification` | Approval and verification-related behavior |
| `Verification` | Approval, critic evaluation, and verification-related behavior |
| `Application` | UI-level preferences and app behavior |
| `Secrets` | Stored secrets used by the active backend |

Expand All @@ -56,4 +56,5 @@ You might use this split like this:
## Related Guides

- [Connect and Manage Backends](/openhands/usage/agent-canvas/backends)
- [Configure the Critic](/openhands/usage/agent-canvas/critic)
- [Setup a Pre-built Automation](/openhands/usage/agent-canvas/prebuilt-automations)
Loading