feat(logging): Bug 2039779 Copy chain_of_trust.log to live_backing.log on CoT verification failure #796

Open
hneiva wants to merge 1 commit into main from hneiva/cot-logs

@hneiva hneiva commented May 15, 2026

When verify_chain_of_trust fails, run_task never runs and live_backing.log is never created. Treeherder parses live_backing.log as the task's failure log, so users could not see why CoT verification failed.

Copy chain_of_trust.log to live_backing.log on verification failure so the verification output is visible from Treeherder (TH).
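A minimal sketch of the idea, not the actual patch: the file paths and the `verify` callable are hypothetical stand-ins for scriptworker's `context`-based helpers, and the real code is async.

```python
import os
import shutil


def surface_cot_log(cot_log: str, live_log: str) -> bool:
    """Copy the CoT log to the live log path if it exists; report whether it did."""
    if not os.path.exists(cot_log):
        return False
    shutil.copyfile(cot_log, live_log)
    return True


def run_with_log_fallback(verify, cot_log: str, live_log: str) -> None:
    """Run ``verify``; on any failure, surface the CoT log, then re-raise."""
    try:
        verify()
    except Exception:
        surface_cot_log(cot_log, live_log)
        raise
```

The bare `raise` keeps the original exception and traceback, so the task still fails the same way; only the log placement changes.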

@hneiva hneiva requested a review from a team as a code owner May 15, 2026 04:16
@hneiva hneiva force-pushed the hneiva/cot-logs branch from 9dc8233 to ca8b857 Compare May 15, 2026 04:16
Comment thread src/scriptworker/worker.py Outdated
# Surface CoT verification output in live_backing.log so Taskcluster/Treeherder show it
cot_log = get_chain_of_trust_log_filename(context)
if os.path.exists(cot_log):
    shutil.copyfile(cot_log, get_log_filename(context))
Contributor

Instead of copying the file, could we create a link or redirect artifact?

Contributor Author

What does redirecting the artifact mean? Just moving instead of copying? Do we know for a fact that there's nothing externally pointing to chain_of_trust.log?

What is the benefit of either vs just copying the file?

Contributor

https://docs.taskcluster.net/docs/reference/platform/queue/api#createArtifact describes link and redirect artifacts. The benefit is that it makes it obvious they are the same artifact, just as live.log redirects to live_backing.log on generic-worker.
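For illustration, a rough sketch of what the two request bodies might look like. The field names (`storageType`, `artifact`, `url`, `expires`, `contentType`) and the `"link"` / `"reference"` values reflect my reading of the createArtifact docs and should be checked against them; the artifact name and helper functions are hypothetical, and nothing here calls the Queue.

```python
from datetime import datetime, timezone


def _iso_utc(when: datetime) -> str:
    """Format an aware datetime as a UTC ISO 8601 string with a trailing Z."""
    utc = when.astimezone(timezone.utc)
    return utc.isoformat(timespec="milliseconds").replace("+00:00", "Z")


def link_artifact_payload(target_artifact: str, expires: datetime) -> dict:
    """Body for an artifact that points at another artifact on the same task."""
    return {
        "storageType": "link",        # assumed value, per the docs
        "artifact": target_artifact,  # name of the artifact being linked to
        "expires": _iso_utc(expires),
    }


def redirect_artifact_payload(url: str, expires: datetime) -> dict:
    """Body for an artifact that redirects to an arbitrary URL."""
    return {
        "storageType": "reference",   # assumed value, per the docs
        "url": url,
        "contentType": "text/plain",
        "expires": _iso_utc(expires),
    }
```

Under these assumptions, live_backing.log would be created with `link_artifact_payload("public/logs/chain_of_trust.log", expires)` so that both names resolve to a single stored log.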

await run_cancellable(verify_chain_of_trust(chain))
try:
    await run_cancellable(verify_chain_of_trust(chain))
except Exception:
Contributor

Just catch CoTError.

Contributor Author

@hneiva hneiva May 15, 2026

That would miss unforeseen exceptions, which we don't want.

try:
    # build LinkOfTrust objects
    if check_task:
        await add_link(chain, chain.name, chain.task_id)
    await build_task_dependencies(chain, chain.task, chain.name, chain.task_id)
    # download the signed chain of trust artifacts
    await download_cot(chain)
    # verify the signatures and populate the ``link.cot``s
    verify_cot_signatures(chain)
    # download all other artifacts needed to verify chain of trust
    await download_cot_artifacts(chain)
    # verify the task types, e.g. decision
    await verify_task_types(chain)
    # verify the worker_impls, e.g. docker-worker
    await verify_worker_impls(chain)
    await trace_back_to_tree(chain)
except (BaseDownloadError, KeyError, TypeError, AttributeError) as exc:
    log.critical("Chain of Trust verification error!", exc_info=True)
    if isinstance(exc, CoTError):
        raise
    else:
        raise CoTError(str(exc))

This block in verify_chain_of_trust() only catches a fixed set of expected exceptions; it doesn't catch JSON, OS, Value, or aiohttp errors, so those propagate without being wrapped in CoTError.
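A small standalone illustration of that gap. `CoTError` here is a minimal stand-in class, `BaseDownloadError` is left out of the tuple for brevity, and `narrow_verify`, `classify`, `bad_key`, and `bad_json` are hypothetical names used only for this demo.

```python
import json


class CoTError(Exception):
    """Stand-in for scriptworker's CoTError (minimal, for illustration only)."""


def narrow_verify(step):
    """Mimic the quoted handler: only a fixed tuple of exceptions becomes CoTError."""
    try:
        step()
    except (KeyError, TypeError, AttributeError) as exc:
        raise CoTError(str(exc))


def classify(step) -> str:
    """Report which exception type escapes ``narrow_verify`` for a given step."""
    try:
        narrow_verify(step)
    except CoTError:
        return "CoTError"
    except Exception as exc:
        return type(exc).__name__
    return "ok"


def bad_key():
    {}["missing"]  # KeyError: inside the handled tuple


def bad_json():
    json.loads("{not json")  # JSONDecodeError (a ValueError): not in the tuple


print(classify(bad_key))   # CoTError
print(classify(bad_json))  # JSONDecodeError
```

A caller that catches only CoTError would handle the first case but never see the second, which is the argument for `except Exception` at the call site.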

@hneiva hneiva force-pushed the hneiva/cot-logs branch from ca8b857 to e28d3b2 Compare May 19, 2026 22:23
@hneiva hneiva requested review from a team and jcristau May 19, 2026 22:23
@hneiva hneiva force-pushed the hneiva/cot-logs branch from e28d3b2 to 2532895 Compare May 20, 2026 19:39
…g on CoT verification failure

When verify_chain_of_trust fails, run_task never runs and live_backing.log
is never created. Taskcluster/Treeherder surface live_backing.log as the
task's failure log, so users could not see why CoT verification failed.

Link chain_of_trust.log to live_backing.log on verification failure so the
verification output is visible to users.
@hneiva hneiva force-pushed the hneiva/cot-logs branch from 2532895 to 3768e77 Compare May 20, 2026 19:57