o/snapstate: refactor prerequisites task handler to enable proper seed-refresh integration #17152

Open
andrewphelpsj wants to merge 8 commits into canonical:master from andrewphelpsj:prereq-refactor
andrewphelpsj commented Jun 2, 2026

Member

This is an attempt at simplifying the current prerequisites task handler.

This is somewhat of a full rewrite, so the diff might not be super helpful sadly. A main motivation was to simplify and combine some of the retry logic, since that was spread across a lot of the previous implementation.

No functional changes, though I've added a couple tests that caught regressions I introduced while working on the refactor.

I've also gotten rid of the WaitAll usage that was used when organizing the task sets created by prerequisites. Now the graph is a bit more readable. I'll attach some examples.

andrewphelpsj commented Jun 2, 2026

Member Author

Here are some example diagrams that show the graphs for installing two snaps that both pull in the same base as a prerequisite.

This is the graph after the changes in this branch:
[Image: install_1-Done_-_snapmgrTestSuite_TestInstallManyWithPrereqsTransactionally-2117997213]

This is the comically large graph from before the changes in this branch:
[Image: install_1-Done_-_snapmgrTestSuite_TestInstallManyWithPrereqsTransactionally-2885815835]
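
To illustrate why the older graph could grow so large, here is a minimal sketch, not taken from this branch. It assumes that WaitAll-style ordering makes every task in one set wait for every task in another set. The `Task` type and the `waitAll`, `waitForLast`, `chain` and `countEdges` helpers are hypothetical stand-ins, not snapd's `state` package, and `waitForLast` is only one possible alternative, not necessarily what the refactor does.

```go
package main

import "fmt"

// Task is a toy stand-in for a task in a dependency graph; it is not
// snapd's state.Task.
type Task struct {
	Name     string
	WaitsFor []*Task
}

// waitAll makes every task in ts wait for every task in prev, which adds
// len(ts) * len(prev) edges to the graph.
func waitAll(ts, prev []*Task) {
	for _, t := range ts {
		t.WaitsFor = append(t.WaitsFor, prev...)
	}
}

// waitForLast makes only the first task in ts wait for the last task in
// prev. That is enough when both sets are already ordered as chains.
func waitForLast(ts, prev []*Task) {
	if len(ts) == 0 || len(prev) == 0 {
		return
	}
	ts[0].WaitsFor = append(ts[0].WaitsFor, prev[len(prev)-1])
}

// chain builds n tasks where each one waits for the task before it.
func chain(prefix string, n int) []*Task {
	tasks := make([]*Task, n)
	for i := range tasks {
		tasks[i] = &Task{Name: fmt.Sprintf("%s-%d", prefix, i)}
		if i > 0 {
			tasks[i].WaitsFor = []*Task{tasks[i-1]}
		}
	}
	return tasks
}

// countEdges returns the total number of wait edges across the given sets.
func countEdges(sets ...[]*Task) int {
	total := 0
	for _, ts := range sets {
		for _, t := range ts {
			total += len(t.WaitsFor)
		}
	}
	return total
}

func main() {
	base, snap := chain("base", 5), chain("snap", 5)
	waitAll(snap, base)
	fmt.Println("edges with waitAll:", countEdges(base, snap))

	base, snap = chain("base", 5), chain("snap", 5)
	waitForLast(snap, base)
	fmt.Println("edges with waitForLast:", countEdges(base, snap))
}
```

With two chains of five tasks each, the all-to-all variant ends up with 33 edges and the single-edge variant with 9, while both enforce the same ordering between the two sets.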

codecov Bot commented Jun 2, 2026

Codecov Report

❌ Patch coverage is 94.79167% with 10 lines in your changes missing coverage. Please review.
✅ Project coverage is 79.12%. Comparing base (fcee949) to head (a38963c).
⚠️ Report is 10 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| overlord/snapstate/handlers_prereq.go | 94.38% | 5 Missing and 5 partials ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##           master   #17152      +/-   ##
==========================================
- Coverage   79.15%   79.12%   -0.04%     
==========================================
  Files        1376     1383       +7     
  Lines      192837   193274     +437     
  Branches     2466     2466              
==========================================
+ Hits       152640   152922     +282     
- Misses      31024    31171     +147     
- Partials     9173     9181       +8     

| Flag | Coverage Δ |
|---|---|
| unittests | 79.12% <94.79%> (-0.04%) ⬇️ |


github-actions Bot commented Jun 2, 2026

Wed Jun 10 21:36:44 UTC 2026
The following results are from: https://github.com/canonical/snapd/actions/runs/27302061627

Failures:

Preparing:

  • openstack:debian-12-64:tests/main/snap-ns-forward-compat
  • openstack:debian-12-64:tests/main/interfaces-fuse-support:parallel
  • openstack:ubuntu-core-18-64:
  • openstack:ubuntu-core-18-64:tests/main/
  • openstack:ubuntu-core-18-64:tests/main/
  • openstack:ubuntu-core-18-64:tests/core/
  • openstack:ubuntu-core-18-64:tests/main/
  • openstack:ubuntu-core-18-64:tests/core/
  • openstack:ubuntu-core-18-64:tests/main/
  • openstack:ubuntu-core-18-64:tests/regression/
  • openstack:ubuntu-core-18-64:tests/smoke/
  • openstack:ubuntu-core-18-64:tests/regression/
  • openstack:ubuntu-core-18-64:tests/regression/
  • openstack:ubuntu-core-20-64:
  • openstack:ubuntu-core-24-64:tests/main/graphical-user-daemons
  • openstack:ubuntu-24.04-64:

Executing:

  • openstack:debian-12-64:tests/main/cgroup-devices-v2
  • openstack:debian-12-64:tests/main/snapd-sigterm
  • openstack:ubuntu-core-18-64:tests/core/services
  • openstack:ubuntu-core-24-64:tests/main/snap-user-service-upgrade-failure
  • openstack:ubuntu-core-24-64:tests/main/dbus-activation-session
  • openstack:ubuntu-26.04-64:tests/unit/go:gcc
  • openstack:ubuntu-20.04-64:tests/completion/snippets:plain
  • openstack:ubuntu-20.04-64:tests/main/snap-download-corrupted-cleanup:bitflip_cache
  • openstack:ubuntu-24.04-64:tests/unit/go:clang

Restoring:

  • openstack:debian-12-64:tests/main/snap-ns-forward-compat
  • openstack:debian-12-64:tests/main/
  • openstack:debian-12-64:
  • openstack:debian-12-64:tests/main/interfaces-fuse-support:parallel
  • openstack:debian-12-64:tests/main/
  • openstack:debian-12-64:
  • openstack:debian-12-64:tests/main/snapd-sigterm
  • openstack:debian-12-64:tests/main/
  • openstack:debian-12-64:
  • openstack:ubuntu-core-18-64:
  • openstack:ubuntu-core-20-64:
  • openstack:ubuntu-26.04-64:tests/main/lxd:snapd_cgroup_just_outside
  • openstack:ubuntu-26.04-64:tests/main/
  • openstack:ubuntu-26.04-64:
  • openstack:ubuntu-20.04-64:tests/completion/snippets:plain
  • openstack:ubuntu-20.04-64:tests/completion/
  • openstack:ubuntu-20.04-64:
  • openstack:ubuntu-20.04-64:tests/main/snap-download-corrupted-cleanup:bitflip_cache
  • openstack:ubuntu-20.04-64:tests/main/
  • openstack:ubuntu-20.04-64:
  • openstack:ubuntu-24.04-64:

Skipped tests from snapd-testing-skip

If you wish to have any of the below tests run in your PR, in your PR description, add 'unskip:' followed by a copy-and-pasted list (without variants) of the below tests you wish to run (unskip plus test list must be valid yaml)

  • openstack:ubuntu-24.04-64:tests/main/i18n
  • openstack:ubuntu-26.04-64:tests/main/apparmor-prompting-flag-restart
  • openstack:ubuntu-26.04-64:tests/main/apparmor-prompting-integration-tests:audio_record_single
  • openstack:ubuntu-26.04-64:tests/main/apparmor-prompting-integration-tests:audio_record_timespan_allow
  • openstack:ubuntu-26.04-64:tests/main/apparmor-prompting-integration-tests:audio_record_timespan_deny
  • openstack:ubuntu-26.04-64:tests/main/apparmor-prompting-integration-tests:create_multiple_actioned_by_other_pid_always_allow
  • openstack:ubuntu-26.04-64:tests/main/apparmor-prompting-integration-tests:create_multiple_actioned_by_other_pid_always_deny
  • openstack:ubuntu-26.04-64:tests/main/apparmor-prompting-integration-tests:create_multiple_allow
  • openstack:ubuntu-26.04-64:tests/main/apparmor-prompting-integration-tests:create_multiple_deny
  • openstack:ubuntu-26.04-64:tests/main/apparmor-prompting-integration-tests:create_multiple_not_actioned_by_other_pid_single_allow
  • openstack:ubuntu-26.04-64:tests/main/apparmor-prompting-integration-tests:create_multiple_not_actioned_by_other_pid_single_deny
  • openstack:ubuntu-26.04-64:tests/main/apparmor-prompting-integration-tests:create_write_chmod_same_fd_single_allow
  • openstack:ubuntu-26.04-64:tests/main/apparmor-prompting-integration-tests:create_write_chmod_same_path_single_allow
  • openstack:ubuntu-26.04-64:tests/main/apparmor-prompting-integration-tests:create_write_write_same_path_single_deny
  • openstack:ubuntu-26.04-64:tests/main/apparmor-prompting-integration-tests:download_file_conflict
  • openstack:ubuntu-26.04-64:tests/main/apparmor-prompting-integration-tests:download_file_defaults
  • openstack:ubuntu-26.04-64:tests/main/apparmor-prompting-integration-tests:download_file_safer
  • openstack:ubuntu-26.04-64:tests/main/apparmor-prompting-integration-tests:read_single_allow
  • openstack:ubuntu-26.04-64:tests/main/apparmor-prompting-integration-tests:read_single_deny
  • openstack:ubuntu-26.04-64:tests/main/apparmor-prompting-integration-tests:special_characters
  • openstack:ubuntu-26.04-64:tests/main/apparmor-prompting-integration-tests:timespan_allow
  • openstack:ubuntu-26.04-64:tests/main/apparmor-prompting-integration-tests:timespan_deny
  • openstack:ubuntu-26.04-64:tests/main/apparmor-prompting-integration-tests:write_read_multiple_actioned_by_other_pid_allow_deny
  • openstack:ubuntu-26.04-64:tests/main/apparmor-prompting-integration-tests:write_read_multiple_actioned_by_other_pid_deny_allow
  • openstack:ubuntu-26.04-64:tests/main/apparmor-prompting-integration-tests:write_single_allow
  • openstack:ubuntu-26.04-64:tests/main/apparmor-prompting-integration-tests:write_single_deny
  • openstack:ubuntu-26.04-64:tests/main/apparmor-prompting-prompt-restoration
  • openstack:ubuntu-26.04-64:tests/main/apparmor-prompting-smoke:audiorecord_allow_forever
  • openstack:ubuntu-26.04-64:tests/main/apparmor-prompting-smoke:audiorecord_allow_session
  • openstack:ubuntu-26.04-64:tests/main/apparmor-prompting-smoke:audiorecord_allow_single
  • openstack:ubuntu-26.04-64:tests/main/apparmor-prompting-smoke:audiorecord_allow_timespan
  • openstack:ubuntu-26.04-64:tests/main/apparmor-prompting-smoke:audiorecord_deny_forever
  • openstack:ubuntu-26.04-64:tests/main/apparmor-prompting-smoke:audiorecord_deny_session
  • openstack:ubuntu-26.04-64:tests/main/apparmor-prompting-smoke:audiorecord_deny_single
  • openstack:ubuntu-26.04-64:tests/main/apparmor-prompting-smoke:audiorecord_deny_timespan
  • openstack:ubuntu-26.04-64:tests/main/apparmor-prompting-smoke:camera_allow_forever
  • openstack:ubuntu-26.04-64:tests/main/apparmor-prompting-smoke:camera_allow_session
  • openstack:ubuntu-26.04-64:tests/main/apparmor-prompting-smoke:camera_allow_single
  • openstack:ubuntu-26.04-64:tests/main/apparmor-prompting-smoke:camera_allow_timespan
  • openstack:ubuntu-26.04-64:tests/main/apparmor-prompting-smoke:camera_deny_forever
  • openstack:ubuntu-26.04-64:tests/main/apparmor-prompting-smoke:camera_deny_session
  • openstack:ubuntu-26.04-64:tests/main/apparmor-prompting-smoke:camera_deny_single
  • openstack:ubuntu-26.04-64:tests/main/apparmor-prompting-smoke:camera_deny_timespan
  • openstack:ubuntu-26.04-64:tests/main/apparmor-prompting-smoke:home_allow_forever
  • openstack:ubuntu-26.04-64:tests/main/apparmor-prompting-smoke:home_allow_session
  • openstack:ubuntu-26.04-64:tests/main/apparmor-prompting-smoke:home_allow_single
  • openstack:ubuntu-26.04-64:tests/main/apparmor-prompting-smoke:home_allow_timespan
  • openstack:ubuntu-26.04-64:tests/main/apparmor-prompting-smoke:home_deny_forever
  • openstack:ubuntu-26.04-64:tests/main/apparmor-prompting-smoke:home_deny_session
  • openstack:ubuntu-26.04-64:tests/main/apparmor-prompting-smoke:home_deny_single
  • openstack:ubuntu-26.04-64:tests/main/apparmor-prompting-smoke:home_deny_timespan
  • openstack:ubuntu-26.04-64:tests/main/apparmor-prompting-snapd-startup
  • openstack:ubuntu-26.04-64:tests/main/apparmor-prompting-support
  • openstack:ubuntu-26.04-64:tests/main/i18n
  • openstack:ubuntu-26.04-64:tests/main/interfaces-requests-activates-handlers

@andrewphelpsj andrewphelpsj force-pushed the prereq-refactor branch 3 times, most recently from b6e6be4 to 755a207, on June 4, 2026 15:56
@andrewphelpsj

Member Author

I rewrote history here to break the refactor down into smaller incremental changes that work towards the final change. Hopefully that makes reviewing a bit easier.

@pedronis pedronis left a comment

Contributor

looked at the first 4 commits, I have enough questions already that it is probably a good idea to stop there for now; some things are a bit confusing

Comment on lines +97 to +103
NoReRefresh: true,

// we're calling an API facing call which would otherwise be normally
// expected to produce a delayed effects taskset, but since the desire
// is to inject the tasksets into the current change, set the flag to
// avoid generating one
NoDelayedSideEffects: true,

Contributor

this commit is strange because I don't see a place that we are removing in here where these Flags were set

Base: "none",
PrereqContentAttrs: map[string][]string{"prereq1": {"some-content"}},
// set devmode to prove that prerequisites don't inherit these flags
Flags: snapstate.Flags{DevMode: true},

Contributor

do we need to double the test?

Comment thread overlord/snapstate/handlers_prereq.go Outdated
// tasks are ordered by the remodeling code. specifically, all snap
// downloads during a remodel happen prior to snap installation. thus,
// we cannot wait for snaps to be installed here. see remodelTasks for
// more information on how the tasks are ordered.

Contributor

this comment already got strange in the previous refactor I think, but now it's really confusing: returning early here really means not doing prereq. This needs to be reformulated

Comment thread overlord/snapstate/handlers_prereq.go Outdated
return false, retry
}

// if this snap is already waiting behind the in-flight refresh, this

Contributor

we should probably be a bit more explicit here than saying "this snap"

Comment thread overlord/snapstate/handlers_prereq.go Outdated
// well.
//
// snapd is special, we'll always wait for that to be fully done before
// progressing this task

Contributor

this wasn't true before, snapd passed in false ??

Contributor

the comment placement is a bit odd

@pedronis pedronis left a comment

Contributor

another note

Comment thread overlord/snapstate/handlers_prereq.go Outdated
}

func installOneBaseOrRequired(t *state.Task, snapName string, contentAttrs []string, requireTypeBase bool, channel string, onInFlight error, userID int, flags Flags, deviceCtx DeviceContext) (*state.TaskSet, error) {
func skipOrRetryPrereq(prereqs *state.Task, snapName string, required bool) (bool, error) {

Contributor

the results need names, and the function probably needs a comment
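
As a rough illustration of this suggestion (not the code from the branch), a helper returning `(bool, error)` can name its results and document how each should be read. Everything below is a hypothetical sketch: the `Retry` type only mimics the idea of an error that asks the caller to try again later, and `skipOrRetry` and its parameters are invented for the example.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// Retry is a toy stand-in for an error that asks the caller to try again
// later. It is not snapd's state.Retry.
type Retry struct {
	After time.Duration
}

func (r *Retry) Error() string {
	return fmt.Sprintf("retry after %v", r.After)
}

// skipOrRetry reports what to do about a prerequisite (hypothetical logic).
//
// skip is true when the prerequisite needs no further work from the caller.
// err is a *Retry when the caller should poll again later; any other
// non-nil error is a real failure.
func skipOrRetry(installed, inFlight bool) (skip bool, err error) {
	switch {
	case installed:
		return true, nil
	case inFlight:
		return false, &Retry{After: 30 * time.Second}
	default:
		return false, nil
	}
}

func main() {
	skip, err := skipOrRetry(false, true)
	var retry *Retry
	if errors.As(err, &retry) {
		fmt.Println("retry requested after", retry.After)
		return
	}
	fmt.Println("skip:", skip, "err:", err)
}
```

Naming the results lets the doc comment refer to them directly, which helps when one of the return values doubles as a control signal rather than a plain failure.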

Comment thread overlord/snapstate/handlers_prereq.go Outdated
// well.
//
// snapd is special, we'll always wait for that to be fully done before
// progressing this task

Contributor

the comment placement is a bit odd

Comment thread overlord/snapstate/handlers_prereq.go
Comment thread overlord/snapstate/handlers_prereq.go Outdated
return false, err
}

retry := &state.Retry{After: prerequisitesRetryTimeout}

Contributor

returning a retry from this is a bit too subtle now

Comment thread overlord/snapstate/handlers_prereq.go Outdated
if link.Change().ID() == prereqs.Change().ID() {
return skip, nil
}
// not in this change, poll until that task is done

Contributor

in this case we would conflict, which is why we need to retry anyway; the old code was doing that differently

@pedronis pedronis left a comment

Contributor

another one

Comment thread overlord/snapstate/handlers_prereq.go Outdated

// snap is being installed, retry later
return true, nil
required := snapName == "snapd" || requireTypeBase

Contributor

it's unclear that this new flag is useful, because the function then needs to look at snapd as a name anyway. So maybe we should pass the old flag and do something slightly different inside the helper

@andrewphelpsj

Member Author

I've rewritten history so that things should be a bit more reviewable.

@andrewphelpsj andrewphelpsj requested a review from pedronis June 10, 2026 13:53

@pedronis pedronis left a comment

Contributor

thanks for the tweaks, did another pass, some small comments and questions

func serializeTaskSetBeforeInProgressChange(ts *state.TaskSet, chg *state.Change) {
tasks := make([]*state.Task, 0, len(chg.Tasks()))
for _, t := range chg.Tasks() {
if t.Status() == state.DoingStatus {

@pedronis pedronis Jun 10, 2026

Contributor

what if some subpart of the change is undoing?

@andrewphelpsj andrewphelpsj Jun 10, 2026

Member Author

Good point, I've made this only consider tasks that still need to be done, in the do direction.

Comment on lines +306 to +309
// if the base being installed by the prerequisites task is already ordered
// behind the in-flight prerequisite link task in the same lane, this task
// does not need to wait for that prerequisite out-of-band as well.
waiting, err := snapWaitsForLinkInSameLane(prereqs, link)

Contributor

so this can never apply to snapd itself?

@andrewphelpsj andrewphelpsj Jun 10, 2026

Member Author

The specific reason why this shortcut was added was to prevent a deadlock that can occur due to some of the task ordering done by the seed-refresh code. That case can't happen to snapd, only bases.

This behavior was added in PR #16827.

Conceptually though, I don't think we should make it apply to snapd anyways. Most things should happen after snapd is fully refreshed.
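
For readers unfamiliar with the shortcut being discussed, the following is a minimal sketch of the kind of check that the name `snapWaitsForLinkInSameLane` suggests. It is not the implementation from the branch or from PR #16827: the `Task` type, its `Lanes` and `WaitsFor` fields, and the `sharesLane`, `waitsFor` and `alreadyOrderedBehind` helpers are all hypothetical.

```go
package main

import "fmt"

// Task is a toy stand-in with wait edges and lanes; not snapd's state.Task.
type Task struct {
	Name     string
	Lanes    []int
	WaitsFor []*Task
}

// sharesLane reports whether a and b have at least one lane in common.
func sharesLane(a, b *Task) bool {
	for _, la := range a.Lanes {
		for _, lb := range b.Lanes {
			if la == lb {
				return true
			}
		}
	}
	return false
}

// waitsFor reports whether t depends on target, directly or transitively.
// seen guards against revisiting tasks in graphs with shared ancestors.
func waitsFor(t, target *Task, seen map[*Task]bool) bool {
	if seen[t] {
		return false
	}
	seen[t] = true
	for _, w := range t.WaitsFor {
		if w == target || waitsFor(w, target, seen) {
			return true
		}
	}
	return false
}

// alreadyOrderedBehind reports whether t shares a lane with link and already
// waits for it, so polling for link out-of-band would add nothing.
func alreadyOrderedBehind(t, link *Task) bool {
	return sharesLane(t, link) && waitsFor(t, link, map[*Task]bool{})
}

func main() {
	link := &Task{Name: "link-snap", Lanes: []int{1}}
	mid := &Task{Name: "mid", Lanes: []int{1}, WaitsFor: []*Task{link}}
	behind := &Task{Name: "behind", Lanes: []int{1}, WaitsFor: []*Task{mid}}
	other := &Task{Name: "other", Lanes: []int{2}}

	fmt.Println(behind.Name, alreadyOrderedBehind(behind, link))
	fmt.Println(other.Name, alreadyOrderedBehind(other, link))
}
```

In this toy graph, a task that is already ordered behind the link task in the same lane can skip the extra wait, while an unrelated task in another lane cannot.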

return nil, nil
}
// as a special case, we allow the core snap to satisfy a core16 requirement
if sn.InstanceName == "core16" {

Contributor

maybe we can simply drop this code? core16 was removed

Member Author

We have this fallback in quite a few places. If we want to drop it, they should probably all be dropped at once, and probably not in this PR.
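
The fallback under discussion can be pictured with a small sketch. This is an illustration only, assuming the special case works as the quoted code comment describes (the core snap may satisfy a core16 requirement); `satisfiedBy` and the `installed` set are hypothetical and do not reproduce snapd's actual lookup.

```go
package main

import "fmt"

// satisfiedBy returns the name of an installed snap that satisfies the
// required base, or "" if none does. installed is a hypothetical set of
// installed snap names.
func satisfiedBy(required string, installed map[string]bool) string {
	if installed[required] {
		return required
	}
	// special case: the core snap is allowed to stand in for core16
	if required == "core16" && installed["core"] {
		return "core"
	}
	return ""
}

func main() {
	installed := map[string]bool{"core": true, "core22": true}
	for _, base := range []string{"core16", "core22", "core24"} {
		fmt.Printf("%s satisfied by %q\n", base, satisfiedBy(base, installed))
	}
}
```

Because the same special case reportedly appears in several places, removing it in only one of them would leave the call sites inconsistent, which matches the argument for handling it in a separate change.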

@andrewphelpsj andrewphelpsj requested a review from pedronis June 10, 2026 19:55

@pedronis pedronis left a comment

Contributor

question

switch t.Status() {
case state.DoStatus:
case state.WaitStatus:
if t.WaitedStatus() != state.DoStatus {

Contributor

does the new test cover this continue?

Member Author

Yes, TestSerializeTaskSetBeforeInProgressChangeIncludesWaitStatusForDo hits this branch.
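
The shape of the quoted switch can be sketched in isolation. The example below is hypothetical and uses toy types rather than snapd's `state` package: `Status`, `Task`, its `Waited` field (standing in for whatever `WaitedStatus()` returns) and `pendingInDoDirection` are assumptions, and the real switch may handle more statuses than the two shown in the snippet.

```go
package main

import "fmt"

// Status is a toy task status; the values only mirror names used in the thread.
type Status int

const (
	DoStatus Status = iota
	DoingStatus
	DoneStatus
	UndoStatus
	WaitStatus
)

// Task is a toy stand-in; Waited plays the role that WaitedStatus() plays
// in the quoted snippet and is only consulted when Status is WaitStatus.
type Task struct {
	Name   string
	Status Status
	Waited Status
}

// pendingInDoDirection keeps the tasks that the sketch treats as still
// needing to be done: tasks in DoStatus, and tasks in WaitStatus whose
// waited status is DoStatus. Every other task is skipped.
func pendingInDoDirection(tasks []*Task) []*Task {
	pending := make([]*Task, 0, len(tasks))
	for _, t := range tasks {
		switch t.Status {
		case DoStatus:
			// falls out of the switch and is kept
		case WaitStatus:
			if t.Waited != DoStatus {
				continue // skips to the next task in the loop
			}
		default:
			continue
		}
		pending = append(pending, t)
	}
	return pending
}

func main() {
	tasks := []*Task{
		{Name: "download", Status: DoneStatus},
		{Name: "mount", Status: DoStatus},
		{Name: "link", Status: WaitStatus, Waited: DoStatus},
		{Name: "cleanup", Status: WaitStatus, Waited: DoneStatus},
		{Name: "rollback", Status: UndoStatus},
	}
	for _, t := range pendingInDoDirection(tasks) {
		fmt.Println(t.Name)
	}
}
```

Note that `continue` inside a `switch` nested in a `for` loop applies to the loop, so the skipped tasks never reach the `append`.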

@andrewphelpsj andrewphelpsj requested a review from pedronis June 11, 2026 11:57

@pedronis pedronis left a comment

Contributor

thanks

2 participants