fix: prevent intermittent digest-mismatch errors during archive updates#296
fix: prevent intermittent digest-mismatch errors during archive updates#296gruyaume wants to merge 4 commits into
Conversation
Signed-off-by: Guillaume Belanger <[email protected]>
|
Hey @gruyaume, thanks for your PR! Is this ready for review? |
Yes. If you find the code changes mostly up to you standards, I am happy to address minor issues. However if there are major issues and you have a completely different idea of the solution, please go ahead and implement it your way. I only want the issue fixed as soon as possible, I don't care about getting the attribution. |
Thanks! I explored another approach in parallel but I ended up circling back to yours so there is no point just having another PR for the sake of it. I will review yours then, because I expect only minor changes will be requested. |
Signed-off-by: Guillaume Belanger <[email protected]>
Signed-off-by: Guillaume Belanger <[email protected]>
|
|
||
| body := resp.Body | ||
| if strings.HasSuffix(path, ".gz") { | ||
| if flags&fetchGzip != 0 { |
There was a problem hiding this comment.
This is a behavioral change. Has anyone double checked that we don't have any other cases that were handled before and are not handled now?
| // are content-addressed and so are immune to the inconsistent view of | ||
| // InRelease vs Packages.gz that mirrors and CDNs can serve while a | ||
| // publication is propagating. See https://wiki.ubuntu.com/AptByHash. | ||
| gzPath := packagesPath + ".gz" |
There was a problem hiding this comment.
This whole function needs to have its variable names reviewed and polished, as it's becoming unreadable. We have digests, digest, gzPath, gzDigest, byHashPath, etc. These things all talk about their types and suffixes, and not about what they are. And they are not the same thing.
/cc @upils
| func (s *httpSuite) sawByHashRequest() bool { | ||
| for _, req := range s.requests { | ||
| if strings.Contains(req.URL.Path, "/by-hash/SHA256/") { | ||
| return true |
There was a problem hiding this comment.
If the archive only has the right file in the right place, the only way for us to get the right content is to use that file. I might be missing something, but it looks like at the moment all we are looking at is a request being made inside a directory, possibly in the wrong place considering we fallback.
| } | ||
| if r.AcquireByHash && itemPath != r.Path() && !skipByHash[itemPath] { | ||
| byHashPath := path.Join(prefix, "dists", r.Suite, path.Dir(itemPath), "by-hash", "SHA256", makeSha256(itemContent)) | ||
| content[byHashPath] = itemContent |
There was a problem hiding this comment.
This used to be a very clean and simple function, and it'd be nice to keep it that way. From these options, having a ByHash bool (not "Acquire", that's on the fetch side) seems in line with the type semantics. The other options look like custom hacks that require reading both the test and the implementation to be made sense of, so we need to think a bit about how to represent these while keeping that clean and simple aspect.
/cc @upils
Chisel intermittently fails with
expected digest X, got Ywhen Ubuntu's archive is mid-publication and InRelease is temporarily inconsistent withPackages.gzat the named path (a window we observed lasting at least 39 minutes against the same publication in real CI runs for Ella Core). Fetch the package index by content hash using apt'sAcquire-By-Hash, which Ubuntu archives have advertised since 16.04 (2016), so the URL itself encodes the expected bytes; falls back to the named path when an archive doesn't advertise the feature or a specific hash has been garbage-collected.Fixes #295
Reference