Commit 9e20d7e
rpng: stream deflate output via multi-IDAT instead of one big buffer
Replaces rpng_save_image_stream's full-frame encode_buf + monolithic
deflate with per-row incremental deflate feeding a 16 KiB chunk buffer
that emits multiple IDAT chunks as it fills. Pixel output is unchanged;
only the internal memory shape differs.
Peak transient memory at 4K BGR24 drops from ~50 MiB (encode_buf +
deflate_buf, both sized to the full filtered frame) to ~70 KiB
(per-row filter scratch + one 16 KiB chunk buffer). Memory now scales
with width only, not width*height. Cost is ~0.07% more output bytes
from IDAT chunk headers (12 bytes per 16 KiB chunk):
    size         old bytes   new bytes   delta
    320x240         19,733      19,745    +12 (+0.06%)
    1024x768       100,995     101,067    +72 (+0.07%)
    1920x1080      181,909     182,041   +132 (+0.07%)
A 1920x1080 screenshot now produces 12 IDAT chunks vs. 1 before.
pngcheck accepts both old and new output at identical compression
ratio (97.8% on 1920x1080 pseudorandom).
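The memory and size figures above can be sanity-checked with a little arithmetic. This is a minimal sketch, not code from the commit: the helper names are hypothetical, deflate_buf is assumed to be exactly as large as the filtered frame, and scratch_rows (the number of row-sized scratch buffers in the new encoder) is a free parameter because the message does not state it.

```c
#include <assert.h>
#include <stddef.h>

#define IDAT_CHUNK_SIZE   16384u  /* 16 KiB chunk buffer, per the commit message */
#define PNG_CHUNK_FRAMING    12u  /* 4 length + 4 type + 4 CRC bytes per chunk */

/* Bytes in one filtered frame: every scanline carries one filter-type byte. */
static size_t filtered_frame_bytes(size_t width, size_t height, size_t bpp)
{
   assert(width > 0 && height > 0 && bpp > 0);
   return (width * bpp + 1) * height;
}

/* Old shape: encode_buf + deflate_buf, both assumed equal to the filtered frame. */
static size_t old_peak_bytes(size_t width, size_t height, size_t bpp)
{
   return 2 * filtered_frame_bytes(width, height, bpp);
}

/* New shape: scratch_rows row-sized buffers (hypothetical count) plus one
 * chunk buffer. Height does not appear at all. */
static size_t new_peak_bytes(size_t width, size_t bpp, size_t scratch_rows)
{
   assert(width > 0 && bpp > 0);
   return scratch_rows * (width * bpp + 1) + IDAT_CHUNK_SIZE;
}

/* Number of IDAT chunks needed for a deflate stream of `payload` bytes. */
static size_t idat_chunk_count(size_t payload)
{
   return (payload + IDAT_CHUNK_SIZE - 1) / IDAT_CHUNK_SIZE;
}

/* Extra file bytes versus a single-IDAT file holding the same payload. */
static size_t idat_extra_bytes(size_t payload)
{
   size_t n = idat_chunk_count(payload);
   return n > 1 ? (n - 1) * PNG_CHUNK_FRAMING : 0;
}
```

For 3840x2160 BGR24, old_peak_bytes gives 49,770,720 bytes (about 47.5 MiB), in line with the ~50 MiB quoted. Assuming the files hold only IHDR, IDAT and IEND (57 bytes of fixed framing), the 1920x1080 row corresponds to a 181,852-byte payload, for which idat_chunk_count returns 12 and idat_extra_bytes returns 132, matching the table.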
Wall-clock performance: no regression. Measured on Linux x86_64
with /tmp on tmpfs, 5 iterations per config, two trials each:
    size         pattern     old avg     new avg    delta
    1280x720     solid      112.1 ms    112.2 ms    +0.1%
    1280x720     gradient   116.2 ms    112.7 ms    -3.0%
    1280x720     photo      497.7 ms    497.2 ms    -0.1%
    1280x720     random     129.3 ms    127.8 ms    -1.1%
    1920x1080    solid      244.9 ms    238.3 ms    -2.7%
    1920x1080    gradient   249.5 ms    240.5 ms    -3.6%
    1920x1080    random     270.8 ms    269.5 ms    -0.5%
Deltas below ~3% are within the trial-to-trial noise floor observed
for the same encoder. Overall, the new encoder is within noise of the
old one or marginally faster in every measured case, plausibly from
reduced allocator pressure (no full-frame malloc+free per encode,
~25 MiB at 4K) and better cache locality. Results on spinning disks,
where each IDAT flush becomes a real write, may differ; this was not
measured.
Filter selection, per-row scratch layout, and prev_encoded carry-over
are unchanged -- same five filters, same lowest-SAD pick. Only the
downstream "what happens to the filtered row" differs.
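For readers unfamiliar with the filter step that stays unchanged, here is a minimal sketch of a lowest-SAD pick over the five PNG filter types. It is not rpng's code: the function names, the single shared scratch buffer, and the tie-break (first lowest wins) are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Sum of absolute values, treating each filtered byte as signed. */
static unsigned row_sad(const uint8_t *row, size_t n)
{
   size_t i;
   unsigned sum = 0;
   for (i = 0; i < n; i++)
      sum += (unsigned)abs((int8_t)row[i]);
   return sum;
}

/* Paeth predictor as defined by the PNG specification. */
static uint8_t paeth(uint8_t a, uint8_t b, uint8_t c)
{
   int p  = (int)a + (int)b - (int)c;
   int pa = abs(p - a), pb = abs(p - b), pc = abs(p - c);
   if (pa <= pb && pa <= pc)
      return a;
   return (pb <= pc) ? b : c;
}

/* Apply filter type 0..4 (None, Sub, Up, Average, Paeth) to cur.
 * prev is the previous unfiltered row (all zeros for the first row). */
static void filter_row(int type, uint8_t *out, const uint8_t *cur,
      const uint8_t *prev, size_t n, size_t bpp)
{
   size_t i;
   for (i = 0; i < n; i++)
   {
      uint8_t a = (i >= bpp) ? cur[i - bpp]  : 0;  /* left       */
      uint8_t b = prev[i];                         /* above      */
      uint8_t c = (i >= bpp) ? prev[i - bpp] : 0;  /* above-left */
      uint8_t pred;
      switch (type)
      {
         case 1:  pred = a; break;
         case 2:  pred = b; break;
         case 3:  pred = (uint8_t)((a + b) >> 1); break;
         case 4:  pred = paeth(a, b, c); break;
         default: pred = 0; break;
      }
      out[i] = (uint8_t)(cur[i] - pred);
   }
}

/* Return the filter type with the lowest SAD; the first lowest wins ties. */
static int pick_filter(const uint8_t *cur, const uint8_t *prev,
      size_t n, size_t bpp, uint8_t *scratch)
{
   int type, best = 0;
   unsigned best_sad = 0;
   assert(cur && prev && scratch && bpp > 0);
   for (type = 0; type < 5; type++)
   {
      unsigned sad;
      filter_row(type, scratch, cur, prev, n, bpp);
      sad = row_sad(scratch, n);
      if (type == 0 || sad < best_sad)
      {
         best     = type;
         best_sad = sad;
      }
   }
   return best;
}
```

A horizontal ramp favours Sub, while a row identical to the one above it favours Up, which is the kind of variation the test patterns are meant to provoke.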
Three trans_stream_zlib subtleties the implementation accounts for:
  1. The backend reports AGAIN (not BUFFER_FULL) when avail_out and
     avail_in both hit zero on the same call -- BUFFER_FULL requires
     avail_in != 0. We detect this via chunk_fill >= IDAT_CHUNK_SIZE
     after every successful row feed and flush proactively; otherwise
     the next row's trans() would find avail_out=0 with fresh input
     waiting and return Z_BUF_ERROR.
  2. During the final Z_FINISH drain, AGAIN means "more output
     pending, buffer full" (set_in(NULL,0) ensures input is never
     the gating factor). We flush and retry until NONE.
  3. flush_idat_chunk tolerates payload_len=0 so a Z_STREAM_END
     landing on an exact chunk boundary doesn't emit a spurious
     empty IDAT.
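Points 1 and 3 can be illustrated without the deflate backend. The sketch below is an assumption-laden stand-in, not the commit's code: flush_idat_chunk and IDAT_CHUNK_SIZE are names from this message, but the signatures, the in-memory sink, the bitwise CRC and flush_if_full are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define IDAT_CHUNK_SIZE 16384u  /* 16 KiB, as in the commit message */

/* Hypothetical in-memory sink standing in for the real output stream. */
struct mem_sink
{
   uint8_t *data;
   size_t   len;
   size_t   cap;
};

/* Bitwise CRC-32 (reflected polynomial 0xEDB88320), the CRC PNG uses. */
static uint32_t crc32_update(uint32_t crc, const uint8_t *p, size_t n)
{
   size_t i;
   int k;
   crc = ~crc;
   for (i = 0; i < n; i++)
   {
      crc ^= p[i];
      for (k = 0; k < 8; k++)
         crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
   }
   return ~crc;
}

static int sink_write(struct mem_sink *s, const void *p, size_t n)
{
   if (n > s->cap - s->len)
      return 0;
   memcpy(s->data + s->len, p, n);
   s->len += n;
   return 1;
}

static void put_be32(uint8_t *out, uint32_t v)
{
   out[0] = (uint8_t)(v >> 24);
   out[1] = (uint8_t)(v >> 16);
   out[2] = (uint8_t)(v >> 8);
   out[3] = (uint8_t)v;
}

/* Emit one IDAT chunk: length, type, payload, CRC over type + payload.
 * A zero-length payload is a successful no-op, so a stream that ends on
 * an exact chunk boundary does not produce an empty IDAT. */
static int flush_idat_chunk(struct mem_sink *s, const uint8_t *payload,
      size_t payload_len)
{
   uint8_t hdr[8], tail[4];
   uint32_t crc;
   assert(payload_len <= 0x7FFFFFFFu);  /* PNG chunk length limit */
   if (payload_len == 0)
      return 1;
   put_be32(hdr, (uint32_t)payload_len);
   memcpy(hdr + 4, "IDAT", 4);
   crc = crc32_update(0, hdr + 4, 4);
   crc = crc32_update(crc, payload, payload_len);
   put_be32(tail, crc);
   return sink_write(s, hdr, 8)
       && sink_write(s, payload, payload_len)
       && sink_write(s, tail, 4);
}

/* Call after every successful row feed: flush as soon as the chunk
 * buffer is full so the next feed never starts with zero output space. */
static int flush_if_full(struct mem_sink *s, const uint8_t *chunk,
      size_t *chunk_fill)
{
   if (*chunk_fill < IDAT_CHUNK_SIZE)
      return 1;
   if (!flush_idat_chunk(s, chunk, *chunk_fill))
      return 0;
   *chunk_fill = 0;
   return 1;
}
```

Each flushed chunk adds 12 bytes of framing around the payload, which is where the per-chunk overhead quoted earlier comes from.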
Ships with a round-trip regression test
(libretro-common/samples/formats/png/rpng_roundtrip_test.c) verifying
pixel-level round-trip across all three public encode entry points
(argb, bgr24 top-down, bgr24 bottom-up via the negative-pitch trick
used by task_screenshot's viewport fast path) and three size tiers:
- Hand-picked (45): 4x4, 37x29, 320x240 x 5 patterns x 3 entry
  points. Patterns exercise different filter selection outcomes.
- Small-width (48): widths {1,2,3,8,31,32,33,257} x heights {1,3}
  x 3 entry points, pseudorandom. Catches off-by-one bugs at
  narrow-width and near-alignment boundaries.
- Large (12): 1024x1, 1x1024, 2048x1, 1x12000 x 3 entry points,
  pseudorandom. 1x12000 crosses zlib's default 32 KiB sliding
  window.
105 subtests total. Exit 0 on success.
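The negative-pitch trick mentioned above amounts to pointing at the last row and stepping backwards. A minimal sketch, assuming a hypothetical copy_rows helper rather than any real rpng or task_screenshot function:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy `height` rows of `row_bytes` each from src into dst, which is
 * tightly packed and top-down. pitch is the signed byte distance between
 * consecutive source rows; a negative pitch walks the source upwards. */
static void copy_rows(uint8_t *dst, const uint8_t *src, int pitch,
      unsigned height, size_t row_bytes)
{
   unsigned y;
   assert(dst && src);
   for (y = 0; y < height; y++)
   {
      memcpy(dst + (size_t)y * row_bytes, src, row_bytes);
      /* Do not step past the final row: forming a pointer before the
       * start of the buffer would be undefined behaviour. */
      if (y + 1 < height)
         src += pitch;
   }
}
```

For a bottom-up source the caller passes a pointer to the last stored row and the negated stride, for example copy_rows(out, img + (height - 1) * stride, -(int)stride, height, row_bytes), so the encoder sees the rows in top-down order without an intermediate flip.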
Every encoded PNG also passes a structural check: pngcheck(1) if
installed, otherwise a built-in fallback that walks chunks, verifies
CRCs via encoding_crc32 (already linked), and enforces chunk ordering.
pngcheck is not a build or runtime dependency. When the built-in
fallback is active a startup self-test generates a valid PNG,
confirms the fallback accepts it, corrupts a byte, confirms the
fallback rejects it -- aborting if the validator itself is broken
rather than running 105 subtests under a broken validator.
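As a rough picture of what such a fallback has to do, the sketch below walks the chunk list, checks each CRC, and applies a few ordering rules from the PNG specification (IHDR first, consecutive IDATs, IEND last). It is an illustration under assumptions, not the test's validator: the function names are hypothetical, it carries its own bitwise CRC instead of calling encoding_crc32, and the exact set of ordering rules the real fallback enforces is not stated in this message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One-shot bitwise CRC-32 (reflected polynomial 0xEDB88320). */
static uint32_t png_crc32(const uint8_t *p, size_t n)
{
   uint32_t crc = 0xFFFFFFFFu;
   size_t i;
   int k;
   for (i = 0; i < n; i++)
   {
      crc ^= p[i];
      for (k = 0; k < 8; k++)
         crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
   }
   return crc ^ 0xFFFFFFFFu;
}

static uint32_t get_be32(const uint8_t *p)
{
   return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16)
        | ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Returns 1 if buf is structurally plausible as a PNG, 0 otherwise. */
static int png_structure_ok(const uint8_t *buf, size_t len)
{
   static const uint8_t sig[8] = {0x89, 'P', 'N', 'G', 0x0D, 0x0A, 0x1A, 0x0A};
   size_t pos = 8;
   int first = 1, seen_idat = 0, idat_closed = 0;

   assert(buf != NULL);
   if (len < 8 || memcmp(buf, sig, 8) != 0)
      return 0;

   while (len - pos >= 12)
   {
      uint32_t clen       = get_be32(buf + pos);
      const uint8_t *type = buf + pos + 4;
      int is_idat, is_iend;

      if (clen > len - pos - 12)
         return 0;                 /* chunk overruns the buffer */
      if (png_crc32(type, 4 + (size_t)clen) != get_be32(type + 4 + clen))
         return 0;                 /* CRC covers type + data */
      if (first != (memcmp(type, "IHDR", 4) == 0))
         return 0;                 /* IHDR must be first, and only first */
      first = 0;

      is_idat = memcmp(type, "IDAT", 4) == 0;
      is_iend = memcmp(type, "IEND", 4) == 0;
      if (is_idat && idat_closed)
         return 0;                 /* IDAT chunks must be consecutive */
      if (is_idat)
         seen_idat = 1;
      else if (seen_idat)
         idat_closed = 1;

      pos += 12 + (size_t)clen;
      if (is_iend)
         return seen_idat && pos == len;  /* IEND last, nothing trailing */
   }
   return 0;                       /* ran out of data before IEND */
}
```

A self-test in the spirit described above would build a small valid file, confirm png_structure_ok returns 1, flip one payload byte, and confirm it then returns 0.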
CI integration
(.github/workflows/Linux-libretro-common-samples.yml):
adds rpng_roundtrip_test to the RUN_TARGETS allowlist and extends
the Makefile target-extraction regex from TARGET_TEST to
TARGET_TEST[0-9]* so that TARGET_TEST2 is picked up by auto-discovery.
Strict superset -- extracted target lists are unchanged for every
other Makefile in libretro-common/samples/.
Verified:
- 105/105 pass with pngcheck validation active.
- 105/105 pass with built-in fallback active (stock ubuntu-latest
runner scenario -- no pngcheck, zlib1g-dev only).
- ASan+UBSan clean in both configurations.
- pngcheck OK on every output file.
- File sizes within +0.07% of old encoder.
- No wall-clock regression on measured configurations.
1 parent 5aabb22 commit 9e20d7e
4 files changed
Lines changed: 956 additions & 104 deletions
File tree
- .github/workflows
- libretro-common
- formats/png
- samples/formats/png