Skip to content

Commit 5aabb22

Browse files
committed
task_screenshot: skip redundant BGR24 copy on viewport screenshot path
take_screenshot_viewport allocates a BGR24 buffer, has the video driver's read_viewport fill it bottom-up, and hands it to screenshot_dump. screenshot_dump then allocates a second full-frame BGR24 buffer (state->out_buffer), and screenshot_dump_direct runs the generic scaler to convert the first buffer into the second -- which for this case (input and output both BGR24 at the same dimensions) amounts to a flip-and-copy. rpng_save_image_stream already walks source rows via `data += pitch` with a signed pitch, and take_screenshot_raw already exploits this to feed the encoder a bottom-up buffer via a pointer to the last row plus a negative pitch. Apply the same trick to the viewport path: when the source is already BGR24 and no rescaling is needed, hand the source buffer directly to rpng_save_image_bgr24 and skip the out_buffer allocation entirely. For a PNG viewport screenshot this saves one full-frame BGR24 allocation plus one full-frame memcpy. At 4K that is ~48 MiB of heap pressure and copy work avoided per screenshot. The savestate thumbnail path (where out_width/out_height differ from width/height because thumbnails render at native core geometry) is unaffected: the fast path is gated on dimensions matching, and smaller-than-screen thumbnail dimensions fall through to the existing scaler. The XRGB8888 and RGB565 raw paths are also unaffected; they still go through the scaler for real format conversion. Also tightens the retained-case allocation to out_width*out_height*3 instead of width*height*3. The old size was always >= the needed size and never under-allocated, but the extra bytes were only ever used for savestate thumbnails smaller than screen dimensions. Tested: PNG viewport screenshots are byte-identical before and after the patch (verified with cmp).
1 parent 36fe5ab commit 5aabb22

1 file changed

Lines changed: 42 additions & 6 deletions

File tree

tasks/task_screenshot.c

Lines changed: 42 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -99,6 +99,33 @@ static bool screenshot_dump_direct(screenshot_task_state_t *state)
9999
if (!input)
100100
return ret;
101101

102+
/* Fast path: source is already BGR24 and no resampling is
103+
* needed, so hand the source buffer directly to the PNG
104+
* encoder. rpng_save_image_stream walks rows via `data +=
105+
* pitch` with a signed pitch, so a bottom-up source is
106+
* encoded top-down for free by starting at the last row and
107+
* passing a negative row stride (same trick take_screenshot_raw
108+
* uses via screenshot_dump's pitch argument).
109+
*
110+
* This avoids allocating a second full-frame BGR24 buffer and
111+
* the flip-and-copy the scaler would otherwise do between them;
112+
* at 4K that is ~48 MiB of allocation and copy per screenshot. */
113+
if ( (state->flags & SS_TASK_FLAG_BGR24)
114+
&& state->out_width == state->width
115+
&& state->out_height == state->height)
116+
{
117+
ret = rpng_save_image_bgr24(
118+
state->filename,
119+
input,
120+
state->out_width,
121+
state->out_height,
122+
(unsigned)(-state->pitch)
123+
);
124+
/* state->out_buffer is NULL in this path (see screenshot_dump);
125+
* nothing to free. */
126+
return ret;
127+
}
128+
102129
if (state->flags & SS_TASK_FLAG_BGR24)
103130
scaler->in_fmt = SCALER_FMT_BGR24;
104131
else if (state->pixel_format_type == RETRO_PIXEL_FORMAT_XRGB8888)
@@ -109,8 +136,7 @@ static bool screenshot_dump_direct(screenshot_task_state_t *state)
109136
video_frame_convert_to_bgr24(
110137
scaler,
111138
state->out_buffer,
112-
(const uint8_t*)state->frame + ((int)state->height - 1)
113-
* state->pitch,
139+
input,
114140
state->width,
115141
state->height,
116142
-state->pitch,
@@ -431,12 +457,22 @@ static bool screenshot_dump(
431457
}
432458

433459
#if defined(HAVE_RPNG)
434-
if (!(buf = (uint8_t*)malloc(width * height * 3)))
460+
/* Only allocate the BGR24 output buffer when screenshot_dump_direct
461+
* will actually use the scaler. When the source is already BGR24 at
462+
* the output dimensions (typical of the viewport read-back path,
463+
* take_screenshot_viewport), the encoder walks the source directly
464+
* with negative pitch and no intermediate buffer is needed. */
465+
if ( !(state->flags & SS_TASK_FLAG_BGR24)
466+
|| state->out_width != width
467+
|| state->out_height != height)
435468
{
436-
free(state);
437-
return false;
469+
if (!(buf = (uint8_t*)malloc(state->out_width * state->out_height * 3)))
470+
{
471+
free(state);
472+
return false;
473+
}
474+
state->out_buffer = buf;
438475
}
439-
state->out_buffer = buf;
440476
#endif
441477

442478
if (use_thread)

0 commit comments

Comments
 (0)