Commit 6d07a2a
metal: memory and correctness cleanup in gfx/drivers/metal.m

Five related fixes in the Metal driver, grouped together because
they're all narrow, independent, and touch the same file:

1. Fix byte offset in MetalRaster.updateGlyph didModifyRange:
The managed-storage buffer invalidation for incremental glyph
uploads was passing the row index as the byte offset rather than
row * stride. Length was correctly in bytes (height * _stride),
so the invalidated range described bytes
[row_index .. row_index + height*_stride), which almost never
overlapped the actually-modified rows. On managed-storage devices
this can leave recently-drawn glyphs invisible to the GPU until
the atlas is invalidated by some other path.

Every other didModifyRange: call site in this file uses a byte
range; aligning this one matches the convention. No behavioural
change on shared-storage / Cocoa Touch.
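The offset arithmetic behind fix 1 can be sketched in plain C. This is only an illustration of the corrected byte range, not the driver's code: the byte_range type and glyph_dirty_range name are hypothetical, and the real call site passes the result to didModifyRange: as an NSRange.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the NSRange handed to didModifyRange:. */
typedef struct {
    size_t location; /* byte offset of the first modified row */
    size_t length;   /* number of bytes to invalidate */
} byte_range;

/* Byte range covering `height` rows starting at row `row`, in a buffer
 * whose rows are `stride` bytes apart. */
byte_range glyph_dirty_range(size_t row, size_t height, size_t stride)
{
    byte_range r;
    assert(stride > 0);
    r.location = row * stride;    /* the bug passed `row` itself here */
    r.length   = height * stride; /* this part was already in bytes */
    return r;
}
```

With row 10, height 4 and a 256-byte stride, the corrected range starts at byte 2560, whereas the old code would have started it at byte 10.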

2. Stream screenshot read-back one row at a time
Context.readBackBuffer: malloced a full-frame BGRA copy of the
whole backbuffer, getBytes:'d into it, then converted BGRA->BGR
into the caller's buffer row by row. For a 4K capture that is
~32 MiB of transient heap per screenshot.

Restructure to getBytes: one row at a time directly into a small
scratch buffer (stack up to 16K-wide, heap fallback beyond),
then convert in place. Peak transient footprint drops from
~32 MiB to ~16 KiB. Also narrows the getBytes: source region to
the viewport Y range instead of reading the whole backbuffer
and discarding rows above and below.
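A rough C sketch of the row-at-a-time pattern follows. It makes several assumptions: a plain memory block stands in for the Metal texture, memcpy stands in for the one-row getBytes: call, the output is tightly packed BGR written top to bottom, and the SCRATCH_STACK_BYTES threshold and the read_back_bgr name are invented for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define SCRATCH_STACK_BYTES 16384 /* assumed threshold, not taken from the driver */

/* Convert the w x h viewport at (x, y) of a BGRA source into packed BGR,
 * staging one source row at a time. Returns 0 on success, -1 if the heap
 * fallback cannot be allocated. */
int read_back_bgr(const uint8_t *src, size_t src_pitch,
                  size_t x, size_t y, size_t w, size_t h, uint8_t *dst)
{
    uint8_t  stack_row[SCRATCH_STACK_BYTES];
    uint8_t *row       = stack_row;
    size_t   row_bytes = w * 4;
    size_t   i, j;

    if (row_bytes > sizeof(stack_row))
    {
        /* Rows too wide for the stack scratch fall back to the heap. */
        row = (uint8_t *)malloc(row_bytes);
        if (!row)
            return -1;
    }

    for (j = 0; j < h; j++)
    {
        /* Stand-in for a one-row getBytes: restricted to the viewport. */
        memcpy(row, src + (y + j) * src_pitch + x * 4, row_bytes);

        for (i = 0; i < w; i++)
        {
            dst[0] = row[i * 4 + 0]; /* B */
            dst[1] = row[i * 4 + 1]; /* G */
            dst[2] = row[i * 4 + 2]; /* R; the fourth byte (alpha) is dropped */
            dst   += 3;
        }
    }

    if (row != stack_row)
        free(row);
    return 0;
}
```

The transient footprint here is one row of BGRA pixels regardless of the capture height, which is the property the restructuring is after.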

3. Unify font atlas upload paths
MetalRaster init had two branches: a "fast path" using
newBufferWithBytes:length:options: when stride matched atlas
width, and a row memcpy loop when it did not. Both copied the
atlas exactly once, and the fast path carried a workaround
comment noting that newBufferWithBytes: does not correctly
invalidate the buffer on macOS, forcing a manual
didModifyRange: anyway. That made the two paths behaviourally
identical.

Collapse both to a single newBufferWithLength: + .contents
fill, with a whole-buffer memcpy when stride matches width
and a row memcpy loop otherwise. One code path, one
invalidation site, no change in allocated memory or copies.
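The unified fill can be sketched in C as below. The sketch assumes a one-byte-per-texel atlas and a destination that is already allocated (the role played by the buffer's .contents pointer); fill_atlas is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Copy a width x height atlas (assumed 1 byte per texel) into `dst`,
 * whose rows are `stride` bytes apart. */
void fill_atlas(uint8_t *dst, size_t stride,
                const uint8_t *src, size_t width, size_t height)
{
    size_t y;
    assert(stride >= width);

    if (stride == width)
    {
        /* Rows are contiguous, so a single copy covers the whole atlas. */
        memcpy(dst, src, width * height);
        return;
    }

    /* Padded rows: copy row by row, leaving the padding bytes untouched. */
    for (y = 0; y < height; y++)
        memcpy(dst + y * stride, src + y * width, width);
}
```

Either branch writes each source byte exactly once, so a single invalidation of the whole buffer after the call covers both cases.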

4. Bound BufferChain memory and clear stale per-node allocated
BufferChain grew monotonically: allocRange: appended a new node
whenever a request exceeded the current node's remaining space,
but discard only reset the head pointer and offset. Backing
nodes were never freed, so a single oversized allocation (heavy
shader pass, one-off geometry spike, content switch to a larger
resolution or shader chain) kept its node alive for the lifetime
of the driver, retained across all CHAIN_LENGTH chains.
Steady-state retention was therefore the all-time high-water
mark * CHAIN_LENGTH.

Trim the tail at discard: find the last node with allocated > 0
and drop nodes after it. Nodes are appended in alloc order and
allocRange: only advances forward, so a trailing unused node
means the whole tail is unused and safe to drop. Interior
unused nodes are kept (waste bounded by _blockLen per node;
they will be reused by smaller allocs on the next cycle).
Only trims when the chain was actually used this cycle
(_allocated > 0) so a quiescent frame doesn't drop the chain
and force reallocation on the next use.

Also reset n.allocated on every node at discard. commitRanges
walks all nodes with allocated > 0 and didModifyRange:'s them,
so without this reset a node that was filled in cycle N but
partially refilled in cycle N+1 would get a stale (larger)
range committed. Semantically wrong and a bandwidth waste on
macOS managed storage.
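The discard logic described above can be sketched with a plain singly linked list in C. The chain and chain_node types and the function names are hypothetical stand-ins for the Objective-C classes, and the nodes here carry no backing buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct chain_node {
    struct chain_node *next;
    size_t allocated; /* bytes handed out from this node in the current cycle */
} chain_node;

typedef struct {
    chain_node *head;
    size_t allocated; /* total bytes handed out in the current cycle */
} chain;

/* Number of nodes currently linked into the chain. */
size_t chain_length(const chain *c)
{
    size_t count = 0;
    const chain_node *n;
    for (n = c->head; n; n = n->next)
        count++;
    return count;
}

/* Drop unused trailing nodes, then clear the per-node and per-chain counters. */
void chain_discard(chain *c)
{
    chain_node *n, *last_used = NULL;

    /* Trim only if the chain was used this cycle, so an idle frame
     * does not throw the chain away. */
    if (c->allocated > 0)
    {
        for (n = c->head; n; n = n->next)
            if (n->allocated > 0)
                last_used = n;

        if (last_used)
        {
            n = last_used->next;
            last_used->next = NULL;
            while (n)
            {
                chain_node *next = n->next;
                free(n);
                n = next;
            }
        }
    }

    /* Reset stale counters so a later commit does not see old ranges. */
    for (n = c->head; n; n = n->next)
        n->allocated = 0;
    c->allocated = 0;
}
```

Interior nodes with allocated == 0 survive the trim, which matches the note that only the unused tail is dropped.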

5. Fix bytesPerRow in TexturedView BGRA upload path
TexturedView.updateFrame: (the menu pixel framebuffer upload,
called from MetalMenu.updateFrame: -> set_texture_frame) was
passing (4 * pitch) as bytesPerRow to replaceRegion: for
BGRA8Unorm/BGRX8Unorm sources. pitch is already the source row
stride in bytes (libretro convention, matched by the MetalMenu
caller which computes it as RPixelFormatToBPP(format) * width,
and matched by the adjacent else-branch).

Multiplying by 4 told Metal to step 4x the source stride
between rows, so row 0 read correct pixels, row r read source
row 4r, and rows from roughly height/4 onward read beyond the
caller's buffer. Most likely to surface on the
32-bit RGUI path (rgb32=true -> RPixelFormatBGRA8Unorm); the
16-bit path (BGRA4Unorm, rgb32=false) goes through the
conversion branch and is unaffected.
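The overrun can be checked with a small C helper. This is only arithmetic on the stride, not Metal code; upload_read_extent is a hypothetical name, and the 320x240 frame in the usage note is an arbitrary example.

```c
#include <assert.h>
#include <stddef.h>

/* One past the last source byte read when uploading `height` rows of
 * `row_bytes` bytes each, stepping `bytes_per_row` between row starts. */
size_t upload_read_extent(size_t height, size_t row_bytes, size_t bytes_per_row)
{
    assert(height > 0);
    return (height - 1) * bytes_per_row + row_bytes;
}
```

For a 320x240 BGRA frame, pitch is 1280 bytes and the caller's buffer is 307200 bytes. upload_read_extent(240, 1280, 1280) stays exactly within that buffer, while upload_read_extent(240, 1280, 4 * 1280) reaches byte 1224960, far past its end.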

Tested with RGUI, shaders and regular core

1 parent 04731c7 commit 6d07a2a
1 file changed
Lines changed: 120 additions & 47 deletions