Fix Windows crash with non-ASCII locale thousands separators #188
joffrey-b wants to merge 1 commit
Do you know if the bug has been reported in a public bug tracker somewhere? That way, I could keep track of it and remove this workaround when it's no longer necessary.
Yes, it's a known, currently-open defect class in Microsoft's own STL repository, though I haven't found a report that reproduces our exact crash (microsoft/STL#6206, "LWG-4090 Underspecified use of locale facets for locale-dependent [...]"). A comment on that issue contains a repro of the same underlying defect:

    locale::global(locale(""));
    fputs(format("{:L}\n", 12345).c_str(), stdout); // char version, broken
    fputws(format(L"{:L}\n", 12345).c_str(), stdout); // wchar_t version, correct

With a UTF-8 codepage and a locale whose thousands separator is multi-byte (e.g. Polish NBSP U+00A0, or our French narrow NBSP U+202F), the [...]

There's also a related, closed issue, microsoft/STL#3562, where a maintainer confirms this is a by-design limitation of [...]

So: same root-cause defect class as our crash (single-byte [...]
Summary
rsgain crashes on Windows when run directly in a console on systems using a locale whose
thousands separator is a non-ASCII character. French is a confirmed example: Windows uses
NARROW NO-BREAK SPACE (U+202F) as the thousands separator, which triggers the crash.
The root cause is a bug in the Windows CRT's std::numpunct<char> facet. The fix is a one-line guard that skips setting the global locale on Windows.
Closes #185
Root cause
Three things combine to produce the crash:
1. main() sets the global C++ locale to the system locale (std::locale::global(std::locale(""))), enabling locale-aware number formatting.
2. Output lines are formatted with {:L} (e.g. the sample rate line "Stream #0: FLAC, 16 bit, 44 100 Hz, 2 ch"), which accesses the locale's std::numpunct<char> facet to retrieve the thousands separator character.
3. numpunct<char>::thousands_sep() crashes (exception code 0xC0000409, STATUS_STACK_BUFFER_OVERRUN in ucrtbase.dll) when the system locale defines a thousands separator that cannot be represented as a single char. French uses NARROW NO-BREAK SPACE (U+202F), which encodes to three bytes in UTF-8 (\xE2\x80\xAF). The numpunct<char> facet, which can only return a single char, overruns an internal CRT buffer attempting to produce this character.
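The byte count in the last point can be checked by hand. The following is a minimal sketch, not code from rsgain: encode_utf8 and fits_in_single_char are hypothetical helper names, and surrogate code points are not validated. It shows that U+202F needs three UTF-8 bytes and therefore cannot be returned through an interface that yields a single char.

```cpp
#include <string>

// Hypothetical helper: encode one Unicode code point as UTF-8.
// Surrogates and out-of-range values are not rejected (sketch only).
std::string encode_utf8(char32_t cp) {
    std::string out;
    if (cp < 0x80) {                       // 1 byte: plain ASCII
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {               // 2 bytes
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {             // 3 bytes
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {                               // 4 bytes
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// Hypothetical helper: true only if the separator is one UTF-8 byte,
// i.e. something a single-char thousands_sep() could carry.
bool fits_in_single_char(char32_t cp) {
    return encode_utf8(cp).size() == 1;
}
```

Under these assumptions, encode_utf8(0x202F) yields the bytes E2 80 AF, while an ASCII separator such as ',' stays a single byte.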
This explains all the observed symptoms:
- Tee-Object works: stdout is not a console, so the progress paths that call {:L} are not reached in the same way, and the crash does not manifest.
- -q works: output_ok is suppressed entirely, so the {:L} format specifiers are never evaluated.
- --version works: no {:L} format specifiers are involved.
- The crash occurs at the "Stream #0: ... {:L} Hz ..." output line: this is the first {:L} call that formats a number large enough (44100) to require a thousands separator.
Fix
Guard the std::locale::global call with #ifndef _WIN32. Without a system locale set, {:L} falls back to the C locale, which uses no thousands separator, so numpunct is never called with a non-ASCII character on Windows.
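The fallback behaviour can be illustrated without std::format. The sketch below uses an iostream with an imbued locale as a stand-in for {:L}, since both take their separator from the std::numpunct<char> facet. SpaceGrouping and format_with_locale are hypothetical names introduced for illustration, and the plain ASCII space is an assumption chosen because it fits in one char.

```cpp
#include <locale>
#include <sstream>
#include <string>

// Hypothetical facet: groups digits in threes with an ASCII space.
struct SpaceGrouping : std::numpunct<char> {
protected:
    char do_thousands_sep() const override { return ' '; }
    std::string do_grouping() const override { return "\3"; }
};

// Hypothetical helper: format an integer through the given locale's
// numpunct<char> facet (iostream used here in place of {:L}).
std::string format_with_locale(const std::locale& loc, int value) {
    std::ostringstream os;
    os.imbue(loc);
    os << value;
    return os.str();
}
```

With std::locale::classic() (the C locale) the helper applies no grouping to 44100. With std::locale(std::locale::classic(), new SpaceGrouping) the same value is grouped in threes. The locale object takes ownership of the facet pointer.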
This is a targeted workaround for a Windows CRT bug. The change is one line in rsgain.cpp. No build system changes, no new dependencies. Locale-aware number formatting ({:L}) continues to work correctly on Linux and macOS.
Changes
src/rsgain.cpp: Wrap std::locale::global(std::locale("")) in #ifndef _WIN32. A comment explains the reason.
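A guard of this shape might look like the following sketch. It is not the actual diff: init_global_locale is a hypothetical wrapper (in the PR the call sits directly in main()), and the try/catch is an extra precaution added here because std::locale("") can throw std::runtime_error when the environment names a locale that is not installed.

```cpp
#include <locale>
#include <stdexcept>

// Hypothetical wrapper around the guarded call.
// Returns true if the system locale was installed as the global locale.
bool init_global_locale() {
#ifndef _WIN32
    // Skipped on Windows: a multi-byte thousands separator (e.g. U+202F)
    // cannot be returned by numpunct<char> and leads to the crash.
    try {
        std::locale::global(std::locale(""));
        return true;
    } catch (const std::runtime_error&) {
        return false;  // assumption: silently keep the default C locale
    }
#else
    return false;      // Windows keeps the default C locale
#endif
}
```

On Windows the function leaves the default C locale in place, so {:L} produces ungrouped digits there.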
Testing
Verified that rsgain easy <directory> no longer crashes on a French Windows 11 system when run directly in PowerShell (real console). All files are scanned and tagged correctly.
Behaviour on Linux and macOS is unchanged.