
Commit 0422b07

yghannam and bp3tk0v authored and committed
x86/mce/amd: Filter bogus hardware errors on Zen3 clients
Users have been observing multiple L3 cache deferred errors after the recent kernel rework of deferred error handling.¹ ⁴

The errors are bogus due to inconsistent status values. Also, a user verified that bogus MCA_DESTAT values are present on the system even with an older kernel.²

The errors seem to be garbage values present in the MCA_DESTAT of some L3 cache banks. These were implicitly ignored before the recent kernel rework because they do not generate a deferred error interrupt.

A later revision of the rework patch was merged for v6.19. This naturally filtered out most of the bogus error logs. However, a few signatures still remain.³

Minimize the scope of the filter to the reported CPU family/model/stepping, and apply it only to errors which don't have the Enabled bit set in the MCi status MSR.

¹ https://lore.kernel.org/[email protected]
² https://lore.kernel.org/[email protected]
³ https://lore.kernel.org/[email protected]
⁴ https://lore.kernel.org/r/CAKFB093B2k3sKsGJ_QNX1jVQsaXVFyy=wNwpzCGLOXa_vSDwXw@mail.gmail.com

  [ bp: Generalize the condition according to which errors are bogus. ]

Fixes: 7cb735d ("x86/mce: Unify AMD DFR handler with MCA Polling")
Closes: https://lore.kernel.org/[email protected]
Reported-by: Bert Karwatzki <[email protected]>
Signed-off-by: Yazen Ghannam <[email protected]>
Signed-off-by: Borislav Petkov (AMD) <[email protected]>
Reviewed-by: Mario Limonciello <[email protected]>
Tested-By: Bert Karwatzki <[email protected]>
Cc: [email protected]
Link: https://lore.kernel.org/[email protected]
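The filter depends on a few MCi_STATUS bits. The following user-space sketch is not part of the commit. It decodes those bits from a raw 64-bit status value to show what "no Enabled bit" means.

It assumes the bit positions used by the kernel's MCI_STATUS_VAL (bit 63), MCI_STATUS_EN (bit 60) and the AMD-specific MCI_STATUS_DEFERRED (bit 44). The STATUS_* macros, struct status_flags and decode_status() are hypothetical names for illustration only.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed to mirror the kernel's MCi_STATUS bit definitions;
 * redefined locally so the sketch builds outside the kernel. */
#define STATUS_VAL      (1ULL << 63) /* record holds a valid error */
#define STATUS_EN       (1ULL << 60) /* reporting for this error type was enabled */
#define STATUS_DEFERRED (1ULL << 44) /* AMD: deferred (uncorrected, not yet consumed) error */

/* Hypothetical decoded view of the bits relevant to this commit. */
struct status_flags {
	bool val;
	bool en;
	bool deferred;
};

/* Split a raw MCi_STATUS value into the flags above. */
static struct status_flags decode_status(uint64_t status)
{
	struct status_flags f = {
		.val      = (status & STATUS_VAL) != 0,
		.en       = (status & STATUS_EN) != 0,
		.deferred = (status & STATUS_DEFERRED) != 0,
	};
	return f;
}
```

For example, a garbage MCA_DESTAT value such as 0x8000100000000000 decodes as valid and deferred, but with EN clear. On the affected part, the new filter drops exactly this kind of record.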
1 parent 7aaa804 commit 0422b07

1 file changed

Lines changed: 8 additions & 0 deletions


arch/x86/kernel/cpu/mce/amd.c

@@ -604,6 +604,14 @@ bool amd_filter_mce(struct mce *m)
 	enum smca_bank_types bank_type = smca_get_bank_type(m->extcpu, m->bank);
 	struct cpuinfo_x86 *c = &boot_cpu_data;
 
+	/* Bogus hw errors on Cezanne A0. */
+	if (c->x86 == 0x19 &&
+	    c->x86_model == 0x50 &&
+	    c->x86_stepping == 0x0) {
+		if (!(m->status & MCI_STATUS_EN))
+			return true;
+	}
+
 	/* See Family 17h Models 10h-2Fh Erratum #1114. */
 	if (c->x86 == 0x17 &&
 	    c->x86_model >= 0x10 && c->x86_model <= 0x2F &&
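The added hunk can be exercised outside the kernel with the minimal sketch below. It assumes trimmed-down stand-ins for struct cpuinfo_x86 and struct mce; struct cpu_id, struct mce_record, STATUS_EN and filter_cezanne_a0() are hypothetical names, and only the new Cezanne A0 condition is modelled.

```c
#include <stdbool.h>
#include <stdint.h>

#define STATUS_EN (1ULL << 60) /* assumed to mirror MCI_STATUS_EN */

/* Hypothetical stand-in for the fields of struct cpuinfo_x86 used here. */
struct cpu_id {
	uint8_t family;   /* c->x86 */
	uint8_t model;    /* c->x86_model */
	uint8_t stepping; /* c->x86_stepping */
};

/* Hypothetical stand-in for struct mce; only the status MSR value. */
struct mce_record {
	uint64_t status;  /* m->status */
};

/*
 * Returns true when the Cezanne A0 condition alone would filter the
 * record: family 19h, model 50h, stepping 0, and the Enabled bit clear.
 * Returns false otherwise, meaning this condition does not filter it.
 */
static bool filter_cezanne_a0(const struct cpu_id *c, const struct mce_record *m)
{
	if (c->family == 0x19 && c->model == 0x50 && c->stepping == 0x0)
		return !(m->status & STATUS_EN);
	return false;
}
```

In the real amd_filter_mce(), a record with EN set is not accepted at this point. Instead it falls through to the remaining checks, such as the Erratum #1114 filter. The sketch returns false there only to signal that this specific condition does not apply.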
