feat: add Prometheus alerting rules for API latency and error-rate SLAs (#846) #863

Open
KingDavid9999 wants to merge 13 commits into rinafcode:main from KingDavid9999:fixes-issue-#846

@KingDavid9999

Summary

Closes #846
Adds Prometheus alerting rules for SLA breach detection on the TeachLink backend, along with Alertmanager Slack routing config and on-call runbooks. Previously, SLA breaches were only detectable by manually watching dashboards.
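
For readers unfamiliar with the resource type, the following is a minimal sketch of what a latency and an error-rate SLA rule can look like in a PrometheusRule. It is not copied from this PR: the alert names, metric names (`http_request_duration_seconds_bucket`, `http_requests_total`), label values, and thresholds are all assumptions chosen for illustration, and the actual chart reads its thresholds from `values.yaml`.

```yaml
# Illustrative only: names, metrics, and thresholds below are assumptions,
# not the rules shipped in this PR.
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: teachlink-backend-sla-example   # hypothetical name
spec:
  groups:
    - name: teachlink-backend.sla.example
      rules:
        # Fires when the 95th-percentile request latency stays above 500 ms.
        - alert: ExampleApiHighLatencyP95
          expr: |
            histogram_quantile(
              0.95,
              sum by (le) (rate(http_request_duration_seconds_bucket[5m]))
            ) > 0.5
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "p95 API latency above 500 ms for 5 minutes"
        # Fires when more than 1% of requests return a 5xx status.
        - alert: ExampleApiHighErrorRate
          expr: |
            sum(rate(http_requests_total{status=~"5.."}[5m]))
              /
            sum(rate(http_requests_total[5m]))
              > 0.01
          for: 5m
          labels:
            severity: critical
          annotations:
            summary: "API 5xx error rate above 1% for 5 minutes"
```

The `for: 5m` clause keeps a short spike from paging anyone; the alert only fires once the condition has held for the full window.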

Changes

New files

charts/teachlink-backend/templates/prometheus-rules.yaml: PrometheusRule CR with 4 alerts
charts/teachlink-backend/values.yaml: configurable thresholds, Alertmanager/Slack routing
charts/teachlink-backend/Chart.yaml: Helm chart metadata
charts/teachlink-backend/templates/_helpers.tpl: standard Helm template helpers
docs/RUNBOOKS.md: on-call runbook for all 4 alerts
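
The Slack routing mentioned above could take roughly the following shape in plain Alertmanager configuration. This is a hedged sketch, not the contents of this PR's `values.yaml`: the receiver name, channel, and webhook URL are placeholders, and how the chart nests this block under its values is not shown in the description.

```yaml
# Illustrative Alertmanager routing; receiver name, channel, and URL are placeholders.
route:
  receiver: slack-oncall            # default receiver for anything not matched below
  group_by: ["alertname", "severity"]
  group_wait: 30s
  repeat_interval: 4h
  routes:
    # Send critical alerts to the same channel, but re-notify more often.
    - matchers:
        - severity="critical"
      receiver: slack-oncall
      repeat_interval: 1h
receivers:
  - name: slack-oncall
    slack_configs:
      - api_url: https://hooks.slack.com/services/REPLACE/WITH/WEBHOOK
        channel: "#teachlink-oncall"
        send_resolved: true         # also post when the alert clears
```

In practice the webhook URL would normally come from a Kubernetes Secret rather than being committed in plain text.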
