-Performance evaluations help improve the reliability of AI applications in real-world use. Metrics such as response relevance, accuracy, and groundedness assess whether AI-generated outputs are contextually appropriate, logically structured, and, in grounded content scenarios, factually supported. For Microsoft 365 Copilot, we regularly conduct rigorous quality evaluations across multiple metrics, including these three.