---
title: Monitor machine learning experiments and models
description: Learn how to monitor machine learning experiments from the Fabric Monitoring hub and track traffic for machine learning model endpoints.
ms.author: ruxu
author: ruixinxu
ms.reviewer: scottpolly
ms.topic: how-to
ms.custom:
ms.date: 02/11/2026
ms.search.form: machine learning monitoring
---
[!INCLUDE product-name] provides built-in monitoring capabilities for machine learning experiments and models. You can track experiment runs directly from the Fabric Monitoring hub, and monitor real-time traffic for active machine learning model endpoints. These monitoring features give you visibility into your machine learning workflows and help you understand how your deployed models are being used.
[!INCLUDE feature-preview]
- A [!INCLUDE product-name] workspace with a capacity assigned.
- At least one machine learning experiment with recorded runs, or a machine learning model with an active machine learning model endpoint.
Machine learning experiments are integrated directly into the Fabric Monitoring hub. This integration provides a centralized view of all experiment activities: the related notebooks, Spark applications, and the machine learning experiment runs those applications generate. By using the Monitoring hub, you can track, filter, and troubleshoot your experiment runs from a single, unified experience.
To view machine learning experiment runs from the Monitoring hub:
1. Open the Monitoring hub from the left navigation pane in [!INCLUDE product-name].
1. Select the Experiment filter to narrow the view to experiment-related activities.
1. Browse the list of experiment activities, which shows details such as status, start time, location, and duration.
:::image type="content" source="media/monitor-machine-learning-experiments-models/monitor-machine-learning-experiments-from-monitoring-hub.png" alt-text="Screenshot showing how to monitor machine learning experiments from the Monitoring hub." lightbox="media/monitor-machine-learning-experiments-models/monitor-machine-learning-experiments-from-monitoring-hub.png":::
The Monitoring hub provides filtering options that help you find specific experiment runs:
- Status: Filter by run status such as succeeded, failed, or in progress.
- Time range: Narrow results to runs created within a specific time window.
- Submitter: Filter runs by the user who submitted them.
- Location: If you have access to multiple workspaces, filter by workspace to focus on relevant experiments.
These filters make it easier to manage and analyze experiments, especially in workspaces with a high volume of machine learning activities.
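If you also track runs programmatically, the hub's filters correspond naturally to simple predicates over run records. The sketch below is purely illustrative; the field names and run records are hypothetical, not the Monitoring hub's actual schema:

```python
from datetime import datetime

# Hypothetical run records -- field names and values are illustrative,
# not the Monitoring hub's actual schema.
runs = [
    {"status": "Succeeded", "submitter": "ruixin", "location": "ws-sales",
     "start": datetime(2026, 2, 10, 9, 0)},
    {"status": "Failed", "submitter": "alex", "location": "ws-sales",
     "start": datetime(2026, 2, 11, 14, 30)},
    {"status": "In progress", "submitter": "ruixin", "location": "ws-research",
     "start": datetime(2026, 2, 11, 15, 0)},
]

def filter_runs(runs, status=None, submitter=None, location=None, since=None):
    """Apply hub-style filters; a None argument means 'no filter'."""
    return [
        r for r in runs
        if (status is None or r["status"] == status)
        and (submitter is None or r["submitter"] == submitter)
        and (location is None or r["location"] == location)
        and (since is None or r["start"] >= since)
    ]

# Failed runs submitted since the start of February 11.
recent_failures = filter_runs(runs, status="Failed", since=datetime(2026, 2, 11))
```

Combining predicates this way mirrors how the hub's Status, Submitter, Location, and Time range filters narrow a long activity list.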
The Monitoring hub integrates machine learning experiments into the notebook activity view. When you select a notebook activity that triggered a machine learning experiment run, you can access the Item snapshots page to see a snapshot of the experiments and runs captured at the time of execution. This page also displays a snapshot of all settings and parameters that were in effect when the notebook ran.
To view related experiment runs from a notebook activity:
1. Open the Monitoring hub and locate the notebook activity of interest.
1. Select the notebook activity to open its detail view.
1. Go to the Item snapshots page to see the associated experiments and runs.
:::image type="content" source="media/monitor-machine-learning-experiments-models/monitor-machine-learning-experiments-from-notebook-activity.png" alt-text="Screenshot showing how to monitor machine learning experiments from a notebook activity." lightbox="media/monitor-machine-learning-experiments-models/monitor-machine-learning-experiments-from-notebook-activity.png":::
The Item snapshots page lists all experiments and runs generated during the notebook execution, as they existed at the time the notebook ran.
This approach is useful when you need to debug, reproduce, or audit the machine learning experiment runs produced by a specific notebook execution.
When you activate a machine learning model endpoint for a specific model version, [!INCLUDE product-name] starts tracking traffic to that endpoint. Traffic monitoring shows how frequently your model is called, which helps you understand adoption and plan for capacity.
To view traffic for an active model endpoint:
1. Go to the machine learning model in your workspace.
1. Select the model version that has an active endpoint.
1. In the model detail view, scroll down to the Endpoint metrics section to see traffic information for that version.
The machine learning model endpoint metrics view provides key metrics about endpoint usage, including:
| Metric | Description |
|---|---|
| Request count | The total number of prediction requests the endpoint receives. |
| Error count | The total number of requests that the endpoint failed to process. |
| Request latency | The time taken to process and respond to prediction requests. |
:::image type="content" source="media/monitor-machine-learning-experiments-models/endpoint-metrics.png" alt-text="Screenshot showing endpoint metrics for a machine learning model." lightbox="media/monitor-machine-learning-experiments-models/endpoint-metrics.png":::
> [!NOTE]
> Metrics typically appear within 15 minutes after the endpoint receives traffic. If no data or chart line appears, the endpoint might have been inactive during the selected time range, or its telemetry might have expired after 90 days. Try adjusting the time range, or check back after the endpoint is used.
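To make the three metrics concrete, here is how they could be computed from a raw request log. The records and field names below are invented for illustration; Fabric collects and computes this telemetry for you:

```python
# Invented request log for one endpoint; each record notes whether the
# prediction request succeeded and how long it took to answer.
requests_log = [
    {"ok": True,  "latency_ms": 42},
    {"ok": True,  "latency_ms": 55},
    {"ok": False, "latency_ms": 870},
    {"ok": True,  "latency_ms": 48},
]

request_count = len(requests_log)                          # total requests: 4
error_count = sum(1 for r in requests_log if not r["ok"])  # failed requests: 1
avg_latency_ms = sum(r["latency_ms"] for r in requests_log) / request_count
```

Note how a single slow, failed request (870 ms) dominates the average latency; that's why watching error count and latency together is more informative than either alone.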
If your model has multiple active version endpoints, you can compare traffic patterns across versions. This comparison helps you identify which model version receives the most requests and whether newer versions are being adopted.
> [!NOTE]
> Machine learning models can have active endpoints for up to five versions at a time. Traffic monitoring is available independently for each active version endpoint.
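As a sketch of that comparison, suppose you note the request count for each active version from its metrics view. The numbers and the 5% threshold below are made up for illustration:

```python
# Hypothetical request counts per active version endpoint, as read
# from each version's metrics view.
traffic_by_version = {"v1": 120, "v2": 950, "v3": 30}

total = sum(traffic_by_version.values())
# The most-called version tells you which one users have adopted.
most_used = max(traffic_by_version, key=traffic_by_version.get)
# Versions under an arbitrary 5% of total traffic are candidates
# for deactivation or auto sleep.
low_traffic = sorted(v for v, n in traffic_by_version.items() if n / total < 0.05)
```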
Active model endpoints consume Fabric Capacity Units (CUs) based on incoming traffic. Endpoints can automatically scale up to three compute nodes when traffic increases. Monitoring traffic patterns helps you:
- Identify low-traffic endpoints that might benefit from the auto sleep feature to reduce capacity consumption.
- Detect traffic spikes that could affect endpoint latency or capacity usage.
- Determine the right time to deactivate endpoints for model versions no longer in use.
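One illustrative way to spot the spikes mentioned above is to flag periods whose request count exceeds the mean by more than two standard deviations. This is a simple heuristic on made-up sample data, not how Fabric itself measures traffic or capacity:

```python
from statistics import mean, stdev

# Made-up hourly request counts for one endpoint.
hourly_requests = [40, 35, 42, 38, 41, 37, 300, 39]

# Flag hours whose traffic exceeds the mean by more than two standard
# deviations -- a basic anomaly heuristic, not Fabric's own logic.
threshold = mean(hourly_requests) + 2 * stdev(hourly_requests)
spike_hours = [i for i, n in enumerate(hourly_requests) if n > threshold]
```

Here only the seventh hour (index 6, with 300 requests) crosses the threshold; a spike like that is worth correlating with endpoint latency and capacity usage.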
For detailed information about endpoint capacity consumption and billing, see machine learning model endpoint consumption rates.
> [!TIP]
> Use the Fabric Capacity Metrics app to view total capacity usage for model endpoint operations. Model endpoint operations appear under the item name "Model Endpoint" in the metrics app.