This guide explains how to perform data quality assessment, understand quality metrics, and interpret quality reports.
The Data Quality page provides a comprehensive assessment of your sensor data quality, including:
- Completeness analysis
- Accuracy assessment
- Consistency checks
- Outlier detection
- Correlation analysis
- Overall quality scores
Data quality refers to the fitness of data for its intended use, measured by:
- Completeness: How much data is present
- Accuracy: How correct the data is
- Consistency: How uniform the data is
- Timeliness: How current the data is
- Validity: How well data conforms to rules
Assessing data quality matters for:
- Reliability: Ensure analysis results are trustworthy
- Decision Making: Make informed decisions based on quality data
- Compliance: Meet data quality standards
- Maintenance: Identify sensors needing attention
- Optimization: Improve data collection processes
- Open the application at http://localhost:5173
- Click Data Quality in the sidebar or home page
- Choose a machine group from the Select Machine Group dropdown
- The system loads available sensors automatically
- Set Date From to filter start date
- Set Date To to filter end date
- Click Apply Filter to update assessment
- Use Clear Filter to remove date restrictions
Benefits of Date Filtering:
- Focus on specific time periods
- Compare quality across time periods
- Improve performance for large datasets
- Click Assess Data Quality button
- Wait for analysis to complete
- Results appear in multiple sections
Data Overview:
- Date Range: Start and end dates of analyzed data
- Total Data Points: Expected number of readings
- Total Missing Values: Count of missing readings
- Missing Percentage: Overall completeness
- Number of Sensors: Sensors analyzed
Interpretation:
- Low missing percentage (< 5%): Excellent completeness
- Moderate missing (5-10%): Good, minor gaps
- High missing (> 10%): Concerns, investigate
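The missing-percentage calculation and the interpretation bands above can be sketched as a small helper. The `(timestamp, value)` reading format with `None` marking a missing value is an assumption for illustration, not the application's actual data format.

```python
# Sketch: overall missing percentage plus the guide's interpretation bands.
# Assumes readings arrive as (timestamp, value) pairs, with None for a
# missing reading -- an illustrative format, not the app's API.

def missing_percentage(readings):
    """Percentage of readings whose value is missing."""
    if not readings:
        return 0.0
    missing = sum(1 for _, value in readings if value is None)
    return 100.0 * missing / len(readings)

def completeness_band(pct_missing):
    """Map a missing percentage to the interpretation bands above."""
    if pct_missing < 5:
        return "excellent"
    if pct_missing <= 10:
        return "good, minor gaps"
    return "concerns, investigate"

readings = [(0, 1.2), (1, None), (2, 1.4), (3, 1.3)]
print(missing_percentage(readings))   # 25.0
print(completeness_band(25.0))        # concerns, investigate
```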
Per-Sensor Statistics:
- Count: Number of data points
- Mean: Average value
- Standard Deviation: Data spread
- Min/Max: Minimum and maximum values
- Quartiles: 25th, 50th (median), 75th percentiles
Use Cases:
- Understand data distribution
- Identify outliers
- Compare sensors
- Detect anomalies
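The per-sensor statistics listed above can be reproduced with the Python standard library. The dictionary keys mirror the report fields, but the exact names are an assumption here.

```python
# Sketch: per-sensor summary statistics matching the report fields above,
# using only the standard library. Key names are illustrative.
import statistics

def sensor_summary(values):
    ordered = sorted(values)
    q25, q50, q75 = statistics.quantiles(ordered, n=4)
    return {
        "count": len(ordered),
        "mean": statistics.mean(ordered),
        "std": statistics.stdev(ordered),
        "min": ordered[0],
        "max": ordered[-1],
        "q25": q25,
        "q50": q50,   # median
        "q75": q75,
    }

summary = sensor_summary([10.0, 12.0, 11.0, 13.0, 50.0])
print(summary["count"], summary["q50"], summary["max"])  # 5 12.0 50.0
```

A value like 50.0 sitting far above the quartiles is exactly the kind of distributional hint that feeds the outlier checks later in the report.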
Completeness Metrics:
- Overall Completeness: Percentage of complete data
- Completeness Threshold: Minimum acceptable level
- Incomplete Sensors: Sensors below threshold
Interpretation:
- > 95%: Excellent completeness
- 90-95%: Good completeness
- 85-90%: Acceptable, monitor
- < 85%: Poor, take action
Action Items:
- Investigate incomplete sensors
- Check data collection system
- Review sensor health
- Plan improvements
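Flagging incomplete sensors, as described above, amounts to comparing each sensor's completeness against the threshold. The 95% default and the sensor names below are made-up examples.

```python
# Sketch: listing sensors whose completeness falls below the threshold.
# The 95.0 default and the sensor names are illustrative assumptions.

def incomplete_sensors(completeness_by_sensor, threshold=95.0):
    """Return sensors whose completeness percentage is below the threshold."""
    return [name for name, pct in completeness_by_sensor.items()
            if pct < threshold]

print(incomplete_sensors({"temp_1": 99.2, "vib_3": 87.5, "rpm_2": 93.0}))
# ['vib_3', 'rpm_2']
```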
Consistency Metrics:
- Has Duplicates: Whether duplicate records exist
- Duplicate Count: Number of duplicate records
- Duplicate Percentage: Percentage of duplicates
- Timestamps Consistent: Whether timestamps are valid
Interpretation:
- No duplicates: Good consistency
- Few duplicates (< 1%): Acceptable
- Many duplicates (> 1%): Data quality issue
- Inconsistent timestamps: System issue
Action Items:
- Remove duplicates if needed
- Fix timestamp issues
- Review data collection process
- Improve data validation
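The duplicate and timestamp checks above can be sketched as follows, assuming records are `(timestamp, sensor, value)` tuples; this is a hypothetical helper, not the application's code.

```python
# Sketch: the consistency metrics above, computed over records assumed
# to be (timestamp, sensor, value) tuples.

def consistency_report(records):
    total = len(records)
    duplicate_count = total - len(set(records))  # exact repeated rows
    timestamps = [ts for ts, _, _ in records]
    return {
        "has_duplicates": duplicate_count > 0,
        "duplicate_count": duplicate_count,
        "duplicate_percentage": 100.0 * duplicate_count / total if total else 0.0,
        # consistent here means monotonically non-decreasing timestamps
        "timestamps_consistent": timestamps == sorted(timestamps),
    }

records = [(1, "temp_1", 20.0), (2, "temp_1", 20.5), (2, "temp_1", 20.5)]
report = consistency_report(records)
print(report["duplicate_count"], report["timestamps_consistent"])  # 1 True
```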
Outlier Information:
- Sensor Name: Which sensor has outliers
- Outlier Percentage: Percentage of outlier values
Outlier Detection Method:
- Uses statistical methods (Z-score, IQR)
- Identifies values significantly different from normal
- Helps detect sensor malfunctions
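One common form of the IQR method mentioned above looks like this. The 1.5 × IQR fence is the conventional multiplier; the multiplier the system actually uses is not specified in this guide.

```python
# Sketch of IQR-based outlier detection, one of the statistical methods
# mentioned above. The 1.5 * IQR fence is an assumed, conventional choice.
import statistics

def iqr_outlier_percentage(values):
    q1, _, q3 = statistics.quantiles(sorted(values), n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [v for v in values if v < low or v > high]
    return 100.0 * len(outliers) / len(values)

values = [10, 11, 10, 12, 11, 10, 95]  # one obvious spike
print(round(iqr_outlier_percentage(values), 1))  # 14.3
```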
Interpretation:
- Low outlier rate (< 2%): Normal variation
- Moderate outliers (2-5%): Some concerns
- High outliers (> 5%): Significant issues
Action Items:
- Investigate high-outlier sensors
- Check sensor calibration
- Review operating conditions
- Consider sensor replacement
Accuracy Metrics:
- Sensor Name: Sensor with accuracy issues
- Issues Percentage: Percentage of inaccurate readings
- Threshold Type: Type of threshold violation
- Low/High Threshold: Threshold values
Accuracy Assessment:
- Based on threshold violations
- Identifies readings outside valid ranges
- Helps assess sensor accuracy
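The threshold-violation count behind these accuracy metrics reduces to a range check per reading. The low/high limits below are made-up example values, not real sensor thresholds.

```python
# Sketch: percentage of readings outside the configured valid range,
# as described above. The 0-30 limits are illustrative.

def accuracy_issues_percentage(values, low, high):
    """Percentage of readings outside the [low, high] valid range."""
    if not values:
        return 0.0
    issues = sum(1 for v in values if v < low or v > high)
    return 100.0 * issues / len(values)

temps = [20.1, 19.8, 20.5, 35.2, 20.0]   # assumed valid range 0-30
print(accuracy_issues_percentage(temps, low=0.0, high=30.0))  # 20.0
```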
Interpretation:
- Low issue rate (< 5%): Good accuracy
- Moderate issues (5-15%): Some concerns
- High issues (> 15%): Poor accuracy
Action Items:
- Check sensor calibration
- Verify threshold settings
- Review sensor placement
- Consider recalibration
Correlation Information:
- Strong Correlations: Pairs of highly correlated sensors
- Correlation Coefficient: Strength of relationship (-1 to +1)
- Correlation Matrix: Full correlation matrix
Correlation Interpretation:
- +1.0: Perfect positive correlation
- +0.7 to +1.0: Strong positive correlation
- -0.7 to -1.0: Strong negative correlation
- -1.0: Perfect negative correlation
- -0.3 to +0.3: Weak correlation
Use Cases:
- Find related sensors
- Understand sensor dependencies
- Detect redundant measurements
- Identify sensor groups
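The correlation coefficient and the strength bands above can be sketched with the standard library; the sensor names are illustrative.

```python
# Sketch: Pearson correlation between two sensors and the strength bands
# described above. Implemented from the definition; sensor names made up.
import math

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def strength(r):
    if abs(r) >= 0.7:
        return "strong"
    if abs(r) <= 0.3:
        return "weak"
    return "moderate"

motor_temp = [40.0, 42.0, 44.0, 46.0]
bearing_temp = [35.0, 36.5, 38.0, 39.5]   # moves in lockstep
print(strength(pearson(motor_temp, bearing_temp)))  # strong
```

A strong correlation like this is what the report surfaces as a candidate redundant or dependent sensor pair.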
The system provides an overall quality assessment based on:
- Completeness metrics
- Accuracy metrics
- Consistency metrics
- Outlier rates
Quality Levels:
- Excellent: All metrics within acceptable ranges
- Good: Minor issues, generally acceptable
- Fair: Some concerns, monitor closely
- Poor: Significant issues, take action
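One plausible way the four metric families could roll up into these quality levels is to count how many families fall outside the acceptable ranges given earlier in this guide. The actual scoring rule is internal to the system; this counting scheme is an assumption for illustration.

```python
# Sketch: an assumed roll-up of the four metric families into the quality
# levels above, using the acceptable ranges quoted earlier in this guide.
# The real scoring rule may differ.

def flagged_families(missing_pct, duplicate_pct, outlier_pct, issue_pct):
    """Metric families outside the guide's acceptable ranges."""
    flags = []
    if missing_pct > 10:
        flags.append("completeness")
    if duplicate_pct > 1:
        flags.append("consistency")
    if outlier_pct > 5:
        flags.append("outliers")
    if issue_pct > 15:
        flags.append("accuracy")
    return flags

def quality_level(flags):
    return ("Excellent", "Good", "Fair", "Poor", "Poor")[len(flags)]

print(quality_level(flagged_families(2.0, 0.0, 1.0, 3.0)))    # Excellent
print(quality_level(flagged_families(12.0, 2.0, 6.0, 20.0)))  # Poor
```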
Summary Section:
- Overall quality score
- Key findings
- Recommendations
Detailed Metrics:
- Per-sensor statistics
- Completeness details
- Accuracy details
- Consistency details
Visualizations:
- Charts showing quality metrics
- Comparisons across sensors
- Trends over time (if date filtered)
- Regular Assessment: Check quality periodically
- Set Standards: Define acceptable quality levels
- Monitor Trends: Track quality over time
- Take Action: Address identified issues
- Document Findings: Record quality assessments
- Compare Periods: Use date filters to compare
- Select Appropriate Date Range: Focus on relevant time period
- Ensure Data Loaded: Verify data is available
- Understand Context: Know expected quality levels
- Review All Sections: Check all quality metrics
- Identify Issues: Note sensors with problems
- Compare Sensors: Look for patterns
- Document Findings: Record quality scores
- Prioritize Issues: Focus on most critical problems
- Plan Improvements: Address identified issues
- Track Progress: Re-assess after improvements
Symptoms: High missing percentage
Causes: Sensor malfunctions, data collection issues
Actions: Investigate sensors, check data collection system

Symptoms: Many outlier values
Causes: Sensor calibration issues, environmental factors
Actions: Check calibration, review operating conditions

Symptoms: Many threshold violations
Causes: Incorrect thresholds, sensor drift
Actions: Verify thresholds, check sensor calibration

Symptoms: Duplicates, inconsistent timestamps
Causes: Data collection problems, system issues
Actions: Fix data collection, improve validation
Export Options:
- Screenshot: Capture quality assessment results
- Data Export: Export metrics as CSV (if available)
- Report: Generate quality report document
Problem: Quality assessment is slow
Solutions:
- Use date range filters
- Reduce data volume
- Check backend performance
- Verify database connection
Problem: Results don't match expectations
Solutions:
- Verify date range
- Check sensor selection
- Review data source
- Compare with other analyses
Problem: Some metrics not available
Solutions:
- Ensure sufficient data
- Check sensor metadata
- Verify threshold definitions
- Review error messages
After quality assessment:
- Address Issues: Fix identified quality problems
- Monitor Progress: Re-assess after improvements
- Visualize Data: Use Data Visualization Guide
- Chat with Agent: Ask DQA Agent about quality
- Missing Values Guide - Missing data analysis
- Invalid Values Guide - Invalid readings analysis
- Data Visualization Guide - Data exploration
For technical details, see the Backend API Documentation.