isrmltoolkit is a Python toolkit designed to detect, handle, and visualize outliers in real-world datasets. Built for data scientists and ML practitioners, it combines statistical methods and machine learning algorithms into a simple, unified workflow.
- ✅ Clean your data
- ✅ Improve model performance
- ✅ Understand anomalies
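The IQR (interquartile range) method is a robust statistical approach based on the spread of the middle 50% of the data, which makes it resistant to extreme values. With $IQR = Q_3 - Q_1$, the conventional bounds are:

$$\left[\,Q_1 - 1.5 \times IQR,\; Q_3 + 1.5 \times IQR\,\right]$$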
Values outside these bounds are flagged as outliers.
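A minimal sketch of this bounds check in plain Python, using the standard library's `statistics.quantiles`. It assumes the conventional Tukey fences `Q1 - 1.5 * IQR` and `Q3 + 1.5 * IQR`. The helper name `iqr_outliers` and the multiplier `k=1.5` are illustrative and not taken from isrmltoolkit's API.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return the values that fall outside [Q1 - k*IQR, Q3 + k*IQR].

    Hypothetical helper for illustration; k=1.5 is the conventional
    Tukey multiplier, assumed here rather than taken from isrmltoolkit.
    """
    # "inclusive" interpolates between data points like NumPy's default percentile
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


data = [10, 12, 11, 13, 12, 11, 14, 95]
print(iqr_outliers(data))  # [95]
```

Because the quartiles depend on ranks rather than magnitudes, the extreme value 95 does not drag the fences toward itself.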
The Z-score measures how many standard deviations a value lies from the mean:

$$Z = \frac{X - \mu}{\sigma}$$

- $X$: Observed value
- $\mu$: Mean
- $\sigma$: Standard deviation
Typically, $|Z| > 3$ indicates an outlier.
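A minimal sketch of Z-score flagging with the standard library: it computes Z = (X - μ) / σ for every value and keeps those whose |Z| exceeds a threshold. The helper `zscore_outliers` is hypothetical, and using the population standard deviation (`statistics.pstdev`) is an assumption; isrmltoolkit may use the sample version instead.

```python
import statistics


def zscore_outliers(values, threshold=3.0):
    """Return the values whose |Z| exceeds threshold (hypothetical helper)."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)  # population standard deviation
    if sigma == 0:
        return []  # all values identical: nothing deviates
    return [v for v in values if abs((v - mu) / sigma) > threshold]


readings = [10, 11, 9, 10, 12, 10, 9, 11, 10, 10,
            11, 9, 10, 12, 8, 10, 11, 9, 10, 40]
print(zscore_outliers(readings))  # [40]
```

One caveat: with the population standard deviation, the largest |Z| any point can reach among n values is √(n - 1), so the |Z| > 3 rule can never fire on 10 or fewer values. The outlier also inflates σ itself, which is one reason rank-based rules such as IQR are often preferred on small or heavily contaminated samples.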
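The Mahalanobis distance captures multivariate outliers by taking the correlations between features into account:

$$D_M(x) = \sqrt{(x - \mu)^\top \Sigma^{-1} (x - \mu)}$$

- $x$: Observation (feature vector)
- $\mu$: Mean vector
- $\Sigma^{-1}$: Inverse covariance matrix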
High distance values indicate anomalous observations in multi-dimensional space.
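A minimal two-feature sketch in plain Python. Restricting it to two features lets the 2×2 covariance inverse be written out by hand; a real implementation would use a linear-algebra library. Flagging points whose squared distance exceeds a chi-square quantile is a common convention assumed here, and the closed form `-2 * ln(1 - p)` for that quantile is valid only for exactly two degrees of freedom. Both function names are hypothetical.

```python
import math
from statistics import mean


def mahalanobis_2d(points):
    """Mahalanobis distance of each (x, y) point from the sample mean."""
    n = len(points)
    mx = mean(x for x, _ in points)
    my = mean(y for _, y in points)
    # Sample covariance entries (n - 1 denominator)
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    det = sxx * syy - sxy ** 2
    if det == 0:
        raise ValueError("covariance matrix is singular")
    distances = []
    for x, y in points:
        dx, dy = x - mx, y - my
        # Inverse of [[sxx, sxy], [sxy, syy]] is [[syy, -sxy], [-sxy, sxx]] / det
        d2 = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
        distances.append(math.sqrt(d2))
    return distances


def mahalanobis_outliers_2d(points, p=0.975):
    """Indices of points whose squared distance exceeds the chi-square(2) quantile."""
    cutoff = -2 * math.log(1 - p)  # chi-square quantile, two degrees of freedom only
    return [i for i, d in enumerate(mahalanobis_2d(points)) if d * d > cutoff]


points = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 6), (6, 5),
          (7, 8), (8, 7), (9, 10), (10, 9), (2, 10)]
print(mahalanobis_outliers_2d(points))  # [10]
```

The point (2, 10) is unremarkable on each axis alone (both coordinates fall inside the range of the other points), but it breaks the strong correlation between x and y. It gets the largest distance (D² ≈ 7.76) and is the only point above the 97.5% cutoff of about 7.38.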
Isolation Forest is a tree-based ML algorithm that isolates anomalies directly instead of profiling normal data.
- Uses random feature splits.
- Fewer splits needed to isolate a point $\rightarrow$ higher anomaly likelihood.
- Highly efficient for high-dimensional datasets.
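A compact, from-scratch sketch of the Isolation Forest idea in plain Python, following the scoring rule of the original Isolation Forest paper: random axis-aligned splits, a depth limit of ⌈log2 ψ⌉ for subsample size ψ, and an anomaly score s = 2^(-E[h] / c(ψ)), where E[h] is the mean path length and c(ψ) is the average path length of an unsuccessful binary-search-tree lookup. This is not isrmltoolkit's implementation. All names are hypothetical, and the defaults (100 trees, subsample size 256) are assumptions borrowed from the paper. For production use, scikit-learn's `sklearn.ensemble.IsolationForest` is the usual choice.

```python
import math
import random

EULER_GAMMA = 0.5772156649


def average_path_length(n):
    """c(n): average path length of an unsuccessful BST search among n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2 * (math.log(n - 1) + EULER_GAMMA) - 2 * (n - 1) / n


def build_tree(points, depth, max_depth, rng):
    """Recursively split points on a random feature at a random threshold."""
    if len(points) <= 1 or depth >= max_depth:
        return {"size": len(points)}
    # Only features that still vary within this node can separate its points
    dims = [d for d in range(len(points[0]))
            if min(p[d] for p in points) < max(p[d] for p in points)]
    if not dims:
        return {"size": len(points)}  # duplicate points cannot be isolated further
    dim = rng.choice(dims)
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    split = rng.uniform(lo, hi)
    return {
        "dim": dim,
        "split": split,
        "left": build_tree([p for p in points if p[dim] < split], depth + 1, max_depth, rng),
        "right": build_tree([p for p in points if p[dim] >= split], depth + 1, max_depth, rng),
    }


def path_length(point, node):
    """Depth at which point reaches a leaf, plus c(size) for the unsplit remainder."""
    depth = 0
    while "split" in node:
        node = node["left"] if point[node["dim"]] < node["split"] else node["right"]
        depth += 1
    return depth + average_path_length(node["size"])


def isolation_forest_scores(points, n_trees=100, sample_size=256, seed=0):
    """Anomaly score in (0, 1) per point; values near 1 suggest anomalies."""
    if len(points) < 2:
        raise ValueError("need at least two points")
    rng = random.Random(seed)
    psi = min(sample_size, len(points))
    max_depth = math.ceil(math.log2(psi))
    trees = [build_tree(rng.sample(points, psi), 0, max_depth, rng)
             for _ in range(n_trees)]
    norm = average_path_length(psi)
    scores = []
    for p in points:
        mean_h = sum(path_length(p, t) for t in trees) / n_trees
        scores.append(2 ** (-mean_h / norm))
    return scores


grid = [(x, y) for x in range(5) for y in range(5)]
data = grid + [(20, 20)]
scores = isolation_forest_scores(data)
print(max(range(len(data)), key=scores.__getitem__))  # 25, the point (20, 20)
```

Most random thresholds between 0 and 20 cut (20, 20) off from the grid in a single split, so its average path is short and its score is high. In the original paper, scores close to 1 indicate anomalies, while scores well below 0.5 indicate normal points.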
Winsorization caps extreme values at chosen lower and upper percentiles, reducing their impact without removing any observations.
Useful for stabilizing distributions and improving model robustness.
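A minimal sketch of percentile capping with the standard library. The default bounds (5th and 95th percentiles) and the helper name `winsorize_values` are assumptions for illustration, and isrmltoolkit's own defaults may differ. SciPy also ships a ready-made `scipy.stats.mstats.winsorize`.

```python
import statistics


def winsorize_values(values, lower_pct=5, upper_pct=95):
    """Cap values at the given integer percentiles (hypothetical helper)."""
    if not 1 <= lower_pct < upper_pct <= 99:
        raise ValueError("need 1 <= lower_pct < upper_pct <= 99")
    # 99 cut points; cuts[k - 1] is the k-th percentile (linear interpolation)
    cuts = statistics.quantiles(values, n=100, method="inclusive")
    low, high = cuts[lower_pct - 1], cuts[upper_pct - 1]
    # Clip instead of dropping: every observation is kept
    return [min(max(v, low), high) for v in values]


sample = [2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 120]
print(winsorize_values(sample, 10, 90))  # 2 is raised to 3.0, 120 is lowered to 7.0
```

The output has the same length as the input, so the extreme value 120 still contributes a data point, just no longer an extreme one.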
- Multiple Strategies: Diverse outlier detection methods in one place.
- Real-world Ready: Designed for noisy, complex datasets.
- Pipeline Friendly: Built for both exploratory analysis and preprocessing.
- Evolving: Actively updated with new features and algorithms.
```bash
pip install isrmltoolkit
```