AstroAirafar/user-behavior-anomaly-detection

🛡️ User Behavior Anomaly Detection using Machine Learning

Overview

This project implements an end-to-end, production-style User Behavior Anomaly Detection (UBA) system using unsupervised machine learning.

The system learns normal user behavior patterns from historical activity data and flags user events that deviate significantly from those learned baselines, without relying on labeled anomaly data.

It is designed and implemented as a deployable inference service, not just a notebook experiment.

Live Deployment:


Problem Statement

Modern digital systems generate continuous streams of user activity events such as logins, clicks, session durations, and access times. Rule-based monitoring systems struggle to detect subtle, previously unseen behavioral anomalies, especially when labeled attack data is unavailable.

Goal: Build a data-driven system that learns normal behavior and flags anomalous user activity using unsupervised ML, exposed via a production-ready API.


Key Features

  • Unsupervised anomaly detection (no labels required)
  • Behavioral feature engineering (event-level + user-level)
  • Isolation Forest–based anomaly scoring
  • Percentile-based anomaly thresholding
  • REST API for real-time inference
  • Fully Dockerized Linux deployment
  • HTTPS-enabled reverse proxy (Nginx)
  • Clean, modular, industry-grade project structure

System Architecture

User Events (CSV / JSON)
        ↓
Feature Engineering
        ↓
StandardScaler
        ↓
Isolation Forest Model
        ↓
Anomaly Score
        ↓
Thresholding Logic
        ↓
FastAPI Inference Service
        ↓
Docker Container
        ↓
Nginx (HTTPS)

Dataset

Dataset Characteristics

  • ~70,000 user activity events
  • ~500 users
  • ~21 days of activity
  • ~2% embedded anomalous behavior
  • Synthetic but realistic SaaS-style data

Event Schema

Each row represents a single user activity event:

Field                  Description
---------------------  ----------------------------------------
user_id                Unique user identifier
event_type             login, logout, view, click, download
session_duration_sec   Duration of user session (seconds)
events_in_session      Number of actions in the session
hour_of_day            0–23
day_of_week            0–6
device_type            desktop / mobile

⚠️ No anomaly labels are used during training.
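The schema above can be expressed as a small validated record type. The following is a minimal sketch using only the standard library; the class name `UserEvent` and the specific validation rules are assumptions for illustration, not the repository's actual code.

```python
from dataclasses import dataclass

# Allowed values, taken from the event schema table.
EVENT_TYPES = {"login", "logout", "view", "click", "download"}
DEVICE_TYPES = {"desktop", "mobile"}


@dataclass(frozen=True)
class UserEvent:
    """One user activity event (hypothetical class name)."""

    user_id: int
    event_type: str
    session_duration_sec: float
    events_in_session: int
    hour_of_day: int
    day_of_week: int
    device_type: str

    def __post_init__(self):
        # Reject values outside the ranges listed in the schema.
        if self.event_type not in EVENT_TYPES:
            raise ValueError(f"unknown event_type: {self.event_type!r}")
        if self.device_type not in DEVICE_TYPES:
            raise ValueError(f"unknown device_type: {self.device_type!r}")
        if not 0 <= self.hour_of_day <= 23:
            raise ValueError("hour_of_day must be in 0..23")
        if not 0 <= self.day_of_week <= 6:
            raise ValueError("day_of_week must be in 0..6")
        if self.session_duration_sec < 0 or self.events_in_session < 0:
            raise ValueError("durations and counts must be non-negative")


event = UserEvent(123, "login", 18000, 190, 2, 1, "desktop")
```

Validating at the boundary keeps malformed events out of feature engineering, where an unexpected category or hour would otherwise surface as a harder-to-trace error.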


Feature Engineering Strategy

Event-Level Features

  • Session duration
  • Events per session
  • Time-of-day
  • Day-of-week
  • Encoded event type
  • Encoded device type
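As an illustration of the event-level features listed above, the sketch below turns one event into a numeric vector. The README does not say which encoding is used for the categorical fields; one-hot encoding is assumed here, and the helper names `one_hot` and `event_to_vector` are hypothetical.

```python
# Category order fixes the position of each one-hot column.
EVENT_TYPES = ["login", "logout", "view", "click", "download"]
DEVICE_TYPES = ["desktop", "mobile"]


def one_hot(value, categories):
    """Return a list with 1.0 at the position of `value` and 0.0 elsewhere."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [1.0 if value == c else 0.0 for c in categories]


def event_to_vector(event):
    """Flatten one event dict into a list of floats (assumed feature order)."""
    numeric = [
        float(event["session_duration_sec"]),
        float(event["events_in_session"]),
        float(event["hour_of_day"]),
        float(event["day_of_week"]),
    ]
    return (
        numeric
        + one_hot(event["event_type"], EVENT_TYPES)
        + one_hot(event["device_type"], DEVICE_TYPES)
    )


vector = event_to_vector({
    "user_id": 123,
    "event_type": "login",
    "session_duration_sec": 18000,
    "events_in_session": 190,
    "hour_of_day": 2,
    "day_of_week": 1,
    "device_type": "desktop",
})
```

The `user_id` field is deliberately left out of the vector: it identifies the user for baseline lookup but carries no numeric meaning of its own.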

User-Level Behavioral Baselines

Computed per user and merged back:

  • Average session duration
  • Session duration variability
  • Average events per session
  • Typical active hours

Deviation Features (Key Insight)

  • Session duration deviation from user baseline
  • Event count deviation
  • Time-of-day deviation

These features let the model detect contextual anomalies (events that are unusual for a specific user even when they look ordinary across the whole population), not just global outliers.
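The baseline-and-deviation idea can be sketched with the standard library alone. The function name, the output field names, and the exact deviation formulas below are assumptions; the repository's feature code may differ. Averaging `hour_of_day` is also a simplification, since hours are circular.

```python
from collections import defaultdict
from statistics import mean, pstdev


def add_deviation_features(events):
    """Return copies of `events` with per-user deviation fields added."""
    by_user = defaultdict(list)
    for e in events:
        by_user[e["user_id"]].append(e)

    # Per-user behavioral baselines.
    baselines = {}
    for user_id, rows in by_user.items():
        durations = [r["session_duration_sec"] for r in rows]
        baselines[user_id] = {
            "avg_duration": mean(durations),
            "std_duration": pstdev(durations),
            "avg_events": mean(r["events_in_session"] for r in rows),
            "typical_hour": mean(r["hour_of_day"] for r in rows),
        }

    enriched = []
    for e in events:
        b = baselines[e["user_id"]]
        std = b["std_duration"] or 1.0  # avoid division by zero for constant users
        hour_gap = abs(e["hour_of_day"] - b["typical_hour"])
        enriched.append({
            **e,
            "duration_dev": (e["session_duration_sec"] - b["avg_duration"]) / std,
            "events_dev": e["events_in_session"] - b["avg_events"],
            "hour_dev": min(hour_gap, 24 - hour_gap),  # wrap around midnight
        })
    return enriched


sample = [
    {"user_id": 1, "session_duration_sec": 100, "events_in_session": 10, "hour_of_day": 10},
    {"user_id": 1, "session_duration_sec": 300, "events_in_session": 20, "hour_of_day": 12},
]
features = add_deviation_features(sample)
```

With these fields, a 5-hour session is scored relative to the user's own history rather than against every user's sessions pooled together.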


Model

Algorithm

  • Isolation Forest (unsupervised)

Why Isolation Forest?

  • No labeled anomalies required
  • Designed for rare event detection
  • Scales well to large datasets
  • Widely used in industry UBA systems

Training Details

  • Feature scaling with StandardScaler
  • ~200 trees
  • Contamination ≈ 2%
  • Model outputs continuous anomaly scores
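A training run with these settings might look like the sketch below, which assumes scikit-learn and NumPy are installed. The feature matrix is random stand-in data, not the project's dataset, and negating `score_samples` so that higher means more anomalous is an assumed convention; the README does not state which sign the service uses.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)

# Stand-in feature matrix: 980 typical rows plus 20 extreme rows (about 2%).
normal = rng.normal(loc=0.0, scale=1.0, size=(980, 4))
extreme = rng.normal(loc=8.0, scale=1.0, size=(20, 4))
X = np.vstack([normal, extreme])

# Scale features to zero mean and unit variance before fitting.
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# 200 trees and 2% contamination, matching the training details above.
forest = IsolationForest(n_estimators=200, contamination=0.02, random_state=42)
forest.fit(X_scaled)

# score_samples is higher for more normal rows; negate it so that a
# larger value means "more anomalous" (assumed convention).
anomaly_scores = -forest.score_samples(X_scaled)
```

At inference time the same fitted `scaler` must be applied to incoming events before scoring, which is why both the scaler and the model would need to be persisted together.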

Anomaly Decision Logic

Instead of relying directly on model labels, the system uses:

  • Percentile-based thresholding on anomaly scores
  • Example: top 2% most anomalous events

This mirrors real production systems, where thresholds are configurable based on operational tolerance.
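Percentile-based thresholding can be illustrated as follows. This is a minimal sketch using the nearest-rank method and assuming higher scores mean more anomalous; the function names are hypothetical and the repository may compute the cutoff differently.

```python
import math


def percentile_threshold(scores, percentile=0.98):
    """Return the score at `percentile` (0..1) using the nearest-rank method."""
    if not scores:
        raise ValueError("scores must not be empty")
    ordered = sorted(scores)
    # round() guards against floating-point noise such as 7.000000000000001.
    rank = max(1, math.ceil(round(percentile * len(ordered), 9)))
    return ordered[rank - 1]


def flag_anomalies(scores, threshold):
    """Flag every score strictly above the threshold."""
    return [s > threshold for s in scores]


training_scores = list(range(1, 101))  # stand-in scores 1..100
threshold = percentile_threshold(training_scores, 0.98)
flags = flag_anomalies(training_scores, threshold)
```

Because the cutoff is derived from the score distribution rather than fixed in the model, changing `percentile` adjusts alert volume without retraining.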


API Design

Endpoint

POST /predict

Request Example

{
  "user_id": 123,
  "event_type": "login",
  "session_duration_sec": 18000,
  "events_in_session": 190,
  "hour_of_day": 2,
  "day_of_week": 1,
  "device_type": "desktop"
}

Response Example

{
  "anomaly_score": 0.048707,
  "is_anomaly": true,
  "threshold_percentile": 0.98
}
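A client call to `POST /predict` could be built with the standard library as sketched below. The base URL `http://localhost:8000` matches the local run instructions; the helper name `build_predict_request` is hypothetical, and the request is only constructed here, not sent.

```python
import json
import urllib.request


def build_predict_request(base_url, event):
    """Build (but do not send) a JSON POST request for the /predict endpoint."""
    body = json.dumps(event).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


event = {
    "user_id": 123,
    "event_type": "login",
    "session_duration_sec": 18000,
    "events_in_session": 190,
    "hour_of_day": 2,
    "day_of_week": 1,
    "device_type": "desktop",
}
request = build_predict_request("http://localhost:8000", event)

# To send it against a running service:
# with urllib.request.urlopen(request, timeout=10) as response:
#     result = json.loads(response.read())
```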

Additional Endpoint

GET /health

Running the Project

1️⃣ Local Setup (Without Docker)

python -m venv .venv
# Windows:
.venv\Scripts\activate
# Linux / macOS (instead of the line above):
source .venv/bin/activate
pip install -r requirements.txt
uvicorn api.main:app --reload

Open:

http://127.0.0.1:8000/docs

2️⃣ Run with Docker (Recommended)

Build Image

docker build -t user-behavior-anomaly-api .

Run Container

docker run -p 8000:8000 user-behavior-anomaly-api

Open:

http://localhost:8000/docs

Production Deployment

  • Cloud Provider: DigitalOcean
  • OS: Ubuntu LTS
  • Containerization: Docker
  • Reverse Proxy: Nginx
  • TLS: Let’s Encrypt (manual DNS challenge)
  • Restart Policy: unless-stopped

The service is exposed securely at:

https://api.alturawing.tech

Project Structure

user-behavior-anomaly-detection/
├── api/                # FastAPI inference service
├── artifacts/          # Trained model & scaler (ignored by Git)
├── data/               # Raw & processed datasets
├── notebooks/          # EDA & training notebooks
├── src/
│   ├── features/       # Feature engineering logic
│   ├── models/         # Training & prediction utilities
│   └── utils/          # Dataset generation
├── Dockerfile
├── requirements.txt
└── README.md

Limitations

  • Synthetic dataset (not real production logs)
  • Single-event inference (no batch API yet)
  • No real-time streaming ingestion
  • Threshold calibrated offline

These are intentional design choices for clarity and focus.


Future Improvements

  • Batch inference endpoint
  • User-adaptive thresholds
  • Time-window aggregation
  • Streaming ingestion (Kafka)
  • Model monitoring & drift detection
  • Multi-project hosting strategy

Why This Project Matters

This project demonstrates the ability to:

  • Build ML systems without labeled data
  • Engineer meaningful behavioral features
  • Balance ML performance with operational constraints
  • Deploy models as real services
  • Think beyond notebooks into production systems

It is designed to be resume-ready, interview-defensible, and extensible.


Author

AlturaWing

About

Unsupervised ML system detecting anomalous user activity, deployed as a FastAPI service with Docker and Nginx.
