deepanshu-Raj/CUDA-AKMeans

CUDA-Accelerated Multi-Restart K-Means Clustering

This project implements a CUDA-accelerated multi-restart K-Means clustering system. It compares a CPU baseline against two GPU execution modes: serial GPU restarts and CUDA-streamed GPU restarts. The main GPU computation is implemented using custom CUDA kernels for point assignment, shared-memory partial aggregation, centroid update, and objective computation.
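The four stages named above (point assignment, partial aggregation, centroid update, objective computation) can be sketched on the CPU as follows. This is a minimal pure-Python illustration of multi-restart K-Means, not the project's CUDA code. The function names, the random-sample initialization, and the rule that an empty cluster keeps its previous centroid are all assumptions of this sketch.

```python
import random


def assign_points(points, centroids):
    """Assignment step: index of the nearest centroid for each point."""
    labels = []
    for p in points:
        dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
        labels.append(dists.index(min(dists)))
    return labels


def update_centroids(points, labels, centroids):
    """Aggregation and update: per-cluster sums and counts, then the mean."""
    k, dim = len(centroids), len(centroids[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for p, j in zip(points, labels):
        counts[j] += 1
        for d in range(dim):
            sums[j][d] += p[d]
    new_centroids = []
    for j in range(k):
        if counts[j]:
            new_centroids.append([s / counts[j] for s in sums[j]])
        else:
            # Assumption of this sketch: an empty cluster keeps its old centroid.
            new_centroids.append(list(centroids[j]))
    return new_centroids


def objective(points, labels, centroids):
    """Sum of squared distances from each point to its assigned centroid."""
    return sum(
        sum((a - b) ** 2 for a, b in zip(p, centroids[j]))
        for p, j in zip(points, labels)
    )


def kmeans_multi_restart(points, k, iters, restarts, seed=0):
    """Run `restarts` independent K-Means runs and keep the lowest objective."""
    rng = random.Random(seed)
    best = None
    for _ in range(restarts):
        # Assumed initialization: k distinct points drawn at random.
        centroids = [list(p) for p in rng.sample(points, k)]
        for _ in range(iters):
            labels = assign_points(points, centroids)
            centroids = update_centroids(points, labels, centroids)
        labels = assign_points(points, centroids)
        sse = objective(points, labels, centroids)
        if best is None or sse < best[0]:
            best = (sse, centroids, labels)
    return best
```

Because each restart depends only on its own initial centroids, the restarts are independent of one another, which is what makes it possible to run them either one after another or concurrently.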

For the full project explanation, design decisions, benchmark results, correctness validation, Nsight Systems and Nsight Compute profiling, and scalability discussion, see the Project Report.

Build and Run

make clean
make
./bin/kmeans_streams --n 100000 --dim 8 --k 8 --iters 20 --restarts 4 --streams 4

Reproduce Benchmark Results

./scripts/run_project.sh standard

Generate the plots:

pip install matplotlib
python3 scripts/plot_results.py

The main benchmark outputs are written to:

results/timing_all.csv
results/derived_speedups.csv
results/plots/
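A speedup is conventionally the baseline time divided by the accelerated time. The sketch below shows that arithmetic on a small in-memory CSV. The column names (`mode`, `time_ms`), the mode labels, and the timing values are hypothetical placeholders chosen for illustration. They are not the actual schema or contents of results/timing_all.csv.

```python
import csv
import io


def derive_speedups(csv_text, baseline="cpu"):
    """Return {mode: baseline_time / mode_time} for every non-baseline mode."""
    rows = csv.DictReader(io.StringIO(csv_text))
    times = {row["mode"]: float(row["time_ms"]) for row in rows}
    base = times[baseline]
    return {mode: base / t for mode, t in times.items() if mode != baseline}


# Hypothetical timings, for illustration only.
sample = "mode,time_ms\ncpu,1000.0\ngpu_serial,50.0\ngpu_streams,40.0\n"
speedups = derive_speedups(sample)
```

With these placeholder numbers, `speedups` maps `gpu_serial` to 20.0 and `gpu_streams` to 25.0.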

Profiling

Nsight Systems:

./scripts/profile_nsys.sh

Nsight Compute:

./scripts/profile_ncu.sh

Profiling outputs are written to:

results/profile/

Notes

The dataset is synthetically generated by the program and written to data/points.bin before being read back for CPU/GPU execution. The report PDF contains the main analysis and should be used as the primary writeup.
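The write-then-read-back step can be illustrated with a small round trip. This sketch assumes a headerless, row-major file of 32-bit floats in native byte order. The actual layout of data/points.bin is not documented here and may differ, and the function names are hypothetical.

```python
from array import array


def write_points(path, points):
    """Write points as a flat, row-major sequence of 32-bit floats (assumed layout)."""
    flat = array("f", [x for p in points for x in p])
    with open(path, "wb") as f:
        flat.tofile(f)


def read_points(path, n, dim):
    """Read back n points of dimension dim; raises EOFError if the file is too short."""
    flat = array("f")
    with open(path, "rb") as f:
        flat.fromfile(f, n * dim)
    return [list(flat[i * dim:(i + 1) * dim]) for i in range(n)]
```

Reading the data back from disk means the CPU and GPU paths consume exactly the same bytes, which keeps correctness comparisons free of generation differences.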

About

CUDA-accelerated multi-restart K-Means clustering with custom kernels and CUDA-stream concurrency, benchmarked against a CPU baseline (~22–55× speedup on a Tesla T4).
