MachineLearning Project

This repository contains from-scratch implementations of core machine learning techniques — Perceptron, Neural Networks, and Convolutional Neural Networks (CNNs) — written in C and C++ with real-time interactive visualizations powered by Raylib and Dear ImGui. The project is both an educational reference and a working sandbox for experimenting with machine learning algorithms at a low level, understanding their mathematical foundations, and seeing them train in real time.

The CNN is trained on the MNIST handwritten digit dataset and can classify digits you draw yourself on a 28×28 pixel canvas — live, inside the app.


Table of Contents

  • Features
  • Project Structure
  • Mathematical Foundations
  • Activation Functions
  • Matrix Operations (DMatrix)
  • Convolutional Neural Networks
  • Installation
  • Controls
  • MNIST & Training Data
  • References & Learning Resources
  • License

Features

  • Perceptron — Binary linear classifier with real-time training visualization on 2D point clouds
  • Fully Connected Neural Network — Multi-layer network with configurable depth, customizable activation functions (Sigmoid, Tanh, ReLU, LeakyReLU, SiLU, Linear, Step, Softmax), backpropagation, gradient clamping, and JSON save/load
  • Convolutional Neural Network (CNN) — Multi-layer CNN following LeCun et al. (1998) design, trained on MNIST, with He (Kaiming) weight initialization, 2×2 max-pooling with argmax routing for backprop, and a fully connected classifier head
  • Custom DMatrix library — Cache-optimized blocked matrix multiplication, convolution (valid, half-padded, full-padded), cross-correlation (kernelMult), max/average pooling, transpose, and element-wise operations
  • Interactive drawing canvas — 28×28 grid with Gaussian brush falloff mimicking MNIST ink diffusion, for live digit inference
  • Filter visualizer — Renders intermediate convolutional feature maps per layer and filter inside the app
  • JSON serialization — Save and restore full network state (weights, biases, hyperparameters) to/from .json files
  • Python MNIST preprocessor — Converts raw IDX binary files to organized traindata/0–9/ PNG directories

Project Structure

MachineLearning/
├── classes/
│   ├── Perceptron.hpp / .cpp       # Binary linear classifier
│   ├── NeuralNetwork.hpp / .cpp    # Fully connected network + activation functions
│   ├── CNN.hpp / .cpp              # Convolutional Neural Network
│   ├── DMatrix.hpp / .cpp          # Custom matrix library
│   ├── JsonParser.hpp              # JSON utility helpers
│   └── ui/
│       └── Button.hpp / .cpp       # UI button component (Raylib)
├── srcs/
│   ├── main.cpp                    # Application entry point
│   └── states/
│       ├── mainMenu.cpp            # Main menu state
│       ├── perceptronState.cpp     # Perceptron visualization state
│       ├── neuralNetworkState.cpp  # Neural Network interactive state
│       └── CNNState.cpp            # CNN training & inference state
├── includes/
│   ├── Machine.hpp                 # Global app state header
│   ├── raylib/                     # Raylib graphics library
│   ├── imgui/                      # Dear ImGui + rlImGui bindings
├── python_db/
│   ├── image.py                    # MNIST IDX → PNG converter
│   ├── train-images.idx3-ubyte     # MNIST training images (60,000)
│   └── t10k-images.idx3-ubyte      # MNIST test images (10,000)
├── LICENSE
└── README.md

Mathematical Foundations

The Perceptron

The Perceptron, introduced by Frank Rosenblatt in 1958, is the simplest model of a biological neuron and the foundational building block of all neural networks.

How it works:

Given an input vector x = (x₁, x₂, …, xₙ) and a corresponding weight vector w = (w₁, w₂, …, wₙ) plus a bias b, the perceptron computes a weighted sum and passes it through a step function:

z = w · x + b = Σ wᵢxᵢ + b

ŷ = step(z) = { 1  if z ≥ 0
              { 0  otherwise

Training — the Perceptron Learning Rule:

For each misclassified sample, weights and bias are updated proportionally to the error:

wᵢ ← wᵢ + η · (y - ŷ) · xᵢ
b  ← b  + η · (y - ŷ)

where η is the learning rate (LEARNRATE = 0.0000001f in this implementation). The perceptron is guaranteed to converge if the data is linearly separable.
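The update rule above fits in a few lines. The sketch below is illustrative, not the repository's actual API; the struct and method names (Perceptron2D, trainStep) and the large eta used in the test are assumptions.

```cpp
// Hypothetical sketch of the perceptron learning rule for 2D inputs.
struct Perceptron2D {
    float w1 = 0.0f, w2 = 0.0f, b = 0.0f;

    int predict(float x1, float x2) const {
        float z = w1 * x1 + w2 * x2 + b;   // z = w · x + b
        return z >= 0.0f ? 1 : 0;          // step activation
    }

    // One application of the perceptron learning rule.
    void trainStep(float x1, float x2, int y, float eta) {
        int err = y - predict(x1, x2);     // (y - ŷ) ∈ {-1, 0, 1}
        w1 += eta * err * x1;
        w2 += eta * err * x2;
        b  += eta * err;
    }
};
```

Because the update only fires on misclassified samples, repeated passes over a linearly separable point set leave the weights fixed once every point is classified correctly.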

In this project, the Perceptron operates in 2D (each input is a Vector2) and can be trained interactively with keyboard controls. The decision boundary is drawn live on screen.

Key insight: A single perceptron can only separate linearly separable classes. The famous failure case — XOR — cannot be solved by one perceptron. This limitation directly motivated the development of multi-layer networks.


Neural Networks & Backpropagation

A Neural Network stacks multiple layers of perceptrons (neurons), enabling it to learn non-linear decision boundaries. This implementation supports configurable depth (number of hidden layers), hidden size, and output size.

Forward Pass:

For each layer l, the network computes a pre-activation and then applies a nonlinearity:

z⁽ˡ⁾ = W⁽ˡ⁾ · a⁽ˡ⁻¹⁾ + b⁽ˡ⁾
a⁽ˡ⁾ = f( z⁽ˡ⁾ )

where f is the chosen activation function, W⁽ˡ⁾ is the weight matrix, and b⁽ˡ⁾ is the bias vector.

Loss:

The network uses mean squared error (MSE) as the loss function:

L = (1/n) · Σ (yᵢ - ŷᵢ)²

Backpropagation:

Backpropagation computes the gradient of the loss with respect to every weight using the chain rule. Starting from the output layer and moving backward:

δ⁽ᴸ⁾ = ∇ₐL ⊙ f'( z⁽ᴸ⁾ )

δ⁽ˡ⁾ = (W⁽ˡ⁺¹⁾)ᵀ · δ⁽ˡ⁺¹⁾ ⊙ f'( z⁽ˡ⁾ )

Weights and biases are updated via gradient descent:

W⁽ˡ⁾ ← W⁽ˡ⁾ - η · δ⁽ˡ⁾ · (a⁽ˡ⁻¹⁾)ᵀ
b⁽ˡ⁾ ← b⁽ˡ⁾ - η · δ⁽ˡ⁾

Gradient safety: This implementation includes clampGradient() (caps gradient magnitude at ±100) and errorTolerance() (zeros out gradients below 1e-4) to prevent exploding gradients and numerical instability during long training runs.
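The chain from forward pass to clamped update can be sketched for a single sigmoid output neuron. Names and signatures here are illustrative; the repository's NeuralNetwork class expresses the same steps with DMatrix operations.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

static float sigmoidF(float x) { return 1.0f / (1.0f + std::exp(-x)); }

// Cap gradient magnitude at ±100, in the spirit of clampGradient().
static float clampGrad(float g) {
    return std::max(-100.0f, std::min(100.0f, g));
}

// One SGD step on a single sigmoid neuron with squared-error loss.
void sgdStep(std::vector<float>& w, float& b,
             const std::vector<float>& x, float y, float eta) {
    float z = b;                                        // z = w · x + b
    for (size_t i = 0; i < x.size(); ++i) z += w[i] * x[i];
    float a = sigmoidF(z);                              // a = f(z)
    // δ = dL/dz = 2(a − y) · σ'(z), with σ'(z) = a(1 − a)
    float delta = clampGrad(2.0f * (a - y) * a * (1.0f - a));
    for (size_t i = 0; i < x.size(); ++i)
        w[i] -= eta * delta * x[i];                     // W ← W − η · δ · aᵀ
    b -= eta * delta;                                   // b ← b − η · δ
}
```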

JSON persistence: NeuralNetwork::serialize() and NeuralNetwork::deserialize() dump/load the full weight and bias tensors to/from a JSON file, so you can save a trained network and resume later.


Activation Functions

This project implements a complete suite of activation functions and their exact analytical derivatives, used interchangeably for hidden and output layers:

| Function | Formula | Derivative | Notes |
|---|---|---|---|
| Sigmoid | 1 / (1 + e^−x) | σ(x)(1 − σ(x)) | Classic, saturates at extremes |
| Tanh | (e^x − e^−x) / (e^x + e^−x) | 1 − tanh²(x) | Zero-centered, preferred over Sigmoid |
| ReLU | max(0, x) | 1 if x > 0 else 0 | Sparse, fast, may "die" |
| LeakyReLU | x if x > 0 else x/100 | 1 or 0.01 | Fixes dying ReLU |
| SiLU | x · σ(x) | σ(x)(1 + x(1 − σ(x))) | Smooth, used in modern nets |
| Linear | x | 1 | Output layer for regression |
| Step | 1 if x ≥ 0 else 0 | 1 (approx) | Perceptron classifier |
| Softmax | e^xᵢ / Σ e^xⱼ | Jacobian | Multi-class probability output |

All functions include NaN/Inf guards and input clamping to prevent numerical blow-up during training.
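Two of the pairs from the table, sketched with an input clamp; the exact clamp bound (±30 here) is an assumption, not the repository's value, and these free functions are illustrative rather than the actual signatures in NeuralNetwork.cpp.

```cpp
#include <algorithm>
#include <cmath>

// Clamp inputs before exp() to avoid overflow; the ±30 bound is assumed.
static float clampInput(float x) {
    return std::max(-30.0f, std::min(30.0f, x));
}

float sigmoid(float x)  { x = clampInput(x); return 1.0f / (1.0f + std::exp(-x)); }
float dSigmoid(float x) { float s = sigmoid(x); return s * (1.0f - s); }  // σ(x)(1 − σ(x))

float leakyRelu(float x)  { return x > 0.0f ? x : x / 100.0f; }  // x/100 for x ≤ 0
float dLeakyRelu(float x) { return x > 0.0f ? 1.0f : 0.01f; }
```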

Choosing the right activation:

  • Hidden layers: LeakyReLU or Tanh work well for most tasks
  • Output for binary classification: Sigmoid
  • Output for multi-class: Softmax
  • Output for regression: Linear

Matrix Operations (DMatrix)

All computations in the neural and convolutional layers are built on the custom DMatrix class — a row-major 2D float matrix with a rich operation set.

Matrix multiplication is implemented in three variants:

  • operator* — Standard O(n³) algorithm
  • multiplyVectorized — SIMD-friendly inner loop ordering
  • multiplyOptimized — Cache-blocked multiplication with BLOCK_SIZE = 64, dramatically improving cache hit rates for large matrices

The cache-blocked approach divides matrices into 64×64 tiles that fit in L1/L2 cache, reducing cache misses during the innermost loop — a standard high-performance computing technique.
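The tiling scheme can be sketched as follows, with a flat row-major std::vector<float> standing in for DMatrix; the function name and loop structure are illustrative, not the repository's exact implementation.

```cpp
#include <algorithm>
#include <vector>

constexpr int BLOCK_SIZE = 64;

// C (n×p) += A (n×m) · B (m×p), all row-major flat arrays.
void blockedMatMul(const std::vector<float>& A, const std::vector<float>& B,
                   std::vector<float>& C, int n, int m, int p) {
    for (int ii = 0; ii < n; ii += BLOCK_SIZE)
        for (int kk = 0; kk < m; kk += BLOCK_SIZE)
            for (int jj = 0; jj < p; jj += BLOCK_SIZE)
                // Process one tile pair; the i-k-j inner ordering streams
                // rows of B through cache instead of striding columns.
                for (int i = ii; i < std::min(ii + BLOCK_SIZE, n); ++i)
                    for (int k = kk; k < std::min(kk + BLOCK_SIZE, m); ++k) {
                        float a = A[i * m + k];
                        for (int j = jj; j < std::min(jj + BLOCK_SIZE, p); ++j)
                            C[i * p + j] += a * B[k * p + j];
                    }
}
```

For matrices smaller than one tile this degenerates to a plain i-k-j loop; the payoff appears once the operands no longer fit in L1/L2.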

Convolution & cross-correlation:

The library provides three convolution modes plus a forward-pass cross-correlation, all implemented directly on DMatrix:

| Method | Padding | Output size | Use in CNN |
|---|---|---|---|
| convolve() | None (valid) | (H−k+1) × (W−k+1) | Shrinks feature maps |
| convolveHalfPadded() | Same | H × W | Preserves spatial size |
| convolveFullPadded() | Full | (H+k−1) × (W+k−1) | Backprop gradient routing |
| kernelMult() | None (valid) | (H−k+1) × (W−k+1) | Forward-pass cross-correlation |

Note: kernelMult (cross-correlation, no kernel flip) is the industry standard used by PyTorch nn.Conv2d and TensorFlow Conv2D. convolveFullPadded (flips the kernel) is used in backprop for mathematically correct gradient computation — see Goodfellow et al., Deep Learning, Ch. 9.
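Valid-mode cross-correlation (the kernelMult behavior, no kernel flip) can be sketched like this, again with flat row-major floats standing in for DMatrix; the function name is illustrative.

```cpp
#include <vector>

// Valid cross-correlation: output is (H−k+1) × (W−k+1), kernel not flipped.
std::vector<float> crossCorrelateValid(const std::vector<float>& in, int H, int W,
                                       const std::vector<float>& ker, int k) {
    int outH = H - k + 1, outW = W - k + 1;
    std::vector<float> out(outH * outW, 0.0f);
    for (int i = 0; i < outH; ++i)
        for (int j = 0; j < outW; ++j) {
            float s = 0.0f;
            for (int u = 0; u < k; ++u)
                for (int v = 0; v < k; ++v)
                    s += in[(i + u) * W + (j + v)] * ker[u * k + v];  // no flip
            out[i * outW + j] = s;
        }
    return out;
}
```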

Pooling:

  • maxPooling(poolSize) — Standard max pooling, reduces spatial size by poolSize
  • averagePooling(poolSize) — Average pooling
  • maxPoolingArgmax(poolSize, argmax) — Max pooling that records the argmax index of each pooling window, used during CNN training so backprop can route gradients exactly back to the winning neuron via maxPoolingUnpool
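The argmax-recording variant can be sketched for the 2×2 case used by the CNN; names and the flat-index argmax encoding are illustrative assumptions, not the repository's exact layout.

```cpp
#include <vector>

// 2×2 max pooling that also records, per output cell, the flat input index
// of the winning element, so backprop can route gradients back to it.
void maxPool2x2Argmax(const std::vector<float>& in, int H, int W,
                      std::vector<float>& out, std::vector<int>& argmax) {
    int outH = H / 2, outW = W / 2;
    out.assign(outH * outW, 0.0f);
    argmax.assign(outH * outW, 0);
    for (int i = 0; i < outH; ++i)
        for (int j = 0; j < outW; ++j) {
            int best = (2 * i) * W + (2 * j);
            for (int u = 0; u < 2; ++u)
                for (int v = 0; v < 2; ++v) {
                    int idx = (2 * i + u) * W + (2 * j + v);
                    if (in[idx] > in[best]) best = idx;
                }
            out[i * outW + j] = in[best];
            argmax[i * outW + j] = best;  // winner index for unpooling
        }
}
```

Unpooling then scatters each incoming gradient to its recorded index and leaves every other input position at zero, matching the maxPoolingUnpool behavior described above.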

Convolutional Neural Networks

The ConvNeuralNetwork class implements a full CNN following the design of LeCun et al. (1998) (the original LeNet architecture), with modern conventions from PyTorch/TensorFlow.

Architecture:

Input Image (28×28, 1 channel)
        │
        ▼
┌───────────────────┐
│  Conv Layer 0     │  numFilters output channels, kernelSize×kernelSize kernels
│  + LeakyReLU      │
└───────────────────┘
        │
        ▼
┌───────────────────┐
│  Conv Layer 1..N  │  stacked conv layers (numConvLayers total)
│  + LeakyReLU      │
└───────────────────┘
        │
        ▼
┌───────────────────┐
│  2×2 Max Pooling  │  argmax recorded for backprop gradient routing
└───────────────────┘
        │
        ▼
┌───────────────────┐
│     Flatten       │  all filter maps → 1D vector
└───────────────────┘
        │
        ▼
┌───────────────────┐
│  Fully Connected  │  hiddenNodes × hiddenLayerLen → outputNodes (0–9)
│  NeuralNetwork    │
└───────────────────┘
        │
        ▼
  Class Predictions

Kernel layout: kernels[layer][filter][inChannel]

  • Layer 0: inChannels = 1 (grayscale image)
  • Layer l > 0: inChannels = numFilters (output of previous layer)

Weight initialization: He (Kaiming), drawing weights with variance 2 / fanIn (standard deviation sqrt(2 / fanIn)), where fanIn = k² × inChannels. This scaling is specifically designed for ReLU-family activations and prevents vanishing/exploding gradients at startup.
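A sketch of He initialization for one k×k kernel; whether CNN.cpp draws from a normal or a uniform distribution is an assumption here, and this version uses N(0, 2 / fanIn).

```cpp
#include <cmath>
#include <random>
#include <vector>

// He init: variance 2 / fanIn, with fanIn = k² × inChannels.
// The normal distribution is an assumed choice for this sketch.
std::vector<float> heInitKernel(int k, int inChannels, std::mt19937& rng) {
    int fanIn = k * k * inChannels;
    std::normal_distribution<float> dist(0.0f, std::sqrt(2.0f / fanIn));
    std::vector<float> kernel(k * k);
    for (float& w : kernel) w = dist(rng);
    return kernel;
}
```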

Forward pass per conv layer (cross-correlation):

z[l][f]  =  Σ_c  crossCorrelate( input[l][c],  K[l][f][c] )  +  b[l][f]
a[l][f]  =  LeakyReLU( z[l][f] )

Backpropagation through conv layers (deepest → shallowest):

δ[f]               =  errorMap[f]  ⊙  DLeakyReLU( z[l][f] )

dL/dK[l][f][c]     =  crossCorrelate( input[l][c],  δ[f] )

dL/db[l][f]        =  sum( δ[f] )

dL/dinput[l][c]    =  Σ_f  fullConvolution( δ[f],  K[l][f][c] )

The full convolution (with kernel flip) in the input gradient is mathematically required for correct backprop through a cross-correlation forward pass.
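The input-gradient step above can be sketched as a full convolution, padding δ by k−1 on every side and applying the 180°-flipped kernel; flat row-major floats stand in for DMatrix and the function name is illustrative.

```cpp
#include <vector>

// Full convolution: out[i][j] = Σ_{u,v} delta[i−u][j−v] · ker[u][v],
// output size (dH+k−1) × (dW+k−1); out-of-range delta reads as zero.
std::vector<float> fullConvolution(const std::vector<float>& delta, int dH, int dW,
                                   const std::vector<float>& ker, int k) {
    int outH = dH + k - 1, outW = dW + k - 1;
    std::vector<float> out(outH * outW, 0.0f);
    for (int i = 0; i < outH; ++i)
        for (int j = 0; j < outW; ++j) {
            float s = 0.0f;
            for (int u = 0; u < k; ++u)
                for (int v = 0; v < k; ++v) {
                    int di = i - u, dj = j - v;  // flipped-kernel indexing
                    if (di >= 0 && di < dH && dj >= 0 && dj < dW)
                        s += delta[di * dW + dj] * ker[u * k + v];
                }
            out[i * outW + j] = s;
        }
    return out;
}
```

Convolving a 1×1 δ with a kernel reproduces the kernel itself (an impulse response), which is a quick sanity check that the flip is applied correctly.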

MNIST training canvas:

The CNN state features a 28×28 interactive drawing canvas with a Gaussian brush (BRUSH_SIGMA = 0.60f, BRUSH_RADIUS = 1) that simulates the ink diffusion characteristics of MNIST handwriting samples. Draw a digit, and the CNN classifies it live. You can also visualize the intermediate feature maps produced by each conv filter at each layer.
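The falloff can be sketched from the quoted constants; the exact formula in the repository is an assumption, and this version deposits ink proportional to a Gaussian of the squared distance from the brush center.

```cpp
#include <cmath>

constexpr float BRUSH_SIGMA = 0.60f;

// Ink weight at grid offset (dx, dy) from the brush center:
// exp(−d² / 2σ²), so the center cell gets full ink and neighbors a soft halo.
float brushWeight(int dx, int dy) {
    float d2 = float(dx * dx + dy * dy);
    return std::exp(-d2 / (2.0f * BRUSH_SIGMA * BRUSH_SIGMA));
}
```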


Installation

Dependencies:

  • C++17 compiler (g++ or clang++)
  • Raylib (bundled as static library in includes/raylib/)
  • Dear ImGui + rlImGui bindings (bundled in includes/imgui/)
  • nlohmann/json (bundled as includes/json.hpp)
  • Python 3 + numpy + Pillow + matplotlib (for MNIST preprocessing only)

Build & Run:

# Clone the repository
git clone https://github.com/gecarval/MachineLearning.git
cd MachineLearning

# Compile
make

# Run
./machinelearn

MNIST preprocessing (optional — generates PNG training images):

cd python_db
python3 image.py
# Generates traindata/0/ through traindata/9/ with labeled PNG images

The image.py script reads the raw MNIST IDX binary format, decodes the 16-byte header (magic number, image count, rows, cols), and saves each image to traindata/<label>/image_XXXXX.png, organized by digit class.


Controls

Perceptron (Space from main menu)

| Key / Button | Action |
|---|---|
| Space | Enter Perceptron mode |
| T | Train on current point set |
| W / A / S / D | Move view (Up / Left / Down / Right) |
| Esc | Exit to menu |

Neural Network (Enter from main menu)

| Key / Button | Action |
|---|---|
| LMB | Place red point (class 0) |
| RMB | Place green point (class 1) |
| Ctrl+Z | Undo last point |
| D | Delete all points |
| S | Save Neural Network to JSON |
| L | Load Neural Network from JSON |
| R | Reset Neural Network (randomize weights) |
| | Increase learning rate |
| | Decrease learning rate |
| Esc | Exit to menu |

CNN (from main menu)

| Key / Button | Action |
|---|---|
| Left-click drag | Paint digit on canvas |
| Right-click drag | Erase from canvas |
| C | Clear canvas |
| S | Save CNN model to JSON |
| L | Load CNN model from JSON |
| Esc | Exit to menu |

MNIST & Training Data

The MNIST database (Modified National Institute of Standards and Technology) is the canonical benchmark dataset for handwritten digit recognition. It contains:

  • 60,000 training images (train-images.idx3-ubyte)
  • 10,000 test images (t10k-images.idx3-ubyte)
  • Each image is 28×28 pixels, grayscale, with pixel values in [0, 255]
  • Labels are digits 0–9

The IDX file format stores images as raw binary with a 16-byte header (magic number, count, rows, cols), followed by raw uint8 pixel data. The included python_db/image.py script handles parsing and extraction.
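The script itself is Python, but the same header decode can be sketched in the project's C++; the struct and function names are illustrative. The fields are stored big-endian, so each 32-bit value is reassembled byte by byte.

```cpp
#include <cstdint>

struct IdxHeader { uint32_t magic, count, rows, cols; };

// Reassemble a big-endian 32-bit value from four bytes.
static uint32_t readBigEndian32(const unsigned char* p) {
    return (uint32_t(p[0]) << 24) | (uint32_t(p[1]) << 16) |
           (uint32_t(p[2]) << 8)  |  uint32_t(p[3]);
}

// Decode the 16-byte IDX header: bytes 0–3 magic (0x00000803 for
// idx3-ubyte images), 4–7 image count, 8–11 rows, 12–15 cols.
IdxHeader parseIdxHeader(const unsigned char* buf) {
    return { readBigEndian32(buf),     readBigEndian32(buf + 4),
             readBigEndian32(buf + 8), readBigEndian32(buf + 12) };
}
```

The raw uint8 pixel data follows immediately after byte 16, row by row.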

The CNN in this project uses these images as training data. Each image is normalized to [0.0, 1.0] float values before being fed into the network.


References & Learning Resources

Papers

  • LeCun et al. (1998) — Gradient-Based Learning Applied to Document Recognition — The original LeNet paper that defined CNN architecture for digit recognition on MNIST
  • He et al. (2015) — Delving Deep into Rectifiers — The paper behind He (Kaiming) weight initialization, used in this project's CNN

License

See LICENSE for details.
