
MiniDXNN — MLP Inference & Training on DirectX 12 with Cooperative Vector

An implementation of MLP (Multi-Layer Perceptron) inference and training using DirectX 12 Cooperative Vector. This library demonstrates GPU-accelerated neural networks built on cutting-edge shader features.

  • 🚀 High Performance: GPU-accelerated inference and training using Cooperative Vector
  • 🔧 Flexible Architecture: Configurable layers, activations, and data types
  • 🎯 Single-header HLSL: Easy to integrate into any DX12 project

Requirements

  • OS: Windows 11 with Developer Mode enabled
  • GPU: Support for Shader Model 6.9 and Cooperative Vector in D3D12 (e.g. AMD Radeon™ RX 9000 Series or equivalent NVIDIA GPUs)
  • Build: CMake ≥ 3.21, Visual Studio 2022 (C++20), Windows SDK
  • DX12 Runtime: Agility SDK 1.717.1-preview, DXC v1.8.2505.1
  • Python: Python 3.8+ with PyTorch (optional, for the Python reference training scripts)

Getting Started

```shell
# Clone with submodules (gfx, GoogleTest, CLI11)
git clone --recursive https://github.com/GPUOpen-LibrariesAndSDKs/MiniDXNN.git
cd MiniDXNN

# Build (library + examples)
cmake -B build
cmake --build build --config Release

# Optional: build & run tests
cmake -B build -DMINIDXNN_BUILD_TESTS=ON
cmake --build build --config Release
cd build/unittest && ctest -C Release
```

Example binaries are output to build/example/Release/. Run them from build/example/ as the working directory. See example/README.md for details.

DX12 Setup

⚠️ Important: As of early 2026, Cooperative Vector requires experimental feature support.

  1. Install a Cooperative Vector supported driver
  2. Enable Experimental Shader Model with D3D12EnableExperimentalFeatures before creating the device
  3. Compile shaders with Shader Model 6.9

HLSL Usage

Include the header-only library in your compute shader:

```hlsl
#include <minidxnn/hlsl/mlp.hlsl>

static const uint NUM_LAYERS = 3;       // total layers (hidden layers + output layer)
static const int  INPUT_DIM  = 2;
static const int  HIDDEN_DIM = 64;
static const int  OUTPUT_DIM = 2;

using LayerData = mininn::InferenceLayerDataRef<
    NUM_LAYERS, HIDDEN_DIM,
    dx::linalg::DATA_TYPE_FLOAT16,
    dx::linalg::MATRIX_LAYOUT_ROW_MAJOR,
    dx::linalg::DATA_TYPE_FLOAT16,      // bias type
    dx::linalg::DATA_TYPE_FLOAT16,      // accumulator type
    mininn::LeakyReluActivation,        // hidden activation
    mininn::SigmoidActivation           // output activation
>;

ByteAddressBuffer g_weights : register(t0);
ByteAddressBuffer g_biases  : register(t1);

[numthreads(32, 1, 1)]
void main(uint3 tid : SV_DispatchThreadID)
{
    LayerData layerData;
    // firstLayerMatSize / hiddenLayerMatSize: byte sizes of the first and
    // hidden layer weight matrices (defined elsewhere by the application)
    layerData.setWeightData(g_weights, uint2(firstLayerMatSize, hiddenLayerMatSize));
    layerData.setBiasData(g_biases);

    vector<half, INPUT_DIM> input = ...;
    vector<half, OUTPUT_DIM> output;
    mininn::forward(output, input, layerData);
}
```

See the HLSL API Reference for the full API including training (backward).

C++ Fallback Mode

MiniDXNN can be built without DirectX 12 or GPU dependencies, using a pure C++ fallback path. This is useful for CI, unit testing, or platforms without DX12 support (e.g. Linux).

```shell
# Build with CPU fallback only (no GPU/DX12 required)
cmake -B build \
  -DCMAKE_CXX_COMPILER=clang++ \
  -DMINIDXNN_CPP_FALLBACK_ONLY=ON \
  -DMINIDXNN_BUILD_TESTS=ON \
  -DMINIDXNN_BUILD_EXAMPLES=ON

cmake --build build -j$(nproc)

# Run tests
cd build && ctest --output-on-failure
```

How It Works

The CPU fallback compiles include/minidxnn/hlsl/mlp.hlsl as standard C++ by providing HLSL-compatible shims in include/minidxnn/cpp/hlsl_compat.hpp. This header maps HLSL intrinsics (vector, ByteAddressBuffer, dx::linalg::*) to C++ equivalents. The half-precision type is configurable via the MINIDXNN_CPP_FALLBACK_HALF_TYPE CMake compile definition (defaults to half_float::half).

Project Structure

```
MiniDXNN/
├── include/minidxnn/
│   ├── hlsl/mlp.hlsl              # Header-only HLSL library: MLP forward & backward
│   └── cpp/hlsl_compat.hpp        # HLSL → C++ shim for CPU-only builds
├── docs/                          # Documentation
│   └── mlp_hlsl.md                #   HLSL API reference
├── example/                       # Example applications
│   ├── common/                    #   Shared C++ utilities
│   │   ├── mlp_layer.hpp          #     MLP layer data, CPU forward/backward, weight init
│   │   ├── cpp_fallback.hpp       #     C++ fallback infrastructure (buffer packing, etc.)
│   │   ├── optimizer.hpp          #     Optimizer implementations (SGD, Adam, Lion)
│   │   ├── activation.hpp         #     Activation functions (Identity, Sigmoid, ReLU, Leaky ReLU, Tanh)
│   │   ├── loss.hpp               #     Loss functions (MSE)
│   │   └── ...                    #     GPU helpers, image I/O, textures, matrix, RNG
│   ├── kernel/                    #   HLSL compute shaders
│   │   ├── optimizer.hlsl         #     GPU optimizer kernels (SGD, Adam, Lion)
│   │   └── ...                    #     Training, inference, and shared kernel headers
│   ├── 01_texture_inference/      #   Inference from a pre-trained MLP binary
│   ├── 02_texture_training/       #   On-GPU training + reconstruction
│   └── README.md                  #   Example documentation
├── scripts/pyreference/           # Python reference implementation
│   ├── texture_reconstruction_mlp.py  # Train & export MLP model
│   └── xoshiro128p.py                 # RNG matching C++ xoshiro128+
├── unittest/                      # GoogleTest unit tests
├── third_party/                   # Submodules & vendored deps
├── cmake/                         # CMake scripts
└── tools/natvis/                  # Debugging helpers (Visual Studio natvis)
```

Features

| Category | Details |
| --- | --- |
| Architecture | MLP with 0–N hidden layers, independent input/hidden/output dimensions |
| Operations | Forward pass (inference), backward pass (training with gradient accumulation) |
| Activations | Identity, Sigmoid, ReLU, Leaky ReLU (custom activations supported — e.g. Tanh) |
| Data type | float16 (DATA_TYPE_FLOAT16) — currently the only tested type |
| Matrix layout | Row-major (MATRIX_LAYOUT_ROW_MAJOR) — currently the only tested layout |

Examples

| # | Name | Description |
| --- | --- | --- |
| 01 | Texture Inference | Load a pre-trained MLP binary and reconstruct a texture on the GPU |
| 02 | Texture Training | Train an MLP on-GPU to learn a 2D texture pattern, then reconstruct it |

See example/README.md for step-by-step instructions.

Documentation

  • HLSL API Reference: docs/mlp_hlsl.md

License

MIT License — see LICENSE.

Copyright (c) 2026 Advanced Micro Devices, Inc. All rights reserved.

Third-Party Notices

See NOTICE.md for details.

