An implementation of MLP (Multi-Layer Perceptron) inference and training using DirectX 12 Cooperative Vector. This library demonstrates GPU-accelerated neural network inference and training with cutting-edge shader features.
- 🚀 High Performance: GPU-accelerated inference and training using Cooperative Vector
- 🔧 Flexible Architecture: Configurable layers, activations, and data types
- 🎯 Single-header HLSL: Easy to integrate into any DX12 project
- OS: Windows 11 with Developer Mode enabled
- GPU: Shader Model 6.9 and D3D12 Cooperative Vector support (AMD Radeon™ RX 9000 Series or equivalent NVIDIA GPUs)
- Build: CMake ≥ 3.21, Visual Studio 2022 (C++20), Windows SDK
- DX12 Runtime: Agility SDK 1.717.1-preview, DXC v1.8.2505.1
- Python: Python 3.8+ with PyTorch (optional, for the Python reference training scripts)
```sh
# Clone with submodules (gfx, GoogleTest, CLI11)
git clone --recursive https://github.com/GPUOpen-LibrariesAndSDKs/MiniDXNN.git
cd MiniDXNN

# Build (library + examples)
cmake -B build
cmake --build build --config Release

# Optional: build & run tests
cmake -B build -DMINIDXNN_BUILD_TESTS=ON
cmake --build build --config Release
cd build/unittest && ctest -C Release
```

Example binaries are output to `build/example/Release/`. Run them from `build/example/` as the working directory. See `example/README.md` for details.
- Install a Cooperative Vector supported driver
- Enable Experimental Shader Model with D3D12EnableExperimentalFeatures before creating the device
- Compile shaders with Shader Model 6.9
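The experimental opt-in step can be sketched as follows (a minimal, hedged example, not the library's own startup code; the helper name is hypothetical, and error handling is elided). It requires Windows Developer Mode and the preview Agility SDK headers:

```cpp
// Hypothetical sketch: opt into experimental shader models before creating
// the D3D12 device. Requires Developer Mode and the preview Agility SDK.
#include <d3d12.h>

bool EnableExperimentalShaderModels()
{
    UUID features[] = { D3D12ExperimentalShaderModels };
    // Must be called before D3D12CreateDevice; shaders are then compiled
    // with DXC using the cs_6_9 / lib_6_9 profiles.
    HRESULT hr = D3D12EnableExperimentalFeatures(
        _countof(features), features, nullptr, nullptr);
    return SUCCEEDED(hr);
}
```

If the call fails, check that Developer Mode is enabled and that the preview Agility SDK is being loaded by the application.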
Include the header-only library in your compute shader:
```hlsl
#include <minidxnn/hlsl/mlp.hlsl>

static const uint NUM_LAYERS = 3; // total layers (hidden layers + 1 output layer)
static const int INPUT_DIM = 2;
static const int HIDDEN_DIM = 64;
static const int OUTPUT_DIM = 2;

using LayerData = mininn::InferenceLayerDataRef<
    NUM_LAYERS, HIDDEN_DIM,
    dx::linalg::DATA_TYPE_FLOAT16,
    dx::linalg::MATRIX_LAYOUT_ROW_MAJOR,
    dx::linalg::DATA_TYPE_FLOAT16, // bias type
    dx::linalg::DATA_TYPE_FLOAT16, // accumulator type
    mininn::LeakyReluActivation,   // hidden activation
    mininn::SigmoidActivation      // output activation
>;

ByteAddressBuffer g_weights : register(t0);
ByteAddressBuffer g_biases : register(t1);

[numthreads(32, 1, 1)]
void main(uint3 tid : SV_DispatchThreadID)
{
    LayerData layerData;
    layerData.setWeightData(g_weights, uint2(firstLayerMatSize, hiddenLayerMatSize));
    layerData.setBiasData(g_biases);

    vector<half, INPUT_DIM> input = ...;
    vector<half, OUTPUT_DIM> output;
    mininn::forward(output, input, layerData);
}
```

See the HLSL API Reference for the full API, including training (backward pass).
MiniDXNN can be built without DirectX 12 or GPU dependencies, using a pure C++ fallback path. This is useful for CI, unit testing, or platforms without DX12 support (e.g. Linux).
```sh
# Build with CPU fallback only (no GPU/DX12 required)
cmake -B build \
  -DCMAKE_CXX_COMPILER=clang++ \
  -DMINIDXNN_CPP_FALLBACK_ONLY=ON \
  -DMINIDXNN_BUILD_TESTS=ON \
  -DMINIDXNN_BUILD_EXAMPLES=ON
cmake --build build -j$(nproc)

# Run tests
cd build && ctest --output-on-failure
```

The CPU fallback compiles `include/minidxnn/hlsl/mlp.hlsl` as standard C++ by providing HLSL-compatible shims in `include/minidxnn/cpp/hlsl_compat.hpp`. This header maps HLSL intrinsics (`vector`, `ByteAddressBuffer`, `dx::linalg::*`) to C++ equivalents. The half-precision type is configurable via the `MINIDXNN_CPP_FALLBACK_HALF_TYPE` CMake compile definition (default: `half_float::half`).
MiniDXNN/
├── include/minidxnn/
│ ├── hlsl/mlp.hlsl # Header-only HLSL library: MLP forward & backward
│ └── cpp/hlsl_compat.hpp # HLSL → C++ shim for CPU-only builds
├── docs/ # Documentation
│ └── mlp_hlsl.md # HLSL API reference
├── example/ # Example applications
│ ├── common/ # Shared C++ utilities
│ │ ├── mlp_layer.hpp # MLP layer data, CPU forward/backward, weight init
│ │ ├── cpp_fallback.hpp # C++ fallback infrastructure (buffer packing, etc.)
│ │ ├── optimizer.hpp # Optimizer implementations (SGD, Adam, Lion)
│ │ ├── activation.hpp # Activation functions (Identity, Sigmoid, ReLU, LeakyReLU, Tanh)
│ │ ├── loss.hpp # Loss functions (MSE)
│ │ └── ... # GPU helpers, image I/O, textures, matrix, RNG
│ ├── kernel/ # HLSL compute shaders
│ │ ├── optimizer.hlsl # GPU optimizer kernels (SGD, Adam, Lion)
│ │ └── ... # Training, inference, and shared kernel headers
│ ├── 01_texture_inference/ # Inference from a pre-trained MLP binary
│ ├── 02_texture_training/ # On-GPU training + reconstruction
│ └── README.md # Example documentation
├── scripts/pyreference/ # Python reference implementation
│ ├── texture_reconstruction_mlp.py # Train & export MLP model
│ └── xoshiro128p.py # RNG matching C++ xoshiro128+
├── unittest/ # GoogleTest unit tests
├── third_party/ # Submodules & vendored deps
├── cmake/ # CMake scripts
└── tools/natvis/ # Debugging helpers (Visual Studio natvis)
| Category | Details |
|---|---|
| Architecture | MLP with 0–N hidden layers, independent input/hidden/output dimensions |
| Operations | Forward pass (inference), backward pass (training with gradient accumulation) |
| Activations | Identity, Sigmoid, ReLU, Leaky ReLU (custom activations supported — e.g. Tanh) |
| Data type | float16 (DATA_TYPE_FLOAT16) — currently the only tested type |
| Matrix layout | Row-major (MATRIX_LAYOUT_ROW_MAJOR) — currently the only tested layout |
| # | Name | Description |
|---|---|---|
| 01 | Texture Inference | Load a pre-trained MLP binary and reconstruct a texture on the GPU |
| 02 | Texture Training | Train an MLP on-GPU to learn a 2D texture pattern, then reconstruct it |
See example/README.md for step-by-step instructions.
- HLSL API Reference — `mlp.hlsl` types, functions, and memory layout
- Example Guide — building and running the examples
- Cooperative Vector Spec — HLSL specification
- D3D12 Cooperative Vector Blog — overview and getting started
MIT License — see LICENSE.
Copyright (c) 2026 Advanced Micro Devices, Inc. All rights reserved.
- Half-precision floating-point library — MIT
- gfx — MIT
- CLI11 — BSD-3-Clause
- GoogleTest — BSD-3-Clause
See NOTICE.md for details.
