A beginner course for learning machine learning as a translation problem:
plain English <-> algebra <-> Rust newtypes <-> composable maps
The goal is not to memorize symbols. The goal is to learn how to read formulas as programs, how to read Rust code as precise mathematical structure, and how to use category-theory intuition to see models as composed maps between meaningful types.
For the canonical curriculum layout, see lessons/Course Structure. For the recommended learner route, see Learning Path. For the object/map progression across lessons, code, and assignments, see Concept Atlas.
- Beginners with little or no machine learning background
- Rust learners who want a concrete reason to use vectors, structs, loops, and functions
- Self-paced learners who want short lessons and small practice steps
- Read Learning Path.
- Read The Learning Lens.
- Keep Concept Atlas open as the object/map guide.
- Read 01 Foundations.
- Continue with 02 Vectors.
- Continue with 03 Neuron.
- Continue with 04 Learning.
- Continue with 05 MLP.
- Continue with 06 Attention.
- Continue with 07 Transformer.
- Continue with 08 Language Modeling.
- Continue with 09 Systems.
- Continue with 10 Kernels.
- Continue with 11 Inference.
The repo uses sequential folder numbers even though the curriculum starts at Module 0:
- Course Module 0 -> Repo folder lessons/01-foundations
- Course Module 1 -> Repo folder lessons/02-vectors
- Course Module 2 -> Repo folder lessons/03-neuron
- Course Module 3 -> Repo folder lessons/04-learning
- Course Module 4 -> Repo folder lessons/05-mlp
- Course Module 5 -> Repo folder lessons/06-attention
- Course Module 6 -> Repo folder lessons/07-transformer
- Course Module 7 -> Repo folder lessons/08-language-modeling
- Course Module 8 -> Repo folder lessons/09-systems
- Course Module 9 -> Repo folder lessons/10-kernels
- Course Module 10 -> Repo folder lessons/11-inference
The orientation file lessons/00-learning-lens.md comes before Module 0 and explains the shared newtype/category-theory lens.
- Lessons index
- Learning Path
- The Learning Lens
- Concept Atlas
- 01 Foundations
- 02 Vectors
- 03 Neuron
- 04 Learning
- 05 MLP
- 06 Attention
- 07 Transformer
- 08 Language Modeling
- 09 Systems
- 10 Kernels
- 11 Inference
- Rust Essentials for a Tiny Neuron
- A Neuron as a Chain of Functions
- Neuron exercises
- Neuron solutions
- Tokens as Vectors in a Sequence
- Query, Key, and Value Roles
- Scores, Weights, and Value Mixing
- Attention exercises
- Attention solutions
- 07 Transformer
- What Problem the Transformer Solves
- Typed Rust Transformer with Expressive Errors
- Transformer Encoder in Small Chunks
- Transformer exercises
- Transformer solutions
- 08 Language Modeling
- From Text To Token IDs
- Next-Token Batches, Loss, And Updates
- The Public Text Boundary
- Language-modeling exercises
- Language-modeling solutions
- 09 Systems
- Shapes, Elements, Bytes, And FLOPs
- Timing, Arithmetic Intensity, And Memory Hierarchy
- The Public Systems Report Boundary
- Systems exercises
- Systems solutions
- 10 Kernels
- Elementwise Maps And Reductions
- Tiling A Matrix-Vector Kernel
- The Public Kernel Report Boundary
- Kernel exercises
- Kernel solutions
- 11 Inference
- Autoregressive Decoding As A Typed State Trace
- The Public Decode Boundary And Typed Latency
- Inference exercises
- Inference solutions
- Code index
- category lens crate
- neuron crate
- mlp crate
- attention crate
- transformer crate with public encoder trace review
- language-modeling basics crate
- systems crate
- kernels crate
- scaling crate
- data crate
- evaluation crate
- inference crate
- parallelism crate
- alignment crate
rust-ml/
├── assignments/ # original Rust assignment tracks
├── lessons/ # canonical course content
├── references/ # transcripts and papers used as source material
├── code/ # runnable companion crates
├── book/ # non-authoritative note; this is repo-first for now
└── README.md
- lessons/ is the source of truth for written teaching content.
- code/ follows the lesson progression and now includes a real tested transformer crate.
- code/systems is the active R2 systems-measurement and public-report bridge for the CS336 Rust equivalent track.
- code/kernels is the active kernels, tiling, and public-report bridge for the CS336 Rust equivalent track.
- code/scaling is the active R3 scaling-evidence and public-report bridge for the CS336 Rust equivalent track.
- code/data is the active R4 data-preparation bridge for the CS336 Rust equivalent track.
- code/evaluation is the active evaluation bridge for the CS336 Rust equivalent track.
- code/inference is the active inference bridge for the CS336 Rust equivalent track.
- code/parallelism is the active parallelism and public-report bridge for the CS336 Rust equivalent track.
- code/alignment is the active R5 post-training signal and public-release bridge for the CS336 Rust equivalent track.
- book/ is not the public surface right now; this is a repo-first learning resource.
- lessons/COURSE-STRUCTURE.md is the canonical structure guide for module and lesson contracts.
- lessons/CONCEPT-ATLAS.md is the learner-facing map from ML concepts to Rust newtypes, composable maps, and runnable proofs.
- Public learner-facing content must follow Public Content Boundary.
The course keeps the same translation goal everywhere:
plain English <-> algebra <-> Rust newtypes <-> composable maps
The current repo intentionally has two learning depths:
- a coherent path through Modules 0, 1, 2, 3, 4, 5, and 6
- a language-modeling bridge in Module 7
- a systems bridge in Module 8
- a kernels bridge in Module 9
- an inference bridge in Module 10
Module 6 applies the translation rule in two complementary ways:
- narrative lessons that explain the architecture and the implementation choices
- a chunked encoder lesson where every concept is written as
English -> Algebra -> Rust
That repetition is intentional. Repetition is how the translation dictionary becomes automatic.
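One entry in that translation dictionary can be pictured with a weighted sum. The sketch below is illustrative only and is not taken from the course crates: the names Weight, InputValue, WeightedSum, ShapeError, and weighted_sum are hypothetical, and the real lessons may structure the same idea differently.

```rust
use std::ops::Mul;

// Hypothetical newtypes for illustration; the course crates may define these differently.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Weight(f64);

#[derive(Debug, Clone, Copy, PartialEq)]
struct InputValue(f64);

#[derive(Debug, Clone, Copy, PartialEq)]
struct WeightedSum(f64);

#[derive(Debug, PartialEq)]
enum ShapeError {
    LengthMismatch { weights: usize, inputs: usize },
}

// English: "scale one input by its weight".  Algebra: w_i * x_i.
impl Mul<InputValue> for Weight {
    type Output = f64;

    fn mul(self, rhs: InputValue) -> f64 {
        self.0 * rhs.0
    }
}

// English: "multiply each input by its weight and add the results".
// Algebra: z = sum_i w_i * x_i.
fn weighted_sum(weights: &[Weight], inputs: &[InputValue]) -> Result<WeightedSum, ShapeError> {
    if weights.len() != inputs.len() {
        return Err(ShapeError::LengthMismatch {
            weights: weights.len(),
            inputs: inputs.len(),
        });
    }
    let total: f64 = weights.iter().zip(inputs).map(|(w, x)| *w * *x).sum();
    Ok(WeightedSum(total))
}

fn main() -> Result<(), ShapeError> {
    let weights = [Weight(0.5), Weight(-1.0)];
    let inputs = [InputValue(2.0), InputValue(3.0)];
    // 0.5 * 2.0 + (-1.0) * 3.0 = -2.0
    let z = weighted_sum(&weights, &inputs)?;
    println!("{z:?}"); // prints "WeightedSum(-2.0)"
    Ok(())
}
```

The newtypes stop a weight from being passed where an input is expected, and the length check turns a shape mistake into a typed error rather than a silent truncation by zip.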
Module 7 starts the CS336 Rust equivalent path without jumping straight into a large project. It uses a tiny bigram model so the learner can inspect text, tokens, IDs, batches, loss, one update, and public text review in one sitting.
Module 8 continues the bridge into systems thinking. It keeps bytes, FLOPs, elapsed time, bandwidth, and public report eligibility as separate typed objects.
Module 9 makes kernel work inspectable. It teaches elementwise maps, reductions, tiling, FLOP counts, HBM bytes, and public kernel reports as typed objects instead of loose benchmark folklore.
- Read the module README.
- Work through the lesson files in order.
- Do the module exercises without copying from the solutions first.
- Use the solution files to check reasoning, naming, and Rust syntax.
- Move to the next module only after you can explain each formula out loud in English.
Current recommended sequence:
- Learning Path
- The Learning Lens
- Concept Atlas
- 01 Foundations
- 02 Vectors
- 03 Neuron
- 04 Learning
- 05 MLP
- 06 Attention
- 07 Transformer
- 08 Language Modeling
- 09 Systems
- 10 Kernels
- 11 Inference
Run every active teaching crate:
cargo test --manifest-path code/Cargo.toml --workspace --all-targets
Run the beginner neuron ladder:
cargo run --manifest-path code/Cargo.toml -p rust_ml_neuron --example 01_weighted_sum
cargo run --manifest-path code/Cargo.toml -p rust_ml_neuron --example 02_forward_pass
cargo run --manifest-path code/Cargo.toml -p rust_ml_neuron --example 03_one_step_training
cargo run --manifest-path code/Cargo.toml -p rust_ml_neuron --example 04_and_gate_epoch
cargo run --manifest-path code/Cargo.toml -p rust_ml_neuron --example 05_public_training_step
The neuron crate covers:
- semantic scalar types such as InputValue, Weight, Bias, Target, LearningRate, and PublicTrainingStep
- explicit TryFrom adapters for raw learner numbers
- readable typed arithmetic through std::ops implementations
- feature and weight vectors with shape checks
- one forward pass from weighted sum to sigmoid prediction
- one training step with visible gradients and loss before/after
- a tiny AND-gate training loop for intuition
- public training-step review boundaries that keep restricted or private update evidence out of public material
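As a rough illustration of two items from the list above, the sketch below shows a TryFrom adapter for a raw number and the sigmoid step of a forward pass. It is not the neuron crate's code: the validation rules (finite and strictly positive), the LearningRateError variants, and the sigmoid helper are assumptions made for this example.

```rust
use std::convert::TryFrom;

// Hypothetical newtype; the real crate's LearningRate may enforce different rules.
#[derive(Debug, Clone, Copy, PartialEq)]
struct LearningRate(f64);

#[derive(Debug, PartialEq)]
enum LearningRateError {
    NotFinite,
    NotPositive,
}

// The adapter is the only way to turn a raw f64 into a LearningRate,
// so every LearningRate value has already passed these checks.
impl TryFrom<f64> for LearningRate {
    type Error = LearningRateError;

    fn try_from(raw: f64) -> Result<Self, Self::Error> {
        if !raw.is_finite() {
            return Err(LearningRateError::NotFinite);
        }
        if raw <= 0.0 {
            return Err(LearningRateError::NotPositive);
        }
        Ok(LearningRate(raw))
    }
}

#[derive(Debug, Clone, Copy, PartialEq)]
struct Prediction(f64);

// Algebra: sigma(z) = 1 / (1 + e^(-z)), mapping any weighted sum into (0, 1).
fn sigmoid(weighted_sum: f64) -> Prediction {
    Prediction(1.0 / (1.0 + (-weighted_sum).exp()))
}

fn main() {
    println!("{:?}", LearningRate::try_from(0.1)); // Ok(LearningRate(0.1))
    println!("{:?}", LearningRate::try_from(-1.0)); // Err(NotPositive)
    println!("{:?}", sigmoid(0.0)); // Prediction(0.5)
}
```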
Run the MLP bridge ladder:
cargo run --manifest-path code/Cargo.toml -p rust_ml_mlp --example 01_hidden_features
cargo run --manifest-path code/Cargo.toml -p rust_ml_mlp --example 02_shape_flow
cargo run --manifest-path code/Cargo.toml -p rust_ml_mlp --example 03_forward_trace
cargo run --manifest-path code/Cargo.toml -p rust_ml_mlp --example 04_xor_table
cargo run --manifest-path code/Cargo.toml -p rust_ml_mlp --example 05_public_trace
The MLP crate covers:
- semantic layer roles such as InputVector, HiddenActivation, OutputLogit, Prediction, and PublicForwardTrace
- explicit TryFrom adapters for raw learner numbers
- typed arithmetic with std::ops for weighted products, sums, and bias addition
- finite-value and probability-range invariants
- dense layer shape checks with expressive errors
- a deterministic XOR-shaped forward pass
- learner-visible hidden activations and logits
- public trace review boundaries that keep restricted or private representation traces out of public material
Run the attention bridge ladder:
cargo run --manifest-path code/Cargo.toml -p rust_ml_attention --example 01_score_one_pair
cargo run --manifest-path code/Cargo.toml -p rust_ml_attention --example 02_softmax_focus
cargo run --manifest-path code/Cargo.toml -p rust_ml_attention --example 03_weighted_sum
cargo run --manifest-path code/Cargo.toml -p rust_ml_attention --example 04_attention_trace
cargo run --manifest-path code/Cargo.toml -p rust_ml_attention --example 05_public_trace
The attention crate covers:
- semantic token, query, key, value, score, weight, output, and public trace roles
- explicit TryFrom adapters for raw learner literals
- typed arithmetic with std::ops for projection products, score contributions, and weighted value mixing
- stable softmax over attention scores
- weighted sums over value vectors
- learner-visible attention traces for one query token
- PublicAttentionTrace review boundaries that keep restricted or private traces out of public material
- shape and range errors for invalid sequences, projections, and weights
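The "stable softmax" item can be sketched as follows. This is a generic max-subtraction softmax over plain f64 scores, written for illustration; the attention crate presumably wraps scores and weights in its own newtypes, and the SoftmaxError and stable_softmax names here are hypothetical.

```rust
#[derive(Debug, PartialEq)]
enum SoftmaxError {
    EmptyScores,
    NonFiniteScore,
}

// Algebra: softmax(s)_i = exp(s_i - m) / sum_j exp(s_j - m), with m = max_j s_j.
// Subtracting m leaves the result unchanged but keeps exp() from overflowing.
fn stable_softmax(scores: &[f64]) -> Result<Vec<f64>, SoftmaxError> {
    if scores.is_empty() {
        return Err(SoftmaxError::EmptyScores);
    }
    if scores.iter().any(|s| !s.is_finite()) {
        return Err(SoftmaxError::NonFiniteScore);
    }
    let max = scores.iter().copied().fold(f64::NEG_INFINITY, f64::max);
    let exps: Vec<f64> = scores.iter().map(|s| (s - max).exp()).collect();
    // The largest score contributes exp(0) = 1, so the total is at least 1.
    let total: f64 = exps.iter().sum();
    Ok(exps.iter().map(|e| e / total).collect())
}

fn main() -> Result<(), SoftmaxError> {
    // A naive exp(1000.0) would overflow to infinity; the shifted form does not.
    let weights = stable_softmax(&[1000.0, 1000.0])?;
    println!("{weights:?}"); // prints "[0.5, 0.5]"
    Ok(())
}
```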
Run the first CS336 Rust language-modeling artifact:
cargo run --manifest-path code/Cargo.toml -p rust_ml_lm_basics --example 01_tokenize_and_encode
cargo run --manifest-path code/Cargo.toml -p rust_ml_lm_basics --example 02_next_token_batch
cargo run --manifest-path code/Cargo.toml -p rust_ml_lm_basics --example 03_uniform_loss
cargo run --manifest-path code/Cargo.toml -p rust_ml_lm_basics --example 04_training_step
cargo run --manifest-path code/Cargo.toml -p rust_ml_lm_basics --example 05_public_training_example
The language-modeling basics crate covers:
- RawText, Token, TokenId, VocabularySize, ContextLength, Position, Logit, Loss, LearningRate, and PublicLanguageModelingExample
- explicit TryFrom adapters for raw learner literals
- checked vocabulary encoding
- next-token batch construction
- uniform cross-entropy loss
- one tiny gradient step over a bigram logit table
- a typed public-example boundary that rejects restricted or private text before tokenization
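The "uniform cross-entropy loss" item has a useful sanity check behind it: a model that assigns equal logits to all V tokens has loss ln(V) regardless of the target. The sketch below assumes plain f64 logits and a hypothetical cross_entropy helper; it does not reproduce the crate's Logit or Loss types and does not validate non-finite logits.

```rust
// Algebra: loss = -ln(softmax(logits)[target]) = logsumexp(logits) - logits[target].
// Returns None when the target index is outside the logit slice.
fn cross_entropy(logits: &[f64], target: usize) -> Option<f64> {
    let target_logit = *logits.get(target)?;
    let max = logits.iter().copied().fold(f64::NEG_INFINITY, f64::max);
    // Shift by the max before exponentiating so large logits do not overflow.
    let log_sum_exp = max + logits.iter().map(|l| (l - max).exp()).sum::<f64>().ln();
    Some(log_sum_exp - target_logit)
}

fn main() {
    // Four equal logits describe a uniform distribution over a 4-token vocabulary.
    let uniform_logits = [0.0; 4];
    if let Some(loss) = cross_entropy(&uniform_logits, 2) {
        // Both values print as 1.3863.
        println!("loss = {loss:.4}, ln(4) = {:.4}", 4.0_f64.ln());
    }
}
```

An untrained model whose loss sits near ln(V) is behaving like a uniform guesser, which gives a baseline to compare the first gradient step against.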
Run the first CS336 Rust systems artifact:
cargo run --manifest-path code/Cargo.toml -p rust_ml_systems --example 01_memory_accounting
cargo run --manifest-path code/Cargo.toml -p rust_ml_systems --example 02_attention_flops
cargo run --manifest-path code/Cargo.toml -p rust_ml_systems --example 03_median_timing
cargo run --manifest-path code/Cargo.toml -p rust_ml_systems --example 04_arithmetic_intensity
cargo run --manifest-path code/Cargo.toml -p rust_ml_systems --example 05_memory_hierarchy
cargo run --manifest-path code/Cargo.toml -p rust_ml_systems --example 06_public_report
The systems crate covers:
- BatchSize, SequenceLength, ModelWidth, Bytes, BytesPerSecond, MemoryLevel, Flops, ElapsedNanos, ArithmeticIntensity, and PublicSystemsReport
- explicit TryFrom adapters for raw learner literals
- activation memory estimates
- matrix-vector FLOP and byte estimates
- dense self-attention score/value FLOP estimates
- median timing over repeated stage measurements
- arithmetic intensity as the bridge between math and memory traffic
- accelerator memory hierarchy as typed byte movement and bandwidth
- a typed public-report boundary that rejects restricted or private measurements
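To make the link between FLOPs, bytes, and arithmetic intensity concrete, here is a small sketch for a matrix-vector product. The type names Flops, Bytes, and ArithmeticIntensity appear in the list above, but their definitions here are guesses, and the counting conventions (2 FLOPs per multiply-add, every operand read or written exactly once, unchecked u64 arithmetic) are assumptions of this example.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
struct Flops(u64);

#[derive(Debug, Clone, Copy, PartialEq)]
struct Bytes(u64);

#[derive(Debug, Clone, Copy, PartialEq)]
struct ArithmeticIntensity(f64);

// y = A x with A of shape rows x cols: one multiply and one add per matrix entry.
fn matvec_flops(rows: u64, cols: u64) -> Flops {
    Flops(2 * rows * cols)
}

// Assumes A and x are each read once and y is written once.
fn matvec_bytes(rows: u64, cols: u64, bytes_per_element: u64) -> Bytes {
    Bytes((rows * cols + cols + rows) * bytes_per_element)
}

// FLOPs per byte moved; None when no bytes move, so there is no division by zero.
fn arithmetic_intensity(flops: Flops, bytes: Bytes) -> Option<ArithmeticIntensity> {
    if bytes.0 == 0 {
        None
    } else {
        Some(ArithmeticIntensity(flops.0 as f64 / bytes.0 as f64))
    }
}

fn main() {
    let flops = matvec_flops(1024, 1024);
    let bytes = matvec_bytes(1024, 1024, 4); // 4 bytes per f32 element
    println!("{flops:?} {bytes:?} {:?}", arithmetic_intensity(flops, bytes));
}
```

With 4-byte elements the intensity stays just under 0.5 FLOPs per byte however large the matrix gets, which is why matrix-vector products tend to be limited by memory traffic rather than arithmetic.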
Run the first CS336 Rust kernels artifact:
cargo run --manifest-path code/Cargo.toml -p rust_ml_kernels --example 01_elementwise_gelu
cargo run --manifest-path code/Cargo.toml -p rust_ml_kernels --example 02_row_sum_reduction
cargo run --manifest-path code/Cargo.toml -p rust_ml_kernels --example 03_tiled_matvec
cargo run --manifest-path code/Cargo.toml -p rust_ml_kernels --example 04_kernel_estimate
cargo run --manifest-path code/Cargo.toml -p rust_ml_kernels --example 05_public_report
The kernels crate covers:
- MatrixShape, TileShape, TilePlan, KernelScalar, Accumulator, Bytes, FlopCount, and PublicKernelReport
- explicit TryFrom adapters for raw learner literals
- typed std::ops arithmetic for element counts, byte counts, FLOP counts, scalar products, and accumulation
- elementwise GeLU-style traces
- row reductions through a typed accumulator
- tiled matrix-vector traces with visible tile windows
- typed byte and FLOP estimates that keep resource units separate
- a typed public-report boundary that rejects restricted or private tiled traces
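The tiled matrix-vector idea can be sketched without any of the crate's types. The example below tiles only along the column dimension and uses plain Vec<f64> storage; the tiled_matvec function and its Option-based error handling are assumptions for illustration, not the crate's TilePlan API.

```rust
// Computes y = A x by walking the columns one tile window at a time.
// Returns None for a zero tile width or a row whose length differs from the vector.
fn tiled_matvec(matrix: &[Vec<f64>], vector: &[f64], tile_width: usize) -> Option<Vec<f64>> {
    if tile_width == 0 || matrix.iter().any(|row| row.len() != vector.len()) {
        return None;
    }
    let mut output = vec![0.0; matrix.len()];
    for start in (0..vector.len()).step_by(tile_width) {
        // The last tile may be narrower than tile_width.
        let end = start.saturating_add(tile_width).min(vector.len());
        let vector_tile = &vector[start..end];
        for (accumulator, row) in output.iter_mut().zip(matrix) {
            // Partial dot product over this tile window only.
            *accumulator += row[start..end]
                .iter()
                .zip(vector_tile)
                .map(|(a, x)| a * x)
                .sum::<f64>();
        }
    }
    Some(output)
}

fn main() {
    let matrix = vec![vec![1.0, 2.0, 3.0], vec![4.0, 5.0, 6.0]];
    let vector = [1.0, 1.0, 1.0];
    // Tile windows are columns 0..2 and 2..3.
    println!("{:?}", tiled_matvec(&matrix, &vector, 2)); // prints "Some([6.0, 15.0])"
}
```

Each tile of the vector is reused across every row before the next tile is touched, which is the reuse pattern that tiling is meant to expose.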
Run the first CS336 Rust scaling artifact:
cargo run --manifest-path code/Cargo.toml -p rust_ml_scaling --example 01_record_runs
cargo run --manifest-path code/Cargo.toml -p rust_ml_scaling --example 02_fit_power_law
cargo run --manifest-path code/Cargo.toml -p rust_ml_scaling --example 03_forecast_loss
cargo run --manifest-path code/Cargo.toml -p rust_ml_scaling --example 04_report_limitations
cargo run --manifest-path code/Cargo.toml -p rust_ml_scaling --example 05_tradeoff_decision
cargo run --manifest-path code/Cargo.toml -p rust_ml_scaling --example 06_public_report
The scaling crate covers:
- RunId, ParameterCount, TokenCount, TrainingStep, ComputeBudgetFlops, ValidationLoss, ScalingExponent, ScalingTradeoff, and PublicScalingReport
- explicit TryFrom adapters for raw learner literals
- typed experiment configs and run records
- checked parameter-token compute estimates
- log-log power-law fitting over validation loss
- forecast errors and limitation notes for tiny evidence
- typed baseline-versus-candidate tradeoff decisions
- a typed public-report boundary that rejects restricted or private metric records
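Log-log power-law fitting reduces to an ordinary least-squares line once both axes are logged: if L = a * N^b, then ln L = ln a + b * ln N. The sketch below assumes that form and plain (size, loss) pairs; PowerLawFit and fit_power_law are hypothetical names, and the scaling crate may parameterize its fit differently.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
struct PowerLawFit {
    coefficient: f64, // a in L = a * N^b
    exponent: f64,    // b in L = a * N^b
}

// Least-squares line through (ln N, ln L). Returns None for fewer than two points,
// non-positive or non-finite values, or points that all share the same N.
fn fit_power_law(points: &[(f64, f64)]) -> Option<PowerLawFit> {
    let valid = |v: f64| v.is_finite() && v > 0.0;
    if points.len() < 2 || points.iter().any(|p| !valid(p.0) || !valid(p.1)) {
        return None;
    }
    let count = points.len() as f64;
    let logs: Vec<(f64, f64)> = points.iter().map(|p| (p.0.ln(), p.1.ln())).collect();
    let mean_x = logs.iter().map(|p| p.0).sum::<f64>() / count;
    let mean_y = logs.iter().map(|p| p.1).sum::<f64>() / count;
    let sxx: f64 = logs.iter().map(|p| (p.0 - mean_x).powi(2)).sum();
    let sxy: f64 = logs.iter().map(|p| (p.0 - mean_x) * (p.1 - mean_y)).sum();
    if sxx == 0.0 {
        return None;
    }
    let exponent = sxy / sxx;
    let coefficient = (mean_y - exponent * mean_x).exp();
    Some(PowerLawFit { coefficient, exponent })
}

fn main() {
    // Synthetic points generated from L = 10 * N^(-0.5); not real experiment data.
    let points = [(1.0, 10.0), (4.0, 5.0), (16.0, 2.5)];
    println!("{:?}", fit_power_law(&points));
}
```

Three points fit a line perfectly only because they were generated from one; with real runs the residuals are what the forecast-error and limitation notes would report.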
Run the first CS336 Rust data artifact:
cargo run --manifest-path code/Cargo.toml -p rust_ml_data --example 01_normalize_documents
cargo run --manifest-path code/Cargo.toml -p rust_ml_data --example 02_filter_and_dedup
cargo run --manifest-path code/Cargo.toml -p rust_ml_data --example 03_build_shard
cargo run --manifest-path code/Cargo.toml -p rust_ml_data --example 04_source_mixture
cargo run --manifest-path code/Cargo.toml -p rust_ml_data --example 05_public_manifest
The data crate covers:
- DocumentId, SourceName, RawText, NormalizedText, DedupKey, FilterReason, MixtureWeight, CorpusShard, DatasetCard, and PublicCorpusManifest
- explicit TryFrom adapters for raw learner literals
- deterministic normalization
- durable filter decisions with rejection reasons
- duplicate detection by normalized-text key
- source mixtures with non-negative weights and a positive total
- a typed public manifest boundary that rejects restricted or private source cards
- readable checked newtype addition for manifest document and token totals
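The rule "non-negative weights and a positive total" is what makes a source mixture convertible into sampling probabilities. The sketch below shows one way that check could look on raw f64 weights; MixtureError and sampling_probabilities are hypothetical and do not reflect how the data crate models MixtureWeight.

```rust
#[derive(Debug, PartialEq)]
enum MixtureError {
    Empty,
    InvalidWeight { index: usize },
    InvalidTotal,
}

// Turns raw source weights into probabilities that sum to 1.
// Individual weights may be zero, but the total must be positive and finite.
fn sampling_probabilities(weights: &[f64]) -> Result<Vec<f64>, MixtureError> {
    if weights.is_empty() {
        return Err(MixtureError::Empty);
    }
    if let Some(index) = weights.iter().position(|w| !w.is_finite() || *w < 0.0) {
        return Err(MixtureError::InvalidWeight { index });
    }
    let total: f64 = weights.iter().sum();
    if !(total.is_finite() && total > 0.0) {
        return Err(MixtureError::InvalidTotal);
    }
    Ok(weights.iter().map(|w| w / total).collect())
}

fn main() {
    println!("{:?}", sampling_probabilities(&[1.0, 3.0])); // Ok([0.25, 0.75])
    println!("{:?}", sampling_probabilities(&[0.0, 0.0])); // Err(InvalidTotal)
}
```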
Run the first CS336 Rust evaluation artifact:
cargo run --manifest-path code/Cargo.toml -p rust_ml_evaluation --example 01_score_prediction
cargo run --manifest-path code/Cargo.toml -p rust_ml_evaluation --example 02_accuracy_report
cargo run --manifest-path code/Cargo.toml -p rust_ml_evaluation --example 03_reject_mismatched_ids
cargo run --manifest-path code/Cargo.toml -p rust_ml_evaluation --example 04_compare_runs
cargo run --manifest-path code/Cargo.toml -p rust_ml_evaluation --example 05_public_report
The evaluation crate covers:
- ExampleId, EvalRunId, Prompt, ExpectedAnswer, ModelAnswer, Correctness, ExactMatchAccuracy, AccuracyDelta, and PublicEvalReport
- explicit TryFrom adapters for raw learner literals
- deterministic exact-match scoring after whitespace and case normalization
- report construction that rejects duplicate example IDs
- typed run comparison through accuracy deltas
- a typed public-report boundary that rejects restricted or private evaluation examples
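Exact-match scoring after normalization can be sketched in a few lines. The normalization used here (trim, collapse whitespace runs to one space, lowercase) is one plausible reading of "whitespace and case normalization" and may differ from the crate's rule; normalize, exact_match, and exact_match_accuracy are hypothetical helper names working on plain strings.

```rust
// Assumed normalization: split on any whitespace, lowercase each word, rejoin with one space.
fn normalize(answer: &str) -> String {
    answer
        .split_whitespace()
        .map(str::to_lowercase)
        .collect::<Vec<String>>()
        .join(" ")
}

fn exact_match(expected: &str, model: &str) -> bool {
    normalize(expected) == normalize(model)
}

// Fraction of (expected, model) pairs that match; None for an empty evaluation set.
fn exact_match_accuracy(pairs: &[(&str, &str)]) -> Option<f64> {
    if pairs.is_empty() {
        return None;
    }
    let correct = pairs.iter().filter(|pair| exact_match(pair.0, pair.1)).count();
    Some(correct as f64 / pairs.len() as f64)
}

fn main() {
    let pairs = [("Paris", "  paris "), ("New York", "new  york"), ("Rome", "Milan")];
    println!("{:?}", exact_match_accuracy(&pairs)); // two of three pairs match
}
```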
Run the first CS336 Rust inference artifact:
cargo run --manifest-path code/Cargo.toml -p rust_ml_inference --example 01_greedy_decode
cargo run --manifest-path code/Cargo.toml -p rust_ml_inference --example 02_sampling_controls
cargo run --manifest-path code/Cargo.toml -p rust_ml_inference --example 03_kv_cache_trace
cargo run --manifest-path code/Cargo.toml -p rust_ml_inference --example 04_latency_budget
cargo run --manifest-path code/Cargo.toml -p rust_ml_inference --example 05_public_trace
The inference crate covers:
- PromptTokens, TokenId, ContextWindow, SamplingMode, DecodeStep, KvCacheEntry, LatencyBudget, and PublicDecodeTrace
- explicit TryFrom adapters for raw learner literals
- deterministic greedy and top-k decoding controls
- KV-cache traces that distinguish prompt-prefix and generated-token entries
- typed latency estimates for prefill plus per-token generation
- a typed public-trace boundary that rejects restricted or private prompts, outputs, and cache records
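Greedy and top-k selection both start from one vector of next-token logits. The sketch below shows the deterministic part only (no random sampling), using plain f64 logits and usize token indices; greedy_token and top_k_tokens are hypothetical names, and the tie-breaking behavior is a choice made for this example rather than something the crate documents here.

```rust
// Greedy decoding: pick the index of the largest logit.
// Returns None for an empty slice; ties resolve to the last maximal index.
fn greedy_token(logits: &[f64]) -> Option<usize> {
    logits
        .iter()
        .enumerate()
        .max_by(|a, b| a.1.total_cmp(b.1))
        .map(|(index, _)| index)
}

// Top-k control: keep the k highest-scoring token indices, best first.
// The sort is stable, so equal logits keep their original index order.
fn top_k_tokens(logits: &[f64], k: usize) -> Vec<usize> {
    let mut indices: Vec<usize> = (0..logits.len()).collect();
    indices.sort_by(|&a, &b| logits[b].total_cmp(&logits[a]));
    indices.truncate(k);
    indices
}

fn main() {
    let logits = [0.1, 2.0, -1.0, 1.5];
    println!("{:?}", greedy_token(&logits)); // prints "Some(1)"
    println!("{:?}", top_k_tokens(&logits, 2)); // prints "[1, 3]"
}
```

A sampler would then renormalize over the surviving indices and draw from them; greedy decoding is the special case that always takes the first one.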
Run the first CS336 Rust parallelism artifact:
cargo run --manifest-path code/Cargo.toml -p rust_ml_parallelism --example 01_data_parallel_batch
cargo run --manifest-path code/Cargo.toml -p rust_ml_parallelism --example 02_tensor_parallel_width
cargo run --manifest-path code/Cargo.toml -p rust_ml_parallelism --example 03_collective_all_reduce
cargo run --manifest-path code/Cargo.toml -p rust_ml_parallelism --example 04_pipeline_schedule
cargo run --manifest-path code/Cargo.toml -p rust_ml_parallelism --example 05_public_report
The parallelism crate covers:
- WorldSize, RankIndex, RankId, GlobalBatchSize, ModelWidth, LayerCount, MicroBatchCount, CommunicationBytes, and PublicParallelismReport
- explicit TryFrom adapters for raw learner literals
- data-parallel, tensor-parallel, and pipeline-parallel layout summaries
- rank-owned tensor shards with origin offsets
- a tiny all-reduce trace and communication estimate
- a typed public-report boundary that rejects restricted or private collective traces
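The meaning of an all-reduce can be shown with a single-process reference: every rank contributes a vector, and every rank ends up holding the elementwise sum. The sketch below simulates ranks as entries in a slice and performs no real communication; all_reduce_sum is a hypothetical function, not the crate's trace API.

```rust
// Reference semantics of a sum all-reduce.
// Input: one vector per rank. Output: the same elementwise total, once per rank.
// Returns None when there are no ranks or the per-rank lengths disagree.
fn all_reduce_sum(rank_values: &[Vec<f64>]) -> Option<Vec<Vec<f64>>> {
    let first = rank_values.first()?;
    if rank_values.iter().any(|values| values.len() != first.len()) {
        return None;
    }
    let mut total = vec![0.0; first.len()];
    for values in rank_values {
        for (sum, value) in total.iter_mut().zip(values) {
            *sum += value;
        }
    }
    // Every rank receives an identical copy of the reduced vector.
    Some(vec![total; rank_values.len()])
}

fn main() {
    // Two simulated ranks, each holding a local gradient of length 2.
    let local_gradients = vec![vec![1.0, 2.0], vec![3.0, 4.0]];
    println!("{:?}", all_reduce_sum(&local_gradients)); // prints "Some([[4.0, 6.0], [4.0, 6.0]])"
}
```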
Run the first CS336 Rust alignment artifact:
cargo run --manifest-path code/Cargo.toml -p rust_ml_alignment --example 01_instruction_example
cargo run --manifest-path code/Cargo.toml -p rust_ml_alignment --example 02_preference_signal
cargo run --manifest-path code/Cargo.toml -p rust_ml_alignment --example 03_verifier_feedback
cargo run --manifest-path code/Cargo.toml -p rust_ml_alignment --example 04_audit_record
cargo run --manifest-path code/Cargo.toml -p rust_ml_alignment --example 05_alignment_workflow
cargo run --manifest-path code/Cargo.toml -p rust_ml_alignment --example 06_public_release
The alignment crate covers:
- Instruction, Response, ChosenResponse, RejectedResponse, RewardScore, VerifierResult, AlignmentRunId, AuditRecord, AlignmentWorkflow, AlignmentStage, and PublicAlignmentRelease
- explicit TryFrom adapters for raw learner literals
- supervised instruction-response examples
- preference pairs with distinct chosen and rejected responses
- finite reward-score margins
- verifier feedback that keeps failures visible
- audit records that preserve source and update kind
- workflow transitions that reject out-of-order alignment updates
- a typed public-release boundary that rejects restricted or private alignment workflows
Run the advanced Transformer encoder demo:
cargo run --manifest-path code/Cargo.toml -p rust_ml_transformer --example encoder_demo
The Transformer crate covers:
- dense vectors and matrices
- semantic model newtypes such as TokenEmbedding, Query, Key, and Value
- typed std::ops arithmetic for positional encoding, residual addition, vector addition, dot products, and matrix-vector products
- expressive thiserror diagnostics for shape mistakes
- standard self-attention and multi-head attention
- a simplified linear-attention comparison point
- positional encodings, layer norm, feed-forward layers, and an encoder block
The repo now includes two GitHub Actions workflows for quality control:
- CI runs deterministic checks for lesson structure, local Markdown links, and authored-section contracts.
- CI scans public learner-facing files for common private-content and secret-shaped leaks.
- CI checks Rust teaching contracts: no .unwrap(), no .expect(), no panic-style macros such as panic!(), todo!(), unimplemented!(), or unreachable!(), no Result<_, String>, no raw public enum payloads, no raw associated type assignments, no public raw-container TryFrom adapters, no raw-style public accessor names, no raw collection accessor helpers, and strict public newtype boundaries for the migrated teaching crates and Markdown Rust snippets.
- CI checks active teaching-crate consistency: package names, README structure, examples, tests, typed error modules, and strict path coverage.
- CI also compile-checks Rust snippets embedded in lessons, runs cargo fmt, cargo clippy, cargo test, and executes every active teaching example once.
- Gemini Writing Review reviews Markdown content on pull requests for English clarity, technical-teaching quality, structural discipline, and beginner friendliness.
The Gemini review is advisory, not a replacement for human judgment. It is designed to catch weak phrasing, excess cognitive load, mismatches between English and code, and places where the teaching flow violates common technical-writing or technical-instruction best practices.
To enable Gemini review in GitHub Actions, configure:
- repository secret GEMINI_API_KEY
- optional repository variable GEMINI_MODEL if you want a model other than the default gemini-2.0-flash
The workflow writes a review artifact named gemini-writing-review so the writing assessment can be read directly from the workflow run.
The repo keeps supporting source material in references/, including:
- a Transformer explainer transcript
- Bahdanau et al. (2014)
- Luong et al. (2015)
- Vaswani et al. (2017)
- Sebastian Raschka's LLMs From Scratch repository as an external inspiration source for attention, GPT, and educational sequencing