Gambler's Problem — Dynamic Programming with Value & Policy Iteration

Implements Value Iteration and Policy Iteration (from Sutton & Barto, Reinforcement Learning: An Introduction) to solve the Gambler's Problem. Analyzes the effect of discount factor γ and coin-flip probability p_h on the optimal policy.

Stack: Python, Dynamic Programming, MDP, NumPy, Matplotlib

Problem Description

The Gambler's Problem is a classic finite Markov Decision Process (MDP) in which a gambler bets on the outcome of repeated coin flips. The goal is to find the policy that maximizes the probability of reaching a capital of $100 before going broke.

States: Gambler's capital s ∈ {1, 2, ..., 99}, with terminal states at s = 0 (loss) and s = 100 (win)
Actions: Stakes a ∈ {0, 1, ..., min(s, 100 − s)} — the amount wagered on each flip
Reward: +1 when s = 100 is reached; 0 otherwise
Dynamics: With probability p_h, the gambler wins the stake; with probability (1 − p_h), they lose it

Approach

Task 1 — Standard Gambler's Problem

Solved for p_h = 0.25 and p_h = 0.55 using both:

Value Iteration: Iteratively applies the Bellman optimality equation until the value function converges, then extracts the greedy policy.
Policy Iteration: Alternates between full policy evaluation and policy improvement until the policy stabilizes.

The discount factors γ = 1.0, 0.99, and 0.90 are applied to analyze how future reward weighting affects the optimal policy shape.

Task 2 — Modified Rules

An extended variant introduces two modifications:

The gambler may stake up to their full current capital (not capped at 100 − s)
A "North head" special outcome: with probability 1/8 on a heads flip, winnings are doubled
Reward is modified to reflect the final capital reached above 99: reward = final_capital − 99

The state transition and reward functions are updated accordingly, and Value Iteration is re-applied.

Key Results

For p_h < 0.5, the optimal policy concentrates stakes at capital levels that maximize the probability of a single decisive win — producing the characteristic "staircase" value function.
For p_h > 0.5, the gambler has an inherent edge; spreading smaller bets across more flips increases the probability of winning, flattening the optimal policy.
Reducing γ from 1.0 to 0.90 significantly discounts terminal rewards, causing the agent to prefer faster paths to $100 even at greater risk.
Policy Iteration and Value Iteration converge to identical optimal policies, confirming implementation correctness.

Repository Structure

RL-Dynamic-Programming-GamblersProblem/
├── notebooks/
│   └── DS669MidtermProject_PabloSalar.ipynb   # Full experiment notebook with policy visualizations
├── src/
│   └── DS669MidtermProject_PabloSalar.py      # Standalone Python script
└── README.md

Technologies

Library	Purpose
NumPy	Bellman equation computation and array operations
Matplotlib	Value function and policy plots across parameters
Jupyter	Interactive experimentation and reporting

Reference

Sutton, R. S., & Barto, A. G. (2018). Reinforcement Learning: An Introduction (2nd ed.). MIT Press. — Chapter 4: Dynamic Programming.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Gambler's Problem — Dynamic Programming with Value & Policy Iteration

Problem Description

Approach

Task 1 — Standard Gambler's Problem

Task 2 — Modified Rules

Key Results

Repository Structure

Technologies

Reference

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 15 Commits
notebooks		notebooks
src		src
README.md		README.md

Folders and files

Latest commit

History

Repository files navigation

Gambler's Problem — Dynamic Programming with Value & Policy Iteration

Problem Description

Approach

Task 1 — Standard Gambler's Problem

Task 2 — Modified Rules

Key Results

Repository Structure

Technologies

Reference

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages