Topics: Markov Decision Process


Given an MDP, we can obtain an approximation to the optimal policy by using the successive approximations method.

The basic idea behind this method is to find an optimal policy for the process when only $n$ periods remain, starting from $n = 1$ (one period away from finishing) and continuing onwards. As $n$ grows, the optimal policies that are found converge to the optimal policy for the MDP (i.e. the optimal policy when “infinite” periods remain). Thus, the optimal policies found for each $n$ are successive approximations to the optimal policy.

This method mainly uses the expected total costs starting from a state $i$, though we will be using the ones that correspond to the optimal policy among all possible policies, instead of just the ones for a given policy.
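To keep the formulas below unambiguous, we will use the following notation under the usual discounted-cost setup (one common convention; other texts may use different symbols):

$$
\begin{aligned}
C_{ik} &: \text{expected immediate cost of making decision } k \text{ in state } i \\
p_{ij}(k) &: \text{probability of moving from state } i \text{ to state } j \text{ when decision } k \text{ is made} \\
\alpha &: \text{discount factor, } 0 < \alpha < 1 \\
V_n(i) &: \text{minimum expected total cost when } n \text{ periods remain, starting from state } i
\end{aligned}
$$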

Algorithm

Evidently, the method is iterative: we will iterate over $n$, the number of remaining periods.

Since the method finds approximations and, depending on the number of iterations we make, is not always guaranteed to find the optimal policy, we will define $N$, the maximum number of iterations, as well as $\varepsilon$, the tolerated error.

For the first iteration ($n = 1$), we will determine, for every state $i$:

$$V_1(i) = \min_{k} \left\{ C_{ik} \right\}$$

…having an element to compare for every decision $k$ that is viable in state $i$.

We will then obtain the optimal policy (when $n = 1$ period remains) by setting $d_i$, the decision prescribed for state $i$, to the $k$ that corresponds to the optimal cost $V_1(i)$.
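As a quick illustration, here is a minimal Python sketch of this first step; the data layout (a dictionary `C[(i, k)]` of expected immediate costs and a dictionary `actions[i]` of viable decisions) is assumed for the example, not prescribed by the method:

```python
def first_iteration(C, actions):
    """n = 1: keep, for every state i, the cheapest immediate cost and its decision."""
    V1, policy = {}, {}
    for i, viable in actions.items():
        # One candidate cost per decision k that is viable in state i.
        best_k = min(viable, key=lambda k: C[(i, k)])
        policy[i] = best_k
        V1[i] = C[(i, best_k)]
    return V1, policy


# Tiny made-up example: two states {0, 1}, decisions {'a', 'b'} viable in both.
C = {(0, 'a'): 4.0, (0, 'b'): 6.0, (1, 'a'): 5.0, (1, 'b'): 3.0}
actions = {0: ['a', 'b'], 1: ['a', 'b']}
V1, d = first_iteration(C, actions)   # V1 = {0: 4.0, 1: 3.0}, d = {0: 'a', 1: 'b'}
```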

We continue by setting $n = 2$.

For the second iteration and onwards ($n \geq 2$), we will determine, for every state $i$:

$$V_n(i) = \min_{k} \left\{ C_{ik} + \alpha \sum_{j} p_{ij}(k)\, V_{n-1}(j) \right\}$$

…having an element to compare for every decision $k$ that is viable in state $i$.

We will then obtain the optimal policy by setting $d_i$ to the $k$ that corresponds to the optimal cost $V_n(i)$.
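The general iteration can be sketched the same way; here `P[(i, k)]` is assumed to map each next state `j` to the transition probability $p_{ij}(k)$, and `alpha` is the discount factor. This is a sketch of a single update (the loop that ties everything together is sketched after the stopping condition below):

```python
def bellman_update(C, P, actions, V_prev, alpha):
    """One iteration for n >= 2:
    V_n(i) = min_k { C_ik + alpha * sum_j p_ij(k) * V_{n-1}(j) }.
    """
    V, policy = {}, {}
    for i, viable in actions.items():
        def cost(k):
            # Immediate cost plus the discounted expected cost-to-go from the previous iteration.
            return C[(i, k)] + alpha * sum(p * V_prev[j] for j, p in P[(i, k)].items())
        best_k = min(viable, key=cost)
        policy[i] = best_k
        V[i] = cost(best_k)
    return V, policy
```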

Stopping Condition

If $n = N$, then we stop iterating. If $n < N$, then, for every state $i$, we test the inequality:

$$\left| V_n(i) - V_{n-1}(i) \right| < \varepsilon$$

If the inequality is satisfied for every $i$, we stop iterating. Otherwise, we set $n = n + 1$ and continue iterating.
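Putting everything together, here is a minimal sketch of the whole procedure, reusing the `first_iteration` and `bellman_update` helpers from the sketches above; the values of `N`, `eps`, `alpha` and the example data are hypothetical choices for illustration:

```python
def successive_approximations(C, P, actions, alpha, N, eps):
    """Iterate until n = N or until |V_n(i) - V_{n-1}(i)| < eps for every state i."""
    V, policy = first_iteration(C, actions)              # n = 1
    for n in range(2, N + 1):                            # n = 2, 3, ..., N
        V_new, policy = bellman_update(C, P, actions, V, alpha)
        if all(abs(V_new[i] - V[i]) < eps for i in V):   # stopping condition
            return V_new, policy
        V = V_new
    return V, policy


# Reusing C and actions from the first sketch, with made-up transition probabilities.
P = {(0, 'a'): {0: 0.7, 1: 0.3}, (0, 'b'): {0: 0.2, 1: 0.8},
     (1, 'a'): {0: 0.5, 1: 0.5}, (1, 'b'): {0: 0.1, 1: 0.9}}
V, d = successive_approximations(C, P, actions, alpha=0.9, N=100, eps=1e-6)
print(V, d)   # approximate optimal costs and the corresponding policy
```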