Topics: Markov Decision Process - Linear Programming Problem


Given an MDP, we can find the optimal policy by formulating it as a linear programming problem.

Decision Variables

As Binary

Let the following be our decision variables:

$$
D_{ik} =
\begin{cases}
1 & \text{if decision } k \text{ is made when the system is in state } i \\
0 & \text{otherwise}
\end{cases}
$$

Once solved, the matrix of values these variables take at the optimum characterises the optimal policy.
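
For illustration (a hypothetical instance, not from these notes): with three states $i = 0, 1, 2$ and two decisions $k = 1, 2$, an optimal matrix such as

$$
D =
\begin{pmatrix}
1 & 0 \\
0 & 1 \\
1 & 0
\end{pmatrix}
$$

would prescribe decision 1 in states 0 and 2, and decision 2 in state 1.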

As Continuous

Solving binary linear programming problems isn’t as straightforward as solving continuous ones, so we can reinterpret these decision variables as:

$$
D_{ik} = P(\text{decision} = k \mid \text{state} = i)
$$

Id est, the probability of making decision $k$ given that the system is in state $i$.

Under this reinterpretation, the variables we’ll actually be using in the objective function are the $y_{ik}$. These represent the unconditional steady-state probability of being in state $i$ and taking decision $k$, satisfying:

$$
y_{ik} = \pi_i D_{ik}, \qquad \sum_{i=0}^{M} \sum_{k=1}^{K} y_{ik} = 1
$$

…where $\pi_i$ is the steady-state probability of being in state $i$. Conversely, the conditional probabilities can be recovered as $D_{ik} = y_{ik} \big/ \sum_{k=1}^{K} y_{ik}$.
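
As a quick numerical illustration (hypothetical values): if, in a two-decision problem, some state $i$ ends up with $y_{i1} = 0.15$ and $y_{i2} = 0.05$, then $\pi_i = 0.20$ and

$$
D_{i1} = \frac{0.15}{0.20} = 0.75, \qquad D_{i2} = \frac{0.05}{0.20} = 0.25
$$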

Model

Finally, with these considerations, the linear programming model for an MDP is:

$$
\begin{aligned}
\min \quad & Z = \sum_{i=0}^{M} \sum_{k=1}^{K} C_{ik}\, y_{ik} \\
\text{s.t.} \quad & \sum_{i=0}^{M} \sum_{k=1}^{K} y_{ik} = 1 \\
& \sum_{k=1}^{K} y_{jk} - \sum_{i=0}^{M} \sum_{k=1}^{K} y_{ik}\, p_{ij}(k) = 0, \quad j = 0, 1, \dots, M \\
& y_{ik} \geq 0, \quad \text{for all } i, k
\end{aligned}
$$

…where:

  • $C_{ik}$ denotes the cost of making decision $k$ in state $i$
  • $p_{ij}(k)$ denotes the transition probability from state $i$ to state $j$ when making decision $k$

Evidently, we can find the optimal $y_{ik}$ by solving this linear programming problem through any of the applicable methods (simplex, etc.). Then we can determine the optimal policy by calculating the $D_{ik}$ as established above.
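
As a minimal sketch of how this could be solved in practice (the cost matrix C[i, k] and transition probabilities P[k, i, j] below are hypothetical, not from these notes), the linear program above can be handed to scipy.optimize.linprog, and the policy recovered via $D_{ik} = y_{ik} / \sum_k y_{ik}$:

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical example data: 2 states, 2 decisions.
# C[i, k]    : cost of making decision k in state i
# P[k, i, j] : probability of moving from state i to state j under decision k
C = np.array([[4.0, 1.0],
              [2.0, 6.0]])
P = np.array([[[0.5, 0.5],   # transitions under decision k = 0
               [0.3, 0.7]],
              [[0.9, 0.1],   # transitions under decision k = 1
               [0.6, 0.4]]])

n_states, n_decisions = C.shape

# Flatten y[i, k] into a vector indexed by i * n_decisions + k.
c = C.flatten()  # objective: minimise sum_{i,k} C_ik * y_ik

# Equality constraints:
#   (1) sum_{i,k} y_ik = 1
#   (2) for each state j: sum_k y_jk - sum_{i,k} y_ik * p_ij(k) = 0
#       (one balance equation is redundant, which the solver tolerates)
A_eq = np.zeros((1 + n_states, n_states * n_decisions))
b_eq = np.zeros(1 + n_states)
A_eq[0, :] = 1.0
b_eq[0] = 1.0
for j in range(n_states):
    for i in range(n_states):
        for k in range(n_decisions):
            col = i * n_decisions + k
            A_eq[1 + j, col] += (1.0 if i == j else 0.0) - P[k, i, j]

# y_ik >= 0 is linprog's default bound; solve with the HiGHS backend.
res = linprog(c, A_eq=A_eq, b_eq=b_eq, method="highs")
y = res.x.reshape(n_states, n_decisions)

# Recover the policy: D_ik = y_ik / sum_k y_ik (a zero row sum would mean an
# unreached state; with this data every state has positive probability).
D = y / y.sum(axis=1, keepdims=True)
print("optimal y:\n", y)
print("optimal policy D:\n", D)
```

For this data the solver returns $y \approx \begin{pmatrix} 0 & 0.75 \\ 0.25 & 0 \end{pmatrix}$, i.e. the policy that takes decision 1 in state 0 and decision 0 in state 1, with expected cost 1.25.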