Topics: Markov Decision Process - Linear Programming Problem
Given an MDP, we can find the optimal policy by formulating it as a linear programming problem.
Decision Variables
As Binary
Let the following be our decision variables:

$$D_{ik} = \begin{cases} 1 & \text{if decision } k \text{ is made when the system is in state } i \\ 0 & \text{otherwise} \end{cases}$$
Once the problem is solved, the matrix of values these variables optimally take characterises the optimal policy.
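For intuition, here is a minimal sketch of what such a matrix looks like once solved. The 3-state, 2-decision MDP and the particular policy are made up purely for illustration; nothing above defines them.

```python
import numpy as np

# Hypothetical solved binary decision variables for 3 states and 2 decisions:
# D[i, k] = 1 if decision k is made in state i, 0 otherwise.
D = np.array([
    [1, 0],   # in state 0, take decision 0
    [0, 1],   # in state 1, take decision 1
    [1, 0],   # in state 2, take decision 0
])

# A deterministic policy has exactly one 1 per row.
assert (D.sum(axis=1) == 1).all()
print("Decision chosen in each state:", D.argmax(axis=1))
```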
As Continuous
Solving binary (integer) linear programming problems isn't as straightforward as solving continuous ones, so we can reinterpret these decision variables as:

$$D_{ik} = P(\text{decision} = k \mid \text{state} = i)$$

Id est, the probability of making decision $k$ given that the system is in state $i$.
Under this reinterpretation, we'll actually be using other variables in the objective function: $y_{ik}$. These represent the unconditional steady-state probability of being in state $i$ and taking decision $k$, satisfying:

$$y_{ik} = \pi_i D_{ik}$$

…where $\pi_i = \sum_k y_{ik}$ is the steady-state probability of the system being in state $i$, so that $D_{ik} = y_{ik} / \sum_k y_{ik}$.
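To make this relationship concrete, the sketch below computes the steady-state distribution of the chain induced by a fixed randomised policy and then forms the unconditional probabilities $y_{ik} = \pi_i D_{ik}$. The transition probabilities `P` and policy `D` are hypothetical values chosen only for illustration.

```python
import numpy as np

# Hypothetical MDP: 2 states, 2 decisions.
# P[k, i, j] = transition probability from state i to j under decision k.
P = np.array([
    [[0.9, 0.1],   # decision 0
     [0.4, 0.6]],
    [[0.2, 0.8],   # decision 1
     [0.7, 0.3]],
])

# A randomised policy: D[i, k] = P(decision k | state i).
D = np.array([
    [0.5, 0.5],
    [1.0, 0.0],
])

# Transition matrix of the chain induced by the policy:
# P_policy[i, j] = sum_k D[i, k] * P[k, i, j]
P_policy = np.einsum('ik,kij->ij', D, P)

# Steady-state distribution pi: solve pi @ P_policy = pi with sum(pi) = 1.
n = P_policy.shape[0]
A = np.vstack([P_policy.T - np.eye(n), np.ones(n)])
b = np.append(np.zeros(n), 1.0)
pi, *_ = np.linalg.lstsq(A, b, rcond=None)

# Unconditional steady-state probabilities y[i, k] = pi[i] * D[i, k].
y = pi[:, None] * D
print("pi =", pi)
print("y =\n", y)
print("check, sum of all y_ik:", y.sum())   # should be 1
```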
Model
Finally, with these considerations, the linear programming model for an MDP is:

$$\min Z = \sum_i \sum_k C_{ik} \, y_{ik}$$

subject to:

$$\sum_i \sum_k y_{ik} = 1$$

$$\sum_k y_{jk} = \sum_i \sum_k y_{ik} \, p_{ij}(k) \qquad \text{for every state } j$$

$$y_{ik} \ge 0 \qquad \text{for all } i, k$$
…where:
- $C_{ik}$ denotes the cost of making decision $k$ in state $i$
- $p_{ij}(k)$ denotes the transition probability from state $i$ to state $j$ when making decision $k$
Evidently, we can find the optimal $y_{ik}$ by solving this linear programming problem through any of the applicable methods (simplex, etc.). Then we can determine the optimal policy by calculating the $D_{ik} = y_{ik} / \sum_k y_{ik}$ as established above.
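Putting the pieces together, here is a sketch of the whole procedure in Python, using `scipy.optimize.linprog` as the LP solver. The cost matrix `C` and transition probabilities `P` are invented purely for illustration and do not come from the text above.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical MDP: 2 states, 2 decisions (illustrative numbers only).
# C[i, k]    = cost of making decision k in state i
# P[k, i, j] = transition probability from state i to j under decision k
C = np.array([
    [2.0, 0.5],
    [1.0, 3.0],
])
P = np.array([
    [[0.9, 0.1],
     [0.4, 0.6]],
    [[0.2, 0.8],
     [0.7, 0.3]],
])
n_states, n_decisions = C.shape

# Flatten y[i, k] into a vector with index i * n_decisions + k.
c = C.flatten()

# Constraint 1: all y_ik sum to 1.
A_eq = [np.ones(n_states * n_decisions)]
b_eq = [1.0]

# Constraint 2 (balance equations): for every state j,
#   sum_k y_jk - sum_i sum_k y_ik * p_ij(k) = 0
# (one of these rows is redundant; the solver's presolve copes with that).
for j in range(n_states):
    row = np.zeros(n_states * n_decisions)
    for i in range(n_states):
        for k in range(n_decisions):
            if i == j:
                row[i * n_decisions + k] += 1.0
            row[i * n_decisions + k] -= P[k, i, j]
    A_eq.append(row)
    b_eq.append(0.0)

# y_ik >= 0 is linprog's default bound.
res = linprog(c, A_eq=np.array(A_eq), b_eq=np.array(b_eq), method="highs")
y = res.x.reshape(n_states, n_decisions)

# Recover the policy: D[i, k] = y[i, k] / sum_k y[i, k].
D = y / y.sum(axis=1, keepdims=True)
print("optimal y:\n", y)
print("optimal policy D:\n", D)
print("long-run expected average cost:", res.fun)
```

With these invented numbers, each row of the recovered `D` concentrates on a single decision, reflecting the known property that a basic optimal solution of this LP corresponds to a deterministic policy.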