Topics: Markov Decision Process - Optimisation


There are several optimisation methods that allow us to find the optimal policy for a given Markov decision process:

  • The exhaustive enumeration method is the simplest of them all. It consists of enumeration all of the viable policies, calculating the expected cost for each one, then finally selecting the optimal one. It’s simple but computationally expensive.

  • Solving the MDP as a linear programming problem through any of the applicable methods.

  • The policy improvement method consists of solving a linear equation system, then finding an alternative policy that optimises the costs until the optimal policy is reached. Compared to other methods, it is rather efficient and simple.

  • The successive approximations method finds an approximation to the optimal policy by using the expected total costs starting from every state given increasing remaining periods.