Topics: Markov Decision Process - Optimisation
There are several optimisation methods that allow us to find the optimal policy for a given Markov decision process:
-
The exhaustive enumeration method is the simplest of them all. It consists of enumeration all of the viable policies, calculating the expected cost for each one, then finally selecting the optimal one. It’s simple but computationally expensive.
-
Solving the MDP as a linear programming problem through any of the applicable methods.
-
The policy improvement method consists of solving a linear equation system, then finding an alternative policy that optimises the costs until the optimal policy is reached. Compared to other methods, it is rather efficient and simple.
- The discounted policy improvement method is basically the same method applied for phenomena where it might prove useful to discount the costs the further they are in the future.
-
The successive approximations method finds an approximation to the optimal policy by using the expected total costs starting from every state given increasing remaining periods.