Topics: Policy Improvement Method for MDPs - Markov Decision Process


(definition)

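In the context of an MDP, let $V_i^n(R)$ be the expected total cost of a system that starts in state $i$ and evolves over $n$ periods under a given policy $R$. This will be:

$$V_i^n(R) = C_{ik} + \alpha \sum_{j} p_{ij}(k)\, V_j^{n-1}(R)$$

…where:

- $C_{ik}$ is the cost of making decision $k$ in state $i$, with $k$ being the decision prescribed by policy $R$.
- $p_{ij}(k)$ is the probability of moving from state $i$ to state $j$ under decision $k$.
- $\alpha$ is the discount factor.

The first term is the cost incurred in the current period. The second term is the (discounted) expected cost of the remaining $n-1$ periods, starting from whichever state $j$ the system moves to.

If there is no discounting, then $\alpha = 1$, as in the standard policy improvement method (cf. the discounted version).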
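The following is a minimal Python sketch of this $n$-period expected cost for a fixed policy $R$. Each stage is built from the previous one: the current-period cost $C_{ik}$ is added to the (discounted) expected cost of the remaining periods. The sketch makes these assumptions, none of which come from the note:

- costs are stored as `costs[i][k]` ($C_{ik}$);
- transition probabilities are stored as `trans[k][i][j]` ($p_{ij}(k)$);
- the policy is a list `policy[i] = k`;
- the base case is $V_j^0(R) = 0$, i.e. no cost when zero periods remain.

The function name `n_period_cost` and the two-state numbers are hypothetical, chosen only for illustration.

```python
from typing import List


def n_period_cost(
    costs: List[List[float]],        # costs[i][k] = C_ik, cost of decision k in state i
    trans: List[List[List[float]]],  # trans[k][i][j] = p_ij(k), transition prob. under decision k
    policy: List[int],               # policy[i] = decision k that policy R prescribes in state i
    n: int,                          # number of periods
    alpha: float = 1.0,              # discount factor; 1.0 means no discounting
) -> List[float]:
    """Return [V_0^n(R), ..., V_{m-1}^n(R)] for a fixed policy R with m states."""
    if n < 0:
        raise ValueError("n must be non-negative")
    num_states = len(policy)
    # Assumed base case: V_j^0(R) = 0, i.e. no cost when zero periods remain.
    v = [0.0] * num_states
    for _ in range(n):
        # Build V^t from V^(t-1): the comprehension reads only the previous
        # vector v, which is rebound after the whole new vector is built.
        v = [
            costs[i][policy[i]]
            + alpha * sum(trans[policy[i]][i][j] * v[j] for j in range(num_states))
            for i in range(num_states)
        ]
    return v


# Hypothetical two-state, two-decision example (numbers are illustrative only).
costs = [[1.0, 4.0], [2.0, 0.5]]
trans = [
    [[0.5, 0.5], [0.2, 0.8]],  # decision 0
    [[0.9, 0.1], [0.6, 0.4]],  # decision 1
]
policy = [0, 1]  # state 0 -> decision 0, state 1 -> decision 1

print(n_period_cost(costs, trans, policy, n=2))             # undiscounted (alpha = 1)
print(n_period_cost(costs, trans, policy, n=2, alpha=0.9))  # discounted
```

With these numbers, the undiscounted two-period costs come out to roughly 1.75 (state 0) and 1.3 (state 1). The loop works bottom-up from $V^0$. A naive top-down recursion would instead recompute each $V_j^{n-1}(R)$ once per calling state, which grows exponentially with $n$.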

Keep in mind that these calculations assume a fixed policy $R$, which specifies the decision $k$ made in each state $i$.