Topics: Policy Improvement Method for MDPs - Markov Decision Process
(theorem)
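In the context of an MDP, the expected long-run average cost per unit time for a given policy $R$ can be expressed as:

$$E(C) = \sum_{i=0}^{M} C_{ik}\,\pi_i, \qquad k = d_i(R)$$

…where $\pi_i$ denotes the steady-state probability of the state $i$ under policy $R$, and $C_{ik}$ is the expected cost incurred when decision $k$ is made in state $i$.
Do not forget that the policy $R$ defines which decision $k = d_i(R)$ is made in which state $i$.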
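
As a rough illustration of the statement above, the following sketch evaluates the expected long-run cost of a fixed policy for a tiny MDP. Everything in it is an assumption made for the example: the two-state, two-decision transition matrices `P`, the cost table `C`, the function names `steady_state` and `expected_cost`, and the use of power iteration to approximate the steady-state probabilities (which presumes the chain induced by the policy is irreducible and aperiodic).

```python
def steady_state(P_R, iterations=10_000, tol=1e-12):
    """Approximate pi with pi = pi * P_R by power iteration.

    Assumes the Markov chain P_R is irreducible and aperiodic,
    so that the iteration converges to a unique distribution.
    """
    n = len(P_R)
    pi = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(iterations):
        nxt = [sum(pi[i] * P_R[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(nxt, pi)) < tol:
            return nxt
        pi = nxt
    return pi


def expected_cost(P, C, policy):
    """Expected long-run cost: sum over states i of C[i][k] * pi[i],
    where k = policy[i] is the decision the policy prescribes in state i.

    P[k][i][j]: probability of moving from state i to j under decision k.
    C[i][k]:    cost incurred in state i when decision k is made.
    """
    n = len(policy)
    # The policy selects, for each state, which decision's transition row applies.
    P_R = [P[policy[i]][i] for i in range(n)]
    pi = steady_state(P_R)
    return sum(C[i][policy[i]] * pi[i] for i in range(n))


# Hypothetical data: states 0 and 1, decisions 0 and 1.
P = [
    [[0.9, 0.1], [0.5, 0.5]],  # transitions under decision 0
    [[0.9, 0.1], [1.0, 0.0]],  # transitions under decision 1
]
C = [
    [0.0, 1.0],  # costs in state 0 for decisions 0 and 1
    [6.0, 4.0],  # costs in state 1 for decisions 0 and 1
]

cost_a = expected_cost(P, C, [0, 0])  # always choose decision 0
cost_b = expected_cost(P, C, [0, 1])  # choose decision 1 in state 1
print(cost_a, cost_b)
```

With these made-up numbers, the first policy has steady-state probabilities (5/6, 1/6) and a cost of about 1.0, while the second has (10/11, 1/11) and a cost of about 0.364. The same cost table gives different long-run costs because the policy changes both the decision taken in each state and the steady-state distribution itself.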