Topics: Markov Chain - Stochastic Process
(definition)
A Markov decision process (MDP) is a Markov chain in which, at each state, a decision is chosen among various options, and each decision taken in a given state incurs a cost.
MDPs can model diverse phenomena, which makes them especially useful when paired with optimisation: several methods exist to find the optimal policy for a given MDP.
Characterisation
An MDP is characterised by the following four elements:
- $S$, its state set
- $D$, its set of decisions
- $c_{sd}$, the cost associated with taking decision $d \in D$ in state $s \in S$
- The transition matrices $P(d)$ associated with a given decision $d$, whose elements are commonly denoted by $p_{ij}(d)$
Additionally, we may define $\Pi$, the set of policies that can be formed with all the viable decisions in each state.
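The four elements above, together with the policy set, can be sketched in code. This is a minimal illustration with a hypothetical two-state, two-decision MDP (the numbers are made up for the example); a policy assigns one viable decision to each state, so when every decision is viable everywhere there are $|D|^{|S|}$ policies.

```python
from itertools import product

# Hypothetical MDP: 2 states, 2 decisions (illustrative numbers only).
states = [0, 1]     # the state set S
decisions = [0, 1]  # the decision set D

# cost[s][d]: cost of taking decision d in state s (the c_{sd})
cost = {0: {0: 2.0, 1: 0.5},
        1: {0: 1.0, 1: 3.0}}

# P[d][i][j]: probability of moving from state i to state j under
# decision d (the transition matrix elements p_{ij}(d)).
# Each row of each matrix must sum to 1.
P = {0: [[0.9, 0.1],
         [0.4, 0.6]],
     1: [[0.2, 0.8],
         [0.7, 0.3]]}

# The policy set: one decision per state. Here all decisions are
# viable in every state, so there are |D|^|S| = 2^2 = 4 policies.
policies = [dict(zip(states, choice))
            for choice in product(decisions, repeat=len(states))]

print(len(policies))  # 4
```

Representing a policy as a state-to-decision mapping makes it easy to later evaluate each policy's expected cost and pick the optimal one by enumeration, which is feasible only for small MDPs.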