Topics: Markov Chain - Stochastic Process


(definition)

A Markov decision process (MDP) is a Markov chain characterised by the ability to make a decision (among several options) in each state, incurring a cost that depends on both the decision taken and the state in which it is taken.

MDPs can model diverse phenomena, which makes them especially useful when paired with optimisation: several methods exist for finding an optimal policy for a given MDP.

Characterisation

An MDP is characterised by the following four elements:

  • S, its state set
  • D, its set of decisions
  • c_i^d, the cost associated with taking decision d in state i
  • P^d, the transition matrix associated with each decision d, whose elements are commonly denoted by p_ij^d

Additionally, we may define Π, the set of policies that can be formed by choosing a viable decision in each state.
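As a concrete sketch, the four elements above can be encoded directly as arrays, and an optimal policy found by value iteration, one of the standard methods alluded to earlier. The two-state maintenance scenario, the names `S`, `D`, `c` and `P`, and all the numbers below are illustrative assumptions, not taken from the text.

```python
import numpy as np

# Hypothetical MDP: states 0 = "working", 1 = "broken";
# decisions 0 = "wait", 1 = "repair". All values are made up.
S = [0, 1]                      # state set
D = [0, 1]                      # decision set

# c[i][d]: cost of taking decision d in state i
c = np.array([[0.0, 5.0],       # working: waiting is free, repairing costs 5
              [10.0, 5.0]])     # broken: waiting costs 10, repairing costs 5

# P[d][i][j]: probability of moving from state i to state j under decision d
P = np.array([
    [[0.9, 0.1],                # wait: a working machine may break down
     [0.0, 1.0]],               # wait: a broken machine stays broken
    [[1.0, 0.0],                # repair: a working machine stays working
     [0.8, 0.2]],               # repair: a broken machine is usually fixed
])

def value_iteration(c, P, gamma=0.9, tol=1e-8):
    """Return the optimal discounted costs and policy via value iteration."""
    V = np.zeros(c.shape[0])
    while True:
        # Q[i][d] = immediate cost + discounted expected future cost
        Q = c + gamma * np.einsum('dij,j->id', P, V)
        V_new = Q.min(axis=1)           # minimise over decisions
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmin(axis=1)
        V = V_new

V, policy = value_iteration(c, P)
print(policy)   # optimal decision to take in each state
```

The returned `policy` array is one element of the policy set Π: it fixes a decision for every state. With these numbers, the optimal policy waits while the machine works and repairs it once it breaks.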