Here are my notes on the derivation of the Least-Squares Policy Iteration (LSPI) algorithm, based on the original paper by Lagoudakis and Parr.

[pdf]

