3.7 Learning to control 71
3.8 Further considerations 72
3.9 Technical remarks 73
3.10 Bibliographical remarks 73
3.11 Exercises 75
4 Operators and Metrics 79
4.1 The Bellman operator 80
4.2 Contraction mappings 81
4.3 The distributional Bellman operator 85
4.4 Wasserstein distances for return functions 89
4.5 ℓp probability metrics and the Cramér distance 94
4.6 Sufficient conditions for contractivity 97
4.7 A matter of domain 101
4.8 Weak convergence of return functions* 105
4.9 Random variable Bellman operators* 107
4.10 Technical remarks 108
4.11 Bibliographical remarks 110
4.12 Exercises 112
5 Distributional Dynamic Programming 119
5.1 Computational model 119
5.2 Representing return-distribution functions 122
5.3 The empirical representation 124
5.4 The normal representation 129
5.5 Fixed-size empirical representations 132
5.6 The projection step 135
5.7 Distributional dynamic programming 140
5.8 Error due to diffusion 144
5.9 Convergence of distributional dynamic programming 146
5.10 Quality of the distributional approximation 150
5.11 Designing distributional dynamic programming algorithms 153
5.12 Technical remarks 154
5.13 Bibliographical remarks 160
5.14 Exercises 162
6 Incremental Algorithms 167
6.1 Computation and statistical estimation 168
6.2 From operators to incremental algorithms 169
6.3 Categorical temporal-difference learning 171