4
Operators and Metrics
Anyone who has learned to play a musical instrument knows that practice makes
perfect. Along the way, however, one’s ability at playing a difﬁcult passage
usually varies according to a number of factors. On occasion, something that
could be played easily the day before now seems insurmountable. The adage
expresses an abstract notion, that practice improves performance on average
or over a long period of time, rather than a concrete statement about
instantaneous ability.
In the same way, reinforcement learning algorithms deployed in real situations
behave differently from moment to moment. Variations arise due to different
initial conditions, speciﬁc choices of parameters, hardware non-determinism, or
simply because of randomness in the agent’s interactions with its environment.
These factors make it hard to make precise predictions, for example about the
magnitude of the value function estimate learned by TD learning at a particular
state x and step k, other than by extensive simulations. Nevertheless, the large-
scale behaviour of TD learning is relatively predictable, sufﬁciently so that
convergence can be established under certain conditions, and convergence rates
can be derived.
This chapter introduces the language of operators as an effective abstraction
with which to study such long-term behaviour, characterise the asymptotic
properties of reinforcement learning algorithms, and eventually explain what
makes an effective algorithm. In addition to being useful in the study of existing
algorithms, operators also serve as a kind of blueprint when designing new
algorithms, from which incremental methods such as categorical temporal-
difference learning can then be derived.
4.1 The Bellman Operator
The value function $V^\pi$ characterises the expected return obtained by following a policy $\pi$, beginning in a given state $x$:
$$V^\pi(x) = \mathbb{E}_\pi\Big[\sum_{t=0}^{\infty} \gamma^t R_t \,\Big|\, X_0 = x\Big].$$
The Bellman equation establishes a relationship between the expected return from one state and from its successors:
$$V^\pi(x) = \mathbb{E}_\pi\big[R + \gamma V^\pi(X') \,\big|\, X = x\big].$$
Let us now consider a state-indexed collection of real variables, written $V \in \mathbb{R}^{\mathcal{X}}$, which we call a value function estimate. By substituting $V$ for $V^\pi$ in the original Bellman equation, we obtain the system of equations
$$V(x) = \mathbb{E}_\pi\big[R + \gamma V(X') \,\big|\, X = x\big], \quad \text{for all } x \in \mathcal{X}. \qquad (4.1)$$
From Chapter 2, we know that $V^\pi$ is one solution to the above.
Are there other solutions to Equation 4.1? In this chapter we answer this question (negatively) by interpreting the right-hand side of the equation as applying a transformation to the estimate $V$. For a given realisation $(x, a, r, x')$ of the random transition, this transformation indexes $V$ by $x'$, multiplies it by the discount factor, and adds the immediate reward (this yields $r + \gamma V(x')$). The actual transformation returns the value that is obtained by following these steps, in expectation. Functions that map elements of a space onto itself, such as this one (from estimates to estimates), are called operators.
Definition 4.1. The Bellman operator is the mapping $T^\pi : \mathbb{R}^{\mathcal{X}} \to \mathbb{R}^{\mathcal{X}}$ defined by
$$(T^\pi V)(x) = \mathbb{E}_\pi\big[R + \gamma V(X') \,\big|\, X = x\big]. \qquad (4.2)$$
Here, the notation $T^\pi V$ should be understood as "$T^\pi$, applied to $V$".
The Bellman operator gives a particularly concise way of expressing the transformation implied in Equation 4.1:
$$V = T^\pi V.$$
As we will see in later chapters, it also serves as the springboard for the design and analysis of algorithms for learning $V^\pi$.
When working with the Bellman operator, it is often useful to treat $V$ as a finite-dimensional vector in $\mathbb{R}^{\mathcal{X}}$ and to express the Bellman operator in terms of vector operations. That is, we write
$$T^\pi V = r_\pi + \gamma P_\pi V, \qquad (4.3)$$
where $r_\pi(x) = \mathbb{E}_\pi[R \mid X = x]$ and $P_\pi$ is the transition operator^{24} defined by
$$(P_\pi V)(x) = \sum_{a \in \mathcal{A}} \pi(a \mid x) \sum_{x' \in \mathcal{X}} P_{\mathcal{X}}(x' \mid x, a)\, V(x').$$
Equation 4.3 follows from these definitions and the linearity of expectations.
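Equation 4.3 also gives a direct recipe for computing $T^\pi V$ when the model is known. The following sketch (our own illustration, with invented numbers) applies the operator in vector form to a hypothetical two-state chain, with the policy already folded into $r_\pi$ and $P_\pi$:

```python
import numpy as np

# Hypothetical two-state example: expected rewards r_pi and state-to-state
# transition matrix P_pi, with the policy pi already folded in.
r_pi = np.array([1.0, 0.0])
P_pi = np.array([[0.5, 0.5],
                 [0.1, 0.9]])
gamma = 0.9

def bellman_operator(V):
    """Vector form of Equation 4.3: T^pi V = r_pi + gamma * P_pi V."""
    return r_pi + gamma * P_pi @ V

# The true value function solves V = r_pi + gamma * P_pi V, so it can be
# computed in closed form as V = (I - gamma * P_pi)^{-1} r_pi ...
V_star = np.linalg.solve(np.eye(2) - gamma * P_pi, r_pi)
# ... and, as expected, it is left unchanged by the operator.
assert np.allclose(bellman_operator(V_star), V_star)
```

Solving the linear system here is only possible because the example is tiny and fully known; the operator view is what carries over to the learning setting.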
A vector $\tilde{V} \in \mathbb{R}^{\mathcal{X}}$ is a solution to Equation 4.1 if it remains unchanged by the transformation corresponding to the Bellman operator $T^\pi$; that is, if it is a fixed point of $T^\pi$. This means that the value function $V^\pi$ is a fixed point of $T^\pi$:
$$V^\pi = T^\pi V^\pi.$$
To demonstrate that $V^\pi$ is the only fixed point of the Bellman operator, we will appeal to the notion of contraction mappings.
4.2 Contraction Mappings
When we apply the operator $T^\pi$ to a value function estimate $V \in \mathbb{R}^{\mathcal{X}}$, we obtain a new estimate $T^\pi V \in \mathbb{R}^{\mathcal{X}}$. A characteristic property of the Bellman operator is that this new estimate is guaranteed to be closer to $V^\pi$ than $V$ (unless $V = V^\pi$, of course). In fact, as we will see in this section, applying the operator to any two estimates must bring them closer together.
To formalise what we mean by "closer", we need a way of measuring distances between value function estimates. Because these estimates can be viewed as finite-dimensional vectors, there are many well-established ways of doing so: the reader may have come across the Euclidean ($L^2$) distance, the Manhattan ($L^1$) distance, and curios such as the British Rail distance. We use the term metric to describe distances that satisfy the following standard definition.
Definition 4.2. Given a set $M$, a metric $d : M \times M \to \mathbb{R}$ is a function that satisfies, for all $U, V, W \in M$,
(a) $d(U, V) \ge 0$,
(b) $d(U, V) = 0$ if and only if $U = V$,
(c) $d(U, V) \le d(U, W) + d(W, V)$,
(d) $d(U, V) = d(V, U)$.
We call the pair $(M, d)$ a metric space.

24. It is also possible to express $P_\pi$ as a stochastic matrix, in which case $P_\pi V$ describes a matrix-vector product.
In our setting, $M$ is the space of value function estimates, $\mathbb{R}^{\mathcal{X}}$. Because we assume that there are finitely many states, this space can be equivalently thought of as the space of real-valued vectors with $N_{\mathcal{X}}$ entries, where $N_{\mathcal{X}}$ is the number of states. On this space, we measure distances in terms of the $L^\infty$ metric, defined by
$$\|V - V'\|_\infty = \max_{x \in \mathcal{X}} |V(x) - V'(x)|, \qquad V, V' \in \mathbb{R}^{\mathcal{X}}. \qquad (4.4)$$
A key result is that the Bellman operator $T^\pi$ is a contraction mapping with respect to this metric. Informally, this means that its application to different value function estimates brings them closer by at least a constant multiplicative factor, called its contraction modulus.
Definition 4.3. Let $(M, d)$ be a metric space. A function $O : M \to M$ is a contraction mapping with respect to $d$, with contraction modulus $\beta \in [0, 1)$, if for all $U, U' \in M$,
$$d(OU, OU') \le \beta\, d(U, U').$$
Proposition 4.4. The operator $T^\pi : \mathbb{R}^{\mathcal{X}} \to \mathbb{R}^{\mathcal{X}}$ is a contraction mapping with respect to the $L^\infty$ metric on $\mathbb{R}^{\mathcal{X}}$, with contraction modulus given by the discount factor $\gamma$. That is, for any two value functions $V, V' \in \mathbb{R}^{\mathcal{X}}$,
$$\|T^\pi V - T^\pi V'\|_\infty \le \gamma \|V - V'\|_\infty.$$
Proof. The proof is most easily stated in vector notation. Here, we make use of two properties of the operator $P_\pi$. First, $P_\pi$ is linear, in the sense that for any $V, V'$,
$$P_\pi V + P_\pi V' = P_\pi(V + V').$$
Second, because $(P_\pi V)(x)$ is a convex combination of elements from $V$, it must be that
$$\|P_\pi V\|_\infty \le \|V\|_\infty.$$
From here, we have
$$\begin{aligned}
\|T^\pi V - T^\pi V'\|_\infty &= \|(r_\pi + \gamma P_\pi V) - (r_\pi + \gamma P_\pi V')\|_\infty \\
&= \gamma \|P_\pi V - P_\pi V'\|_\infty \\
&= \gamma \|P_\pi (V - V')\|_\infty \\
&\le \gamma \|V - V'\|_\infty,
\end{aligned}$$
as desired.
The fact that $T^\pi$ is a contraction mapping guarantees the uniqueness of $V^\pi$ as a solution to the equation $V = T^\pi V$. As made formal by the following proposition, because the operator $T^\pi$ brings any two value functions closer together, it cannot keep more than one value function fixed.
Proposition 4.5. Let $(M, d)$ be a metric space and $O : M \to M$ be a contraction mapping. Then $O$ has at most one fixed point in $M$.
Proof. Let $\beta \in [0, 1)$ be the contraction modulus of $O$, and suppose $U, U' \in M$ are distinct fixed points of $O$, so that $d(U, U') > 0$ (following Definition 4.2). Then we have
$$d(U, U') = d(OU, OU') \le \beta\, d(U, U') < d(U, U'),$$
a contradiction. Hence no two distinct fixed points can exist.
Because we know that $V^\pi$ is a fixed point of the Bellman operator $T^\pi$, following Proposition 4.5 we deduce that there are no other such fixed points and hence no other solutions to the Bellman equation. As the phrasing of Proposition 4.5 suggests, in some metric spaces it is possible for $O$ to be a contraction mapping yet to not possess a fixed point. This can matter when dealing with return functions, as we will see in the second half of this chapter and in Chapter 5.
Example 4.6. Consider the no-loop operator
$$(T^\pi_{\mathrm{NL}} V)(x) = \mathbb{E}_\pi\big[R + \gamma V(X')\,\mathbb{1}\{X' \neq x\} \,\big|\, X = x\big],$$
where the name denotes the fact that we omit the next-state value whenever a transition from $x$ to itself occurs. By inspection, we can determine that the fixed point of this operator is
$$V^\pi_{\mathrm{NL}}(x) = \mathbb{E}_\pi\Big[\sum_{t=0}^{T-1} \gamma^t R_t \,\Big|\, X_0 = x\Big],$$
where $T$ denotes the (random) first time at which $X_T = X_{T-1}$. In words, this fixed point describes the discounted sum of rewards obtained until the first time that an action leaves the state unchanged.^{25}
Exercise 4.1 asks you to show that $T^\pi_{\mathrm{NL}}$ is a contraction mapping with modulus
$$\beta = \gamma \max_{x \in \mathcal{X}} P_\pi\big(X' \neq x \mid X = x\big).$$
Following Proposition 4.5, we deduce that this is the unique fixed point of $T^\pi_{\mathrm{NL}}$.
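To make the example concrete, here is a small numerical sketch of $T^\pi_{\mathrm{NL}}$ (our own construction, with hypothetical environment numbers). Since the example's rewards are non-negative, truncating the return at the first self-transition can only reduce the value at every state:

```python
import numpy as np

# Hypothetical two-state chain: rewards, transition matrix, discount factor.
r_pi = np.array([1.0, 0.0])
P_pi = np.array([[0.5, 0.5],
                 [0.1, 0.9]])
gamma = 0.9

def no_loop_operator(V):
    """(T_NL V)(x) = r(x) + gamma * sum over x' != x of P(x'|x) V(x'):
    the next-state value is dropped whenever the state transitions to itself."""
    P_off_diag = P_pi * (1.0 - np.eye(len(V)))
    return r_pi + gamma * P_off_diag @ V

V_nl = np.zeros(2)
for _ in range(1000):            # iterate to (numerical) convergence
    V_nl = no_loop_operator(V_nl)

# Compare against the ordinary fixed point V^pi = (I - gamma P_pi)^{-1} r_pi.
V_star = np.linalg.solve(np.eye(2) - gamma * P_pi, r_pi)
# With non-negative rewards, the no-loop return is dominated by the usual one.
assert np.all(V_nl <= V_star + 1e-9)
```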
When an operator $O$ is contractive, we can also straightforwardly construct a mathematical approximation to its fixed point.^{26} This approximation is given by the sequence $(U_k)_{k \ge 0}$, defined by an initial value $U_0 \in M$ and the recursive relationship
$$U_{k+1} = O U_k.$$
By contractivity, successive iterates of this sequence must come progressively
closer to the operator’s ﬁxed point. This is formalised by the following.
Proposition 4.7. Let $(M, d)$ be a metric space, and let $O : M \to M$ be a contraction mapping with contraction modulus $\beta \in [0, 1)$ and fixed point $U^* \in M$. Then for any initial point $U_0 \in M$, the sequence $(U_k)_{k \ge 0}$ defined by $U_{k+1} = O U_k$ is such that
$$d(U_k, U^*) \le \beta^k\, d(U_0, U^*), \qquad (4.5)$$
and in particular $d(U_k, U^*) \to 0$ as $k \to \infty$.
Proof. We will prove Equation 4.5 by induction, from which we obtain convergence of $(U_k)_{k \ge 0}$ in $d$. For $k = 0$, Equation 4.5 trivially holds. Now suppose that for some $k \ge 0$, we have
$$d(U_k, U^*) \le \beta^k\, d(U_0, U^*).$$
Then note that
$$d(U_{k+1}, U^*) \overset{(a)}{=} d(OU_k, OU^*) \overset{(b)}{\le} \beta\, d(U_k, U^*) \overset{(c)}{\le} \beta^{k+1}\, d(U_0, U^*),$$
where (a) follows from the definition of the sequence $(U_k)_{k \ge 0}$ and the fact that $U^*$ is fixed by $O$, (b) follows from the contractivity of $O$, and (c) follows from the inductive hypothesis. By induction, we conclude that Equation 4.5 holds for all $k \in \mathbb{N}$.

25. The reader is invited to consider the kind of environments in which policies that maximise the no-loop return are substantially different from those that maximise the usual expected return.
26. We use the term mathematical approximation to distinguish it from an approximation that can be computed. That is, there may or may not exist an algorithm that can determine the elements of the sequence $(U_k)_{k \ge 0}$ given the initial estimate $U_0$.
In the case of the Bellman operator $T^\pi$, Proposition 4.7 means that repeated application of $T^\pi$ to any initial value function estimate $V_0 \in \mathbb{R}^{\mathcal{X}}$ produces a sequence of estimates $(V_k)_{k \ge 0}$ that are progressively closer to $V^\pi$. This observation serves as the starting point for a number of computational approaches that approximate $V^\pi$, including dynamic programming (Chapter 5) and temporal-difference learning (Chapter 6).
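The geometric convergence promised by Proposition 4.7 is easy to observe numerically. The sketch below (our own, with invented numbers) repeatedly applies $T^\pi$ to $V_0 = 0$ and checks the bound $d(V_k, V^\pi) \le \gamma^k\, d(V_0, V^\pi)$ in the sup-norm:

```python
import numpy as np

# Hypothetical two-state chain, as in the earlier sketches.
r_pi = np.array([1.0, 0.0])
P_pi = np.array([[0.5, 0.5],
                 [0.1, 0.9]])
gamma = 0.9

# Fixed point of T^pi, computed in closed form for reference.
V_star = np.linalg.solve(np.eye(2) - gamma * P_pi, r_pi)

# Repeated application of T^pi, recording sup-norm distances to V^pi.
V = np.zeros(2)
distances = []
for _ in range(50):
    distances.append(np.max(np.abs(V - V_star)))
    V = r_pi + gamma * P_pi @ V

# Proposition 4.7 with O = T^pi and beta = gamma:
# d(V_k, V*) <= gamma^k d(V_0, V*).
for k, d in enumerate(distances):
    assert d <= gamma ** k * distances[0] + 1e-12
```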
4.3 The Distributional Bellman Operator
Designing distributional reinforcement learning algorithms such as categorical temporal-difference learning involves a few choices (such as how to represent probability distributions in a computer's memory) that do not have an equivalent in classical reinforcement learning. Throughout this book, we will make use of the distributional Bellman operator to understand and characterise many of these choices. To begin, recall the random-variable Bellman equation:
$$G^\pi(x) \overset{\mathcal{D}}{=} R + \gamma G^\pi(X'), \quad X = x. \qquad (4.6)$$
As in the expected-value setting, we construct a random-variable operator by viewing the right-hand side of Equation 4.6 as a transformation of $G^\pi$. In this case, we break down the transformation of $G^\pi$ into three operations, each of which produces a new random variable (Figure 4.1):
(a) $G^\pi(X')$: the indexing of the collection of random variables $G^\pi$ by $X'$;
(b) $\gamma G^\pi(X')$: the multiplication of the random variable $G^\pi(X')$ by the scalar $\gamma$;
(c) $R + \gamma G^\pi(X')$: the addition of two random variables ($R$ and $\gamma G^\pi(X')$).
More generally, we may apply these operations to any state-indexed collection of random variables $G = (G(x) : x \in \mathcal{X})$, taken to be independent of the random transitions used to define the transformation. With some mathematical caveats discussed below, let us introduce the random-variable Bellman operator
$$(T^\pi G)(x) \overset{\mathcal{D}}{=} R + \gamma G(X'), \quad X = x. \qquad (4.7)$$
Equation 4.7 states that the application of the Bellman operator to $G$ (evaluated at $x$; the left-hand side) produces a random variable that is equal in distribution to the random variable constructed on the right-hand side. Because this holds
Figure 4.1
The random-variable Bellman operator is composed of three operations: (a) indexing into a collection of random variables, (b) multiplication by the discount factor $\gamma$, and (c) addition of two random variables. Here, we assume that $R$ and $X'$ take on a single value for clarity.
for all $x$, we think of $T^\pi$ as mapping $G$ to a new collection of random variables $T^\pi G$.
The random-variable operator is appealing because it is concise and easily understood. In many circumstances, this makes it the tool of choice for reasoning about distributional reinforcement learning problems. One issue, however, is that its definition above is mathematically incomplete. This is because it specifies the probability distribution of $(T^\pi G)(x)$, but not its identity as a mapping from some sample space to the real numbers. As discussed in Section 2.7, without care we may produce random variables that exhibit undesirable behaviour, for example because rewards at different points in time are improperly correlated. More immediately, the theory of contraction mappings needs a clear definition of the space on which an operator is defined; in the case of the random-variable operator, this requires us to specify a space of random variables to operate in. Properly defining such a space is possible but requires some technical subtlety and measure-theoretic considerations; we refer the interested reader to Section 4.9.
A more direct solution is to consider the distributional Bellman operator as a mapping on the space of return-distribution functions. Starting with the distributional Bellman equation
$$\eta^\pi(x) = \mathbb{E}_\pi\big[(b_{R,\gamma})_\# \,\eta^\pi(X') \,\big|\, X = x\big],$$
we again view the right-hand side as the result of applying a series of transformations, in this case to probability distributions.
Definition 4.8. The distributional Bellman operator $T^\pi : \mathcal{P}(\mathbb{R})^{\mathcal{X}} \to \mathcal{P}(\mathbb{R})^{\mathcal{X}}$ is the mapping defined by
$$(T^\pi \eta)(x) = \mathbb{E}_\pi\big[(b_{R,\gamma})_\# \,\eta(X') \,\big|\, X = x\big]. \qquad (4.8)$$
Here, the operations on probability distributions are expressed (rather compactly) by the expectation in Equation 4.8 and the use of the pushforward distribution derived from the bootstrap function $b_{r,\gamma}$; these are the operations of mixing, scaling, and translation previously described in Section 2.8.
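When every distribution involved has finite support, these operations can be carried out exactly on lists of (atom, probability) pairs. The sketch below is our own construction, with hypothetical deterministic rewards and the policy folded into the transition matrix:

```python
import numpy as np

# Hypothetical two-state chain; each eta[x] is a list of (atom, probability)
# pairs representing a finite-support return distribution.
P_pi = np.array([[0.5, 0.5],
                 [0.1, 0.9]])
rewards = [1.0, 0.0]      # deterministic reward received in each state
gamma = 0.9

def distributional_bellman(eta):
    """(T^pi eta)(x): mixture over next states x' of the pushforward
    (b_{r,gamma})# eta(x'), i.e. each atom z is mapped to r + gamma * z."""
    new_eta = []
    for x in range(len(eta)):
        atoms = []
        for x_next in range(len(eta)):
            for z, p in eta[x_next]:
                atoms.append((rewards[x] + gamma * z, P_pi[x, x_next] * p))
        new_eta.append(atoms)
    return new_eta

eta = [[(0.0, 1.0)], [(0.0, 1.0)]]                        # eta_0: mass at zero
eta = distributional_bellman(distributional_bellman(eta))  # eta_2

# Each eta_2(x) is still a probability distribution, and its mean matches
# two applications of the ordinary Bellman operator to V = 0.
assert all(abs(sum(p for _, p in dist) - 1.0) < 1e-12 for dist in eta)
mean_0 = sum(z * p for z, p in eta[0])
assert abs(mean_0 - 1.45) < 1e-12
```

Note how the number of atoms grows with each application; Chapter 5's projections address exactly this.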
We can gain additional insight into how the operator transforms a return function by considering the situation in which the random reward $R$ and the return distributions $\eta(x)$ admit respective probability densities $p_R(r \mid x, a)$ and $p_x(z)$. In this case, the probability density of $(T^\pi \eta)(x)$, denoted $p'_x$, is
$$p'_x(z) = \gamma^{-1} \sum_{a \in \mathcal{A}} \pi(a \mid x) \int_{\mathbb{R}} p_R(r \mid x, a) \sum_{x' \in \mathcal{X}} P_{\mathcal{X}}(x' \mid x, a)\, p_{x'}\Big(\frac{z - r}{\gamma}\Big)\, dr. \qquad (4.9)$$
Expressed in terms of probability densities, the indexing of a collection of random variables becomes a mixture of densities, while their addition becomes a convolution; this is in fact what is depicted in Figure 4.1. In terms of cumulative distribution functions, we have
$$F_{(T^\pi \eta)(x)}(z) = \mathbb{E}_\pi\Big[F_{\eta(X')}\Big(\frac{z - R}{\gamma}\Big) \,\Big|\, X = x\Big].$$
However, we prefer the operator that deals directly with probability distributions (Equation 4.8), as it can be used to concisely express more complex operations on distributions. One such operation is the projection of a probability distribution onto a finitely-parametrised set, which we will use in Chapter 5 to construct algorithms for approximating $\eta^\pi$.
Using Definition 4.8, we can formally express the fact that the return-distribution function $\eta^\pi$ is the only solution to the equation $\eta = T^\pi \eta$. The proof is relatively technical and will be given in Section 4.8.
Proposition 4.9. The return-distribution function $\eta^\pi$ satisfies $\eta^\pi = T^\pi \eta^\pi$ and is the unique fixed point of the distributional Bellman operator $T^\pi$.
When working with the distributional Bellman operator, one should be mindful that the random reward $R$ and next state $X'$ are generally not independent, because they both depend on the chosen action $A$ (we briefly mentioned this concern in Section 2.8). In Equation 4.9, this is explicitly handled by the outer sum over actions. Analogously, we can make explicit the dependency on $A$ by introducing a second expectation in Equation 4.8:
$$(T^\pi \eta)(x) = \mathbb{E}_\pi\Big[\mathbb{E}\big[(b_{R,\gamma})_\# \,\eta(X') \,\big|\, X = x, A\big] \,\Big|\, X = x\Big].$$
By conditioning the inner expectation on the action $A$, we make the random variables $R$ and $\gamma G(X')$ conditionally independent in the inner expectation. We will make use of this technique in proving Theorem 4.25, the main theoretical result of this chapter.
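The importance of keeping $R$ and $X'$ correctly paired is easy to demonstrate numerically. In the hypothetical example below (our own construction), each of two equiprobable actions determines both the reward and the next state; pairing $R$ with the wrong next-state distribution leaves the mean unchanged but alters the return distribution:

```python
import itertools

gamma = 0.9
pi = {0: 0.5, 1: 0.5}                  # uniform policy over two actions
reward = {0: 0.0, 1: 1.0}              # R is determined by the action...
next_state = {0: 0, 1: 1}              # ...and so is X'
eta = {0: {0.0: 1.0}, 1: {10.0: 1.0}}  # Dirac next-state return distributions

def correct_backup():
    """Mixes over actions, so R and eta(X') stay correctly paired."""
    out = {}
    for a, p_a in pi.items():
        for z, p_z in eta[next_state[a]].items():
            atom = reward[a] + gamma * z
            out[atom] = out.get(atom, 0.0) + p_a * p_z
    return out

def naive_backup():
    """WRONG: draws R and X' from independent actions, discarding their
    correlation through A."""
    out = {}
    for (a1, p1), (a2, p2) in itertools.product(pi.items(), pi.items()):
        for z, p_z in eta[next_state[a2]].items():
            atom = reward[a1] + gamma * z
            out[atom] = out.get(atom, 0.0) + p1 * p2 * p_z
    return out

correct, naive = correct_backup(), naive_backup()
mean = lambda d: sum(z * p for z, p in d.items())
assert abs(mean(correct) - mean(naive)) < 1e-12   # the means agree...
assert correct != naive                           # ...the distributions do not
```

This is exactly why expected-value reinforcement learning can ignore the correlation while distributional methods cannot.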
In some circumstances, it is useful to translate between operations on probability distributions and those on random variables. We do this by means of a representative set of random variables called an instantiation.
Definition 4.10. Given a probability distribution $\nu \in \mathcal{P}(\mathbb{R})$, we say that a random variable $Z$ is an instantiation of $\nu$ if its distribution is $\nu$, written $Z \sim \nu$. Similarly, we say that a collection of random variables $G = (G(x) : x \in \mathcal{X})$ is an instantiation of a return-distribution function $\eta \in \mathcal{P}(\mathbb{R})^{\mathcal{X}}$ if for every $x \in \mathcal{X}$, we have $G(x) \sim \eta(x)$.
Given a return-distribution function $\eta \in \mathcal{P}(\mathbb{R})^{\mathcal{X}}$, the new return-distribution function $T^\pi \eta$ can be obtained by constructing an instantiation $G$ of $\eta$, performing the transformation on the collection of random variables $G$ as described at the beginning of this section, and then extracting the distributions of the resulting random variables. This is made formal as follows.
Proposition 4.11. Let $\eta \in \mathcal{P}(\mathbb{R})^{\mathcal{X}}$, and let $G = (G(x) : x \in \mathcal{X})$ be an instantiation of $\eta$. For each $x \in \mathcal{X}$, let $(X = x, A, R, X')$ be a sample transition independent of $G$. Then $R + \gamma G(X')$ has the distribution $(T^\pi \eta)(x)$:
$$\mathcal{D}_\pi\big(R + \gamma G(X') \,\big|\, X = x\big) = (T^\pi \eta)(x).$$
Proof. The result follows immediately from the definition of the distributional Bellman operator. For clarity, we step through the argument again, mirroring the transformations set out at the beginning of the section. First, the indexing transformation gives
$$\mathcal{D}_\pi\big(G(X') \,\big|\, X = x\big) = \sum_{x' \in \mathcal{X}} P_\pi(X' = x' \mid X = x)\, \eta(x') = \mathbb{E}_\pi\big[\eta(X') \,\big|\, X = x\big].$$
Next, scaling by $\gamma$ yields
$$\mathcal{D}_\pi\big(\gamma G(X') \,\big|\, X = x\big) = \mathbb{E}_\pi\big[(b_{0,\gamma})_\# \,\eta(X') \,\big|\, X = x\big],$$
and finally adding the immediate reward $R$ gives the result
$$\mathcal{D}_\pi\big(R + \gamma G(X') \,\big|\, X = x\big) = \mathbb{E}_\pi\big[(b_{R,\gamma})_\# \,\eta(X') \,\big|\, X = x\big].$$

Figure 4.2
A commutative diagram illustrating two perspectives on the application of the distributional Bellman operator. The top horizontal line represents the direct application to the return-distribution function $\eta$, yielding $\eta'$. The alternative path first instantiates the return-distribution function $\eta$ as a collection of random variables $G = (G(x) : x \in \mathcal{X})$, transforms $G$ to obtain another collection of random variables $G'$, and then extracts the distributions of these random variables to obtain $\eta'$.
Proposition 4.11 is an instance of a recurring principle in distributional reinforcement learning: operations on probability distributions can equivalently be carried out on instantiations of those distributions, with the resulting distributions then extracted. Throughout this book, we will illustrate this point as it arises with a commutative diagram; the particular case under consideration is depicted in Figure 4.2.
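The two paths of the diagram can be compared numerically by Monte Carlo. In the sketch below (our own, with hypothetical environment numbers), the bottom path samples $R + \gamma G(X')$ from an instantiation, and the top path computes the mean of $(T^\pi \eta)(x)$ exactly:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical two-state chain with deterministic rewards; eta gives the
# next-state return distributions as (atoms, probabilities).
P_pi = np.array([[0.5, 0.5],
                 [0.1, 0.9]])
rewards = np.array([1.0, 0.0])
gamma = 0.9
eta = [([0.0, 2.0], [0.5, 0.5]),   # eta(0)
       ([1.0],      [1.0])]        # eta(1)

x, n = 0, 200_000
# Bottom path: instantiate G as independent draws from eta, sample the
# transition, and form R + gamma * G(X') empirically.
x_next = rng.choice(2, size=n, p=P_pi[x])
z = np.where(x_next == 0,
             rng.choice(eta[0][0], size=n, p=eta[0][1]),
             rng.choice(eta[1][0], size=n, p=eta[1][1]))
samples = rewards[x] + gamma * z

# Top path: the mean of the exact mixture (T^pi eta)(x).
exact_mean = rewards[x] + gamma * sum(
    P_pi[x, j] * np.dot(eta[j][0], eta[j][1]) for j in range(2))

# The empirical bottom path agrees with the exact top path.
assert abs(samples.mean() - exact_mean) < 0.01
```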
4.4 Wasserstein Distances for Return Functions
Many desirable properties of reinforcement learning algorithms (for example, the fact that they produce a good approximation of the value function) are due to the contractive nature of the Bellman operator $T^\pi$. In this section, we will establish that the distributional Bellman operator $T^\pi$, too, is a contraction mapping: analogous to the value-based operator, the application of $T^\pi$ brings return functions closer together.
One difference between expected-value and distributional reinforcement learning is that the space of return-distribution functions $\mathcal{P}(\mathbb{R})^{\mathcal{X}}$ is substantially different from the space of value functions. To measure distances between value functions, we can simply treat them as finite-dimensional vectors, taking the absolute difference of value estimates at individual states. By contrast, it is somewhat less intuitive to see what "close" means when comparing probability distributions. Throughout this chapter we will consider a number of probability metrics that measure distances between distributions, each presenting different mathematical and computational properties. We begin with the family of Wasserstein distances.
Definition 4.12. Let $\nu \in \mathcal{P}(\mathbb{R})$ be a probability distribution with cumulative distribution function $F_\nu$. Let $Z$ be an instantiation of $\nu$ (in particular, $F_Z = F_\nu$). The generalised inverse $F_\nu^{-1}$ is given by
$$F_\nu^{-1}(\tau) = \inf_{z \in \mathbb{R}} \{z : F_\nu(z) \ge \tau\}.$$
We also write $F_Z^{-1} = F_\nu^{-1}$.
Definition 4.13. Let $p \in [1, \infty)$. The $p$-Wasserstein distance is a function $w_p : \mathcal{P}(\mathbb{R}) \times \mathcal{P}(\mathbb{R}) \to [0, \infty]$ given by
$$w_p(\nu, \nu') = \Big(\int_0^1 \big|F_\nu^{-1}(\tau) - F_{\nu'}^{-1}(\tau)\big|^p\, d\tau\Big)^{1/p}.$$
The $\infty$-Wasserstein distance $w_\infty : \mathcal{P}(\mathbb{R}) \times \mathcal{P}(\mathbb{R}) \to [0, \infty]$ is
$$w_\infty(\nu, \nu') = \sup_{\tau \in (0,1)} \big|F_\nu^{-1}(\tau) - F_{\nu'}^{-1}(\tau)\big|.$$
Graphically, the Wasserstein distances between two probability distributions measure the area between their cumulative distribution functions, with values along the abscissa taken to the $p$th power; see Figure 4.3. When $p = \infty$, this becomes the largest horizontal difference between the inverse cumulative distribution functions. The $p$-Wasserstein distances satisfy the definition of a metric, except that they may not be finite for arbitrary pairs of distributions in $\mathcal{P}(\mathbb{R})$; see Exercise 4.6. Properly speaking, they are said to be extended metrics, since they may take values on the real line extended to include infinity. Most probability metrics that we will consider are extended metrics rather than metrics in the sense of Definition 4.2. We measure distances between return-distribution functions in terms of the largest Wasserstein distance between probability distributions at individual states.
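For distributions given by finitely many equally weighted samples, the integral in Definition 4.13 reduces to sorting: the sorted samples are exactly the inverse CDF evaluated on a uniform grid of levels $\tau$. A sketch (our own, with arbitrary Gaussian samples):

```python
import numpy as np

def wasserstein_p(a, b, p):
    """w_p between two empirical distributions with equally many, equally
    weighted atoms: sort both (the inverse CDFs) and average |diff|^p."""
    qa, qb = np.sort(a), np.sort(b)
    return np.mean(np.abs(qa - qb) ** p) ** (1.0 / p)

rng = np.random.default_rng(1)
nu = rng.normal(1.0, 1.0, size=10_000)
nu_shifted = nu + 2.0   # translate every outcome by 2

# Translating a distribution by c moves it by exactly |c| in every w_p:
# the Wasserstein distances are sensitive to differences in outcomes.
for p in (1, 2, 5):
    assert abs(wasserstein_p(nu, nu_shifted, p) - 2.0) < 1e-9
```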
Figure 4.3
Left. Illustration of the $p$-Wasserstein distance between a normal distribution $\nu = N(1, 1)$ and a mixture of two normal distributions $\nu' = \frac{1}{2}N(-1, 0.5) + \frac{1}{2}N(3, 0.5)$. Right. Illustration of the $\ell_p$ metric for the same distributions (see Section 4.5). In both cases, the shading indicates the axis along which the differences are taken to the $p$th exponent.
Definition 4.14. Let $p \in [1, \infty]$. The supremum $p$-Wasserstein distance $\overline{w}_p$ between two return-distribution functions $\eta, \eta' \in \mathcal{P}(\mathbb{R})^{\mathcal{X}}$ is defined by^{27}
$$\overline{w}_p(\eta, \eta') = \sup_{x \in \mathcal{X}} w_p\big(\eta(x), \eta'(x)\big).$$
The supremum $p$-Wasserstein distances fulfil all requirements of an extended metric on the space of return-distribution functions $\mathcal{P}(\mathbb{R})^{\mathcal{X}}$; see Exercise 4.7. Based on these distances, we give our first contractivity result regarding the distributional Bellman operator; its proof is given at the end of the section.
Proposition 4.15. The distributional Bellman operator is a contraction mapping on $\mathcal{P}(\mathbb{R})^{\mathcal{X}}$ in the supremum $p$-Wasserstein distance, for all $p \in [1, \infty]$. More precisely,
$$\overline{w}_p(T^\pi \eta, T^\pi \eta') \le \gamma\, \overline{w}_p(\eta, \eta'),$$
for all $\eta, \eta' \in \mathcal{P}(\mathbb{R})^{\mathcal{X}}$.
Proposition 4.15 is significant in that it establishes a close parallel between the expected-value and distributional operators. Following the line of reasoning given in Section 4.2, it provides the mathematical justification for the development and analysis of computational approaches for finding the return function $\eta^\pi$. More immediately, it also enables us to characterise the convergence of the sequence
$$\eta_{k+1} = T^\pi \eta_k \qquad (4.10)$$
to the return function $\eta^\pi$.

27. Because we assume that there are finitely many states, we can equivalently write $\max_{x \in \mathcal{X}}$ in the definition of the supremum distance. However, we prefer the more generally applicable $\sup_{x \in \mathcal{X}}$.
Proposition 4.16. Suppose that for each $(x, a) \in \mathcal{X} \times \mathcal{A}$, the reward distribution $P_R(\cdot \mid x, a)$ is supported on the interval $[R_{\mathrm{MIN}}, R_{\mathrm{MAX}}]$. Then for any initial return function $\eta_0$ whose distributions are bounded on the interval $\big[\tfrac{R_{\mathrm{MIN}}}{1-\gamma}, \tfrac{R_{\mathrm{MAX}}}{1-\gamma}\big]$, the sequence
$$\eta_{k+1} = T^\pi \eta_k$$
converges to $\eta^\pi$ in the supremum $p$-Wasserstein distance (for all $p \in [1, \infty]$).
The restriction to bounded rewards in Proposition 4.16 is necessary to make use of the tools developed in Section 4.2, at least without further qualification. This is because Proposition 4.7 requires all distances to be finite, which is not guaranteed under our definition of a probability metric. If, for example, the initial condition $\eta_0$ is such that
$$\overline{w}_p(\eta_0, \eta^\pi) = \infty,$$
then Proposition 4.15 is not of much use. A less restrictive, but more technically elaborate set of assumptions will be presented later in the chapter. For now, we provide the proof of the two preceding results. First, we obtain a reasonably simple proof of Proposition 4.15 by considering an alternative formulation of the $p$-Wasserstein distances in terms of couplings.
Definition 4.17. Let $\nu, \nu' \in \mathcal{P}(\mathbb{R})$ be two probability distributions. A coupling between $\nu$ and $\nu'$ is a joint distribution $\upsilon \in \mathcal{P}(\mathbb{R}^2)$ such that if $(Z, Z')$ is an instantiation of $\upsilon$, then $Z$ has distribution $\nu$ and $Z'$ has distribution $\nu'$. We write $\Gamma(\nu, \nu') \subseteq \mathcal{P}(\mathbb{R}^2)$ for the set of all couplings of $\nu$ and $\nu'$.
Proposition 4.18 (see Villani for a proof). Let $p \in [1, \infty)$. Expressed in terms of an optimal coupling, the $p$-Wasserstein distance between two distributions $\nu, \nu' \in \mathcal{P}(\mathbb{R})$ is
$$w_p(\nu, \nu') = \min_{\upsilon \in \Gamma(\nu, \nu')} \mathbb{E}_{(Z, Z') \sim \upsilon}\big[|Z - Z'|^p\big]^{1/p}.$$
The $\infty$-Wasserstein distance between $\nu$ and $\nu'$ can be written as
$$w_\infty(\nu, \nu') = \min_{\upsilon \in \Gamma(\nu, \nu')} \inf\big\{z \in \mathbb{R} : P_{(Z, Z') \sim \upsilon}\big(|Z - Z'| > z\big) = 0\big\}.$$
Informally, the optimal coupling finds an arrangement of the two probability distributions that maximises "agreement": it produces outcomes that are as close as possible. In Proposition 4.18, the optimal coupling takes on a very simple form given by inverse cumulative distribution functions. For $\nu, \nu' \in \mathcal{P}(\mathbb{R})$, an optimal coupling is the probability distribution of the random variable
$$\big(F_\nu^{-1}(\tau),\, F_{\nu'}^{-1}(\tau)\big), \qquad \tau \sim U([0, 1]). \qquad (4.11)$$
This can be understood by noting how the 1-Wasserstein distance between $\nu$ and $\nu'$ is obtained by measuring the horizontal distance between the two cumulative distribution functions, at each level $\tau \in [0, 1]$ (Figure 4.3).
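On samples, the coupling of Equation 4.11 amounts to sorting both collections before pairing them. The sketch below (our own, with hypothetical Gaussian choices) compares this quantile pairing against an arbitrary alternative coupling, here one that pairs independent samples:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000
# Samples from two hypothetical distributions, nu = N(0, 1) and nu' = N(1, 2).
z = rng.normal(0.0, 1.0, size=n)
z_prime = rng.normal(1.0, 2.0, size=n)

# The coupling of Equation 4.11 pairs equal quantile levels: sort both.
comonotone_cost = np.mean(np.abs(np.sort(z) - np.sort(z_prime)))
# Pairing independent samples is also a valid coupling, just not an optimal one.
independent_cost = np.mean(np.abs(z - z_prime))

# Proposition 4.18: the quantile coupling attains the minimum expected cost,
# so (up to sampling noise) no other coupling can do better.
assert comonotone_cost <= independent_cost
```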
Proof of Proposition 4.15. Let $p \in [1, \infty)$ be fixed. For each $x \in \mathcal{X}$, consider the optimal coupling between $\eta(x)$ and $\eta'(x)$, and instantiate it as the pair of random variables $\big(G(x), G'(x)\big)$. Next, denote by $(x, A, R, X')$ the random transition beginning in $x \in \mathcal{X}$, constructed to be independent from $G(y)$ and $G'(y)$, for all $y \in \mathcal{X}$. With these variables, write
$$\tilde{G}(x) = R + \gamma G(X'), \qquad \tilde{G}'(x) = R + \gamma G'(X').$$
By Proposition 4.11, $\tilde{G}(x)$ has distribution $(T^\pi \eta)(x)$ and $\tilde{G}'(x)$ has distribution $(T^\pi \eta')(x)$. The pair $\big(\tilde{G}(x), \tilde{G}'(x)\big)$ therefore forms a valid coupling of these distributions. Now
$$\begin{aligned}
w_p^p\big((T^\pi \eta)(x), (T^\pi \eta')(x)\big)
&\overset{(a)}{\le} \mathbb{E}\Big[\big|\big(R + \gamma G(X')\big) - \big(R + \gamma G'(X')\big)\big|^p \,\Big|\, X = x\Big] \\
&\overset{(b)}{=} \gamma^p\, \mathbb{E}\big[\big|G(X') - G'(X')\big|^p \,\big|\, X = x\big] \\
&\overset{(c)}{=} \gamma^p \sum_{x' \in \mathcal{X}} P_\pi(X' = x' \mid X = x)\, \mathbb{E}\big[\big|G(x') - G'(x')\big|^p\big] \\
&\overset{(d)}{\le} \gamma^p \sup_{x' \in \mathcal{X}} \mathbb{E}\big[\big|G(x') - G'(x')\big|^p\big] \\
&\overset{(e)}{=} \gamma^p\, \overline{w}_p^p(\eta, \eta').
\end{aligned}$$
Taking a supremum over $x \in \mathcal{X}$ on the left-hand side and the $p$th root of both sides yields the result. Here, (a) follows since the Wasserstein distance is defined as a minimum over couplings, (b) follows from algebraic manipulation of the expectation, (c) follows from independence of the sample transition $(X = x, A, R, X')$ and the random variables $(G(x), G'(x) : x \in \mathcal{X})$, (d) because the maximum of non-negative quantities is at least as great as their weighted average, and (e) follows since $(G(x'), G'(x'))$ was defined as an optimal coupling of $\eta(x')$ and $\eta'(x')$. The proof for $p = \infty$ is similar (see Exercise 4.8).
Proof of Proposition 4.16. Let us denote by $\mathcal{P}_B(\mathbb{R})$ the space of distributions bounded on $[V_{\mathrm{MIN}}, V_{\mathrm{MAX}}]$, where as usual
$$V_{\mathrm{MIN}} = \frac{R_{\mathrm{MIN}}}{1 - \gamma}, \qquad V_{\mathrm{MAX}} = \frac{R_{\mathrm{MAX}}}{1 - \gamma}.$$
We will show that under the assumption of rewards bounded on $[R_{\mathrm{MIN}}, R_{\mathrm{MAX}}]$,
(a) the return function $\eta^\pi$ is in $\mathcal{P}_B(\mathbb{R})^{\mathcal{X}}$, and
(b) the distributional Bellman operator maps $\mathcal{P}_B(\mathbb{R})^{\mathcal{X}}$ to itself.
Consequently, we can invoke Proposition 4.7 with $O = T^\pi$ and $M = \mathcal{P}_B(\mathbb{R})^{\mathcal{X}}$ to conclude that for any initial $\eta_0 \in \mathcal{P}_B(\mathbb{R})^{\mathcal{X}}$, the sequence of iterates $(\eta_k)_{k \ge 0}$ converges to $\eta^\pi$ with respect to $d = \overline{w}_p$, for any $p \in [1, \infty]$.
To prove (a), note that for any state $x \in \mathcal{X}$,
$$G^\pi(x) = \sum_{t=0}^{\infty} \gamma^t R_t, \quad X_0 = x,$$
and since $R_t \in [R_{\mathrm{MIN}}, R_{\mathrm{MAX}}]$ for all $t$, then also $G^\pi(x) \in [V_{\mathrm{MIN}}, V_{\mathrm{MAX}}]$. For (b), let $\eta \in \mathcal{P}_B(\mathbb{R})^{\mathcal{X}}$ and denote by $G$ an instantiation of this return-distribution function. For any $x \in \mathcal{X}$,
$$\begin{aligned}
P\big(R + \gamma G(X') \le V_{\mathrm{MAX}} \,\big|\, X = x\big)
&= P\Big(G(X') \le \tfrac{V_{\mathrm{MAX}} - R}{\gamma} \,\Big|\, X = x\Big) \\
&\ge P\Big(G(X') \le \tfrac{V_{\mathrm{MAX}} - R_{\mathrm{MAX}}}{\gamma} \,\Big|\, X = x\Big) \\
&= P\big(G(X') \le V_{\mathrm{MAX}} \,\big|\, X = x\big) = 1.
\end{aligned}$$
By the same reasoning,
$$P\big(R + \gamma G(X') \ge V_{\mathrm{MIN}} \,\big|\, X = x\big) = 1.$$
Since $R + \gamma G(X')$, $X = x$, is an instantiation of $(T^\pi \eta)(x)$ for each $x$, we conclude that if $\eta \in \mathcal{P}_B(\mathbb{R})^{\mathcal{X}}$, then also $T^\pi \eta \in \mathcal{P}_B(\mathbb{R})^{\mathcal{X}}$.
4.5 $\ell_p$ Probability Metrics and the Cramér Distance
The previous section established that the distributional Bellman operator is well-behaved with respect to the family of Wasserstein distances. However, these are but a few among many standard probability metrics. We will see in Chapter 5 that theoretical analysis sometimes requires us to study the behaviour of the distributional operator with respect to other metrics. In addition, many practical algorithms directly optimise a metric (typically expressed as a loss function) as part of their operation (see Chapter 10). The Cramér distance, a member of the broader family of $\ell_p$ metrics, is of particular interest to us.
Definition 4.19. Let $p \in [1, \infty)$. The distance $\ell_p : \mathcal{P}(\mathbb{R}) \times \mathcal{P}(\mathbb{R}) \to [0, \infty]$ is a probability metric defined by
$$\ell_p(\nu, \nu') = \Big(\int_{\mathbb{R}} |F_\nu(z) - F_{\nu'}(z)|^p\, dz\Big)^{1/p}. \qquad (4.12)$$
For $p = 2$, this is the Cramér distance.^{28} The $\ell_\infty$ or Kolmogorov-Smirnov distance is given by
$$\ell_\infty(\nu, \nu') = \sup_{z \in \mathbb{R}} \big|F_\nu(z) - F_{\nu'}(z)\big|.$$
The respective supremum $\ell_p$ distances are given by ($\eta, \eta' \in \mathcal{P}(\mathbb{R})^{\mathcal{X}}$)
$$\overline{\ell}_p(\eta, \eta') = \sup_{x \in \mathcal{X}} \ell_p\big(\eta(x), \eta'(x)\big).$$
These are extended metrics on $\mathcal{P}(\mathbb{R})^{\mathcal{X}}$.
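For finite-support distributions, the integral in Equation 4.12 is easy to approximate on a grid. The sketch below (our own helper and numbers) computes $\ell_2$ for pairs of Diracs and, by shrinking the gap between the atoms by $\gamma$, previews the $\gamma^{1/p}$ contraction modulus established next:

```python
import numpy as np

def ell_p(atoms_a, probs_a, atoms_b, probs_b, p, grid):
    """ell_p of Equation 4.12 between finite-support distributions,
    approximated by integrating |F_a - F_b|^p on a uniform grid."""
    def cdf(atoms, probs):
        return np.sum(np.asarray(probs)[None, :]
                      * (np.asarray(atoms)[None, :] <= grid[:, None]), axis=1)
    dz = grid[1] - grid[0]
    gap = np.abs(cdf(atoms_a, probs_a) - cdf(atoms_b, probs_b))
    return (np.sum(gap ** p) * dz) ** (1.0 / p)

grid = np.linspace(-5.0, 5.0, 100_001)
# Two Diracs, delta_0 and delta_1: the CDFs differ by exactly 1 on [0, 1),
# so ell_p = 1 for every p.
d = ell_p([0.0], [1.0], [1.0], [1.0], 2, grid)
assert abs(d - 1.0) < 1e-3
# Scaling the gap between the atoms by gamma scales ell_2 by gamma^{1/2}.
gamma = 0.9
d_shrunk = ell_p([0.0], [1.0], [gamma], [1.0], 2, grid)
assert abs(d_shrunk - gamma ** 0.5) < 1e-3
```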
Where the $p$-Wasserstein distances measure differences in outcomes, the $\ell_p$ distances measure differences in the probabilities associated with these outcomes. This is because the exponent $p$ is applied to cumulative probabilities (this is illustrated in Figure 4.3). The distributional Bellman operator is also a contraction mapping under the $\ell_p$ distances for $p \in [1, \infty)$, albeit with a larger contraction modulus.^{29}
Proposition 4.20. For $p \in [1, \infty)$, the distributional Bellman operator $T^\pi$ is a contraction mapping on $\mathcal{P}(\mathbb{R})^{\mathcal{X}}$ with respect to $\overline{\ell}_p$, with contraction modulus $\gamma^{1/p}$. That is,
$$\overline{\ell}_p(T^\pi \eta, T^\pi \eta') \le \gamma^{1/p}\, \overline{\ell}_p(\eta, \eta')$$
for all $\eta, \eta' \in \mathcal{P}(\mathbb{R})^{\mathcal{X}}$.
The proof of Proposition 4.20 will follow as a corollary of a more general result given in Section 4.6. One way to relate it to our earlier result is to consider the behaviour of the sequence defined by
$$\eta_{k+1} = T^\pi \eta_k. \qquad (4.13)$$
As measured in the $p$-Wasserstein distance, the sequence $(\eta_k)_{k \ge 0}$ approaches $\eta^\pi$ at a rate of $\gamma$; but if we instead measure distances using the $\overline{\ell}_p$ metric, this rate is slower: only $\gamma^{1/p}$. Measured in terms of $\ell_\infty$ (the Kolmogorov-Smirnov distance), the sequence of iterates may in fact not seem to approach $\eta^\pi$ at all. To see this, it suffices to consider a single-state process with zero reward (that is, $P_{\mathcal{X}}(X' = x \mid X = x) = 1$ and $R = 0$) and a discount factor $\gamma = 0.9$. In this case, $\eta^\pi(x) = \delta_0$. For the initial condition $\eta_0(x) = \delta_1$, we obtain
$$\eta_1(x) = (T^\pi \eta_0)(x) = \delta_\gamma.$$
Now, the (supremum) $\ell_\infty$ distance between $\eta^\pi$ and $\eta_0$ is 1, because for any $z \in (0, 1)$,
$$F_{\eta_0(x)}(z) = 0 \neq F_{\eta^\pi(x)}(z) = 1.$$
However, the $\ell_\infty$ distance between $\eta^\pi$ and $\eta_1$ is also 1, by the same argument (but now restricted to $z \in (0, \gamma)$). Hence there is no $\beta \in [0, 1)$ for which
$$\overline{\ell}_\infty(\eta_1, \eta^\pi) \le \beta\, \overline{\ell}_\infty(\eta_0, \eta^\pi).$$

28. Historically, the Cramér distance has been defined as the square of $\ell_2$. In our context, it seems unambiguous to use the word for $\ell_2$ itself.
29. For $p = 1$, the distributional Bellman operator has contraction modulus $\gamma$; this is sensible given that $\ell_1 = w_1$.
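The failure can be reproduced directly. The sketch below (our own helper and grid) encodes the Dirac distributions of the example and evaluates the Kolmogorov-Smirnov distance numerically:

```python
import numpy as np

def ks_distance(atoms_a, probs_a, atoms_b, probs_b, grid):
    """ell_inf: the largest vertical gap between two CDFs, probed on a grid."""
    def cdf(atoms, probs):
        return np.array([sum(p for z0, p in zip(atoms, probs) if z0 <= z)
                         for z in grid])
    return np.max(np.abs(cdf(atoms_a, probs_a) - cdf(atoms_b, probs_b)))

gamma = 0.9
grid = np.linspace(-0.5, 1.5, 2001)
# Single state, zero reward: eta^pi(x) = delta_0; start from eta_0(x) = delta_1.
target = ([0.0], [1.0])
eta_0 = ([1.0], [1.0])
eta_1 = ([gamma], [1.0])   # (T^pi eta_0)(x): the atom is merely scaled by gamma

d0 = ks_distance(*eta_0, *target, grid)
d1 = ks_distance(*eta_1, *target, grid)
# The Kolmogorov-Smirnov distance records no progress at all.
assert d0 == 1.0 and d1 == 1.0
```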
Exercise 4.16 asks you to prove a similar result for a probability metric called
The more general point is that different probability metrics are sensitive to different characteristics of probability distributions, and to varying degrees. At one extreme, the $\infty$-Wasserstein distance is effectively insensitive to the probability associated with different outcomes, while at the other extreme, the Kolmogorov-Smirnov distance is insensitive to the scale of the difference between outcomes. In Section 4.6, we will show that a metric's sensitivity to differences in outcomes determines the contraction modulus of the distributional Bellman operator under that metric; informally speaking, this explains the "nice" behaviour of the distributional Bellman operator under the Wasserstein distances.
Before moving on, let us summarise the results established thus far. By combining the theory of contraction mappings with suitable probability metrics, we were able to characterise the behaviour of the iterates
$$\eta_{k+1} = \mathcal{T}^\pi \eta_k. \tag{4.14}$$
In the following chapters we will use this as the basis for the design of implementable algorithms that approximate the return distribution $\eta^\pi$, and will appeal to contraction mapping theory to provide theoretical guarantees for these algorithms. In particular, in Chapter 6 we will analyse categorical temporal-difference learning under the lens of the Cramér distance. While the results presented until now suffice for most practical purposes, the following sections deal with some of the more technical considerations that arise from studying Equation 4.14 under general conditions, in particular the issue of infinite distances between distributions.
Figure 4.4
The distributional Bellman operator is not a contraction mapping in either the supremum form of (a, b) the total variation distance ($d_{TV}$, shaded in the top panels; see Exercise 4.16 for a definition) or (c, d) the Kolmogorov-Smirnov distance $\ell_\infty$ (vertical distance in the bottom panels). The left panels show the density (a) and cumulative distribution function (c) of two distributions $\eta(x)$ (blue) and $\eta'(x)$ (red). The right panels show the same after applying the distributional Bellman operator (b, d), specifically considering the transformation induced by the discount factor $\gamma$. The lack of contractivity can be explained by the fact that neither $d_{TV}$ nor $\ell_\infty$ is a homogeneous probability metric (Section 4.6).
4.6 Sufficient Conditions for Contractivity

In the remainder of this chapter, we characterise in greater generality the behaviour of the sequence of return function estimates described by Equation 4.14, viewed under the lens of different probability metrics. We begin with a formal definition of what it means for a function d to be a probability metric.

Definition 4.21. A probability metric is an extended metric on the space of probability distributions, written
$$d : \mathscr{P}(\mathbb{R}) \times \mathscr{P}(\mathbb{R}) \to [0, \infty].$$
Its supremum extension is the function $\bar{d} : \mathscr{P}(\mathbb{R})^{\mathcal{X}} \times \mathscr{P}(\mathbb{R})^{\mathcal{X}} \to [0, \infty]$ defined as
$$\bar{d}(\eta, \eta') = \sup_{x \in \mathcal{X}} d(\eta(x), \eta'(x)).$$
We refer to $\bar{d}$ as a return-function metric; it is an extended metric on $\mathscr{P}(\mathbb{R})^{\mathcal{X}}$.
Our analysis is based on three properties that a probability metric should possess in order to guarantee contractivity. These three properties relate closely to the three fundamental operations that make up the distributional Bellman operator: scaling, convolution, and mixture of distributions (equivalently: scaling, addition, and indexing of random variables). In this analysis, we will find that some properties are more easily stated in terms of random variables, others in terms of probability distributions. Accordingly, given two probability distributions $\nu, \nu'$ with instantiations $Z, Z'$, let us overload notation and write
$$d(Z, Z') = d(\nu, \nu').$$
Definition 4.22. Let $c > 0$. The probability metric d is c-homogeneous if for any scalar $\gamma \in [0, 1)$ and any two distributions $\nu, \nu' \in \mathscr{P}(\mathbb{R})$ with associated random variables $Z, Z'$, we have
$$d(\gamma Z, \gamma Z') = \gamma^c\, d(Z, Z').$$
In terms of probability distributions, this is equivalently given by the condition
$$d\big((b_{0,\gamma})_\# \nu, (b_{0,\gamma})_\# \nu'\big) = \gamma^c\, d(\nu, \nu').$$
If no such c exists, we say that d is not homogeneous.
Definition 4.23. The probability metric d is regular if for any two distributions $\nu, \nu' \in \mathscr{P}(\mathbb{R})$ with associated random variables $Z, Z'$, and an independent random variable W, we have
$$d(W + Z, W + Z') \le d(Z, Z'). \tag{4.15}$$
In terms of distributions, this is
$$d\big(\mathbb{E}_W[(b_{W,1})_\# \nu],\ \mathbb{E}_W[(b_{W,1})_\# \nu']\big) \le d(\nu, \nu').$$
Definition 4.24. Given $p \in [1, \infty)$, the probability metric d is p-convex³⁰ if for any $\alpha \in (0, 1)$ and distributions $\nu_1, \nu_2, \nu'_1, \nu'_2 \in \mathscr{P}(\mathbb{R})$, we have
$$d^p\big(\alpha \nu_1 + (1 - \alpha)\nu_2,\ \alpha \nu'_1 + (1 - \alpha)\nu'_2\big) \le \alpha\, d^p(\nu_1, \nu'_1) + (1 - \alpha)\, d^p(\nu_2, \nu'_2).$$

30. This matches the usual definition of convexity (for $d^p$) if one treats the pair $(\nu_1, \nu_2)$ as a single argument from $\mathscr{P}(\mathbb{R}) \times \mathscr{P}(\mathbb{R})$.

Note that while the convexity property is stated here in terms of a mixture of two distributions, it implies an analogous property for mixtures of finitely-many distributions.
Theorem 4.25. Consider a probability metric d. Suppose that d is regular, c-homogeneous for some $c > 0$, and that there exists $p \in [1, \infty)$ such that d is p-convex. Then for all return-distribution functions $\eta, \eta' \in \mathscr{P}(\mathbb{R})^{\mathcal{X}}$, we have
$$\bar{d}(\mathcal{T}^\pi \eta, \mathcal{T}^\pi \eta') \le \gamma^c\, \bar{d}(\eta, \eta').$$
Proof. Fix a state $x \in \mathcal{X}$ and action $a \in \mathcal{A}$. For this state, consider the sample transition $(X = x, A = a, R, X')$ (Equation 2.12), and recall that R and X' are independent given X and A, since
$$R \sim P_R(\cdot \mid X, A), \qquad X' \sim P_{\mathcal{X}}(\cdot \mid X, A).$$
Let G and G' be instantiations of $\eta$ and $\eta'$, respectively.³¹ We introduce a state-action variant of the distributional Bellman operator, $\mathcal{T} : \mathscr{P}(\mathbb{R})^{\mathcal{X}} \to \mathscr{P}(\mathbb{R})^{\mathcal{X} \times \mathcal{A}}$, given by
$$(\mathcal{T}\eta)(x, a) = \mathbb{E}\big[(b_{R,\gamma})_\# \eta(X') \mid X = x, A = a\big].$$
Note that this operator is defined independently of the policy $\pi$, since the action a is specified as an argument. We then calculate directly, indicating where each of the hypotheses of the result is used:
$$\begin{aligned}
d^p\big((\mathcal{T}\eta)(x, a), (\mathcal{T}\eta')(x, a)\big) &= d^p\big(R + \gamma G(X'),\ R + \gamma G'(X')\big) \\
&\overset{(a)}{\le} d^p\big(\gamma G(X'), \gamma G'(X')\big) \\
&\overset{(b)}{=} \gamma^{pc}\, d^p\big(G(X'), G'(X')\big) \\
&\overset{(c)}{\le} \gamma^{pc} \sum_{x' \in \mathcal{X}} P(X' = x' \mid X = x, A = a)\, d^p\big(G(x'), G'(x')\big) \\
&\le \gamma^{pc} \sup_{x' \in \mathcal{X}} d^p\big(G(x'), G'(x')\big) \\
&= \gamma^{pc}\, \bar{d}^p(\eta, \eta').
\end{aligned}$$
Here, (a) follows from regularity of d, (b) follows from the c-homogeneous property of d, and (c) follows from p-convexity, where the mixture is over the values taken by the random variable X'. We also note that
$$(\mathcal{T}^\pi \eta)(x) = \sum_{a \in \mathcal{A}} \pi(a \mid x)\, (\mathcal{T}\eta)(x, a),$$
and hence by p-convexity of d, we have
$$d^p\big((\mathcal{T}^\pi \eta)(x), (\mathcal{T}^\pi \eta')(x)\big) \le \sum_{a \in \mathcal{A}} \pi(a \mid x)\, d^p\big((\mathcal{T}\eta)(x, a), (\mathcal{T}\eta')(x, a)\big) \le \gamma^{pc}\, \bar{d}^p(\eta, \eta').$$
Taking the supremum over $x \in \mathcal{X}$ on the left-hand side and taking pth roots then yields the result.

31. Note that the proof doesn't require us to assume that G(x) and G(y) are independent for $x \ne y$. See Exercise 4.12.
Theorem 4.25 illustrates how the contractivity of the distributional Bellman operator in the probability metric d (specifically, in its supremum extension) follows from natural properties of d. We see that the contraction modulus is closely tied to the homogeneity of d, which informally characterises the extent to which scaling random variables by a factor $\gamma$ brings them "closer together". The theorem provides an alternative to our earlier result regarding Wasserstein distances and enables us to establish contractivity under the $\ell_p$ distances, but also under other probability metrics. Exercise 4.19, in particular, explores contractivity under the so-called maximum mean discrepancy (MMD) distances.
Proof of Proposition 4.20 (contractivity in $\ell_p$ distances). We will apply Theorem 4.25 to the probability metric $\ell_p$, for $p \in [1, \infty)$. It is therefore sufficient to demonstrate that $\ell_p$ is 1/p-homogeneous, regular, and p-convex.

1/p-homogeneity. Let $\nu, \nu' \in \mathscr{P}(\mathbb{R})$ with associated random variables $Z, Z'$. We make use of the fact that for $\gamma \in [0, 1)$,
$$F_{\gamma Z}(z) = F_Z(z / \gamma).$$
Writing $\ell_p^p$ for the pth power of $\ell_p$, we have
$$\begin{aligned}
\ell_p^p(\gamma Z, \gamma Z') &= \int_{\mathbb{R}} \big|F_{\gamma Z}(z) - F_{\gamma Z'}(z)\big|^p\, dz \\
&= \int_{\mathbb{R}} \big|F_Z(z/\gamma) - F_{Z'}(z/\gamma)\big|^p\, dz \\
&\overset{(a)}{=} \gamma \int_{\mathbb{R}} \big|F_Z(z) - F_{Z'}(z)\big|^p\, dz \\
&= \gamma\, \ell_p^p(Z, Z'),
\end{aligned} \tag{4.16}$$
where (a) follows from a change of variables $z/\gamma \mapsto z$ in the integral. Therefore, we deduce that $\ell_p(\gamma Z, \gamma Z') = \gamma^{1/p}\, \ell_p(Z, Z')$.
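The 1/p-homogeneity property can be sanity-checked on the simplest possible case, two point masses, where $\ell_p$ has a closed form. The sketch below is illustrative only: since the CDFs of $\delta_a$ and $\delta_b$ differ by exactly 1 on the interval between a and b, we have $\ell_p(\delta_a, \delta_b) = |a - b|^{1/p}$.

```python
# Check ell_p(gamma Z, gamma Z') = gamma^{1/p} ell_p(Z, Z') for point masses.
def ell_p_point_masses(a, b, p):
    # |F_{delta_a} - F_{delta_b}| = 1 exactly on the interval between a and b,
    # so ell_p^p(delta_a, delta_b) = |a - b|.
    return abs(a - b) ** (1.0 / p)

gamma, a, b, p = 0.9, -1.0, 2.0, 3.0

lhs = ell_p_point_masses(gamma * a, gamma * b, p)       # ell_p(gamma Z, gamma Z')
rhs = gamma ** (1.0 / p) * ell_p_point_masses(a, b, p)  # gamma^{1/p} ell_p(Z, Z')
assert abs(lhs - rhs) < 1e-12
print(lhs, rhs)
```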
Regularity. Let $\nu, \nu' \in \mathscr{P}(\mathbb{R})$, with $Z, Z'$ independent instantiations of $\nu$ and $\nu'$ respectively, and let W be a random variable independent of $Z, Z'$. Then
$$\begin{aligned}
\ell_p^p(W + Z, W + Z') &= \int_{\mathbb{R}} \big|F_{W+Z}(z) - F_{W+Z'}(z)\big|^p\, dz \\
&= \int_{\mathbb{R}} \big|\mathbb{E}_W[F_Z(z - W)] - \mathbb{E}_W[F_{Z'}(z - W)]\big|^p\, dz \\
&\overset{(a)}{\le} \int_{\mathbb{R}} \mathbb{E}_W\big[|F_Z(z - W) - F_{Z'}(z - W)|^p\big]\, dz \\
&\overset{(b)}{=} \mathbb{E}_W\Big[\int_{\mathbb{R}} |F_Z(z - W) - F_{Z'}(z - W)|^p\, dz\Big] \\
&= \ell_p^p(Z, Z'),
\end{aligned}$$
where (a) follows from Jensen's inequality, and (b) follows by swapping the integral and expectation (more formally justified by Tonelli's theorem).
p-convexity. Let $\alpha \in (0, 1)$, and $\nu_1, \nu'_1, \nu_2, \nu'_2 \in \mathscr{P}(\mathbb{R})$. Note that
$$F_{\alpha \nu_1 + (1-\alpha)\nu_2}(z) = \alpha F_{\nu_1}(z) + (1 - \alpha) F_{\nu_2}(z),$$
and similarly for the primed distributions $\nu'_1, \nu'_2$. By convexity of the function $z \mapsto |z|^p$ on the real numbers and Jensen's inequality,
$$\big|F_{\alpha\nu_1 + (1-\alpha)\nu_2}(z) - F_{\alpha\nu'_1 + (1-\alpha)\nu'_2}(z)\big|^p \le \alpha\big|F_{\nu_1}(z) - F_{\nu'_1}(z)\big|^p + (1 - \alpha)\big|F_{\nu_2}(z) - F_{\nu'_2}(z)\big|^p,$$
for all $z \in \mathbb{R}$. Hence,
$$\ell_p^p\big(\alpha\nu_1 + (1-\alpha)\nu_2,\ \alpha\nu'_1 + (1-\alpha)\nu'_2\big) \le \alpha\, \ell_p^p(\nu_1, \nu'_1) + (1 - \alpha)\, \ell_p^p(\nu_2, \nu'_2),$$
and $\ell_p$ is p-convex.
4.7 A Matter of Domain

Suppose that we have demonstrated, by means of Theorem 4.25, that the distributional Bellman operator is a contraction mapping in the supremum extension of some probability metric d. Is this sufficient to guarantee that the sequence
$$\eta_{k+1} = \mathcal{T}^\pi \eta_k$$
converges to the return function $\eta^\pi$, by means of Proposition 4.7? In general, no, because d may assign infinite distances to certain pairs of distributions. To invoke Proposition 4.7, we identify a subset of probability distributions $\mathscr{P}_d(\mathbb{R})$ that are all within finite d-distance of each other and then ensure that the distributional Bellman operator is well-behaved on this subset. Specifically, we identify a set of conditions under which

(a) the distributional Bellman operator $\mathcal{T}^\pi$ maps $\mathscr{P}_d(\mathbb{R})^{\mathcal{X}}$ to itself, and
(b) the return function $\eta^\pi$ (the fixed point of $\mathcal{T}^\pi$) lies in $\mathscr{P}_d(\mathbb{R})^{\mathcal{X}}$.

For most common probability metrics and natural problem settings, these requirements are easily verified. In Proposition 4.16, for example, we demonstrated that under the assumption that the reward distributions are bounded, Proposition 4.7 can be applied with the Wasserstein distances. The aim of this section is to extend the analysis to a broader set of probability metrics, but also to a greater number of problem settings, including those where the reward distributions are not bounded.
Definition 4.26. Let d be a probability metric. Its finite domain $\mathscr{P}_d(\mathbb{R}) \subseteq \mathscr{P}(\mathbb{R})$ is the set of probability distributions with finite first moment and finite d-distance to the distribution that puts all of its mass on zero:
$$\mathscr{P}_d(\mathbb{R}) = \big\{\nu \in \mathscr{P}(\mathbb{R}) : d(\nu, \delta_0) < \infty,\ \mathbb{E}_{Z \sim \nu}[|Z|] < \infty\big\}. \tag{4.17}$$
By the triangle inequality, for any two distributions $\nu, \nu' \in \mathscr{P}_d(\mathbb{R})$, we are guaranteed $d(\nu, \nu') < \infty$. Although the choice of $\delta_0$ as the reference point is somewhat arbitrary, it is sensible given that many reinforcement learning problems include the possibility of receiving no reward at all (e.g., $G^\pi(x) = 0$). The finite first moment assumption is made in light of Assumption 2.5, which guarantees that return distributions have well-defined expectations.
Proposition 4.27. Let d be a probability metric satisfying the conditions of Theorem 4.25, with finite domain $\mathscr{P}_d(\mathbb{R})$. Let $\mathcal{T}^\pi$ be the distributional Bellman operator corresponding to a given Markov decision process $(\mathcal{X}, \mathcal{A}, \xi_0, P_{\mathcal{X}}, P_R)$. Suppose that

(a) $\eta^\pi \in \mathscr{P}_d(\mathbb{R})^{\mathcal{X}}$, and
(b) $\mathscr{P}_d(\mathbb{R})^{\mathcal{X}}$ is closed under $\mathcal{T}^\pi$: for any $\eta \in \mathscr{P}_d(\mathbb{R})^{\mathcal{X}}$, we have that $\mathcal{T}^\pi \eta \in \mathscr{P}_d(\mathbb{R})^{\mathcal{X}}$.

Then for any initial condition $\eta_0 \in \mathscr{P}_d(\mathbb{R})^{\mathcal{X}}$, the sequence of iterates defined by $\eta_{k+1} = \mathcal{T}^\pi \eta_k$ converges to $\eta^\pi$ with respect to $\bar{d}$.
Proposition 4.27 is a specialisation of Proposition 4.7 to the distributional setting, and generalises our earlier result regarding bounded reward distributions. Effectively, it allows us to prove the convergence of the sequence $(\eta_k)_{k \ge 0}$ for a family of Markov decision processes satisfying the two conditions above. The condition $\eta^\pi \in \mathscr{P}_d(\mathbb{R})^{\mathcal{X}}$, while seemingly benign, does not automatically hold; Exercise 4.20 illustrates the issue using a modified p-Wasserstein distance.
Example 4.28 ($\ell_p$ metrics). For a given $p \in [1, \infty)$, the finite domain of the probability metric $\ell_p$ is
$$\mathscr{P}_{\ell_p}(\mathbb{R}) = \big\{\nu \in \mathscr{P}(\mathbb{R}) : \mathbb{E}_{Z \sim \nu}[|Z|] < \infty\big\},$$
the set of distributions with finite first moment. This follows because
$$\begin{aligned}
\ell_p^p(\nu, \delta_0) &= \int_{-\infty}^0 F_\nu(z)^p\, dz + \int_0^\infty \big(1 - F_\nu(z)\big)^p\, dz \\
&\le \int_{-\infty}^0 F_\nu(z)\, dz + \int_0^\infty \big(1 - F_\nu(z)\big)\, dz \\
&\overset{(a)}{=} \mathbb{E}[\max(0, -Z)] + \mathbb{E}[\max(0, Z)] \\
&= \mathbb{E}[|Z|],
\end{aligned}$$
where the inequality holds since cumulative probabilities lie in $[0, 1]$ and $p \ge 1$, and (a) follows from expressing $F_\nu(z)$ and $1 - F_\nu(z)$ as integrals and using Tonelli's theorem:
$$\int_0^\infty \big(1 - F_\nu(z)\big)\, dz = \int_0^\infty \mathbb{E}_{Z \sim \nu}\big[\mathbb{1}\{Z > z\}\big]\, dz = \mathbb{E}_{Z \sim \nu}\Big[\int_0^\infty \mathbb{1}\{Z > z\}\, dz\Big] = \mathbb{E}_{Z \sim \nu}[\max(0, Z)],$$
and similarly for the integral from $-\infty$ to 0.
The conditions of Proposition 4.27 are guaranteed by Assumption 2.5 (rewards have finite first moments). From Chapter 2, we know that under this assumption the random return has finite expected value, and hence
$$\eta^\pi(x) \in \mathscr{P}_{\ell_p}(\mathbb{R}), \quad \text{for all } x \in \mathcal{X}.$$
Similarly, we can show from elementary operations that if R and G(x) satisfy
$$\mathbb{E}\big[|R| \mid X = x\big] < \infty, \quad \mathbb{E}\big[|G(x)|\big] < \infty, \quad \text{for all } x \in \mathcal{X},$$
then also
$$\mathbb{E}\big[|R + \gamma G(X')| \mid X = x\big] < \infty, \quad \text{for all } x \in \mathcal{X}.$$
Following Proposition 4.27, provided that $\eta_0 \in \mathscr{P}_{\ell_p}(\mathbb{R})^{\mathcal{X}}$, the sequence $(\eta_k)_{k \ge 0}$ converges to $\eta^\pi$ with respect to $\bar{\ell}_p$.
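The bound $\ell_p^p(\nu, \delta_0) \le \mathbb{E}[|Z|]$ above can be verified numerically for a discrete distribution, whose CDF is a step function. The following Python sketch (an illustration with arbitrarily chosen atoms, not from the formal development) integrates the step CDF piecewise:

```python
# Check ell_p^p(nu, delta_0) <= E|Z| for a small discrete distribution.
def ell_p_p_to_delta0(atoms, probs, p):
    # integral of F^p over (-inf, 0) plus integral of (1 - F)^p over (0, inf);
    # for a step CDF both are sums of rectangle areas between breakpoints
    # (the CDF is 0 below the smallest atom and 1 above the largest).
    pts = sorted(set(atoms + [0.0]))
    total = 0.0
    for left, right in zip(pts[:-1], pts[1:]):
        F = sum(q for z, q in zip(atoms, probs) if z <= left)  # CDF on (left, right)
        height = F ** p if right <= 0 else (1.0 - F) ** p
        total += height * (right - left)
    return total

atoms, probs, p = [-2.0, 1.0, 3.0], [0.2, 0.5, 0.3], 2
lpp = ell_p_p_to_delta0(atoms, probs, p)
mean_abs = sum(q * abs(z) for z, q in zip(atoms, probs))
print(lpp, mean_abs)  # the first value is bounded by the second
assert lpp <= mean_abs
```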
When d is the p-Wasserstein distance ($p \in [1, \infty]$), the finite domain takes on a particularly useful form, which we denote by $\mathscr{P}_p(\mathbb{R})$:
$$\mathscr{P}_p(\mathbb{R}) = \big\{\nu \in \mathscr{P}(\mathbb{R}) : \mathbb{E}_{Z \sim \nu}[|Z|^p] < \infty\big\}, \quad p \in [1, \infty),$$
$$\mathscr{P}_\infty(\mathbb{R}) = \big\{\nu \in \mathscr{P}(\mathbb{R}) : \exists\, C > 0 \text{ s.t. } \nu([-C, C]) = 1\big\}.$$
For $p < \infty$, this is the set of distributions with bounded pth moments; for $p = \infty$, the set of distributions with bounded support. In particular, observe that the finite domains of $\ell_p$ and $w_1$ coincide: $\mathscr{P}_{\ell_p}(\mathbb{R}) = \mathscr{P}_1(\mathbb{R})$.
As with the $\ell_p$ distances, we can satisfy the conditions of Proposition 4.27 for $d = w_p$ by introducing an assumption on the reward distributions. In this case, we simply require that these be in $\mathscr{P}_p(\mathbb{R})$. As this assumption will recur throughout the book, we state it here in full; Exercise 4.11 goes through the steps of the corresponding result.

Assumption 4.29(p). For each state-action pair $(x, a) \in \mathcal{X} \times \mathcal{A}$, the reward distribution $P_R(\cdot \mid x, a)$ is in $\mathscr{P}_p(\mathbb{R})$.
Proposition 4.30. Let $p \in [1, \infty]$. Under Assumption 4.29(p), the return function $\eta^\pi$ has finite pth moments ($p = \infty$: is bounded). In addition, for any initial return function $\eta_0 \in \mathscr{P}_p(\mathbb{R})^{\mathcal{X}}$, the sequence defined by
$$\eta_{k+1} = \mathcal{T}^\pi \eta_k$$
converges to $\eta^\pi$ with respect to the supremum p-Wasserstein metric.
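To make this kind of convergence guarantee concrete, here is a small illustrative simulation (a hypothetical example, not from the book): a deterministic two-state chain on which every iterate remains a point mass, so the supremum 1-Wasserstein distance is simply the largest gap between point locations and contracts by exactly $\gamma$ at each application of $\mathcal{T}^\pi$.

```python
# Deterministic two-state chain: rewards 1 and 0, transitions 0 -> 1 -> 0.
gamma = 0.9
reward = {0: 1.0, 1: 0.0}
succ = {0: 1, 1: 0}

# true returns: G(0) = 1/(1 - gamma^2), G(1) = gamma/(1 - gamma^2)
g_true = {0: 1.0 / (1 - gamma**2), 1: gamma / (1 - gamma**2)}

eta = {0: 0.0, 1: 0.0}  # eta_0(x) = delta_0, represented by its location
dists = []
for k in range(60):
    # (T^pi eta)(x) is the point mass at reward(x) + gamma * location at succ(x)
    eta = {x: reward[x] + gamma * eta[succ[x]] for x in eta}
    dists.append(max(abs(eta[x] - g_true[x]) for x in eta))

# on this chain the sup-w1 distance shrinks by exactly gamma at every step
assert all(abs(d2 - gamma * d1) < 1e-9 for d1, d2 in zip(dists, dists[1:]))
assert dists[-1] < 1e-2
print(dists[-1])
```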
4.8 Weak Convergence of Return Functions*

Proposition 4.27 implies that if each distribution $\eta^\pi(x)$ lies in the finite domain $\mathscr{P}_d(\mathbb{R})$ of a given probability metric d that is regular, c-homogeneous, and p-convex, then $\eta^\pi$ is the unique solution to the equation
$$\eta = \mathcal{T}^\pi \eta \tag{4.18}$$
in the space $\mathscr{P}_d(\mathbb{R})^{\mathcal{X}}$. It does not, however, rule out the existence of solutions outside this space. This concern can be addressed by showing that for any $\eta_0 \in \mathscr{P}(\mathbb{R})^{\mathcal{X}}$, the sequence of probability distributions $(\eta_k(x))_{k \ge 0}$ defined by
$$\eta_{k+1} = \mathcal{T}^\pi \eta_k$$
converges weakly to the return distribution $\eta^\pi(x)$, for each state $x \in \mathcal{X}$. In addition to giving an alternative perspective on the quantitative convergence results for these iterates, the uniqueness of $\eta^\pi$ as a solution to Equation 4.18 (stated as Proposition 4.9) follows immediately from Proposition 4.34 below.
Definition 4.31. Let $(\nu_k)_{k \ge 0}$ be a sequence of distributions in $\mathscr{P}(\mathbb{R})$, and let $\nu \in \mathscr{P}(\mathbb{R})$ be another probability distribution. We say that $(\nu_k)_{k \ge 0}$ converges weakly to $\nu$ if for every $z \in \mathbb{R}$ at which $F_\nu$ is continuous, we have $F_{\nu_k}(z) \to F_\nu(z)$.
We will show that for each $x \in \mathcal{X}$, the sequence $(\eta_k(x))_{k \ge 0}$ converges weakly to $\eta^\pi(x)$. A simple approach is to consider the relationships between well-chosen instantiations of $\eta_k$ (for each $k \in \mathbb{N}$) and $\eta^\pi$, by means of the following classical result [see e.g. Billingsley, 2012]. Recall that a sequence of random variables $(Z_k)_{k \ge 0}$ converges to Z with probability 1 if
$$\mathbb{P}\big(\lim_{k \to \infty} Z_k = Z\big) = 1.$$
Lemma 4.32. Let $(\nu_k)_{k \ge 0}$ be a sequence in $\mathscr{P}(\mathbb{R})$, and $\nu \in \mathscr{P}(\mathbb{R})$ be another probability distribution. Let $(Z_k)_{k \ge 0}$ and Z be instantiations of these distributions, all defined on the same probability space. If $Z_k \to Z$ with probability 1, then $\nu_k \to \nu$ weakly.

Lemma 4.32 is not a uniformly-applicable approach to demonstrating weak convergence; such instantiations always exist by Skorokhod's representation theorem [Billingsley, 2012, Section 25], but finding them is not always straightforward. In our case, however, there is a very natural set of instantiations that works, constructed by the following result.
Lemma 4.33. Let $\eta \in \mathscr{P}(\mathbb{R})^{\mathcal{X}}$, and let G be an instantiation of $\eta$. For $x \in \mathcal{X}$, if $(X_t, A_t, R_t)_{t \ge 0}$ is a random trajectory with initial state $X_0 = x$, generated by following $\pi$ and independent of G, then
$$\sum_{t=0}^{k-1} \gamma^t R_t + \gamma^k G(X_k)$$
is an instantiation of $((\mathcal{T}^\pi)^k \eta)(x)$.

Proof. This follows by inductively applying Proposition 4.11.
Proposition 4.34. Let $\eta_0 \in \mathscr{P}(\mathbb{R})^{\mathcal{X}}$, and for $k \ge 0$ define
$$\eta_{k+1} = \mathcal{T}^\pi \eta_k.$$
Then we have $\eta_k(x) \to \eta^\pi(x)$ weakly for each $x \in \mathcal{X}$, and consequently $\eta^\pi$ is the unique fixed point of $\mathcal{T}^\pi$ in $\mathscr{P}(\mathbb{R})^{\mathcal{X}}$.
Proof. Fix $x \in \mathcal{X}$, let $G_0$ be an instantiation of $\eta_0$, and on the same probability space, let $(X_t, A_t, R_t)_{t \ge 0}$ be a trajectory generated by $\pi$ with initial state $X_0 = x$, independent of $G_0$ (the existence of such a probability space is guaranteed by the Ionescu-Tulcea theorem, as described in Remark 2.1). Then
$$G_k(x) = \sum_{t=0}^{k-1} \gamma^t R_t + \gamma^k G_0(X_k)$$
is an instantiation of $\eta_k(x)$ by Lemma 4.33. Furthermore, $\sum_{t=0}^{\infty} \gamma^t R_t$ is an instantiation of $\eta^\pi(x)$. We have
$$\Big|G_k(x) - \sum_{t=0}^{\infty} \gamma^t R_t\Big| = \Big|\gamma^k G_0(X_k) - \sum_{t=k}^{\infty} \gamma^t R_t\Big| \le \big|\gamma^k G_0(X_k)\big| + \Big|\sum_{t=k}^{\infty} \gamma^t R_t\Big| \to 0,$$
with probability 1. The convergence of the first term follows since $|G_0(X_k)| \le \max_{x \in \mathcal{X}} |G_0(x)|$ is bounded w.p. 1 and $\gamma^k \to 0$, and the convergence of the second term follows from convergence of $\sum_{t=0}^{k-1} \gamma^t R_t$ to $\sum_{t=0}^{\infty} \gamma^t R_t$ w.p. 1. Therefore we have $G_k(x) \to \sum_{t=0}^{\infty} \gamma^t R_t$ w.p. 1, and so $\eta_k(x) \to \eta^\pi(x)$ weakly. Finally, this implies that there can be no other fixed point of $\mathcal{T}^\pi$ in $\mathscr{P}(\mathbb{R})^{\mathcal{X}}$; if $\eta'$ were such a fixed point, then taking $\eta_0 = \eta'$ we would have $\eta_k = \eta'$ for all $k \ge 0$, and would simultaneously have $\eta_k(x) \to \eta^\pi(x)$ weakly for all $x \in \mathcal{X}$, a contradiction unless $\eta' = \eta^\pi$.
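The construction in this proof can be visualised by simulation. The following sketch (a hypothetical single-state example, not from the book) takes $R_t$ i.i.d. Bernoulli(1/2) and $\gamma = 1/2$, in which case the random return $\sum_t \gamma^t R_t$ has the Uniform[0, 2] distribution (its base-2 digits are fair coin flips); the empirical CDF of the truncated returns $G_k$, with $G_0 \equiv 0$, is then close to the Uniform[0, 2] CDF, illustrating weak convergence.

```python
import random

random.seed(0)
gamma, k, n = 0.5, 30, 20000

def truncated_return():
    # G_k for the single-state chain, with G_0 identically zero
    return sum(gamma**t * random.randint(0, 1) for t in range(k))

samples = sorted(truncated_return() for _ in range(n))
# Kolmogorov-Smirnov distance between the empirical CDF and F(z) = z / 2,
# evaluated just before and just after each sample point
ks = max(max(abs((i + 1) / n - z / 2.0), abs(i / n - z / 2.0))
         for i, z in enumerate(samples))
print(ks)  # small: the empirical CDF is close to the Uniform[0, 2] CDF
assert ks < 0.05
```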
As the name indicates, the notion of weak convergence is not as strong as many other notions of convergence of probability distributions. In general, it does not even guarantee convergence of the means of the sequence of distributions; see Exercise 4.21. In addition, we lose the notion of a convergence rate that was provided by the contractive nature of the distributional operator under specific metrics. The need for stronger guarantees, such as those offered by Proposition 4.27, motivates the contraction mapping theory developed in this chapter.
4.9 Random Variable Bellman Operators*

In this chapter, we defined the distributional Bellman operator $\mathcal{T}^\pi$ as a mapping on the space of return-distribution functions $\mathscr{P}(\mathbb{R})^{\mathcal{X}}$. We also saw that the action of the operator on a return function $\eta \in \mathscr{P}(\mathbb{R})^{\mathcal{X}}$ can be understood either through direct manipulation of the probability distributions or through manipulation of a collection of random variables instantiating these distributions.

Viewing the operator through its effect on the distribution of a collection of representative random variables is a useful tool for understanding distributional reinforcement learning, and may prompt the reader to ask whether it is possible to avoid referring to probability distributions at all, working instead directly with random variables. We describe one approach to this below using the tools of probability theory, and then discuss some of its shortcomings.
Let $G_0 = (G_0(x) : x \in \mathcal{X})$ be an initial collection of real-valued random variables, indexed by state, supported on a probability space $(\Omega_0, \mathcal{F}_0, \mathbb{P}_0)$. For each $k \in \mathbb{N}^+$, let $(\Omega_k, \mathcal{F}_k, \mathbb{P}_k)$ be another probability space, supporting a collection of random variables $((A_k(x), R_k(x, a), X'_k(x, a)) : x \in \mathcal{X}, a \in \mathcal{A})$, with $A_k(x) \sim \pi(\cdot \mid x)$, and independently $R_k(x, a) \sim P_R(\cdot \mid x, a)$, $X'_k(x, a) \sim P_{\mathcal{X}}(\cdot \mid x, a)$. We then consider the product probability space on $\Omega = \prod_{k \in \mathbb{N}} \Omega_k$. All random variables defined above can naturally be viewed as functions on this joint probability space that depend on $\omega = (\omega_0, \omega_1, \omega_2, \dots) \in \Omega$ only through the coordinate $\omega_k$ that matches the index k on the random variable. Note that under this construction, all random variables with distinct indices are independent.
Now define $\mathcal{X}_{\mathbb{N}}$ as the set of real-valued random variables on $(\Omega, \mathcal{F}, \mathbb{P})$ (where $\mathcal{F}$ is the product $\sigma$-algebra) that depend on only finitely-many coordinates of $\omega \in \Omega$. We can define a Bellman operator $\mathcal{T}^\pi : \mathcal{X}_{\mathbb{N}}^{\mathcal{X}} \to \mathcal{X}_{\mathbb{N}}^{\mathcal{X}}$ as follows. Given $G = (G(x) : x \in \mathcal{X}) \in \mathcal{X}_{\mathbb{N}}^{\mathcal{X}}$, let $K \in \mathbb{N}$ be the smallest integer such that the random variables $(G(x) : x \in \mathcal{X})$ depend on $\omega = (\omega_0, \omega_1, \omega_2, \dots) \in \Omega$ only through $\omega_0, \dots, \omega_{K-1}$; such an integer exists due to the definition of $\mathcal{X}_{\mathbb{N}}$ and the finiteness of $\mathcal{X}$. We then define $\mathcal{T}^\pi G \in \mathcal{X}_{\mathbb{N}}^{\mathcal{X}}$ by
$$(\mathcal{T}^\pi G)(x) = R_K(x, A_K(x)) + \gamma\, G\big(X'_K(x, A_K(x))\big).$$
With this definition, we can obtain a sequence of collections of random variables $(G_k)_{k \ge 0}$, defined iteratively by $G_{k+1} = \mathcal{T}^\pi G_k$, for $k \ge 0$.³² We have therefore formalised an operator entirely within the realm of random variables, without reference to the distributions of the iterates $(G_k)_{k \ge 0}$. By construction, the distributions of the random variables $(G_k)_{k \ge 0}$ match the sequence of distributions that would be obtained by working directly on the space of probability distributions with the usual distributional Bellman operator. More concretely, if $\eta_0 \in \mathscr{P}(\mathbb{R})^{\mathcal{X}}$ is such that $\eta_0(x)$ is the distribution of $G_0(x)$ for each $x \in \mathcal{X}$, then $\eta_k$, defined by $\eta_k = (\mathcal{T}^\pi)^k \eta_0$, is such that $\eta_k(x)$ is the distribution of $G_k(x)$, for each $x \in \mathcal{X}$. Thus, the random variable Bellman operator constructed above is consistent with the distributional Bellman operator that is the main focus of this chapter.
One difficulty with this random variable operator is that it does not have a fixed point: while the distribution of the random variables $G_k(\cdot)$ converges to that of $G^\pi(\cdot)$, the random variables themselves, as functions on the probability space, do not converge.³³ Thus, while this is one way to view distributional reinforcement learning purely in terms of random variables, it is much less natural to analyse algorithms from this perspective than through probability distributions as described in this chapter.
4.10 Technical Remarks

Remark 4.1. Our exposition in this chapter has focused on the standard distributional Bellman equation
$$G^\pi(x) \overset{D}{=} R + \gamma G^\pi(X'), \quad X = x.$$
A similar development is possible with the alternative notions of random return mentioned in Section 2.9, including the 'probability of continuing' interpretation of the discount factor $\gamma$. In general, the distributional operators that arise from these alternative notions of the random return are still contractions, although the metrics and contraction moduli involved in these statements differ. For example, while the standard distributional Bellman operator is not a contraction in total variation distance, the distributional Bellman operator associated with the 'probability of continuing' random return is; Exercise 4.17 explores this point in greater detail.
32. This is real equality between random variables, rather than equality in distribution.
33. This is a fairly subtle point; the reader is invited to consider what happens to $G_k(x)$ and $G^\pi(x)$ as mappings from $\Omega$ to $\mathbb{R}$.
Remark 4.2. The ideas developed in this chapter, and indeed the other chapters of the book, can also be applied to learning other properties related to the return. Achab explores several methods for learning the distribution of the random variables $R + \gamma V^\pi(X')$, under the different possible initial conditions $X = x$; these objects interpolate between the expected return and the full distribution of the return. Exercise 4.22 explores the development of contractive operators for the distributions of these objects.
Remark 4.3. As well as allowing us to quantify the contractivity of the distributional Bellman operator, the probability metrics described in this chapter can be used to measure the accuracy of algorithms that aim to approximate the return distribution. In particular, they can be used to give quantitative guarantees on the accuracy of the non-parametric distributional Monte Carlo algorithm described in Remark 3.1. These guarantees can then be used to determine how many sample trajectories are required to approximate the true return distribution at a required level of accuracy, for example when evaluating the performance of the algorithms in Chapters 5 and 6. We assume that K returns $(G_k)_{k=1}^{K}$ beginning at state $x_0$ have been generated independently via the policy $\pi$, and consider the accuracy of the non-parametric distributional Monte Carlo estimator
$$\hat{\eta}^\pi(x_0) = \frac{1}{K} \sum_{k=1}^{K} \delta_{G_k}.$$
Different realisations of the sampled returns will lead to different estimates; the following result provides a guarantee in the Kolmogorov-Smirnov distance $\ell_\infty$ that holds with high probability over the possible realisations of the returns. For any $\varepsilon > 0$, we have
$$\ell_\infty\big(\hat{\eta}^\pi(x_0), \eta^\pi(x_0)\big) \le \varepsilon, \quad \text{with probability at least } 1 - 2\exp(-2K\varepsilon^2).$$
Thus, for a desired level of accuracy $\varepsilon$ and a desired confidence $1 - \delta$, K can be selected so that $2\exp(-2K\varepsilon^2) < \delta$, which then yields the guarantee that with probability at least $1 - \delta$, we have $\ell_\infty(\hat{\eta}^\pi(x_0), \eta^\pi(x_0)) \le \varepsilon$. This is in fact a key result in empirical process theory, known as the Dvoretzky–Kiefer–Wolfowitz (DKW) inequality [Dvoretzky et al., 1956, Massart, 1990]. Its uses specifically in reinforcement learning include the works of Keramati et al. and Chandak et al. Similar concentration bounds are possible under other probability metrics (such as the Wasserstein distances; see e.g. Weed and Bach, Bobkov and Ledoux), though typically some form of a priori information about the return distribution, such as bounds on minimum/maximum returns, is required to establish such bounds.
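For instance, the required number of samples for a given $\varepsilon$ and $\delta$ follows directly from the bound. A short sketch (`dkw_sample_size` is a hypothetical helper name, not from the book):

```python
import math

# We want ell_inf(hat eta(x0), eta(x0)) <= eps with probability >= 1 - delta,
# for which it suffices that 2 * exp(-2 * K * eps**2) < delta.
def dkw_sample_size(eps, delta):
    return math.ceil(math.log(2.0 / delta) / (2.0 * eps**2))

K = dkw_sample_size(0.05, 0.01)
print(K)  # number of independent return samples needed
assert 2.0 * math.exp(-2.0 * K * 0.05**2) < 0.01
```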
4.11 Bibliographical Remarks

4.1–4.2. The use of operator theory to understand reinforcement learning algorithms is standard to most textbooks in the field [Bertsekas and Tsitsiklis, 1996, Szepesvári, 2010, Puterman, 2014, Sutton and Barto, 2018], and is commonly used in theoretical reinforcement learning [Munos, 2003, Bertsekas, 2011, Scherrer, 2014]. Puterman provides a thorough introduction to vector notation for Markov decision processes. Improved convergence results can be obtained by studying the eigenspectrum of the transition kernel, as shown by e.g. Morton, Bertsekas [1994, 2012]. The elementary contraction mapping theory described in Section 4.2 goes back to Banach. Our reference on metric spaces is the definitive textbook by Rudin.

Not all reinforcement learning algorithms are readily analysed using the theory of contraction mappings. This is the case for policy-gradient algorithms [Sutton et al., 2000, but see Ghosh et al., 2020 and Bhandari and Russo, 2021], but also value-based algorithms such as advantage learning [Baird, 1999, Bellemare et al., 2016].
4.3. The random-variable Bellman operator presented here is from our earlier work [Bellemare et al., 2017a], which provided an analysis in the p-Wasserstein distances. There, the technical issues discussed in Section 4.9 were obviated by declaring R, X' and the collection $(G(x) : x \in \mathcal{X})$ to be independent. Those issues were raised in a later paper [Rowland et al., 2018], which also introduced the distributional Bellman equation in terms of probability measures and the pushforward notation. The probability density equation (Equation 4.9) can be found in Morimura et al. [2010b]. Earlier instances of distributional operators are given by Chung and Sobel, Morimura et al. [2010a], who provide an operator on cumulative distribution functions, and Jaquette, who provides an operator on Laplace transforms.
4.4. The Wasserstein distance can be traced back to Leonid Kantorovich [Kantorovich, 1942] and has been rediscovered (and renamed) multiple times in its history. The name we use here is common, but a misnomer, as Leonid Vaserstein (after whom the distance is named) did not himself do any substantial work on the topic. Among other names, we note the Mallows metric [Bickel and Freedman, 1981] and the Earth-Mover distance [Rubner et al., 1998]. Much earlier, Monge was the first to study the problem of optimal transportation from a transport theory perspective. See Vershik and Panaretos and Zemel for further historical comments, and Villani for a survey of theoretical properties. A version of the contraction analysis in p-Wasserstein distances was given by Bellemare et al. [2017a]; we owe the proof of Proposition 4.15 in terms of optimal couplings to Philip Amortila.

The use of contraction mapping theory to analyse stochastic fixed-point equations was introduced by Rösler, who analyses the Quicksort algorithm by characterising the distributional fixed points of contraction mappings in 2-Wasserstein distance. Applications and generalisations of this technique include the analysis of further recursive algorithms, models in stochastic geometry, and branching processes [Rösler, 1992, Rachev and Rüschendorf, 1995, Neininger, 1999, Rösler and Rüschendorf, 2001, Rösler, 2001, Neininger, 2001, Neininger and Rüschendorf, 2004, Rüschendorf, 2006, Rüschendorf and Neininger, 2006, Alsmeyer, 2012]. Although the random-variable Bellman equation (Equation 2.15) can be viewed as a system of recursive distributional equations, the emphasis on a collection of effectively independent random variables $(G^\pi)$ differs from the usual treatment of such equations.
4.5. The family of $\ell_p$ distances described in this chapter is covered at length in the work of Rachev et al., which studies an impressive variety of probability metrics. A version of the contraction analysis in Cramér distance was originally given by Rowland et al. In two and more dimensions, the Cramér distance is generalised by the energy distance [Székely, 2002, Székely and Rizzo, 2013, Rizzo and Székely, 2016], itself a member of the maximum mean discrepancy (MMD) family [Gretton et al., 2012]; contraction analysis in terms of MMD metrics was undertaken by Nguyen et al. (see Exercise 4.19 for further details). Another special case of the $\ell_p$ metrics considered in this chapter is the Kolmogorov-Smirnov distance ($\ell_\infty$), which features in results in empirical process theory, such as the Glivenko-Cantelli theorem. Many of these metrics are integral probability metrics [Müller, 1997], which allows for a dual formulation with appealing algorithmic consequences. Chung and Sobel provide a non-expansion result in total variation distance (without naming it as such; the proof uses an integral probability metric formulation).
4.6. The properties of regularity, convexity, and c-homogeneity were introduced by Zolotarev in a slightly more general setting. Our earlier work presented these in a modern context [Bellemare et al., 2017b], albeit with only a mention of their potential use in reinforcement learning. Although that work proposed the term "sum-invariant" as mnemonically simpler, this is only technically correct when Equation 4.15 holds with equality; we have thus chosen to keep the original name. Theorem 4.25 is new to this book.
The characterisation of the Wasserstein distance as an optimal transport problem
in Proposition 4.18 is the standard presentation of the Wasserstein distance in
more abstract settings, which allows it to be applied to probability distributions
over reasonably general metric spaces [Villani, 2003, 2008, Ambrosio et al.,
2005, Santambrogio, 2015]. Optimal transport has also increasingly found
application within machine learning in recent years, particularly in generative
modelling [Arjovsky et al., 2017]. Optimal transport and couplings also arise in
the study of bisimulation metrics for Markov decision processes [Ferns et al.,
2004, Ferns and Precup, 2014, Amortila et al., 2019] as well as analytical tools
for sample-based algorithms [Amortila et al., 2020]. Peyré et al. provide
an overview of algorithms, analysis, and applications associated with optimal
transport in machine learning and related disciplines.
4.7–4.8.
Villani gives further discussion on the domain of Wasserstein distances, and on their relationship to weak convergence.
4.9.
The usefulness of a random-variable operator has been a source of intense debate between the authors of this book. The form we present here is inspired by the “stack of rewards” model from Lattimore and Szepesvári.
4.12 Exercises
Exercise 4.1.
Show that the no-loop operator defined in Example 4.6 is a contraction mapping with modulus

    max_{x∈X} P_π(X′ ≠ x | X = x). △
Exercise 4.2. For p ∈ [1, ∞), let ‖·‖_p be the L^p norm over R^X, defined as

    ‖V‖_p = ( Σ_{x∈X} |V(x)|^p )^{1/p}.

Show that T^π is not a contraction mapping in the metric induced by the L^p norm unless p = ∞ (see Equation 4.4 for a definition of the L^∞ metric). Hint. A two-state example suffices. △
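A numerical sketch of the kind of two-state counterexample the hint suggests (our own construction; the chain, rewards, and discount below are illustrative assumptions, not taken from the text):

```python
import math

gamma = 0.9

# Two-state Markov reward process: both states transition deterministically to
# state 0, and all rewards are zero, so (T V)(x) = gamma * V[0] for every x.
def bellman(V):
    return [gamma * V[0], gamma * V[0]]

def l2(U, W):
    return math.sqrt(sum((u - w) ** 2 for u, w in zip(U, W)))

def linf(U, W):
    return max(abs(u - w) for u, w in zip(U, W))

V1, V2 = [1.0, 0.0], [0.0, 0.0]
before, after = l2(V1, V2), l2(bellman(V1), bellman(V2))
# L2 distance: before = 1, after = gamma * sqrt(2), an expansion for gamma > 1/sqrt(2).
# The L-infinity distance, by contrast, contracts with modulus gamma.
```

Since γ√2 > 1 whenever γ > 1/√2, no modulus below one can hold in the L² norm; duplicating more states that all map to state 0 adapts the same construction to any finite p.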
Exercise 4.3.
In this exercise, you will use the ideas of Section 4.1 to study several operators associated with expected-value reinforcement learning.

(i) For n ∈ N⁺, consider the n-step evaluation operator T_n : R^X → R^X defined by

    (T_n V)(x) = E_π[ Σ_{t=0}^{n−1} γ^t R_t + γ^n V(X_n) | X_0 = x ].

Show that T_n has V^π as a fixed point, and is a contraction mapping with respect to the L^∞ metric with contraction modulus γ^n. Hence, deduce that repeated application of T_n to any initial value function estimate converges to V^π. Show that in fact, T_n = (T^π)^n.

(ii) Consider the λ-return operator T^λ : R^X → R^X defined by

    (T^λ V)(x) = (1 − λ) Σ_{n=1}^∞ λ^{n−1} E_π[ Σ_{t=0}^{n−1} γ^t R_t + γ^n V(X_n) | X_0 = x ].

Show that T^λ has V^π as a fixed point, and is a contraction mapping with respect to the L^∞ metric with contraction modulus

    γ(1 − λ) / (1 − λγ).

Hence, deduce that repeated application of T^λ to any initial value function estimate converges to V^π. △
Exercise 4.4.
Consider the random-variable operator (Equation 4.7). Given a return-variable function G_0, write G_0(x, ω) = G_0(x)(ω) for the realisation of the random return corresponding to ω ∈ Ω. For each x ∈ X, let (A(x), R(x), X′(x)) be an independent random transition defined on the same probability space, and write (A(x, ω), R(x, ω), X′(x, ω)) to denote the dependence of these random variables on ω ∈ Ω. Suppose that for each k ≥ 0, we define the return-variable function G_{k+1} as

    G_{k+1}(x, ω) = R(x, ω) + γ G_k(X′(x, ω), ω).

For a given x, characterise the function

    G_∞(x, ω) = lim_{k→∞} G_k(x, ω). △
Exercise 4.5.
Suppose that you are given a description of a Markov decision process along with a policy π, with the property that the policy π reaches a terminal state (i.e., one for which the return is zero) in at most T ∈ N steps. Describe a recursive procedure that takes in a scalar ω ∈ [0, 1] and outputs a return z ∈ R such that P_π(G^π(X_0) ≤ z) = ω. Hint. You may want to use the fact that sampling a random variable Z can be emulated by drawing u uniformly from [0, 1], and returning F_Z^{−1}(u). △
Exercise 4.6. Let p ∈ [1, ∞].

(i) Show that for any ν, ν′ ∈ P(R) with finite p-th moments, we have w_p(ν, ν′) < ∞. Hint. Use the triangle inequality with intermediate distribution δ_0. Hence, prove that the p-Wasserstein metric is indeed a metric on P_p(R), for p ∈ [1, ∞].

(ii) Show that on the space P(R), the p-Wasserstein metric satisfies all requirements of a metric except finiteness. For each p ∈ [1, ∞], exhibit a pair of distributions ν, ν′ ∈ P(R) such that w_p(ν, ν′) = ∞.

(iii) Show that if 1 ≤ q < p < ∞, we have E[|Z|^p] < ∞ ⟹ E[|Z|^q] < ∞ for any random variable Z. Deduce that P_p(R) ⊆ P_q(R). Hint. Consider applying Jensen’s inequality with the function z ↦ |z|^{p/q}. △
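The hint in part (iii) can be unpacked as follows (our own working, offered as a sketch rather than the book's solution): apply Jensen's inequality with the convex function φ(w) = w^{p/q} (convex because p/q > 1) to the random variable W = |Z|^q:

```latex
\mathbb{E}\big[|Z|^q\big]^{p/q}
  = \varphi\big(\mathbb{E}[W]\big)
  \le \mathbb{E}\big[\varphi(W)\big]
  = \mathbb{E}\big[|Z|^p\big] < \infty ,
```

so E[|Z|^q] ≤ E[|Z|^p]^{q/p} < ∞, and any distribution with a finite p-th moment also has a finite q-th moment.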
Exercise 4.7.
Let d : P(R) × P(R) → [0, ∞] be a probability metric with finite domain P_d(R).

(i) Prove that the supremum distance d̄ is an extended metric on P(R)^X.
(ii) Prove that it is a metric on P_d(R)^X. △
Exercise 4.8. Prove Proposition 4.15 for p = ∞. △
Exercise 4.9.
Consider two normally-distributed random variables with means μ and μ′ and common variance σ². Derive an expression for the ∞-Wasserstein distance between these two distributions. Conclude that Assumption 4.29(∞) is sufficient, but not necessary, for two distributions to have finite ∞-Wasserstein distance. How does the situation change if the two normal distributions in question have unequal variances? △
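The quantile coupling behind this exercise can be checked numerically. With equal variances the two Gaussian quantile curves are parallel: their gap is exactly μ′ − μ at every level τ, so every w_p distance, including w_∞, equals |μ′ − μ|. A standard-library sketch of ours (the particular means and variances are illustrative):

```python
from statistics import NormalDist

mu, mu_prime, sigma = 0.0, 1.5, 2.0
f = NormalDist(mu, sigma)
g = NormalDist(mu_prime, sigma)

taus = [0.001, 0.25, 0.5, 0.75, 0.999]
gaps = [g.inv_cdf(t) - f.inv_cdf(t) for t in taus]
# every gap equals mu_prime - mu = 1.5, so w_inf = 1.5 despite unbounded support

# With unequal variances the gap is (mu' - mu) + (sigma' - sigma) * z_tau,
# which diverges as tau -> 0 or 1, so w_inf is infinite.
h = NormalDist(mu_prime, 3.0)
tail_gaps = [h.inv_cdf(t) - f.inv_cdf(t) for t in (0.9, 0.99, 0.999)]
# tail_gaps grows without bound as tau -> 1
```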
Exercise 4.10.
Explain, in words, why Assumption 4.29(p) is needed for Proposition 4.30. By considering the definition of the p-Wasserstein distance, explain why for p > 1 and 1 ≤ q < p, Assumption 4.29(q) is not sufficient to guarantee convergence in the p-Wasserstein distance. △
Exercise 4.11.
This exercise guides you through the proof of Proposition 4.30.

(i) First, show that under Assumption 4.29(p), the return distributions η^π(x) have finite p-th moments, for all x ∈ X. You may find it useful to deal with p = ∞ separately, and in the case p ∈ [1, ∞), you may find it useful to rewrite

    E[ |Σ_{t=0}^∞ γ^t R_t|^p | X = x ] = (1 − γ)^{−p} E[ |Σ_{t=0}^∞ (1 − γ) γ^t R_t|^p | X = x ],

and use Jensen’s inequality on the function z ↦ |z|^p.

(ii) Let η ∈ P_p(R)^X, and let G be an instantiation of η. First letting p ∈ [1, ∞), use the inequality |z_1 + z_2|^p ≤ 2^{p−1}(|z_1|^p + |z_2|^p) to argue that if (X = x, A, R, X′) is a sample transition independent of G, then

    E[ |R + γG(X′)|^p | X = x ] < ∞.

Hence, argue that under Assumption 4.29(p), P_p(R)^X is closed under T^π. Argue separately that this holds for p = ∞ too.

Hence argue that Proposition 4.27 applies, and hence conclude that Proposition 4.30 holds. △
Exercise 4.12.
In the proof of Theorem 4.25, we did not need to assume that for the two distinct states x, y ∈ X, their associated returns G(x) and G(y) are independent. Explain why. △
Exercise 4.13. The (1,1)-Pareto distribution ν_PAR has cumulative distribution function

    F_{ν_PAR}(z) = 0 if z < 1,   1 − 1/z if z ≥ 1.

Justify the necessity of including Assumption 2.5 (finite-mean rewards) in Definition 4.26 by demonstrating that ℓ_2(ν_PAR, δ_0) < ∞ yet E_{Z∼ν_PAR}[Z] = ∞. △
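Both quantities in this exercise can be evaluated in closed form, and checked numerically with a crude midpoint rule (a sketch of ours, not code from the text):

```python
import math

def F_pareto(z):
    """CDF of the (1,1)-Pareto distribution."""
    return 0.0 if z < 1.0 else 1.0 - 1.0 / z

def midpoint(f, a, b, n=200_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

# Squared Cramér (ell_2) distance to the Dirac delta at 0:
# the CDFs differ by 1 on [0, 1) and by 1/z on [1, inf),
# so ell_2 squared = 1 + integral_1^inf z^{-2} dz = 2 (finite).
tail = midpoint(lambda z: (1.0 - F_pareto(z)) ** 2, 1.0, 10_000.0)
l2_squared = 1.0 + tail  # approximately 2

# The mean, by contrast, diverges: integral_1^B z dF(z) = log B -> infinity.
partial_means = [midpoint(lambda z: 1.0 / z, 1.0, B) for B in (10.0, 1e3, 1e6)]
# partial_means tracks log B, growing without bound
```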
Exercise 4.14.
The purpose of this exercise is to contrast the p-Wasserstein and ℓ_p distances. For each of the following, find a pair of probability distributions ν, ν′ ∈ P(R) such that, for a given ε > 0,

(i) w_2(ν, ν′) is smaller than ℓ_2(ν, ν′);
(ii) w_2(ν, ν′) is larger than ℓ_2(ν, ν′);
(iii) w_∞(ν, ν′) = ε and ℓ_∞(ν, ν′) = 1;
(iv) w_∞(ν, ν′) = 1 and ℓ_∞(ν, ν′) = ε. △
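Part (iii) can be seen with two Dirac deltas (a construction of ours, offered as a sketch): shifting a unit mass by ε moves it a w_∞-distance of only ε, while the two CDFs still disagree by 1 everywhere between the atoms.

```python
eps = 0.1

def cdf_dirac(atom):
    """CDF of the Dirac delta at `atom`."""
    return lambda z: 1.0 if z >= atom else 0.0

F, G = cdf_dirac(0.0), cdf_dirac(eps)

# w_inf: both quantile functions are constant (0 and eps respectively),
# so the optimal coupling moves the unit mass a distance of exactly eps.
w_inf = abs(eps - 0.0)

# ell_inf (Kolmogorov-Smirnov): the CDFs differ by 1 anywhere in [0, eps).
ks = max(abs(F(z) - G(z)) for z in [-1.0, 0.0, eps / 2, eps, 1.0])
# ks == 1.0

# For part (iv), reverse the roles: nu = delta_0 and
# nu' = (1 - eps) delta_0 + eps delta_1 give ell_inf = eps but w_inf = 1.
```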
Exercise 4.15.
Show that the dependence on γ^{1/p} is tight in Proposition 4.20. △
Exercise 4.16.
The total variation distance d_TV : P(R) × P(R) → R is defined by

    d_TV(ν, ν′) = sup_{U⊆R} |ν(U) − ν′(U)|,    (4.19)

for all ν, ν′ ∈ P(R).³⁴ Show, by means of a counterexample, that the distributional Bellman operator is not a contraction mapping in the supremum extension of this distance. △
Exercise 4.17.
Consider the alternative notion of the return introduced in Section 2.9, in which γ is treated as the probability of continuing. Write down the distributional Bellman operator that corresponds to this notion of the return. For which metrics considered in this chapter is this distributional Bellman operator a contraction mapping? Show in particular that this distributional Bellman operator is a contraction with respect to the supremum version of the total variation distance over return-distribution functions, introduced in Exercise 4.16. What is its contraction modulus? △
Exercise 4.18.
Remark 2.3 describes some differences between Markov decision processes with finite state spaces (as we consider throughout the book) and generalisations with infinite state spaces. The contraction mapping theory in this chapter is one case where stronger assumptions are required when moving to larger state spaces. Using the example described in Remark 2.3, show that Assumption 4.29(w_1) is insufficient to make P_1(R) closed under the distributional Bellman operator T^π when the state space is countably infinite. How could this assumption be strengthened to guarantee closedness? △
Exercise 4.19 (*).
The goal of this exercise is to explore the contractivity of the distributional Bellman operator with respect to a class of metrics known as maximum mean discrepancies; this analysis was undertaken by Nguyen et al. A kernel on R is a function K : R × R → R with the property that for any finite set {z_1, …, z_n} ⊆ R, the n × n matrix with (i, j)-th entry K(z_i, z_j) is positive semi-definite. A function K : R × R → R satisfying the weaker condition that

    Σ_{i,j=1}^n c_i c_j K(z_i, z_j) ≥ 0

whenever Σ_{i=1}^n c_i = 0 is called a conditionally positive-definite kernel. Conditionally positive-definite kernels form a measure of similarity between pairs of points in R, and can also be used to define notions of distance over probability distributions. The maximum mean discrepancy (MMD) probability metric associated with the conditionally positive-definite kernel K is defined by

    MMD_K(ν, ν′) = ( E_{X,X′∼ν}[K(X, X′)] + E_{Y,Y′∼ν′}[K(Y, Y′)] − 2 E_{X∼ν,Y∼ν′}[K(X, Y)] )^{1/2},

where each pair of random variables in the expectations above are taken to be independent.

(i) Consider the function K_α(z_1, z_2) = −|z_1 − z_2|^α, with α ∈ (0, 2). Székely and Rizzo [2013, Proposition 2] show that this defines a conditionally positive-definite kernel. Show that MMD_{K_α} is regular, c-homogeneous (for some c > 0), and p-convex (for some p ∈ [1, ∞)). Hence use Theorem 4.25 to establish that the distributional Bellman operator is a contraction with respect to MMD_{K_α}, under suitable assumptions.

(ii) The Gaussian kernel, or squared exponential kernel, with variance σ² > 0 and length scale λ > 0 is defined by K(z_1, z_2) = σ² exp(−(z_1 − z_2)²/(2λ²)). Show, through the use of a counterexample, that the MMD corresponding to the Gaussian kernel is not c-homogeneous for any c > 0, and so Theorem 4.25 cannot be applied. Further, find an MDP and policy π that serve as a counterexample to the contractivity of the distributional Bellman operator with respect to this MMD metric. △

34. For the reader with a measure-theoretic background, the supremum here is over measurable subsets of R.
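As an empirical companion to part (i), the MMD under the kernel K_α can be computed exactly for finite discrete distributions (our own sketch, not code from the text). Doubling the scale of the atoms scales the distance by 2^{α/2}, consistent with c-homogeneity for c = α/2:

```python
import itertools

def mmd_energy(atoms1, probs1, atoms2, probs2, alpha=1.0):
    """MMD under K_alpha(z1, z2) = -|z1 - z2|**alpha, for finite discrete distributions."""
    def cross(a, pa, b, pb):
        # E|X - Y|^alpha for X ~ (a, pa) and Y ~ (b, pb), independent
        return sum(p * q * abs(x - y) ** alpha
                   for (x, p), (y, q) in itertools.product(zip(a, pa), zip(b, pb)))
    sq = (2 * cross(atoms1, probs1, atoms2, probs2)
          - cross(atoms1, probs1, atoms1, probs1)
          - cross(atoms2, probs2, atoms2, probs2))
    return max(sq, 0.0) ** 0.5  # clamp tiny negative rounding error

alpha = 1.0
d1 = mmd_energy([0.0], [1.0], [1.0], [1.0], alpha)  # delta_0 vs delta_1: sqrt(2)
d2 = mmd_energy([0.0], [1.0], [2.0], [1.0], alpha)  # delta_0 vs delta_2: sqrt(4) = 2
# d2 / d1 == 2 ** (alpha / 2): scaling inputs by 2 scales the metric by 2^(alpha/2)
```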
Exercise 4.20.
Let p ∈ [1, ∞], and consider a modification of the p-Wasserstein distance, w̃_p, such that w̃_p(ν, ν′) = w_p(ν, ν′) if both ν, ν′ ∈ P(R) are expressible as finite mixtures of Dirac deltas, and w̃_p(ν, ν′) = ∞ otherwise.

(i) Show that P_{w̃_p}(R) is the set of distributions expressible as finite mixtures of Dirac deltas.

(ii) Exhibit a Markov decision process and policy π for which all conditions of Proposition 4.27 hold except for the condition η^π ∈ P_{w̃_p}(R)^X.

(iii) Show that the sequence of iterates (η_k)_{k≥0} does not converge to η^π under w̃_p in this case. △
Exercise 4.21. Consider the sequence of distributions (ν_k)_{k≥1} defined by

    ν_k = ((k − 1)/k) δ_0 + (1/k) δ_k.

Show that this sequence converges weakly to another distribution ν. From this, deduce that weak convergence does not imply convergence of expectations. △
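A direct calculation (ours) makes the failure concrete: the mass at the far atom vanishes, so ν_k converges weakly to δ_0, yet that atom sits at distance k, which keeps the mean pinned at 1 rather than converging to 0.

```python
ks = [10, 100, 10_000]

# Mean of nu_k: ((k-1)/k) * 0 + (1/k) * k = 1 for every k.
means = [((k - 1) / k) * 0.0 + (1.0 / k) * k for k in ks]

# CDF of nu_k at any fixed z in (0, k) is (k-1)/k -> 1,
# matching the CDF of delta_0, the weak limit.
cdf_at_half = [(k - 1) / k for k in ks]
```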
Exercise 4.22. Achab considers the random variables

    G̃^π(x) = R + γV^π(X′),   X = x.

What does the distribution of this random variable capture about the underlying MDP and policy π? Using the tools of this chapter, derive an operator over P(R)^X that has the collection of distributions corresponding to this random variable under each of the initial conditions in X as a fixed point, and analyse the properties of this operator. △