Notation
Notation : Description
$\mathbb{R}$ : The set of real numbers
$\mathbb{N}$, $\mathbb{N}^+$ : The set of natural numbers including (excluding) zero: $\{0, 1, 2, \dots\}$
$\mathbb{P}$ : The probability of one or many random variables producing the given outcomes
$\mathscr{P}(Y)$ : The space of probability distributions over a set $Y$
$F_\nu$, $F_\nu^{-1}$ : The cumulative distribution function (CDF) and inverse CDF, respectively, for the distribution $\nu$
$\delta_\theta$ : Dirac delta distribution at $\theta \in \mathbb{R}$, a probability distribution which assigns probability 1 to the outcome $\theta$
$N(\mu, \sigma^2)$ : Normal distribution with mean $\mu$ and variance $\sigma^2$
$U([a, b])$ : Uniform distribution over $[a, b]$, with $a, b \in \mathbb{R}$
$U(\{a, b, \dots\})$ : Uniform distribution over the set $\{a, b, \dots\}$
$Z \sim \nu$ : The random variable $Z$, with probability distribution $\nu$
$z$, $Z$ : Capital letters generally denote random variables and lower-case letters their realisations or expectations. Notable exceptions are $V$, $Q$, and $P$
$x \in \mathcal{X}$ : A state $x$ in the state space $\mathcal{X}$
$a \in \mathcal{A}$ : An action $a$ in the action space $\mathcal{A}$
$r \in \mathcal{R}$ : A reward $r$ from the set $\mathcal{R}$
$\gamma$ : Discount factor
$R_t, X_{t+1} \sim P(\cdot, \cdot \mid X_t, A_t)$ : Joint probability distribution of the reward and next state in terms of the current state and action
$P_{\mathcal{X}}$ : Transition kernel
$P_{\mathcal{R}}$ : Reward distribution function
$\xi_0$ : Initial state distribution
$x_\perp$ : A terminal state
$N_{\mathcal{X}}$, $N_{\mathcal{A}}$, $N_{\mathcal{R}}$ : Size of the state, action, and reward spaces (when finite)
$\pi$ : A policy; usually stationary and Markov, mapping states to distributions over actions
$\pi^*$ : An optimal policy
$k$ : Iteration number or index of a sample trajectory
$t$ : Time step or time index
$(X_t, A_t, R_t)_{t \ge 0}$ : A trajectory of random variables for state, action, and reward produced through interaction with a Markov decision process
$X_{0:t-1}$ : The sequence of random variables $X_0, \dots, X_{t-1}$
$T$ : Length of an episode
$\mathbb{P}_\pi$ : The distribution over trajectories induced by a Markov decision process and a policy $\pi$
$\mathbb{E}_\pi$ : The expectation operator for the distribution over trajectories induced by $\mathbb{P}_\pi$
$G$ : A random-variable function or random return
$\mathrm{Var}$, $\mathrm{Var}_\pi$ : Variance of a distribution generally, and variance under the distribution $\mathbb{P}_\pi$
$V^\pi(x)$ : The value function for policy $\pi$ at state $x \in \mathcal{X}$
$Q^\pi(x, a)$ : The state-action value function for policy $\pi$ at state $x \in \mathcal{X}$ and taking action $a \in \mathcal{A}$
$Z \stackrel{D}{=} Z'$ : Equality in distribution of the two random variables $Z$, $Z'$
$D(Z \mid Y)$ : The conditional probability distribution of a random variable $Z$ given $Y$
$G^\pi$ : The random-variable function for policy $\pi$
$\eta$ : A return-distribution function
$\eta^\pi(x)$ : The return-distribution function for policy $\pi$ at state $x \in \mathcal{X}$
$f_\# \nu$ : Push-forward distribution, obtained by passing the distribution $\nu$ through the function $f$
$b_{r,\gamma}$ : Bootstrap function with reward $r$ and discount $\gamma$ (a standard form is given after this table)
$R_{\mathrm{MIN}}$, $R_{\mathrm{MAX}}$, $V_{\mathrm{MIN}}$, $V_{\mathrm{MAX}}$ : Minimum and maximum possible reward and return within an MDP
$N_k(x)$ : Number of visits to state $x \in \mathcal{X}$ up to but excluding iteration $k$
$m$ : Number of particles or parameters of the distribution representation
$\{\theta_1, \dots, \theta_m\}$ : Support of a categorical distribution representation, with $\theta_i < \theta_j$ for $i < j$
$\varsigma_m$ : The gap between consecutive locations of the support of a categorical representation with $m$ locations
$\hat{V}(x)$, $\hat{\eta}(x)$ : An estimate of the value function or return-distribution function at state $x$ under policy $\pi$
$\alpha$, $\alpha_k$ : The step size in an update expression, and the step size used for iteration $k$
$A \leftarrow B$ : Denotes updating the variable $A$ with the contents of variable $B$
$\Pi_{\mathrm{C}}$ : The categorical projection (Sections 3.5 and 5.6)
$\Pi_{\mathrm{Q}}$ : The quantile projection (Section 5.6)
$T^\pi$, $T$ : The policy-evaluation Bellman operator and the Bellman optimality operator, respectively
$\mathcal{T}^\pi$, $\mathcal{T}$ : The policy-evaluation distributional Bellman operator and the distributional optimality operator, respectively
$\mathcal{O}U$ : An operator $\mathcal{O}$ applied to a point $U \in M$, where $(M, d)$ is a metric space
$\|\cdot\|_\infty$ : Supremum norm on a vector space
$w_p$ : $p$-Wasserstein distance (a standard form is given after this table)
$\ell_p$ : $\ell_p$ distance between probability distributions
$\ell_2$ : Cramér distance
$\overline{d}$ : The supremum extension of a probability metric $d$ to return-distribution functions, where the supremum is taken over states
$\Gamma(\nu, \nu')$ : The set of couplings (joint probability distributions) of $\nu, \nu' \in \mathscr{P}(\mathbb{R})$
$\mathscr{P}_p(\mathbb{R})$ : The set of distributions with finite $p$th moments
$\mathscr{P}_d(\mathbb{R})$ : The set of distributions with finite $d$-distance to the distribution $\delta_0$ and finite first moment; also referred to as the finite domain of $d$
CVaR : Conditional value at risk
$\mathscr{F}$ : A probability distribution representation
$\mathscr{F}_E$ : Empirical probability distribution representation
$\mathscr{F}_N$ : Normal probability distribution representation, parameterised by mean and variance
$\mathscr{F}_{C,m}$ : $m$-categorical probability distribution representation
$\mathscr{F}_{Q,m}$ : $m$-quantile probability distribution representation
$\Pi_{\mathscr{F}}$ : A projection onto the probability distribution representation $\mathscr{F}$
$\lfloor z \rfloor$, $\lceil z \rceil$ : Floor and ceiling operations, mapping $z \in \mathbb{R}$ to the nearest integer that is less than or equal to $z$ (floor), or greater than or equal to $z$ (ceiling)
$d$ : A probability metric, typically used for the purposes of contraction analysis
$L_\tau(\theta)$ : Quantile regression loss function for target threshold $\tau \in (0, 1)$ and location estimate $\theta \in \mathbb{R}$ (a standard form is given after this table)
$\mathbb{1}\{u\}$ : An indicator function that takes the value 1 when $u$ is true and 0 otherwise; also $\mathbb{1}\{\cdot\}$
$J(\pi)$ : Objective function for a control problem
$\mathcal{G}$ : Greedy policy operator, which produces a policy that is greedy with respect to a given action-value function
$T_{\mathcal{G}}$, $\mathcal{T}_{\mathcal{G}}$ : The Bellman and distributional Bellman optimality operators derived from the greedy selection rule $\mathcal{G}$
$J_\rho(\pi)$ : A risk-sensitive control objective function, with risk measure $\rho$
$\psi$ : A statistical functional or sketch
$\xi^\pi$ : Steady-state distribution under policy $\pi$
$\phi(x)$ : State representation for state $x$; a mapping $\phi : \mathcal{X} \to \mathbb{R}^n$
$\Pi_{\phi,\xi}$ : Projection onto the linear subspace generated by $\phi$, with state weighting $\xi$
$M(\mathbb{R})$ : Space of signed probability measures over the reals
$\ell_{\xi,2}$ : Weighted Cramér distance over return-distribution functions, with state weighting given by $\xi$
$\Pi_{\phi,\xi,\ell_2}$ : Projection onto the linear subspace generated by $\phi$, minimising the $\ell_{\xi,2}$ distance
$L$ : Loss function
$H_\kappa$ : The Huber loss with threshold $\kappa > 0$ (a standard form is given after this table)
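The displays below spell out, in the notation above, how the bootstrap function and the push-forward combine; this is a standard form written for illustration rather than a definition quoted from the text.
\[
b_{r,\gamma}(z) = r + \gamma z,
\qquad
(b_{r,\gamma})_\# \nu(A) = \nu\big(\{ z \in \mathbb{R} : r + \gamma z \in A \}\big)
\quad \text{for measurable } A \subseteq \mathbb{R},
\]
so that $(b_{r,\gamma})_\# \nu$ is the distribution of $r + \gamma Z$ when $Z \sim \nu$.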
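For the metrics $w_p$, $\ell_p$, and $\ell_2$, the standard integral forms in terms of the CDF notation above are, for $1 \le p < \infty$,
\[
w_p(\nu, \nu') = \left( \int_0^1 \big| F_\nu^{-1}(u) - F_{\nu'}^{-1}(u) \big|^p \, \mathrm{d}u \right)^{1/p},
\qquad
\ell_p(\nu, \nu') = \left( \int_{\mathbb{R}} \big| F_\nu(z) - F_{\nu'}(z) \big|^p \, \mathrm{d}z \right)^{1/p},
\]
with the Cramér distance corresponding to the case $p = 2$ of $\ell_p$.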
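Finally, the quantile regression loss and the Huber loss admit the following standard forms; writing the quantile regression loss as an expectation over a target distribution $\nu$ is an illustrative choice rather than notation fixed by the table.
\[
L_\tau(\theta) = \mathbb{E}_{Z \sim \nu}\Big[ \big( \tau - \mathbb{1}\{ Z < \theta \} \big)(Z - \theta) \Big],
\qquad
H_\kappa(u) =
\begin{cases}
\tfrac{1}{2} u^2 & \text{if } |u| \le \kappa, \\
\kappa \big( |u| - \tfrac{1}{2}\kappa \big) & \text{otherwise,}
\end{cases}
\]
the former being minimised at $\theta = F_\nu^{-1}(\tau)$.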