Notation

Notation Description

ℝ    The set of real numbers
ℕ, ℕ⁺    The set of natural numbers including (excluding) zero: {0, 1, 2, …}
P    The probability of one or many random variables producing the given outcomes
𝒫(Y)    The space of probability distributions over a set Y

F_ν, F_ν^{-1}    The cumulative distribution function (CDF) and inverse CDF, respectively, for distribution ν
δ_θ    Dirac delta distribution at θ ∈ ℝ, a probability distribution which assigns probability 1 to outcome θ
N(µ, σ²)    Normal distribution with mean µ and variance σ²

U([a, b])    Uniform distribution over [a, b], with a, b ∈ ℝ
U({a, b, …})    Uniform distribution over the set {a, b, …}
Z ∼ ν    The random variable Z, with probability distribution ν

z, Z    Capital letters generally denote random variables and lower-case letters their realisations or expectations. Notable exceptions are V, Q, and P
x ∈ 𝒳    A state x in the state space 𝒳
a ∈ 𝒜    An action a in the action space 𝒜
r ∈ ℛ    A reward r from the set ℛ
γ    Discount factor

R_t, X_{t+1} ∼ P(·, · | X_t, A_t)    Joint probability distribution of the reward and next state in terms of the current state and action
P_X    Transition kernel


P_R    Reward distribution function
ξ_0    Initial state distribution

x_⊥    A terminal state

N_𝒳, N_𝒜, N_ℛ    Size of the state, action, and reward spaces (when finite)
π    A policy; usually stationary and Markov, mapping states to distributions over actions
π*    An optimal policy

k Iteration number or index of a sample trajectory

t Time step or time index

(X_t, A_t, R_t)_{t≥0}    A trajectory of random variables for state, action, and reward produced through interaction with a Markov decision process
X_{0:t-1}    A sequence of random variables

T Length of an episode

P_π    The distribution over trajectories induced by a Markov decision process and a policy π
E_π    The expectation operator for the distribution over trajectories induced by P_π

G A random-variable function or random return

Var, Var_π    Variance of a distribution generally and variance under the distribution P_π
V^π(x)    The value function for policy π at state x ∈ 𝒳
Q^π(x, a)    The state-action value function for policy π at state x ∈ 𝒳 and taking action a ∈ 𝒜

Z =_D Z′    Equality in distribution of two random variables Z, Z′
D(Z | Y)    The conditional probability distribution of a random variable Z given Y
G^π    The random-variable function for policy π

η    A return-distribution function
η^π(x)    The return-distribution function for policy π at state x ∈ 𝒳
f_# ν    Push-forward distribution passing distribution ν through the function f
b_{r,γ}    Bootstrap function with reward r and discount γ

R_MIN, R_MAX, V_MIN, V_MAX    Minimum and maximum possible reward and return within an MDP


N_k(x)    Number of visits to state x ∈ 𝒳 up to but excluding iteration k
m    Number of particles or parameters of the distribution representation
{θ_1, …, θ_m}    Support of a categorical distribution representation, with θ_i < θ_j for i < j
ς_m    The gap between consecutive locations for the support of a categorical representation with m locations

V̂^π(x), η̂^π(x)    An estimate of the value function or return-distribution function at state x under policy π
α, α_k    The step size in an update expression and the step size used for iteration k
A ← B    Denotes updating the variable A with the contents of variable B

Π_C    The categorical projection (Sections 3.5 and 5.6)
Π_Q    The quantile projection (Section 5.6)
T^π, T    The policy-evaluation Bellman operator and Bellman optimality operator, respectively
𝒯^π, 𝒯    The policy-evaluation distributional Bellman operator and distributional optimality operator, respectively

𝒪U    An operator 𝒪 applied to a point U ∈ M, where (M, d) is a metric space
‖·‖_∞    Supremum norm on a vector space
w_p    p-Wasserstein distance
ℓ_p    ℓ_p distance between probability distributions
ℓ_2    Cramér distance
d̄    The supremum extension of a probability metric d to return-distribution functions, where the supremum is taken over states

Γ(ν, ν′)    The set of couplings (joint probability distributions) of ν, ν′ ∈ 𝒫(ℝ)

𝒫_p(ℝ)    The set of distributions with finite p-th moments
𝒫_d(ℝ)    The set of distributions with finite d-distance to the distribution δ_0 and finite first moment. Also referred to as the finite domain of d
CVaR    Conditional value at risk


ℱ    A probability distribution representation
ℱ_E    Empirical probability distribution representation
ℱ_N    Normal probability distribution representation, parameterised by mean and variance
ℱ_{C,m}    m-categorical probability distribution representation
ℱ_{Q,m}    m-quantile probability distribution representation
Π_ℱ    A projection onto the probability distribution representation ℱ

⌊z⌋, ⌈z⌉    Floor and ceiling operations, mapping z ∈ ℝ to the nearest integer less than or equal to z (floor), or greater than or equal to z (ceiling)

d    A probability metric, typically used for the purposes of contraction analysis
L_τ(θ)    Quantile regression loss function for target threshold τ ∈ (0, 1) and location estimate θ ∈ ℝ
𝟙{u}    An indicator function that takes the value 1 when u is true and 0 otherwise; also 𝟙{·}
J(π)    Objective function for a control problem

𝒢    Greedy policy operator, which produces a policy that is greedy with respect to a given action-value function
T_𝒢, 𝒯_𝒢    The Bellman and distributional Bellman optimality operators derived from greedy selection rule 𝒢
J_ρ(π)    A risk-sensitive control objective function, with risk measure ρ

ψ    A statistical functional or sketch

ξ_π    Steady-state distribution under policy π
φ(x)    State representation for state x, a mapping φ: 𝒳 → ℝⁿ
Π_{φ,ξ}    Projection onto the linear subspace generated by φ, with state weighting ξ
ℳ(ℝ)    Space of signed probability measures over the reals
ℓ_{ξ,2}    Weighted Cramér distance over return-distribution functions, with state weighting given by ξ
Π_{φ,ξ,ℓ_2}    Projection onto the linear subspace generated by φ, minimising the ℓ_{ξ,2} distance

L Loss function


H_κ    The Huber loss with threshold κ > 0
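Two of the losses listed above lend themselves to a quick numerical illustration: the quantile regression loss, whose minimiser over the location estimate is the τ-quantile of the sampled variable, and the Huber loss, which is quadratic near zero and linear in the tails. The sketch below is illustrative only; the function names are ours, not the book's, and a grid search stands in for a proper optimiser.

```python
def quantile_regression_loss(z, theta, tau):
    """Quantile regression loss for a single sample z, location estimate
    theta, and target threshold tau in (0, 1): (tau - 1{z < theta})(z - theta)."""
    indicator = 1.0 if z < theta else 0.0
    return (tau - indicator) * (z - theta)

def huber(u, kappa):
    """Huber loss with threshold kappa > 0: quadratic on [-kappa, kappa],
    linear (with matched slope) outside it."""
    if abs(u) <= kappa:
        return 0.5 * u * u
    return kappa * (abs(u) - 0.5 * kappa)

# The average quantile regression loss over samples is minimised at the
# empirical tau-quantile; for tau = 0.5 this is the median.
samples = [0.0, 1.0, 2.0, 3.0, 4.0]
tau = 0.5
candidates = [x / 10 for x in range(0, 41)]  # coarse grid over [0, 4]
avg_loss = lambda th: sum(quantile_regression_loss(z, th, tau) for z in samples) / len(samples)
best = min(candidates, key=avg_loss)
print(best)  # prints 2.0, the median of the samples
```

Smoothing the quantile regression loss with `huber`, as in quantile-based temporal-difference methods, keeps the same minimiser while making the gradient continuous at z = θ.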