References
André Barbeau. Drugs affecting movement disorders. Annual Review of Pharmacol-
ogy, 14(1):91–113, 1974.
Nolan Bard, Jakob N. Foerster, Sarath Chandar, Neil Burch, Marc Lanctot, H. Francis
Song, Emilio Parisotto, Vincent Dumoulin, Subhodeep Moitra, Edward Hughes, Iain
Dunning, Shibl Mourad, Hugo Larochelle, Marc G. Bellemare, and Michael Bowling.
The Hanabi challenge: A new frontier for AI research. Artificial Intelligence, 280:
103216, 2020.
Etienne Barnard. Temporal-difference methods and Markov models. IEEE Transac-
tions on Systems, Man, and Cybernetics, 1993.
André Barreto, Will Dabney, Rémi Munos, Jonathan J. Hunt, Tom Schaul, Hado P. van
Hasselt, and David Silver. Successor features for transfer in reinforcement learning.
In Advances in Neural Information Processing Systems, 2017.
Gabriel Barth-Maron, Matthew W. Hoffman, David Budden, Will Dabney, Dan Horgan,
Dhruva TB, Alistair Muldal, Nicolas Heess, and Timothy Lillicrap. Distributed
distributional deterministic policy gradients. In Proceedings of the International
Conference on Learning Representations, 2018.
Andrew G. Barto, Richard S. Sutton, and Charles W. Anderson. Neuronlike adaptive
elements that can solve difficult learning control problems. IEEE Transactions on
Systems, Man, and Cybernetics, SMC-13(5):834–846, 1983.
Andrew G. Barto, Steven J. Bradtke, and Satinder P. Singh. Learning to act using
real-time dynamic programming. Artificial Intelligence, 72(1–2):81–138, 1995.
Nicole Bäuerle and Jonathan Ott. Markov decision processes with average-value-at-risk
criteria. Mathematical Methods of Operations Research, 74(3):361–379, 2011.
Charles Beattie, Joel Z. Leibo, Denis Teplyashin, Tom Ward, Marcus Wainwright,
Heinrich Küttler, Andrew Lefrancq, Simon Green, Víctor Valdés, Amir Sadik, Julian
Schrittwieser, Keith Anderson, Sarah York, Max Cant, Adam Cain, Adrian Bolton,
Stephen Gaffney, Helen King, Demis Hassabis, Shane Legg, and Stig Petersen.
DeepMind Lab. arXiv preprint arXiv:1612.03801, 2016.
Marc G. Bellemare, Joel Veness, and Michael Bowling. Investigating contingency
awareness using Atari 2600 games. In Proceedings of the Twenty-Sixth AAAI
Conference on Artificial Intelligence, 2012a.
Marc G. Bellemare, Joel Veness, and Michael Bowling. Sketch-based linear value
function approximation. In Advances in Neural Information Processing Systems,
2012b.
Marc G. Bellemare, Yavar Naddaf, Joel Veness, and Michael Bowling. The Arcade
Learning Environment: An evaluation platform for general agents. Journal of Artificial
Intelligence Research, 47:253–279, 2013a.
Marc G. Bellemare, Joel Veness, and Michael Bowling. Bayesian learning of recur-
sively factored environments. In Proceedings of the International Conference on
Machine Learning, 2013b.