Reinforcement Learning (2/e)

Richard S. Sutton, Andrew G. Barto

Publication date

2018-11-13

ISBN

9780262039246

Rating

★★★★★
Book description
The significantly expanded and updated new edition of a widely used text on reinforcement learning, one of the most active research areas in artificial intelligence. Reinforcement learning is a computational approach to learning whereby an agent tries to maximize the total amount of reward it receives while interacting with a complex, uncertain environment. In Reinforcement Learning, Richard Sutton and Andrew Barto provide a clear and simple account of the field's key ideas and algorithms. This second edition has been significantly expanded and updated, presenting new topics and updating coverage of other topics. Like the first edition, this second edition focuses on core online learning algorithms, with the more mathematical material set off in shaded boxes. Part I covers as much of reinforcement learning as possible without going beyond the tabular case for which exact solutions can be found. Many algorithms presented in this part are new to the second edition, including UCB, Expected Sarsa, and Double Learning. Part II extends these ideas to function approximation, with new sections on such topics as artificial neural networks and the Fourier basis, and offers expanded treatment of off-policy learning and policy-gradient methods. Part III has new chapters on reinforcement learning's relationships to psychology and neuroscience, as well as an updated case-studies chapter including AlphaGo and AlphaGo Zero, Atari game playing, and IBM Watson's wagering strategy. The final chapter discusses the future societal impacts of reinforcement learning.
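AI reading guide
Highlights
  • The foundational textbook of reinforcement learning, a classic co-authored by Turing Award winners Sutton and Barto.
  • Clearly explains the core online learning algorithms, covering the full progression from tabular methods to function approximation.
  • The second edition is substantially expanded and updated, adding newer topics such as multi-agent learning and continuing tasks, along with more mathematical derivations.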
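Who should read it
  • Graduate students in AI and machine learning who want a systematic grounding in reinforcement learning theory.
  • Engineers developing RL algorithms who need a deep understanding of core ideas such as Q-learning and policy gradients.
  • Self-learners interested in how agents make decisions, with some mathematical background and a willingness to work through difficult material.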
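Before you read
  • Pairing the book with the free electronic edition on Sutton's website and the Coursera specialization works best.
  • The first half (tabular methods) is fairly easy to follow; the difficulty rises sharply in the second half with function approximation and the mathematical derivations.
  • There is no need to read it cover to cover in one pass; focus on the chapters relevant to your application or the papers you are working on.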
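Reader consensus
  • Widely regarded as the bible of the RL field; it builds a solid conceptual framework and is essential reading for anyone entering the area.
  • The prose is clear and concise, but some derivations are dense; readers without a mathematics background will need patience.
  • Although some readers find parts of the text verbose, it is highly valuable as a reference for algorithm details and theoretical derivations.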

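This guide was generated from the book's description, table of contents, excerpts, short reviews, and full reviews; it is not a substitute for a close reading of the full text.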

Selected excerpts
  • "Newcomers to reinforcement learning are sometimes surprised that the rewards -- which define the goal of learning -- are computed in the environment rather than in the agent. Certainly most ultimate goals for animals are recognized by computations occurring inside their body: by sensors for …"
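  • "Example 5.5: The estimates of ordinary importance sampling typically have infinite variance, and thus unsatisfactory convergence properties, whenever the scaled returns have infinite variance, and this can easily happen in off-policy learning when trajectories contain loops." (Quoted from Section 5.5, Off-policy Prediction via Importance Sampling, p. 101)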
  • "...while the behavior policy is stochastic and exploratory (for example, an ε-greedy policy), this policy becomes a deterministic optimal policy."
  • "Electrical stimulation not only energized the rats’ behavior—through dopamine’s effect on motivation—it also led to the rats quickly learning to stimulate themselves by pressing a lever, which they would do frequently for long periods of time."
  • "The reward prediction error hypothesis of dopamine neuron activity was proposed by scientists who recognized striking parallels between the behavior of TD errors and the activity of neurons that produce dopamine, a neurotransmitter essential in mammals for reward-related learning and behavior. …"
  • "A conspicuous feature of the dopamine system is that fibers releasing dopamine project widely to multiple parts of the brain. Although it is likely that only some populations of dopamine neurons broadcast the same reinforcement signal, if this signal reaches the synapses of many neurons involved in …"
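The importance-sampling excerpt refers to the book's Example 5.5. Below is a minimal Python sketch that contrasts ordinary and weighted importance sampling on that example. It assumes the setup as we recall it from the book; treat the details as an assumption. There is a single nonterminal state s. Action right terminates with reward 0. Action left returns to s with probability 0.9, or terminates with reward +1 with probability 0.1. The target policy π always picks left, the behavior policy b picks each action with probability 0.5, and γ = 1, so v_π(s) = 1. The names run_episode and estimates are hypothetical helpers chosen for illustration.

```python
import random


def run_episode(rng):
    """Generate one episode from state s under the equiprobable behavior policy b.

    Returns (rho, G): the importance-sampling ratio prod pi(a|s)/b(a|s)
    for the target policy pi (always 'left') and the undiscounted return.
    Assumed dynamics follow the Example 5.5 setup described in the lead-in.
    """
    rho = 1.0
    while True:
        if rng.random() < 0.5:
            # b chose 'right': pi(right|s) = 0, so the ratio drops to 0,
            # and 'right' terminates immediately with reward 0.
            return 0.0, 0.0
        # b chose 'left': pi(left|s) / b(left|s) = 1 / 0.5 = 2.
        rho *= 2.0
        if rng.random() < 0.1:
            # 'left' terminates with probability 0.1 and reward +1.
            return rho, 1.0
        # Otherwise 'left' loops back to s with reward 0; keep going.


def estimates(num_episodes, seed=0):
    """Return (ordinary, weighted) importance-sampling estimates of v_pi(s)."""
    rng = random.Random(seed)
    scaled_return_sum = 0.0  # sum of rho * G over episodes
    rho_sum = 0.0            # sum of rho over episodes
    for _ in range(num_episodes):
        rho, g = run_episode(rng)
        scaled_return_sum += rho * g
        rho_sum += rho
    ordinary = scaled_return_sum / num_episodes
    weighted = scaled_return_sum / rho_sum if rho_sum > 0 else 0.0
    return ordinary, weighted


if __name__ == "__main__":
    for n in (10, 1_000, 100_000):
        print(n, estimates(n))
```

In this setup every episode with a nonzero ratio contains only left actions and therefore ends with return 1, so the weighted estimate equals 1 as soon as one such episode has been seen. The ordinary estimate is unbiased, but the scaled returns ρG can be arbitrarily large powers of 2, which is why its variance is infinite and its running value can keep jumping around even after many episodes.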
About the author
Richard S. Sutton is Professor of Computing Science and AITF Chair in Reinforcement Learning and Artificial Intelligence at the University of Alberta, and also Distinguished Research Scientist at DeepMind.
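User reviews
Anyone want to compare answers to the exercises?
A classic RL textbook.
Essential reading for reinforcement learning. I read it cover to cover back when it was still a draft; I should read it at least once more.
Read the first two parts.
Really enjoyable. In my opinion it is better than the "flower book" (Goodfellow, Bengio, and Courville's Deep Learning): self-consistent, self-contained, with concise formulas. That said, there is some promotion of the authors' own ideas.
This book covers many RL derivations and technical details; it feels like an essential companion for writing RL papers. A minor drawback is that some points that could be covered by a single formula get a long paragraph of text instead, and that text merely restates the formula in words without adding much to a deeper understanding.
A five-star book. Moved to tears.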
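This year I couldn't resist taking this reinforcement learning course at school again, and today I finally finished the exam. The course had seven large assignments, plus a paper presentation, a paper report, and finally a four-hour written exam. At one point I started doubting myself: why was I working so hard? But the presentation and report were done in pairs, and if I had dropped the course my partner would have been left stuck, so I gritted my teeth and sat the exam. I was quite happy that I managed to prove the relatively "hard" final question. The book is well written and easy to understand; as a classic introductory textbook it is beyond reproach. Once you have a solid foundation, real RL applications still require reading new papers and code, but that is true of machine learning in general, not a problem specific to RL. I was quite curious about RL in a Bayesian framework, but our course didn't go into it in depth.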
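I digested the tabular material thoroughly, but I gave up on the function approximation part: our residents' committee handled things so badly that I couldn't bring myself to keep reading in the middle of the pandemic.
Most of the content is very approachable and doesn't require a deep mathematical background.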