Book Description

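Machine Learning presents the core algorithms and theory of machine learning and explains how the algorithms operate. The book draws on results from many fields, including statistics, artificial intelligence, philosophy, information theory, biology, cognitive science, computational complexity, and control theory, and uses them to explain the background of each problem, the algorithms, and their implicit assumptions. It can serve as a textbook for undergraduate and graduate students in computer science, and as a reference for researchers and teachers in related fields.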

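AI Reading Guide

Key Highlights

Systematic coverage of core algorithms such as concept learning, decision trees, and neural networks
Draws together theoretical background from statistics, AI, information theory, and other disciplines
In-depth analysis of combining inductive and analytical learning, and of the implicit assumptions involved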

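Who Should Read It

Undergraduate and graduate students in computer science, as a textbook
Researchers and teachers in related fields, as a reference
Readers who want a deep understanding of the underlying principles of machine learning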

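Before You Read

The book contains many theoretical derivations; working on practical projects alongside it is recommended
Some chapters, such as the one on genetic algorithms, can be skipped at your discretion to save time
The translation has flaws; if possible, read it alongside the English original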

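Reader Consensus

A classic of the machine learning field: old, but with solid fundamentals
Concise, with no filler, but heavily theoretical and demanding to read
Good for building a foundation; some outdated content should be supplemented with newer books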

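This guide was generated from the book's description, table of contents, excerpts, short comments, and reviews; it is not a substitute for a close reading of the full text.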

Notable Excerpts

"The inductive learning hypothesis. Any hypothesis found to approximate the target function well over a sufficiently large set of training examples will also approximate the target function well over other unobserved examples"
"We shall see that most current theory of machine learning rests on the crucial assumption that the distribution of training examples is identical to the distribution of test examples. Concept learning. Inferring a boolean-valued function from training examples of its input and output."
"As illustrated by these first two steps, positive training examples may force the S boundary of the version space to become increasingly general. Negative training examples play the complementary role of forcing the G boundary to become increasingly specific."
"When gradient descent falls in a local minimum with respect to one of these weights, it will not necessarily be in a local minimum with respect to the other weights. In fact, the more weights in the network, the more dimensions that might provide "escape routes" for gradient descent to fall away from the"
"The proof of this involves showing that any function can be approximated by a linear combination of some small region, and then showing that two layers of sigmoid units are sufficient to produce good local approximations."
"The only likely impact on the final error is that different error-minimization procedures may fall into different local minima. Bishop (1996) contains a general discussion of several parameter optimization methods for training networks. A variety of methods have been proposed to dynamically grow or s"
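"Given that analytical methods provide logically justified hypotheses while inductive methods provide statistically justified hypotheses, it is easy to see why the two can be combined. Logical justifications are only as strong as the assumptions, or prior knowledge, on which they are based. If the prior knowledge is incorrect or unavailable, logical justifications are suspect and powerless. Statistical justifications are only as strong as the data and statistical assumptions on which they rest. When the underlying distribution cannot be trusted or data is scarce, statistical justifications are likewise suspect and powerless. In short, the two approaches work well for different types of problems. By combining them, we can hope to develop a more general learning method that covers a broader range of learning tasks."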
"Every hypothesis consistent with D is a MAP hypothesis"
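
The last excerpt holds only under specific assumptions: a uniform prior over the hypothesis space and noise-free training labels, so that P(D|h) is 1 when h is consistent with D and 0 otherwise. The following is a minimal sketch, not taken from the book, that checks this by brute force on a hypothetical toy domain. The attribute names, the data set `D`, and all function names are assumptions made for illustration, and the hypothesis space here omits the "always negative" hypothesis for brevity.

```python
from fractions import Fraction
from itertools import product

# Hypothetical toy domain (an assumption): two attributes, two values each.
ATTRIBUTE_VALUES = [("sunny", "rainy"), ("warm", "cold")]


def all_hypotheses():
    """Enumerate conjunctive hypotheses; '?' accepts any attribute value."""
    choices = [values + ("?",) for values in ATTRIBUTE_VALUES]
    return list(product(*choices))


def predicts_positive(h, x):
    """A hypothesis labels x positive when every constraint matches."""
    return all(c == "?" or c == v for c, v in zip(h, x))


def consistent(h, data):
    """True when h reproduces the label of every training example."""
    return all(predicts_positive(h, x) == label for x, label in data)


def posteriors(data):
    """Bayes' rule with a uniform prior and a noise-free 0/1 likelihood.

    Assumes at least one hypothesis is consistent with the data;
    otherwise the evidence is zero and the division fails.
    """
    hyps = all_hypotheses()
    prior = Fraction(1, len(hyps))
    unnormalized = {h: prior * (1 if consistent(h, data) else 0) for h in hyps}
    evidence = sum(unnormalized.values())
    return {h: p / evidence for h, p in unnormalized.items()}


# One positive and one negative example (made up for this sketch).
D = [(("sunny", "warm"), True), (("rainy", "cold"), False)]

post = posteriors(D)
map_value = max(post.values())
map_hypotheses = sorted(h for h, p in post.items() if p == map_value)
version_space = sorted(h for h in all_hypotheses() if consistent(h, D))

# Every consistent hypothesis shares the same maximal posterior.
print(map_hypotheses == version_space)
```

Under these assumptions each consistent hypothesis receives posterior 1/|VS|, where VS is the version space, and every inconsistent hypothesis receives 0. With a non-uniform prior or noisy labels the set of MAP hypotheses would in general be smaller than the version space.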