Book Description

During the past decade there has been an explosion in computation and information technology. With it have come vast amounts of data in a variety of fields such as medicine, biology, finance, and marketing. The challenge of understanding these data has led to the development of new tools in the field of statistics, and spawned new areas such as data mining, machine learning, and bioinformatics. Many of these tools have common underpinnings but are often expressed with different terminology. This book describes the important ideas in these areas in a common conceptual framework. While the approach is statistical, the emphasis is on concepts rather than mathematics. Many examples are given, with a liberal use of color graphics. It is a valuable resource for statisticians and anyone interested in data mining in science or industry. The book's coverage is broad, from supervised learning (prediction) to unsupervised learning. The many topics include neural networks, support vector machines, classification trees and boosting---the first comprehensive treatment of this topic in any book. This major new edition features many topics not covered in the original, including graphical models, random forests, ensemble methods, least angle regression & path algorithms for the lasso, non-negative matrix factorization, and spectral clustering. There is also a chapter on methods for "wide" data (p bigger than n), including multiple testing and false discovery rates.


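AI Reading Guide

Key Highlights

Unifies the core concepts of machine learning from a statistical perspective
Emphasizes conceptual explanation over pure mathematical derivation
Covers the classic methods of supervised learning and model selection

Who Should Read It

Readers with a solid grounding in mathematics and statistics
Advanced machine learning researchers and practitioners
Scholars in economics or bioinformatics

Before You Read

Pair the text with code implementations to deepen understanding
Study linear algebra and probability theory first
Free online resources can be used as study aids

Reader Consensus

Very deep theoretically, with a high barrier to entry
Rewards rereading, with something new each time
A classic worth the time for a close reading

This guide was generated from the book's description, table of contents, excerpts, short comments, and reviews; it is not equivalent to a close reading of the full text.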

Notable Excerpts

"LAR uses least squares directions in the active set of variables. Lasso uses least square directions; if a variable crosses zero, it is removed from the active set. Boosting uses non-negative least squares directions in the active set."
"the bias of the 1-nearest-neighbor estimate is often low, but the variance is high."
"Monte Carlo is an extremely bad method; it should be used only when all alternative methods are worse"
"There is a very popular story about the Metropolis algorithm: one evening, Edward, Metropolis, and Marshall were discussing the problem at a party and wrote out the famous algorithm on a cocktail napkin. The reason their final paper carried their wives' names was to placate Arianna and Augusta, who had been annoyed by an entire evening of technical discussion."
"Bagging; or Bootstrap AGGregatING, is an extension of bootstrapping to classification and regression problems. The main idea is to sample with replacement from the training data so that we now have B training data sets, each having n′≤n observations. The machine-learning algorithm is trained on each"
"I still feel I came to this book far too late. What was the point of writing all those web apps in my first year of grad school? I should have just settled down and read more. At least I went off to do an internship, at least I found out how little I actually knew, and at least I stuck with it and worked through the whole book, even if my understanding is still rough. Should I go do a PhD...... such a dilemma~"
"Both k-nearest neighbors and least squares end up approximating conditional expectations by averages."
"However, with a 0 − 1 outcome, this computation simplifies. We order the predictor classes according to the proportion falling in outcome class 1. Then we split this predictor as if it were an ordered predictor."

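The bagging excerpt above describes a procedure that can be sketched in a few lines. What follows is a minimal Python illustration under stated assumptions, not code from the book: the names `fit_1nn` and `bagging`, the 1-nearest-neighbor base learner, the toy data, and the choice to combine the B models by averaging their predictions are all made up for this example.

```python
import random
from statistics import mean


def fit_1nn(train):
    """Return a 1-nearest-neighbor predictor for 1-D (x, y) pairs (hypothetical base learner)."""
    def predict(x):
        # Predict the response of the closest training point.
        return min(train, key=lambda pair: abs(pair[0] - x))[1]
    return predict


def bagging(train, fit, B=50, n_prime=None, seed=0):
    """Fit `fit` on B bootstrap samples of size n' <= n and average the predictions."""
    rng = random.Random(seed)
    n = len(train)
    n_prime = n if n_prime is None else n_prime
    if not 1 <= n_prime <= n:
        raise ValueError("n_prime must satisfy 1 <= n_prime <= n")
    models = []
    for _ in range(B):
        # Sample n' observations with replacement from the training data.
        sample = [train[rng.randrange(n)] for _ in range(n_prime)]
        models.append(fit(sample))

    def predict(x):
        # Aggregate by averaging (an assumption suited to regression).
        return mean(m(x) for m in models)
    return predict


# Toy data (assumed for illustration): y = 2x plus Gaussian noise.
noise = random.Random(1)
data = [(i / 10, 2 * i / 10 + noise.gauss(0, 0.5)) for i in range(50)]

single = fit_1nn(data)                  # one high-variance 1-NN fit
bagged = bagging(data, fit_1nn, B=100)  # average over 100 bootstrap fits
print(single(2.55), bagged(2.55))
```

Averaging over bootstrap fits is one common way to aggregate for regression; for classification a majority vote over the B models would be the usual counterpart.
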
About the Authors

Trevor Hastie, Robert Tibshirani, and Jerome Friedman are professors of statistics at Stanford University. They are prominent researchers in this area: Hastie and Tibshirani developed generalized additive models and wrote a popular book of that title. Hastie co-developed much of the statistical modeling software and environment in R/S-PLUS and invented principal curves and surfaces. Tibshirani proposed the lasso and is co-author of the very successful An Introduction to the Bootstrap. Friedman is the co-inventor of many data-mining tools including CART, MARS, projection pursuit and gradient boosting.

Preface to the Second Edition

Preface to the First Edition

1 Introduction

2 Overview of Supervised Learning

2.1 Introduction


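User Reviews

Definitely not for beginners; for many topics you have to learn them elsewhere first and then come back before you can really understand what the authors are saying. It truly takes a commanding, big-picture view, and it offers something new on every reading.

Did not finish it... only skimmed through a little

: None

So moving.

The second edition is already on its tenth corrected printing, and a free PDF is available for download on the authors' website. It is fairly difficult...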