Statistical Foundations of Data Science

Jianqing Fan (范剑青)

Publisher

CRC Press

Publication date

2020-08-01

ISBN

9781466510845

Rating

★★★★★
Book description

Big data are ubiquitous. They come in varying volume, velocity, and variety. They have a deep impact on systems such as storage, communication, and computing architectures, and on analysis such as statistics, computation, optimization, and privacy. Engulfed by a multitude of applications, data science aims to address the large-scale challenges of data analysis, turning big data into smart data for decision making and knowledge discovery. Data science integrates theories and methods from statistics, optimization, mathematical science, computer science, and information science to extract knowledge, make decisions, discover new insights, and reveal new phenomena from data. The concept of data science has appeared in the literature for several decades and has been interpreted differently by different researchers. It has nowadays become a multi-disciplinary field that distills knowledge from various disciplines to develop new methods, processes, algorithms, and systems for knowledge discovery from various kinds of data, which can be either low- or high-dimensional, and either structured, unstructured, or semi-structured. Statistical modeling plays a critical role in the analysis of complex and heterogeneous data and quantifies the uncertainties of scientific hypotheses and statistical results.

This book introduces commonly used statistical models and contemporary statistical machine learning techniques and algorithms, along with their mathematical insights and statistical theories. It aims to serve as a graduate-level textbook on the statistical foundations of data science as well as a research monograph on sparsity, covariance learning, machine learning, and statistical inference. A one-semester graduate course may cover Chapters 2, 3, 9, 10, 12, 13 and selected topics from the remaining chapters; this gives a comprehensive view of statistical machine learning models, theories, and methods. Alternatively, a one-semester graduate course may cover Chapters 2, 3, 5, 7, 8 and selected topics from the remaining chapters; this track focuses more on high-dimensional statistics, model selection, and inference. Both paths place heavy emphasis on sparsity and variable selection.

Frontiers of scientific research rely on the collection and processing of massive, complex data. Information technology allows us to collect big data of unprecedented size and complexity. Accompanying big data is the rise of dimensionality, and high dimensionality characterizes many contemporary statistical problems, from science and engineering to social science and the humanities. Many traditional statistical procedures for finite or low-dimensional data are still useful in data science, but they become infeasible or ineffective when dealing with high-dimensional data. Hence, new statistical methods are indispensable. The authors have worked on high-dimensional statistics for two decades, and started writing this book on high-dimensional data analysis over a decade ago. Over the last decade, there have been surges of interest and exciting developments in high-dimensional and big data, which led us to concentrate mainly on the statistical aspects of data science.

We aim to introduce commonly used statistical models, methods, and procedures in data science and to provide readers with sufficient and sound theoretical justifications. It has been a challenge for us to balance statistical theories and methods and to choose the topics and works to cover, since the number of publications in this emerging area is enormous. Thus, we focus on the foundational aspects that are related to sparsity, covariance learning, machine learning, and statistical inference.

Sparsity is a common assumption in the analysis of high-dimensional data. By sparsity, we mean that only a handful of features embedded in a huge pool suffice for certain scientific questions or predictions. This book introduces various regularization methods for exploiting sparsity, including how to determine penalties, how to choose tuning parameters, and how to design numerical optimization algorithms for various statistical models. These topics can be found in Chapters 3–6 and 8.
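The regularization idea described above can be sketched with a small, self-contained example: an L1-penalized (Lasso) regression solved by coordinate descent. This is a generic illustration, not code from the book; the synthetic data, the penalty level `lam=0.2`, and the helper names are all assumptions of the example.

```python
import numpy as np

def soft_threshold(z, t):
    """Soft-thresholding operator, the proximal map of the L1 penalty."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate descent for min_b (1/2n)||y - Xb||^2 + lam * ||b||_1."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n   # per-coordinate curvature
    r = y - X @ b                       # current residual
    for _ in range(n_iter):
        for j in range(p):
            r += X[:, j] * b[j]         # remove coordinate j's contribution
            rho = X[:, j] @ r / n
            b[j] = soft_threshold(rho, lam) / col_sq[j]
            r -= X[:, j] * b[j]         # add back the updated contribution
    return b

# Synthetic sparse problem: only 3 of 50 features carry signal.
rng = np.random.default_rng(0)
n, p = 100, 50
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:3] = [4.0, -3.0, 2.0]
y = X @ beta + 0.5 * rng.standard_normal(n)

b_hat = lasso_cd(X, y, lam=0.2)         # lam would be tuned (e.g. by CV) in practice
selected = np.flatnonzero(np.abs(b_hat) > 1e-8)
print(selected)                          # the handful of selected features
```

The soft-thresholding step is what sets most coefficients exactly to zero, which is how the penalty performs variable selection rather than mere shrinkage.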

High-dimensional measurements are frequently dependent, since these variables often measure similar things, such as aspects of economics or personal health. Many of these variables have heavy tails, owing to the large number of collected variables. To model the dependence, factor models are frequently employed, which exhibit low-rank plus sparse structures.
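The low-rank-plus-sparse structure can be illustrated with a minimal numpy sketch in the spirit of PCA-based covariance estimation: the top eigen-pairs of the sample covariance capture the factor (low-rank) part, and thresholding the residual keeps a sparse remainder. The factor count `k`, the threshold `0.1`, and all data below are illustrative assumptions, not the book's estimator.

```python
import numpy as np

# Synthetic factor-model data: X = F B^T + U, so that
# Cov(X) = B B^T (rank k, the "low-rank" part) + Cov(U) (the remaining "sparse" part).
rng = np.random.default_rng(1)
n, p, k = 2000, 30, 3                  # observations, variables, latent factors
B = rng.standard_normal((p, k))        # factor loadings
F = rng.standard_normal((n, k))        # latent factors
U = 0.5 * rng.standard_normal((n, p))  # idiosyncratic noise, Cov(U) = 0.25 I
X = F @ B.T + U

S = np.cov(X, rowvar=False)            # p x p sample covariance

# PCA-style estimate: the top-k eigen-pairs recover the low-rank part ...
vals, vecs = np.linalg.eigh(S)         # eigenvalues in ascending order
top = np.argsort(vals)[::-1][:k]
low_rank = (vecs[:, top] * vals[top]) @ vecs[:, top].T

# ... and thresholding the residual keeps only its large entries.
# The cutoff 0.1 is an illustrative choice, not a principled one.
resid = S - low_rank
sparse = np.where(np.abs(resid) > 0.1, resid, 0.0)

Sigma_hat = low_rank + sparse          # "low-rank plus sparse" covariance estimate
print(np.mean(sparse == 0.0))          # fraction of residual entries thresholded away
```

In practice the threshold and the number of factors are chosen data-adaptively; the point of the sketch is only the decomposition itself.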

Table of contents
1 Introduction
1.1 Rise of Big Data and Dimensionality
1.1.1 Biological Sciences
1.1.2 Health Sciences
1.1.3 Computer and Information Sciences

User comments
Another classic. It was published just last year and reviews the authors' own work while summarizing the existing theory. Highly recommended, though the bar to entry is somewhat high.