R for Data Science

Hadley Wickham

Publication date

2016-12-25

ISBN

9781491910399

Rating

★★★★★
Book description

Learn how to use R to turn raw data into insight, knowledge, and understanding. This book introduces you to R, RStudio, and the tidyverse, a collection of R packages designed to work together to make data science fast, fluent, and fun. Suitable for readers with no previous programming experience, R for Data Science is designed to get you doing data science as quickly as possible.

Authors Hadley Wickham and Garrett Grolemund guide you through the steps of importing, wrangling, exploring, and modeling your data and communicating the results. You’ll get a complete, big-picture understanding of the data science cycle, along with basic tools you need to manage the details. Each section of the book is paired with exercises to help you practice what you’ve learned along the way.

You’ll learn how to:

• Wrangle—transform your datasets into a form convenient for analysis

• Program—learn powerful R tools for solving data problems with greater clarity and ease

• Explore—examine your data, generate hypotheses, and quickly test them

• Model—provide a low-dimensional summary that captures true "signals" in your dataset

• Communicate—learn R Markdown for integrating prose, code, and results.

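AI Reading Guide
Key takeaways
  • Systematically introduces the tidyverse suite, which has reshaped the paradigm for data analysis in R
  • Covers the complete workflow of data import, tidying, visualization, and modeling
  • Emphasizes an exploratory data analysis mindset, with attention to code readability and efficiency
Who should read it
  • Beginners with no prior experience, or readers who want to switch to the modern R paradigm
  • Practitioners in data analysis, statistical research, and related fields
  • Developers who know Python's pandas but want to try R's elegant syntax
Before you read
  • Read alongside RStudio and type out the code as you go; practice is the best teacher
  • Focus on understanding the tidy data concept and the logic of the pipe operator
  • For the underlying principles, consider following up with Advanced R
Reader consensus
  • The tidyverse greatly improves R's ease of use and the elegance of its code
  • Excellent as an introduction, but light on theory; best suited to getting started quickly
  • The combination of ggplot2 and dplyr is widely regarded as a powerful tool for data analysis

This guide was generated from the book's description, table of contents, excerpts, short comments, and reviews; it is not a substitute for a close reading of the full text.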

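Notable excerpts
  • "Far better an approximate answer to the right question, which is often vague, than an exact answer to the wrong question, which can always be made precise. (John Tukey)"
  • "As Jamie Zawinski said: Some people, when confronted with a problem, think 'I know, I'll use regular expressions.' Now they have two problems."
  • "ggplot2 will only use six shapes at a time. By default, when you use the shape aesthetic, any additional values will not appear in the plot."
  • "Whenever you map an aesthetic to a discrete variable, ggplot2 automatically groups the data and draws multiple geometric objects."
  • "Use the RStudio keyboard shortcut Alt+- (Alt plus the minus sign) ... it automatically puts spaces around the assignment operator, which is good coding practice. Reading code is hard enough on a good day, so give your eyes a break and use spaces."
  • "Supporting the pipe is one of the core principles of the tidyverse packages. The only exception is ggplot2: it was developed before the pipe was discovered. ggvis, the next iteration of ggplot2, does support the pipe, but unfortunately it is not yet mature."
  • "When you group by multiple variables, each summary peels off one level of the grouping. That makes it easy to roll up a dataset progressively."
  • "Exploratory data analysis (EDA) ... is an iterative cycle in which you: (1) generate questions about your data; (2) search for answers by visualizing, transforming, and modeling your data; (3) use what you learn to refine your questions and generate new ones. EDA is not a formal process with a strict set of rules. More than anything, it is a state of mind."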
About the authors
Hadley Wickham is an Assistant Professor and the Dobelman Family Junior Chair in Statistics at Rice University. He is an active member of the R community, has written and contributed to over 30 R packages, and won the John Chambers Award for Statistical Computing for his work developing tools for data reshaping and visualization. His research focuses on how to make data analysis better, faster and easier, with a particular emphasis on the use of visualization to better understand data and models.

Garrett Grolemund is a statistician, teacher and R developer who currently works for RStudio. He sees data analysis as a largely untapped fountain of value for both industry and science. Garrett received his Ph.D. at Rice University in Hadley Wickham's lab, where his research traced the origins of data analysis as a cognitive process and identified how attentional and epistemological concerns guide every data analysis. Garrett is passionate about helping people avoid the frustration and unnecessary learning he went through while mastering data analysis. Even before he finished his dissertation, he started teaching corporate training in R and data analysis for Revolution Analytics. He has taught at Google, eBay, Acxiom and many other companies, and is currently developing a training curriculum for RStudio that will make useful know-how even more accessible. Outside of teaching, Garrett spends time doing clinical trials research, legal research, and financial analysis. He also develops R software: he co-authored the lubridate R package, which provides methods to parse, manipulate, and do arithmetic with date-times, and wrote the ggsubplot package, which extends the ggplot2 package.
Table of contents
Chapter 1 Data Visualization with ggplot2
Chapter 2 Workflow: Basics
Chapter 3 Data Transformation with dplyr
Chapter 4 Workflow: Scripts
Chapter 5 Exploratory Data Analysis

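User reviews
Completely changed how I see R.
I didn't read it very carefully; I'll come back to it when I need to. R isn't purpose-built for statistics, after all, so in most situations it is naturally less intuitive than something like Stata. Whatever its merits, there's no need to overpraise it. I've decided to be pragmatic and mix and match different software. It's not pretty, but it saves effort...
The work of a master, and it shows: clear, concise, and easy to follow.
Probably because I had some background in pandas and sklearn, my first read-through went very smoothly~ Learning R over the summer from an "R for finance"-type textbook is painful to look back on =.= #PickingTheRightIntroBookMatters!
The tidyverse is so beautiful!
After thinking it over, I went ahead and learned the tidyverse, mainly because it's convenient for my own use.
I consult it often.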
A great tutorial for learning tidyverse and (though barely) base R programming. It could have been even better with a real-life data analysis project in which all the packages and functions are woven into a single case study.
Reading it again in the English edition.
Such a good book; I can only blame myself for being both slow and lazy 🥲
Bookmarked.