Pig编程指南 (Chinese edition of Programming Pig) - Alan Gates

Publication date

2013-01-31

ISBN

9787115301116

Rating

★★★★★

Tags

Computers

Book description

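This book explains the fundamentals of Apache Pig for beginners and also gives experienced users a more comprehensive treatment of Pig's key features, such as the Pig Latin scripting language, the interactive commands of the Grunt shell, and user-defined functions (UDFs) for extending Pig. For readers who need to process large datasets, it shows how to use Pig to meet those needs more efficiently.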

The book is suitable for Pig users and developers at every level.

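AI reading guide
Key points
  • Systematic coverage of the Pig Latin scripting language and its data model
  • In-depth treatment of user-defined functions (UDFs) and the Grunt interactive shell
  • Practical methods and techniques for processing big data efficiently
Who should read it
  • Apache Pig beginners as well as experienced, advanced users
  • Engineers doing Hadoop-ecosystem development and big data processing
  • Developers who want to get started with Hadoop quickly without being proficient in Java
Before you read
  • Study it alongside the official documentation and the source code on GitHub
  • The translation quality is mediocre; readers with good English may prefer the original
  • The technology covered is dated; check compatibility with current Hadoop versions
Reader consensus
  • Concise and to the point; well suited to learning quickly on the job
  • Friendly to users without a Java background; an excellent entry point into Hadoop
  • Translation quality is uneven; some readers suggest reading the English edition directly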

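This guide was generated from the book description, table of contents, excerpts, short comments, and reviews; it is not a substitute for a close reading of the full text.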

Selected excerpts
  • "Pig provides an engine for executing data flows in parallel on Hadoop. It includes a language, Pig Latin ..."
  • "To be mathematically precise, a Pig Latin script describes a directed acyclic graph (DAG), where the edges are data flows and the nodes are operators that process the data."
  • "That is, one reducer will get 10 or more times the data than other reducers. Pig has join and order by operators that will handle this case and (in some cases) rebalance the reducers."
  • "Users = load 'users' as (name, age);
     Fltrd = filter Users by age >= 18 and age <= 25;
     Pages = load 'pages' as (user, url);
     Jnd = join Fltrd by name, Pages by user;
     Grpd = group Jnd by url;
     Smmd = foreach Grpd generate group, COUNT(Jnd) as clicks;
     Srtd = order Smmd by clicks desc;
     Top5 = limit Srtd 5"
  • "Because Hadoop is a distributed system and usually processes data in parallel, when it outputs data to a “file” it creates a directory with the file’s name, and each writer creates a separate part file in that directory."
  • "The only thing Pig needs to know to run on your cluster is the location of your cluster’s NameNode and JobTracker. The NameNode is the manager of HDFS, and the JobTracker coordinates MapReduce job"
  • "Casts to bytearrays are never allowed because Pig does not know how to represent the various data types in binary format."
  • "Pig does these joins in MapReduce by using the map phase to annotate each record with which input it came from. It then uses the join key as the shuffle key. Thus join forces a new reduce phase"
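The last excerpt describes how Pig runs a join as a MapReduce job: the map phase tags each record with the input it came from, and the join key becomes the shuffle key, so matching records meet in the same reducer. The Python sketch below simulates that reduce-side join in memory. It is an illustration under simplifying assumptions (two small in-memory inputs, an inner join, a dictionary standing in for Hadoop's shuffle), not Pig's actual implementation; the function names and the sample data are hypothetical, loosely modeled on the Users/Pages script above.

```python
from collections import defaultdict
from itertools import product

# Hypothetical helper names; this simulates, not reproduces, Pig's MapReduce join.

def map_phase(tag, records, key_index):
    """Emit (join_key, (tag, record)); the tag records which input the row came from."""
    for record in records:
        yield record[key_index], (tag, record)

def shuffle(mapped_pairs):
    """Group map output by join key (a dict stands in for Hadoop's shuffle/sort)."""
    groups = defaultdict(list)
    for key, value in mapped_pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """For each key, split values by input tag and emit their cross product (inner join)."""
    for key in sorted(groups):
        left = [rec for tag, rec in groups[key] if tag == 0]
        right = [rec for tag, rec in groups[key] if tag == 1]
        for l, r in product(left, right):
            yield l + r  # concatenate the two tuples into one joined row

def reduce_side_join(left, left_key, right, right_key):
    """Tag both inputs in the map phase, shuffle on the join key, then reduce."""
    mapped = list(map_phase(0, left, left_key)) + list(map_phase(1, right, right_key))
    return list(reduce_phase(shuffle(mapped)))

# Hypothetical sample data: (name, age) users and (user, url) page views.
users = [("alice", 22), ("bob", 30)]
pages = [("alice", "a.com"), ("alice", "b.com"), ("carol", "c.com")]

print(reduce_side_join(users, 0, pages, 0))
# prints [('alice', 22, 'alice', 'a.com'), ('alice', 22, 'alice', 'b.com')]
```

Because every record for a given key must reach the same reducer, the join cannot finish in the map phase, which is why the excerpt says join forces a new reduce phase. The same property explains the skew excerpt: if one key is very common, the reducer that receives it gets far more data than the others.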
Table of contents
Chapter 1 Meet Pig
1.1 What Is Pig?
1.1.1 Pig on Hadoop
1.1.2 Pig Latin, a Parallel Dataflow Language
1.1.3 What Is Pig Useful For?

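User comments
Every time I learn something new I get excited! And learning by using it at work is probably the fastest way to learn.
Excellent, thumbs up.
Reading it together with the documentation on the official website is perfect. Thumbs up!
The book is good, but I wouldn't recommend relying on Pig too much.
Overall a very good book; work through the book's source code hands-on once and you will have mostly mastered it.
The translation is so-so...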