Pig编程指南 (Chinese edition of Programming Pig)

Name: Pig编程指南
Rating: 8 (26 reviews)
ISBN: 9787115301116

Author: Alan Gates (盖茨)

Publisher

Posts & Telecom Press (人民邮电出版社)

Publication date

2013-01-31

Tags

Computers

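Book description

Pig编程指南 teaches beginners the fundamentals of Apache Pig. It also gives experienced users a more comprehensive tour of Pig's key features, including the Pig Latin scripting language, the interactive commands of the Grunt console shell, and user-defined functions (UDFs) for extending Pig. For readers who need to process big data, the book shows how to use Pig to meet those needs more efficiently.

Pig编程指南 is suitable for Pig users and developers at every level.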

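AI reading guide

Key points

A systematic treatment of the Pig Latin scripting language and Pig's data model
In-depth coverage of user-defined functions (UDFs) and the Grunt interactive shell
Practical methods and techniques for processing big data efficiently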

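Who should read it

Apache Pig beginners as well as experienced, advanced users
Engineers working in the Hadoop ecosystem and on big-data processing
Developers who want to get productive with Hadoop quickly without deep Java expertise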

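Before you read

Study the book alongside the official documentation and the source code on GitHub
The translation quality is mediocre; readers with good English may prefer the original edition
The technology covered is dated; check compatibility with current Hadoop versions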

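Reader consensus

Concise and to the point; good for learning quickly on the job
Friendly to users without a Java background; an excellent entry point to Hadoop
Translation quality is uneven; some readers recommend going straight to the English original

This guide was generated from the book description, table of contents, excerpts, short comments, and reviews. It is not a substitute for a close reading of the full text.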

Selected excerpts

"Pig provides an engine for executing data flows in parallel on Hadoop. It includes a language, Pig Latin ..."
"To be mathematically precise, a Pig Latin script describes a directed acyclic graph (DAG), where the edges are data flows and the nodes are operators that process the data."
"That is, one reducer will get 10 or more times the data than other reducers. Pig has join and order by operators that will handle this case and (in some cases) rebalance the reducers."
"Users = load 'users' as (name, age);
Fltrd = filter Users by age >= 18 and age <= 25;
Pages = load 'pages' as (user, url);
Jnd = join Fltrd by name, Pages by user;
Grpd = group Jnd by url;
Smmd = foreach Grpd generate group, COUNT(Jnd) as clicks;
Srtd = order Smmd by clicks desc;
Top5 = limit Srtd 5;"
"Because Hadoop is a distributed system and usually processes data in parallel, when it outputs data to a “file” it creates a directory with the file’s name, and each writer creates a separate part file in that directory."
"The only thing Pig needs to know to run on your cluster is the location of your cluster’s NameNode and JobTracker. The NameNode is the manager of HDFS, and the JobTracker coordinates MapReduce job"
"Casts to bytearrays are never allowed because Pig does not know how to represent the various data types in binary format."
"Pig does these joins in MapReduce by using the map phase to annotate each record with which input it came from. It then uses the join key as the shuffle key. Thus join forces a new reduce phase"
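For intuition, the script excerpt above can be mimicked in memory with plain Python. The join follows the strategy described in the last excerpt: the map phase tags each record with its input, the join key becomes the shuffle key, and the reduce phase pairs the records. This is a hypothetical sketch, not how Pig is implemented. The function names (`map_phase`, `shuffle`, `reduce_join`, `top_sites`) and the sample records are invented for illustration. Records are Python dicts rather than Pig tuples, and everything runs in a single process instead of on a Hadoop cluster.

```python
from collections import Counter, defaultdict
from itertools import chain, product

# Hypothetical sample inputs standing in for the 'users' and 'pages' files.
SAMPLE_USERS = [
    {"name": "amy", "age": 20},
    {"name": "bob", "age": 30},
    {"name": "cal", "age": 18},
]
SAMPLE_PAGES = [
    {"user": "amy", "url": "a.com"},
    {"user": "amy", "url": "b.com"},
    {"user": "cal", "url": "a.com"},
    {"user": "bob", "url": "c.com"},
    {"user": "cal", "url": "a.com"},
]


def map_phase(records, key_field, tag):
    """Map side of a reduce-side join: emit (join key, (input tag, record))."""
    for rec in records:
        yield rec[key_field], (tag, rec)


def shuffle(pairs):
    """Group values by key, as the MapReduce shuffle does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_join(groups, left_tag, right_tag):
    """Reduce side: for each key, pair every left record with every right record.

    Keys that appear on only one side produce no output (an inner join).
    """
    joined = []
    for values in groups.values():
        lefts = [rec for tag, rec in values if tag == left_tag]
        rights = [rec for tag, rec in values if tag == right_tag]
        for left, right in product(lefts, rights):
            joined.append({**left, **right})
    return joined


def top_sites(users, pages, n=5):
    """In-memory analogue of the script: filter, join, group, count, order, limit."""
    fltrd = [u for u in users if 18 <= u["age"] <= 25]            # filter Users by age
    tagged = chain(map_phase(fltrd, "name", "users"),              # join Fltrd by name,
                   map_phase(pages, "user", "pages"))              #      Pages by user
    jnd = reduce_join(shuffle(tagged), "users", "pages")
    clicks = Counter(row["url"] for row in jnd)                    # group by url + COUNT
    srtd = sorted(clicks.items(), key=lambda kv: kv[1], reverse=True)  # order by clicks desc
    return srtd[:n]                                                # limit n


if __name__ == "__main__":
    print(top_sites(SAMPLE_USERS, SAMPLE_PAGES))
```

With the sample data, running the module prints `[('a.com', 3), ('b.com', 1)]`. The filter drops bob, and the join discards his page because it has no partner on the users side. Because the join key is also the shuffle key, all records for one user meet in the same reduce call. That is also why a very common key can overload a single reducer.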
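The excerpt on output "files" notes that Hadoop writes a directory containing one part file per writer. A consumer therefore has to read every part file to see the whole result. The following is a minimal sketch that assumes the directory has already been copied to the local filesystem and that part files follow the common `part-*` naming (for example `part-m-00000` or `part-r-00000`). The function name `read_hadoop_output` is hypothetical. Bookkeeping entries such as `_SUCCESS` do not match the pattern and are ignored.

```python
from pathlib import Path


def read_hadoop_output(output_dir):
    """Return all lines from the part files of a Hadoop-style output directory.

    Assumes part files are named 'part-*' and reads them in name order, so
    part-r-00000 comes before part-r-00001. Other entries (e.g. _SUCCESS)
    are skipped because they do not match the pattern.
    """
    lines = []
    for part in sorted(Path(output_dir).glob("part-*")):
        if part.is_file():
            lines.extend(part.read_text(encoding="utf-8").splitlines())
    return lines
```

On HDFS itself, the Hadoop shell's `getmerge` command serves a similar purpose by concatenating a directory's files into one local file. Reading the parts in name order only matters when the job's output is globally ordered, as it is after Pig's `order`.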