Fundamentals of Data Engineering - Joe Reis, Matt Housley

Publication date

2022-10-18

ISBN

9781098108304

Rating

★★★★★
Book description
Data engineering has grown rapidly in the past decade, leaving many software engineers, data scientists, and analysts looking for a comprehensive view of this practice. With this practical book, you'll learn how to plan and build systems to serve the needs of your organization and customers by evaluating the best technologies available in the framework of the data engineering lifecycle. Authors Joe Reis and Matt Housley walk you through the data engineering lifecycle and show you how to stitch together a variety of cloud technologies to serve the needs of downstream data consumers. You'll understand how to apply the concepts of data generation, ingestion, orchestration, transformation, storage, governance, and deployment that are critical in any data environment regardless of the underlying technology.
This book will help you:
  • Assess data engineering problems using an end-to-end data framework of best practices
  • Cut through marketing hype when choosing data technologies, architecture, and processes
  • Use the data engineering lifecycle to design and build a robust architecture
  • Incorporate data governance and security across the data engineering lifecycle
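AI Reading Guide
Key takeaways
  • Systematically covers the full data engineering lifecycle, from data generation through deployment
  • Stresses starting from business goals when defining the right data architecture and competitive advantage
  • Offers guidance on choosing cloud technologies to build data systems that serve downstream consumers
Who should read it
  • Beginners who want a broad, high-level understanding of the data engineering field
  • Experienced engineers who need to fill gaps in their knowledge or make technology choices
  • Data scientists, analysts, and developers looking to move into data engineering
Before you read
  • The content leans toward concepts and architecture and lacks concrete code examples, so it needs to be paired with hands-on practice
  • Consider reading Part II before Part I; it may make for a smoother read
  • Avoid getting mired in technical complexity; focus on quick wins and on reducing technical debt
Reader consensus
  • Hailed as the DDIA (Designing Data-Intensive Applications) of data engineering: comprehensive and unlikely to go out of date
  • Good for filling gaps, but light on detail; hands-on practitioners may find it abstract
  • Helps clarify the role of the data engineer and builds a solid conceptual foundation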

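This guide was generated from the book description, table of contents, excerpts, short comments, and reviews; it is not a substitute for a close reading of the full text.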

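Notable excerpts
  • "Companies would often hire entire teams of big data engineers, costing millions of dollars a year, to babysit these platforms. Big data engineers often spent excessive time maintaining complicated tooling and not enough time delivering the business's insights and value."
  • "Data maturity is the progression toward higher data utilization, capabilities, and integration across the organization. Our data maturity model has three stages: starting with data, scaling with data, and leading with data."
  • "Define the right data architecture. This means determining business goals and the competitive advantage you're aiming to achieve with your data initiative. Work toward a data architecture that supports these goals. 3. Identify and audit data that will support key initiatives and operate within the data architecture you designed. 4. Build a solid data foundation for future data analysts and data scientists to generate reports and models that provide competitive value."
  • "1. Quick wins will establish the importance of data within the organization. Keep in mind that quick wins will likely create technical debt. Have a plan to reduce this debt, as it will otherwise add friction for future delivery. 2. Get out and talk to people, and avoid working in silos. We commonly see the data team working in a bubble, not communicating with people outside their departments and not getting perspectives and feedback from business stakeholders. The danger is that you'll spend a lot of time working on things of little use to people. 3. Avoid undifferentiated heavy lifting. Don't box yourself in with unnecessary technical complexity. Use off-the-shelf, turnkey solutions wherever possible. 4. Build custom solutions and code only where this creates a competitive advantage."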
About the authors
Joe Reis is a business-minded data nerd who’s worked in the data industry for 20 years, with responsibilities ranging from statistical modeling, forecasting, machine learning, data engineering, and data architecture to almost everything else in between. Joe is the CEO and Co-Founder of Ternary Data, a data engineering and architecture consulting firm based in Salt Lake City, Utah. In addition, he volunteers with several technology groups and teaches at the University of Utah. In his spare time, Joe likes to rock climb, produce electronic music, and take his kids on crazy adventures.
Matt Housley is a data engineering consultant and cloud specialist. After some early programming experience with Logo, BASIC, and 6502 assembly, he completed a PhD in mathematics at the University of Utah. Matt then began working in data science, eventually specializing in cloud-based data engineering. He co-founded Ternary Data with Joe Reis, where he leverages his teaching experience to train future data engineers and advise teams on robust data architecture. Matt and Joe also pontificate on all things data on The Monday Morning Data Chat.
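User reviews
The early release version reads rather like commissioned newsletter pieces. / "The data engineer we discuss in this book can be described more precisely as a data lifecycle engineer." / The divide between data science and data engineering in industry's data-centric production practice. / Data Mesh, Serving Data for Analytics, Machine Learning, and Reverse ETL / A few of the middle chapters can be used as a checklist.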