Designing Data-Intensive Applications - Martin Kleppmann

Designing Data-Intensive Applications

Martin Kleppmann

Publication date

2017-04-02

ISBN

9781449373320

Rating

★★★★★
Book description
Data is at the center of many challenges in system design today. Difficult issues need to be figured out, such as scalability, consistency, reliability, efficiency, and maintainability. In addition, we have an overwhelming variety of tools, including relational databases, NoSQL datastores, stream or batch processors, and message brokers. What are the right choices for your application? How do you make sense of all these buzzwords?
In this practical and comprehensive guide, author Martin Kleppmann helps you navigate this diverse landscape by examining the pros and cons of various technologies for processing and storing data. Software keeps changing, but the fundamental principles remain the same. With this book, software engineers and architects will learn how to apply those ideas in practice, and how to make full use of data in modern applications.
  • Peer under the hood of the systems you already use, and learn how to use and operate them more effectively
  • Make informed decisions by identifying the strengths and weaknesses of different tools
  • Navigate the trade-offs around consistency, scalability, fault tolerance, and complexity
  • Understand the distributed systems research upon which modern databases are built
  • Peek behind the scenes of major online services, and learn from their architectures
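AI reading guide
Key takeaways
  • An in-depth look at core distributed systems concepts such as consistency, availability, and partition tolerance.
  • A thorough comparison of relational databases, NoSQL datastores, and stream and batch processing, covering their strengths, weaknesses, and suitable use cases.
  • A walkthrough of data partitioning (sharding), replication, and transaction isolation that exposes the underlying design principles and trade-offs.
Who should read it
  • Engineers working on backend development or architecture who deal with massive data volumes and highly concurrent systems.
  • Job seekers preparing for system design interviews who want to build a complete body of distributed systems knowledge.
  • Readers interested in the fundamentals underlying big data who want to understand how the technology evolved from a theoretical perspective.
Before you read
  • The content is theory-heavy and wide-ranging; reading it alongside your own project experience will deepen understanding.
  • There is no need to grasp every detail on the first pass; build a high-level picture first, then dig into chapters as needed.
  • The book cites a large number of classic papers and works well as an index that points readers toward cutting-edge work.
Reader consensus
  • Hailed as the bible of distributed systems, and a must-read classic for programmers moving up to architect roles.
  • It goes beyond technical implementation to emphasize the trade-off thinking behind designs, which makes it highly valuable as guidance.
  • Some chapters are fairly dense; readers with hands-on experience will get the most out of them.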

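This guide was generated from the book's description, table of contents, excerpts, short comments, and reviews; it is not a substitute for a close reading of the full text.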

Notable excerpts
  • "UDP is a good choice in situations where delayed data is worthless. For example, in a VoIP phone call, there probably isn’t enough time to retransmit a lost packet before its data is due to be played over the loudspeakers. In this case, there’s no point in retransmitting the packet—the application m"
  • "For data warehouse queries that need to scan over millions of rows, a big bottleneck is the bandwidth for getting data from disk into memory. However, that is not the only bottleneck. Developers of analytical databases also worry about efficiently using the bandwidth from main memory into the CPU ca"
  • "Sending a packet over a network or making a request to a network service is normally a transient operation that leaves no permanent trace. Although it is possible to record it permanently (using packet capture and logging), we normally don’t think of it that way. Even message brokers that durably wr"
  • "SSI is fairly new: it was first described in 2008 [40] and is the subject of Michael Cahill's PhD thesis. ... it has the possibility of being fast enough to become the new default in the future."
  • "Detecting writes that affect prior reads (the write occurs after the read)."
  • "In the context of two-phase locking we discussed index-range locks. ... We can use a similar technique here, except that SSI locks don't block other transactions. When a transaction writes to the database, it must look in the indexes for any other transactions that have recently read the affected data. T"
  • "In practice, isolation is unfortunately not that simple. Serializable isolation has a performance cost, and many databases don’t want to pay that price. It’s therefore common for systems to use weaker levels of isolation, which protect against some concurrency issues, but not all. Those levels of is"
  • "One thing that document and graph databases have in common is that they typically don't enforce a schema for the data they store, which can make it easier to adapt applications to changing requirements. However, your application most likely still assumes that data has a certain structure; it's just "
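The SSI excerpts above describe detecting writes that affect another transaction's prior reads without blocking anyone. A minimal Python sketch of that idea follows. It is an illustration under stated assumptions, not the book's or any database's implementation: the `Store`, `Transaction`, and `SerializationConflict` names are hypothetical, there is no snapshot (reads see the latest committed value), and readers are flagged when the writer commits rather than when it writes.

```python
from collections import defaultdict


class SerializationConflict(Exception):
    """Raised at commit when a value this transaction read was overwritten."""


class Transaction:
    def __init__(self, store, txid):
        self.store = store
        self.txid = txid
        self.writes = {}     # buffered writes, applied only at commit
        self.stale = False   # set when a committed write hit a key we read

    def read(self, key):
        # Record the read so that later writers can find this transaction.
        self.store.readers[key].add(self)
        return self.writes.get(key, self.store.data.get(key))

    def write(self, key, value):
        self.writes[key] = value

    def commit(self):
        # Simplification: only transactions that also write are aborted.
        if self.stale and self.writes:
            self.store.forget(self)
            raise SerializationConflict(f"tx {self.txid}: a prior read is outdated")
        for key, value in self.writes.items():
            # Flag (rather than block) other open transactions that read this key.
            for reader in self.store.readers[key]:
                if reader is not self:
                    reader.stale = True
            self.store.data[key] = value
        self.store.forget(self)


class Store:
    def __init__(self, initial=None):
        self.data = dict(initial or {})
        self.readers = defaultdict(set)  # key -> open transactions that read it
        self._next_id = 0

    def begin(self):
        self._next_id += 1
        return Transaction(self, self._next_id)

    def forget(self, tx):
        # Drop a finished transaction from every read-tracking set.
        for readers in self.readers.values():
            readers.discard(tx)


if __name__ == "__main__":
    # Two on-call doctors each check that both are on call, then go off call.
    store = Store({"alice": True, "bob": True})
    t1, t2 = store.begin(), store.begin()
    for tx in (t1, t2):
        tx.read("alice")
        tx.read("bob")
    t1.write("alice", False)
    t2.write("bob", False)
    t1.commit()
    try:
        t2.commit()
    except SerializationConflict as exc:
        print("aborted:", exc)
```

In this sketch the second transaction is aborted at commit because the premise it read ("alice" still on call) was changed by the first, which is the kind of conflict the excerpts describe. Neither transaction ever waits on a lock.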
About the author
Martin is a researcher in distributed systems at the University of Cambridge. Previously he was a software engineer and entrepreneur at Internet companies including LinkedIn and Rapportive, where he worked on large-scale data infrastructure. In the process he learned a few things the hard way, and he hopes this book will save you from repeating the same mistakes. Martin is a regular conference speaker, blogger, and open source contributor. He believes that profound technical ideas should be accessible to everyone, and that deeper understanding will help us develop better software.
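User reviews
Wow, this book is absolutely amazing. Go read it! Go read it! Go read it!
Worth reading a second time. A true big picture of distributed data systems.
Pretty good. A foundational book on big data and distributed systems; master all of it and you are well on your way to being an architect. The chapter on linearizability needs deeper study. Planning to read it again.
Well suited to preparing for system design interviews. Twitter's pull and push models, database sharding, and replication are all explained fairly clearly.
I mainly read the first two parts. I think it is the best introductory primer on databases and distributed storage, and the papers cited at the end of each chapter let you go deeper.
(This edition is more popular, so I'm putting my short review here:) I wish Douban's CEO would require every engineer to read this book cover to cover.
A database-leaning book. After one read-through it is still a tangled mess in my head 🤡 there is just too much material.
Skimmed through it once; there is a lot I still don't understand, but this book needs multiple readings. A real classic that covers every aspect of data systems.
Truly makes deep topics accessible. For backend distributed development, it points out many real, common problems and some solutions. The book also keeps reminding you about unreliable dependencies, scalability, and fault tolerance, which should be considered when designing any feature.
On my second read now. A really good concepts book, though it does not explain the low-level details in as much depth.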