Programming Collective Intelligence (集体智慧编程)

TOBY SEGARAN

Publication date

2008-12-31

ISBN

9787121075391

Rating

★★★★★

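Tags

Programming

About the book

Set against the background of machine learning and computational statistics, this book explains how to mine and analyze data and resources on the Web; how to analyze information such as user experience, marketing, and personal taste and draw useful conclusions from it; and how to use sophisticated algorithms to gather and analyze user data and feedback from websites in order to create new user value and business value. The coverage is thorough: collaborative filtering (for recommending related products), clustering (for discovering similar subsets within large datasets), core search engine technology (crawlers, indexing, query engines, the PageRank algorithm, and so on), optimization algorithms that search through massive amounts of information and draw conclusions through statistical analysis, Bayesian filtering (spam filtering, text filtering), decision trees for prediction and decision modeling, information matching in social networks, and applications of machine learning and artificial intelligence.

This book is an excellent choice for Web developers, architects, application engineers, and similar readers.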

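AI Reading Guide
Key highlights
  • Teaches machine learning algorithms through hands-on Python code
  • Covers recommender systems, clustering, search, and optimization
  • Emphasizes engineering applications over deep mathematical derivations
Who should read it
  • Web developers and architects
  • Data mining beginners without a strong math background
  • Engineers with a theoretical background but little practical experience
Before you read
  • Some APIs and code implementations may be out of date
  • The mathematical principles are covered only lightly, so pair it with other books
  • Reading it alongside a machine learning course is recommended
Reader consensus
  • Highly practical and an excellent entry point for beginners
  • Concise and pragmatic, well suited to a quick overview of the field
  • Some readers find the later chapters obscure and hard to follow

This guide was generated from the book's description, table of contents, excerpts, short comments, and reviews. It is not a substitute for a close reading of the full text.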

Selected excerpts
  • "Next, get a list of random people to make up the dataset. Fortunately, Hot or Not provides an API call that returns a list of people with specified criteria. In this example, the only criteria will be that the people have “meet me” profiles, since only from these profiles can you get other informa"
  • "What Does This Have to Do with the Articles Matrix? So far, what you have is a matrix of articles with word counts. The goal is to factorize this matrix, which means finding two smaller matrices that can be multiplied together to reconstruct this one. The two smaller matrices are: The features matri"
  • "Another feature that applies more evenly to a couple of companies is this one: Feature 2 (46151801.813632453, 'GOOG') (24298994.720555616, 'YHOO') (10606419.91092159, 'PG') (7711296.6887903402, 'CVX') (4711899.0067871698, 'BIIB') (4423180.7694432881, 'XOM') (3430492.5096612777, 'DNA') (2882726.88776"
  • "Because new connections are only created when necessary, this method has to return a default value if there are no connections. For links from words to the hidden layer, the default value will be –0.2 so that, by default, extra words will have a slightly negative effect on the activation level of a "
  • "Pearson Correlation Score A slightly more sophisticated way to determine the similarity between people’s interests is to use a Pearson correlation coefficient. The correlation coefficient is a measure of how well two sets of data fit on a straight line. The formula for this is more complicated t"
  • "Simulated annealing is an optimization method inspired by physics. Annealing is the process of heating up an alloy and then cooling it down slowly. Because the atoms are first made to jump around a lot and then gradually settle into a low energy state, the atoms can find a low energy configuration."
  • "The flight scheduling example works because moving a person from the second to the third flight of the day would probably change the overall cost by a smaller amount than moving that person to the eighth flight of the day would. If the flights were in random order, the optimization methods would wor"
  • "Squaring the numbers is common practice because it makes large differences count for even more. This means an algorithm that is very close most of the time but far off occasionally will fare worse than an algorithm that is always somewhat close. This is often desired behavior, but there are situatio"
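
The Pearson correlation excerpt above describes the coefficient as a measure of how well two sets of data fit on a straight line. The following is a minimal sketch of that idea in plain Python, not the book's own listing: the function name `pearson`, the sample ratings, and the choice to return 0.0 when one person's ratings never vary are all assumptions made for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length rating lists."""
    n = len(xs)
    if n != len(ys):
        raise ValueError("both lists must have the same length")
    if n == 0:
        return 0.0

    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Sum of products of deviations, and sum of squared deviations per list.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)

    denom = sqrt(var_x * var_y)
    if denom == 0:
        # Assumption: a list with no variation is treated as uncorrelated.
        return 0.0
    return cov / denom


# Hypothetical ratings from three people for the same four items.
alice = [1.0, 2.0, 3.0, 4.0]
bob = [2.0, 4.0, 6.0, 8.0]    # always rates twice as high as alice
carol = [4.0, 3.0, 2.0, 1.0]  # rates in the opposite order

print(pearson(alice, bob))
print(pearson(alice, carol))
```

In this sketch, bob's ratings lie on a rising straight line relative to alice's, so the score is 1 even though his numbers are consistently higher, while carol's reversed ratings give -1. On Python 3.10 or later, the standard library's `statistics.correlation` computes the same coefficient.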
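About the author
Toby Segaran is the director of software development at Genstruct, a computational biology company, where he designs algorithms and applies data mining techniques to help understand drug mechanisms. He also works with several other companies and a number of open source projects, helping them analyze the data they collect and extract value from it. In addition, he has built several free web applications, including the popular tasktoy and Lazybase. He enjoys skiing and wine tasting, blogs at blog.kiwitobes.com, and lives in San Francisco.
Table of contents
Preface
Chapter 1: Introduction to Collective Intelligence
What Is Collective Intelligence?
What Is Machine Learning?
Limits of Machine Learning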

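User comments
Introduces the basic ideas; an excellent starting point.
No wonder algorithm engineers look down on this book~
A very practical handbook. After reading it, with the right tools you could solve most problems. I just don't know Python, so I couldn't follow the implementation parts.
So hands-on!
Why call it "collective"? In Chinese, that word evokes the catastrophe of the last century and the ruthless suppression of individuality and humanity. It still makes me shudder.
When it was published there weren't many machine learning books in Chinese. Rereading it now, it feels simple.
I bought the book in 2016, and it was already dated then. I dug it out today and finished it. I really like the teaching style, which is quite engineering-oriented, but that also means it ages poorly…
Useful for going deeper when read alongside deep learning in Python.
I read this book as a freshman but didn't really understand it at the time. In fact, its contents are the core logic behind how today's internet companies make money.
Well suited as an undergraduate introduction.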