Natural Language Processing with Python

Steven Bird, Ewan Klein, Edward Loper

Publication date

2009-07-10

ISBN

9780596516499

Rating

★★★★★
Book description
This book offers a highly accessible introduction to Natural Language Processing, the field that underpins a variety of language technologies, ranging from predictive text and email filtering to automatic summarization and translation. With Natural Language Processing with Python, you'll learn how to write Python programs to work with large collections of unstructured text. You'll access richly-annotated datasets using a comprehensive range of linguistic data structures. And you'll understand the main algorithms for analyzing the content and structure of written communication.
Packed with examples and exercises, Natural Language Processing with Python will help you:
  • Extract information from unstructured text, to guess the topic or identify "named entities"
  • Analyze linguistic structure in text, including parsing and semantic analysis
  • Access popular linguistic databases, including WordNet and treebanks
  • Integrate techniques drawn from fields as diverse as linguistics and artificial intelligence
Perfect for individual study, or as a classroom and workshop textbook, this book will help you gain practical skills in Natural Language Processing using the Python programming language and the Natural Language Toolkit (NLTK) open source library. If you're interested in developing Web applications, analyzing multilingual news sources, documenting endangered languages, or if you are simply curious to have a programmer's perspective on how human language works, you will find Natural Language Processing with Python both fascinating and immensely useful.
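AI reading guide
Key highlights
  • Systematically explains the core algorithms of natural language processing, based on Python and the NLTK library
  • Covers practical techniques such as text tokenization, part-of-speech tagging, and named entity recognition
  • Provides numerous code examples for working with unstructured text data
Who should read it
  • Developers with a Python foundation who want to get started in natural language processing
  • Researchers interested in text mining, linguistics, or data mining
  • Beginners who need to get up to speed quickly with the NLTK library for text analysis
Before you read
  • This book is based on Python 2; consulting the revised Python 3 edition on the official website is recommended
  • The content focuses on English; for Chinese NLP, readers need to adapt the ideas themselves
  • Theoretical depth is limited; the book is better suited to hands-on practice than to theoretical study
Reader consensus
  • Rich in examples and well organized; widely regarded as an excellent introductory text
  • Friendly to readers without a computer science background; covers both Python syntax and NLP concepts
  • Some advanced chapters are difficult and need to be supplemented with other materials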

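This guide was generated from the book's synopsis, table of contents, excerpts, short comments, and reviews; it is not equivalent to a close reading of the full text.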

Notable excerpts
  • "A part-of-speech tagger, or POS tagger, processes a sequence of words, and attaches a part of speech tag to each word (don’t forget to import nltk):"
  • "Here we see that and is CC, a coordinating conjunction; now and completely are RB, or adverbs; for is IN, a preposition; something is NN, a noun; and different is JJ, an adjective. NLTK provides documentation for each tag, which can be queried using the tag, e.g., nltk.help.upenn_tagset('RB'), or a "
  • "Notice that refuse and permit both appear as a present tense verb (VBP) and a noun (NN). E.g., refUSE is a verb meaning “deny,” while REFuse is a noun meaning “trash” (i.e., they are not homophones). Thus, we need to know which word is being used in order to pronounce the text correctly. (For this r"
  • "The text.similar() method takes a word w, finds all contexts w1w w2, then finds all words w' that appear in the same context, i.e. w1w'w2."
  • "A tagger can also model our knowledge of unknown words; for example, we can guess that scrobbling is probably a verb, with the root scrobble, and likely to occur in contexts like he was scrobbling."
  • "Tagged corpora use many different conventions for tagging words. To help us get started, we will be looking at a simplified tagset (shown in Table 5-1)."
  • "Example 5-1. Program to find the most frequent noun tags."
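The excerpt on text.similar() describes a context-matching idea that can be sketched in plain Python. The following is a minimal illustration under stated assumptions, not NLTK's implementation: the function name similar_words, the top_n parameter, and the toy sentence are hypothetical, and the ranking simply counts how many distinct (w1, w2) contexts each candidate shares with the target word.

```python
from collections import Counter, defaultdict


def similar_words(tokens, word, top_n=5):
    """Rank words that occur in the same (left, right) contexts as `word`.

    Hypothetical helper for illustration only; NLTK's Text.similar()
    uses its own context index and scoring.
    """
    tokens = [t.lower() for t in tokens]
    word = word.lower()

    # Map each (w1, w2) neighbour pair to the set of words seen between them.
    contexts = defaultdict(set)
    for left, mid, right in zip(tokens, tokens[1:], tokens[2:]):
        contexts[(left, right)].add(mid)

    # For every context that contains the target word, credit each other
    # word that fills the same slot.
    shared = Counter()
    for fillers in contexts.values():
        if word in fillers:
            for other in fillers - {word}:
                shared[other] += 1

    return [w for w, _ in shared.most_common(top_n)]


# Toy corpus (an assumption, not taken from the book).
sample = "the cat sat on the mat . the dog sat on the rug . the cat ran . the dog ran .".split()
print(similar_words(sample, "cat"))  # ['dog']
```

Here "dog" is returned for "cat" because both words appear between "the" and "sat" and between "the" and "ran". Words at the very start or end of the token list have no full context and are ignored by this sketch.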
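User reviews
An NLTK guidebook! I read the first seven chapters and skipped the later material on syntactic parsing.
Skimmed the first 200 pages; it has basically not covered anything yet.
After that I could no longer follow it TT
Lots of examples that are well suited to Python practice, but I was hoping for more theoretical understanding; this book is more of a practical guide built around a single library.
Seemed average to me...
Intro: doing computational semantics with Python
I read the e-book, and judging by the cover it may not be the same edition. Personally, I think this one mostly teaches you how to use a package; it is more a reference manual than an introductory read.
NLP before neural networks; an NLTK reference book, from sequence labeling to semantic analysis. The first-order logic part is not very clear; everything else is quite good.
This seems to be the first programming book I have read and practiced all the way through... embarrassing
Read the first three chapters; it is entirely an NLTK manual plus basic text-processing methods. I also glanced at the later chapters, and the material felt overly detailed and somewhat dated. In my view, mainstream NLP today is based on word embedding representations combined with neural network deep learning, rather than on exploring grammar, sentence structure, and parts of speech.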