CUDA by Example - Jason Sanders, Edward Kandrot

Published

2010-07-29

ISBN

9780131387683

Rating

★★★★★
About the Book
"This book is required reading for anyone working with accelerator-based computing systems." --From the Foreword by Jack Dongarra, University of Tennessee and Oak Ridge National Laboratory

CUDA is a computing architecture designed to facilitate the development of parallel programs. In conjunction with a comprehensive software platform, the CUDA Architecture enables programmers to draw on the immense power of graphics processing units (GPUs) when building high-performance applications. GPUs, of course, have long been available for demanding graphics and game applications. CUDA now brings this valuable resource to programmers working on applications in other domains, including science, engineering, and finance. No knowledge of graphics programming is required--just the ability to program in a modestly extended version of C.

CUDA by Example, written by two senior members of the CUDA software platform team, shows programmers how to employ this new technology. The authors introduce each area of CUDA development through working examples. After a concise introduction to the CUDA platform and architecture, as well as a quick-start guide to CUDA C, the book details the techniques and trade-offs associated with each key CUDA feature. You'll discover when to use each CUDA C extension and how to write CUDA software that delivers truly outstanding performance.

Major topics covered include:
  • Parallel programming
  • Thread cooperation
  • Constant memory and events
  • Texture memory
  • Graphics interoperability
  • Atomics
  • Streams
  • CUDA C on multiple GPUs
  • Advanced atomics
  • Additional CUDA resources

All the CUDA software tools you'll need are freely available for download from NVIDIA. http://developer.nvidia.com/object/cuda-by-example.html
Notable Excerpts
  • "The CUDA C compiler treats variables in shared memory differently from ordinary variables. For every thread block launched on the GPU, the compiler creates a separate copy of the variable. Every thread in a block shares that memory, but threads cannot see or modify the copies belonging to other blocks. This provides an excellent means for the threads within a block to communicate and cooperate on a computation."
  • "Compared with reading the same data from global memory, reading from constant memory saves memory bandwidth for two reasons: 1. a single read from constant memory can be broadcast to neighboring threads, saving up to 15 separate read operations; 2. constant memory is cached, so consecutive reads of the same address generate no additional memory traffic."
  • "In the CUDA architecture, a warp is a set of 32 threads that execute in lockstep: each thread in the warp executes the same instruction on different data. When handling constant memory, NVIDIA hardware broadcasts a single memory read to each half-warp (a group of 16 threads). If every thread in a half-warp reads from the same constant-memory address, the GPU issues only one read request and then broadcasts the data to every thread."
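The per-block shared-memory behavior described in the first excerpt can be sketched with a small reduction kernel. This is an illustrative example written for this summary, not code taken from the book; the kernel and variable names are made up:

```cuda
#define THREADS_PER_BLOCK 256

// Each block gets its own private copy of `cache`. Threads within a
// block cooperate through it, but cannot see other blocks' copies.
__global__ void blockSum(const float *in, float *out) {
    __shared__ float cache[THREADS_PER_BLOCK];   // one copy per block
    int tid = threadIdx.x + blockIdx.x * blockDim.x;

    cache[threadIdx.x] = in[tid];
    __syncthreads();              // wait until every thread has written

    // Parallel reduction: threads in the block cooperate via shared memory.
    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (threadIdx.x < stride)
            cache[threadIdx.x] += cache[threadIdx.x + stride];
        __syncthreads();
    }

    if (threadIdx.x == 0)
        out[blockIdx.x] = cache[0];   // one partial sum per block
}
```

Because each block writes only its own `out[blockIdx.x]`, the per-block copies of `cache` never conflict; the final sum across blocks would be computed on the host or in a second kernel.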
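The constant-memory broadcast described in the second and third excerpts is triggered when threads in a half-warp read the same `__constant__` address. A minimal sketch (the `coeff` table and `scale` kernel are hypothetical, not from the book):

```cuda
// A small read-only table placed in constant memory.
__constant__ float coeff[16];

__global__ void scale(const float *in, float *out, int n) {
    int tid = threadIdx.x + blockIdx.x * blockDim.x;
    if (tid < n)
        // Every thread in the block reads the same constant address,
        // so the hardware issues one read and broadcasts the value
        // to the half-warp, and later reads hit the constant cache.
        out[tid] = in[tid] * coeff[blockIdx.x % 16];
}

// Host side: constant memory is filled with cudaMemcpyToSymbol, e.g.
//   float h_coeff[16] = { /* ... */ };
//   cudaMemcpyToSymbol(coeff, h_coeff, sizeof(h_coeff));
```

Note the flip side implied by the excerpt: if threads in a half-warp read *different* constant addresses, the reads are serialized, which can be slower than global memory.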
User Reviews
Well written, just a bit dated.
An introduction to CUDA GPU parallel programming, and only an introduction. Most GPU examples are compared against their CPU counterparts.
Having read a few English-language books, I find that Western authors write in a lively style. Quite good.
Just the beginning.
Well suited for beginners to read through. It lacks an overall architectural view, so it won't take you much further.
Good as an introductory textbook for CUDA programming, but introductory only.
I read this years ago and took very detailed notes, but never had much chance to use CUDA afterward, so I forgot most of it. On rereading, the book has plenty of examples and is well suited to getting up to speed quickly.
This book is short, but working through the CUDA code in it gives you a basic grasp of the core concepts: blocks/threads, shared memory, constant memory, zero copy...
My first time looking at GPU programming from a user's perspective. Every API I encountered, I couldn't help mapping back to the instructions and engines I worked on at NV, which always made me smile.