The fiercest battle yet between AMD and NVIDIA has begun: NVIDIA has just introduced the Tesla P100 GPU, the most advanced hyperscale data center accelerator built to date.
Based on the Pascal GPU architecture with five breakthrough technologies, the Tesla P100 packs 3,584 CUDA cores and 15.3 billion transistors, pairs them with 16GB of CoWoS HBM2 stacked memory, and delivers 10.6 teraflops of single-precision performance.
Picture courtesy of Anandtech
* 5.3 teraflops double-precision performance, 10.6 teraflops single-precision performance and 21.2 teraflops half-precision performance with NVIDIA GPU Boost technology
* 160GB/sec bi-directional interconnect bandwidth with NVIDIA NVLink
* 16GB of CoWoS HBM2 stacked memory
* 720GB/sec memory bandwidth with CoWoS HBM2 stacked memory
* Enhanced programmability with page migration engine and unified memory
* ECC protection for increased reliability
* Server-optimized for highest data center throughput and reliability
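The peak-throughput figures above can be cross-checked with back-of-the-envelope arithmetic. The ~1480 MHz boost clock and the FP64/FP16 rate ratios below are assumptions drawn from public Pascal (GP100) specifications, not from this article:

```python
# Sketch: reconstructing the quoted peak TFLOPS figures.
CUDA_CORES = 3584
BOOST_CLOCK_HZ = 1.48e9           # assumed GPU Boost clock (~1480 MHz)
FLOPS_PER_CORE_PER_CYCLE = 2      # one fused multiply-add (FMA) counts as 2 FLOPs

fp32_tflops = CUDA_CORES * BOOST_CLOCK_HZ * FLOPS_PER_CORE_PER_CYCLE / 1e12
fp64_tflops = fp32_tflops / 2     # GP100 runs FP64 at half the FP32 rate
fp16_tflops = fp32_tflops * 2     # and FP16 at twice the FP32 rate

print(f"FP32: {fp32_tflops:.1f} TFLOPS")  # ~10.6
print(f"FP64: {fp64_tflops:.1f} TFLOPS")  # ~5.3
print(f"FP16: {fp16_tflops:.1f} TFLOPS")  # ~21.2
```

The numbers line up with the bullet list: 3,584 cores issuing one FMA per cycle at the assumed boost clock yields roughly 10.6 teraflops single-precision.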
* NVIDIA Pascal architecture for exponential performance leap — A Pascal-based Tesla P100 solution delivers over a 12x increase in neural network training performance compared with a previous-generation NVIDIA Maxwell-based solution.
* NVIDIA NVLink for maximum application scalability — The NVIDIA NVLink high-speed GPU interconnect scales applications across multiple GPUs, delivering a 5x acceleration in bandwidth compared to today’s best-in-class solution. Up to eight Tesla P100 GPUs can be interconnected with NVLink to maximize application performance in a single node, and IBM has implemented NVLink on its POWER8 CPUs for fast CPU-to-GPU communication.
* 16nm FinFET for unprecedented energy efficiency — With 15.3 billion transistors built on 16 nanometer FinFET fabrication technology, the Pascal GPU is the world’s largest FinFET chip ever built. It is engineered to deliver the fastest performance and best energy efficiency for workloads with near-infinite computing needs.
* CoWoS with HBM2 for big data workloads — The Pascal architecture unifies processor and data into a single package to deliver unprecedented compute efficiency. An innovative approach to memory design, Chip on Wafer on Substrate (CoWoS) with HBM2, provides a 3x boost in memory bandwidth performance, or 720GB/sec, compared to the Maxwell architecture.
* New AI algorithms for peak performance — New half-precision instructions deliver more than 21 teraflops of peak performance for deep learning.
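The NVLink bandwidth and "5x" claims above also reduce to simple arithmetic. The per-link rate, link count, and PCIe comparison below are assumptions from public Pascal documentation, not from this article:

```python
# Sketch: where the 160GB/sec NVLink figure and the 5x claim come from.
NVLINK_LINKS = 4                  # assumed NVLink links per Tesla P100
GB_PER_LINK_PER_DIR = 20          # assumed GB/s in each direction per link

bidir_bw = NVLINK_LINKS * GB_PER_LINK_PER_DIR * 2   # both directions counted
pcie3_x16_bidir = 32              # PCIe 3.0 x16 bidirectional, ~GB/s

print(bidir_bw)                   # 160 GB/s, matching the spec list
print(bidir_bw / pcie3_x16_bidir) # 5.0 — the likely basis of the "5x" claim
```

Under these assumptions, four links at 20GB/sec per direction give the quoted 160GB/sec bidirectional total, and the 5x figure falls out of a comparison against PCIe 3.0 x16.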
The Tesla P100 GPU will be available in the NVIDIA DGX-1 deep learning system in June, and is expected to arrive from leading server manufacturers in early 2017.