NVIDIA Pascal GP100 GPU Expected To Feature 12 TFLOPs of Single Precision Compute


NVIDIA Pascal GP100 GPU: Information about NVIDIA's upcoming FinFET-based Pascal GPU has leaked, showing the chip's total compute performance. The information was brought up by 3DCenter. It concerns the Pascal-based Drive PX 2 module, an automotive supercomputer that uses the power of the GPU to drive cars autonomously. NVIDIA did not mention the flagship chip in the presentation, but more is expected to be revealed at GTC in April 2016.

The flagship chip, expected to be called GP100 (the official name is not yet known), would feature in graphics cards across the Tesla, Quadro, and GeForce lines. It is based on the 16nm FinFET process and supports double precision compute, with efficiency improvements and better performance per watt. Maxwell, NVIDIA's current-generation architecture, already delivered some serious gains in the performance per watt department.

The source of this information is a set of slides provided by iMacmatican, a Beyond3D forum member, who has given good details about the changes coming to NVIDIA's GPUs. Most of the slides show changes to be made to the GPU design. The approximate double precision GFLOPs per watt values for NVIDIA's CUDA GPU generations are given below:

  • Volta: 22
  • Pascal: 14
  • Kepler: 5.5
  • Fermi: 2
  • Tesla: 0.5


As the slides make clear, Volta is rated at 22 GFLOPs (giga floating point operations per second) per watt, while Pascal is rated at 14 GFLOPs per watt, which is more than twice the double precision GFLOPs per watt of Kepler.

For single precision, measured with single precision floating point General Matrix Multiply (SGEMM), Pascal is rated at 42 GFLOPs/W and Volta at 73 GFLOPs/W, while Maxwell is rated at 23 GFLOPs/W, with the dual-chip offering pushing that up to 25 GFLOPs/W. Pascal and the GPU generations that follow it will come with mixed precision compute, allowing users to get twice the compute performance in FP16 workloads compared to FP32 by computing at 16-bit, which is half the precision of FP32. In half precision, Pascal would reach up to 85 GFLOPs/W and Volta up to 145 GFLOPs/W, compared to Maxwell, which manages just 26 GFLOPs/W.
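
The per-watt figures above can be turned into a quick FP16-versus-FP32 comparison. The following is a minimal sketch that uses only the approximate slide values quoted in this article. The function names (`fp16_speedup`, `to_half`) are hypothetical, and the round trip through Python's `struct` module is only an illustration of what half precision storage does to a value, not a model of how the GPU computes.

```python
import struct

# Approximate slide values quoted in the article, in GFLOPs per watt.
SGEMM_PER_WATT = {"Maxwell": 23, "Pascal": 42, "Volta": 73}   # FP32
FP16_PER_WATT = {"Maxwell": 26, "Pascal": 85, "Volta": 145}   # FP16


def fp16_speedup(arch: str) -> float:
    """Ratio of FP16 to FP32 throughput per watt for one architecture."""
    return FP16_PER_WATT[arch] / SGEMM_PER_WATT[arch]


def to_half(value: float) -> float:
    """Round a Python float to IEEE 754 half precision and back."""
    return struct.unpack("<e", struct.pack("<e", value))[0]


for arch in SGEMM_PER_WATT:
    print(f"{arch}: FP16/FP32 per watt = {fp16_speedup(arch):.2f}x")

# Half precision keeps fewer significand bits, so 0.1 is stored less exactly.
print(f"0.1 stored as FP16: {to_half(0.1)!r}")
```

With these inputs, Pascal and Volta both come out at roughly 2x, while Maxwell gains only about 13% from FP16.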

According to the slides, NVIDIA claims to have integrated the memory (HBM2) as part of the actual GPU die. That leaves two possibilities: either NVIDIA has actually managed to integrate HBM2 and a 16nm GPU on the same die, or it is using a design similar to AMD's Fury cards, which place the GPU and HBM chips on a single interposer for a single-chip solution similar to an SoC.

What we know so far about Nvidia’s flagship Pascal GP100 GPU:

  • Pascal graphics architecture.
  • 2x performance per watt estimated improvement over Maxwell.
  • To launch in 2016, purportedly the second half of the year.
  • DirectX 12 feature level 12_1 or higher.
  • Successor to the GM200 GPU found in the GTX Titan X and GTX 980 Ti.
  • Built on the 16nm FinFET manufacturing process from TSMC.
  • Allegedly has a total of 17 billion transistors, more than twice that of GM200.
  • Will feature four 4-Hi HBM2 stacks for a total of 16GB of VRAM, and 8-Hi stacks for up to 32GB on the professional compute SKUs.
  • Features a 4096-bit memory bus interface, the same as the AMD Fiji GPU powering the Fury series.
  • Features NVLink (only compatible with next generation IBM PowerPC server processors)
  • Supports half precision FP16 compute at twice the rate of full precision FP32.
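
The memory figures in the list above can be cross-checked with a little arithmetic. The sketch below is illustrative only. The per-stack bus width (1024 bits), die capacity (8 Gbit) and per-pin data rate (2 Gbit/s) are assumptions taken from the published HBM2 specification, not from the leaked slides, and the function names are hypothetical.

```python
# Assumptions from the HBM2 specification, not from the article.
BUS_BITS_PER_STACK = 1024   # interface width of one HBM2 stack, in bits
GBIT_PER_DIE = 8            # capacity of one HBM2 DRAM die, in gigabits
GBPS_PER_PIN = 2.0          # peak data rate per pin, in gigabits per second


def capacity_gb(stacks: int, dies_per_stack: int) -> float:
    """Total VRAM in gigabytes for a given stack count and stack height."""
    return stacks * dies_per_stack * GBIT_PER_DIE / 8


def bus_width_bits(stacks: int) -> int:
    """Aggregate memory bus width in bits."""
    return stacks * BUS_BITS_PER_STACK


def bandwidth_gb_per_s(stacks: int) -> float:
    """Peak aggregate bandwidth in gigabytes per second."""
    return bus_width_bits(stacks) * GBPS_PER_PIN / 8


print(capacity_gb(4, 4))        # four 4-Hi stacks (consumer configuration)
print(capacity_gb(4, 8))        # four 8-Hi stacks (professional configuration)
print(bus_width_bits(4))        # aggregate bus width
print(bandwidth_gb_per_s(4))    # peak bandwidth
```

Under these assumptions, four stacks give 16 GB or 32 GB, a 4096-bit bus and 1024 GB/s, which matches the 1 TB/s quoted for Pascal.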

| GPU Architecture | NVIDIA Fermi | NVIDIA Kepler | NVIDIA Maxwell | NVIDIA Pascal |
|---|---|---|---|---|
| GPU Process | 40nm | 28nm | 28nm | 16nm (TSMC FinFET) |
| Flagship Chip | GF110 | GK210 | GM200 | GP100 |
| GPU Design | SM (Streaming Multiprocessor) | SMX (Streaming Multiprocessor) | SMM (Streaming Multiprocessor Maxwell) | TBA |
| Maximum Transistors | 3.00 Billion | 7.08 Billion | 8.00 Billion | Up to 17 Billion |
| Maximum Die Size | 520 mm² | 561 mm² | 601 mm² | TBA |
| Stream Processors Per Compute Unit | 32 SPs | 192 SPs | 128 SPs | TBA |
| Maximum CUDA Cores | 512 CCs (16 CUs) | 2880 CCs (15 CUs) | 3072 CCs (24 CUs) | TBA |
| Compute Performance | 1.6 TFLOPs | 5.1 TFLOPs | 6.1 TFLOPs | 12 TFLOPs |
| Maximum VRAM | 1.5 GB GDDR5 | 6 GB GDDR5 | 12 GB GDDR5 | 32 GB HBM2 |
| Maximum Bandwidth | 192 GB/s | 336 GB/s | 336 GB/s | 1 TB/s |
| Maximum TDP | 244W | 250W | 250W | 250W |
| Average Performance Increase over Predecessor | +45% (GTX 580 Versus GTX 285) | +55% (GTX Titan Black Versus GTX 580) | +30% (GTX Titan X Versus GTX Titan Black) | TBA |
| Flagship GPU Price (Consumer Only) | $499 US (GTX 580) | $999 US (GTX Titan Black) | $999 US (GTX Titan X) | TBA |
| Launch Year | 2010 (GTX 580) | 2014 (GTX Titan Black) | 2015 (GTX Titan X) | 2016 |

According to the slides, the NVIDIA Pascal GPU with stacked DRAM (1 TB/s) features up to 4 TFLOPs of double precision (FP64) and 12 TFLOPs of single precision (FP32) compute performance. The board, called Pascal-Solo, features just one GPU and has a 235W TDP. The Tesla GPU, with PCI-e active/passive cooling options, is expected to launch in Q2 2016.
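
As a rough sanity check, the peak figures and TDP quoted above can be combined into a peak throughput per watt. This is a sketch under stated assumptions: it divides peak throughput by the full board TDP, which is not the same measurement as the SGEMM-based per-watt values on the slides, so the two should not be expected to match. The helper name `gflops_per_watt` is hypothetical.

```python
# Figures quoted in the article for the single-GPU Pascal board.
FP32_TFLOPS = 12.0   # peak single precision throughput
FP64_TFLOPS = 4.0    # peak double precision throughput
TDP_WATTS = 235.0    # board TDP


def gflops_per_watt(tflops: float, watts: float) -> float:
    """Convert a peak TFLOPs figure and a TDP into GFLOPs per watt."""
    return tflops * 1000.0 / watts


fp32_per_watt = gflops_per_watt(FP32_TFLOPS, TDP_WATTS)
fp64_per_watt = gflops_per_watt(FP64_TFLOPS, TDP_WATTS)

print(f"FP32 peak: {fp32_per_watt:.1f} GFLOPs/W")
print(f"FP64 peak: {fp64_per_watt:.1f} GFLOPs/W")
print(f"FP32:FP64 throughput ratio: {FP32_TFLOPS / FP64_TFLOPS:.0f}:1")
```

The quoted numbers imply a 3:1 ratio between single and double precision throughput.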


| GPU Family | AMD Polaris | NVIDIA Pascal |
|---|---|---|
| Flagship GPU Name | TBC (Polaris Based) | NVIDIA GP100/GP200 |
| GPU Process | GloFo 14nm FinFET | TSMC 16nm FinFET |
| GPU Transistors | 15-18 Billion | ~17 Billion |
| HBM Memory (Consumers) | Up to 16 GB (SK Hynix) HBM2 | Up to 16 GB (SK Hynix/Samsung) HBM2 |
| HBM Memory (Dual-Chip Professional/HPC) | 32 GB (SK Hynix) HBM2 | 32 GB (SK Hynix/Samsung) HBM2 |
| HBM2 Bandwidth | 1 TB/s (Peak) | 1 TB/s (Peak) |
| Graphics Architecture | GCN 4.0 (Polaris) | Next-CUDA (Compute Oriented) |
| Successor of (GPU) | Fiji (Radeon 300/Fury) | GM200 (Maxwell) |


