NVIDIA Pascal GP100 GPU
Leaked information about NVIDIA's upcoming FinFET-based Pascal GPU gives a first look at the chip's total compute performance. The information was brought to light by 3DCenter. It concerns the Pascal-based Drive PX 2 module, an automotive supercomputer that uses GPU compute power for self-driving cars. NVIDIA did not mention the flagship chip in the presentation, but more is expected to be revealed at GTC in April 2016.
The flagship chip, expected to be called GP100 (the official name is not yet known), would feature in graphics cards across the Tesla, Quadro, and GeForce lines. It is built on the 16nm FinFET process, with efficiency improvements, better performance per watt, and double precision compute. Maxwell, NVIDIA's current-generation architecture, already posted serious gains in the performance per watt department.
The information comes from slides posted by iMacmatican, a Beyond3D forum member, who has given good details about the changes across NVIDIA's GPU generations. Most of the slides show design changes to be made to the GPU. The approximate double precision GFLOPs per watt values for NVIDIA's CUDA generations of GPUs are given below:
- Volta: 22
- Pascal: 14
- Kepler: 5.5
- Fermi: 2
- Tesla: 0.5
As the slides make clear, Volta is rated at 22 giga floating point operations per second (GFLOPs) per watt, while Pascal is rated at 14 GFLOPs per watt, more than twice the 5.5 double precision GFLOPs per watt listed for Kepler.
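To make the generational jump concrete, the ratios between the listed values can be worked out directly. The following is a minimal sketch that uses only the approximate figures from the list above; the dictionary and function names are illustrative and not taken from the slides.

```python
# Approximate double precision GFLOPs per watt, as listed above from the slides.
DP_GFLOPS_PER_WATT = {
    "Tesla": 0.5,
    "Fermi": 2.0,
    "Kepler": 5.5,
    "Pascal": 14.0,
    "Volta": 22.0,
}


def efficiency_ratio(newer: str, older: str) -> float:
    """Return how many times more DP GFLOPs/W `newer` delivers than `older`."""
    return DP_GFLOPS_PER_WATT[newer] / DP_GFLOPS_PER_WATT[older]


for newer, older in [("Pascal", "Kepler"), ("Volta", "Pascal"), ("Volta", "Kepler")]:
    print(f"{newer} vs {older}: {efficiency_ratio(newer, older):.2f}x")
```

On these numbers Pascal lands at roughly 2.5 times Kepler's efficiency, and Volta adds about another 1.6 times on top of Pascal.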
For single precision, measured with single precision general matrix multiply (SGEMM), Maxwell is rated at 23 GFLOPs/W, with the dual-chip offering pushing that up to 25 GFLOPs/W, while Pascal is rated at 42 GFLOPs/W and Volta at 73 GFLOPs/W. Pascal and the generations that follow will support mixed precision compute, allowing users to get twice the compute throughput in FP16 workloads compared to FP32 by computing at 16 bits, which is half the precision of FP32. At half precision, Pascal would reach up to 85 GFLOPs/W and Volta up to 145 GFLOPs/W, compared to just 26 GFLOPs/W for Maxwell.
According to the slides, NVIDIA claims to have integrated the HBM2 memory as part of the actual GPU die. That leaves two possibilities: either NVIDIA has actually managed to integrate HBM2 and a 16nm GPU on the same die, or it is using a design similar to AMD's Fury cards, which place the GPU and HBM chips on a single interposer for a single-package solution similar to an SoC.
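The FP16 trade-off has two sides: more throughput per watt, and fewer significant bits per value. The sketch below illustrates both on the CPU, using the GFLOPs/W figures quoted above and Python's standard `struct` module to round values to IEEE 754 half and single precision. The helper names are illustrative, and this only emulates the number formats; it says nothing about how the GPU hardware implements them.

```python
import struct


def round_trip(value: float, fmt: str) -> float:
    """Store `value` in the given struct format and read it back as a float."""
    return struct.unpack(fmt, struct.pack(fmt, value))[0]


def to_fp16(value: float) -> float:
    return round_trip(value, "<e")  # IEEE 754 binary16 (half precision)


def to_fp32(value: float) -> float:
    return round_trip(value, "<f")  # IEEE 754 binary32 (single precision)


# Throughput side: GFLOPs/W quoted from the slides (SGEMM vs half precision).
SLIDE_GFLOPS_PER_WATT = {
    "Maxwell": {"fp32": 23, "fp16": 26},
    "Pascal": {"fp32": 42, "fp16": 85},
    "Volta": {"fp32": 73, "fp16": 145},
}


def fp16_speedup(arch: str) -> float:
    """Ratio of half precision to single precision GFLOPs/W for one architecture."""
    figures = SLIDE_GFLOPS_PER_WATT[arch]
    return figures["fp16"] / figures["fp32"]


for arch in SLIDE_GFLOPS_PER_WATT:
    print(f"{arch}: FP16 is {fp16_speedup(arch):.2f}x FP32 per watt")

# Precision side: FP16 keeps an 11-bit significand, so 2049 cannot be stored exactly.
for x in (0.1, 2049.0):
    print(f"{x}: fp16={to_fp16(x)!r} fp32={to_fp32(x)!r}")
```

The ratios come out close to 2x for Pascal and Volta but only slightly above 1x for Maxwell, which matches the claim that the doubled FP16 rate arrives with Pascal.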
What we know so far about Nvidia’s flagship Pascal GP100 GPU:
- Pascal graphics architecture.
- 2x performance per watt estimated improvement over Maxwell.
- To launch in 2016, purportedly the second half of the year.
- DirectX 12 feature level 12_1 or higher.
- Successor to the GM200 GPU found in the GTX Titan X and GTX 980 Ti.
- Built on the 16nm FinFET manufacturing process from TSMC.
- Allegedly has a total of 17 billion transistors, more than twice that of GM200.
- Will feature four 4-Hi HBM2 stacks for a total of 16GB of VRAM, and 8-Hi stacks for up to 32GB on the professional compute SKUs.
- Features a 4096-bit memory bus interface, the same as AMD's Fiji GPU powering the Fury series.
- Features NVLink (only compatible with next generation IBM PowerPC server processors)
- Supports half precision FP16 compute at twice the rate of full precision FP32.
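The memory figures in this list can be cross-checked with some simple arithmetic. The sketch below derives the capacity per DRAM die, the bus width per stack, and the per-pin data rate implied by the 1 TB/s bandwidth figure quoted elsewhere in this article. It assumes that "1 TB/s" means 10^12 bytes per second and that the bus is split evenly across the four stacks; the function names are illustrative.

```python
STACKS = 4
BUS_WIDTH_BITS = 4096
# Assumption: the quoted "1 TB/s" peak bandwidth means 10**12 bytes per second.
PEAK_BANDWIDTH_BYTES_PER_S = 1e12


def capacity_per_die_gb(total_gb: float, stacks: int, dies_per_stack: int) -> float:
    """Capacity each DRAM die must provide to reach the total VRAM size."""
    return total_gb / (stacks * dies_per_stack)


def bus_bits_per_stack(bus_width_bits: int, stacks: int) -> float:
    """Interface width per stack, assuming the bus is split evenly."""
    return bus_width_bits / stacks


def per_pin_gbps(bandwidth_bytes_per_s: float, bus_width_bits: int) -> float:
    """Data rate each bus pin must sustain, in gigabits per second."""
    return bandwidth_bytes_per_s * 8 / bus_width_bits / 1e9


print("4-Hi, 16GB:", capacity_per_die_gb(16, STACKS, 4), "GB per die")
print("8-Hi, 32GB:", capacity_per_die_gb(32, STACKS, 8), "GB per die")
print("Bus per stack:", bus_bits_per_stack(BUS_WIDTH_BITS, STACKS), "bits")
print("Per-pin rate:", per_pin_gbps(PEAK_BANDWIDTH_BYTES_PER_S, BUS_WIDTH_BITS), "Gbps")
```

Both configurations work out to the same 1GB per die, so the 32GB professional parts would double capacity purely by stacking twice as many dies.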
|GPU Architecture||NVIDIA Fermi||NVIDIA Kepler||NVIDIA Maxwell||NVIDIA Pascal|
|GPU Process||40nm||28nm||28nm||16nm (TSMC FinFET)|
|GPU Design||SM (Streaming Multiprocessor)||SMX (Streaming Multiprocessor)||SMM (Streaming Multiprocessor Maxwell)||TBA|
|Maximum Transistors||3.00 Billion||7.08 Billion||8.00 Billion||Up to 17 Billion|
|Maximum Die Size||520 mm²||561 mm²||601 mm²||TBA|
|Stream Processors Per Compute Unit||32 SPs||192 SPs||128 SPs||TBA|
|Maximum CUDA Cores||512 CCs (16 CUs)||2880 CCs (15 CUs)||3072 CCs (24 CUs)||TBA|
|Compute Performance||1.6 TFLOPs||5.1 TFLOPs||6.1 TFLOPs||12 TFLOPs|
|Maximum VRAM||1.5 GB GDDR5||6 GB GDDR5||12 GB GDDR5||32 GB HBM2|
|Maximum Bandwidth||192 GB/s||336 GB/s||336 GB/s||1 TB/s|
|Average Performance Increase over Predecessor||+45% (GTX 580 Versus GTX 285)||+55% (GTX Titan Black Versus GTX 580)||+30% (GTX Titan X Versus GTX Titan Black)||TBA|
|Flagship GPU Price (Consumer Only)||$499 US (GTX 580)||$999 US (GTX Titan Black)||$999 US (GTX Titan X)||TBA|
|Launch Year||2010 (GTX 580)||2014 (GTX Titan Black)||2015 (GTX Titan X)||2016|
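For a quick sense of scale, the Pascal column can be divided by the Maxwell column. This is only a sketch over the figures in the table above; the dictionary keys are illustrative, and 1 TB/s is treated as 1000 GB/s.

```python
# Peak figures copied from the Maxwell and Pascal columns of the table above.
MAXWELL = {
    "transistors_billion": 8.0,
    "compute_tflops": 6.1,
    "vram_gb": 12,
    "bandwidth_gb_s": 336,
}
PASCAL = {
    "transistors_billion": 17.0,
    "compute_tflops": 12.0,
    "vram_gb": 32,
    "bandwidth_gb_s": 1000,  # assumption: 1 TB/s taken as 1000 GB/s
}


def generational_ratios(new: dict, old: dict) -> dict:
    """Divide each figure of the newer architecture by the older one."""
    return {key: new[key] / old[key] for key in new}


ratios = generational_ratios(PASCAL, MAXWELL)
for key, value in ratios.items():
    print(f"{key}: {value:.2f}x")
```

Bandwidth shows the largest jump at close to 3x, which is what moving from GDDR5 to HBM2 would be expected to deliver on these numbers.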
According to the slides, the NVIDIA Pascal GPU with stacked DRAM (1 TB/s) features up to 4 TFLOPs of double precision (FP64) and 12 TFLOPs of single precision (FP32) compute performance. The configuration, called Pascal-Solo, features just one GPU and has a 235W TDP. The Tesla GPU, with PCI-e active and passive cooling options, is expected to launch in Q2 2016.
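Dividing these peak throughput figures by the 235W TDP gives a rough performance per watt estimate for the card. The sketch below is an upper-bound calculation from the numbers in the paragraph above, so its results are not directly comparable to the per-watt values quoted from the slides earlier; the names are illustrative.

```python
TDP_WATTS = 235
PEAK_TFLOPS = {"fp64": 4.0, "fp32": 12.0}


def peak_gflops_per_watt(tflops: float, tdp_watts: float) -> float:
    """Peak throughput in GFLOPs divided by board power in watts."""
    return tflops * 1000 / tdp_watts


# Ratio of double precision to single precision peak throughput.
fp64_to_fp32 = PEAK_TFLOPS["fp64"] / PEAK_TFLOPS["fp32"]

for precision, tflops in PEAK_TFLOPS.items():
    print(f"{precision}: {peak_gflops_per_watt(tflops, TDP_WATTS):.1f} GFLOPs/W")
print(f"FP64 runs at {fp64_to_fp32:.2f} of the FP32 rate")
```

On these figures, double precision runs at one third of the single precision rate, with roughly 17 and 51 peak GFLOPs per watt respectively.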
|GPU Family||AMD Polaris||NVIDIA Pascal|
|Flagship GPU||Name TBC (Polaris Based)||NVIDIA GP100/GP200|
|GPU Process||GloFo 14nm FinFET||TSMC 16nm FinFET|
|GPU Transistors||15-18 Billion||~17 Billion|
|HBM Memory (Consumers)||Up to 16 GB (SK Hynix) HBM2||Up to 16 GB (SK Hynix/Samsung) HBM2|
|HBM Memory (Dual-Chip Professional/ HPC)||32 GB (SK Hynix) HBM2||32 GB (SK Hynix/Samsung) HBM2|
|HBM2 Bandwidth||1 TB/s (Peak)||1 TB/s (Peak)|
|Graphics Architecture||GCN 4.0 (Polaris)||Next-CUDA (Compute Oriented)|
|Successor of (GPU)||Fiji (Radeon 300/Fury)||GM200 (Maxwell)|