NVIDIA makes some of the most powerful graphics accelerators on the market. These cards can accelerate massively parallel tasks and scientific (HPC) applications, or efficiently run artificial-intelligence (AI) algorithms. We've put together a guide to choosing an NVIDIA GPU card.
| GPU | L4 | A16 | A40 | L40S | A100 SXM4 / PCIe | H100 PCIe | H100 SXM5 | H200 SXM5 | B100 | B200 |
|---|---|---|---|---|---|---|---|---|---|---|
| Architecture | Ada Lovelace | Ampere | Ampere | Ada Lovelace | Ampere | Hopper | Hopper | Hopper | Blackwell | Blackwell |
| Card chip | AD104 | GA107 | GA102 | AD102 | GA100 | GH100 | GH100 | GH100 | B100 | B200 |
| # CUDA cores | 7 680 | 4x 1 280 | 10 752 | 18 176 | 6 912 | 14 592 | 16 896 | 16 896 | TBA | TBA |
| # Tensor cores | 240 | 4x 40 | 336 | 568 | 432 | 456 | 528 | 528 | TBA | TBA |
| FP64 (TFlops) | 0.49 | 0.271 | 1.179 | 1.413 | 9.69 | 26 | 34 | 34 | TBA | TBA |
| FP64 Tensor (TFlops) | — | — | — | — | 19.5 | 51 | 67 | 67 | 30 | 40 |
| FP32 (TFlops) | 30.3 | 4x 4.5 | 37.4 | 91.6 | 19.5 | 51 | 67 | 67 | TBA | TBA |
| TF32 Tensor (TFlops) | 120* | 4x 18* | 150* | 366* | 312* | 756* | 989* | 989* | 1 800 | 2 200 |
| FP16 Tensor (TFlops) | 242* | 4x 35.9* | 299* | 733* | 624* | 1 513* | 1 979* | 1 979* | 3 500 | 4 500 |
| INT8 Tensor (TOPS) | 485* | 4x 71.8* | 599* | 1 466* | 1 248* | 3 026* | 3 958* | 3 958* | 7 000 | 9 000 |
| GPU memory | 24 GB | 4x 16 GB | 48 GB | 48 GB | 40 / 80 GB | 80 GB | 80 GB | 141 GB | 192 GB | 192 GB |
| Memory technology | GDDR6 | GDDR6 | GDDR6 | GDDR6 | HBM2 / HBM2e | HBM2e | HBM3 | HBM3e | HBM3e | HBM3e |
| Memory throughput | 300 GB/s | 4x 200 GB/s | 696 GB/s | 864 GB/s | 1 935 / 2 039 GB/s | 2 TB/s | 3.35 TB/s | 4.8 TB/s | 8 TB/s | 8 TB/s |
| Multi-Instance GPU | vGPU | vGPU | vGPU | vGPU | 7 instances | 7 instances | 7 instances | 7 instances | TBA | TBA |
| NVENC / NVDEC / JPEG engines | 2 / 4 / 4 | 4 / 8 / — | 1 / 2 / — | 3 / 3 / 4 | 0 / 5 / 5 | 0 / 7 / 7 | 0 / 7 / 7 | 0 / 7 / 7 | TBA | TBA |
| GPU link | PCIe 4 | PCIe 4 | NVLink 3 | PCIe 4 | NVLink 3 | NVLink 4 | NVLink 4 | NVLink 4 | NVLink 5 | NVLink 5 |
| Power consumption | 40–72 W | 250 W | 300 W | 350 W | 400 W / 300 W | 350 W | 700 W | 700 W | 700 W | 1 000 W |
| Form factor | PCIe gen4 1-slot LP | PCIe gen4 2-slot FHFL | PCIe gen4 2-slot FHFL | PCIe gen4 2-slot FHFL | SXM4 / PCIe gen4 2-slot FHFL | PCIe gen5 2-slot FHFL | SXM5 | SXM5 | SXM | SXM |
| Announcement | 2023 | 2021 | 2020 | 2023 | 2020 | 2022 | 2022 | 2023 | 2024 | 2024 |

\* the stated performance is for calculations with sparse matrices (sparsity); for dense calculations, performance is half the stated values

\*\* the NVIDIA A100 PCIe achieves 90% of the stated computing power
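The sparsity footnote matters when comparing cards: the starred Tensor figures assume NVIDIA's 2:4 structured sparsity, which doubles the dense rate. A minimal sketch (plain Python; the sample values are taken from the table above) converts the sparse-quoted numbers back to their dense equivalents:

```python
# Sparse-quoted FP16 Tensor throughput (TFLOPS) from the table above.
# With 2:4 structured sparsity NVIDIA quotes double the dense rate,
# so the dense figure is simply half the starred value.
SPARSE_FP16_TFLOPS = {
    "L4": 242,
    "A100": 624,
    "H100 SXM5": 1979,
}

def dense_tflops(sparse_tflops: float) -> float:
    """Dense (non-sparse) throughput is half the sparsity-quoted number."""
    return sparse_tflops / 2

for card, sparse in SPARSE_FP16_TFLOPS.items():
    print(f"{card}: {sparse} TFLOPS sparse -> {dense_tflops(sparse):.1f} TFLOPS dense")
```

So, for example, the H100 SXM5's 1 979 TFLOPS sparse figure corresponds to roughly 989 TFLOPS for ordinary dense matrix math.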
GPU Accelerators for data centres
NVIDIA Tesla and Ampere graphics accelerators are designed to accelerate HPC applications and to deploy artificial-intelligence and deep-learning algorithms.
Key benefits of NVIDIA cards include dedicated Tensor cores for machine-learning applications and large memory (up to 80 GB per accelerator) protected by ECC. To let the accelerators communicate quickly with one another, NVIDIA connects them with a dedicated high-throughput interface, NVLink, which achieves transfer rates of up to 600 GB/s. In addition, the NVIDIA DGX A100 adds a powerful NVSwitch, which provides a total throughput of up to 4.8 TB/s between its eight NVIDIA Ampere A100 cards.
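A back-of-the-envelope sketch shows why that interconnect bandwidth matters. The 600 GB/s NVLink figure comes from the text above; the ~32 GB/s per-direction rate for a PCIe 4.0 x16 baseline is our own assumption for comparison:

```python
# Rough time to move an 80 GB payload (one full A100 80 GB memory's
# worth of data) between two GPUs over different interconnects.
def transfer_seconds(payload_gb: float, bandwidth_gbps: float) -> float:
    """Idealised transfer time: payload size divided by link bandwidth."""
    return payload_gb / bandwidth_gbps

PAYLOAD_GB = 80

links = {
    "PCIe 4.0 x16 (~32 GB/s per direction, assumed baseline)": 32,
    "NVLink (600 GB/s, per the text)": 600,
}

for name, bandwidth in links.items():
    print(f"{name}: {transfer_seconds(PAYLOAD_GB, bandwidth):.2f} s")
```

Under these idealised assumptions NVLink moves the same payload more than an order of magnitude faster, which is why multi-GPU training scales so much better over NVLink than over plain PCIe.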
Intersect360 Research analysis shows that most of the most widely used HPC applications already support NVIDIA cards. These include GROMACS, Ansys Fluent, Gaussian, VASP, NAMD, Abaqus, OpenFOAM, LS-DYNA, BLAST, Amber, GAMESS, ParaView, NASTRAN and many others. The wide adoption of NVIDIA accelerators has also been helped by support in deep-learning frameworks: TensorFlow, Caffe, PyTorch, MXNet, Chainer, Keras and many more.
The graph on the right shows how fast the graphics accelerator field is evolving, with a ninefold increase in performance in just four years. The figures are based on the average of benchmark results for the most widely used AI and HPC applications (Amber, Chroma, GROMACS, MILC, NAMD, PyTorch, Quantum Espresso, TensorFlow and VASP), measured on dual-socket servers with four P100, V100 or A100 accelerators each.
How to choose the best GPU?
The current GPGPU cards for data centres and their typical deployments are shown in the infographic.
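As a rough illustration of that decision process, here is a hypothetical helper (plain Python; the card figures are copied from the comparison table above, while the selection criteria are our own simplification, not an official NVIDIA sizing tool):

```python
# Minimal card-picking sketch: each entry holds memory capacity and
# FP64 throughput copied from the data-centre comparison table.
CARDS = {
    "L4":        {"memory_gb": 24, "fp64_tflops": 0.49},
    "L40S":      {"memory_gb": 48, "fp64_tflops": 1.41},
    "A100 80GB": {"memory_gb": 80, "fp64_tflops": 9.7},
    "H100 SXM5": {"memory_gb": 80, "fp64_tflops": 34},
}

def pick_card(min_memory_gb: float, min_fp64_tflops: float = 0.0) -> list[str]:
    """Return cards meeting both minimums, smallest memory first."""
    fits = [
        name for name, spec in CARDS.items()
        if spec["memory_gb"] >= min_memory_gb
        and spec["fp64_tflops"] >= min_fp64_tflops
    ]
    return sorted(fits, key=lambda name: CARDS[name]["memory_gb"])

# e.g. an FP64-heavy HPC code needing 40+ GB of GPU memory:
print(pick_card(min_memory_gb=40, min_fp64_tflops=5))
```

Real sizing also has to weigh interconnect, MIG/vGPU needs, power budget and price, but the same filter-then-rank pattern applies.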
NVIDIA visualization cards
NVIDIA RTX professional cards are designed primarily for graphics processing and simulation, machine learning, data analytics, and high-performance workstation virtualization.
Comparison of NVIDIA cards for visualization
Parameter | RTX A2000 | RTX A4000 | RTX A4500 | RTX A5000 | RTX A5500 | RTX A6000 | RTX 6000 Ada |
---|---|---|---|---|---|---|---|
Architecture | Ampere | Ampere | Ampere | Ampere | Ampere | Ampere | Ada Lovelace |
Card chip | GA106 | GA104 | GA102 | GA102 | GA102 | GA102 | AD102 |
# CUDA cores | 3 328 | 6 144 | 7 168 | 8 192 | 10 240 | 10 752 | 18 176 |
# Tensor cores | 104 | 192 | 224 | 256 | 320 | 336 | 568 |
FP64 (TFlops) | 0.124 | 0.6 | 0.739 | 0.87 | 1.085 | 1.25 | 1.423 |
FP32 (TFlops) | 8 | 19.2 | 23.65 | 27.7 | 34.1 | 40 | 91.1 |
FP16 Tensor (TFlops) | 63.9* | 153.4* | 189.2* | 222.2* | 272.8* | 309.7* | 728* |
GPU memory | 6 / 12 GB | 16 GB | 20 GB | 24 GB | 24 GB | 48 GB | 48GB |
Memory | GDDR6 | GDDR6 | GDDR6 | GDDR6 | GDDR6 | GDDR6 | GDDR6 |
Memory throughput | 288 GB/s | 448 GB/s | 640 GB/s | 768 GB/s | 768 GB/s | 768 GB/s | 960 GB/s |
ECC memory | ECC | ECC | ECC | ECC | ECC | ECC | ECC |
GPU link | PCIe gen4 | PCIe gen4 | NVLink 2-way | NVLink 2-way | NVLink 2-way | NVLink 2-way | PCIe gen4 |
Max. power consumption | 70 W | 140 W | 200 W | 230 W | 230 W | 300 W | 300W |
Form factor | PCIe gen4 | PCIe gen4 | PCIe gen4 | PCIe gen4 | PCIe gen4 | PCIe gen4 | PCIe gen4 |
For data centres** | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
Announcement year | 2021 | 2021 | 2021 | 2021 | 2022 | 2020 | 2022 |
* the stated performance is for calculations with sparse matrices (sparsity); for dense calculations, performance is half the stated values
Under NVIDIA's driver license terms (EULA), GeForce (GTX, RTX) graphics cards are not intended for data centres:
“No Datacenter Deployment. The SOFTWARE is not licensed for datacenter deployment, except that blockchain processing in a datacenter is permitted.”
source: https://www.nvidia.com/content/DriverDownload-March2009/licence.php?lang=us&type=GeForce
NVIDIA offers special pricing and project-specific programs on both GPU and DGX systems, plus support for educational institutions (EDUs) and start-ups.
Testing
To test the performance and especially the speed of deploying ML and AI applications, we have the NVIDIA DGX Station and, as part of the NVIDIA Test Drive program, the NVIDIA A100, NVIDIA A40, NVIDIA A10, Tesla V100 or Tesla T4 accelerators. If you are interested in our testing offer, please fill out this form.