
NVIDIA GPU Comparison for Data Centres
Parameter | NVIDIA A2 | NVIDIA A16 | NVIDIA A10 | NVIDIA A40 | NVIDIA A30 | NVIDIA A100 (SXM4 / PCIe**) | DGX Station A100 | DGX A100
---|---|---|---|---|---|---|---|---
Architecture | Ampere | Ampere | Ampere | Ampere | Ampere | Ampere | Ampere | Ampere
Card chip | GA107 | GA107 | GA102 | GA102 | GA100 | GA100 | GA100 | GA100
# CUDA cores | 1 280 | 4x 1 280 | 9 216 | 10 752 | 3 584 | 6 912 | 27 648 | 55 296
# Tensor cores | 40 | 4x 40 | 288 | 336 | 224 | 432 | 1 728 | 3 456
FP64 (TFlops) | 0.07 | 0.271 | 0.97 | 1.179 | 5.2 | 9.7 | 38.8 | 77.6
FP64 Tensor (TFlops) | — | — | — | — | 10.3 | 19.5 | 78 | 156
FP32 (TFlops) | 4.5 | 4x 4.5 | 31.2 | 37.4 | 10.3 | 19.5 | 78 | 156
TF32 Tensor (TFlops) | 18* | 4x 18* | 125* | 150* | 165* | 312* | 1 248* | 2 496*
FP16 Tensor (TFlops) | 35.9* | 4x 35.9* | 250* | 299* | 330* | 624* | 2 496* | 4 992*
INT8 Tensor (TOPS) | 71.8* | 4x 71.8* | 500* | 599* | 661* | 1 248* | 4 992* | 9 984*
INT4 Tensor (TOPS) | 144* | 4x 144* | 1 000* | 1 197* | 1 321* | 2 496* | 9 984* | 19 968*
GPU memory | 16 GB | 4x 16 GB | 24 GB | 48 GB | 24 GB | 40 GB / 80 GB | 160 / 320 GB | 320 / 640 GB
Multi-Instance GPU | vGPU mode | vGPU mode | vGPU mode | vGPU mode | 4 instances | 7 instances | 28 instances | 56 instances
Memory technology | GDDR6 | GDDR6 | GDDR6 | GDDR6 | HBM2 | HBM2 | HBM2 | HBM2
Memory throughput | 200 GB/s | 4x 200 GB/s | 600 GB/s | 696 GB/s | 933 GB/s | 1.5 / 2.0 TB/s | 1.5 / 2.0 TB/s | 1.5 / 2.0 TB/s
GPU link | PCIe 4 | PCIe 4 | PCIe 4 | NVLink | NVLink 3 | NVLink 3 | NVLink 3 | NVSwitch, non-blocking, 4.8 TB/s
Power consumption | 40-60 W | 250 W | 150 W | 300 W | 165 W | 400 W / 250 W | 1 500 W | 6.6 kW
Form factor | PCIe card | PCIe card | PCIe card | PCIe card | PCIe card | SXM4 / PCIe card | tower, water-cooled CPU and GPU | rack, 6U
PCIe generation | PCIe gen4 | PCIe gen4 | PCIe gen4 | PCIe gen4 | PCIe gen4 | PCIe gen4 | PCIe gen4 | PCIe gen4
Year of launch | 2021 | 2021 | 2021 | 2020 | 2021 | 2020 | 2020 | 2020
* the stated performance is for computations with sparse matrices (Sparsity); for standard (dense) computations, performance is half the stated values
** the NVIDIA A100 PCIe achieves roughly 90% of the stated computing performance
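The two footnotes above translate into simple arithmetic. A minimal sketch (the function names are illustrative, the figures are the A100 SXM4 values from the table):

```python
# The asterisked table figures assume sparsity; dense (standard) throughput
# is half of them. The A100 PCIe reaches roughly 90% of the SXM4 figures.
# Both rules come from the table footnotes.
sparse_tflops = {"TF32": 312, "FP16": 624}  # A100 SXM4, with sparsity

def dense_tflops(sparse):
    """Standard (dense) throughput: half the sparsity figure."""
    return sparse / 2

def pcie_tflops(sxm4):
    """A100 PCIe: roughly 90% of the corresponding SXM4 figure."""
    return 0.9 * sxm4

print(dense_tflops(sparse_tflops["TF32"]))                 # 156.0 TFlops dense TF32
print(pcie_tflops(dense_tflops(sparse_tflops["FP16"])))    # ~280.8 TFlops dense FP16 on PCIe
```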
GPU Accelerators for data centres
NVIDIA Tesla and Ampere graphics accelerators are designed to accelerate HPC applications and the deployment of artificial-intelligence and deep-learning algorithms.
Key benefits of NVIDIA cards include dedicated Tensor cores for machine-learning workloads and large memory (up to 80 GB per accelerator) protected by ECC technology. To let the accelerators communicate quickly with each other, NVIDIA connects them with a special high-throughput interface, NVLink, which achieves transfer rates of up to 600 GB/s. In addition, the NVIDIA DGX A100 offers an even more powerful NVSwitch, which provides a total throughput of up to 4.8 TB/s between its eight NVIDIA Ampere A100 cards.
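To see why the interconnect matters, consider the rough time to move a 10 GB payload (e.g. model weights) between GPUs over different links. The NVLink (600 GB/s) and NVSwitch (4.8 TB/s) figures come from the text; the PCIe 4.0 x16 figure of ~32 GB/s per direction is an assumed ballpark, and real transfers add latency and protocol overhead:

```python
# Back-of-the-envelope transfer times for a 10 GB payload over different
# GPU interconnects. NVLink and NVSwitch bandwidths are from the article;
# the PCIe 4.0 x16 value is an assumed approximation.
payload_gb = 10

links_gbps = {
    "PCIe 4.0 x16": 32,          # assumption: ~32 GB/s per direction
    "NVLink 3": 600,             # per the article
    "NVSwitch (DGX A100)": 4800, # per the article
}

for name, bandwidth in links_gbps.items():
    print(f"{name}: {payload_gb / bandwidth * 1000:.1f} ms")
```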
Intersect360 Research analysis shows that most of the most widely used HPC applications already support NVIDIA cards. These include GROMACS, Ansys Fluent, Gaussian, VASP, NAMD, Abaqus, OpenFOAM, LS-DYNA, BLAST, Amber, GAMESS, ParaView, NASTRAN and many others. The wide adoption of NVIDIA accelerators has also been helped by support in deep-learning frameworks: TensorFlow, Caffe, PyTorch, MXNet, Chainer, Keras and many more.
The graph on the right shows how fast the graphics accelerator field is evolving, with a ninefold increase in performance in just four years. The figures are based on the average of benchmark results for the most widely used AI and HPC applications (Amber, Chroma, GROMACS, MILC, NAMD, PyTorch, Quantum Espresso, TensorFlow and VASP), measured on dual-socket servers with four P100, V100 or A100 accelerators each.
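When averaging benchmark results across applications, as in the figures cited above, speedup ratios are usually combined with a geometric rather than an arithmetic mean. A small sketch with invented per-application speedups (the numbers below are hypothetical, not measured values):

```python
# Hypothetical per-application speedups of a newer GPU over a baseline.
# A geometric mean is the standard way to average speedup ratios, since
# it treats a 2x gain and a 0.5x loss symmetrically.
import math

speedups = {"app_a": 7.5, "app_b": 11.0, "app_c": 9.2}  # hypothetical values

geo_mean = math.exp(sum(math.log(s) for s in speedups.values()) / len(speedups))
print(f"average speedup: {geo_mean:.1f}x")
```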
How to choose the best GPU?
The current GPGPU cards for data centres and their typical deployments are summarized in the infographic.
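The choice usually comes down to a few parameters from the comparison table: memory capacity, double-precision throughput, and power budget. A minimal sketch of such a filter (the data is a subset of the table above; the function and its criteria are illustrative, not an official selection tool):

```python
# Filter cards from the comparison table by the requirements that usually
# drive the choice. Values are taken from the table above; the helper
# itself is a hypothetical illustration.
CARDS = [
    # (name, memory in GB, FP64 in TFlops, max power in W)
    ("NVIDIA A2", 16, 0.07, 60),
    ("NVIDIA A10", 24, 0.97, 150),
    ("NVIDIA A30", 24, 5.2, 165),
    ("NVIDIA A40", 48, 1.179, 300),
    ("NVIDIA A100 SXM4", 80, 9.7, 400),
]

def pick_cards(min_mem_gb=0, min_fp64_tflops=0.0, max_power_w=10_000):
    """Return card names meeting all the given requirements."""
    return [name for name, mem, fp64, power in CARDS
            if mem >= min_mem_gb and fp64 >= min_fp64_tflops and power <= max_power_w]

# Example: HPC code needing double precision within a 200 W power envelope.
print(pick_cards(min_fp64_tflops=5, max_power_w=200))  # ['NVIDIA A30']
```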
NVIDIA visualization cards
NVIDIA RTX professional cards are designed primarily for graphics processing and simulation, machine learning, data analytics, and high-performance workstation virtualization.
Comparison of Nvidia cards for visualization
Parameter | RTX 3080 | RTX 3090 | RTX A2000 | RTX A4000 | RTX A4500 | RTX A5000 | RTX A6000
---|---|---|---|---|---|---|---
Architecture | Ampere | Ampere | Ampere | Ampere | Ampere | Ampere | Ampere
Card chip | GA102 | GA102 | GA106 | GA104 | GA102 | GA102 | GA102
# CUDA cores | 8 704 | 10 496 | 3 328 | 6 144 | 7 168 | 8 192 | 10 752
# Tensor cores | 272 | 328 | 104 | 192 | 224 | 256 | 336
FP64 (TFlops) | 0.47 | 0.56 | 0.124 | 0.6 | 0.739 | 0.87 | 1.25
FP32 (TFlops) | 29.8 | 35.6 | 8 | 19.2 | 23.65 | 27.7 | 40
FP16 Tensor (TFlops) | 119 / 238* | 142 / 284* | 63.9* | 153.4* | 189.2* | 222.2* | 309.7*
GPU memory | 10 GB | 24 GB | 6 / 12 GB | 16 GB | 20 GB | 24 GB | 48 GB
Memory technology | GDDR6X | GDDR6X | GDDR6 | GDDR6 | GDDR6 | GDDR6 | GDDR6
Memory throughput | 760 GB/s | 936 GB/s | 288 GB/s | 448 GB/s | 640 GB/s | 768 GB/s | 768 GB/s
ECC memory | none | none | ECC | ECC | ECC | ECC | ECC
GPU link | PCIe gen4 | NVLink 2-way | PCIe gen4 | PCIe gen4 | NVLink 2-way | NVLink 2-way | NVLink 2-way
Max. power consumption | 320 W | 350 W | 70 W | 140 W | 200 W | 230 W | 300 W
Form factor | PCIe card | PCIe card | PCIe card | PCIe card | PCIe card | PCIe card | PCIe card
For data centres** | No | No | Yes | Yes | Yes | Yes | Yes
Announcement year | 2020 | 2020 | 2021 | 2021 | 2021 | 2021 | 2020
* the stated performance is for computations with sparse matrices (Sparsity); for standard (dense) computations, performance is half the stated values
** according to the NVIDIA driver licence terms (EULA), GeForce graphics cards (GTX, RTX) are not intended for data centres:
“No Datacenter Deployment. The SOFTWARE is not licensed for datacenter deployment, except that blockchain processing in a datacenter is permitted.”
Source: https://www.nvidia.com/content/DriverDownload-March2009/licence.php?lang=us&type=GeForce
NVIDIA offers special pricing and project-specific programs on both GPU and DGX systems, plus support for educational institutions (EDUs) and start-ups.
Testing
To test the performance, and especially the speed of deploying ML and AI applications, we have the NVIDIA DGX Station and, as part of the NVIDIA Test Drive program, the NVIDIA A100, NVIDIA A40, NVIDIA A10, Tesla V100 and Tesla T4 accelerators. If you are interested in our testing offer, please fill out this form.