
NVIDIA GPU Comparison for Data Centres
Parameter | NVIDIA A2 | NVIDIA A16 | NVIDIA A10 | NVIDIA A40 | NVIDIA A30 | NVIDIA A100 (SXM4 / PCIe**) | DGX Station A100 | DGX A100
---|---|---|---|---|---|---|---|---
Architecture | Ampere | Ampere | Ampere | Ampere | Ampere | Ampere | Ampere | Ampere
Card chip | GA107 | GA107 | GA102 | GA102 | GA100 | GA100 | GA100 | GA100
# CUDA cores | 1 280 | 4x 1 280 | 9 216 | 10 752 | 3 584 | 6 912 | 27 648 | 55 296
# Tensor cores | 40 | 4x 40 | 288 | 336 | 224 | 432 | 1 728 | 3 456
FP64 (TFlops) | 0.07 | 0.271 | 0.97 | 1.179 | 5.2 | 9.7 | 38.8 | 77.6
FP64 Tensor (TFlops) | — | — | — | — | 10.3 | 19.5 | 78 | 156
FP32 (TFlops) | 4.5 | 4x 4.5 | 31.2 | 37.4 | 10.3 | 19.5 | 78 | 156
TF32 Tensor (TFlops) | 18* | 4x 18* | 125* | 150* | 165* | 312* | 1 248* | 2 496*
FP16 Tensor (TFlops) | 35.9* | 4x 35.9* | 250* | 299* | 330* | 624* | 2 496* | 4 992*
INT8 Tensor (TOPS) | 71.8* | 4x 71.8* | 500* | 599* | 661* | 1 248* | 4 992* | 9 984*
INT4 Tensor (TOPS) | 144* | 4x 144* | 1 000* | 1 197* | 1 321* | 2 496* | 9 984* | 19 968*
GPU memory | 16 GB | 4x 16 GB | 24 GB | 48 GB | 24 GB | 40 GB / 80 GB | 160 / 320 GB | 320 / 640 GB
Multi-Instance GPU | vGPU mode | vGPU mode | vGPU mode | vGPU mode | 4 instances | 7 instances | 28 instances | 56 instances
Memory technology | GDDR6 | GDDR6 | GDDR6 | GDDR6 | HBM2 | HBM2 | HBM2 | HBM2
Memory throughput | 200 GB/s | 4x 200 GB/s | 600 GB/s | 696 GB/s | 933 GB/s | 1.5 / 2.0 TB/s | 1.5 / 2.0 TB/s | 1.5 / 2.0 TB/s
GPU link | PCIe 4 | PCIe 4 | PCIe 4 | NVLink | NVLink 3 | NVLink 3 | NVLink 3 | NVSwitch, non-blocking, 4.8 TB/s
Power consumption | 40-60 W | 250 W | 150 W | 300 W | 165 W | 400 W / 250 W | 1 500 W | 6.6 kW
Form factor | PCIe card | PCIe card | PCIe card | PCIe card | PCIe card | SXM4 / PCIe card | tower, water-cooled CPU and GPU | rack, 6U
PCIe generation | PCIe gen4 | PCIe gen4 | PCIe gen4 | PCIe gen4 | PCIe gen4 | PCIe gen4 | PCIe gen4 | PCIe gen4
Year of launch | 2021 | 2021 | 2021 | 2020 | 2021 | 2020 | 2020 | 2020
* the stated performance is for computations with sparse matrices (Sparsity); for standard (dense) computations, performance is half the stated values
** the NVIDIA A100 PCIe achieves roughly 90% of the stated computing performance
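The two footnotes above translate into simple arithmetic. A minimal sketch (the function names are illustrative, the figures are the A100 SXM4 values from the table):

```python
# The asterisked table figures assume sparsity; dense (standard) throughput
# is half of them. The A100 PCIe reaches roughly 90% of the SXM4 figures.
# Both rules come from the table footnotes.
sparse_tflops = {"TF32": 312, "FP16": 624}  # A100 SXM4, with sparsity

def dense_tflops(sparse):
    """Standard (dense) throughput: half the sparsity figure."""
    return sparse / 2

def pcie_tflops(sxm4):
    """A100 PCIe: roughly 90% of the corresponding SXM4 figure."""
    return 0.9 * sxm4

print(dense_tflops(sparse_tflops["TF32"]))                 # 156.0 TFlops dense TF32
print(pcie_tflops(dense_tflops(sparse_tflops["FP16"])))    # ~280.8 TFlops dense FP16 on PCIe
```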
GPU Accelerators for data centres
NVIDIA Tesla and Ampere graphics accelerators are designed to accelerate HPC applications and the deployment of artificial-intelligence and deep-learning algorithms.
Key benefits of NVIDIA cards include dedicated Tensor cores for machine-learning workloads and large memory (up to 80 GB per accelerator) protected by ECC technology. To let the accelerators communicate quickly with each other, NVIDIA connects them with a special high-throughput interface, NVLink, which achieves transfer rates of up to 600 GB/s. In addition, the NVIDIA DGX A100 offers an even more powerful NVSwitch, which provides a total throughput of up to 4.8 TB/s between its eight NVIDIA Ampere A100 cards.
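To see why the interconnect matters, consider the rough time to move a 10 GB payload (e.g. model weights) between GPUs over different links. The NVLink (600 GB/s) and NVSwitch (4.8 TB/s) figures come from the text; the PCIe 4.0 x16 figure of ~32 GB/s per direction is an assumed ballpark, and real transfers add latency and protocol overhead:

```python
# Back-of-the-envelope transfer times for a 10 GB payload over different
# GPU interconnects. NVLink and NVSwitch bandwidths are from the article;
# the PCIe 4.0 x16 value is an assumed approximation.
payload_gb = 10

links_gbps = {
    "PCIe 4.0 x16": 32,          # assumption: ~32 GB/s per direction
    "NVLink 3": 600,             # per the article
    "NVSwitch (DGX A100)": 4800, # per the article
}

for name, bandwidth in links_gbps.items():
    print(f"{name}: {payload_gb / bandwidth * 1000:.1f} ms")
```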
Intersect360 Research analysis shows that most of the most widely used HPC applications already support NVIDIA cards. These include GROMACS, Ansys Fluent, Gaussian, VASP, NAMD, Abaqus, OpenFOAM, LS-DYNA, BLAST, Amber, GAMESS, ParaView, NASTRAN and many others. The wide adoption of NVIDIA accelerators has also been helped by support in deep-learning frameworks: TensorFlow, Caffe, PyTorch, MXNet, Chainer, Keras and many more.
The graph on the right shows how fast the graphics accelerator field is evolving, with a ninefold increase in performance in just four years. The figures are based on the average of benchmark results for the most widely used AI and HPC applications (Amber, Chroma, GROMACS, MILC, NAMD, PyTorch, Quantum Espresso, TensorFlow and VASP), measured on dual-socket servers with four P100, V100 or A100 accelerators each.
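When averaging benchmark results across applications, as in the figures cited above, speedup ratios are usually combined with a geometric rather than an arithmetic mean. A small sketch with invented per-application speedups (the numbers below are hypothetical, not measured values):

```python
# Hypothetical per-application speedups of a newer GPU over a baseline.
# A geometric mean is the standard way to average speedup ratios, since
# it treats a 2x gain and a 0.5x loss symmetrically.
import math

speedups = {"app_a": 7.5, "app_b": 11.0, "app_c": 9.2}  # hypothetical values

geo_mean = math.exp(sum(math.log(s) for s in speedups.values()) / len(speedups))
print(f"average speedup: {geo_mean:.1f}x")
```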
How to choose the best GPU?
The current GPGPU cards for data centres and their typical deployments are summarized in the infographic.
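The choice usually comes down to a few parameters from the comparison table: memory capacity, double-precision throughput, and power budget. A minimal sketch of such a filter (the data is a subset of the table above; the function and its criteria are illustrative, not an official selection tool):

```python
# Filter cards from the comparison table by the requirements that usually
# drive the choice. Values are taken from the table above; the helper
# itself is a hypothetical illustration.
CARDS = [
    # (name, memory in GB, FP64 in TFlops, max power in W)
    ("NVIDIA A2", 16, 0.07, 60),
    ("NVIDIA A10", 24, 0.97, 150),
    ("NVIDIA A30", 24, 5.2, 165),
    ("NVIDIA A40", 48, 1.179, 300),
    ("NVIDIA A100 SXM4", 80, 9.7, 400),
]

def pick_cards(min_mem_gb=0, min_fp64_tflops=0.0, max_power_w=10_000):
    """Return card names meeting all the given requirements."""
    return [name for name, mem, fp64, power in CARDS
            if mem >= min_mem_gb and fp64 >= min_fp64_tflops and power <= max_power_w]

# Example: HPC code needing double precision within a 200 W power envelope.
print(pick_cards(min_fp64_tflops=5, max_power_w=200))  # ['NVIDIA A30']
```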
NVIDIA visualization cards
NVIDIA RTX professional cards are designed primarily for graphics processing and simulation, machine learning, data analytics, and high-performance workstation virtualization.
Comparison of Nvidia cards for visualization
Parameter | RTX 3080 | RTX 3090 | RTX A2000 | RTX A4000 | RTX A4500 | RTX A5000 | RTX A6000
---|---|---|---|---|---|---|---
Architecture | Ampere | Ampere | Ampere | Ampere | Ampere | Ampere | Ampere
Card chip | GA102 | GA102 | GA106 | GA104 | GA102 | GA102 | GA102
# CUDA cores | 8 704 | 10 496 | 3 328 | 6 144 | 7 168 | 8 192 | 10 752
# Tensor cores | 272 | 328 | 104 | 192 | 224 | 256 | 336
FP64 (TFlops) | 0.47 | 0.56 | 0.124 | 0.6 | 0.739 | 0.87 | 1.25
FP32 (TFlops) | 29.8 | 35.6 | 8 | 19.2 | 23.65 | 27.7 | 40
FP16 Tensor (TFlops) | 119 / 238* | 142 / 284* | 63.9* | 153.4* | 189.2* | 222.2* | 309.7*
GPU memory | 10 GB | 24 GB | 6 / 12 GB | 16 GB | 20 GB | 24 GB | 48 GB
Memory technology | GDDR6X | GDDR6X | GDDR6 | GDDR6 | GDDR6 | GDDR6 | GDDR6
Memory throughput | 760 GB/s | 936 GB/s | 288 GB/s | 448 GB/s | 640 GB/s | 768 GB/s | 768 GB/s
ECC memory | none | none | ECC | ECC | ECC | ECC | ECC
GPU link | PCIe gen4 | NVLink 2-way | PCIe gen4 | PCIe gen4 | NVLink 2-way | NVLink 2-way | NVLink 2-way
Max. power consumption | 320 W | 350 W | 70 W | 140 W | 200 W | 230 W | 300 W
Form factor | PCIe card | PCIe card | PCIe card | PCIe card | PCIe card | PCIe card | PCIe card
For data centres** | No | No | Yes | Yes | Yes | Yes | Yes
Announcement year | 2020 | 2020 | 2021 | 2021 | 2021 | 2021 | 2020
* the stated performance is for computations with sparse matrices (Sparsity); for standard (dense) computations, performance is half the stated values
** according to the NVIDIA driver licence terms (EULA), GeForce graphics cards (GTX, RTX) are not intended for data centres:
“No Datacenter Deployment. The SOFTWARE is not licensed for datacenter deployment, except that blockchain processing in a datacenter is permitted.”
Source: https://www.nvidia.com/content/DriverDownload-March2009/licence.php?lang=us&type=GeForce
NVIDIA offers special pricing and project-specific programs on both GPU and DGX systems, plus support for educational institutions (EDUs) and start-ups.
Testing
To test the performance, and especially the speed of deploying ML and AI applications, we have the NVIDIA DGX Station and, as part of the NVIDIA Test Drive program, the NVIDIA A100, NVIDIA A40, NVIDIA A10, Tesla V100 and Tesla T4 accelerators. If you are interested in our testing offer, please fill out this form.