NVIDIA makes some of the most powerful graphics accelerators on the market. These cards can accelerate massively parallel tasks and scientific (HPC) applications, or efficiently run artificial-intelligence (AI) algorithms. We've put together a guide to choosing an NVIDIA GPU card.
| GPU | L4 | A16 | A40 | L40S | A100 SXM4 / PCIe | H100 PCIe | H100 SXM5 | H200 SXM5 | B100 | B200 |
|---|---|---|---|---|---|---|---|---|---|---|
| Architecture | Ada Lovelace | Ampere | Ampere | Ada Lovelace | Ampere | Hopper | Hopper | Hopper | Blackwell | Blackwell |
| Card chip | AD104 | GA107 | GA102 | AD102 | GA100 | GH100 | GH100 | GH100 | B100 | B200 |
| # CUDA cores | 7 680 | 4x 1 280 | 10 752 | 18 176 | 6 912 | 14 592 | 16 896 | 16 896 | TBA | TBA |
| # Tensor cores | 240 | 4x 40 | 336 | 568 | 432 | 456 | 528 | 528 | TBA | TBA |
| FP64 (TFlops) | 0.49 | 0.271 | 1.179 | 1.413 | 9.69 | 26 | 34 | 34 | TBA | TBA |
| FP64 Tensor (TFlops) | — | — | — | — | 19.5 | 51 | 67 | 67 | 30 | 40 |
| FP32 (TFlops) | 30.3 | 4x 4.5 | 37.4 | 91.6 | 19.5 | 51 | 67 | 67 | TBA | TBA |
| TF32 Tensor (TFlops) | 120* | 4x 18* | 150* | 366* | 312* | 756* | 989* | 989* | 1 800 | 2 200 |
| FP16 Tensor (TFlops) | 242* | 4x 35.9* | 299* | 733* | 624* | 1 513* | 1 979* | 1 979* | 3 500 | 4 500 |
| INT8 Tensor (TOPS) | 485* | 4x 71.8* | 599* | 1 466* | 1 248* | 3 026* | 3 958* | 3 958* | 7 000 | 9 000 |
| GPU memory | 24 GB | 4x 16 GB | 48 GB | 48 GB | 40 / 80 GB | 80 GB | 80 GB | 141 GB | 192 GB | 192 GB |
| Memory technology | GDDR6 | GDDR6 | GDDR6 | GDDR6 | HBM2 / HBM2e | HBM2e | HBM3 | HBM3e | HBM3e | HBM3e |
| Memory throughput | 300 GB/s | 4x 200 GB/s | 696 GB/s | 864 GB/s | 1 935 / 2 039 GB/s | 2 TB/s | 3.35 TB/s | 4.8 TB/s | 8 TB/s | 8 TB/s |
| Multi-Instance GPU | vGPU | vGPU | vGPU | vGPU | 7 instances | 7 instances | 7 instances | 7 instances | TBA | TBA |
| NVENC / NVDEC / JPEG engines | 2 / 4 / 4 | 4 / 8 / — | 1 / 2 / — | 3 / 3 / 4 | 0 / 5 / 5 | 0 / 7 / 7 | 0 / 7 / 7 | 0 / 7 / 7 | TBA | TBA |
| GPU link | PCIe 4 | PCIe 4 | NVLink 3 | PCIe 4 | NVLink 3 | NVLink 4 | NVLink 4 | NVLink 4 | NVLink 5 | NVLink 5 |
| Power consumption | 40–72 W | 250 W | 300 W | 350 W | 400 W / 300 W | 350 W | 700 W | 700 W | 700 W | 1 000 W |
| Form factor | PCIe gen4 1-slot LP | PCIe gen4 2-slot FHFL | PCIe gen4 2-slot FHFL | PCIe gen4 2-slot FHFL | SXM4 / PCIe gen4 2-slot FHFL | PCIe gen5 2-slot FHFL | SXM5 | SXM5 | SXM | SXM |
| Announcement | 2023 | 2021 | 2020 | 2023 | 2020 | 2022 | 2022 | 2023 | 2024 | 2024 |

\* the stated performance is for calculations with sparse matrices (sparsity); for dense calculations, performance is half the stated values

\*\* the NVIDIA A100 PCIe achieves 90% of the stated computing power
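The sparsity footnote matters when comparing cards: the starred Tensor figures assume NVIDIA's 2:4 structured sparsity, which doubles the dense rate. A minimal sketch (plain Python; the sample values are taken from the table above) converts the sparse-quoted numbers back to their dense equivalents:

```python
# Sparse-quoted FP16 Tensor throughput (TFLOPS) from the table above.
# With 2:4 structured sparsity NVIDIA quotes double the dense rate,
# so the dense figure is simply half the starred value.
SPARSE_FP16_TFLOPS = {
    "L4": 242,
    "A100": 624,
    "H100 SXM5": 1979,
}

def dense_tflops(sparse_tflops: float) -> float:
    """Dense (non-sparse) throughput is half the sparsity-quoted number."""
    return sparse_tflops / 2

for card, sparse in SPARSE_FP16_TFLOPS.items():
    print(f"{card}: {sparse} TFLOPS sparse -> {dense_tflops(sparse):.1f} TFLOPS dense")
```

So, for example, the H100 SXM5's 1 979 TFLOPS sparse figure corresponds to roughly 989 TFLOPS for ordinary dense matrix math.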
GPU Accelerators for data centres
NVIDIA Tesla and Ampere graphics accelerators are designed to accelerate HPC applications and to deploy artificial-intelligence and deep-learning algorithms.
Key benefits of NVIDIA cards include dedicated Tensor cores for machine-learning applications and large memory (up to 80 GB per accelerator) protected by ECC. To let the accelerators communicate quickly with one another, NVIDIA connects them with a dedicated high-throughput interface, NVLink, which achieves transfer rates of up to 600 GB/s. In addition, the NVIDIA DGX A100 adds a powerful NVSwitch, which provides a total throughput of up to 4.8 TB/s between its eight NVIDIA Ampere A100 cards.
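A back-of-the-envelope sketch shows why that interconnect bandwidth matters. The 600 GB/s NVLink figure comes from the text above; the ~32 GB/s per-direction rate for a PCIe 4.0 x16 baseline is our own assumption for comparison:

```python
# Rough time to move an 80 GB payload (one full A100 80 GB memory's
# worth of data) between two GPUs over different interconnects.
def transfer_seconds(payload_gb: float, bandwidth_gbps: float) -> float:
    """Idealised transfer time: payload size divided by link bandwidth."""
    return payload_gb / bandwidth_gbps

PAYLOAD_GB = 80

links = {
    "PCIe 4.0 x16 (~32 GB/s per direction, assumed baseline)": 32,
    "NVLink (600 GB/s, per the text)": 600,
}

for name, bandwidth in links.items():
    print(f"{name}: {transfer_seconds(PAYLOAD_GB, bandwidth):.2f} s")
```

Under these idealised assumptions NVLink moves the same payload more than an order of magnitude faster, which is why multi-GPU training scales so much better over NVLink than over plain PCIe.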
Intersect360 Research analysis shows that most of the most widely used HPC applications already support NVIDIA cards. These include GROMACS, Ansys Fluent, Gaussian, VASP, NAMD, Abaqus, OpenFOAM, LS-DYNA, BLAST, Amber, GAMESS, ParaView, NASTRAN and many others. The wide adoption of NVIDIA accelerators has also been helped by support in deep-learning frameworks: TensorFlow, Caffe, PyTorch, MXNet, Chainer, Keras and many more.
The graph on the right shows how fast the graphics accelerator field is evolving, with a ninefold increase in performance in just four years. The figures are based on the average of benchmark results for the most widely used AI and HPC applications (Amber, Chroma, GROMACS, MILC, NAMD, PyTorch, Quantum Espresso, TensorFlow and VASP), measured on dual-socket servers with four P100, V100 or A100 accelerators each.
How to choose the best GPU?
The current GPGPU cards for data centres and their typical deployments are shown in the infographic.
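As a rough illustration of that decision process, here is a hypothetical helper (plain Python; the card figures are copied from the comparison table above, while the selection criteria are our own simplification, not an official NVIDIA sizing tool):

```python
# Minimal card-picking sketch: each entry holds memory capacity and
# FP64 throughput copied from the data-centre comparison table.
CARDS = {
    "L4":        {"memory_gb": 24, "fp64_tflops": 0.49},
    "L40S":      {"memory_gb": 48, "fp64_tflops": 1.41},
    "A100 80GB": {"memory_gb": 80, "fp64_tflops": 9.7},
    "H100 SXM5": {"memory_gb": 80, "fp64_tflops": 34},
}

def pick_card(min_memory_gb: float, min_fp64_tflops: float = 0.0) -> list[str]:
    """Return cards meeting both minimums, smallest memory first."""
    fits = [
        name for name, spec in CARDS.items()
        if spec["memory_gb"] >= min_memory_gb
        and spec["fp64_tflops"] >= min_fp64_tflops
    ]
    return sorted(fits, key=lambda name: CARDS[name]["memory_gb"])

# e.g. an FP64-heavy HPC code needing 40+ GB of GPU memory:
print(pick_card(min_memory_gb=40, min_fp64_tflops=5))
```

Real sizing also has to weigh interconnect, MIG/vGPU needs, power budget and price, but the same filter-then-rank pattern applies.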
NVIDIA visualization cards
NVIDIA RTX professional cards are designed primarily for graphics processing and simulation, machine learning, data analytics, and high-performance workstation virtualization.
Comparison of NVIDIA cards for visualization
Parameter | RTX A2000 | RTX A4000 | RTX A4500 | RTX A5000 | RTX A5500 | RTX A6000 | RTX 6000 Ada |
---|---|---|---|---|---|---|---|
Architecture | Ampere | Ampere | Ampere | Ampere | Ampere | Ampere | Ada Lovelace |
Card chip | GA106 | GA104 | GA102 | GA102 | GA102 | GA102 | AD102 |
# CUDA cores | 3 328 | 6 144 | 7 168 | 8 192 | 10 240 | 10 752 | 18 176 |
# Tensor cores | 104 | 192 | 224 | 256 | 320 | 336 | 568 |
FP64 (TFlops) | 0.124 | 0.6 | 0.739 | 0.87 | 1.085 | 1.25 | 1.423 |
FP32 (TFlops) | 8 | 19.2 | 23.65 | 27.7 | 34.1 | 40 | 91.1 |
FP16 Tensor (TFlops) | 63.9* | 153.4* | 189.2* | 222.2* | 272.8* | 309.7* | 728* |
GPU memory | 6 / 12 GB | 16 GB | 20 GB | 24 GB | 24 GB | 48 GB | 48GB |
Memory | GDDR6 | GDDR6 | GDDR6 | GDDR6 | GDDR6 | GDDR6 | GDDR6 |
Memory throughput | 288 GB/s | 448 GB/s | 640 GB/s | 768 GB/s | 768 GB/s | 768 GB/s | 960 GB/s |
ECC memory | ECC | ECC | ECC | ECC | ECC | ECC | ECC |
GPU link | PCIe gen4 | PCIe gen4 | NVLink 2-way | NVLink 2-way | NVLink 2-way | NVLink 2-way | PCIe gen4 |
Max. power consumption | 70 W | 140 W | 200 W | 230 W | 230 W | 300 W | 300W |
Form factor | PCIe gen4 | PCIe gen4 | PCIe gen4 | PCIe gen4 | PCIe gen4 | PCIe gen4 | PCIe gen4 |
For data centres** | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
Announcement year | 2021 | 2021 | 2021 | 2021 | 2022 | 2020 | 2022 |
* the stated performance is for calculations with sparse matrices (sparsity); for dense calculations, performance is half the stated values
Under NVIDIA's driver license terms (EULA), GeForce (GTX, RTX) graphics cards are not intended for data centres:
“No Datacenter Deployment. The SOFTWARE is not licensed for datacenter deployment, except that blockchain processing in a datacenter is permitted.”
source: https://www.nvidia.com/content/DriverDownload-March2009/licence.php?lang=us&type=GeForce
NVIDIA offers special pricing and project-specific programs on both GPU and DGX systems, plus support for educational institutions (EDUs) and start-ups.
Testing
To test the performance and especially the speed of deploying ML and AI applications, we have the NVIDIA DGX Station and, as part of the NVIDIA Test Drive program, the NVIDIA A100, NVIDIA A40, NVIDIA A10, Tesla V100 or Tesla T4 accelerators. If you are interested in our testing offer, please fill out this form.