Hardware
Let’s take a look at NVIDIA DGX systems in detail, first from a hardware point of view.
| | DGX Station A100 | DGX A100 | DGX H100 | DGX B200 | DGX B100 |
|---|---|---|---|---|---|
| GPU | 4× NVIDIA A100 80 GB | 8× NVIDIA A100 40 GB/80 GB | 8× NVIDIA H100 80 GB | 8× NVIDIA B200 | 8× NVIDIA B100 |
| Performance (tensor operations) | 2.5 petaFLOPS | 5 petaFLOPS | 32 petaFLOPS (FP8) | 72 petaFLOPS | 56 petaFLOPS |
| Total GPU memory | 160/320 GB HBM2 | 320/640 GB HBM2 | 640 GB HBM3 | up to 1,536 GB HBM3e | up to 1,536 GB HBM3e |
| CPU | AMD Rome 7742, 2.25 GHz (64 cores) | 2× AMD Rome 7742, 2.25 GHz (64 cores) | 2× 56-core 4th Gen Intel Xeon Scalable | up to 2× Intel Xeon Platinum 8570, 2.1 GHz (56 cores) | up to 2× Intel Xeon Platinum 8570, 2.1 GHz (56 cores) |
| NVIDIA CUDA cores | 27,648 | 55,296 | 135,168 | TBA | TBA |
| NVIDIA Tensor cores | 1,728 (3rd gen) | 3,456 (3rd gen) | 4,224 (4th gen) | TBA | TBA |
| Multi-Instance GPU | up to 28 instances | up to 56 instances | up to 56 instances | up to 56 instances | up to 56 instances |
| System RAM | 512 GB | 1 TB | 2 TB | up to 4 TB | up to 4 TB |
| Storage (NVMe SSD) | OS: 1× 1.92 TB M.2; data: 1× 7.68 TB U.2 | OS: 2× 1.92 TB M.2; data: 4× 3.84 TB (15 TB) U.2 | OS: 2× 1.9 TB M.2; data: 8× 3.84 TB U.2 | OS: 2× 1.9 TB M.2; data: 8× 3.84 TB U.2 | OS: 2× 1.9 TB M.2; data: 8× 3.84 TB U.2 |
| GPU interconnect | NVLink | 6× NVIDIA NVSwitch (4.8 TB/s), 12× NVLink/GPU (600 GB/s) | 4× NVIDIA NVSwitch (7.2 TB/s), 18× NVLink/GPU (900 GB/s) | 36× NVLink/GPU (1.8 TB/s) | 36× NVLink/GPU (1.8 TB/s) |
| Network | 2× 10 GbE | 8× single-port ConnectX-6 (200 Gb/s HDR InfiniBand), 1× dual-port 200 Gb/s Ethernet | 8× single-port ConnectX-7 VPI (400 Gb/s InfiniBand / 200 Gb/s Ethernet), 2× dual-port ConnectX-7 VPI (400 Gb/s InfiniBand / 200 Gb/s Ethernet) | 4× OSFP for 8× single-port NVIDIA ConnectX-7 VPI (400 Gb/s InfiniBand/Ethernet), 2× dual-port QSFP112 NVIDIA BlueField-3 DPUs (400 Gb/s InfiniBand/Ethernet) | 4× OSFP for 8× single-port NVIDIA ConnectX-7 VPI (400 Gb/s InfiniBand/Ethernet), 2× dual-port QSFP112 NVIDIA BlueField-3 DPUs (400 Gb/s InfiniBand/Ethernet) |
| Power consumption | 1,500 W | 6.6 kW | ~10.2 kW max | ~14.3 kW max | ~12.2 kW max |
| Form factor | tower, water-cooled CPU and GPU | rack, 6U | rack, 8U | rack, 10U | rack, 10U |
All NVIDIA DGX systems are equipped with the latest and fastest accelerators. The standard configuration includes eight GPUs, allowing NVLink to get the most out of the architecture with speeds of up to 8 TB/s. All NVIDIA GPUs include dedicated Tensor Cores that efficiently accelerate the computations used in training machine learning and artificial intelligence models. Fast and efficient training is also aided by the large, fast graphics memory, which in the case of the DGX B200 can reach up to 1,536 GB of HBM3e.
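To make this concrete, here is a minimal PyTorch sketch of how Tensor Cores are typically engaged through automatic mixed precision. The toy model and sizes are illustrative assumptions, not part of the DGX stack itself, and a CUDA-capable GPU is assumed:

```python
# Minimal sketch: automatic mixed precision in PyTorch runs eligible
# matrix multiplications in FP16/BF16 on the Tensor Cores.
# The model and tensor sizes below are toy examples.
import torch

model = torch.nn.Linear(4096, 4096).cuda()       # toy model on the GPU
opt = torch.optim.SGD(model.parameters(), lr=0.01)
scaler = torch.cuda.amp.GradScaler()             # rescales loss to avoid FP16 underflow

x = torch.randn(64, 4096, device="cuda")
with torch.cuda.amp.autocast():                  # matmuls dispatch to Tensor Cores
    loss = model(x).square().mean()

scaler.scale(loss).backward()
scaler.step(opt)
scaler.update()
```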
How to effectively leverage Multi-GPU systems?
This is one of the questions we hear most often from our customers. There are several techniques you can use, as described in our webinar on the multi-GPU topic. You can also attend the Fundamentals of Deep Learning for Multi-GPUs workshop that we organize together with the NVIDIA Deep Learning Institute (DLI); a minimal sketch of one common technique follows below.
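As a taste of one such technique, below is a minimal data-parallel training sketch using PyTorch’s DistributedDataParallel. The toy model, sizes, and the `torchrun` launch line are illustrative assumptions, not the workshop’s exact material:

```python
# Minimal data-parallel training sketch. Launch with, for example:
#   torchrun --nproc_per_node=8 train.py
# so that each DGX GPU runs one process.
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    dist.init_process_group("nccl")             # NCCL backend uses NVLink/NVSwitch
    local_rank = int(os.environ["LOCAL_RANK"])  # set by torchrun
    torch.cuda.set_device(local_rank)

    model = torch.nn.Linear(1024, 1024).cuda()  # toy model for illustration
    model = DDP(model, device_ids=[local_rank]) # gradients sync across GPUs
    opt = torch.optim.SGD(model.parameters(), lr=0.01)

    for step in range(10):
        x = torch.randn(32, 1024, device="cuda")
        loss = model(x).square().mean()
        opt.zero_grad()
        loss.backward()                         # triggers the all-reduce
        opt.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```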
Software
More interesting, however, is the already mentioned software package offered with NVIDIA DGX machines. All of them provide pre-installed, performance-tuned environments for machine learning (e.g. Caffe/Caffe2, Theano, TensorFlow, PyTorch, and MXNet) as well as an intuitive environment for data analysts (NVIDIA DIGITS). All of this is elegantly packaged in Docker containers. Such a tuned environment delivers up to 30% more performance for machine learning applications compared with the same applications deployed on plain, untuned NVIDIA hardware. The main advantage of the pre-installed environment is deployment speed, which is a matter of hours. The base DGX system image contains the Ubuntu operating system, NVIDIA GPU drivers, and a Docker environment for application containers downloadable from NVIDIA GPU Cloud (NGC). NVIDIA also supports running these Docker images in a Singularity environment.
NVIDIA GPU Cloud
NVIDIA GPU Cloud (NGC) is a repository of the most widely used frameworks for machine learning and deep learning, HPC applications, and visualization accelerated by NVIDIA GPU cards. Deploying these applications is a matter of minutes: copy the link to the appropriate Docker image from the NGC repository, transfer it to the DGX system, and download and run the Docker container. The individual development environments (the versions of all included libraries and frameworks, and the environment parameter settings) are updated and optimized by NVIDIA for deployment on DGX systems. https://ngc.nvidia.com/
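For illustration, here is a minimal sketch of that workflow, driven from Python for convenience. The image tag is only an example; current tags should be copied from the NGC catalog, and the NVIDIA container toolkit is assumed to be installed (it is part of the DGX software stack):

```python
# Minimal sketch of the NGC deployment workflow: pull an image from the
# NGC registry and run it with all GPUs visible. The image tag is an
# example; take the current one from https://ngc.nvidia.com/.
import subprocess

IMAGE = "nvcr.io/nvidia/pytorch:24.01-py3"  # example NGC image tag

# Pull the container image from the NGC registry.
subprocess.run(["docker", "pull", IMAGE], check=True)

# Run it with GPU access; --rm removes the container on exit.
subprocess.run(
    ["docker", "run", "--rm", "--gpus", "all", IMAGE,
     "python", "-c", "import torch; print(torch.cuda.device_count())"],
    check=True,
)
```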
Support
A key strength of the NVIDIA solution is support for the entire system. Hardware support (in case any component fails) is a matter of course. Software support for the entire environment is critical when something does not work as intended; hundreds of NVIDIA developers are ready to help the customer. Support is part of an NVIDIA DGX purchase. It is available for 1, 3, or 5 years and can be extended after this period.
NVIDIA support includes:
- Hardware SLA (replacement parts): 1 day
- Software support: the DGX OS image and the full AI software stack, including the ML frameworks available on NGC
- Access to the Enterprise Support Portal
- 24×7 hotline
- Access to the NVIDIA Knowledge Base
- Access to the NVIDIA GPU Cloud (NGC) portal
- NVIDIA Cloud Management
- DGX software upgrades
- DGX software updates
- DGX firmware updates
Primary telephone support is provided by M Computers in the Czech language.
NVIDIA DGX systems deliver significantly better performance for data analytics and the training of AI algorithms thanks to the combination of tuned hardware, a rich software stack, and high-quality NVIDIA support covering both DGX hardware and software.
The difference between a tuned DGX system solution for fast and powerful machine learning and a DIY (Do It Yourself) variant is evident from the following video:
We delivered our first NVIDIA DGX-2 system to the IT4Innovations National Supercomputing Center at VSB in Ostrava, Czech Republic. You can see the details of the installation in a short reference video.
Reference Architectures
NVIDIA DGX systems represent enormous computing power. When designing an architecture, it is necessary to consider how they fit into the overall IT infrastructure and to tune that infrastructure for maximum performance. NVIDIA has introduced the NVIDIA DGX POD Reference Architecture, which includes networking and storage disk arrays. There you will find individual designs from the key storage vendors that describe the overall infrastructure solution for running ML and AI applications. An interesting technology for speeding up data transfer between the GPU and data storage is GPUDirect Storage.
NetApp ONTAP AI reference architecture
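For illustration, here is a minimal sketch of GPUDirect Storage from Python. It assumes the kvikio (cuFile) and CuPy packages are installed and that the underlying filesystem supports GDS; the file path is an example:

```python
# Minimal sketch: move data between NVMe storage and GPU memory via
# GPUDirect Storage, bypassing a CPU bounce buffer where GDS is supported.
# Assumes kvikio + CuPy are installed; the path is an example.
import cupy
import kvikio

a = cupy.arange(1_000_000, dtype=cupy.float32)   # data in GPU memory

# Write the GPU buffer straight to storage, then read it back.
with kvikio.CuFile("/data/sample.bin", "w") as f:
    f.write(a)

b = cupy.empty_like(a)
with kvikio.CuFile("/data/sample.bin", "r") as f:
    n = f.read(b)                                # returns bytes read
print(f"read {n} bytes directly into GPU memory")
```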
NVIDIA offers special programs on DGX systems and Tesla accelerators for EDU organizations and start-up companies. Thanks to the international collaboration between NVIDIA and IBM Global Financing, preferential financing in the form of an operating lease is available for DGX models.
The NVIDIA Deep Learning Institute (DLI) offers both online and hands-on training for developers, data scientists, and researchers looking to solve challenging problems with deep learning and accelerated computing.
Testing
To test the performance, and especially the speed of deployment, of ML and AI applications, we have an NVIDIA DGX Station available through the NVIDIA Tesla Test Drive program, as well as 2× NVIDIA Tesla V100 and an NVIDIA Tesla T4. If you are interested in our testing offer, please fill out this form.