Masaryk University operates the most advanced AI computing system
29. 5. 2023
CERIT-SC at Masaryk University and its newly installed NVIDIA DGX H100 system open the door to close collaboration with scientists from across the region through e-INFRA CZ. This prestigious network brings together leading research centres and institutions in the Czech Republic that focus on advanced computing technologies and research in the field of artificial intelligence. We have been involved in the e-INFRA CZ project through previous infrastructure deliveries as well; more here.
CERIT-SC is part of the national e-infrastructure, which is a complex system of interconnected network, computing and storage capacities and related services for the research community in the Czech Republic. CERIT-SC complements the other two components of the national e-infrastructure – the CESNET association and the IT4Innovations supercomputer centre.
Scientists connected to e-INFRA CZ will have access to NVIDIA DGX H100 resources at Masaryk University and will be able to use its computing capacity for their projects. This collaboration will provide an environment for innovative research and development in AI and accelerate progress in areas such as machine learning, big data analytics and AI application development.
e-INFRA CZ is a unique e-infrastructure for research and development in the Czech Republic, which represents a transparent environment providing comprehensive capacity and resources for the transfer, storage and processing of scientific data to all entities engaged in research and development, regardless of the sector in which it is carried out. It creates a communication, information, storage and computing base for research and development at national and international level and provides a comprehensive portfolio of ICT services without which modern research and development cannot be realised.
✓ Ability to train large deep learning models thanks to the GPUs' memory capacity
✓ Faster processing of large datasets
✓ Ability to work on multiple projects simultaneously
✓ speech recognition analysis
✓ 3D image reconstruction
✓ detection of neurodegenerative diseases
✓ TensorFlow / Keras
✓ PyTorch / PyTorch Lightning (a short multi-GPU sketch follows this list)
✓ CUDA, cuDNN
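For illustration, below is a minimal PyTorch sketch of the kind of multi-GPU workflow these frameworks enable on a DGX node. It is not CERIT-SC's actual workflow: the model and tensor sizes are placeholders, and it assumes PyTorch with CUDA support is already installed (e.g. inside an NGC container).

```python
# Minimal sketch: check the node's GPUs and run a toy forward pass spread
# across all of them with torch.nn.DataParallel. Model and sizes are
# placeholders chosen only for illustration.
import torch
import torch.nn as nn

def main() -> None:
    n_gpus = torch.cuda.device_count()           # 8 on a DGX H100 node
    device = "cuda" if torch.cuda.is_available() else "cpu"
    print(f"CUDA available: {torch.cuda.is_available()}, GPUs: {n_gpus}")
    for i in range(n_gpus):
        print(f"  GPU {i}: {torch.cuda.get_device_name(i)}")

    model = nn.Sequential(nn.Linear(4096, 4096), nn.ReLU(), nn.Linear(4096, 10))
    if n_gpus > 1:
        model = nn.DataParallel(model)            # splits each batch across the GPUs
    model = model.to(device)

    x = torch.randn(256, 4096, device=device)     # dummy batch
    with torch.no_grad():
        y = model(x)
    print("output shape:", tuple(y.shape))        # (256, 10)

if __name__ == "__main__":
    main()
```

For real training jobs, DistributedDataParallel (one process per GPU) is generally preferred over DataParallel, but the sketch sticks to the simpler API.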
Parameter | NVIDIA DGX H100 (8× H100 80 GB SXM5) |
---|---|
GPU | 8× NVIDIA H100 80 GB |
GPU memory | 640 GB total |
CPU | Dual 56-core 4th Gen Intel Xeon |
RAM | 32× 64 GB DDR5 |
Networking | 8× single-port ConnectX-7 VPI 400 Gb/s InfiniBand / 200 Gb/s Ethernet; 2× dual-port ConnectX-7 VPI 400 Gb/s InfiniBand / 200 Gb/s Ethernet |
NVMe for OS | 2× 1.92 TB NVMe M.2 |
NVMe for data | 8× 3.84 TB NVMe U.2 |
Management network | 10 Gb/s onboard NIC (RJ45), 50 GbE optional NIC |
NVIDIA DGX systems aren't just cutting-edge hardware; they also come with innovative enhancements that make infrastructure management and AI adoption easier. They feature a fine-tuned Docker environment and DGX OS, along with the new NVIDIA Base Command tool, which enables efficient management of the entire infrastructure. This simplifies the deployment of AI applications for research and development teams.
The system also includes the NVIDIA AI Enterprise (NVAIE) software stack, which provides a complete set of tools for developing and optimizing AI applications. This combination of technologies facilitates and accelerates the process of developing and deploying AI solutions across the entire infrastructure.
Parameter | NVIDIA DGX H100 640 GB |
---|---|
GPUs | 8× NVIDIA H100 SXM5 80 GB |
GPU memory | 640 GB |
CPU | Dual Intel Xeon Platinum 8480C (112 cores total), 2.00 GHz base, 3.80 GHz max boost |
Performance (FP8 tensor operations) | 32 petaFLOPS |
CUDA cores | 135,168 |
Tensor cores | 4,224 |
Multi-Instance GPU | Up to 56 instances |
RAM | 2 TB |
Storage | OS: 2× 1.92 TB NVMe; data: 30 TB (8× 3.84 TB) NVMe |
Network | 8× single-port ConnectX-7 VPI 400 Gb/s InfiniBand / 200 Gb/s Ethernet; 2× dual-port ConnectX-7 VPI 400 Gb/s InfiniBand / 200 Gb/s Ethernet |
Max. power consumption | ~10.2 kW |
Form factor | Rack-mount, 8U |
Technical specification | Datasheet |
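As a quick, illustrative sanity check of such a node (not part of the delivered tooling), the sketch below queries the GPU inventory via nvidia-smi, which ships with the NVIDIA driver on DGX OS; with Multi-Instance GPU enabled, `nvidia-smi -L` also lists the individual MIG devices.

```python
# Illustrative helper: list the node's GPUs (and MIG devices, if enabled) and
# report per-GPU memory using nvidia-smi. Assumes the NVIDIA driver is installed.
import subprocess

def gpu_inventory() -> str:
    # "-L" prints one line per physical GPU and, when MIG is enabled, per MIG device
    return subprocess.run(["nvidia-smi", "-L"],
                          capture_output=True, text=True, check=True).stdout

def gpu_memory_report() -> str:
    # CSV report: device name and total memory for each GPU
    return subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv"],
        capture_output=True, text=True, check=True).stdout

if __name__ == "__main__":
    print(gpu_inventory())
    print(gpu_memory_report())
```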
NVIDIA GPU Cloud (NGC) is a repository of the most widely used frameworks for machine learning and deep learning, HPC applications, and visualization accelerated by NVIDIA GPUs. Deploying these applications is a matter of minutes: copy the link to the appropriate Docker image from the NGC repository, pull it onto the DGX system, and run the Docker container.
The individual development environments – versions of all included libraries and frameworks, settings of environment parameters – are updated and optimized by NVIDIA for deployment on DGX systems. https://ngc.nvidia.com/
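As a rough sketch of that workflow, pulling and starting an NGC framework container on a DGX system might look like the following; the image tag, mount path and flags are illustrative assumptions, not a fixed recommendation.

```python
# Illustrative sketch of the NGC workflow described above: pull a framework
# image from the NGC registry and run it with all GPUs exposed. The image tag
# and mount path below are examples only.
import subprocess

NGC_IMAGE = "nvcr.io/nvidia/pytorch:23.05-py3"    # any image from https://ngc.nvidia.com/

def pull_and_run(image: str, workdir: str = "/home/user/project") -> None:
    subprocess.run(["docker", "pull", image], check=True)
    subprocess.run(
        [
            "docker", "run", "--rm", "-it",
            "--gpus", "all",                      # expose all GPUs to the container
            "--ipc=host",                         # commonly recommended for PyTorch dataloaders
            "-v", f"{workdir}:/workspace",        # mount project data into the container
            image,
        ],
        check=True,
    )

if __name__ == "__main__":
    pull_and_run(NGC_IMAGE)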
What most distinguishes DGX systems from bare-metal solutions is the software. All of them offer pre-installed and, above all, performance-tuned environments for machine learning (e.g. Caffe/Caffe2, Theano, TensorFlow, Torch or MXNet) and an intuitive environment for data analytics (NVIDIA DIGITS). All of this is elegantly packaged in Docker containers. These constantly updated containers can be downloaded from the NVIDIA GPU Cloud (NGC) website.
According to NVIDIA, such a tuned environment provides 30% higher performance for machine learning applications compared to the same applications deployed on plain NVIDIA hardware without the tuned stack. The main advantage of the pre-installed environment, however, is the speed of deployment: the system can be fully operational within hours.
The strength of the NVIDIA solution also lies in the support for the whole system. Fast hardware support (in case any component fails) is a matter of course.
Software support for the entire environment is critical when something does not work as intended; customers have access to hundreds of NVIDIA developers ready to help. Support is included with the purchase of every NVIDIA DGX system, is provided for 3–5 years, and can be extended beyond this period.
To test the performance, and especially the speed of deployment, of ML and AI applications, we offer not only NVIDIA DGX Station A100 systems but also, within the NVIDIA Tesla Test Drive Program, NVIDIA H100, A100, A30 and other accelerators, as well as demo licenses for GPU virtualization (vGPU) and NVIDIA AI Enterprise (NVAIE), a software environment for easy deployment of AI applications. If you are interested in our testing offer, please fill out this form.