Masaryk University operates the most advanced AI computing system
29. 5. 2023
CERIT-SC at Masaryk University and its newly installed NVIDIA DGX H100 system open the door to close collaboration with scientists from across the region through e-INFRA CZ. This prestigious network brings together leading research centres and institutions in the Czech Republic that focus on advanced computing technologies and research in the field of artificial intelligence. We have been involved in the e-INFRA CZ project through previous infrastructure deliveries as well; more here.
CERIT-SC is part of the national e-infrastructure, which is a complex system of interconnected network, computing and storage capacities and related services for the research community in the Czech Republic. CERIT-SC complements the other two components of the national e-infrastructure – the CESNET association and the IT4Innovations supercomputer centre.
Scientists connected to e-INFRA CZ will have access to NVIDIA DGX H100 resources at Masaryk University and will be able to use its computing capacity for their projects. This collaboration will provide an environment for innovative research and development in AI and accelerate progress in areas such as machine learning, big data analytics and AI application development.
e-INFRA CZ is a unique e-infrastructure for research and development in the Czech Republic, which represents a transparent environment providing comprehensive capacity and resources for the transfer, storage and processing of scientific data to all entities engaged in research and development, regardless of the sector in which it is carried out. It creates a communication, information, storage and computing base for research and development at national and international level and provides a comprehensive portfolio of ICT services without which modern research and development cannot be realised.
✓ Ability to train large deep learning models thanks to the GPUs' memory capacity
✓ Faster processing of large datasets
✓ Ability to work on multiple projects simultaneously
✓ speech recognition analysis
✓ 3D image reconstruction
✓ detection of neurodegenerative diseases
✓ TensorFlow / Keras
✓ PyTorch / PyTorch Lightning (a short multi-GPU sketch follows this list)
✓ CUDA, cuDNN
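For illustration, below is a minimal PyTorch sketch of the kind of multi-GPU workflow these frameworks enable on a DGX node. It is not CERIT-SC's actual workflow: the model and tensor sizes are placeholders, and it assumes PyTorch with CUDA support is already installed (e.g. inside an NGC container).

```python
# Minimal sketch: check the node's GPUs and run a toy forward pass spread
# across all of them with torch.nn.DataParallel. Model and sizes are
# placeholders chosen only for illustration.
import torch
import torch.nn as nn

def main() -> None:
    n_gpus = torch.cuda.device_count()           # 8 on a DGX H100 node
    device = "cuda" if torch.cuda.is_available() else "cpu"
    print(f"CUDA available: {torch.cuda.is_available()}, GPUs: {n_gpus}")
    for i in range(n_gpus):
        print(f"  GPU {i}: {torch.cuda.get_device_name(i)}")

    model = nn.Sequential(nn.Linear(4096, 4096), nn.ReLU(), nn.Linear(4096, 10))
    if n_gpus > 1:
        model = nn.DataParallel(model)            # splits each batch across the GPUs
    model = model.to(device)

    x = torch.randn(256, 4096, device=device)     # dummy batch
    with torch.no_grad():
        y = model(x)
    print("output shape:", tuple(y.shape))        # (256, 10)

if __name__ == "__main__":
    main()
```

For real training jobs, DistributedDataParallel (one process per GPU) is generally preferred over DataParallel, but the sketch sticks to the simpler API.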
Parameter | NVIDIA DGX H100 (8× H100 80 GB SXM5) |
---|---|
GPU | 8× NVIDIA H100 80 GB |
GPU memory | 640 GB total |
CPU | Dual 56-core 4th Gen Intel Xeon |
RAM | 32× 64 GB DDR5 |
Networking | 8× single-port ConnectX-7 VPI 400 Gb/s InfiniBand / 200 Gb/s Ethernet; 2× dual-port ConnectX-7 VPI 400 Gb/s InfiniBand / 200 Gb/s Ethernet |
NVMe for OS | 2× 1.92 TB NVMe M.2 |
NVMe for data | 8× 3.84 TB NVMe U.2 |
Management network | 10 Gb/s onboard NIC (RJ45), 50 GbE optional NIC |
NVIDIA DGX systems aren't just cutting-edge hardware; they also come with innovative enhancements that make infrastructure management and AI adoption easier. They feature a fine-tuned Docker environment and DGX OS, along with the new NVIDIA Base Command tool, which enables efficient management of the entire infrastructure. This simplifies the deployment of AI applications for research and development teams.
The system also includes the NVIDIA AI Enterprise (NVAIE) software stack, which provides a complete set of tools for developing and optimizing AI applications. This combination of technologies facilitates and accelerates the process of developing and deploying AI solutions across the entire infrastructure.
Parameter | NVIDIA DGX H100 640 GB |
---|---|
GPUs | 8× NVIDIA H100 SXM5 80 GB |
GPU memory | 640 GB |
CPU | Dual Intel Xeon Platinum 8480C (112 cores total), 2.00 GHz base, 3.80 GHz max boost |
Performance (FP8 tensor operations) | 32 petaFLOPS |
CUDA cores | 135,168 |
Tensor cores | 4,224 |
Multi-Instance GPU | Up to 56 instances |
RAM | 2 TB |
Storage | OS: 2× 1.92 TB NVMe; data: 30 TB (8× 3.84 TB) NVMe |
Network | 8× single-port ConnectX-7 VPI 400 Gb/s InfiniBand / 200 Gb/s Ethernet; 2× dual-port ConnectX-7 VPI 400 Gb/s InfiniBand / 200 Gb/s Ethernet |
Max. power consumption | ~10.2 kW |
Form factor | Rack-mount, 8U |
Technical specification | Datasheet |
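As a quick, illustrative sanity check of such a node (not part of the delivered tooling), the sketch below queries the GPU inventory via nvidia-smi, which ships with the NVIDIA driver on DGX OS; with Multi-Instance GPU enabled, `nvidia-smi -L` also lists the individual MIG devices.

```python
# Illustrative helper: list the node's GPUs (and MIG devices, if enabled) and
# report per-GPU memory using nvidia-smi. Assumes the NVIDIA driver is installed.
import subprocess

def gpu_inventory() -> str:
    # "-L" prints one line per physical GPU and, when MIG is enabled, per MIG device
    return subprocess.run(["nvidia-smi", "-L"],
                          capture_output=True, text=True, check=True).stdout

def gpu_memory_report() -> str:
    # CSV report: device name and total memory for each GPU
    return subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv"],
        capture_output=True, text=True, check=True).stdout

if __name__ == "__main__":
    print(gpu_inventory())
    print(gpu_memory_report())
```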
NVIDIA GPU Cloud (NGC) is a repository of the most widely used frameworks for machine learning and deep learning, HPC applications, and visualization accelerated by NVIDIA GPUs. Deploying these applications is a matter of minutes: copy the link to the appropriate Docker image from the NGC repository, pull it onto the DGX system, and run the Docker container.
The individual development environments – versions of all included libraries and frameworks, settings of environment parameters – are updated and optimized by NVIDIA for deployment on DGX systems. https://ngc.nvidia.com/
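As a rough sketch of that workflow, pulling and starting an NGC framework container on a DGX system might look like the following; the image tag, mount path and flags are illustrative assumptions, not a fixed recommendation.

```python
# Illustrative sketch of the NGC workflow described above: pull a framework
# image from the NGC registry and run it with all GPUs exposed. The image tag
# and mount path below are examples only.
import subprocess

NGC_IMAGE = "nvcr.io/nvidia/pytorch:23.05-py3"    # any image from https://ngc.nvidia.com/

def pull_and_run(image: str, workdir: str = "/home/user/project") -> None:
    subprocess.run(["docker", "pull", image], check=True)
    subprocess.run(
        [
            "docker", "run", "--rm", "-it",
            "--gpus", "all",                      # expose all GPUs to the container
            "--ipc=host",                         # commonly recommended for PyTorch dataloaders
            "-v", f"{workdir}:/workspace",        # mount project data into the container
            image,
        ],
        check=True,
    )

if __name__ == "__main__":
    pull_and_run(NGC_IMAGE)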
What most distinguishes DGX systems from bare-metal solutions is the software. All of them offer pre-installed and, above all, performance-tuned environments for machine learning (e.g. Caffe/Caffe2, Theano, TensorFlow, Torch or MXNet) and an intuitive environment for data analytics (NVIDIA DIGITS). All of this is elegantly packaged in Docker containers. These constantly updated containers can be downloaded from the NVIDIA GPU Cloud (NGC) website.
According to NVIDIA, such a tuned environment provides 30% higher performance for machine learning applications compared to the same applications deployed on plain NVIDIA hardware without the tuned stack. The main advantage of the pre-installed environment, however, is the speed of deployment: the system can be fully operational within hours.
The strength of the NVIDIA solution also lies in the support for the whole system. Fast hardware support (in case any component fails) is a matter of course.
Software support for the entire environment is critical when something does not work as intended; customers have access to hundreds of NVIDIA developers ready to help. Support is included with the purchase of every NVIDIA DGX system, is provided for 3–5 years, and can be extended beyond this period.
To test the performance, and especially the speed of deployment, of ML and AI applications, we offer not only NVIDIA DGX Station A100 systems but also, within the NVIDIA Tesla Test Drive Program, NVIDIA H100, A100, A30 and other accelerators, as well as demo licenses for GPU virtualization (vGPU) and NVIDIA AI Enterprise (NVAIE), a software environment for easy deployment of AI applications. If you are interested in our testing offer, please fill out this form.