Hardware
Parameter | NVIDIA DGX Station A100 160 GB | NVIDIA DGX Station A100 320 GB |
---|---|---|
GPU | 4× NVIDIA A100 SXM4 40 GB | 4× NVIDIA A100 SXM4 80 GB |
Performance (tensor operations) | 2.5 petaFLOPS | 2.5 petaFLOPS |
GPU memory | 160 GB total | 320 GB total |
CPU | 1× AMD EPYC 7742, 64 cores, 2.25 GHz base, up to 3.4 GHz boost | 1× AMD EPYC 7742, 64 cores, 2.25 GHz base, up to 3.4 GHz boost |
CUDA cores (total) | 27,648 | 27,648 |
Tensor cores (total) | 1,728 | 1,728 |
Multi-instance GPU (MIG) | 28 instances | 28 instances |
RAM | 512 GB DDR4 | 512 GB DDR4 |
Storage | OS: 1× 1.92 TB NVMe; data: 1× 7.68 TB U.2 NVMe | OS: 1× 1.92 TB NVMe; data: 1× 7.68 TB U.2 NVMe |
Network | 2× 10 Gb/s Ethernet; 1× 1 Gb/s Ethernet (BMC management) | 2× 10 Gb/s Ethernet; 1× 1 Gb/s Ethernet (BMC management) |
Graphics outputs | 4× Mini DisplayPort (display adapter with 4 GB GPU memory) | 4× Mini DisplayPort (display adapter with 4 GB GPU memory) |
Max. power consumption | 1.5 kW | 1.5 kW |
Form factor | Tower, water cooling | Tower, water cooling |
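The totals in the table follow from multiplying per-GPU figures by the four GPUs in the station. The short Python sketch below reproduces that arithmetic. The per-GPU values (6,912 CUDA cores, 432 Tensor cores, and up to 7 MIG instances per A100) are assumptions taken from the A100's published specification, not from the table, and the function name `station_totals` is only illustrative.

```python
# Per-GPU figures for one NVIDIA A100.
# Assumption: taken from the A100 specification, not from the table above.
A100_CUDA_CORES = 6912
A100_TENSOR_CORES = 432
A100_MAX_MIG_INSTANCES = 7

# The DGX Station A100 contains four A100 GPUs (from the table).
GPUS_PER_STATION = 4


def station_totals(memory_per_gpu_gb: int) -> dict:
    """Return system-wide totals for a 4-GPU station (illustrative helper)."""
    return {
        "cuda_cores": GPUS_PER_STATION * A100_CUDA_CORES,
        "tensor_cores": GPUS_PER_STATION * A100_TENSOR_CORES,
        "mig_instances": GPUS_PER_STATION * A100_MAX_MIG_INSTANCES,
        "gpu_memory_gb": GPUS_PER_STATION * memory_per_gpu_gb,
    }


# The two models differ only in memory per GPU: 40 GB or 80 GB.
for memory_gb in (40, 80):
    print(memory_gb, station_totals(memory_gb))
```

Both models share the same core and MIG counts; only the total GPU memory changes between the 160 GB and 320 GB variants.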
Software
The more interesting part of the NVIDIA machines on offer, however, is their software. They all come with pre-installed, performance-tuned machine learning frameworks (e.g. Caffe, Caffe2, Theano, TensorFlow, Torch, or MXNet) and intuitive data analytics environments (NVIDIA DIGITS), all packaged in Docker containers that can be downloaded free of charge from NVIDIA GPU Cloud (NGC). Such a tuned environment delivers 30% higher performance for machine learning applications than the same applications deployed on NVIDIA hardware alone. The main advantage of the pre-installed environment is deployment speed: a system can be up and running within hours.
NVIDIA GPU Cloud (NGC)
NVIDIA GPU Cloud (NGC) is a repository of the most widely used frameworks for machine learning and deep learning, HPC applications, and visualization accelerated by NVIDIA GPUs. Deploying these applications is a matter of minutes: copy the link to the appropriate Docker image from the NGC repository, transfer it to the DGX system, then download and run the Docker container. The individual development environments (the versions of all included libraries and frameworks and the settings of environment parameters) are updated and optimized by NVIDIA for deployment on DGX systems. https://ngc.nvidia.com/
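The deployment steps described above can be sketched in Python as two Docker invocations: one to download the image and one to run it with GPU access. This is only a sketch under stated assumptions: the image name `nvidia/pytorch` and the tag `TAG` are placeholders to be replaced with the values shown on the NGC page for the chosen container, the helper `ngc_commands` is hypothetical, and the `--gpus all` flag assumes a Docker installation with NVIDIA container support. Some images may also require logging in to the registry first.

```python
import shlex
from typing import List

# Registry host used by NGC container images.
NGC_REGISTRY = "nvcr.io"


def ngc_commands(image: str, tag: str) -> List[List[str]]:
    """Build the docker commands to pull and run an NGC image.

    Hypothetical helper: `image` and `tag` must be copied from the
    NGC catalog entry of the container you want to deploy.
    """
    reference = f"{NGC_REGISTRY}/{image}:{tag}"
    pull = ["docker", "pull", reference]
    # --gpus all exposes every GPU to the container;
    # --rm removes the container on exit; -it opens an interactive shell.
    run = ["docker", "run", "--gpus", "all", "--rm", "-it", reference]
    return [pull, run]


# "TAG" is a placeholder, not a real image tag.
for command in ngc_commands("nvidia/pytorch", "TAG"):
    print(shlex.join(command))
```

The commands are only printed here so they can be reviewed first; on the DGX system they could be run directly in a shell or passed to `subprocess.run`.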
NVIDIA offers special programs for DGX systems and Tesla accelerators to educational organizations and start-up companies. Thanks to the international collaboration between NVIDIA and IBM Global Financing, preferential financing in the form of operating leases is available for DGX models.
Support
The strength of the NVIDIA solution is support for the entire system. Hardware support, in case any component fails, is a matter of course. Software support for the entire environment is critical when something does not work as intended, and hundreds of NVIDIA developers are ready to help customers. Support is included with the purchase of an NVIDIA DGX system. It is available for 3 or 5 years and can be extended after this period.
NVIDIA support covers:
- Onsite dispatch of a customer engineer to replace field-replaceable units (FRUs)
- Remote use of onboard diagnostic tools to troubleshoot system issues
- Remote software support
- Access to the latest software updates and upgrades
- Direct communication with NVIDIA support experts
- A private NGC container repository for accessing NVIDIA-optimized AI software with powerful sharing and collaboration features for your organization
- A searchable knowledge base with how-to articles, application notes, and product documentation
- Rapid response and timely issue resolution through a support portal and 24×7 phone access
- Lifecycle support for NVIDIA DGX systems’ AI software
- Replacement of defective HW components by the next working day
- DGX Systems Appliance Support Services Terms and Conditions
- End-user license agreement (EULA)
Through a combination of tuned hardware, software, and NVIDIA support, NVIDIA DGX delivers significantly higher performance and acceleration in the training phase of machine learning applications.
The NVIDIA Deep Learning Institute (DLI) offers both online and hands-on training for developers, data scientists, and researchers looking to solve challenging problems with deep learning and accelerated computing.
Testing
To test the performance, and especially the deployment speed, of ML and AI applications, we have an NVIDIA DGX Station available through the NVIDIA Tesla Test Drive program, as well as 2× NVIDIA Tesla V100 and an NVIDIA Tesla T4. If you are interested in our testing offer, please fill out this form.
Kamila Jeřábková
M: +420 734 161 516
E: kamila.jerabkova@mcomputers.cz