RCI cluster delivery to FEL ČVUT in Prague
We have successfully delivered another supercomputer in the Czech Republic: the RCI cluster at the Faculty of Electrical Engineering of the Czech Technical University in Prague. A press conference and discussion on the RCI project took place today.
The Faculty of Electrical Engineering and the Faculty of Information Technology of the Czech Technical University in Prague have joined forces to create the Research Center for Informatics. The five-year project was supported by the European Union with a financial contribution of around 500 million CZK. From these funds, CTU in Prague has built the most powerful compute cluster for AI applications in the Czech Republic, at a cost of EUR 2 million. The supercomputer is designed for research in artificial intelligence and is located in the basement of the historic CTU FEE building on Charles Square, which posed some challenges for the installation.
About RCI
The Research Center for Informatics is at the cutting edge of scientific excellence in computer science and artificial intelligence. It was set up in March 2018 and is planned to run until 2022. The aim of the project is to strengthen collaboration between basic and applied research, bring qualified foreign scientists to the university and connect experienced scientists with young students.
The Center is managed by the Head of the Department of Computer Science at the Faculty of Electrical Engineering, CTU in Prague, Prof. Michal Pěchouček, who says about the project: “Today we have an amazing opportunity to work on a common research goal across the university. We all want to push the development of computer science to the international level, and that connects us. Everyone is an expert in their field and will bring value in research into machine learning, artificial intelligence, theoretical computer science, bioinformatics, cyber security or computer graphics. In addition, with the newly built computer cluster, we open up opportunities that we never dreamed of before.”
“NVIDIA V100 Tensor Core GPUs are the most powerful accelerators for high performance computing and AI. The total installed performance of more than 6 PetaFLOPS makes the CTU installation the most powerful supercomputer for AI applications in the Czech Republic,” said Rod Evans, EMEA Director Supercomputing and AI at NVIDIA.
Putting this powerful supercomputer into operation was not an easy task. “The entire installation took us six weeks, including the delivery of all hardware components, construction work, air conditioning installation and software startup,” explains Petr Plodík from M Computers, who was in charge of implementing the cluster. “Building such a cluster in the center of Prague is quite unique. We had to create new wiring for power and cooling, and we also faced the limits of the local transformer station,” says Plodík.
Technical parameters
The technical parameters are impressive. The supercomputer consists of 20 Gigabyte CPU nodes with Intel Xeon Gold processors (480 processor cores in total), 12 Supermicro GPU computing nodes with NVIDIA accelerators, one Lenovo ThinkSystem SR950 node with a large number of CPU cores and large shared memory (192 processor cores, 1.5 TB of RAM), InfiniBand EDR (100 Gb/s) high-speed networking from Mellanox, fast NVMe SSDs from Western Digital and a DELL EMC Isilon scalable shared disk array. Each GPU node is equipped with four NVIDIA V100 Tensor Core GPUs. In total, the cluster includes 48 accelerators with 245,760 CUDA cores and 30,720 Tensor Cores, and a total computing power of over 6 PetaFLOPS in AI operations. With this performance, researchers at the Research Center for Informatics will be able to make full use of deep learning, the method behind key artificial intelligence applications such as speech recognition, the training of virtual personal assistants and autonomous driving. In these areas, thanks to the Research Center for Informatics project, the Czech Republic will be able to compete with foreign universities and centers of excellence.
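The totals quoted above can be cross-checked with a few multiplications. The following is a minimal sketch, not an official sizing tool: the node counts come from this article, the 125 TFLOPS tensor peak per GPU is the figure quoted in the feature list, and the per-GPU core counts (5,120 CUDA cores and 640 Tensor Cores) are assumed from NVIDIA's published V100 specifications. The function name `cluster_totals` is made up for illustration.

```python
# Per-GPU figures for the NVIDIA V100. The core counts are an assumption
# taken from NVIDIA's public specifications; the 125 TFLOPS tensor peak
# is the value quoted in this article's feature list.
CUDA_CORES_PER_V100 = 5120
TENSOR_CORES_PER_V100 = 640
TENSOR_TFLOPS_PER_V100 = 125

# Node counts as stated in the article.
GPU_NODES = 12
GPUS_PER_NODE = 4
CPU_NODES = 20
CPU_CORES_TOTAL = 480


def cluster_totals():
    """Return the aggregate figures implied by the per-node numbers."""
    gpus = GPU_NODES * GPUS_PER_NODE
    return {
        "gpus": gpus,
        "cuda_cores": gpus * CUDA_CORES_PER_V100,
        "tensor_cores": gpus * TENSOR_CORES_PER_V100,
        # Peak tensor throughput in PetaFLOPS (1 PFLOPS = 1000 TFLOPS).
        "ai_pflops": gpus * TENSOR_TFLOPS_PER_V100 / 1000,
        # Average CPU cores per CPU node.
        "cores_per_cpu_node": CPU_CORES_TOTAL // CPU_NODES,
    }


if __name__ == "__main__":
    for name, value in cluster_totals().items():
        print(f"{name}: {value}")
```

Under these assumptions the products reproduce the 48 accelerators, 245,760 CUDA cores and 30,720 Tensor Cores stated above. The AI figure is a theoretical tensor peak, not a measured result.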
More information about the project can be found at http://www.rci.cvut.cz/.
Additional features of RCI supercomputer
- Different compute architectures in one environment: CPU compute nodes, GPU compute nodes (about 80% of performance) and an SMP node.
- Total theoretical compute performance in LINPACK (Rpeak) is approximately 470 TFLOPS (sum of all CPUs and GPUs), compared with 6 PetaFLOPS for AI applications (48× NVIDIA Tesla V100 at 125 TFLOPS each).
- A total of 245,760 CUDA cores and 30,720 Tensor Cores.
- Hyperconverged compute cluster: each compute node is fitted with an HGST SN200 NVMe drive (3.3 GB/s read, 2.1 GB/s write, 835,000 read IOPS, 75,000 write IOPS). The drives across all nodes are pooled by the BeeGFS file system, which creates file systems dynamically according to the requirements of running jobs (BeeOND technology).
- Maximum power of the IT part of the supercomputer: 37 kW.
- The supercomputer can use application containers from NVIDIA GPU Cloud (NGC): https://ngc.nvidia.com
- Open source programs are available to manage the supercomputer:
  - Installation of frameworks and software builds: EasyBuild
  - Container management: Singularity
  - Boot OS for cluster nodes: Warewulf
  - File system for scratch data storage: BeeGFS
  - Job scheduler: SLURM
  - Cluster management: Nagios
  - Monitoring: Ganglia
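The Rpeak and AI figures in the list above measure different things, and a short calculation makes the gap concrete. This is a rough sketch under one explicit assumption: that the "about 80% of performance" attributed to the GPU nodes refers to their share of the approximately 470 TFLOPS Rpeak. The function name `rpeak_breakdown` is made up for illustration.

```python
# Figures quoted in the feature list.
RPEAK_TFLOPS = 470        # approximate theoretical peak, CPUs plus GPUs
GPU_COUNT = 48
AI_TFLOPS_PER_GPU = 125   # tensor peak per V100

# Assumption: the "about 80% of performance" for the GPU nodes is
# interpreted here as their share of Rpeak.
GPU_SHARE = 0.80


def rpeak_breakdown():
    """Split Rpeak between GPUs and the rest, and compare it with the AI peak."""
    gpu_tflops = RPEAK_TFLOPS * GPU_SHARE
    return {
        "gpu_tflops": gpu_tflops,
        "other_tflops": RPEAK_TFLOPS - gpu_tflops,
        # Implied double-precision peak per accelerator.
        "fp64_tflops_per_gpu": gpu_tflops / GPU_COUNT,
        # How many times larger the AI (tensor) peak is than Rpeak.
        "ai_to_rpeak_ratio": GPU_COUNT * AI_TFLOPS_PER_GPU / RPEAK_TFLOPS,
    }


if __name__ == "__main__":
    for name, value in rpeak_breakdown().items():
        print(f"{name}: {value:.2f}")
```

Under that assumption the GPUs account for roughly 376 TFLOPS, or just under 8 TFLOPS each, which is in line with NVIDIA's published double-precision peak for the V100 (7 to 7.8 TFLOPS depending on the variant). The AI peak is then nearly 13 times the Rpeak, because it counts mixed-precision Tensor Core operations and not the double-precision arithmetic used in LINPACK.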
source: CTU in Prague