Delivery of RCI cluster to FEL CTU

16. 4. 2019

We managed to successfully implement another supercomputer project in the Czech Republic — the RCI computing cluster at FEL CTU. A press conference and discussion about the RCI project took place today.

RCI cluster FEL ČVUT v Praze (zdroj: ČVUT v Praze)

Official press release of CTU in Prague

The Faculty of Electrical Engineering and the Faculty of Information Technology of the CTU in Prague have joined forces and established the Research Center for Informatics (RCI). Within the framework of the project, the most powerful computer cluster for artificial intelligence research in the Czech Republic was built at a cost of 41.6 million crowns. This unique facility, whose performance ranks among supercomputers, is located in the underground of the historic building of the Faculty of Electrical Engineering of the CTU on Charles Square. The RCI Centre of Excellence for Research in Computer Science is a top Czech science centre in the field of computer science and artificial intelligence. The aim of the Centre is to continue to develop internationally competitive research quality, to strengthen cooperation between basic and applied research, to invite qualified scientists from abroad to the university and to connect experienced scientists with young students. The operation of the RCI is funded by the Operational Programme Research, Development and Education under the Call for Excellent Research with a total budget of CZK 580 million. Thanks to the subsidy provided by the Ministry of Education, it was possible to build a computer cluster for research in the field of artificial intelligence, which is the best in the whole country in this respect. The centre is planning a follow-up expansion of the entire facility in 2022. The technical parameters of the facility are NVIDIA V100 Tensor Core GPU, which represents the most powerful accelerator for so-called high performance computing and artificial intelligence. Total installed power of over 6 PetaFLOPS. All this makes the installation at CTU currently the most powerful supercomputer for AI applications in the Czech Republic. The cluster consists of 20 Gigabyte CPU compute nodes with Intel Xeon Gold processors (480 CPU cores in total), 12 NVIDIA GPU Supermicro compute nodes, one Lenovo ThinkSystem SR950 node with a large number of CPU cores and shared memory (192 CPU cores, 1.5 TB of RAM), Mellanox high-speed Infiniband EDR (100Gb/s) interconnect network, Western Digital high-speed NVMe SSDs, and a shared DELL EMC Isilon scalable disk array. Each GPU node is equipped with four NVIDIA V100 Tensor Core GPU accelerators. In total, the cluster is populated with 48 accelerators with a total of 245,760 CUDA cores, 30,720 Tensor Core cores and a total compute performance of over 6 PetaFLOPS in AI operations. With this power, RCI researchers will be able to conduct full-scale basic research on deep learning methods that are key to artificial intelligence, such as for applications in robotics, bioinformatics, cybersecurity application development, and autonomous car control. And in these areas, thanks to the RCI project, the Czech Republic will now be able to compete with foreign universities and centres of excellence. More information about the project can be found at www.rci.cvut.cz. source: https://aktualne.cvut.cz/stalo-se/20190416-cvut-ma-nejvykonnejsi-superpocitac-pro-vyzkum-umele-inteligence-v-cr

Other supercomputer parameters

Parameters and highlights of the installed supercomputer:

Different computer architectures in one environment — CPU nodes, GPU nodes (about 80% performance), SMP node.
Total theoretical compute performance of LinPack Rpeak approx. 470 TFLOPS (sum of installed CPU and GPU) vs. 6 PetaFLOPS for AI applications (48× Nvidia Tesla V100 performance 125 TFLOPS).
Total 245,760 CUDA cores, 30,720 tensor cores.
Hyper-converged compute cluster — each node is equipped with a powerful HGST SN200 NVMe disk (3.3GB/s read, 2.1GB/s, 835,000 IOPS read, 75,000 IOPS write), and a BeeGFS parallel file system is built on top of these disks, with dynamic creation of file systems according to the needs of the computing task (BeeOND technology).
Maximum power consumption of the IT part of the supercomputer — 37 kW.
The supercomputer is able to leverage application containers from the Nvidia GPU Cloud (NGC) — https://ngc.nvidia.com/
Open source programs are available to manage the supercomputer:
- Installing frameworks and software build: EasyBuild
- Container management: singularity
- Booting OS for cluster nodes: warewulf
- File system for the scratch dataspace: beeGFS
- Task scheduler: SLURM
- Cluster management: Nagios
- Monitoring: Ganglia

“The entire installation took us six weeks, including the delivery of all hardware components, installation of the air conditioning and getting the software up and running. Building such a supercomputer in the centre of Prague is quite unique. We had to create new wiring for power and cooling, and we also faced the limits of the local substation.”

Petr Plodík, Sales Director, M Computers

Source: CTU in Prague

Reportage in the programme Události on Czech Television:

https://www.ceskatelevize.cz/ivysilani/1097181328-udalosti/219411000100416/obsah/688845-novy-superpocitac-pro-umelou-inteligenci