1 Petabyte – Smarter and Cheaper

In March, our company introduced a solution designed for Masaryk University: an installation with a total capacity of 1,024 TB, i.e. 1 PB (petabyte).



Scientific and research institutions generate large and ever-growing volumes of data. Last year, Masaryk University in Brno added a new installation with a total capacity of 1,024 TB, i.e. 1 PB (petabyte), to its data storage systems. The tender was won by a storage system built on IBM Spectrum Scale technology and Infortrend disk arrays, supplied by M Computers, a specialized Czech provider.

At the event, talks were given by specialists from M Computers (the 1 PB reference storage), IBM (Software Defined Storage for unstructured file and object data) and Infortrend (versatile storage solutions). The event also included a tour of the Masaryk University data center with a demonstration of the 1 PB storage.

The solution

The tender called for the installation and configuration of a “data service cluster” with a net capacity of 1,000 TB. The delivery included hardware assembly, connection and configuration, installation and configuration of software packages and components, performance tuning, verification that all storage components operate flawlessly and, finally, demonstration of the required performance through acceptance testing. The storage is intended for data processed in the CERIT-SC and MetaCentrum computing environments. Its users come from both Masaryk University and the scientific community across the Czech Republic. In total, there are 1,300 active accesses.

Speed requirement comes first

The contracting authority had many technical requirements for the storage. These included high availability with maximum resilience against the failure of any single component, active-active access through the NFSv4 and CIFS protocols to serve users connecting from different environments such as Windows and Linux, and the use of SSDs for the fastest possible file operations. Speed requirements were of the utmost importance: 7 GB/s read and 5 GB/s write for 16 files in parallel.
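
To put these figures in perspective, the short sketch below divides the required aggregate throughput across the 16 parallel files. It assumes that the load is spread evenly across the files and that 1 GB = 1,000 MB. Neither assumption comes from the tender, and the function name is purely illustrative.

```python
def per_file_mb_s(aggregate_gb_s: float, files: int) -> float:
    """Average throughput per file in MB/s, assuming an even split."""
    if files <= 0:
        raise ValueError("files must be positive")
    # Assumption: decimal units, 1 GB = 1,000 MB.
    return aggregate_gb_s * 1000 / files


READ_GB_S = 7        # required aggregate read speed
WRITE_GB_S = 5       # required aggregate write speed
PARALLEL_FILES = 16  # number of files accessed in parallel

read_per_file = per_file_mb_s(READ_GB_S, PARALLEL_FILES)
write_per_file = per_file_mb_s(WRITE_GB_S, PARALLEL_FILES)

print(f"read:  {read_per_file} MB/s per file")
print(f"write: {write_per_file} MB/s per file")
```

Under these assumptions, each file has to sustain roughly 440 MB/s when reading and roughly 310 MB/s when writing.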

Cheaper, yet more powerful and faster

Thanks to their know-how, the experts from M Computers were able to select a very cost-effective solution based on Infortrend ESDS 1000 disk arrays with 16 drive bays in a 3U chassis. Although they are primarily targeted at the SMB segment, these disk arrays are recognized for their high speed, reliability and modularity. They also provide strong data integrity and support SAS, Fibre Channel and iSCSI connections. Last but not least, their user-friendly open management makes the administrators' work easier.

Solution parameters:
Net capacity: 1 PB (1,024 TB)
Write speed: 5 GB/s
Read speed: 7 GB/s
Number of access servers in the HA cluster: 3
Scalability:
– up to 2^99 bytes per file system
– 16,384 nodes in a cluster
Snapshots: unlimited
Easy integration with backup, cloud or object storage
Price: CZK 4.2 million excluding VAT
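
Two of these parameters are easier to grasp after a little arithmetic. The sketch below is only an illustration: it reads the price as CZK 4,200,000, divides it by the stated 1,024 TB, and expresses the 2^99-byte file system limit in yobibytes (1 YiB = 2^80 bytes). The variable names are assumptions made for the example.

```python
PRICE_CZK = 4_200_000    # price excluding VAT, as listed above
NET_CAPACITY_TB = 1024   # net capacity, as listed above

# Price per terabyte of net capacity.
price_per_tb = PRICE_CZK / NET_CAPACITY_TB

# Maximum file system size, converted from bytes to yobibytes.
MAX_FS_BYTES = 2 ** 99
YIB = 2 ** 80
max_fs_yib = MAX_FS_BYTES // YIB

print(f"price per TB: {price_per_tb:.2f} CZK")
print(f"max file system size: {max_fs_yib} YiB")
```

This works out to about CZK 4,100 per terabyte, and the theoretical file system limit is many orders of magnitude above the installed capacity.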

A single disk array delivers up to 550k IOPS and throughput of up to 5,500 MB/s. It comes with dual controllers and redundant power supplies, and supports up to 316 HDDs with a total capacity of up to 1,869 TB. The array can also use an SSD cache to significantly boost read I/O operations and to enable multi-level automated tiering. Because the requirements specified the maximum throughput of the whole storage, M Computers specialists used fast Fibre Channel interfaces to access the disk arrays, and InfiniBand together with 10 Gbit Ethernet for user data access.

Solution that is ready for large volumes of data

Other requirements included snapshot functionality, i.e. freezing the data at a certain point in time. In this storage, snapshots are used for efficient operational backups, which let users return to a previous version of a file or recover data deleted by mistake. For the current day, snapshots taken every fifteen minutes provide fast recovery, while one snapshot per day is kept for up to the next 14 days.
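
The retention scheme described above can be sketched in a few lines. This is a minimal illustration, not the actual Spectrum Scale configuration: it works on plain timestamps, the function and constant names are hypothetical, and it assumes that the daily snapshot is the first one taken on each day, which the description does not specify.

```python
from datetime import datetime, timedelta

SNAPSHOT_INTERVAL = timedelta(minutes=15)  # interval stated in the text
DAILY_RETENTION_DAYS = 14                  # daily retention stated in the text


def snapshots_to_keep(snapshots, now):
    """Return the timestamps retained under the described policy.

    Keeps every snapshot from the current day, plus one snapshot for each
    of the previous DAILY_RETENTION_DAYS days. Assumption: the daily
    snapshot is the earliest one taken that day.
    """
    today = now.date()
    oldest_day = today - timedelta(days=DAILY_RETENTION_DAYS)
    kept = []
    days_covered = set()
    for ts in sorted(snapshots):
        day = ts.date()
        if day == today:
            kept.append(ts)
        elif oldest_day <= day < today and day not in days_covered:
            days_covered.add(day)
            kept.append(ts)
    return kept


# Example with an arbitrary date: snapshots every 15 minutes since 1 January.
now = datetime(2000, 1, 20, 12, 0)
all_snapshots = []
t = datetime(2000, 1, 1, 0, 0)
while t <= now:
    all_snapshots.append(t)
    t += SNAPSHOT_INTERVAL

kept = snapshots_to_keep(all_snapshots, now)
print(len(all_snapshots), "snapshots taken,", len(kept), "kept")
```

Everything outside the current day and the 14 daily snapshots would be eligible for deletion, which keeps the number of retained snapshots small and roughly constant over time.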

Large volumes and high speeds place heavy demands on snapshot functionality, which most systems available on the market cannot meet. IBM Spectrum Scale, however, has unique features in this area and makes it possible to use snapshots fully without degrading performance or imposing other limitations.

Controlling software

Spectrum Scale from IBM, one of the leading software-defined storage products worldwide, is the software heart of the storage. The system comprehensively addresses storing and accessing large data volumes in scientific computing, big data, analytics, object storage, general unstructured data and other areas. Its high scalability allows a maximum file system size of up to 2^99 bytes and up to 16,384 nodes in the storage cluster.

The system is designed from scratch for enterprise deployment, robustness and high performance. It supports hierarchical storage management with a unique option of integrating disk, tape and external cloud storage in a single file system, giving users uniform access to data stored on different media.

Support and service

The solution was delivered with service and warranty support for 5 years, with a response time of no later than the next working day after a failure is reported and with on-site replacement of defective components.

The system is connected to the 24/7 control centre of M Computers. The control centre technicians have a full set of spare parts available and can respond immediately to service requests.