Our experience in high-performance computational chemistry led us to structure our computational resources as clusters of homogeneous computational nodes (most often commercial off-the-shelf hardware), managed by a POSIX operating system (Linux) and sitting behind a batch system (Torque/MAUI) that serves long-running jobs. Computational chemistry jobs often run for weeks or months, so one of the constraints we have to meet is keeping a large number of healthy and efficient machines up and running for very long periods, often for years.

HPC chemistry problems are both expensive, in terms of CPU load and I/O activity, and large, in terms of memory and storage capacity. Our machines have to satisfy these requirements while being as inexpensive as possible. Over time, these constraints led us to conceive a “model” of “machine”, in which the machine is actually a network of computational nodes, and we have implemented this model over and over again to build our HPC infrastructure.
At present, this infrastructure can be summarized as follows:

• 892 CPU computational cores with 5.25 TB of installed RAM
• 6.2 Tflops CPU computing power
• 4 GPU computational cores with 12 GB of installed RAM
• 2 Tflops GPU computing power
• 78 TB scratch workload disk space

Together, these resources make up our internal HPC grid.

The following picture, a live snapshot of the cluster created by the Ganglia tool, gives an idea of the state of our computational resources on a normal working day:



Our clusters of homogeneous (in terms of hardware and software architectures) nodes are as follows:

• Intel Simplecore sccw (one rack) x86_64 cluster (March 2013)
  ◦ one access master node
  ◦ 84 CPU computational cores with 455 GB of installed RAM
  ◦ 600 Gflops CPU computing power
  ◦ 6 TB scratch workload disk space


• Intel Multicore mccw (two racks) x86_64 cluster (October 2013)
  ◦ one access master node
  ◦ 808 CPU computational cores with 4.7 TB of installed RAM
  ◦ 5.6 Tflops CPU computing power
  ◦ 4 GPU computational cores with 12 GB of installed RAM
  ◦ 2 Tflops GPU computing power
  ◦ 72 TB scratch workload disk space

• Intel Multicore tccw x86_64 cluster (June 2017)
  ◦ one access master node
  ◦ 256 CPU computational cores with 2 TB of installed RAM
  ◦ 25 TB scratch workload disk space
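
As a cross-check of the figures listed above, the following minimal Python sketch sums the per-cluster values for sccw and mccw and compares them with the grid summary. The names `CLUSTERS`, `SUMMARY`, `aggregate` and `compare` are illustrative assumptions and not part of our tooling, and the sketch assumes a decimal unit conversion (1 TB = 1000 GB). The tccw cluster is left out because no computing-power figure is listed for it.

```python
# Per-cluster figures copied from the list above. All values are integers
# (GB, Gflops, TB) so that the sums are free of floating-point rounding.
# Assumption: 1 TB in the text means 1000 GB (decimal conversion).
CLUSTERS = {
    "sccw": {"cpu_cores": 84, "ram_gb": 455, "cpu_gflops": 600, "scratch_tb": 6},
    "mccw": {"cpu_cores": 808, "ram_gb": 4700, "cpu_gflops": 5600, "scratch_tb": 72},
}

# Totals as stated in the grid summary.
SUMMARY = {"cpu_cores": 892, "ram_gb": 5250, "cpu_gflops": 6200, "scratch_tb": 78}


def aggregate(clusters):
    """Sum every metric over the given clusters."""
    totals = {}
    for figures in clusters.values():
        for metric, value in figures.items():
            totals[metric] = totals.get(metric, 0) + value
    return totals


def compare(totals, summary):
    """Return {metric: (computed, stated)} for the metrics that differ."""
    return {m: (totals[m], summary[m]) for m in summary if totals[m] != summary[m]}


totals = aggregate(CLUSTERS)
differences = compare(totals, SUMMARY)

if __name__ == "__main__":
    for metric, value in totals.items():
        print(f"{metric}: {value}")
    for metric, (computed, stated) in differences.items():
        print(f"{metric}: computed {computed}, summary states {stated}")
```

Under these assumptions the core, Gflops and scratch totals reproduce the summary, while the RAM sum (5155 GB) comes out slightly below the stated 5.25 TB; the gap may simply reflect rounding or a different unit convention.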


As can be expected, Linux plays a fundamental role as the building block of our computational solutions: it is inexpensive, open source, easy to manage, flexible and stable. Our POSIX experience, however, started a long time ago with IBM AIX (a proprietary UNIX variant), and we also strive to support Windows users.

Besides serving day-to-day molecular science research, our resources are used to develop new programs and tools, ranging from the implementation of theoretical chemistry algorithms to the application of parallel and grid paradigms to ab initio techniques, in a modern, up-to-date information technology environment.

The HPC grid is completed by a SAN-centric, virtualized IBM datacenter infrastructure that implements storage, networking and tape library services:

• 2 SAN switches
• 2 LAN switches
• 3 Fibre Channel storage units for a total of 12 TB disk space
• one LTO4 Fibre Channel tape library with 44 tapes of 800 GB each
• one BladeCenter server
  ◦ one access module
  ◦ 7 installed blade servers
  ◦ 2 LAN switches
  ◦ 2 SAN switches
• 5 IBM x346 servers
• 1 IBM x306 server

The infrastructure is powered by a 40 kW Uninterruptible Power Supply and is deployed behind a virtualized firewall on an Internet-connected, private, gigabit-switched network.

The private network is accessible to our researchers from the Gigabit local area network of the Department of Chemistry of the University of Perugia, through a directly connected Gigabit link, and from research labs worldwide, through our firewall, via a GARR 10-gigabit link.



Software and hardware technical managers: Giuseppe Vitillaro, Edoardo Mosconi