TACC: Engineering Research in HPC

2nd Generation Intel® Xeon® Scalable processors and Intel® Optane™ DC persistent memory speed processing and expand memory capacity.

Executive Summary
The Texas Advanced Computing Center (TACC) continuously reinvents supercomputing at ever larger scale to enable breakthrough research and deliver the resources that scientists need. Its latest system, Frontera, is a 38.75 petaFLOPS cluster that earned the #5 ranking on the June 2019 Top500 list.¹ It comprises nearly half a million cores of 2nd Generation Intel® Xeon® Scalable processors inside Dell EMC PowerEdge* servers.

Challenge
The Texas Advanced Computing Center (TACC) is a world-renowned facility for supercomputing, enabling new discoveries across a range of disciplines in science and industry.

“Our mission here at the Texas Advanced Computing Center,” said TACC’s Executive Director, Dr. Dan Stanzione, “is to provide groundbreaking new computing capabilities to enable new kinds of scientific discoveries, and new kinds of engineering research.”

Deployed in 2017, TACC’s Stampede2 supercomputer incorporated the latest Intel® Xeon® Scalable processors inside Dell EMC PowerEdge* servers, connected by Intel® Omni-Path Architecture fabric. Designed as a capability machine, Stampede2 will support three to four thousand projects over its lifetime. But every few years, TACC looks at the kinds of problems that researchers are tackling and what types of architecture will offer the best support for that science. Some of those problems address the ‘grand challenges’ of our time and require computing on a massive scale.

“We’re looking at control problems around fusion reactors,” commented Stanzione as he offered an example of the kinds of massive scale research that will require new levels of supercomputing performance. “We’re looking at mantle convection as a whole Earth problem, where you see single simulations across the entire planet.”

Problems of this scale require a supercomputer of a different scale than Stampede2.

Frontera hardware and software system overview.

Solution
Frontera is TACC’s newest supercomputer, supported by a $60 million award from the U.S. National Science Foundation. It contains a large main system that will deliver peak performance of 38.7 petaFLOPS, according to Stanzione. The main system is built on the 2nd Gen Intel® Xeon® Platinum processor, with 8,008 dual-socket nodes of 56 cores each, interconnected by InfiniBand* Architecture at 100 Gbps. Its 448,448 cores give TACC more computing and memory capacity than the center has ever had.
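The core count and peak figure can be cross-checked with simple arithmetic. The sketch below is illustrative only: the node and core counts come from the text, while the 2.7 GHz clock and the 32 double-precision operations per core per cycle are assumptions about the processor that the article does not state.

```python
# Figures taken from the text.
NODES = 8_008
CORES_PER_NODE = 56

# Assumptions, not stated in the article: base clock and the number of
# double-precision floating-point operations each core completes per cycle.
ASSUMED_CLOCK_HZ = 2.7e9
ASSUMED_FLOPS_PER_CYCLE = 32


def total_cores(nodes, cores_per_node):
    """Number of cores in the main system."""
    return nodes * cores_per_node


def peak_petaflops(cores, clock_hz, flops_per_cycle):
    """Theoretical peak: cores x cycles per second x operations per cycle."""
    return cores * clock_hz * flops_per_cycle / 1e15


cores = total_cores(NODES, CORES_PER_NODE)
peak = peak_petaflops(cores, ASSUMED_CLOCK_HZ, ASSUMED_FLOPS_PER_CYCLE)
print(f"{cores:,} cores, about {peak:.1f} petaFLOPS peak")
```

Under those assumptions the product gives 448,448 cores and lands close to the peak figure quoted for the machine. A theoretical peak of this kind is an upper bound, not a measured benchmark result.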

Built on Intel’s latest server processor, Frontera offers:

  • A higher clock rate than previous systems, delivering higher single-thread performance
  • More processor cores to run more threads at the same time
  • More memory bandwidth that can feed data to all those cores

“Frontera will address a narrower mission than Stampede2,” explained Stanzione. “Instead of supporting thousands of projects, we’ll have a few hundred that have an extraordinary computational need and massive scale of computation. It’ll solve the very biggest sort of grand challenge projects in the scientific ecosystem. We’ll be running calculations at a speed and at a scale that we’ve never been able to do before.”

Frontera will also support technologies that were previously unavailable, including Intel® Deep Learning Boost (Intel® DL Boost), which targets artificial intelligence workloads. Trying them on Frontera will help TACC’s supercomputer designers understand which are useful to researchers, so that those technologies can be integrated into the next-generation TACC machine slated for 2025. One such technology is Intel® Optane™ DC persistent memory.

“Intel® Optane™ DC persistent memory,” commented Stanzione, “has several unique characteristics for us that offer advantages over traditional memory and advantages over traditional storage. There are many potential interesting use cases, such as very, very large memory nodes—multiple terabytes per node—or simple fault tolerance. When a server fails, we can keep the state of memory and allow the computation to keep running, versus having to restart it across the whole 8,008 nodes that make up the machine.”
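The fault-tolerance idea in that quote can be illustrated with a memory-mapped state that outlives the process updating it. This is a minimal sketch under stated assumptions, not TACC’s or Intel’s implementation: an ordinary temporary file stands in for persistent memory, and the state layout and the `open_state` and `run_steps` helpers are hypothetical names invented for the example.

```python
import mmap
import os
import struct
import tempfile

# Hypothetical state layout: completed step count (int64) and a running sum (float64).
STATE = struct.Struct("<qd")


def open_state(path):
    """Memory-map a fixed-size state file, creating it zero-filled if absent."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT)
    try:
        if os.fstat(fd).st_size < STATE.size:
            os.ftruncate(fd, STATE.size)  # newly added bytes read back as zeros
        return mmap.mmap(fd, STATE.size)
    finally:
        os.close(fd)  # the mapping keeps its own handle to the file


def run_steps(path, steps):
    """Advance the computation by `steps`, resuming from whatever state is mapped."""
    state = open_state(path)
    try:
        done, total = STATE.unpack_from(state, 0)
        for _ in range(steps):
            done += 1
            total += done * 0.5  # stand-in for real work
            STATE.pack_into(state, 0, done, total)
            state.flush()  # push the update to the backing file
        return done, total
    finally:
        state.close()


# An ordinary file in a temporary directory stands in for persistent memory here.
path = os.path.join(tempfile.mkdtemp(), "state.bin")
run_steps(path, 3)            # the "job" stops after three steps
resumed = run_steps(path, 2)  # a restarted job continues at step four
print(resumed)
```

The second call continues from the saved step count instead of starting over, which is the behavior the quote describes at the scale of a whole node. A production code would also need to guard against a failure in the middle of an update, which this sketch ignores.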

Result
Grand challenge problems need massive computing capacity.

“It’s going to be a remarkably productive system,” said Stanzione. “We think, in terms of real science throughput, we’ll get three or four times the performance of its predecessor.”

Beyond the Standard Model
With the discovery of the Higgs boson using the Large Hadron Collider (LHC) at CERN in Geneva, Switzerland, the final piece of the Standard Model of Physics was put in place. Now, scientists around the world are looking Beyond the Standard Model to gain a finer sense of what makes up high-energy particle physics. The LHC, with one of its detectors called ATLAS (A Toroidal LHC ApparatuS), will again be at the center of their research. CERN plans on increasing the number of LHC collisions by a factor of ten in the coming years.

The LHC requires enormous amounts of computing capacity to interpret its collisions. CERN scientists have run workloads on Stampede2. Now that Frontera is operational, CERN will have a much larger system to use to understand what is happening at these subatomic scales.

“We simulate the detector response to a given physics model,” said Robert Gardner, a research professor in the Enrico Fermi Institute at the University of Chicago, who co-leads the distributed computing facility group for the U.S. ATLAS collaboration.

“When we’re doing the analysis on the actual data, we may plot some distributions such as the particle mass, transverse momentum, or the ‘missing energy’ in the collision. And you get the number of candidates that we have for the raw data coming off the detector. Then we compare those to different kinds of models and see if we can match up the distributions. This provides clues to what might be actually happening during the collisions.”
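The comparison Gardner describes, histogramming a measured quantity and checking which model reproduces its distribution, can be sketched with a toy example. Everything here is an assumption made for illustration: the Gaussian "mass" samples, the bin range, and the `histogram` and `chi_square` helpers are hypothetical and are not part of any ATLAS software.

```python
import random


def histogram(values, low, high, bins):
    """Count how many values fall into each of `bins` equal-width bins."""
    width = (high - low) / bins
    counts = [0] * bins
    for v in values:
        if low <= v < high:
            counts[min(int((v - low) / width), bins - 1)] += 1
    return counts


def chi_square(observed, model):
    """Pearson chi-square of observed counts against a model histogram.

    The model is first scaled to the observed total; bins where the model
    predicts nothing are skipped.
    """
    scale = sum(observed) / sum(model)
    chi2 = 0.0
    for obs, mod in zip(observed, model):
        expected = mod * scale
        if expected > 0:
            chi2 += (obs - expected) ** 2 / expected
    return chi2


rng = random.Random(2019)
# Toy samples: "data" plus two simulated models with different peak positions.
data = [rng.gauss(125.0, 2.0) for _ in range(5_000)]
model_a = [rng.gauss(125.0, 2.0) for _ in range(50_000)]
model_b = [rng.gauss(122.0, 2.0) for _ in range(50_000)]

binning = (115.0, 135.0, 20)  # low edge, high edge, number of bins
observed = histogram(data, *binning)
chi2_a = chi_square(observed, histogram(model_a, *binning))
chi2_b = chi_square(observed, histogram(model_b, *binning))
print(f"model A: {chi2_a:.1f}, model B: {chi2_b:.1f}")
```

The model whose histogram gives the smaller chi-square matches the data better. A real analysis would compare many distributions at once and account for detector effects and systematic uncertainties, which is part of why the simulations need so much computing capacity.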

From Nuclear Fission to Fusion Power
Another area involving global scientific collaboration is innovating new resources for supplying the world’s power needs. From more efficient wind generation to battery research and hydrogen mining from water, science is trying to find clean alternatives to fossil fuels.

Nuclear fusion, the merging of nuclei to release massive amounts of energy as the Sun does, is considered the holy grail of energy production, without the drawbacks of today’s fission reactors. In France, such a reactor, the International Thermonuclear Experimental Reactor (ITER), is being built by a consortium of seven governments. Scheduled for a 2025 completion date, it is designed to produce ten times as much fusion power as the heating power it consumes.

An urgent problem for designers is to be able to accurately and reliably predict—and avoid—large-scale disruptions. But for years, scientists have struggled to match physics models and simulations with the dynamics in a real reactor.

“If you try to use conventional theoretical methods, buttressed by high performance computing, you still aren’t going to be able to make predictions,” said William Tang, principal research physicist at the Princeton Plasma Physics Laboratory—the U.S. DOE National Lab for fusion studies. “You needed the impact of big data analytics that can deal with a lot of data that’s relevant to disruptions.”

Tang and his team have turned to artificial intelligence to help solve the problem. The team developed the Fusion Recurrent Neural Net (FRNN) code, which uses deep learning to make better predictions. The code can predict disruption events with 90+ percent accuracy more than 30 milliseconds ahead of the disruption trigger event. Tang will take advantage of Frontera’s new resources for deep learning to further his research with the FRNN code and develop a control system that can avoid disruptions in ITER.

Computation for World Problems
Other challenges requiring massive computing scale include using precision agriculture and genomics to feed the world’s growing population and innovating cleaner coal combustion, which is still a leading source of energy.

“We need systems like Frontera to answer the big questions of our time, such as the sustainability of the environment and renewable energy,” said Professor Gardner. “We have to continue to work on frontier science and everything that comes after it, and we can’t do that without computation.”

A view between two rows of Frontera servers in the TACC Data Center.

Solution Summary
Frontera was built to support a new, much larger scale of scientific computing than TACC could previously offer. Built on 2nd Generation Intel® Xeon® Platinum processors inside Dell EMC PowerEdge* servers, with nearly half a million cores, Frontera will deliver a peak performance of 38.7 petaFLOPS, according to TACC’s Executive Director Dan Stanzione. The new supercomputer will also allow scientists to test new technologies, including Intel® Optane™ DC persistent memory, so the center can assess how to implement them on its next-generation supercomputer.

Frontera Highlights

  • 8,008 dual-socket Dell PowerEdge* C6420 servers with 2nd Generation Intel® Xeon® Scalable processors (448,448 cores total)
  • Peak performance of 38.7 petaFLOPS¹
  • 50 nodes with Intel® Optane™ DC persistent memory
  • #5 most powerful supercomputer in the world, and the fastest at any university

Solution Ingredients

  • 8,008 Dell EMC PowerEdge C6420 compute nodes, consisting of 2nd Generation Intel® Xeon® Platinum processors, 56 cores per node
  • Intel® Optane™ DC persistent memory

Product and Performance Information

1

Testing conducted by TACC for the June 2019 TOP500 rating. See https://www.top500.org/system/179607.