The world’s fastest supercomputer doesn’t even run a day without problems

“Frontier” is supposed to help science unravel mysteries for a long time to come. But so far the supercomputer is still causing trouble.

Supercomputers are not called that for nothing: their enormous computing power is meant to ensure that even the most complex scientific and medical problems will one day be solved. The huge computer complex called Frontier is located at the US Oak Ridge National Laboratory and is scheduled to go into service in 2023.

So far, however, the technicians have been confronted with new problems every day, and Frontier can usually deliver only a fraction of its potential performance. The main reason for this is probably the ambitious design of the supercomputer.

Can it run Crysis?

Frontier is impressive when you look at the raw numbers. Purely in terms of performance, its theoretical peak of 1.685 exaFLOPS is more than three times that of the previous leader of the TOP500 list, Fugaku from Japan (0.537 exaFLOPS). The following hardware makes this possible:

  • Processors: A total of 591,872 CPU cores at 2 GHz, spread over 9,248 64-core AMD Epyc “Trento” processors.
  • Graphics accelerators: AMD technology is also used here, more precisely 36,992 Instinct MI250X accelerators, each of which has 220 compute units. In total, this results in an impressive 8,138,240 compute units.
  • Interconnect: In a huge computer network like Frontier, it is of central importance that the thousands of independent units communicate with each other as efficiently as possible. In Frontier, the Slingshot-11 interconnect is supposed to take care of that.
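The totals in the list above follow directly from the per-component figures. As a minimal sketch, the arithmetic can be checked in a few lines of Python; the variable names are purely illustrative, and the only inputs are the numbers quoted in this article.

```python
# Figures as quoted in the article; the names are illustrative only.
CPU_COUNT = 9_248              # AMD Epyc "Trento" processors
CORES_PER_CPU = 64
GPU_COUNT = 36_992             # AMD Instinct MI250X accelerators
COMPUTE_UNITS_PER_GPU = 220
FRONTIER_PEAK_EFLOPS = 1.685   # theoretical peak
FUGAKU_PEAK_EFLOPS = 0.537

total_cpu_cores = CPU_COUNT * CORES_PER_CPU
total_compute_units = GPU_COUNT * COMPUTE_UNITS_PER_GPU
gpus_per_cpu = GPU_COUNT / CPU_COUNT
peak_ratio = FRONTIER_PEAK_EFLOPS / FUGAKU_PEAK_EFLOPS

print(f"CPU cores:           {total_cpu_cores:,}")
print(f"GPU compute units:   {total_compute_units:,}")
print(f"GPUs per CPU:        {gpus_per_cpu:g}")
print(f"Frontier vs. Fugaku: {peak_ratio:.2f}x")
```

The quoted counts also imply four accelerators for every processor, which gives a sense of how much of the machine's computing power sits in the GPUs.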

We'll skip the power consumption at this point. A home balcony power plant like the one our author Dennis Ziesecke built won't get the researchers very far, and our other tips for saving electricity will probably do little for the Frontier supercomputer. But never mind, they are at least worth listening to for you, I promise!

The performance cannot yet be put to use

Frontier’s hardware configuration above gives an idea of the complex undertaking involved in designing and successfully operating the supercomputer, which is estimated to cost 600 million US dollars. And this is exactly where we come to the problems mentioned at the beginning: Frontier currently cannot be used in the way those responsible had hoped.

Reportedly, not a day goes by without a hardware error. And when Frontier does do its job, it reaches only about 1 exaFLOPS, less than two thirds of its theoretical peak, which of course is not enough for the complex calculations it is meant to run.
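How far short of the peak this falls can be worked out from the two figures in the article. The following is a minimal sketch; the names are illustrative, and the achieved value is the rounded 1 exaFLOPS quoted here, not a measured benchmark result.

```python
# Both values are taken from the article; names are illustrative only.
THEORETICAL_PEAK_EFLOPS = 1.685
ACHIEVED_EFLOPS = 1.0  # rounded figure quoted in the article

fraction_of_peak = ACHIEVED_EFLOPS / THEORETICAL_PEAK_EFLOPS
below_two_thirds = fraction_of_peak < 2 / 3

print(f"Fraction of theoretical peak: {fraction_of_peak:.1%}")
print(f"Below two thirds of peak:     {below_two_thirds}")
```

With these inputs the machine delivers roughly 59 percent of its theoretical peak.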

In an interview with the website InsideHPC, program manager Justin Whitt explains that the bugs are currently being fixed and that they are not unusual for a project of this magnitude. The intervals between errors are said to be hours, not days.

Frontier in all its glory. Let's hope the cleaning staff doesn't pull a plug.

In the interview, Justin Whitt does not explain exactly where the problems lie. Rumors have it that the built-in AMD hardware is not very reliable, but Whitt does not confirm this. At present there are said to be no concerns about the hardware used. Rather, the sources of the problems are said to be widely spread.

Time will tell if the team at Oak Ridge National Laboratory can finish their supercomputer on time. If so, the queue of those who want to use it to calculate complex simulations is likely to be very long.

What do you think about such gigantic supercomputers? Are you amazed when you study the specifications, or do you just wave them off because you are only interested in the tangible results that come out at the end? Let us know what you think about this topic in the comments!
