
Fujitsu A64FX: Difference between revisions

From Wikipedia, the free encyclopedia
TheKanter (talk | contribs)
Adding information about on-die Tofu interconnect.
TheKanter (talk | contribs)
m added invisible comment
Line 36: Line 36:
It has "Four-operand [[fused multiply–add|FMA]] with Prefix Instruction",<ref name="FujitsuHotChips" /> i.e. MOVPRFX instruction followed by [[FMA instruction set#FMA3 instruction set|FMA3]] ([[AArch64|ARM]], like [[RISC]] in general, is a 3-operand machine, doesn't have space for 4 operands), which get packed into a single operation in the pipeline. For the processor the designer claim ">90% execution efficiency in (D|S|H)[[general_matrix_multiply|GEMM]] and INT16/8 [[dot product]]".<ref name="FujitsuHotChips" />


The processor uses 32 gigabytes of [[HBM2]] memory with a bandwidth of 1&nbsp;TB per second.<ref name="2018-08-22-fujitsu-pr" /> The processor contains 16 [[PCI Express|PCI Express generation 3]] lanes<ref name="FujitsuHotChips" /> to connect to accelerators (<!--not mentioned in source, while "NVM-based File I/O accelerator" is-->hypothetical e.g. [[general-purpose computing on graphics processing units|GPUs]] and [[field-programmable gate array|FPGAs]]). The processor also integrates a TofuD fabric controller with 10 ports implemented as 20 lanes of high-speed 28Gbps to connect multiple nodes in a cluster<ref name="FujitsuHotChips" />. The reported transistor count is about 8.8<!--8.786--> billion.<ref name="2018-08-22-fujitsu-pr" />
The processor uses 32 gigabytes of [[HBM2]] memory with a bandwidth of 1&nbsp;TB per second.<ref name="2018-08-22-fujitsu-pr" /> The processor contains 16 [[PCI Express|PCI Express generation 3]] lanes<ref name="FujitsuHotChips" /> to connect to accelerators (<!--not mentioned in source, while "NVM-based File I/O accelerator" is. +1 the PCIe lanes are meant for storage, not accelerators.-->hypothetical e.g. [[general-purpose computing on graphics processing units|GPUs]] and [[field-programmable gate array|FPGAs]]). The processor also integrates a TofuD fabric controller with 10 ports implemented as 20 lanes of high-speed 28Gbps to connect multiple nodes in a cluster<ref name="FujitsuHotChips" />. The reported transistor count is about 8.8<!--8.786--> billion.<ref name="2018-08-22-fujitsu-pr" />


Each A64FX processor has 4 NUMA nodes, with each NUMA node having 12 compute cores, for a total of 48 cores per processor.<ref name="2020-odajima" /><ref name="2019-11-13-fujitsu-pr" /><ref name="2020-06-28-fujitsu-specs" /> Each NUMA node also has its own level 2 cache, HBM2 memory, and assistant cores for non-computational purposes.<ref name="2020-odajima" />

Revision as of 21:56, 18 October 2021

A64FX
General information
Launched: 2019
Marketed by: Fujitsu
Designed by: Fujitsu
Common manufacturer
Architecture and classification
Technology node: 7 nm
Microarchitecture: In-house
Instruction set: ARMv8.2-A with SVE and SBSA level 3
Physical specifications
Cores
  • 48 per CPU[1] plus optional assistant cores[2][3]

The A64FX is a 64-bit ARM architecture microprocessor designed by Fujitsu.[1][4] It replaces the SPARC64 V as Fujitsu's processor for supercomputer applications.[5] It powers the Fugaku supercomputer, which ranked first on the TOP500 list in June 2020,[4][5][6][7] November 2020 and June 2021.

Design

Fujitsu collaborated with ARM to develop the processor; it is the first processor to implement the ARMv8.2-A Scalable Vector Extension (SVE) SIMD instruction set, with a 512-bit vector implementation.[4]

It has "Four-operand FMA with Prefix Instruction",[1] i.e. a MOVPRFX instruction followed by a three-operand FMA (FMA3), with the pair packed into a single operation in the pipeline. The prefix is needed because ARM, like RISC architectures in general, uses three-operand instructions and has no encoding space for a fourth operand. For the processor, the designers claim ">90% execution efficiency in (D|S|H)GEMM and INT16/8 dot product".[1]
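The two-instruction sequence can be modelled in a few lines of Python. This is only a sketch of the architectural semantics: the register names, the four-element vectors and the helper `fma4` are illustrative assumptions, predication is omitted, and ordinary Python arithmetic rounds the multiply and the add separately, so it does not reproduce the single rounding of a true fused operation or the pipeline-level packing described above.

```python
# Toy register file: register names and vector length are illustrative only.
def movprfx(regs, zd, zn):
    """MOVPRFX zd, zn: copy zn into zd so a destructive instruction can follow."""
    regs[zd] = list(regs[zn])


def fmla(regs, zda, zn, zm):
    """FMLA zda, zn, zm: zda = zda + zn * zm (the destination is also the addend)."""
    regs[zda] = [acc + x * y for acc, x, y in zip(regs[zda], regs[zn], regs[zm])]


def fma4(regs, zd, za, zn, zm):
    """Hypothetical four-operand form zd = za + zn * zm built from the pair above."""
    movprfx(regs, zd, za)   # zd <- za, leaving za untouched
    fmla(regs, zd, zn, zm)  # zd <- zd + zn * zm


regs = {
    "z0": [0.0, 0.0, 0.0, 0.0],      # destination
    "z1": [1.0, 2.0, 3.0, 4.0],      # addend
    "z2": [2.0, 2.0, 2.0, 2.0],      # multiplicand
    "z3": [10.0, 20.0, 30.0, 40.0],  # multiplier
}
fma4(regs, "z0", "z1", "z2", "z3")
print(regs["z0"])  # [21.0, 42.0, 63.0, 84.0]
```

Without the prefix, the three-operand FMLA would have to overwrite the addend register; copying it first keeps all three inputs intact.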

The processor uses 32 gigabytes of HBM2 memory with a bandwidth of 1 TB per second.[4] The processor contains 16 PCI Express generation 3 lanes[1] to connect to accelerators (hypothetically, e.g. GPUs and FPGAs). The processor also integrates a TofuD fabric controller with 10 ports, implemented as 20 high-speed lanes running at 28 Gbit/s each, to connect multiple nodes in a cluster.[1] The reported transistor count is about 8.8 billion.[4]
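As a rough back-of-the-envelope check of the interconnect figures, the sketch below multiplies out the port, lane and rate numbers quoted above. It assumes the 20 lanes are split evenly across the 10 ports and counts only the raw signalling rate in one direction; usable bandwidth will be lower once encoding and protocol overhead are taken into account, and those overheads are not given here.

```python
# Figures quoted in the text above.
TOFU_PORTS = 10
TOFU_LANES = 20
LANE_RATE_GBIT_S = 28

# Assumption: lanes are distributed evenly across ports.
lanes_per_port = TOFU_LANES // TOFU_PORTS

# Raw signalling rates, ignoring encoding and protocol overhead.
port_raw_gbit_s = lanes_per_port * LANE_RATE_GBIT_S
total_raw_gbit_s = TOFU_LANES * LANE_RATE_GBIT_S
total_raw_gbyte_s = total_raw_gbit_s / 8  # 8 bits per byte

print(f"lanes per port:        {lanes_per_port}")
print(f"raw rate per port:     {port_raw_gbit_s} Gbit/s")
print(f"raw rate, all lanes:   {total_raw_gbit_s} Gbit/s "
      f"({total_raw_gbyte_s:.0f} GB/s)")
```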

Each A64FX processor has 4 NUMA nodes, with each NUMA node having 12 compute cores, for a total of 48 cores per processor.[8][2][3] Each NUMA node also has its own level 2 cache, HBM2 memory, and assistant cores for non-computational purposes.[8]
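The NUMA layout can be summarised with a little arithmetic. The sketch below assumes that the 32 GB of HBM2 and the 1 TB/s of bandwidth are divided evenly between the four NUMA nodes, and that compute cores are numbered contiguously from 0; both are illustrative assumptions, and real systems may number cores differently (for example when assistant cores are present). The function `numa_node_of` is hypothetical.

```python
# Figures quoted in the text above.
NUMA_NODES = 4
CORES_PER_NODE = 12
HBM2_TOTAL_GB = 32
HBM2_BANDWIDTH_TB_S = 1.0

total_cores = NUMA_NODES * CORES_PER_NODE

# Assumption: memory capacity and bandwidth are split evenly across NUMA nodes.
hbm2_per_node_gb = HBM2_TOTAL_GB / NUMA_NODES
bandwidth_per_node_tb_s = HBM2_BANDWIDTH_TB_S / NUMA_NODES


def numa_node_of(core_id: int) -> int:
    """Map a compute core to its NUMA node, assuming contiguous numbering from 0."""
    if not 0 <= core_id < total_cores:
        raise ValueError(f"core_id must be in [0, {total_cores}), got {core_id}")
    return core_id // CORES_PER_NODE


print(total_cores)              # 48
print(hbm2_per_node_gb)         # 8.0
print(bandwidth_per_node_tb_s)  # 0.25
print(numa_node_of(0), numa_node_of(11), numa_node_of(12), numa_node_of(47))
```

Keeping a thread's data in the HBM2 attached to its own NUMA node is the usual reason such a mapping matters when pinning work to cores.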

Fujitsu intends to produce lower-specification machines with fewer assistant cores.[2][3] Fujitsu also claims reliability, availability and serviceability (RAS) capabilities, including about 128,400 error checkers in total.

In June 2020 the Fugaku supercomputer using this processor became the fastest supercomputer in the world by TOP500 rankings, at about 415.5 petaFLOPS; it reached 442 petaFLOPS in November 2020.

Implementations

Fujitsu designed the A64FX for Fugaku, which ranked first on the TOP500 list in both June and November 2020.[9] Fujitsu intends to sell smaller machines with A64FX processors.[2][3] AnandTech reported in June 2020 that a PRIMEHPC FX700 server with two A64FX nodes cost ¥4,155,330 (c. US$39,000).[10]

Cray is developing supercomputers using the A64FX.[11][12] The Isambard 2 supercomputer is being built for a consortium in the United Kingdom, led by the University of Bristol and also including the Met Office, using the Fujitsu processors.[13][14] It is an upgrade to the Isambard supercomputer which was built with the Marvell ThunderX2, another ARM architecture microprocessor.[14]

Ookami is an open testbed system, supported by the NSF and run by Stony Brook University and the University at Buffalo, that gives researchers access to A64FX processors.

References

  1. ^ a b c d e f "Hot Chips 30 conference; Fujitsu briefing" (PDF). Toshio Yoshida. Archived from the original (PDF) on 5 December 2020.
  2. ^ a b c d "Fujitsu Launches New PRIMEHPC Supercomputers Using Fugaku Technology - Fujitsu Global". www.fujitsu.com. 13 November 2019. Retrieved 28 June 2020.
  3. ^ a b c d "FUJITSU Supercomputer PRIMEHPC Specifications". www.fujitsu.com. Retrieved 28 June 2020.
  4. ^ a b c d e "Fujitsu Successfully Triples the Power Output of Gallium-Nitride Transistors - Fujitsu Global". www.fujitsu.com. Fujitsu. Retrieved 8 March 2020.
  5. ^ a b Morgan, Timothy Prickett (24 August 2018). "Fujitsu's A64FX Arm Chip Waves The HPC Banner High". The Next Platform. Retrieved 8 March 2020.
  6. ^ "Outline of the Development of the Supercomputer Fugaku | RIKEN Center for Computational Science RIKEN Website". www.r-ccs.riken.jp. Retrieved 18 November 2020.
  7. ^ "Supercomputer Fugaku - Supercomputer Fugaku, A64FX 48C 2.2GHz, Tofu interconnect D | TOP500". www.top500.org. Retrieved 18 November 2020.
  8. ^ a b Odajima, Tetsuya; Kodama, Yuetsu; Tsuji, Miwako; Matsuda, Motohiko; Maruyama, Yutaka; Sato, Mitsuhisa (September 2020). "Preliminary Performance Evaluation of the Fujitsu A64FX Using HPC Applications". 2020 IEEE International Conference on Cluster Computing (CLUSTER): 523–530. doi:10.1109/CLUSTER49012.2020.00075. ISBN 978-1-7281-6677-3. S2CID 226266547.
  9. ^ "Supercomputer Fugaku - Supercomputer Fugaku, A64FX 48C 2.2GHz, Tofu interconnect D | TOP500". www.top500.org. Retrieved 18 November 2020.
  10. ^ Cutress, Dr Ian (26 June 2020). "HPC Systems Special Offer: Two A64FX Nodes in a 2U for $40k". www.anandtech.com. Retrieved 28 June 2020.
  11. ^ "Cray, Fujitsu Both Bringing Fujitsu A64FX-based Supercomputers to Market in 2020". HPCwire. 13 November 2019. Retrieved 8 March 2020.
  12. ^ Tsukimori, Osamu (7 January 2021). "Japan's Fugaku supercomputer is tackling some of the world's biggest problems". The Japan Times. Retrieved 26 January 2021.
  13. ^ Bristol, University of. "February: GW4 Isambard - News and features - University of Bristol". www.bristol.ac.uk. Retrieved 8 March 2020.
  14. ^ a b Burt, Jeffrey (9 March 2020). "Isambard 2 Is About Driving Technology Diversity". The Next Platform. Retrieved 9 March 2020.