Performance for High Performance Computing (HPC) Platforms<sup>1</sup> <sup>2</sup> <sup>3</sup>

The latest Intel platform delivers the capability and agility to reduce the need for dedicated systems running specialized hardware and software for unique workloads. In addition, the 2nd Gen Intel® Xeon® Scalable processor offers outstanding performance across the board: compute, floating point, deep learning, memory bandwidth, platform technologies, density, and real-world application performance.

Intel® Xeon® Scalable Processors

Workload-optimized to support high-demand applications and drive actionable insight.

Maximize Processor Performance and Memory Bandwidth

The Intel® Server System S9200WK product family is a purpose-built, performance-optimized data center block ideal for high performance computing (HPC) and AI applications.

Breakthrough Performance for Your Real-World Challenges

From AI and analytics to simulation and modeling, Intel’s high performance computing (HPC) platform integrates powerful memory, storage, fabric, and acceleration to tackle your biggest challenges.

Product and Performance Information

1

30x inference throughput improvement on Intel® Xeon® Platinum 9282 processor with Intel® Deep Learning Boost (Intel® DL Boost): Tested by Intel as of 2/26/2019. Platform: Dragon rock 2 socket Intel® Xeon® Platinum 9282 processor (56 cores per socket), HT ON, turbo ON, Total Memory 768 GB (24 slots/ 32 GB/ 2933 MHz), BIOS: SE5C620.86B.0D.01.0241.112020180249, CentOS* 7 Kernel 3.10.0-957.5.1.el7.x86_64, Deep Learning Framework: Intel® Optimization for Caffe* version: https://github.com/intel/caffe d554cbf1, ICC 2019.2.187, MKL DNN version: v0.17 (commit hash: 830a10059a018cd2634d94195140cf2d8790a75a), model: https://github.com/intel/caffe/blob/master/models/intel_optimized_models/int8/resnet50_int8_full_conv.prototxt, BS=64, No datalayer syntheticData: 3x224x224, 56 instance/2 socket, Datatype: INT8 vs. Tested by Intel as of July 11, 2017: 2S Intel® Xeon® Platinum 8180 processor @ 2.50GHz (28 cores), HT disabled, turbo disabled, scaling governor set to “performance” via intel_pstate driver, 384GB DDR4-2666 ECC RAM. CentOS* Linux release 7.3.1611 (Core), Linux kernel 3.10.0-514.10.2.el7.x86_64. SSD: Intel® SSD DC S3700 Series (800 GB, 2.5in SATA 6Gb/s, 25nm, MLC). Performance measured with: Environment variables: KMP_AFFINITY='granularity=fine, compact', OMP_NUM_THREADS=56, CPU Freq set with cpupower frequency-set -d 2.5G -u 3.8G -g performance. Caffe: (http://github.com/intel/caffe/), revision f96b759f71b2281835f690af267158b82b150b5c. Inference measured with “caffe time --forward_only” command, training measured with “caffe time” command. For “ConvNet” topologies, synthetic dataset was used. For other topologies, data was stored on local storage and cached in memory before training. Topology specs from https://github.com/intel/caffe/tree/master/models/intel_optimized_models (ResNet-50). Intel® C++ Compiler ver. 17.0.2 20170213, Intel® Math Kernel Library (Intel® MKL) small libraries version 2018.0.20170425. Caffe run with “numactl -l”.

2

4X Linpack performance with 2nd Gen Intel® Xeon® Platinum 9242 processor vs AMD* EPYC* 7601 at scale (4-node, 8-node).

Intel® Xeon® 9242 Processor: 
Intel Reference Platform with 2S Intel® Xeon® 9242 processors (2.2GHz, 48C), 16x16GB DDR4-2933, 1 SSD, Cluster File System: 2.12.0-1 (server) 2.11.0-14.1 (client), BIOS: PLYXCRB1.86B.0572.D02.1901180818, Microcode: 0x4000017, CentOS* 7.6, Kernel: 3.10.0-957.5.1.el7.x86_64, OFED stack: OFED OPA 10.8 on RH7.5 with Lustre v2.10.4, HBA: 100Gbps Intel® Omni-Path Architecture (Intel® OPA) 1 port PCIe* x16, Switch: (Intel® OPA) Edge Switch 100 Series 48 Port, HPL 2.1, Intel Compiler 2019u1, Intel® Math Kernel Library (Intel® MKL) 2019, Intel MPI 2019u1, HT=ON, Turbo=OFF, 2 threads per core, 4-node=20,408.00, 8-node=39921 GF/s, higher is better, test by Intel on 3/3/2019.

AMD EPYC 7601:
Supermicro AS -1023US-TR4, 2S AMD EPYC 7601 (2.2GHz, 32C), 16x16GB DDR4-2666, 1 SSD, BIOS ver: 1.1b (08/20/2018), Microcode ver: 0x8001227, Oracle* Linux Server release 7.5 (3.10.0-862.14.4.el7.crt1.x86_64), Cluster File System: Panasas (124 TB storage) Firmware version 5.5.0.b-1067797.15 EDR based IEEL Lustre, 100Gbps Mellanox EDR MT27700, 36 Port Mellanox EDR IB Switch, OFED MLNX mlnx-4.3-3.0.2.0, HPL 2.2, Intel Compiler 2018u3, AMD BLIS v0.4.0, Intel MPI 2018u3, SMT=ON, Turbo=ON, 2 threads per core, 4Node=4739.96, 8Node=9406.07 GF/s, higher is better, test by Intel on 9/23/2018.
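The "4X" figure can be checked against the GF/s results quoted in the two configurations above. The following is a minimal sketch of that arithmetic only; the dictionary layout and the function names `speedup` and `scaling_efficiency` are illustrative assumptions, not part of any Intel tooling, and the input values are copied as published.

```python
# HPL (Linpack) results in GF/s, copied from the configuration notes above.
# The structure and names below are illustrative assumptions.
HPL_GFLOPS = {
    "Intel Xeon Platinum 9242": {4: 20408.00, 8: 39921.00},
    "AMD EPYC 7601": {4: 4739.96, 8: 9406.07},
}


def speedup(nodes: int) -> float:
    """Ratio of Intel to AMD HPL throughput at the given node count."""
    intel = HPL_GFLOPS["Intel Xeon Platinum 9242"][nodes]
    amd = HPL_GFLOPS["AMD EPYC 7601"][nodes]
    return intel / amd


def scaling_efficiency(system: str) -> float:
    """8-node throughput relative to twice the 4-node throughput."""
    results = HPL_GFLOPS[system]
    return results[8] / (2 * results[4])


if __name__ == "__main__":
    for nodes in (4, 8):
        print(f"{nodes}-node speedup: {speedup(nodes):.2f}x")
    for system in HPL_GFLOPS:
        print(f"{system} 4->8 node scaling: {scaling_efficiency(system):.1%}")
```

With the published numbers, the ratio comes out at roughly 4.3x for 4 nodes and 4.2x for 8 nodes, consistent with the rounded "4X" claim. The two systems were tested on different dates with different HPL versions, math libraries, and Turbo settings, so the ratio describes these specific configurations only.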

3

Software and workloads used in performance tests may have been optimized for performance only on Intel® microprocessors. Performance tests, such as SYSmark* and MobileMark*, are measured using specific computer systems, components, software, operations, and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more information go to www.intel.co.uk/benchmarks.

Performance results are based on testing as of the dates set forth in the configuration details and may not reflect all publicly available security updates. See configuration disclosure for details. No product or component can be absolutely secure. Intel® technologies’ features and benefits depend on system configuration and may require enabled hardware, software or service activation. Performance varies depending on system configuration. Check with your system manufacturer or retailer or learn more at intel.co.uk.