Supercomputing Innovations and Breakthroughs
Introducing the 2nd Generation Intel® Xeon® Scalable Processors
Discover the flexibility of one platform to handle your AI, simulation and modeling, analytics, and other general HPC workloads.
The Convergence of AI and HPC
Artificial intelligence (AI) has the potential to change our lives. It’s powerful and far-reaching, yet it does not require separate, highly specialized hardware and software systems to run. Your familiar Intel®-based HPC infrastructure handles AI along with your big-data analytics workloads and more, for greater insight, discovery, and competitive advantage.
This solution brief introduces the challenges and opportunities around running AI and analytics workloads on existing HPC clusters.
Learn about capabilities for running TensorFlow* and Apache Spark* on existing HPC systems using standard batch schedulers.
Intel® Select Solutions for HPC
Intel® Select Solutions are pre-configured HPC stacks designed and optimized for specific data-center workloads — like professional visualization, simulation and modeling, and genomic analytics. Available through Intel’s trusted OEM partners, Intel® Select Solutions will accelerate your HPC plans.
Intel HPC Products and Technology
With the Intel IT Center, stay connected to the technologies, trends, and ideas that are shaping the future of the workplace.
Product and Performance Information
30x inference throughput improvement on Intel® Xeon® Platinum 9282 processor with Intel® Deep Learning Boost (Intel® DL Boost): Tested by Intel as of 2/26/2019. Platform: Dragon rock 2 socket Intel® Xeon® Platinum 9282 processor (56 cores per socket), HT ON, turbo ON, Total Memory 768 GB (24 slots/ 32 GB/ 2933 MHz), BIOS: SE5C620.86B.0D.01.0241.112020180249, CentOS* 7 Kernel 3.10.0-957.5.1.el7.x86_64, Deep Learning Framework: Intel® Optimization for Caffe* version: https://github.com/intel/caffe d554cbf1, ICC 2019.2.187, MKL DNN version: v0.17 (commit hash: 830a10059a018cd2634d94195140cf2d8790a75a), model: https://github.com/intel/caffe/blob/master/models/intel_optimized_models/int8/resnet50_int8_full_conv.prototxt, BS=64, No datalayer syntheticData: 3x224x224, 56 instance/2 socket, Datatype: INT8 vs. Tested by Intel as of July 11, 2017: 2S Intel® Xeon® Platinum 8180 processor @ 2.50GHz (28 cores), HT disabled, turbo disabled, scaling governor set to “performance” via intel_pstate driver, 384GB DDR4-2666 ECC RAM. CentOS* Linux release 7.3.1611 (Core), Linux kernel 3.10.0-514.10.2.el7.x86_64. SSD: Intel® SSD DC S3700 Series (800 GB, 2.5in SATA 6Gb/s, 25nm, MLC). Performance measured with: Environment variables: KMP_AFFINITY='granularity=fine, compact', OMP_NUM_THREADS=56, CPU Freq set with cpupower frequency-set -d 2.5G -u 3.8G -g performance. Caffe: (http://github.com/intel/caffe/), revision f96b759f71b2281835f690af267158b82b150b5c. Inference measured with “caffe time --forward_only” command, training measured with “caffe time” command. For “ConvNet” topologies, synthetic dataset was used. For other topologies, data was stored on local storage and cached in memory before training. Topology specs from https://github.com/intel/caffe/tree/master/models/intel_optimized_models (ResNet-50). Intel® C++ Compiler ver. 17.0.2 20170213, Intel® Math Kernel Library (Intel® MKL) small libraries version 2018.0.20170425. Caffe run with “numactl -l”.
4x Linpack performance with 2nd Gen Intel® Xeon® Platinum 9242 processor vs AMD* EPYC* 7601 at scale (4-node, 8-node).
Intel® Xeon® 9242 Processor: Intel Reference Platform with 2S Intel® Xeon® 9242 processors (2.2GHz, 48C), 16x16GB DDR4-2933, 1 SSD, Cluster File System: 2.12.0-1 (server) 2.11.0-14.1 (client), BIOS: PLYXCRB1.86B.0572.D02.1901180818, Microcode: 0x4000017, CentOS* 7.6, Kernel: 3.10.0-957.5.1.el7.x86_64, OFED stack: OFED OPA 10.8 on RH7.5 with Lustre v2.10.4, HBA: 100Gbps Intel® Omni-Path Architecture (Intel® OPA) 1 port PCIe* x16, Switch: (Intel® OPA) Edge Switch 100 Series 48 Port, HPL 2.1, Intel Compiler 2019u1, Intel® Math Kernel Library (Intel® MKL) 2019, Intel MPI 2019u1, HT=ON, Turbo=OFF, 2 threads per core, 4-node=20,408.00, 8-node=39921 GF/s, higher is better, test by Intel on 3/3/2019.
AMD EPYC 7601: Supermicro AS-1023US-TR4, 2S AMD EPYC 7601 (2.2GHz, 32C), 16x16GB DDR4-2666, 1 SSD, BIOS ver: 1.1b (08/20/2018), Microcode ver: 0x8001227, Oracle* Linux Server release 7.5 (3.10.0-862.14.4.el7.crt1.x86_64), Cluster File System: Panasas (124 TB storage) Firmware version 5.5.0.b-1067797.15 EDR based IEEL Lustre, 100Gbps Mellanox EDR MT27700, 36 Port Mellanox EDR IB Switch, OFED MLNX mlnx-4.3-126.96.36.199, HPL 2.2, Intel Compiler 2018u3, AMD BLIS v0.4.0, Intel MPI 2018u3, SMT=ON, Turbo=ON, 2 threads per core, 4Node=4739.96, 8Node=9406.07 GF/s, higher is better, test by Intel on 9/23/2018.
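The Linpack ratio can be checked directly from the HPL figures quoted in the two configuration notes above. The following is a minimal sketch of that arithmetic only; the variable and function names are illustrative assumptions, not part of any Intel tool, and the GF/s values are copied from the notes as published.

```python
# HPL results in GF/s, keyed by node count, copied from the configuration notes.
xeon_9242_gflops = {4: 20408.00, 8: 39921.0}
epyc_7601_gflops = {4: 4739.96, 8: 9406.07}


def speedup_by_node_count(new, baseline):
    """Return the new/baseline throughput ratio for each node count present in both."""
    return {
        nodes: new[nodes] / baseline[nodes]
        for nodes in sorted(new)
        if nodes in baseline
    }


# Hypothetical helper names; the ratios are plain division of the published figures.
speedups = speedup_by_node_count(xeon_9242_gflops, epyc_7601_gflops)
for nodes, ratio in speedups.items():
    print(f"{nodes}-node: {ratio:.2f}x")
```

Both ratios come out slightly above 4, which is the basis for the rounded 4x figure in the headline claim.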
Software and workloads used in performance tests may have been optimized for performance only on Intel® microprocessors. Performance tests, such as SYSmark* and MobileMark*, are measured using specific computer systems, components, software, operations, and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more information go to www.intel.co.uk/benchmarks.
Performance results are based on testing as of the dates set forth in the configuration details and may not reflect all publicly available security updates. See configuration disclosure for details. No product or component can be absolutely secure. Intel® technologies’ features and benefits depend on system configuration and may require enabled hardware, software or service activation. Performance varies depending on system configuration. Check with your system manufacturer or retailer or learn more at intel.co.uk.