Becoming AI Ready

How to become AI ready with Intel® Xeon® Scalable processors and integrate AI into a current HPC environment.

Modern enterprises increasingly lean on high-performance computing (HPC) and artificial intelligence (AI) to advance their organizations. The powerful combination of these technologies, alongside high-performance data analytics, results in better business intelligence, insights to improve productivity, and innovative approaches for new products and services.

Today, many financial institutions rely on AI’s prowess for automatic fraud detection. In the automotive industry, modeling and simulation on HPC systems help design safer, more fuel-efficient cars. AI can also couple with Internet of Things (IoT) devices to monitor manufacturing equipment for warning signs of failure and notify the appropriate staff to perform required maintenance. Healthcare providers use AI to examine patients’ CT scans for anomalies requiring medical intervention. In many cases, AI can evaluate the scans faster than radiologists can. In a fast-moving global economy, AI offers remarkable benefits for enterprises seeking to remain leaders in their respective industries.

Implementing AI on HPC

For organizations seeking to enable machine learning, or neural-network-based deep learning, four steps frame the process: data preparation, model creation and training, quantization, and inference.

  1. Data preparation: Modern corporations have massive volumes of data available to them. Some of that data is structured, tagged, and organized. Other data, like documents or images, is unstructured and proves more challenging to interpret for meaning. Combining, organizing, and culling data in advance gives AI a jump-start on evaluating it for essential insights.
  2. Model creation and training: AI requires a model, or framework, on which to base its data evaluations. Today, many tools like TensorFlow* (which is optimized for use on Intel® Xeon® platforms) and Apache Spark* help ease this process with tools and libraries to accelerate model creation and implementation. Training a model can take time, and substantial compute resources, making HPC a critical component of the process.
  3. Quantization: In its simplest terms, quantization equates to intelligent data compression: it reduces the numerical precision of values, for example from 32-bit floating point to 8-bit integers. When vast data sets are involved in AI training, quantization shrinks the volume of data the system must move and process, making the workload less compute-intensive. Using this technique, enterprises can accelerate calculations and get their AI models up and running more quickly.
  4. Inference: Once an AI model undergoes the training necessary for accurate results, it can start sifting through new and existing data to mine for correlations, meaning, and insights. Plus, as a model is retrained on the new data it encounters, it gets smarter.
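To make the quantization step concrete, here is a minimal sketch in plain Python of affine INT8 quantization, the scale-and-zero-point scheme commonly used for reduced-precision AI. The function names and sample values are hypothetical illustrations, not part of any Intel or TensorFlow API.

```python
# Minimal sketch of affine INT8 quantization -- a hypothetical illustration,
# not Intel's or any framework's actual implementation.

def quantize(values, num_bits=8):
    """Map float values onto unsigned integers in [0, 2**num_bits - 1]."""
    lo, hi = min(values), max(values)
    qmax = 2 ** num_bits - 1
    scale = (hi - lo) / qmax or 1.0           # guard against constant inputs
    zero_point = round(-lo / scale)           # integer that represents 0.0
    q = [max(0, min(qmax, round(v / scale) + zero_point)) for v in values]
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    """Recover approximate float values from the quantized integers."""
    return [(qi - zero_point) * scale for qi in q]

weights = [-1.2, -0.3, 0.0, 0.7, 2.5]
q, scale, zp = quantize(weights)
approx = dequantize(q, scale, zp)
# Each recovered value differs from the original by at most one
# quantization step (scale), which is the "compression" trade-off.
```

Frameworks apply the same idea to model weights and activations, trading a small, bounded rounding error for 8-bit arithmetic that is far cheaper than 32-bit floating point.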

Steps for implementing AI on HPC

For researchers and data scientists planning an AI implementation on their current HPC systems, a similar four-step process can help jump-start the effort:

  • Data preparation: Advanced science can involve massive amounts of data. Honing that information to eliminate extraneous data and centralizing it for fast access will accelerate the AI training process.
  • Model creation and training: While some institutions may have the in-house skills to create custom AI models, others will benefit from existing tools and libraries like TensorFlow* and Apache Spark*, which can reduce the time required for model building by a substantial margin. With a fledgling model in place, training is the next step. As you can see in the diagram, training a model to recognize images is an involved process. For deep learning, the various "layers" of a neural network need time to form, build, "weight" successes, and re-test. Over time the model improves its accuracy. Because a process like this is resource-intensive, your HPC system offers an ideal platform for the task.
  • Quantization: AI training can involve truly massive data sets. Quantization encompasses a variety of techniques for intelligent data compression. Reducing data volume through quantization can speed the calculations needed for model training, thereby shortening the training process.
  • Inference: At this stage, all your efforts creating your model pay off. Because the newly built model can now infer detailed correlations from disparate data sources, it can offer in-depth insights and revelations for breakthrough science.
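The train-then-infer flow in the list above can be illustrated end to end with a toy example: a single-neuron (logistic) classifier trained by gradient descent on synthetic data, then used to infer labels for inputs it has never seen. Everything here, the data, names, and hyperparameters, is hypothetical; a real workload would use a framework like TensorFlow* running on the HPC cluster.

```python
import math
import random

# Toy end-to-end "train, then infer" sketch (hypothetical data and names).
random.seed(0)

# Data preparation: synthetic 1-D samples, labeled 1 when the feature > 5.0.
features = [random.uniform(0.0, 10.0) for _ in range(200)]
data = [(x, 1.0 if x > 5.0 else 0.0) for x in features]

# Model creation: a single neuron with one weight and one bias.
w, b, lr = 0.0, 0.0, 0.1

def predict(x):
    z = max(-60.0, min(60.0, w * x + b))   # clamp to avoid math.exp overflow
    return 1.0 / (1.0 + math.exp(-z))      # sigmoid activation

# Training: repeatedly score, "weight" successes and failures, and re-test.
for epoch in range(100):
    for x, y in data:
        err = predict(x) - y               # prediction error
        w -= lr * err * x                  # adjust the weight...
        b -= lr * err                      # ...and the bias

# Inference: the trained model labels data it has never seen.
labels = [round(predict(x)) for x in (1.0, 9.0)]   # -> [0, 1]
```

A production model has millions of weights across many layers rather than one neuron, which is why the training stage is so compute-intensive and benefits most from HPC resources.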

The Intel platform is AI ready

While many enterprises have already adopted HPC for its multitude of benefits, uptake of AI has lagged behind. That trend is changing quickly, though. According to IDC, 70 percent of CIOs will aggressively apply data and AI to IT operations, tools, and processes by 2021.1 In the meantime, companies face perceived barriers such as the staff expertise needed for AI implementation, the cost of deployment, and the need for specialized IT infrastructure. However, if 2nd Generation Intel® Xeon® Scalable processors, which can deliver up to 3.7x faster performance on HPC workloads,2 underlie your HPC system, you may be closer to AI deployment than you realize. Intel offers an AI-ready platform with tools purpose-built to ease each step down the road toward AI implementation, such as Intel® Deep Learning Boost (Intel® DL Boost), which helps deliver up to 30x faster time to insight vs. previous-generation Intel® Xeon® processors.3

To learn more about AI implementation in your current HPC environment, read our white papers, The Case for Running AI, Analytics on HPC Clusters and Jump-Start Your AI Journey with Your Existing HPC Infrastructure; visit https://www.intel.co.uk/ai; or explore the Intel® Select Solution for HPC & AI Converged.

Intel® Xeon® Scalable Processors


Discover all the features and benefits.

Notices and Disclaimers:

Software and workloads used in performance tests may have been optimized for performance only on Intel® microprocessors.

Performance tests, such as SYSmark* and MobileMark*, are measured using specific computer systems, components, software, operations, and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more complete information visit www.intel.co.uk/benchmarks.

Performance results are based on testing as of the dates shown in the configurations and may not reflect all publicly available security updates. See configuration disclosure for details. No product or component can be absolutely secure.

Intel does not control or audit third-party data. You should review this content, consult other sources, and confirm whether referenced data are accurate.

Intel, the Intel logo, and Xeon are trademarks of Intel Corporation or its subsidiaries in the U.S. and/or other countries.

*Other names and brands may be claimed as the property of others.
© Intel Corporation

Product and Performance Information

2

OLTP Warehouse claim of up to 3.7x: 1-node, 2x Intel® Xeon® processor E5-2697 v2 on Canoe Pass with 256 GB (16 slots/16 GB/1866) total memory, ucode 0x42d on RHEL7.6, 3.10.0-957.el7.x86_64, 2x Intel® SSD DC P3700 Series PCIe* for DATA, 2x Intel® SSD DC P3700 Series PCIe* for REDO, HammerDB 3.1, HT on, turbo on, result: transactions per minute=2242024, test by Intel on 2/1/2019 vs. 1-node, 2x Intel® Xeon® Platinum 8280 processor on Wolf Pass with 384 GB (12 slots/32 GB/2933) total memory, ucode 0x4000013 on RHEL7.6, 3.10.0-957.el7.x86_64, 2x Intel® SSD DC P4610 Series for DATA, 2x Intel® SSD DC P4610 Series for REDO, HammerDB 3.1, HT on, turbo on, result: transactions per minute=8459206, test by Intel on 2/1/2019.

3

30x inference throughput improvement on Intel® Xeon® Platinum 9282 processor with Intel® Deep Learning Boost (Intel® DL Boost): Tested by Intel as of 2/26/2019. Platform: Dragon rock 2 socket Intel® Xeon® Platinum 9282 processor (56 cores per socket), HT ON, turbo ON, Total Memory 768 GB (24 slots/ 32 GB/ 2933 MHz), BIOS: SE5C620.86B.0D.01.0241.112020180249, CentOS* 7 Kernel 3.10.0-957.5.1.el7.x86_64, Deep Learning Framework: Intel® Optimization for Caffe* version: https://github.com/intel/caffe d554cbf1, ICC 2019.2.187, MKL DNN version: v0.17 (commit hash: 830a10059a018cd2634d94195140cf2d8790a75a), model: https://github.com/intel/caffe/blob/master/models/intel_optimized_models/int8/resnet50_int8_full_conv.prototxt, BS=64, No datalayer synthetic Data: 3x224x224, 56 instance/2 socket, Datatype: INT8 vs. Tested by Intel as of July 11, 2017: 2S Intel® Xeon® Platinum 8180 processor CPU @ 2.50GHz (28 cores), HT disabled, turbo disabled, scaling governor set to “performance” via intel_pstate driver, 384GB DDR4-2666 ECC RAM. CentOS* Linux release 7.3.1611 (Core), Linux* kernel 3.10.0-514.10.2.el7.x86_64. SSD: Intel® SSD DC S3700 Series (800 GB, 2.5in SATA 6Gb/s, 25nm, MLC). Performance measured with: Environment variables: KMP_AFFINITY='granularity=fine, compact‘, OMP_NUM_THREADS=56, CPU Freq set with cpupower frequency-set -d 2.5G -u 3.8G -g performance. Caffe: (http://github.com/intel/caffe/), revision f96b759f71b2281835f690af267158b82b150b5c. Inference measured with “caffe time --forward_only” command, training measured with “caffe time” command. For “ConvNet” topologies, synthetic dataset was used. For other topologies, data was stored on local storage and cached in memory before training. Topology specs from https://github.com/intel/caffe/tree/master/models/intel_optimized_models (ResNet-50). Intel® C++ Compiler ver. 17.0.2 20170213, Intel® Math Kernel Library (Intel® MKL) small libraries version 2018.0.20170425. Caffe run with “numactl -l”.