Intel® High Level Synthesis Compiler Pro Edition: Best Practices Guide

ID 683152
Date 12/04/2023
Public

5.4. Minimize Loop-Carried Dependencies

Loop-carried dependencies occur when the code in a loop iteration depends on the output of previous loop iterations. Loop-carried dependencies in your component increase the loop initiation interval (II), which reduces the performance of your component.

The loop structure that follows has a loop-carried dependency because each loop iteration reads data written by the previous iteration. As a result, each read operation cannot proceed until the write operation from the previous iteration completes. The presence of loop-carried dependencies reduces the pipeline parallelism that the Intel® HLS Compiler Pro Edition can achieve, which reduces component performance.

for(int i = 1; i < N; i++)
{
    A[i] = A[i - 1] + i;
}
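To make the distinction concrete, the following minimal sketch contrasts the loop above with a loop whose iterations are independent. The function names, the array size N = 8, and the separate input and output arrays are illustrative assumptions, not part of the guide, and the second function assumes that in and out refer to different arrays.

```cpp
constexpr int N = 8;  // hypothetical array size, chosen only for illustration

// Loop-carried dependency: iteration i reads A[i - 1], which iteration
// i - 1 wrote, so the read must wait for the previous write.
void with_dependency(int A[N]) {
    for (int i = 1; i < N; i++) {
        A[i] = A[i - 1] + i;
    }
}

// No loop-carried dependency (assuming in and out are distinct arrays):
// iteration i reads only in[i - 1] and writes only out[i].
void without_dependency(const int in[N], int out[N]) {
    for (int i = 1; i < N; i++) {
        out[i] = in[i - 1] + i;
    }
}
```

In the second loop, no iteration consumes a value produced by another iteration, so nothing forces consecutive iterations to wait for one another.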

The Intel® HLS Compiler Pro Edition performs a static memory dependency analysis on loops to determine the extent of parallelism that it can achieve. If the Intel® HLS Compiler Pro Edition cannot determine that there are no loop-carried dependencies, it assumes that loop-carried dependencies exist. The compiler's ability to test for loop-carried dependencies is impeded when variables are unknown at compilation time or when array accesses in your code involve complex addressing.

To avoid unnecessary loop-carried dependencies and help the compiler to better analyze your loops, follow these guidelines:

Avoid Pointer Arithmetic

Compiler output is suboptimal when your component accesses arrays by dereferencing pointer values derived from arithmetic operations. For example, avoid accessing an array as follows:

for(int i = 0; i < N; i++)
{
    int t = *(A++);
    *A = t;
}
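As a hedged sketch of the alternative, the same access pattern can be written with an explicit array index. The function names and the parameter n are hypothetical. Both versions touch elements A[0] through A[n], so the array is assumed to hold at least n + 1 elements.

```cpp
// Pointer-arithmetic form from the text: copies each element into its
// successor by advancing the pointer A on every iteration.
void propagate_with_pointer(int *A, int n) {
    for (int i = 0; i < n; i++) {
        int t = *(A++);
        *A = t;
    }
}

// Indexed form with the same behavior: the addresses read and written
// are stated directly in terms of the loop index i.
void propagate_with_index(int A[], int n) {
    for (int i = 0; i < n; i++) {
        A[i + 1] = A[i];
    }
}
```

The indexed loop still carries a dependency from one iteration to the next. The rewrite only makes the access pattern explicit; it does not remove the dependency.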

Introduce Simple Array Indexes

Some types of complex array indexes cannot be analyzed effectively, which might lead to suboptimal compiler output. Avoid the following constructs as much as possible:
  • Nonconstants in array indexes.

    For example, A[K + i], where i is the loop index variable and K is an unknown variable.

  • Multiple index variables in the same subscript location.

    For example, A[i + 2 * j], where i and j are loop index variables for a double nested loop.

    The array index A[i][j] can be analyzed effectively because the index variables are in different subscripts.

  • Nonlinear indexing.

    For example, A[i & C], where i is a loop index variable and C is a nonconstant variable.
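The second guideline can be sketched as follows. The flattened index i * COLS + j combines both loop variables in one subscript, while the two-dimensional form keeps them in separate subscripts. The dimensions ROWS and COLS and the function names are illustrative assumptions.

```cpp
constexpr int ROWS = 3;  // hypothetical dimensions for illustration
constexpr int COLS = 4;

// Both loop index variables appear in the same subscript.
void scale_flat(int A[ROWS * COLS]) {
    for (int i = 0; i < ROWS; i++) {
        for (int j = 0; j < COLS; j++) {
            A[i * COLS + j] *= 2;
        }
    }
}

// Each loop index variable has its own subscript.
void scale_2d(int A[ROWS][COLS]) {
    for (int i = 0; i < ROWS; i++) {
        for (int j = 0; j < COLS; j++) {
            A[i][j] *= 2;
        }
    }
}
```

Both functions double every element; they differ only in how the element address is expressed.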

Use Loops with Constant Bounds Whenever Possible

The compiler can perform range analysis effectively when loops have constant bounds.

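When the number of iterations is not known at compile time, you can give the loop a constant upper bound and place an if-statement inside the loop to control the iterations in which the loop body executes.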
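A minimal sketch of this pattern, assuming a hypothetical compile-time maximum MAX_N and a run-time count n that never exceeds it:

```cpp
constexpr int MAX_N = 16;  // hypothetical compile-time upper bound

// Variable bound: the trip count depends on the run-time value n.
int sum_variable_bound(const int data[MAX_N], int n) {
    int sum = 0;
    for (int i = 0; i < n; i++) {
        sum += data[i];
    }
    return sum;
}

// Constant bound: the loop always runs MAX_N iterations, and the
// if-statement selects the iterations that do work (assumes n <= MAX_N).
int sum_constant_bound(const int data[MAX_N], int n) {
    int sum = 0;
    for (int i = 0; i < MAX_N; i++) {
        if (i < n) {
            sum += data[i];
        }
    }
    return sum;
}
```

Both functions return the same result for any n between 0 and MAX_N. The second form trades some idle iterations for a loop bound that is known at compile time.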

Ignore Memory Dependencies

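If you know that there are no implicit memory dependencies across loop iterations, you can use the ivdep pragma to tell the Intel® HLS Compiler Pro Edition to ignore possible memory dependencies. The compiler does not check this assertion, so use the pragma only when you are certain that it holds.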

For details about how to use the ivdep pragma, see Loop-Carried Dependencies (ivdep Pragma) in the Intel® High Level Synthesis Compiler Pro Edition Reference Manual.
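As a hedged illustration, the sketch below places #pragma ivdep directly before a loop whose write address comes from an index array. The function name, the array length LEN, and the precondition that idx contains no repeated values are assumptions made for this example. The compiler cannot prove that precondition from the code, so the pragma states it on the programmer's behalf. Compilers that do not recognize the pragma typically ignore it, possibly with a warning.

```cpp
constexpr int LEN = 4;  // hypothetical array length for illustration

// Precondition (assumed, not checked): every value in idx is distinct and
// lies in [0, LEN), so no two iterations access the same element of A.
void scatter_add(int A[LEN], const int idx[LEN], const int val[LEN]) {
    #pragma ivdep
    for (int i = 0; i < LEN; i++) {
        A[idx[i]] = A[idx[i]] + val[i];
    }
}
```

If idx could contain a repeated value, two iterations would read and write the same element and the assertion made by the pragma would be false. In that case, leave the pragma out.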