Monday, May 4, 2020

SIMD and Vectorization

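The process of converting an algorithm from a scalar implementation to a vector implementation is known as vectorization. In a vectorized implementation, a single instruction operates on several data elements at once, for example on several consecutive elements of an array. This is what SIMD (single instruction, multiple data) refers to.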
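To make the difference concrete, here is a minimal sketch of the same array addition written once as a scalar loop and once by hand with SSE intrinsics. The function names addScalar and addVector are made up for this illustration, and the SSE path assumes an x86 target; on other targets the sketch simply falls back to the scalar loop. This is roughly the kind of transformation an automatic vectorizer performs, not the exact code any particular compiler generates.

```cpp
#include <cstddef>

// Assumption: SSE is available on x86 targets; otherwise use the scalar path only.
#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>  // SSE intrinsics: __m128, _mm_loadu_ps, _mm_add_ps, _mm_storeu_ps
#define SKETCH_HAS_SSE 1
#else
#define SKETCH_HAS_SSE 0
#endif

// Scalar version: one float addition per loop iteration.
void addScalar(const float* a, const float* b, float* out, std::size_t n)
{
    for (std::size_t i = 0; i < n; ++i)
        out[i] = a[i] + b[i];
}

// Vector version: four float additions per instruction where SSE is available.
void addVector(const float* a, const float* b, float* out, std::size_t n)
{
    std::size_t i = 0;
#if SKETCH_HAS_SSE
    for (; i + 4 <= n; i += 4)
    {
        __m128 va = _mm_loadu_ps(a + i);             // load 4 floats (unaligned load)
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(out + i, _mm_add_ps(va, vb));  // add 4 lanes at once and store
    }
#endif
    // Remainder loop: handles the last n % 4 elements (or everything without SSE).
    for (; i < n; ++i)
        out[i] = a[i] + b[i];
}
```

Both functions produce the same results; the vector version just needs about a quarter of the add instructions for large n.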

Automatic vectorization is a feature of some modern C++ compilers, such as the one in Microsoft Visual Studio 2019 and Intel C++ Compiler version 19.x. These compilers have an automatic vectorizer that uses the SIMD instructions provided by Intel's SSE/SSE2/SSE3/SSE4/AVX instruction sets.

During compilation with optimization turned on, vectorization is expected to take place provided the source code adheres to certain rules. To see whether the compiler was able to vectorize the source code, we need to use a compiler switch. For example, in Visual Studio 2019 we set /Qvec-report:2 (in Configuration Properties -> C/C++ -> Command Line).

* /Qvec-report:2 - Lists both the loops the compiler was able to vectorize and the loops it was not.
* /Qvec-report:1 - Lists only the loops that were successfully vectorized.

Once the switch is set, we see output like the following in Visual Studio's build output panel:

1>C:\SIMD\Vectorized\Vectorized\Vectorized.cpp(43) : info C5001: loop vectorized
1>C:\SIMD\Vectorized\Vectorized\Vectorized.cpp(52) : info C5002: loop not vectorized due to reason '1305'

In most cases, the vectorizer identifies candidate code (loops) and vectorizes it on its own. Sometimes, however, developers need to use specific keywords or data alignment to get a given piece of code vectorized.
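As a rough illustration of what such keywords and alignment can look like, the sketch below uses __restrict (a compiler extension accepted by MSVC, GCC and Clang) and the standard alignas specifier. The names scaleAdd and AlignedBlock are made up for this example, and the 32-byte alignment is an assumption chosen to match the width of an AVX register. Whether these hints actually change the vectorizer's decision depends on the compiler and its version, so treat this as a sketch rather than a guaranteed recipe.

```cpp
#include <cstddef>

// __restrict promises the compiler that out, a and b never overlap,
// so it does not have to assume that writing out[i] could change a[] or b[].
void scaleAdd(float* __restrict out,
              const float* __restrict a,
              const float* __restrict b,
              float k,
              std::size_t n)
{
    for (std::size_t i = 0; i < n; ++i)
        out[i] = a[i] * k + b[i];
}

// alignas(32) requests 32-byte alignment for every object of this type
// (assumption: 32 bytes, the size of one AVX register).
struct alignas(32) AlignedBlock
{
    float values[256];
};
```

A caller would keep its data in AlignedBlock objects and pass their values arrays to scaleAdd, making sure the three arrays really are distinct, since breaking the __restrict promise is undefined behavior.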

We will see shortly in which cases vectorization takes place and in which it does not.

According to Intel, the following are guidelines for writing vectorizable code:
1. Use simple loops. Avoid complex loop termination conditions. The loop trip count must be known on entry to the loop at runtime; it does not need to be known at compile time. However, the trip count must remain constant for the duration of the loop.

Example: a simple, naive C++ program that the Microsoft C++ compiler vectorized during the build with /O2 optimization on:

<pre>
int main()
{
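    const int nMax = 1000;
    float dataStore[nMax] = { 0.0f };

    // Simple counted loop: the trip count (nMax) is fixed before the loop starts.
    int i = 0;
    while (i < nMax)
    {
        // 0.1 and 1.0 are double literals, so the value is computed in double
        // and narrowed to float when it is stored.
        dataStore[i] = (0.1 * i) + 1.0;
        i++;
    }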
}
</pre>

Output during the build:
C:\SIMD\Auto_Vectorization\Auto_Vectorization\Auto_Vectorization.cpp(17) : info C5001: loop vectorized
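The trip-count guideline can also be illustrated with a bound that is only known at runtime. In the sketch below (fillRamp and fillUntilLimit are hypothetical names for this post), the first loop reads its trip count once before the loop and never changes it, which is the shape the guideline asks for. The second loop may exit early depending on values computed inside the loop, so its trip count is not known on entry; a vectorizer will typically leave such a loop scalar, although the exact behavior depends on the compiler.

```cpp
#include <cstddef>
#include <vector>

// Vectorizer-friendly shape: the trip count n is known on loop entry
// (at runtime, not at compile time) and stays constant during the loop.
void fillRamp(std::vector<float>& data)
{
    const std::size_t n = data.size();  // read once, never modified in the loop
    float* p = data.data();
    for (std::size_t i = 0; i < n; ++i)
        p[i] = 0.1f * static_cast<float>(i) + 1.0f;
}

// Counter-example: the loop stops as soon as a computed value exceeds limit,
// so the number of iterations is not known when the loop is entered.
// Returns the number of elements written.
std::size_t fillUntilLimit(float* p, std::size_t capacity, float limit)
{
    std::size_t count = 0;
    while (count < capacity)
    {
        const float v = 0.1f * static_cast<float>(count) + 1.0f;
        if (v > limit)
            break;  // data-dependent early exit
        p[count] = v;
        ++count;
    }
    return count;
}
```

Building both functions with /O2 and /Qvec-report:2 is an easy way to check how the compiler in use actually treats each loop.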
