Saturday, April 25, 2020

Parallelism at instruction level

SIMD stands for Single Instruction, Multiple Data. It is a form of data-level parallelism, not concurrency: a single instruction operates on several data elements simultaneously, but only one instruction is applied to them at any given moment.

Example: Say we have two integer arrays of the same length and want to add their elements pairwise. That means adding array_1[0] and array_2[0] and storing the sum in another array as result[0], then adding array_1[1] and array_2[1] and storing the sum in result[1], and so on.
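As a point of comparison, the pairwise addition described above can be sketched as a plain scalar loop that performs one addition per iteration. The function name add_pairs is only an illustrative choice and does not come from any library.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Scalar baseline: adds the two arrays element by element.
// add_pairs is a hypothetical name used only for this illustration.
// Assumes array_2 has at least as many elements as array_1.
std::vector<int32_t> add_pairs(const std::vector<int32_t>& array_1,
                               const std::vector<int32_t>& array_2)
{
    std::vector<int32_t> result(array_1.size());
    for (std::size_t i = 0; i < array_1.size(); ++i)
    {
        // Each result[i] depends only on array_1[i] and array_2[i].
        result[i] = array_1[i] + array_2[i];
    }
    return result;
}
```

Because no iteration reads a value written by another iteration, the loop body is a natural candidate for SIMD.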

Here, each result is independent of the others, so the additions can be performed in parallel with SIMD instructions. On the Intel x86 architecture, this capability is provided by SSE (Streaming SIMD Extensions). SSE adds a set of 128-bit registers (xmm0 - xmm7 in 32-bit mode; 64-bit mode extends this to xmm0 - xmm15) together with new instructions for both scalar and packed data operations. A packed instruction applies one operation to every element held in a register at the same time.
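A minimal sketch of the same pairwise addition using compiler intrinsics could look like the following. It assumes SSE2 is available, since packed 32-bit integer addition on the xmm registers (_mm_add_epi32) comes from SSE2 rather than the original SSE; on other targets it falls back to the scalar loop. The names add_pairs_sse and HAVE_SSE2_SKETCH are hypothetical and exist only for this example.

```cpp
#include <cstddef>
#include <cstdint>

// Enable the SIMD path only when the compiler targets SSE2.
#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#include <emmintrin.h>  // SSE2 intrinsics
#define HAVE_SSE2_SKETCH 1
#endif

// Adds count pairs of 32-bit integers, four pairs per SIMD instruction.
// add_pairs_sse is a hypothetical helper name.
void add_pairs_sse(const int32_t* array_1, const int32_t* array_2,
                   int32_t* result, std::size_t count)
{
    std::size_t i = 0;
#ifdef HAVE_SSE2_SKETCH
    for (; i + 4 <= count; i += 4)
    {
        // Unaligned loads: the arrays need not be 16-byte aligned.
        __m128i a = _mm_loadu_si128(reinterpret_cast<const __m128i*>(array_1 + i));
        __m128i b = _mm_loadu_si128(reinterpret_cast<const __m128i*>(array_2 + i));
        // One packed add produces four 32-bit sums.
        _mm_storeu_si128(reinterpret_cast<__m128i*>(result + i),
                         _mm_add_epi32(a, b));
    }
#endif
    // Scalar tail for leftover elements (or the whole array without SSE2).
    for (; i < count; ++i)
    {
        result[i] = array_1[i] + array_2[i];
    }
}
```

The scalar tail keeps the function correct when the length is not a multiple of four, which is a common design choice for SIMD loops.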

More details on SSE can be found on its Wikipedia page.

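The following C++ routines show how to use inline assembly to detect whether the processor supports SSE. The first reads the CPU vendor ID string, and the second checks the SSE feature bits. The code targets 32-bit Intel x86 and was built with Visual Studio 2019 (MSVC supports the __asm keyword only for 32-bit x86 targets).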

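// Returns the 12-character CPU vendor ID string (for example "GenuineIntel").
std::string getCPUManfactureID()
{
    uint32_t data[4] = {0};   // 12 bytes of vendor ID plus a zeroed terminator

    __asm
    {
        // CPUID leaf 0 returns the vendor ID in EBX, EDX, ECX (in that order).
        mov eax, 0;
        cpuid;
        // Inside __asm, [] is an unscaled byte offset, so these are bytes 0, 4 and 8.
        mov data[0], ebx;
        mov data[4], edx;
        mov data[8], ecx;
    }

    return std::string((const char*)data);
}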

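// Prints which SSE generations the processor reports through CPUID.
void getCPUFeaturesforSSE()
{
    int regDX, regcx;
    __asm
    {
        // CPUID leaf 1 returns the feature flags in EDX and ECX.
        mov eax, 1;
        cpuid;
        mov regDX, edx;
        mov regcx, ecx;
    }

    // EDX bit 25: SSE
    if ((regDX & (1 << 25)) != 0)
    {
        std::cout << "SSE is supported..." << std::endl;
    }
    // EDX bit 26: SSE2
    if ((regDX & (1 << 26)) != 0)
    {
        std::cout << "SSE2 is supported..." << std::endl;
    }
    // ECX bit 0: SSE3
    if ((regcx & 1) != 0)
    {
        std::cout << "SSE3 is supported..." << std::endl;
    }
    // ECX bit 19: SSE4.1
    if ((regcx & (1 << 19)) != 0)
    {
        std::cout << "SSE4.1 is supported..." << std::endl;
    }
    // ECX bit 20: SSE4.2
    if ((regcx & (1 << 20)) != 0)
    {
        std::cout << "SSE4.2 is supported..." << std::endl;
    }
}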
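Since the __asm keyword is not available in 64-bit MSVC builds, the same check can be sketched with compiler intrinsics instead: __cpuid from <intrin.h> on MSVC, or __get_cpuid from <cpuid.h> on GCC and Clang. The type and function names below (CpuidRegs, queryCpuid, SseSupport, decodeSseBits) are hypothetical names chosen for this example, not part of any library.

```cpp
#include <cstdint>

#if defined(_MSC_VER)
#include <intrin.h>   // __cpuid
#elif defined(__i386__) || defined(__x86_64__)
#include <cpuid.h>    // __get_cpuid
#endif

// Register values returned by one CPUID call (hypothetical helper type).
struct CpuidRegs { uint32_t eax, ebx, ecx, edx; };

// Runs CPUID for the given leaf. Returns false when CPUID is unavailable,
// for example on a non-x86 target.
bool queryCpuid(uint32_t leaf, CpuidRegs& out)
{
#if defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
    int regs[4];
    __cpuid(regs, static_cast<int>(leaf));
    out.eax = static_cast<uint32_t>(regs[0]);
    out.ebx = static_cast<uint32_t>(regs[1]);
    out.ecx = static_cast<uint32_t>(regs[2]);
    out.edx = static_cast<uint32_t>(regs[3]);
    return true;
#elif defined(__i386__) || defined(__x86_64__)
    unsigned int a = 0, b = 0, c = 0, d = 0;
    if (!__get_cpuid(leaf, &a, &b, &c, &d))
    {
        return false;
    }
    out.eax = a; out.ebx = b; out.ecx = c; out.edx = d;
    return true;
#else
    (void)leaf;
    (void)out;
    return false;
#endif
}

// Decoded SSE feature flags from CPUID leaf 1 (hypothetical helper type).
struct SseSupport { bool sse, sse2, sse3, sse41, sse42; };

// Pure bit decoding, kept separate from the CPUID call so it is easy to test.
SseSupport decodeSseBits(uint32_t edx, uint32_t ecx)
{
    SseSupport s;
    s.sse   = ((edx >> 25) & 1u) != 0;  // EDX bit 25
    s.sse2  = ((edx >> 26) & 1u) != 0;  // EDX bit 26
    s.sse3  = ((ecx >> 0)  & 1u) != 0;  // ECX bit 0
    s.sse41 = ((ecx >> 19) & 1u) != 0;  // ECX bit 19
    s.sse42 = ((ecx >> 20) & 1u) != 0;  // ECX bit 20
    return s;
}
```

A caller would typically run queryCpuid(1, regs) and, if it returns true, pass regs.edx and regs.ecx to decodeSseBits before choosing between a SIMD and a scalar code path.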
