The Art of SIMD Programming by Sergey Slotin

Modern hardware is highly parallel, and not only in terms of multiprocessing. There are many other forms of parallelism that, if used correctly, can greatly boost program efficiency without requiring more CPU cores. One such form widely adopted by CPUs is "Single Instruction, Multiple Data" (SIMD): a class of instructions that perform the same operation on a block of 16, 32, or 64 bytes of data in one go, often yielding a close-to-proportional speedup over scalar code.
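
To make the idea concrete, here is a minimal sketch of summing an integer array eight elements at a time. It is not taken from the talk: the function names sum_scalar and sum_simd are hypothetical, and it assumes the GCC/Clang vector_size extension rather than AVX2 intrinsics, leaving the choice of actual instructions to the compiler (for example with -mavx2).

```c
#include <string.h>

// Eight 32-bit ints = 32 bytes, the width of one AVX2 register.
// vector_size is a GCC/Clang extension, not standard C.
typedef int v8si __attribute__((vector_size(32)));

// Scalar baseline: one addition per loop iteration.
int sum_scalar(const int *a, int n) {
    int s = 0;
    for (int i = 0; i < n; i++)
        s += a[i];
    return s;
}

// Vectorized version: eight additions per loop iteration.
int sum_simd(const int *a, int n) {
    v8si acc = {0};                   // eight partial sums, all zero
    int i = 0;
    for (; i + 8 <= n; i += 8) {
        v8si x;
        memcpy(&x, a + i, sizeof x);  // load 8 ints without assuming alignment
        acc += x;                     // element-wise add across all 8 lanes
    }
    int s = 0;
    for (int j = 0; j < 8; j++)       // horizontal sum of the partial sums
        s += acc[j];
    for (; i < n; i++)                // leftover elements when n % 8 != 0
        s += a[i];
    return s;
}
```

Both functions return the same result; the vectorized one simply processes a 32-byte block per iteration. How much faster it runs in practice depends on the target CPU, the compiler flags, and whether the loop is limited by memory bandwidth.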

Although SIMD resembles classic multiprocessor computing in some ways, it is a different programming model and often requires creative use of the instruction set. In this talk, we will give a general introduction to the technology (focusing on x86/AVX2), derive and implement several state-of-the-art SIMD algorithms, and discuss their use in impactful open-source projects.