SIMD support on Lambda
SIMD can have a drastic impact on performance. But not every instruction set is supported by every piece of hardware.
Written on May 20, 2020
A few weeks ago, we talked about Arrow and how it optimizes its memory layout to benefit from SIMD. We mentioned that not all hardware supports all instruction sets:
Some of the oldest SIMD machines, like the ILLIAC IV [1], date back to the 1970s.
SSE-1 was introduced on Pentium 3 in 1999
SSE-2 was introduced on Pentium 4 in 2000
SSE-3 first shipped in 2004
SSE-4 was introduced in 2007
AVX was introduced on Sandy Bridge in 2011
AVX2 was introduced in 2013
AVX512 first shipped on Knights Landing in 2016
Each new version supports more data types and/or processes more scalars in parallel (128-bit registers for SSE, 256-bit for AVX and AVX2, and surprise... 512-bit for AVX512!). Of course, it takes years for processors with new instruction sets to reach your hardware store and to become generally available in cloud data centers. For instance, AWS first announced AVX512 support in 2017, and only on its compute-optimized instances [2].
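To make these register widths concrete: the number of values a single instruction can process is the register width divided by the element width. The short sketch below works this out for a few common element types. The helper name `lanes` is ours, purely for illustration, and the figures are theoretical lane counts, not measured speedups.

```python
# Register widths in bits for each instruction set family.
REGISTER_BITS = {"SSE": 128, "AVX/AVX2": 256, "AVX512": 512}

# Element widths in bits for a few common data types.
ELEMENT_BITS = {"int8": 8, "float32": 32, "float64": 64}


def lanes(register_bits: int, element_bits: int) -> int:
    """Number of elements of the given width that fit in one register."""
    return register_bits // element_bits


for isa, reg in REGISTER_BITS.items():
    counts = {name: lanes(reg, bits) for name, bits in ELEMENT_BITS.items()}
    print(isa, counts)
```

A 512-bit register therefore holds sixteen float32 values, four times as many as a 128-bit SSE register.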
On AWS Lambda, you typically have no guarantee about the type of hardware your function will run on. For this reason, we want to avoid taking hard dependencies on specific instruction sets. Arrow currently offers no way to choose the instruction set at runtime, and by default it builds with SSE4.2. For now, we have decided to keep this default setting, as it has never caused problems when running on Lambda. This instruction set is more than 10 years old and probably one of the most widespread, so the risk is acceptable for the time being.
We regularly check which instruction sets are available within the lambdas we run. You can do so very easily by running a simple shell command from Python:
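```python
import subprocess

def lambda_handler(event, context):
    # Dump the CPU description exposed by the kernel; the output ends up in the function logs.
    cmdline = ["cat", "/proc/cpuinfo"]
    print("Run CMD: ", cmdline)
    subprocess.check_call(cmdline, shell=False, stderr=subprocess.STDOUT)
```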
The "flags" section contains the supported instruction sets:
We can indeed see that AVX2 and AVX512 are not listed. Limiting ourselves to SSE4.2 is a bit restrictive, but it is sufficient for now. The good news is that runtime SIMD dispatching is a hot topic in the Arrow community, so it is likely that we will soon be able to adapt to the instruction sets available during each individual Lambda run!
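Rather than reading the flags by eye, the check can be scripted. The sketch below is a minimal example, assuming a Linux-style `/proc/cpuinfo` with a `flags` line, as found on x86 machines. The helper names `parse_cpu_flags` and `best_simd_level` and the preference order are our own illustration; they are not part of Arrow or of any AWS API.

```python
from typing import Optional, Set

# Flag names as they appear in /proc/cpuinfo on x86 Linux,
# ordered from most to least capable (an assumption of this sketch).
PREFERRED_LEVELS = ["avx512f", "avx2", "avx", "sse4_2"]


def parse_cpu_flags(cpuinfo_text: str) -> Set[str]:
    """Return the flags of the first processor listed in a cpuinfo dump."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def best_simd_level(flags: Set[str]) -> Optional[str]:
    """Pick the most capable level from PREFERRED_LEVELS that the CPU reports."""
    for level in PREFERRED_LEVELS:
        if level in flags:
            return level
    return None


def lambda_handler(event, context):
    # Read the file directly instead of spawning `cat`.
    with open("/proc/cpuinfo") as f:
        flags = parse_cpu_flags(f.read())
    level = best_simd_level(flags)
    print("Best supported SIMD level:", level)
    return {"simd_level": level}
```

Logging this value on each invocation gives a running picture of which instruction sets the underlying hosts actually expose, without committing to any of them at build time.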