Vector selection
- Memory, even low-level cache, works with only one value at a time, so indexed lookups from memory happen one after another
- Vector selection, or "shuffle", was introduced to x86 with SSSE3 in 2006
- Supplemental Streaming SIMD Extensions 3… What were they thinking?
- ARM and POWER vector instruction sets have supported vector selection from the beginning
- The shuffle instruction selects from the 16 bytes in one vector register using 0-based indices from another
- To select from more than one register with this instruction, multiple shuffles must be used
- Selecting 2-, 4-, or 8-byte elements is usually not supported
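
The selection rule in the bullets above can be sketched in plain C. This is a scalar model for illustration, not vector code: `shuffle16` and `select32` are hypothetical helper names, not intrinsics. The model assumes the SSSE3 byte-shuffle convention that an index with its top bit set produces zero and that otherwise the low four bits pick the source byte. `select32` assumes indices in the range 0..31 and shows one way two shuffles plus an OR could cover a 32-byte table held in two registers.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of a 16-byte shuffle (hypothetical helper, not an intrinsic).
 * Assumed rule, following SSSE3: an index with bit 7 set yields 0;
 * otherwise the low 4 bits select one of the 16 source bytes. */
static void shuffle16(uint8_t dst[16], const uint8_t src[16],
                      const uint8_t idx[16])
{
    uint8_t tmp[16]; /* temporary so that dst may alias src */
    for (int i = 0; i < 16; i++)
        tmp[i] = (idx[i] & 0x80) ? 0 : src[idx[i] & 0x0F];
    for (int i = 0; i < 16; i++)
        dst[i] = tmp[i];
}

/* Select 16 bytes from a 32-byte table split across two "registers".
 * Assumption: every index is in 0..31 (0..15 -> lo, 16..31 -> hi). */
static void select32(uint8_t dst[16], const uint8_t lo[16],
                     const uint8_t hi[16], const uint8_t idx[16])
{
    uint8_t idx_lo[16], idx_hi[16], from_lo[16], from_hi[16];
    for (int i = 0; i < 16; i++) {
        assert(idx[i] < 32);
        /* Indices owned by the other register become 0x80, which zeroes
         * that lane in this register's shuffle. */
        idx_lo[i] = (idx[i] < 16) ? idx[i] : 0x80;
        idx_hi[i] = (idx[i] >= 16) ? (uint8_t)(idx[i] - 16) : 0x80;
    }
    shuffle16(from_lo, lo, idx_lo); /* first shuffle: lower 16 bytes */
    shuffle16(from_hi, hi, idx_hi); /* second shuffle: upper 16 bytes */
    for (int i = 0; i < 16; i++)
        dst[i] = from_lo[i] | from_hi[i]; /* exactly one side is nonzero-capable */
}
```

In real vector code each loop would correspond to a single instruction operating on all 16 lanes at once; the scalar form only makes the per-lane rule visible.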