20 “P” Standard Extension for Packed-SIMD Instructions, Version 0.1
In this chapter, we outline a standard packed-SIMD extension for RISC-V. We’ve reserved the instruction subset name “P” for a future standard set of packed-SIMD extensions. Many other extensions can build upon a packed-SIMD extension, taking advantage of the wide data registers and datapaths separate from the integer unit.
Packed-SIMD extensions, first introduced with the Lincoln Labs TX-2 [tx2], have become a popular way to provide higher throughput on data-parallel codes. Earlier commercial microprocessor implementations include the Intel i860, HP PA-RISC MAX [lee-max-ieeemicro1996], SPARC VIS [tremblay-vis-ieeemicro1996], MIPS MDMX [gwennap-mdmx-mpr1996], PowerPC AltiVec [diefendorff-altivec-ieeemicro2000], Intel x86 MMX/SSE [peleg-mmx-ieeemicro1996 raman-sse-ieeemicro2000], while recent designs include Intel x86 AVX [lomont-avx-irm2011] and ARM Neon [goodacre-armisa-computer2005]. We describe a standard framework for adding packed-SIMD in this chapter, but are not actively working on such a design. In our opinion, packed-SIMD designs represent a reasonable design point when reusing existing wide datapath resources, but if significant additional resources are to be devoted to data-parallel execution then designs based on traditional vector architectures are a better choice and should use the V extension.
A RISC-V packed-SIMD extension reuses the floating-point registers
(f0
-f31
). These registers can be defined to have widths
of FLEN=32 to FLEN=1024. The standard floating-point instruction
subsets require registers of width 32 bits (“F”), 64 bits (“D”),
or 128 bits (“Q”).
It is natural to use the floating-point registers for packed-SIMD values rather than the integer registers (PA-RISC and Alpha packed-SIMD extensions) as this frees the integer registers for control and address values, simplifies reuse of scalar floating-point units for SIMD floating-point execution, and leads naturally to a decoupled integer/floating-point hardware design. The floating-point load and store instruction encodings also have space to handle wider packed-SIMD registers. However, reusing the floating-point registers for packed-SIMD values does make it more difficult to use a recoded internal format for floating-point values.
The existing floating-point load and store instructions are used to
load and store various-sized words from memory to the f
registers. The base ISA supports 32-bit and 64-bit loads and stores,
but the LOAD-FP and STORE-FP instruction encodings allows 8 different
widths to be encoded as shown in Table [psimdwidth]. When used
with packed-SIMD operations, it is desirable to support non-naturally
aligned loads and stores in hardware.
width field | Code | Size in bits |
---|---|---|
000 | B | 8 |
001 | H | 16 |
010 | W | 32 |
011 | D | 64 |
100 | Q | 128 |
101 | Q2 | 256 |
110 | Q4 | 512 |
111 | Q8 | 1024 |
Packed-SIMD computational instructions operate on packed values in
f
registers. Each value can be 8-bit, 16-bit, 32-bit, 64-bit,
or 128-bit, and both integer and floating-point representations can be
supported. For example, a 64-bit packed-SIMD extension can treat each
register as 1×64-bit, 2×32-bit, 4×16-bit, or
8×8-bit packed values.
Simple packed-SIMD extensions might fit in unused 32-bit instruction opcodes, but more extensive packed-SIMD extensions will likely require a dedicated 30-bit instruction space.
Discussions at the 5th RISC-V workshop indicated a desire to drop this packed-SIMD proposal for floating-point registers in favor of standardizing on the V extension for large floating-point SIMD operations. However, there was interest in packed-SIMD fixed-point operations for use in the integer registers of small RISC-V implementations.