- Intro & References
- GCC Extended Assembler
- Instruction List
- rv32
- rv32 / conditional-branches
- rv32 / unconditional-jumps
- rv32 / programmers-model-for-base-integer-isa
- rv32 / integer-register-immediate-instructions
- rv32 / integer-computational-instructions
- rv32 / integer-register-register-operations
- rv32 / load-and-store-instructions
- rv32 / sec:fence
- rv32 / rv32
- rv32 / environment-call-and-breakpoints
- rv64
- rv128
- a
- c
- d
- f
- m
- n
- q
- v
- v / _vector_length_register_code_vl_code
- v / _vector_type_register_code_vtype_code
- v / _vector_unit_stride_instructions
- v / _examples
- v / _vector_strided_instructions
- v / _vector_indexed_instructions
- v / _unit_stride_fault_only_first_loads
- v / _vector_single_width_floating_point_add_subtract_instructions
- v / _vector_floating_point_min_max_instructions
- v / _vector_floating_point_sign_injection_instructions
- v / _vector_floating_point_move_instruction
- v / _vector_floating_point_merge_instruction
- v / _introduction
- v / _vector_single_width_floating_point_multiply_divide_instructions
- v / _vector_single_width_floating_point_fused_multiply_add_instructions
- v / _vector_widening_floating_point_add_subtract_instructions
- v / _vector_widening_floating_point_multiply
- v / _vector_widening_floating_point_fused_multiply_add_instructions
- v / _vector_single_width_floating_point_reduction_instructions
- v / _vector_widening_floating_point_reduction_instructions
- v / _vector_floating_point_dot_product_instruction
- v / _vector_single_width_integer_add_and_subtract
- v / _vector_integer_min_max_instructions
- v / _vector_bitwise_logical_instructions
- v / _vector_register_gather_instruction
- v / _vector_slide_instructions
- v / _vector_slidedown_instructions
- v / _vector_integer_add_with_carry_subtract_with_borrow_instructions
- v / _vector_integer_merge_instructions
- v / _vector_integer_move_instructions
- v / _vector_single_width_saturating_add_and_subtract
- v / _vector_single_width_averaging_add_and_subtract
- v / _vector_single_width_bit_shift_instructions
- v / _vector_single_width_fractional_multiply_with_rounding_and_saturation
- v / _vector_single_width_scaling_shift_instructions
- v / _vector_narrowing_integer_right_shift_instructions
- v / sec-narrowing
- v / _vector_narrowing_fixed_point_clip_instructions
- v / _vector_widening_integer_reduction_instructions
- v / _vector_integer_dot_product_instruction
- v / _vector_single_width_integer_reduction_instructions
- v / _vector_compress_instruction
- v / _vector_integer_comparison_instructions
- v / _vector_floating_point_compare_instructions
- v / sec-mask-register-logical
- v / __code_vfirst_code_find_first_set_mask_bit
- v / __code_vmsif_m_code_set_including_first_mask_bit
- v / __code_vmsbf_m_code_set_before_first_mask_bit
- v / _vector_iota_instruction
- v / _vector_element_index_instruction
- v / _vector_integer_divide_instructions
- v / _vector_single_width_integer_multiply_instructions
- v / _vector_single_width_integer_multiply_add_instructions
- v / _vector_widening_integer_add_subtract
- v / _vector_widening_integer_multiply_instructions
- v / _vector_widening_integer_multiply_add_instructions
- v / _vector_slide1up
- v / _vector_slide1down_instruction
- custom
- csr
- supervisor
- hypervisor
Intro & References
For information on assembler programming:
- RISC-V Assembly Programmer’s Manual
- Pseudo Opcodes (Not covered below)
- RISC-V Instruction Table
- RISC-V Opcode Map
- GNU Assembler, RISC-V Section
- GNU Linker
Some good cheat sheets.
- RISC-V Instruction-Set Cheatsheet, from Erik Engheim. PDF Version
- RISC-V-QuickRefCard-v042.pdf, “basic assembly programmer’s quick reference card” from Dylan McNamee.
- An old but nicely formatted “green card” summary of the ISA: RISCVGreenCardv8-20151013.pdf
GCC Extended Assembler
GCC gives direct access to instructions via __asm__, e.g.:
- No argument instructions:
__asm__ volatile ("nop");
__asm__ volatile ("wfi");
- With register arguments:
__asm__ volatile ("csrrw %0, mie, %1" /* read and write atomically */
                  : "=r" (ret)        /* output: register %0 */
                  : "r" (value)       /* input: register %1 */
                  : /* clobbers: none */);
Opcodes are listed in machine readable format here
Instruction List
rv32 | rv64 | rv128 | a | c | d | f | m | n | q | v | custom | csr | supervisor | hypervisor |
rv32
rv32 / conditional-branches
RV32I Base Integer Instruction Set, Version 2.1 / Control Transfer Instructions
Operation | Arguments | Description |
beq | rs1, rs2, bimm12 |
BEQ and BNE take the branch if registers rs1 and rs2 are equal or unequal respectively
|
blt | rs1, rs2, bimm12 |
BLT and BLTU take the branch if rs1 is less than rs2 , using signed and unsigned comparison respectively
Note, BGT, BGTU, BLE, and BLEU can be synthesized by reversing the operands to BLT, BLTU, BGE, and BGEU, respectively.
|
bge | rs1, rs2, bimm12 |
BGE and BGEU take the branch if rs1 is greater than or equal to rs2 , using signed and unsigned comparison respectively
|
bltu | rs1, rs2, bimm12 |
Signed array bounds may be checked with a single BLTU instruction, since any negative index will compare greater than any nonnegative bound.
|
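The BLTU bounds-check note above can be sketched in portable C. This is only an illustration of the comparison, not generated assembly; the helper name `index_in_bounds` is hypothetical, and the bound is assumed to be nonnegative.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true when 0 <= index < bound, assuming bound >= 0.
 * Reinterpreting both values as unsigned mirrors a single BLTU compare:
 * a negative index becomes a very large unsigned value, so it can never
 * be "less than" a nonnegative bound. */
bool index_in_bounds(int32_t index, int32_t bound)
{
    return (uint32_t)index < (uint32_t)bound;
}
```

A compiler targeting RISC-V can typically lower this test to one unsigned branch instead of two signed ones.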
rv32 / unconditional-jumps
RV32I Base Integer Instruction Set, Version 2.1 / Control Transfer Instructions
Operation | Arguments | Description |
jalr | rd, rs1, imm12 |
The indirect jump instruction JALR (jump and link register) uses the I-type encoding
The JALR instruction was defined to enable a two-instruction sequence to jump anywhere in a 32-bit absolute address range
Note that the JALR instruction does not treat the 12-bit immediate as multiples of 2 bytes, unlike the conditional branch instructions
In practice, most uses of JALR will have either a zero immediate or be paired with a LUI or AUIPC, so the slight reduction in range is not significant.
Clearing the least-significant bit when calculating the JALR target address both simplifies the hardware slightly and allows the low bit of function pointers to be used to store auxiliary information
When used with a base rs1 = x0 , JALR can be used to implement a single instruction subroutine call to the lowest
JALR instructions should push/pop a RAS as shown in the Table
|
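A minimal sketch of the JALR target calculation described above: the 12-bit immediate is sign-extended, added to rs1, and the least-significant bit of the sum is cleared. The function names are illustrative only, and the model ignores the link-register write.

```c
#include <stdint.h>

/* Sign-extend a 12-bit immediate field held in the low bits of imm12. */
int32_t sext12(uint32_t imm12)
{
    return (int32_t)((imm12 & 0xFFFu) ^ 0x800u) - 0x800;
}

/* Model of the JALR target address: (rs1 + sext(imm12)) with bit 0 cleared.
 * Unsigned arithmetic gives the 32-bit wraparound the hardware performs. */
uint32_t jalr_target(uint32_t rs1, uint32_t imm12)
{
    return (rs1 + (uint32_t)sext12(imm12)) & ~UINT32_C(1);
}
```

With rs1 = x0 (value 0), the immediate alone selects the target, which is why only addresses within 2 KiB of either end of the address space are reachable that way.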
rv32 / programmers-model-for-base-integer-isa
RV32I Base Integer Instruction Set, Version 2.1 / Programmers’ Model for Base Integer ISA
Operation | Arguments | Description |
jal | rd, jimm20 |
See the descriptions of the JAL and JALR instructions.
|
rv32 / integer-register-immediate-instructions
RV32I Base Integer Instruction Set, Version 2.1 / Integer Computational Instructions
Operation | Arguments | Description |
lui | rd, imm20 |
LUI (load upper immediate) is used to build 32-bit constants and uses the U-type format
LUI places the U-immediate value in the top 20 bits of the destination register rd , filling in the lowest 12 bits with zeros.
|
auipc | rd, imm20 |
AUIPC (add upper immediate to pc ) is used to build pc -relative addresses and uses the U-type format
AUIPC forms a 32-bit offset from the 20-bit U-immediate, filling in the lowest 12 bits with zeros, adds this offset to the address of the AUIPC instruction, then places the result in register rd .
The AUIPC instruction supports two-instruction sequences to access arbitrary offsets from the PC for both control-flow transfers and data accesses
The combination of an AUIPC and the 12-bit immediate in a JALR can transfer control to any 32-bit PC-relative address, while an AUIPC plus the 12-bit immediate offset in regular load or store instructions can access any 32-bit PC-relative data address.
|
addi | rd, rs1, imm12 |
ADDI adds the sign-extended 12-bit immediate to register rs1
ADDI rd, rs1, 0 is used to implement the MV rd, rs1 assembler pseudoinstruction.
|
slli | rd, rs1 |
The right shift type is encoded in bit 30. SLLI is a logical left shift (zeros are shifted into the lower bits); SRLI is a logical right shift (zeros are shifted into the upper bits); and SRAI is an arithmetic right shift (the original sign bit is copied into the vacated upper bits).
|
slti | rd, rs1, imm12 |
SLTI (set less than immediate) places the value 1 in register rd if register rs1 is less than the sign-extended immediate when both are treated as signed numbers, else 0 is written to rd
|
sltiu | rd, rs1, imm12 |
SLTIU is similar but compares the values as unsigned numbers (i.e., the immediate is first sign-extended to XLEN bits then treated as an unsigned number)
Note, SLTIU rd, rs1, 1 sets rd to 1 if rs1 equals zero, otherwise sets rd to 0 (assembler pseudoinstruction SEQZ rd, rs ).
|
xori | rd, rs1, imm12 |
Note, XORI rd, rs1, -1 performs a bitwise logical inversion of register rs1 (assembler pseudoinstruction NOT rd, rs ).
|
andi | rd, rs1, imm12 |
ANDI, ORI, XORI are logical operations that perform bitwise AND, OR, and XOR on register rs1 and the sign-extended 12-bit immediate and place the result in rd
|
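Because ADDI sign-extends its 12-bit immediate, building a 32-bit constant with LUI followed by ADDI needs the upper 20 bits rounded up whenever bit 11 of the constant is set. The sketch below models that split in C; the type and function names are hypothetical, and it mirrors what assemblers commonly do for `%hi`/`%lo` rather than quoting any particular tool.

```c
#include <stdint.h>

/* Hypothetical pair of immediates for "lui rd, hi20; addi rd, rd, lo12". */
typedef struct {
    uint32_t hi20;  /* U-immediate for LUI (20 bits) */
    int32_t  lo12;  /* signed 12-bit immediate for ADDI */
} imm_split;

imm_split split_const(uint32_t value)
{
    imm_split s;
    /* Adding 0x800 rounds the upper part up when the low part is negative. */
    s.hi20 = ((value + 0x800u) >> 12) & 0xFFFFFu;
    s.lo12 = (int32_t)(value & 0xFFFu);
    if (s.lo12 >= 0x800)
        s.lo12 -= 0x1000;
    return s;
}

/* Model of executing the two instructions on a 32-bit register. */
uint32_t lui_addi(imm_split s)
{
    uint32_t rd = s.hi20 << 12;        /* LUI: low 12 bits are zero */
    return rd + (uint32_t)s.lo12;      /* ADDI: add sign-extended lo12 */
}
```

The same split applies to AUIPC paired with JALR or a load/store, with the offset taken relative to the address of the AUIPC.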
rv32 / integer-computational-instructions
RV32I Base Integer Instruction Set, Version 2.1 / Integer Computational Instructions
Operation | Arguments | Description |
add | rd, rs1, rs2 |
add t0, t1, t2
slti t3, t2, 0
slt t4, t0, t1
bne t3, t4, overflow
In RV64I, checks of 32-bit signed additions can be optimized further by comparing the results of ADD and ADDW on the operands.
|
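The four-instruction signed overflow check can be mirrored line by line in C. This is a sketch: `add_overflows` is a hypothetical name, and the conversion of the wrapped sum back to `int32_t` is assumed to wrap modulo 2^32, as it does with GCC and Clang.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true if a + b overflows as a 32-bit signed addition.
 * The wrapped sum is stored through *sum either way. */
bool add_overflows(int32_t a, int32_t b, int32_t *sum)
{
    /* add  t0, t1, t2 : wrapping add (the cast back is implementation-defined) */
    int32_t t0 = (int32_t)((uint32_t)a + (uint32_t)b);
    /* slti t3, t2, 0  : is the second operand negative? */
    bool t3 = b < 0;
    /* slt  t4, t0, t1 : is the sum less than the first operand? */
    bool t4 = t0 < a;
    *sum = t0;
    /* bne  t3, t4, overflow : the two conditions must agree */
    return t3 != t4;
}
```

The check relies on the observation that the sum is less than one operand exactly when the other operand is negative, unless the addition overflowed.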
rv32 / integer-register-register-operations
RV32I Base Integer Instruction Set, Version 2.1 / Integer Computational Instructions
Operation | Arguments | Description |
sub | rd, rs1, rs2 |
SUB performs the subtraction of rs2 from rs1
|
sll | rd, rs1, rs2 |
SLL, SRL, and SRA perform logical left, logical right, and arithmetic right shifts on the value in register rs1 by the shift amount held in the lower 5 bits of register rs2 .
|
slt | rd, rs1, rs2 |
SLT and SLTU perform signed and unsigned compares respectively, writing 1 to rd if rs1 < rs2, 0 otherwise
|
sltu | rd, rs1, rs2 |
Note, SLTU rd , x0 , rs2 sets rd to 1 if rs2 is not equal to zero, otherwise sets rd to zero (assembler pseudoinstruction SNEZ rd, rs )
|
and | rd, rs1, rs2 |
AND, OR, and XOR perform bitwise logical operations.
|
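A small C model of the RV32 register-register shifts, assuming 32-bit registers. Only the low 5 bits of rs2 are used, which differs from C, where shifting by 32 or more is undefined. The function names simply echo the mnemonics.

```c
#include <stdint.h>

/* SLL: logical left shift by the low 5 bits of rs2. */
uint32_t sll(uint32_t rs1, uint32_t rs2)
{
    return rs1 << (rs2 & 31u);
}

/* SRL: logical right shift, zeros enter the upper bits. */
uint32_t srl(uint32_t rs1, uint32_t rs2)
{
    return rs1 >> (rs2 & 31u);
}

/* SRA: arithmetic right shift, the sign bit fills the vacated upper bits.
 * Written without relying on signed right shift, which is
 * implementation-defined in C. */
uint32_t sra(uint32_t rs1, uint32_t rs2)
{
    uint32_t sh = rs2 & 31u;
    uint32_t logical = rs1 >> sh;
    uint32_t fill = ((rs1 & 0x80000000u) && sh) ? ~(UINT32_MAX >> sh) : 0u;
    return logical | fill;
}
```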
rv32 / load-and-store-instructions
RV32I Base Integer Instruction Set, Version 2.1 / Load and Store Instructions
Operation | Arguments | Description |
lb | rd, rs1, imm12 |
LB and LBU are defined analogously for 8-bit values
|
lh | rd, rs1, imm12 |
LH loads a 16-bit value from memory, then sign-extends to 32-bits before storing in rd
|
lw | rd, rs1, imm12 |
The LW instruction loads a 32-bit value from memory into rd
|
lhu | rd, rs1, imm12 |
LHU loads a 16-bit value from memory but then zero extends to 32-bits before storing in rd
|
sw | imm12hi, rs1, rs2, imm12lo |
The SW, SH, and SB instructions store 32-bit, 16-bit, and 8-bit values from the low bits of register rs2 to memory.
|
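The difference between LH and LHU is only in how the 16-bit value is widened. The sketch below models both on a byte buffer, assuming little-endian byte order; `load_lh` and `load_lhu` are illustrative names, not real APIs.

```c
#include <stdint.h>

/* Model of LH: read 16 bits (little-endian assumed), then sign-extend. */
int32_t load_lh(const uint8_t *mem)
{
    uint32_t half = (uint32_t)mem[0] | ((uint32_t)mem[1] << 8);
    /* XOR/subtract trick sign-extends bit 15 without relying on casts. */
    return (int32_t)(half ^ 0x8000u) - 0x8000;
}

/* Model of LHU: read 16 bits (little-endian assumed), then zero-extend. */
uint32_t load_lhu(const uint8_t *mem)
{
    return (uint32_t)mem[0] | ((uint32_t)mem[1] << 8);
}
```

LB and LBU differ in the same way for 8-bit values.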
rv32 / sec:fence
RV32I Base Integer Instruction Set, Version 2.1 / Memory Ordering Instructions
Operation | Arguments | Description |
fence | rs1, rd |
The FENCE instruction is used to order device I/O and memory accesses as viewed by other RISC-V harts and external devices or coprocessors
Informally, no other RISC-V hart or external device can observe any operation in the successor set following a FENCE before any operation in the predecessor set preceding the FENCE
Instruction-set extensions might also describe new I/O instructions that will also be ordered using the I and O bits in a FENCE.
The fence mode field fm defines the semantics of the FENCE
A FENCE with fm =0000 orders all memory operations in its predecessor set before all memory operations in its successor set.
The optional FENCE.TSO instruction is encoded as a FENCE instruction with fm =1000, predecessor =RW, and successor =RW
FENCE.TSO orders all load operations in its predecessor set before all memory operations in its successor set, and all store operations in its predecessor set before all store operations in its successor set
This leaves non-AMO store operations in the FENCE.TSO’s predecessor set unordered with non-AMO loads in its successor set.
The FENCE.TSO encoding was added as an optional extension to the original base FENCE instruction encoding
The base definition requires that implementations ignore any set bits and treat the FENCE as global, and so this is a backwards-compatible extension.
The unused fields in the FENCE instructions, rs1 and rd, are reserved for finer-grain fences in future extensions
|
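From C, FENCE is usually reached through C11 fences rather than written by hand. The sketch below shows a release/acquire message-passing pattern; on RISC-V, compilers typically lower the release fence to `fence rw,w` and the acquire fence to `fence r,rw`, but that mapping is an assumption about the toolchain, and the variable and function names are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>

int payload;          /* plain data being handed over */
atomic_bool ready;    /* flag, zero-initialized to false */

/* Producer: write the data, then fence, then raise the flag. */
void publish(int value)
{
    payload = value;
    atomic_thread_fence(memory_order_release);   /* earlier accesses before later stores */
    atomic_store_explicit(&ready, true, memory_order_relaxed);
}

/* Consumer: check the flag, then fence, then read the data. */
bool try_consume(int *out)
{
    if (!atomic_load_explicit(&ready, memory_order_relaxed))
        return false;
    atomic_thread_fence(memory_order_acquire);   /* earlier loads before later accesses */
    *out = payload;
    return true;
}
```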
rv32 / rv32
RV32I Base Integer Instruction Set, Version 2.1 /
Operation | Arguments | Description |
ecall |
RV32I contains 40 unique instructions, though a simple implementation might cover the ECALL/EBREAK instructions with a single SYSTEM hardware instruction that always traps and might be able to implement the FENCE instruction as a NOP, reducing base instruction count to 38 total
|
rv32 / environment-call-and-breakpoints
RV32I Base Integer Instruction Set, Version 2.1 / Environment Call and Breakpoints
Operation | Arguments | Description |
ebreak |
The EBREAK instruction is used to return control to a debugging environment.
EBREAK was primarily designed to be used by a debugger to cause execution to stop and fall back into the debugger
EBREAK is also used by the standard gcc compiler to mark code paths that should not be executed.
Another use of EBREAK is to support “semihosting”, where the execution environment includes a debugger that can provide services over an alternate system call interface built around the EBREAK instruction
Because the RISC-V base ISA does not provide more than one EBREAK instruction, RISC-V semihosting uses a special sequence of instructions to distinguish a semihosting EBREAK from a debugger inserted EBREAK
|
rv64
integer register immediate instructions | integer register register operations | load and store instructions |
rv64 / integer-register-immediate-instructions
RV64I Base Integer Instruction Set, Version 2.1 / Integer Computational Instructions
Operation | Arguments | Description |
addiw | rd, rs1, imm12 |
ADDIW is an RV64I instruction that adds the sign-extended 12-bit immediate to register rs1 and produces the proper sign-extension of a 32-bit result in rd
Note, ADDIW rd, rs1, 0 writes the sign-extension of the lower 32 bits of register rs1 into register rd (assembler pseudoinstruction SEXT.W).
|
slliw | rd, rs1 |
SLLIW, SRLIW, and SRAIW are RV64I-only instructions that are analogously defined but operate on 32-bit values and produce signed 32-bit results
SLLIW, SRLIW, and SRAIW encodings with imm[5] ≠ 0
Previously, SLLIW, SRLIW, and SRAIW with imm[5] ≠ 0
|
rv64 / integer-register-register-operations
RV64I Base Integer Instruction Set, Version 2.1 / Integer Computational Instructions
Operation | Arguments | Description |
addw | rd, rs1, rs2 |
ADDW and SUBW are RV64I-only instructions that are defined analogously to ADD and SUB but operate on 32-bit values and produce signed 32-bit results
|
sllw | rd, rs1, rs2 |
SLLW, SRLW, and SRAW are RV64I-only instructions that are analogously defined but operate on 32-bit values and produce signed 32-bit results
|
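The "W" instructions compute on the low 32 bits and sign-extend the 32-bit result to 64 bits. A minimal C model, with 64-bit registers represented as `int64_t` and function names chosen only for illustration:

```c
#include <stdint.h>

/* Sign-extend a 32-bit pattern to 64 bits without implementation-defined casts. */
int64_t sext32(uint32_t low)
{
    return (int64_t)(low ^ 0x80000000u) - INT64_C(0x80000000);
}

/* Model of ADDW: add the low 32 bits, ignore overflow, sign-extend the result. */
int64_t addw(int64_t rs1, int64_t rs2)
{
    return sext32((uint32_t)rs1 + (uint32_t)rs2);
}

/* Model of SEXT.W, i.e. ADDIW rd, rs1, 0. */
int64_t sext_w(int64_t rs1)
{
    return sext32((uint32_t)rs1);
}
```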
rv64 / load-and-store-instructions
RV64I Base Integer Instruction Set, Version 2.1 / Load and Store Instructions
Operation | Arguments | Description |
ld | rd, rs1, imm12 |
The LD instruction loads a 64-bit value from memory into register rd for RV64I.
|
lwu | rd, rs1, imm12 |
The LWU instruction, on the other hand, zero-extends the 32-bit value from memory for RV64I
|
sd | imm12hi, rs1, rs2, imm12lo |
The SD, SW, SH, and SB instructions store 64-bit, 32-bit, 16-bit, and 8-bit values from the low bits of register rs2 to memory respectively.
|
rv128
rv128 / rv128
RV128I Base Integer Instruction Set, Version 1.7 /
Operation | Arguments | Description |
fmv.x.q | rd, rs1 |
The floating-point instruction set is unchanged, although the 128-bit Q floating-point extension can now support FMV.X.Q and FMV.Q.X instructions, together with additional FCVT instructions to and from the T (128-bit) integer format.
|
a
sec:amo | sec:lrsc |
a / sec:amo
“A” Standard Extension for Atomic Instructions, Version 2.1 / Atomic Memory Operations
Operation | Arguments | Description |
amoadd.w | rd, rs1, rs2 |
These AMO instructions atomically load a data value from the address in rs1 , place the value into register rd , apply a binary operator to the loaded value and the original value in rs2 , then store the result back to the address in rs1
|
amoxor.w | rd, rs1, rs2 |
These AMO instructions atomically load a data value from the address in rs1 , place the value into register rd , apply a binary operator to the loaded value and the original value in rs2 , then store the result back to the address in rs1
|
amoor.w | rd, rs1, rs2 |
These AMO instructions atomically load a data value from the address in rs1 , place the value into register rd , apply a binary operator to the loaded value and the original value in rs2 , then store the result back to the address in rs1
|
amoand.w | rd, rs1, rs2 |
These AMO instructions atomically load a data value from the address in rs1 , place the value into register rd , apply a binary operator to the loaded value and the original value in rs2 , then store the result back to the address in rs1
|
amomin.w | rd, rs1, rs2 |
These AMO instructions atomically load a data value from the address in rs1 , place the value into register rd , apply a binary operator to the loaded value and the original value in rs2 , then store the result back to the address in rs1
|
amomax.w | rd, rs1, rs2 |
These AMO instructions atomically load a data value from the address in rs1 , place the value into register rd , apply a binary operator to the loaded value and the original value in rs2 , then store the result back to the address in rs1
|
amominu.w | rd, rs1, rs2 |
These AMO instructions atomically load a data value from the address in rs1 , place the value into register rd , apply a binary operator to the loaded value and the original value in rs2 , then store the result back to the address in rs1
|
amomaxu.w | rd, rs1, rs2 |
These AMO instructions atomically load a data value from the address in rs1 , place the value into register rd , apply a binary operator to the loaded value and the original value in rs2 , then store the result back to the address in rs1
|
amoswap.w | rd, rs1, rs2 |
These AMO instructions atomically load a data value from the address in rs1 , place the value into register rd , apply a binary operator to the loaded value and the original value in rs2 , then store the result back to the address in rs1
|
amoadd.d | rd, rs1, rs2 |
These AMO instructions atomically load a data value from the address in rs1 , place the value into register rd , apply a binary operator to the loaded value and the original value in rs2 , then store the result back to the address in rs1
|
amoxor.d | rd, rs1, rs2 |
These AMO instructions atomically load a data value from the address in rs1 , place the value into register rd , apply a binary operator to the loaded value and the original value in rs2 , then store the result back to the address in rs1
|
amoor.d | rd, rs1, rs2 |
These AMO instructions atomically load a data value from the address in rs1 , place the value into register rd , apply a binary operator to the loaded value and the original value in rs2 , then store the result back to the address in rs1
|
amoand.d | rd, rs1, rs2 |
These AMO instructions atomically load a data value from the address in rs1 , place the value into register rd , apply a binary operator to the loaded value and the original value in rs2 , then store the result back to the address in rs1
|
amomin.d | rd, rs1, rs2 |
These AMO instructions atomically load a data value from the address in rs1 , place the value into register rd , apply a binary operator to the loaded value and the original value in rs2 , then store the result back to the address in rs1
|
amomax.d | rd, rs1, rs2 |
These AMO instructions atomically load a data value from the address in rs1 , place the value into register rd , apply a binary operator to the loaded value and the original value in rs2 , then store the result back to the address in rs1
|
amominu.d | rd, rs1, rs2 |
These AMO instructions atomically load a data value from the address in rs1 , place the value into register rd , apply a binary operator to the loaded value and the original value in rs2 , then store the result back to the address in rs1
|
amomaxu.d | rd, rs1, rs2 |
These AMO instructions atomically load a data value from the address in rs1 , place the value into register rd , apply a binary operator to the loaded value and the original value in rs2 , then store the result back to the address in rs1
|
amoswap.d | rd, rs1, rs2 |
These AMO instructions atomically load a data value from the address in rs1 , place the value into register rd , apply a binary operator to the loaded value and the original value in rs2 , then store the result back to the address in rs1
|
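The AMO description (load the old value into rd, combine it with rs2, store the result back) matches the C11 fetch-and-op functions. A sketch, assuming a toolchain that lowers these calls to single AMO instructions when the A extension is enabled; the wrapper names are hypothetical.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Model of AMOADD.W: returns the old value (rd), memory gets old + rs2. */
int32_t amoadd_w(_Atomic int32_t *addr, int32_t rs2)
{
    return atomic_fetch_add_explicit(addr, rs2, memory_order_relaxed);
}

/* Model of AMOXOR.W: returns the old value, memory gets old ^ rs2. */
int32_t amoxor_w(_Atomic int32_t *addr, int32_t rs2)
{
    return atomic_fetch_xor_explicit(addr, rs2, memory_order_relaxed);
}

/* Model of AMOSWAP.W: returns the old value, memory gets rs2. */
int32_t amoswap_w(_Atomic int32_t *addr, int32_t rs2)
{
    return atomic_exchange_explicit(addr, rs2, memory_order_relaxed);
}
```

Relaxed ordering corresponds to an AMO with neither the aq nor the rl bit set; stronger orderings would be requested through the `memory_order` argument.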
a / sec:lrsc
“A” Standard Extension for Atomic Instructions, Version 2.1 / Load-Reserved/Store-Conditional Instructions
Operation | Arguments | Description |
lr.w | rd, rs1 |
LR.W loads a word from the address in rs1 , places the sign-extended value in rd , and registers a reservation set, that is, a set of bytes that subsumes the bytes in the addressed word
|
sc.w | rd, rs1, rs2 |
SC.W conditionally writes a word in rs2 to the address in rs1 : the SC.W succeeds only if the reservation is still valid and the reservation set contains the bytes being written
If the SC.W succeeds, the instruction writes the word in rs2 to memory, and it writes zero to rd
If the SC.W fails, the instruction does not write to memory, and it writes a nonzero value to rd
Regardless of success or failure, executing an SC.W instruction invalidates any reservation held by this hart
|
lr.d | rd, rs1 |
LR.D and SC.D act analogously on doublewords and are only available on RV64. For RV64, LR.W and SC.W sign-extend the value placed in rd .
|
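LR/SC is normally used in a retry loop: load, compute, attempt the conditional store, and start over if it fails. From C this is usually written as a compare-exchange loop, which RISC-V compilers commonly implement with LR.W/SC.W; that lowering is an assumption here, and `bounded_increment` is a hypothetical example of an update no single AMO provides.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Atomically increment *addr, but only while it is below limit.
 * Returns true if the increment happened. */
bool bounded_increment(_Atomic int32_t *addr, int32_t limit)
{
    /* Initial read, playing the role of LR.W. */
    int32_t old = atomic_load_explicit(addr, memory_order_relaxed);
    while (old < limit) {
        /* Conditional write, playing the role of SC.W. On failure, old is
         * refreshed with the current value and the loop retries. */
        if (atomic_compare_exchange_weak_explicit(addr, &old, old + 1,
                                                  memory_order_relaxed,
                                                  memory_order_relaxed))
            return true;
    }
    return false;
}
```

The weak form is used because, like SC.W, it is allowed to fail even when the value has not changed.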
c
c / integer-register-immediate-operations
“C” Standard Extension for Compressed Instructions, Version 2.0 / Integer Computational Instructions
Operation | Arguments | Description |
c.addi4spn |
In the standard RISC-V calling convention, the stack pointer sp is always 16-byte aligned.
C.ADDI4SPN is a CIW-format instruction that adds a zero-extended non-zero immediate, scaled by 4, to the stack pointer, x2 , and writes the result to rd′
C.ADDI4SPN is only valid when nzuimm ≠ 0
|
|
c.addi |
C.ADDI adds the non-zero sign-extended 6-bit immediate to the value in register rd then writes the result to rd
C.ADDI expands into addi rd, rd, nzimm[5:0]
C.ADDI is only valid when rd ≠ x0 and nzimm ≠ 0
|
|
c.srli |
C.SRLI is a CB-format instruction that performs a logical right shift of the value in register rd′, then writes the result to rd′
For RV128C, a shift amount of zero is used to encode a shift of 64. Furthermore, the shift amount is sign-extended for RV128C, and so the legal shift amounts are 1-31, 64, and 96-127. C.SRLI expands into srli rd', rd', shamt[5:0] , except for RV128C with shamt=0 , which expands to srli rd', rd', 64 .
|
|
c.srai |
C.SRAI is defined analogously to C.SRLI, but instead performs an arithmetic right shift
C.SRAI expands to srai rd', rd', shamt[5:0] .
|
|
c.andi |
C.ANDI is a CB-format instruction that computes the bitwise AND of the value in register rd′ and the sign-extended 6-bit immediate, then writes the result to rd′
C.ANDI expands to andi rd', rd', imm[5:0] .
|
|
c.slli |
C.SLLI is a CI-format instruction that performs a logical left shift of the value in register rd then writes the result to rd
For RV128C, a shift amount of zero is used to encode a shift of 64. C.SLLI expands into slli rd, rd, shamt[5:0] , except for RV128C with shamt=0 , which expands to slli rd, rd, 64 .
|
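The "expands into" statements can be read as a mechanical rewrite from a 16-bit encoding to a 32-bit one. The sketch below does this for C.ADDI only. The bit positions used (opcode in bits 1:0, imm[4:0] in bits 6:2, rd in bits 11:7, imm[5] in bit 12, funct3 in bits 15:13) come from the RVC CI format and the base I-type format rather than from this page, and `expand_c_addi` is a hypothetical helper.

```c
#include <stdbool.h>
#include <stdint.h>

/* Expand a 16-bit C.ADDI into the equivalent 32-bit "addi rd, rd, nzimm".
 * Returns false for anything that is not a valid C.ADDI
 * (wrong opcode, rd = x0, or zero immediate). */
bool expand_c_addi(uint16_t c, uint32_t *out)
{
    if ((c & 0x3u) != 0x1u || (c >> 13) != 0x0u)
        return false;                      /* not quadrant 1 / funct3 000 */

    uint32_t rd   = (c >> 7) & 0x1Fu;
    uint32_t imm6 = (((uint32_t)c >> 12) & 0x1u) << 5
                  | (((uint32_t)c >> 2) & 0x1Fu);
    if (rd == 0 || imm6 == 0)
        return false;                      /* only valid when rd != x0 and nzimm != 0 */

    /* Sign-extend the 6-bit immediate to the 12-bit I-type immediate. */
    uint32_t imm12 = (imm6 & 0x20u) ? (imm6 | 0xFC0u) : imm6;

    /* I-type ADDI: imm[11:0] | rs1 | funct3=000 | rd | opcode=0x13 */
    *out = (imm12 << 20) | (rd << 15) | (rd << 7) | 0x13u;
    return true;
}
```

For example, 0x0505 (c.addi a0, 1) expands to 0x00150513 (addi a0, a0, 1).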
c / register-based-loads-and-stores
“C” Standard Extension for Compressed Instructions, Version 2.0 / Load and Store Instructions
Operation | Arguments | Description |
c.fld |
C.FLD is an RV32DC/RV64DC-only instruction that loads a double-precision floating-point value from memory into floating-point register rd′
|
|
c.lw |
C.LW loads a 32-bit value from memory into register rd′
|
|
c.flw |
C.FLW is an RV32FC-only instruction that loads a single-precision floating-point value from memory into floating-point register rd′
|
|
c.fsd |
C.FSD is an RV32DC/RV64DC-only instruction that stores a double-precision floating-point value in floating-point register rs2′ to memory
|
|
c.sw |
C.SW stores a 32-bit value in register rs2′ to memory
|
|
c.fsw |
C.FSW is an RV32FC-only instruction that stores a single-precision floating-point value in floating-point register rs2′ to memory
|
c / control-transfer-instructions
“C” Standard Extension for Compressed Instructions, Version 2.0 / Control Transfer Instructions
Operation | Arguments | Description |
c.jal |
C.JAL is an RV32C-only instruction that performs the same operation as C.J, but additionally writes the address of the instruction following the jump ( pc +2) to the link register, x1
C.JAL expands to jal x1, offset[11:1] .
|
|
c.j |
C.J performs an unconditional control transfer
C.J can therefore target a
C.J expands to jal x0, offset[11:1] .
|
|
c.beqz |
C.BEQZ performs conditional control transfers
C.BEQZ takes the branch if the value in register rs1′ is zero
|
|
c.bnez |
C.BNEZ is defined analogously, but it takes the branch if rs1′ contains a nonzero value
|
c / integer-constant-generation-instructions
“C” Standard Extension for Compressed Instructions, Version 2.0 / Integer Computational Instructions
Operation | Arguments | Description |
c.li |
C.LI loads the sign-extended 6-bit immediate, imm , into register rd
C.LI expands into addi rd, x0, imm[5:0]
C.LI is only valid when rd ≠ x0 ; the code points with rd = x0 encode HINTs.
|
|
c.lui | rd=2 |
C.LUI loads the non-zero 6-bit immediate field into bits 17-12 of the destination register, clears the bottom 12 bits, and sign-extends bit 17 into all higher bits of the destination
C.LUI expands into lui rd, nzimm[17:12]
C.LUI is only valid when rd ≠ { x0 , x2 }
|
c / integer-register-register-operations
“C” Standard Extension for Compressed Instructions, Version 2.0 / Integer Computational Instructions
Operation | Arguments | Description |
c.sub |
C.SUB subtracts the value in register rs2′ from the value in register rd′, then writes the result to register rd′
C.SUB expands into sub rd', rd', rs2' .
|
|
c.xor |
C.XOR computes the bitwise XOR of the values in registers rd′ and rs2′, then writes the result to register rd′
C.XOR expands into xor rd', rd', rs2' .
|
|
c.or |
C.OR computes the bitwise OR of the values in registers rd′ and rs2′, then writes the result to register rd′
C.OR expands into or rd', rd', rs2' .
|
|
c.and |
C.AND computes the bitwise AND of the values in registers rd′ and rs2′, then writes the result to register rd′
C.AND expands into and rd', rd', rs2' .
|
|
c.subw |
C.SUBW is an RV64C/RV128C-only instruction that subtracts the value in register rs2′ from the value in register rd′, then writes the sign-extended lower 32 bits of the difference to register rd′
C.SUBW expands into subw rd', rd', rs2' .
|
|
c.addw |
C.ADDW is an RV64C/RV128C-only instruction that adds the values in registers rd′ and rs2′, then writes the sign-extended lower 32 bits of the sum to register rd′
C.ADDW expands into addw rd', rd', rs2' .
|
|
c.mv | !rs2 |
C.MV copies the value in register rs2 into register rd
C.MV expands into add rd, x0, rs2
C.MV is only valid when rs2 ≠ x0 ; the code points with rs2 = x0 correspond to the C.JR instruction
C.MV expands to a different instruction than the canonical MV pseudoinstruction, which instead uses ADDI
using register-renaming hardware, may find it more convenient to expand C.MV to MV instead of ADD, at slight additional hardware cost.
|
c.add | !rs1, !rs2=c.jalr |
C.ADD adds the values in registers rd and rs2 and writes the result to register rd
C.ADD expands into add rd, rd, rs2
C.ADD is only valid when rs2 ≠ x0 ; the code points with rs2 = x0 correspond to the C.JALR and C.EBREAK instructions
|
c / stack-pointer-based-loads-and-stores
“C” Standard Extension for Compressed Instructions, Version 2.0 / Load and Store Instructions
Operation | Arguments | Description |
c.fldsp |
C.FLDSP is an RV32DC/RV64DC-only instruction that loads a double-precision floating-point value from memory into floating-point register rd
|
|
c.lwsp |
C.LWSP loads a 32-bit value from memory into register rd
C.LWSP is only valid when rd ≠ x0 ; the code points with rd = x0 are reserved.
|
|
c.flwsp |
C.FLWSP is an RV32FC-only instruction that loads a single-precision floating-point value from memory into floating-point register rd
|
|
c.fsdsp |
C.FSDSP is an RV32DC/RV64DC-only instruction that stores a double-precision floating-point value in floating-point register rs2 to memory
|
|
c.swsp |
C.SWSP stores a 32-bit value in register rs2 to memory
|
|
c.fswsp |
C.FSWSP is an RV32FC-only instruction that stores a single-precision floating-point value in floating-point register rs2 to memory
|
c / compressed
“C” Standard Extension for Compressed Instructions, Version 2.0 /
Operation | Arguments | Description |
@c.nop | ||
@c.addi16sp | ||
@c.jr | ||
@c.jalr | ||
@c.ebreak | ||
@c.ld | ||
@c.sd | ||
@c.addiw | ||
@c.ldsp | ||
@c.sdsp | ||
@c.lq | ||
@c.sq | ||
@c.lqsp | ||
@c.sqsp |
d
d / sec:single-float-compute
“F” Standard Extension for Single-Precision Floating-Point, Version 2.2 / Single-Precision Floating-Point Computational Instructions
Operation | Arguments | Description |
fadd.d | rd, rs1, rs2 |
FADD.S and FMUL.S perform single-precision floating-point addition and multiplication respectively, between rs1 and rs2
|
fsub.d | rd, rs1, rs2 |
FSUB.S performs the single-precision floating-point subtraction of rs2 from rs1
|
fdiv.d | rd, rs1, rs2 |
FDIV.S performs the single-precision floating-point division of rs1 by rs2
|
fmin.d | rd, rs1, rs2 |
Floating-point minimum-number and maximum-number instructions FMIN.S and FMAX.S write, respectively, the smaller or larger of rs1 and rs2 to rd
Note that in version 2.2 of the F extension, the FMIN.S and FMAX.S instructions were amended to implement the proposed IEEE 754-201x minimumNumber and maximumNumber operations, rather than the IEEE 754-2008 minNum and maxNum operations
|
fsqrt.d | rd, rs1 |
FSQRT.S computes the square root of rs1
|
fmadd.d | rd, rs1, rs2, rs3 |
FMADD.S multiplies the values in rs1 and rs2 , adds the value in rs3 , and writes the final result to rd
FMADD.S computes (rs1 × rs2)+rs3 .
|
fmsub.d | rd, rs1, rs2, rs3 |
FMSUB.S multiplies the values in rs1 and rs2 , subtracts the value in rs3 , and writes the final result to rd
FMSUB.S computes (rs1 × rs2)-rs3 .
|
fnmsub.d | rd, rs1, rs2, rs3 |
FNMSUB.S multiplies the values in rs1 and rs2 , negates the product, adds the value in rs3 , and writes the final result to rd
FNMSUB.S computes -(rs1 × rs2)+rs3 .
|
fnmadd.d | rd, rs1, rs2, rs3 |
FNMADD.S multiplies the values in rs1 and rs2 , negates the product, subtracts the value in rs3 , and writes the final result to rd
FNMADD.S computes -(rs1 × rs2)-rs3 .
|
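The four fused multiply-add variants differ only in the signs applied to the product and to rs3. The sketch below spells out those sign conventions in C. It deliberately uses separate multiply and add operations, so it does not model the single rounding of a real fused operation; the function names just echo the mnemonics.

```c
/* Sign conventions only: each real instruction rounds once,
 * whereas these expressions may round after the multiply as well. */

float fmadd_s(float rs1, float rs2, float rs3)   /*  (rs1 * rs2) + rs3 */
{
    return (rs1 * rs2) + rs3;
}

float fmsub_s(float rs1, float rs2, float rs3)   /*  (rs1 * rs2) - rs3 */
{
    return (rs1 * rs2) - rs3;
}

float fnmsub_s(float rs1, float rs2, float rs3)  /* -(rs1 * rs2) + rs3 */
{
    return -(rs1 * rs2) + rs3;
}

float fnmadd_s(float rs1, float rs2, float rs3)  /* -(rs1 * rs2) - rs3 */
{
    return -(rs1 * rs2) - rs3;
}
```

Note that FNMSUB adds rs3 and FNMADD subtracts it, which is the opposite of what the mnemonics might suggest at first glance.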
d / double-precision-floating-point-conversion-and-move-instructions
“D” Standard Extension for Double-Precision Floating-Point, Version 2.2 / Double-Precision Floating-Point Conversion and Move Instructions
Operation | Arguments | Description |
fsgnj.d | rd, rs1, rs2 |
Floating-point to floating-point sign-injection instructions, FSGNJ.D, FSGNJN.D, and FSGNJX.D are defined analogously to the single-precision sign-injection instruction.
|
fcvt.s.d | rd, rs1 |
The double-precision to single-precision and single-precision to double-precision conversion instructions, FCVT.S.D and FCVT.D.S, are encoded in the OP-FP major opcode space and both the source and destination are floating-point registers
FCVT.S.D rounds according to the RM field; FCVT.D.S will never round.
|
fcvt.w.d | rd, rs1 |
FCVT.W.D or FCVT.L.D converts a double-precision floating-point number in floating-point register rs1 to a signed 32-bit or 64-bit integer, respectively, in integer register rd
|
fcvt.wu.d | rd, rs1 |
FCVT.WU.D, FCVT.LU.D, FCVT.D.WU, and FCVT.D.LU variants convert to or from unsigned integer values
|
fmv.x.d | rd, rs1 |
FMV.X.D moves the double-precision value in floating-point register rs1 to a representation in IEEE 754-2008 standard encoding in integer register rd
FMV.X.D and FMV.D.X do not modify the bits being transferred; in particular, the payloads of non-canonical NaNs are preserved.
|
fcvt.d.w | rd, rs1 |
FCVT.D.W or FCVT.D.L converts a 32-bit or 64-bit signed integer, respectively, in integer register rs1 into a double-precision floating-point number in floating-point register rd
Note FCVT.D.W[U] always produces an exact result and is unaffected by rounding mode.
|
fcvt.d.l | rd, rs1 |
FCVT.L[U].D and FCVT.D.L[U] are RV64-only instructions
|
fmv.d.x | rd, rs1 |
FMV.D.X moves the double-precision value encoded in IEEE 754-2008 standard encoding from the integer register rs1 to the floating-point register rd .
|
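FMV.X.D and FMV.D.X are bit-for-bit moves, not numeric conversions. A minimal Python sketch of that idea, using `struct` to reinterpret a 64-bit pattern; the helper names and the sample NaN payload are illustrative assumptions.

```python
import struct


def fmv_x_d(value):
    """Return the 64-bit IEEE 754 pattern of a double (FMV.X.D direction)."""
    return struct.unpack("<Q", struct.pack("<d", value))[0]


def fmv_d_x(bits):
    """Reinterpret a 64-bit pattern as a double (FMV.D.X direction)."""
    return struct.unpack("<d", struct.pack("<Q", bits))[0]


# A quiet NaN carrying a non-canonical payload (hypothetical sample value).
QUIET_NAN_WITH_PAYLOAD = 0x7FF8000000001234
```

Moving `QUIET_NAN_WITH_PAYLOAD` to the floating-point side and back is expected to return the same pattern, which is the payload-preservation property stated above.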
d / single-precision-floating-point-compare-instructions
“F” Standard Extension for Single-Precision Floating-Point, Version 2.2 / Single-Precision Floating-Point Compare Instructions
Operation | Arguments | Description |
flt.d | rd, rs1, rs2 |
FLT.S and FLE.S perform what the IEEE 754-2008 standard refers to as signaling comparisons: that is, they set the invalid operation exception flag if either input is NaN
|
feq.d | rd, rs1, rs2 |
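Floating-point compare instructions (FEQ.S, FLT.S, FLE.S) perform the specified comparison between floating-point registers (rs1 = rs2, rs1 < rs2, rs1 ≤ rs2), writing 1 to the integer register rd if the condition holds, and 0 otherwise.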
FEQ.S performs a quiet comparison: it only sets the invalid operation exception flag if either input is a signaling NaN
|
d / double-precision-floating-point-classify-instruction
“D” Standard Extension for Double-Precision Floating-Point, Version 2.2 / Double-Precision Floating-Point Classify Instruction
Operation | Arguments | Description |
fclass.d | rd, rs1 |
The double-precision floating-point classify instruction, FCLASS.D, is defined analogously to its single-precision counterpart, but operates on double-precision operands.
|
d / fld_fsd
“D” Standard Extension for Double-Precision Floating-Point, Version 2.2 / Double-Precision Load and Store Instructions
Operation | Arguments | Description |
fld | rd, rs1, imm12 |
The FLD instruction loads a double-precision floating-point value from memory into floating-point register rd
FLD and FSD are only guaranteed to execute atomically if the effective address is naturally aligned and XLEN≥64.
FLD and FSD do not modify the bits being transferred; in particular, the payloads of non-canonical NaNs are preserved.
|
fsd | imm12hi, rs1, rs2, imm12lo |
FSD stores a double-precision value from the floating-point registers to memory
|
f
f / single-precision-floating-point-conversion-and-move-instructions
“F” Standard Extension for Single-Precision Floating-Point, Version 2.2 / Single-Precision Floating-Point Conversion and Move Instructions
Operation | Arguments | Description |
fsgnj.s | rd, rs1, rs2 |
Floating-point to floating-point sign-injection instructions, FSGNJ.S, FSGNJN.S, and FSGNJX.S, produce a result that takes all bits except the sign bit from rs1.
For FSGNJ, the result’s sign bit is rs2’s sign bit; for FSGNJN, the result’s sign bit is the opposite of rs2’s sign bit; and for FSGNJX, the sign bit is the XOR of the sign bits of rs1 and rs2.
Note, FSGNJ.S rx, ry, ry moves ry to rx (assembler pseudoinstruction FMV.S rx, ry); FSGNJN.S rx, ry, ry moves the negation of ry to rx (assembler pseudoinstruction FNEG.S rx, ry); and FSGNJX.S rx, ry, ry moves the absolute value of ry to rx (assembler pseudoinstruction FABS.S rx, ry).
|
fcvt.w.s | rd, rs1 |
FCVT.W.S or FCVT.L.S converts a floating-point number in floating-point register rs1 to a signed 32-bit or 64-bit integer, respectively, in integer register rd
|
fcvt.wu.s | rd, rs1 |
FCVT.WU.S, FCVT.LU.S, FCVT.S.WU, and FCVT.S.LU variants convert to or from unsigned integer values
|
fcvt.l.s | rd, rs1 |
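FCVT.L.S converts a floating-point number in floating-point register rs1 to a signed 64-bit integer in integer register rd (RV64 only).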
|
fcvt.lu.s | rd, rs1 |
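FCVT.LU.S converts a floating-point number in floating-point register rs1 to an unsigned 64-bit integer in integer register rd (RV64 only).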
|
fmv.x.w | rd, rs1 |
FMV.X.W moves the single-precision value in floating-point register rs1, represented in IEEE 754-2008 encoding, to the lower 32 bits of integer register rd.
|
fcvt.s.w | rd, rs1 |
FCVT.S.W or FCVT.S.L converts a 32-bit or 64-bit signed integer, respectively, in integer register rs1 into a floating-point number in floating-point register rd
A floating-point register can be initialized to floating-point positive zero using FCVT.S.W rd , x0 , which will never set any exception flags.
|
fcvt.s.l | rd, rs1 |
FCVT.L[U].S and FCVT.S.L[U] are RV64-only instructions
|
fmv.w.x | rd, rs1 |
FMV.W.X moves the single-precision value encoded in IEEE 754-2008 standard encoding from the lower 32 bits of integer register rs1 to the floating-point register rd
The FMV.W.X and FMV.X.W instructions were previously called FMV.S.X and FMV.X.S
|
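The sign-injection rules in this section reduce to simple bit operations on the 32-bit encoding. A minimal Python sketch, operating on raw single-precision bit patterns held in integers; the function names are illustrative, and the FMV.S / FNEG.S / FABS.S pseudoinstructions fall out when both sources are the same register.

```python
SIGN = 0x80000000    # sign bit of a single-precision value
MASK32 = 0xFFFFFFFF


def fsgnj(rs1, rs2):
    """All bits but the sign from rs1, sign bit from rs2."""
    return (rs1 & ~SIGN & MASK32) | (rs2 & SIGN)


def fsgnjn(rs1, rs2):
    """All bits but the sign from rs1, inverted sign bit of rs2."""
    return (rs1 & ~SIGN & MASK32) | (~rs2 & SIGN)


def fsgnjx(rs1, rs2):
    """All bits but the sign from rs1, sign = sign(rs1) XOR sign(rs2)."""
    return (rs1 ^ (rs2 & SIGN)) & MASK32


def fmv_s(ry):   # FSGNJ.S rx, ry, ry
    return fsgnj(ry, ry)


def fneg_s(ry):  # FSGNJN.S rx, ry, ry
    return fsgnjn(ry, ry)


def fabs_s(ry):  # FSGNJX.S rx, ry, ry
    return fsgnjx(ry, ry)
```

For example, `0xBF800000` encodes -1.0, so `fabs_s(0xBF800000)` and `fneg_s(0xBF800000)` both give `0x3F800000`, the encoding of 1.0.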
f / sec:single-float-compute
“F” Standard Extension for Single-Precision Floating-Point, Version 2.2 / Single-Precision Floating-Point Computational Instructions
Operation | Arguments | Description |
fadd.s | rd, rs1, rs2 |
FADD.S and FMUL.S perform single-precision floating-point addition and multiplication respectively, between rs1 and rs2
|
fsub.s | rd, rs1, rs2 |
FSUB.S performs the single-precision floating-point subtraction of rs2 from rs1
|
fdiv.s | rd, rs1, rs2 |
FDIV.S performs the single-precision floating-point division of rs1 by rs2
|
fmin.s | rd, rs1, rs2 |
Floating-point minimum-number and maximum-number instructions FMIN.S and FMAX.S write, respectively, the smaller or larger of rs1 and rs2 to rd.
Note that in version 2.2 of the F extension, the FMIN.S and FMAX.S instructions were amended to implement the proposed IEEE 754-201x minimumNumber and maximumNumber operations, rather than the IEEE 754-2008 minNum and maxNum operations
|
fsqrt.s | rd, rs1 |
FSQRT.S computes the square root of rs1
|
fmadd.s | rd, rs1, rs2, rs3 |
FMADD.S multiplies the values in rs1 and rs2 , adds the value in rs3 , and writes the final result to rd
FMADD.S computes (rs1 × rs2)+rs3.
|
fmsub.s | rd, rs1, rs2, rs3 |
FMSUB.S multiplies the values in rs1 and rs2 , subtracts the value in rs3 , and writes the final result to rd
FMSUB.S computes (rs1 × rs2)-rs3.
|
fnmsub.s | rd, rs1, rs2, rs3 |
FNMSUB.S multiplies the values in rs1 and rs2 , negates the product, adds the value in rs3 , and writes the final result to rd
FNMSUB.S computes -(rs1 × rs2)+rs3.
|
fnmadd.s | rd, rs1, rs2, rs3 |
FNMADD.S multiplies the values in rs1 and rs2 , negates the product, subtracts the value in rs3 , and writes the final result to rd
FNMADD.S computes -(rs1 × rs2)-rs3.
|
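The FMIN.S / FMAX.S entries above refer to the minimumNumber and maximumNumber operations. A minimal Python sketch of the behavior as described in the F extension text: a single NaN input is ignored, two NaN inputs give a NaN, and -0.0 is treated as smaller than +0.0. This is a sketch on Python floats (doubles), with illustrative function names; `math.nan` stands in for the canonical NaN, and the invalid flag raised for signaling NaNs is not modeled.

```python
import math


def _is_less(a, b):
    """Ordering used here: -0.0 counts as smaller than +0.0."""
    if a == 0.0 and b == 0.0:
        return math.copysign(1.0, a) < math.copysign(1.0, b)
    return a < b


def fmin(a, b):
    if math.isnan(a) and math.isnan(b):
        return math.nan  # stand-in for the canonical NaN
    if math.isnan(a):
        return b
    if math.isnan(b):
        return a
    return a if _is_less(a, b) else b


def fmax(a, b):
    if math.isnan(a) and math.isnan(b):
        return math.nan  # stand-in for the canonical NaN
    if math.isnan(a):
        return b
    if math.isnan(b):
        return a
    return b if _is_less(a, b) else a
```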
f / single-precision-floating-point-compare-instructions
“F” Standard Extension for Single-Precision Floating-Point, Version 2.2 / Single-Precision Floating-Point Compare Instructions
Operation | Arguments | Description |
flt.s | rd, rs1, rs2 |
FLT.S and FLE.S perform what the IEEE 754-2008 standard refers to as signaling comparisons: that is, they set the invalid operation exception flag if either input is NaN
|
feq.s | rd, rs1, rs2 |
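Floating-point compare instructions (FEQ.S, FLT.S, FLE.S) perform the specified comparison between floating-point registers (rs1 = rs2, rs1 < rs2, rs1 ≤ rs2), writing 1 to the integer register rd if the condition holds, and 0 otherwise.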
FEQ.S performs a quiet comparison: it only sets the invalid operation exception flag if either input is a signaling NaN
|
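The difference between the signaling comparisons (FLT.S, FLE.S) and the quiet comparison (FEQ.S) is only in when the invalid flag is raised. A minimal Python sketch on raw 32-bit encodings, returning `(result, invalid_flag)`; the function names are illustrative, and the result of 0 for any NaN operand follows the F extension text.

```python
import struct


def _decode(bits):
    """Return (value, is_nan, is_signaling_nan) for a 32-bit pattern."""
    exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    is_nan = exponent == 0xFF and fraction != 0
    # A signaling NaN has the most-significant fraction bit clear.
    is_signaling = is_nan and (fraction & 0x400000) == 0
    value = struct.unpack("<f", struct.pack("<I", bits))[0]
    return value, is_nan, is_signaling


def feq_s(a_bits, b_bits):
    a, a_nan, a_snan = _decode(a_bits)
    b, b_nan, b_snan = _decode(b_bits)
    invalid = a_snan or b_snan          # quiet: only signaling NaNs raise
    result = 0 if (a_nan or b_nan) else int(a == b)
    return result, invalid


def flt_s(a_bits, b_bits):
    a, a_nan, _ = _decode(a_bits)
    b, b_nan, _ = _decode(b_bits)
    invalid = a_nan or b_nan            # signaling: any NaN raises
    return (0 if invalid else int(a < b)), invalid


def fle_s(a_bits, b_bits):
    a, a_nan, _ = _decode(a_bits)
    b, b_nan, _ = _decode(b_bits)
    invalid = a_nan or b_nan            # signaling: any NaN raises
    return (0 if invalid else int(a <= b)), invalid
```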
f / single-precision-floating-point-classify-instruction
“F” Standard Extension for Single-Precision Floating-Point, Version 2.2 / Single-Precision Floating-Point Classify Instruction
Operation | Arguments | Description |
fclass.s | rd, rs1 |
The FCLASS.S instruction examines the value in floating-point register rs1 and writes to integer register rd a 10-bit mask that indicates the class of the floating-point number
FCLASS.S does not set the floating-point exception flags
|
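The 10-bit mask written by FCLASS.S has exactly one bit set. A minimal Python sketch of the classification on a raw 32-bit encoding; the bit assignments in the comments are taken from the F extension's classify table, which is not reproduced in this listing, and the function name is illustrative.

```python
def fclass_s(bits):
    """Return the one-hot 10-bit class mask for a single-precision pattern."""
    sign = (bits >> 31) & 1
    exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    if exponent == 0xFF:
        if fraction == 0:
            index = 0 if sign else 7                  # -inf / +inf
        else:
            index = 9 if fraction & 0x400000 else 8   # quiet / signaling NaN
    elif exponent == 0:
        if fraction == 0:
            index = 3 if sign else 4                  # -0 / +0
        else:
            index = 2 if sign else 5                  # negative / positive subnormal
    else:
        index = 1 if sign else 6                      # negative / positive normal
    return 1 << index
```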
f / single-precision-load-and-store-instructions
“F” Standard Extension for Single-Precision Floating-Point, Version 2.2 / Single-Precision Load and Store Instructions
Operation | Arguments | Description |
flw | rd, rs1, imm12 |
The FLW instruction loads a single-precision floating-point value from memory into floating-point register rd
FLW and FSW are only guaranteed to execute atomically if the effective address is naturally aligned.
FLW and FSW do not modify the bits being transferred; in particular, the payloads of non-canonical NaNs are preserved.
|
fsw | imm12hi, rs1, rs2, imm12lo |
FSW stores a single-precision value from floating-point register rs2 to memory.
|
m
multiplication operations | division operations |
m / multiplication-operations
“M” Standard Extension for Integer Multiplication and Division, Version 2.0 / Multiplication Operations
Operation | Arguments | Description |
mul | rd, rs1, rs2 |
MUL performs an XLEN-bit × XLEN-bit multiplication of rs1 by rs2 and places the lower XLEN bits in the destination register.
In RV64, MUL can be used to obtain the upper 32 bits of the 64-bit product, but signed arguments must be proper 32-bit signed values, whereas unsigned arguments must have their upper 32 bits clear.
|
mulh | rd, rs1, rs2 |
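MULH, MULHU, and MULHSU perform the same multiplication but return the upper XLEN bits of the full 2×XLEN-bit product, for signed × signed, unsigned × unsigned, and signed rs1 × unsigned rs2 multiplication, respectively.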
If both the high and low bits of the same product are required, then the recommended code sequence is: MULH[[S]U] rdh, rs1, rs2 ; MUL rdl, rs1, rs2 (source register specifiers must be in same order and rdh cannot be the same as rs1 or rs2 )
If the arguments are not known to be sign- or zero-extended, an alternative is to shift both arguments left by 32 bits, then use MULH[[S]U].
|
mulhsu | rd, rs1, rs2 |
MULHSU is used in multi-word signed multiplication to multiply the most-significant word of the multiplicand (which contains the sign bit) with the less-significant words of the multiplier (which are unsigned).
|
mulw | rd, rs1, rs2 |
MULW is an RV64 instruction that multiplies the lower 32 bits of the source registers, placing the sign-extension of the lower 32 bits of the result into the destination register.
|
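The multiplication entries above differ only in which half of the product is kept and how each operand is interpreted. A minimal Python sketch, with register values held as unsigned integers; the function names and the `xlen` parameter are illustrative, not part of any toolchain.

```python
def to_signed(x, bits):
    """Interpret the low `bits` bits of x as a two's-complement value."""
    x &= (1 << bits) - 1
    return x - (1 << bits) if x >> (bits - 1) else x


def mul(rs1, rs2, xlen=32):
    """Lower XLEN bits of the product (same for signed and unsigned)."""
    return (rs1 * rs2) & ((1 << xlen) - 1)


def mulh(rs1, rs2, xlen=32):
    """Upper XLEN bits, signed x signed."""
    product = to_signed(rs1, xlen) * to_signed(rs2, xlen)
    return (product >> xlen) & ((1 << xlen) - 1)


def mulhu(rs1, rs2, xlen=32):
    """Upper XLEN bits, unsigned x unsigned."""
    mask = (1 << xlen) - 1
    return (((rs1 & mask) * (rs2 & mask)) >> xlen) & mask


def mulhsu(rs1, rs2, xlen=32):
    """Upper XLEN bits, signed rs1 x unsigned rs2."""
    mask = (1 << xlen) - 1
    return ((to_signed(rs1, xlen) * (rs2 & mask)) >> xlen) & mask


def mulw(rs1, rs2):
    """RV64 only: 32-bit product of the low words, sign-extended to 64 bits."""
    low = (rs1 * rs2) & 0xFFFFFFFF
    return to_signed(low, 32) & 0xFFFFFFFFFFFFFFFF
```

Pairing `mulhu` with `mul` on the same operands reconstructs the full double-width product, which is the purpose of the recommended MULH[[S]U] / MUL sequence.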
m / division-operations
“M” Standard Extension for Integer Multiplication and Division, Version 2.0 / Division Operations
Operation | Arguments | Description |
div | rd, rs1, rs2 |
DIV and DIVU perform an XLEN bits by XLEN bits signed and unsigned integer division of rs1 by rs2 , rounding towards zero
If both the quotient and remainder are required from the same division, the recommended code sequence is: DIV[U] rdq, rs1, rs2; REM[U] rdr, rs1, rs2 (rdq cannot be the same as rs1 or rs2).
|
divu | rd, rs1, rs2 |
DIVU performs the unsigned XLEN-bit division of rs1 by rs2, rounding towards zero.
|
rem | rd, rs1, rs2 |
REM and REMU provide the remainder of the corresponding division operation
For REM, the sign of the result equals the sign of the dividend.
|
remu | rd, rs1, rs2 |
REMU provides the remainder of the corresponding unsigned division operation.
|
divw | rd, rs1, rs2 |
DIVW and DIVUW are RV64 instructions that divide the lower 32 bits of rs1 by the lower 32 bits of rs2 , treating them as signed and unsigned integers respectively, placing the 32-bit quotient in rd , sign-extended to 64 bits
|
remw | rd, rs1, rs2 |
REMW and REMUW are RV64 instructions that provide the corresponding signed and unsigned remainder operations respectively
Both REMW and REMUW always sign-extend the 32-bit result to 64 bits, including on a divide by zero.
|
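Rounding towards zero, the sign rule for REM, and the sign extension of the W variants can be checked with a small model. A minimal Python sketch with register values held as unsigned integers; the divide-by-zero results (all-ones quotient, remainder equal to the dividend) and the signed-overflow result are taken from the M extension text rather than from this listing, and the function names are illustrative.

```python
def to_signed(x, bits):
    """Interpret the low `bits` bits of x as a two's-complement value."""
    x &= (1 << bits) - 1
    return x - (1 << bits) if x >> (bits - 1) else x


def div(rs1, rs2, bits=32):
    mask = (1 << bits) - 1
    a, b = to_signed(rs1, bits), to_signed(rs2, bits)
    if b == 0:
        return mask                      # all ones, i.e. -1
    q = abs(a) // abs(b)                 # truncate towards zero
    if (a < 0) != (b < 0):
        q = -q
    return q & mask                      # overflow case wraps to -2**(bits-1)


def rem(rs1, rs2, bits=32):
    mask = (1 << bits) - 1
    a, b = to_signed(rs1, bits), to_signed(rs2, bits)
    if b == 0:
        return a & mask                  # remainder is the dividend
    r = abs(a) % abs(b)
    return (-r if a < 0 else r) & mask   # sign follows the dividend


def divu(rs1, rs2, bits=32):
    mask = (1 << bits) - 1
    a, b = rs1 & mask, rs2 & mask
    return mask if b == 0 else a // b


def remu(rs1, rs2, bits=32):
    mask = (1 << bits) - 1
    a, b = rs1 & mask, rs2 & mask
    return a if b == 0 else a % b


def _sext32(x):
    """Sign-extend a 32-bit result to 64 bits."""
    return to_signed(x, 32) & 0xFFFFFFFFFFFFFFFF


def divw(rs1, rs2):
    return _sext32(div(rs1, rs2, 32))


def divuw(rs1, rs2):
    return _sext32(divu(rs1, rs2, 32))


def remw(rs1, rs2):
    return _sext32(rem(rs1, rs2, 32))


def remuw(rs1, rs2):
    return _sext32(remu(rs1, rs2, 32))
```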
n
n / user-status-register-ustatus
“N” Standard Extension for User-Level Interrupts, Version 1.1 / Additional CSRs
Operation | Arguments | Description |
uret |
A new instruction, URET, is used to return from traps in U-mode
URET copies UPIE into UIE, then sets UPIE, before copying uepc to the pc.
|
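The three steps URET performs can be written out directly. A minimal Python sketch, assuming UIE is bit 0 and UPIE is bit 4 of `ustatus` (the bit positions are an assumption taken from the N extension draft, not from this listing); the function name is illustrative.

```python
UIE = 1 << 0   # assumed position of ustatus.UIE
UPIE = 1 << 4  # assumed position of ustatus.UPIE


def uret(ustatus, uepc):
    """Return (new_ustatus, new_pc) after executing URET."""
    previous_enable = (ustatus & UPIE) != 0
    new_status = ustatus & ~UIE          # clear UIE before copying
    if previous_enable:
        new_status |= UIE                # UIE <- UPIE
    new_status |= UPIE                   # UPIE <- 1
    return new_status, uepc              # pc <- uepc
```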
q
q / sec:single-float-compute
“F” Standard Extension for Single-Precision Floating-Point, Version 2.2 / Single-Precision Floating-Point Computational Instructions
Operation | Arguments | Description |
fadd.q | rd, rs1, rs2 |
FADD.S and FMUL.S perform single-precision floating-point addition and multiplication respectively, between rs1 and rs2
|
fsub.q | rd, rs1, rs2 |
FSUB.S performs the single-precision floating-point subtraction of rs2 from rs1
|
fdiv.q | rd, rs1, rs2 |
FDIV.S performs the single-precision floating-point division of rs1 by rs2
|
fmin.q | rd, rs1, rs2 |
Floating-point minimum-number and maximum-number instructions FMIN.S and FMAX.S write, respectively, the smaller or larger of rs1 and rs2 rd
Note that in version 2.2 of the F extension, the FMIN.S and FMAX.S instructions were amended to implement the proposed IEEE 754-201x minimumNumber and maximumNumber operations, rather than the IEEE 754-2008 minNum and maxNum operations
|
fsqrt.q | rd, rs1 |
FSQRT.S computes the square root of rs1
|
fmadd.q | rd, rs1, rs2, rs3 |
FMADD.S multiplies the values in rs1 and rs2 , adds the value in rs3 , and writes the final result to rd
FMADD.S computes (rs1 × rs2)+rs3.
|
fmsub.q | rd, rs1, rs2, rs3 |
FMSUB.S multiplies the values in rs1 and rs2 , subtracts the value in rs3 , and writes the final result to rd
FMSUB.S computes (rs1 × rs2)-rs3.
|
fnmsub.q | rd, rs1, rs2, rs3 |
FNMSUB.S multiplies the values in rs1 and rs2 , negates the product, adds the value in rs3 , and writes the final result to rd
FNMSUB.S computes -(rs1 × rs2)+rs3.
|
fnmadd.q | rd, rs1, rs2, rs3 |
FNMADD.S multiplies the values in rs1 and rs2 , negates the product, subtracts the value in rs3 , and writes the final result to rd
FNMADD.S computes -(rs1 × rs2)-rs3.
|
q / quad-precision-convert-and-move-instructions
“Q” Standard Extension for Quad-Precision Floating-Point, Version 2.2 / Quad-Precision Convert and Move Instructions
Operation | Arguments | Description |
fsgnj.q | rd, rs1, rs2 |
Floating-point to floating-point sign-injection instructions, FSGNJ.Q, FSGNJN.Q, and FSGNJX.Q are defined analogously to the double-precision sign-injection instruction.
|
fcvt.s.q | rd, rs1 |
FCVT.S.Q or FCVT.Q.S converts a quad-precision floating-point number to a single-precision floating-point number, or vice-versa, respectively
|
fcvt.d.q | rd, rs1 |
FCVT.D.Q or FCVT.Q.D converts a quad-precision floating-point number to a double-precision floating-point number, or vice-versa, respectively.
|
fcvt.w.q | rd, rs1 |
FCVT.W.Q or FCVT.L.Q converts a quad-precision floating-point number to a signed 32-bit or 64-bit integer, respectively
|
fcvt.wu.q | rd, rs1 |
FCVT.WU.Q, FCVT.LU.Q, FCVT.Q.WU, and FCVT.Q.LU variants convert to or from unsigned integer values
|
fcvt.q.w | rd, rs1 |
FCVT.Q.W or FCVT.Q.L converts a 32-bit or 64-bit signed integer, respectively, into a quad-precision floating-point number
|
fcvt.q.l | rd, rs1 |
FCVT.L[U].Q and FCVT.Q.L[U] are RV64-only instructions.
|
q / single-precision-floating-point-compare-instructions
“F” Standard Extension for Single-Precision Floating-Point, Version 2.2 / Single-Precision Floating-Point Compare Instructions
Operation | Arguments | Description |
flt.q | rd, rs1, rs2 |
FLT.S and FLE.S perform what the IEEE 754-2008 standard refers to as signaling comparisons: that is, they set the invalid operation exception flag if either input is NaN
|
feq.q | rd, rs1, rs2 |
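Floating-point compare instructions (FEQ.S, FLT.S, FLE.S) perform the specified comparison between floating-point registers (rs1 = rs2, rs1 < rs2, rs1 ≤ rs2), writing 1 to the integer register rd if the condition holds, and 0 otherwise.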
FEQ.S performs a quiet comparison: it only sets the invalid operation exception flag if either input is a signaling NaN
|
q / quad-precision-floating-point-classify-instruction
“Q” Standard Extension for Quad-Precision Floating-Point, Version 2.2 / Quad-Precision Floating-Point Classify Instruction
Operation | Arguments | Description |
fclass.q | rd, rs1 |
The quad-precision floating-point classify instruction, FCLASS.Q, is defined analogously to its double-precision counterpart, but operates on quad-precision operands.
|
q / quad-precision-load-and-store-instructions
“Q” Standard Extension for Quad-Precision Floating-Point, Version 2.2 / Quad-Precision Load and Store Instructions
Operation | Arguments | Description |
flq | rd, rs1, imm12 |
FLQ and FSQ are only guaranteed to execute atomically if the effective address is naturally aligned and XLEN=128.
FLQ and FSQ do not modify the bits being transferred; in particular, the payloads of non-canonical NaNs are preserved.
|
v
v / _vector_length_register_code_vl_code
- Vector Extension Programmer’s Model / 3.4. Vector Length Register
Operation | Arguments | Description |
vsetvli | zimm11, rs1, rd |
The XLEN-bit-wide read-only vl CSR can only be updated by the vsetvli and vsetvl instructions, and the fault-only-first vector load instruction variants.
|
v / _vector_type_register_code_vtype_code
- Vector Extension Programmer’s Model / 3.3. Vector type register,
Operation | Arguments | Description |
vsetvl | rs2, rs1, rd |
The read-only XLEN-wide vector type CSR, vtype, provides the default type used to interpret the contents of the vector register file, and can only be updated by vsetvl{i} instructions.
Allowing updates only via the vsetvl{i} instructions simplifies maintenance of the vtype register state.
|
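Because vl and vtype change only through vsetvl{i}, a loop typically calls it once per iteration to request the remaining element count. A minimal Python sketch of that strip-mining pattern, under loud assumptions: `VLEN = 128` is a hypothetical register width, VLMAX is computed as VLEN × LMUL / SEW, and `vl = min(AVL, VLMAX)` is one permitted choice rather than the only one. The class and function names are illustrative.

```python
VLEN = 128  # assumed vector register width in bits (implementation-defined)


class VectorState:
    """Holds the state that only vsetvl{i} is allowed to change."""

    def __init__(self):
        self.vl = 0
        self.sew = 8
        self.lmul = 1

    def vsetvli(self, avl, sew, lmul):
        """Set the element type, then grant up to VLMAX of the requested AVL."""
        self.sew, self.lmul = sew, lmul
        vlmax = VLEN * lmul // sew
        self.vl = min(avl, vlmax)   # assumption: simplest legal policy
        return self.vl


def stripmine(n, sew=32, lmul=1):
    """Return the vl granted on each loop iteration for n elements."""
    state = VectorState()
    chunks = []
    while n > 0:
        vl = state.vsetvli(n, sew, lmul)
        chunks.append(vl)
        n -= vl
    return chunks
```

With these assumptions, ten 32-bit elements are processed as chunks of 4, 4, and 2.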
v / _vector_unit_stride_instructions
- Vector Loads and Stores / 7.4. Vector Unit-Stride Instructions
Operation | Arguments | Description |
vlb.v | rs1, vd |
vlb.v vd, (rs1), vm # 8b signed
|
vlw.v | rs1, vd |
vlw.v vd, (rs1), vm # 32b signed
|
vle.v | rs1, vd |
vle.v vd, (rs1), vm # SEW
|
vlbu.v | rs1, vd |
vlbu.v vd, (rs1), vm # 8b unsigned
|
vlhu.v | rs1, vd |
vlhu.v vd, (rs1), vm # 16b unsigned
|
vlwu.v | rs1, vd |
vlwu.v vd, (rs1), vm # 32b unsigned
|
vsb.v | rs1, vs3 |
vsb.v vs3, (rs1), vm # 8b store
|
vsh.v | rs1, vs3 |
vsh.v vs3, (rs1), vm # 16b store
|
vse.v | rs1, vs3 |
vse.v vs3, (rs1), vm # SEW store
|
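The signed and unsigned byte loads above differ only in how each byte is widened to the element width. A minimal Python sketch of unmasked `vlb.v`, `vlbu.v`, and `vsb.v` over a `bytearray` standing in for memory; the function signatures and the default `sew=32` are illustrative assumptions, and masking (`vm`) is not modeled.

```python
def vlb(memory, base, vl, sew=32):
    """Load vl bytes, sign-extending each to sew bits."""
    mask = (1 << sew) - 1
    out = []
    for i in range(vl):
        byte = memory[base + i]
        signed = byte - 256 if byte & 0x80 else byte
        out.append(signed & mask)
    return out


def vlbu(memory, base, vl):
    """Load vl bytes, zero-extending each."""
    return [memory[base + i] for i in range(vl)]


def vsb(memory, base, values):
    """Store the low 8 bits of each element to consecutive bytes."""
    for i, value in enumerate(values):
        memory[base + i] = value & 0xFF
```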
v / _examples
- Configuration-Setting Instructions / 6.4. Examples
Operation | Arguments | Description |
vlh.v | rs1, vd |
vlh.v v8, (a1) # Sign-extend 16b load values to 32b elements
vlh.v v4, (a1) # Get 16b vector
|
vsw.v | rs1, vs3 |
vsw.v v8, (a2) # Store vector of 32b results
vsw.v v8, (a2) # Store vector of 32b
|
vsrl.vx | vs2, rs1, vd |
vsrl.vi v8, v8, 3 # Shift elements
|
vsrl.vv | vs2, rs1, vd |
vsrl.vi v8, v8, 3 # Shift elements
|
vsrl.vi | vs2, simm5, vd |
vsrl.vi v8, v8, 3 # Shift elements
|
vmul.vv | vs2, vs1, vd |
vmul.vx v8, v8, x10 # 32b multiply result
|
vwmul.vv | vs2, vs1, vd |
vwmul.vx v8, v4, x10 # 32b in <v8--v15>
|
vmul.vx | vs2, rs1, vd |
vmul.vx v8, v8, x10 # 32b multiply result
|
vwmul.vx | vs2, rs1, vd |
vwmul.vx v8, v4, x10 # 32b in <v8--v15>
|
v / _vector_strided_instructions
- Vector Loads and Stores / 7.5. Vector Strided Instructions
Operation | Arguments | Description |
vlsb.v | rs2, rs1, vd |
vlsb.v vd, (rs1), rs2, vm # 8b
|
vlsh.v | rs2, rs1, vd |
vlsh.v vd, (rs1), rs2, vm # 16b
|
vlsw.v | rs2, rs1, vd |
vlsw.v vd, (rs1), rs2, vm # 32b
|
vlse.v | rs2, rs1, vd |
vlse.v vd, (rs1), rs2, vm # SEW
|
vlsbu.v | rs2, rs1, vd |
vlsbu.v vd, (rs1), rs2, vm # unsigned 8b
|
vlshu.v | rs2, rs1, vd |
vlshu.v vd, (rs1), rs2, vm # unsigned 16b
|
vlswu.v | rs2, rs1, vd |
vlswu.v vd, (rs1), rs2, vm # unsigned 32b
|
vssb.v | rs2, rs1, vs3 |
vssb.v vs3, (rs1), rs2, vm # 8b
|
vssh.v | rs2, rs1, vs3 |
vssh.v vs3, (rs1), rs2, vm # 16b
|
vssw.v | rs2, rs1, vs3 |
vssw.v vs3, (rs1), rs2, vm # 32b
|
vsse.v | rs2, rs1, vs3 |
vsse.v vs3, (rs1), rs2, vm # SEW
|
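In the strided forms, rs2 supplies the distance between consecutive elements. A minimal Python sketch of the address pattern and an unmasked `vlse.v`-style load, assuming a byte stride, little-endian elements, and a `bytes` object standing in for memory; the function names and the `width` parameter are illustrative.

```python
def strided_addresses(base, stride, vl):
    """Address of element i is base + i * stride (stride in bytes)."""
    return [base + i * stride for i in range(vl)]


def vlse(memory, base, stride, vl, width=4):
    """Load vl little-endian elements of `width` bytes each."""
    out = []
    for addr in strided_addresses(base, stride, vl):
        out.append(int.from_bytes(memory[addr:addr + width], "little"))
    return out
```

A stride equal to the element width reproduces the unit-stride address pattern.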
v / _vector_indexed_instructions
- Vector Loads and Stores / 7.6. Vector Indexed Instructions
Operation | Arguments | Description |
vlxb.v | vs2, rs1, vd |
vlxb.v vd, (rs1), vs2, vm # 8b
|
vlxh.v | vs2, rs1, vd |
vlxh.v vd, (rs1), vs2, vm # 16b
|
vlxw.v | vs2, rs1, vd |
vlxw.v vd, (rs1), vs2, vm # 32b
|
vlxe.v | vs2, rs1, vd |
vlxe.v vd, (rs1), vs2, vm # SEW
|
vlxbu.v | vs2, rs1, vd |
vlxbu.v vd, (rs1), vs2, vm # 8b unsigned
|
vlxhu.v | vs2, rs1, vd |
vlxhu.v vd, (rs1), vs2, vm # 16b unsigned
|
vlxwu.v | vs2, rs1, vd |
vlxwu.v vd, (rs1), vs2, vm # 32b unsigned
|
vsxb.v | vs2, rs1, vs3 |
vsxb.v vs3, (rs1), vs2, vm # 8b
|
vsxh.v | vs2, rs1, vs3 |
vsxh.v vs3, (rs1), vs2, vm # 16b
|
vsxw.v | vs2, rs1, vs3 |
vsxw.v vs3, (rs1), vs2, vm # 32b
|
vsxe.v | vs2, rs1, vs3 |
vsxe.v vs3, (rs1), vs2, vm # SEW
|
vsuxb.v | vs2, rs1, vs3 |
vsuxb.v vs3, (rs1), vs2, vm # 8b
|
vsuxh.v | vs2, rs1, vs3 |
vsuxh.v vs3, (rs1), vs2, vm # 16b
|
vsuxw.v | vs2, rs1, vs3 |
vsuxw.v vs3, (rs1), vs2, vm # 32b
|
vsuxe.v | vs2, rs1, vs3 |
vsuxe.v vs3, (rs1), vs2, vm # SEW
|
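In the indexed forms, each element of vs2 supplies its own offset from the base address. A minimal Python sketch of a gather load and a scatter store, assuming byte offsets, little-endian elements, and a `bytearray` standing in for memory; the function names, the `width` parameter, and the use of `None` for masked-off elements are illustrative assumptions. The ordering difference between the vsx* and vsux* stores is not modeled.

```python
def vlxe(memory, base, offsets, width=4, mask=None):
    """Gather: element i is read from base + offsets[i]."""
    out = []
    for i, offset in enumerate(offsets):
        if mask is not None and not mask[i]:
            out.append(None)  # masked-off element, modeled as untouched
            continue
        addr = base + offset
        out.append(int.from_bytes(memory[addr:addr + width], "little"))
    return out


def vsxe(memory, base, offsets, values, width=4):
    """Scatter: element i is written to base + offsets[i]."""
    limit = (1 << (8 * width)) - 1
    for offset, value in zip(offsets, values):
        addr = base + offset
        memory[addr:addr + width] = (value & limit).to_bytes(width, "little")
```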
v / _unit_stride_fault_only_first_loads
- Vector Loads and Stores / 7.7. Unit-stride Fault-Only-First Loads
Operation | Arguments | Description |
vlbff.v | rs1, vd |
vlbff.v vd, (rs1), vm # 8b
vlbff.v v1, (a3) # Load bytes
|
vlhff.v | rs1, vd |
vlhff.v vd, (rs1), vm # 16b
|
vlwff.v | rs1, vd |
vlwff.v vd, (rs1), vm # 32b
|
vleff.v | rs1, vd |
vleff.v vd, (rs1), vm # SEW
|
vlbuff.v | rs1, vd |
vlbuff.v vd, (rs1), vm # unsigned 8b
|
vlhuff.v | rs1, vd |
vlhuff.v vd, (rs1), vm # unsigned 16b
|
vlwuff.v | rs1, vd |
vlwuff.v vd, (rs1), vm # unsigned 32b
|
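Fault-only-first loads are among the few instructions that may change vl. A minimal Python sketch of the behavior as described in the vector specification: a fault on element 0 is raised as usual, while a fault on a later element truncates vl to the number of elements already loaded. Here an out-of-range index stands in for a memory fault, and `Fault`, `vlbff`, and the `strlen` helper are illustrative names.

```python
class Fault(Exception):
    """Stand-in for a memory access fault."""


def vlbff(memory, base, vl):
    """Return (loaded_bytes, new_vl) for a fault-only-first byte load."""
    loaded = []
    for i in range(vl):
        addr = base + i
        if addr >= len(memory):          # simulated fault
            if i == 0:
                raise Fault(f"fault on first element at {addr}")
            return loaded, i             # vl is trimmed, no trap taken
        loaded.append(memory[addr])
    return loaded, vl


def strlen(memory, start, vlmax=4):
    """Scan for a zero byte without reading past the end of memory."""
    length = 0
    while True:
        data, vl = vlbff(memory, start + length, vlmax)
        if 0 in data:
            return length + data.index(0)
        length += vl
```

The `strlen` loop can request a full vector each time because a fault beyond the terminator only shortens vl instead of trapping.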
v / _vector_single_width_floating_point_add_subtract_instructions
- Vector Floating-Point Instructions / 14.2. Vector Single-Width Floating-Point Add/Subtract Instructions
Operation | Arguments | Description |
vfadd.vf | vs2, rs1, vd |
vfadd.vv vd, vs2, vs1, vm # Vector-vector
vfadd.vf vd, vs2, rs1, vm # vector-scalar
|
vfsub.vf | vs2, rs1, vd |
vfsub.vv vd, vs2, vs1, vm # Vector-vector
vfsub.vf vd, vs2, rs1, vm # Vector-scalar vd[i] = vs2[i] - f[rs1]
|
vfadd.vv | vs2, vs1, vd |
vfadd.vv vd, vs2, vs1, vm # Vector-vector
vfadd.vf vd, vs2, rs1, vm # vector-scalar
|
vfsub.vv | vs2, vs1, vd |
vfsub.vv vd, vs2, vs1, vm # Vector-vector
vfsub.vf vd, vs2, rs1, vm # Vector-scalar vd[i] = vs2[i] - f[rs1]
|
v / _vector_floating_point_min_max_instructions
- Vector Floating-Point Instructions / 14.9. Vector Floating-Point MIN/MAX Instructions
Operation | Arguments | Description |
vfmin.vf | vs2, rs1, vd |
The vector floating-point vfmin and vfmax instructions have the same behavior as the corresponding scalar floating-point instructions in version 2.2 of the RISC-V F/D/Q extension.
vfmin.vv vd, vs2, vs1, vm # Vector-vector
vfmin.vf vd, vs2, rs1, vm # vector-scalar
|
vfmax.vf | vs2, rs1, vd |
vfmax.vv vd, vs2, vs1, vm # Vector-vector
vfmax.vf vd, vs2, rs1, vm # vector-scalar
|
vfmin.vv | vs2, vs1, vd |
The vector floating-point vfmin and vfmax instructions have the same behavior as the corresponding scalar floating-point instructions in version 2.2 of the RISC-V F/D/Q extension.
vfmin.vv vd, vs2, vs1, vm # Vector-vector
vfmin.vf vd, vs2, rs1, vm # vector-scalar
|
vfmax.vv | vs2, vs1, vd |
vfmax.vv vd, vs2, vs1, vm # Vector-vector
vfmax.vf vd, vs2, rs1, vm # vector-scalar
|
v / _vector_floating_point_sign_injection_instructions
- Vector Floating-Point Instructions / 14.10. Vector Floating-Point Sign-Injection Instructions
Operation | Arguments | Description |
vfsgnj.vf | vs2, rs1, vd |
vfsgnj.vv vd, vs2, vs1, vm # Vector-vector
vfsgnj.vf vd, vs2, rs1, vm # vector-scalar
|
vfsgnjn.vf | vs2, rs1, vd |
vfsgnjn.vv vd, vs2, vs1, vm # Vector-vector
vfsgnjn.vf vd, vs2, rs1, vm # vector-scalar
|
vfsgnjx.vf | vs2, rs1, vd |
vfsgnjx.vv vd, vs2, vs1, vm # Vector-vector
vfsgnjx.vf vd, vs2, rs1, vm # vector-scalar
|
vfsgnj.vv | vs2, vs1, vd |
vfsgnj.vv vd, vs2, vs1, vm # Vector-vector
vfsgnj.vf vd, vs2, rs1, vm # vector-scalar
|
vfsgnjn.vv | vs2, vs1, vd |
vfsgnjn.vv vd, vs2, vs1, vm # Vector-vector
vfsgnjn.vf vd, vs2, rs1, vm # vector-scalar
|
vfsgnjx.vv | vs2, vs1, vd |
vfsgnjx.vv vd, vs2, vs1, vm # Vector-vector
vfsgnjx.vf vd, vs2, rs1, vm # vector-scalar
|
v / _vector_floating_point_move_instruction
- Vector Floating-Point Instructions / 14.14. Vector Floating-Point Move Instruction
Operation | Arguments | Description |
vfmv.s.f | rs1, vd |
Note: the vfmv.v.f instruction shares its encoding with the vfmerge.vfm instruction, but with vm=1 and vs2=v0.
Note: vfmv.v.f substitutes a canonical NaN for f[rs1] if the latter is not properly NaN-boxed.
vfmv.v.f vd, rs1 # vd[i] = f[rs1]
|
vfmv.v.f | rs1, vd |
Note: the vfmv.v.f instruction shares its encoding with the vfmerge.vfm instruction, but with vm=1 and vs2=v0.
Note: vfmv.v.f substitutes a canonical NaN for f[rs1] if the latter is not properly NaN-boxed.
vfmv.v.f vd, rs1 # vd[i] = f[rs1]
|
vfmv.f.s | vs2, rd |
Note: the vfmv.v.f instruction shares its encoding with the vfmerge.vfm instruction, but with vm=1 and vs2=v0.
Note: vfmv.v.f substitutes a canonical NaN for f[rs1] if the latter is not properly NaN-boxed.
vfmv.v.f vd, rs1 # vd[i] = f[rs1]
|
v / _vector_floating_point_merge_instruction
- Vector Floating-Point Instructions / 14.13. Vector Floating-Point Merge Instruction
Operation | Arguments | Description |
vfmerge.vfm | vs2, rs1, vd |
The vfmerge.vfm instruction is always masked ( vm=0 )
Note vfmerge.vfm substitutes a canonical NaN for f[rs1] if the latter is not properly NaN-boxed
vfmerge.vfm vd, vs2, rs1, v0 # vd[i] = v0[i].LSB ? f[rs1] : vs2[i]
|
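The per-element selection in the comment above, `vd[i] = v0[i].LSB ? f[rs1] : vs2[i]`, can be written directly. A minimal Python sketch with vectors as lists; the function names are illustrative, and NaN-boxing checks are not modeled.

```python
def vfmerge_vfm(vs2, scalar, v0):
    """Pick the scalar where the mask element's LSB is 1, else vs2[i]."""
    return [scalar if (m & 1) else x for x, m in zip(vs2, v0)]


def vfmv_v_f(scalar, vl):
    """Unmasked splat: every element receives the scalar."""
    return [scalar] * vl
```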
v / _introduction
- Introduction /
Operation | Arguments | Description |
vfeq.vf | vs2, rs1, vd | |
vfle.vf | vs2, rs1, vd | |
vford.vf | vs2, rs1, vd | |
vflt.vf | vs2, rs1, vd | |
vfne.vf | vs2, rs1, vd | |
vfgt.vf | vs2, rs1, vd | |
vfge.vf | vs2, rs1, vd | |
vfeq.vv | vs2, vs1, vd | |
vfle.vv | vs2, vs1, vd | |
vford.vv | vs2, vs1, vd | |
vflt.vv | vs2, vs1, vd | |
vfne.vv | vs2, vs1, vd | |
vfunary0.vv | vs2, vs1, vd | |
vfunary1.vv | vs2, vs1, vd | |
vseq.vx | vs2, rs1, vd | |
vsne.vx | vs2, rs1, vd | |
vsltu.vx | vs2, rs1, vd | |
vslt.vx | vs2, rs1, vd | |
vsleu.vx | vs2, rs1, vd | |
vsle.vx | vs2, rs1, vd | |
vsgtu.vx | vs2, rs1, vd | |
vsgt.vx | vs2, rs1, vd | |
vwsmaccu.vx | vs2, rs1, vd | |
vwsmacc.vx | vs2, rs1, vd | |
vwsmaccsu.vx | vs2, rs1, vd | |
vwsmaccus.vx | vs2, rs1, vd | |
vseq.vv | vs2, vs1, vd | |
vsne.vv | vs2, vs1, vd | |
vsltu.vv | vs2, vs1, vd | |
vslt.vv | vs2, vs1, vd | |
vsleu.vv | vs2, vs1, vd | |
vsle.vv | vs2, vs1, vd | |
vwsmaccu.vv | vs2, vs1, vd | |
vwsmacc.vv | vs2, vs1, vd | |
vwsmaccsu.vv | vs2, vs1, vd | |
vseq.vi | vs2, simm5, vd | |
vsne.vi | vs2, simm5, vd | |
vsleu.vi | vs2, simm5, vd | |
vsle.vi | vs2, simm5, vd | |
vsgtu.vi | vs2, simm5, vd | |
vsgt.vi | vs2, simm5, vd | |
vext.x.v | vs2, rs1, vd | |
vmpopc.m | vs2, vs1, rd | |
vmfirst.m | vs2, vs1, rd |
v / _vector_single_width_floating_point_multiply_divide_instructions
- Vector Floating-Point Instructions / 14.4. Vector Single-Width Floating-Point Multiply/Divide Instructions
Operation | Arguments | Description |
vfdiv.vf | vs2, rs1, vd |
vfdiv.vv vd, vs2, vs1, vm # Vector-vector
vfdiv.vf vd, vs2, rs1, vm # vector-scalar
|
vfrdiv.vf | vs2, rs1, vd |
vfrdiv.vf vd, vs2, rs1, vm # scalar-vector, vd[i] = f[rs1]/vs2[i]
|
vfmul.vf | vs2, rs1, vd |
vfmul.vv vd, vs2, vs1, vm # Vector-vector
vfmul.vf vd, vs2, rs1, vm # vector-scalar
|
vfdiv.vv | vs2, vs1, vd |
vfdiv.vv vd, vs2, vs1, vm # Vector-vector
vfdiv.vf vd, vs2, rs1, vm # vector-scalar
|
vfmul.vv | vs2, vs1, vd |
vfmul.vv vd, vs2, vs1, vm # Vector-vector
vfmul.vf vd, vs2, rs1, vm # vector-scalar
|
v / _vector_single_width_floating_point_fused_multiply_add_instructions
- Vector Floating-Point Instructions / 14.6. Vector Single-Width Floating-Point Fused Multiply-Add Instructions
Operation | Arguments | Description |
vfmadd.vf | vs2, rs1, vd |
vfmadd.vv vd, vs1, vs2, vm # vd[i] = +(vs1[i] * vd[i]) + vs2[i]
vfmadd.vf vd, rs1, vs2, vm # vd[i] = +(f[rs1] * vd[i]) + vs2[i]
|
vfnmadd.vf | vs2, rs1, vd |
vfnmadd.vv vd, vs1, vs2, vm # vd[i] = -(vs1[i] * vd[i]) - vs2[i]
vfnmadd.vf vd, rs1, vs2, vm # vd[i] = -(f[rs1] * vd[i]) - vs2[i]
|
vfmsub.vf | vs2, rs1, vd |
vfmsub.vv vd, vs1, vs2, vm # vd[i] = +(vs1[i] * vd[i]) - vs2[i]
vfmsub.vf vd, rs1, vs2, vm # vd[i] = +(f[rs1] * vd[i]) - vs2[i]
|
vfnmsub.vf | vs2, rs1, vd |
vfnmsub.vv vd, vs1, vs2, vm # vd[i] = -(vs1[i] * vd[i]) + vs2[i]
vfnmsub.vf vd, rs1, vs2, vm # vd[i] = -(f[rs1] * vd[i]) + vs2[i]
|
vfmacc.vf | vs2, rs1, vd |
vfmacc.vv vd, vs1, vs2, vm # vd[i] = +(vs1[i] * vs2[i]) + vd[i]
vfmacc.vf vd, rs1, vs2, vm # vd[i] = +(f[rs1] * vs2[i]) + vd[i]
|
vfnmacc.vf | vs2, rs1, vd |
vfnmacc.vv vd, vs1, vs2, vm # vd[i] = -(vs1[i] * vs2[i]) - vd[i]
vfnmacc.vf vd, rs1, vs2, vm # vd[i] = -(f[rs1] * vs2[i]) - vd[i]
|
vfmsac.vf | vs2, rs1, vd |
vfmsac.vv vd, vs1, vs2, vm # vd[i] = +(vs1[i] * vs2[i]) - vd[i]
vfmsac.vf vd, rs1, vs2, vm # vd[i] = +(f[rs1] * vs2[i]) - vd[i]
|
vfnmsac.vf | vs2, rs1, vd |
vfnmsac.vv vd, vs1, vs2, vm # vd[i] = -(vs1[i] * vs2[i]) + vd[i]
vfnmsac.vf vd, rs1, vs2, vm # vd[i] = -(f[rs1] * vs2[i]) + vd[i]
|
vfmadd.vv | vs2, vs1, vd |
vfmadd.vv vd, vs1, vs2, vm # vd[i] = +(vs1[i] * vd[i]) + vs2[i]
vfmadd.vf vd, rs1, vs2, vm # vd[i] = +(f[rs1] * vd[i]) + vs2[i]
|
vfnmadd.vv | vs2, vs1, vd |
vfnmadd.vv vd, vs1, vs2, vm # vd[i] = -(vs1[i] * vd[i]) - vs2[i]
vfnmadd.vf vd, rs1, vs2, vm # vd[i] = -(f[rs1] * vd[i]) - vs2[i]
|
vfmsub.vv | vs2, vs1, vd |
vfmsub.vv vd, vs1, vs2, vm # vd[i] = +(vs1[i] * vd[i]) - vs2[i]
vfmsub.vf vd, rs1, vs2, vm # vd[i] = +(f[rs1] * vd[i]) - vs2[i]
|
vfnmsub.vv | vs2, vs1, vd |
vfnmsub.vv vd, vs1, vs2, vm # vd[i] = -(vs1[i] * vd[i]) + vs2[i]
vfnmsub.vf vd, rs1, vs2, vm # vd[i] = -(f[rs1] * vd[i]) + vs2[i]
|
vfmacc.vv | vs2, vs1, vd |
vfmacc.vv vd, vs1, vs2, vm # vd[i] = +(vs1[i] * vs2[i]) + vd[i]
vfmacc.vf vd, rs1, vs2, vm # vd[i] = +(f[rs1] * vs2[i]) + vd[i]
|
vfnmacc.vv | vs2, vs1, vd |
vfnmacc.vv vd, vs1, vs2, vm # vd[i] = -(vs1[i] * vs2[i]) - vd[i]
vfnmacc.vf vd, rs1, vs2, vm # vd[i] = -(f[rs1] * vs2[i]) - vd[i]
|
vfmsac.vv | vs2, vs1, vd |
vfmsac.vv vd, vs1, vs2, vm # vd[i] = +(vs1[i] * vs2[i]) - vd[i]
vfmsac.vf vd, rs1, vs2, vm # vd[i] = +(f[rs1] * vs2[i]) - vd[i]
|
vfnmsac.vv | vs2, vs1, vd |
vfnmsac.vv vd, vs1, vs2, vm # vd[i] = -(vs1[i] * vs2[i]) + vd[i]
vfnmsac.vf vd, rs1, vs2, vm # vd[i] = -(f[rs1] * vs2[i]) + vd[i]
|
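The two naming families in this section differ in which operand is overwritten: the `*acc`/`*sac` forms accumulate into vd, while the `*add`/`*sub` forms multiply by vd. A minimal Python sketch of four of the forms, following the per-element comments above; vectors are lists, the function names mirror the mnemonics for illustration, and the arithmetic here is ordinary Python multiply-then-add rather than a true fused operation.

```python
def vfmacc(vd, vs1, vs2):
    """vd[i] = +(vs1[i] * vs2[i]) + vd[i]  (vd is the addend)."""
    return [a * b + d for d, a, b in zip(vd, vs1, vs2)]


def vfnmsac(vd, vs1, vs2):
    """vd[i] = -(vs1[i] * vs2[i]) + vd[i]  (vd is the addend)."""
    return [-(a * b) + d for d, a, b in zip(vd, vs1, vs2)]


def vfmadd(vd, vs1, vs2):
    """vd[i] = +(vs1[i] * vd[i]) + vs2[i]  (vd is a multiplicand)."""
    return [a * d + b for d, a, b in zip(vd, vs1, vs2)]


def vfnmsub(vd, vs1, vs2):
    """vd[i] = -(vs1[i] * vd[i]) + vs2[i]  (vd is a multiplicand)."""
    return [-(a * d) + b for d, a, b in zip(vd, vs1, vs2)]
```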
v / _vector_widening_floating_point_add_subtract_instructions
- Vector Floating-Point Instructions / 14.3. Vector Widening Floating-Point Add/Subtract Instructions
Operation | Arguments | Description |
vfwadd.vf | vs2, rs1, vd |
vfwadd.vv vd, vs2, vs1, vm # vector-vector
vfwadd.vf vd, vs2, rs1, vm # vector-scalar
vfwadd.wv vd, vs2, vs1, vm # vector-vector
vfwadd.wf vd, vs2, rs1, vm # vector-scalar
|
vfwsub.vf | vs2, rs1, vd |
vfwsub.vv vd, vs2, vs1, vm # vector-vector
vfwsub.vf vd, vs2, rs1, vm # vector-scalar
vfwsub.wv vd, vs2, vs1, vm # vector-vector
vfwsub.wf vd, vs2, rs1, vm # vector-scalar
|
vfwadd.wf | vs2, rs1, vd |
vfwadd.vv vd, vs2, vs1, vm # vector-vector
vfwadd.vf vd, vs2, rs1, vm # vector-scalar
vfwadd.wv vd, vs2, vs1, vm # vector-vector
vfwadd.wf vd, vs2, rs1, vm # vector-scalar
|
vfwsub.wf | vs2, rs1, vd |
vfwsub.vv vd, vs2, vs1, vm # vector-vector
vfwsub.vf vd, vs2, rs1, vm # vector-scalar
vfwsub.wv vd, vs2, vs1, vm # vector-vector
vfwsub.wf vd, vs2, rs1, vm # vector-scalar
|
vfwadd.vv | vs2, vs1, vd |
vfwadd.vv vd, vs2, vs1, vm # vector-vector
vfwadd.vf vd, vs2, rs1, vm # vector-scalar
vfwadd.wv vd, vs2, vs1, vm # vector-vector
vfwadd.wf vd, vs2, rs1, vm # vector-scalar
|
vfwsub.vv | vs2, vs1, vd |
vfwsub.vv vd, vs2, vs1, vm # vector-vector
vfwsub.vf vd, vs2, rs1, vm # vector-scalar
vfwsub.wv vd, vs2, vs1, vm # vector-vector
vfwsub.wf vd, vs2, rs1, vm # vector-scalar
|
vfwadd.wv | vs2, vs1, vd |
vfwadd.vv vd, vs2, vs1, vm # vector-vector
vfwadd.vf vd, vs2, rs1, vm # vector-scalar
vfwadd.wv vd, vs2, vs1, vm # vector-vector
vfwadd.wf vd, vs2, rs1, vm # vector-scalar
|
vfwsub.wv | vs2, vs1, vd |
vfwsub.vv vd, vs2, vs1, vm # vector-vector
vfwsub.vf vd, vs2, rs1, vm # vector-scalar
vfwsub.wv vd, vs2, vs1, vm # vector-vector
vfwsub.wf vd, vs2, rs1, vm # vector-scalar
|
v / _vector_widening_floating_point_multiply
- Vector Floating-Point Instructions / 14.5. Vector Widening Floating-Point Multiply
Operation | Arguments | Description |
vfwmul.vf | vs2, rs1, vd |
vfwmul.vv vd, vs2, vs1, vm # vector-vector
vfwmul.vf vd, vs2, rs1, vm # vector-scalar
|
vfwmul.vv | vs2, vs1, vd |
vfwmul.vv vd, vs2, vs1, vm # vector-vector
vfwmul.vf vd, vs2, rs1, vm # vector-scalar
|
v / _vector_widening_floating_point_fused_multiply_add_instructions
- Vector Floating-Point Instructions / 14.7. Vector Widening Floating-Point Fused Multiply-Add Instructions
Operation | Arguments | Description |
vfwmacc.vf | vs2, rs1, vd |
vfwmacc.vv vd, vs1, vs2, vm # vd[i] = +(vs1[i] * vs2[i]) + vd[i]
vfwmacc.vf vd, rs1, vs2, vm # vd[i] = +(f[rs1] * vs2[i]) + vd[i]
|
vfwnmacc.vf | vs2, rs1, vd |
vfwnmacc.vv vd, vs1, vs2, vm # vd[i] = -(vs1[i] * vs2[i]) - vd[i]
vfwnmacc.vf vd, rs1, vs2, vm # vd[i] = -(f[rs1] * vs2[i]) - vd[i]
|
vfwmsac.vf | vs2, rs1, vd |
vfwmsac.vv vd, vs1, vs2, vm # vd[i] = +(vs1[i] * vs2[i]) - vd[i]
vfwmsac.vf vd, rs1, vs2, vm # vd[i] = +(f[rs1] * vs2[i]) - vd[i]
|
vfwnmsac.vf | vs2, rs1, vd |
vfwnmsac.vv vd, vs1, vs2, vm # vd[i] = -(vs1[i] * vs2[i]) + vd[i]
vfwnmsac.vf vd, rs1, vs2, vm # vd[i] = -(f[rs1] * vs2[i]) + vd[i]
|
vfwmacc.vv | vs2, vs1, vd |
vfwmacc.vv vd, vs1, vs2, vm # vd[i] = +(vs1[i] * vs2[i]) + vd[i]
vfwmacc.vf vd, rs1, vs2, vm # vd[i] = +(f[rs1] * vs2[i]) + vd[i]
|
vfwnmacc.vv | vs2, vs1, vd |
vfwnmacc.vv vd, vs1, vs2, vm # vd[i] = -(vs1[i] * vs2[i]) - vd[i]
vfwnmacc.vf vd, rs1, vs2, vm # vd[i] = -(f[rs1] * vs2[i]) - vd[i]
|
vfwmsac.vv | vs2, vs1, vd |
vfwmsac.vv vd, vs1, vs2, vm # vd[i] = +(vs1[i] * vs2[i]) - vd[i]
vfwmsac.vf vd, rs1, vs2, vm # vd[i] = +(f[rs1] * vs2[i]) - vd[i]
|
vfwnmsac.vv | vs2, vs1, vd |
vfwnmsac.vv vd, vs1, vs2, vm # vd[i] = -(vs1[i] * vs2[i]) + vd[i]
vfwnmsac.vf vd, rs1, vs2, vm # vd[i] = -(f[rs1] * vs2[i]) + vd[i]
|
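The four widening fused multiply-add forms above differ only in the signs applied to the product and to the accumulator. A minimal Python sketch of that sign convention follows. The helper name `widening_fma` is hypothetical, plain Python floats stand in for the 2*SEW-wide values, and masking, vl, and rounding are not modelled.

```python
# Hypothetical model of the sign conventions only; not bit-accurate
# (no rounding modes, no masking, no SEW -> 2*SEW conversion).
def widening_fma(op, vd, vs1, vs2):
    """Return the new accumulator values for one of the four vfw*acc forms."""
    signs = {
        "vfwmacc":  (+1, +1),   # +(vs1[i] * vs2[i]) + vd[i]
        "vfwnmacc": (-1, -1),   # -(vs1[i] * vs2[i]) - vd[i]
        "vfwmsac":  (+1, -1),   # +(vs1[i] * vs2[i]) - vd[i]
        "vfwnmsac": (-1, +1),   # -(vs1[i] * vs2[i]) + vd[i]
    }
    prod_sign, acc_sign = signs[op]
    return [prod_sign * (a * b) + acc_sign * d
            for d, a, b in zip(vd, vs1, vs2)]
```

For the .vf forms, the same model applies with the scalar f[rs1] repeated for every element of vs1.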
v / _vector_single_width_floating_point_reduction_instructions
- Vector Reduction Operations / 15.3. Vector Single-Width Floating-Point Reduction Instructions
Operation | Arguments | Description |
vfredsum.vs | vs2, vs1, vd |
vfredsum.vs vd, vs2, vs1, vm # Unordered sum
|
vfredosum.vs | vs2, vs1, vd |
vfredosum.vs vd, vs2, vs1, vm # Ordered sum
|
vfredmin.vs | vs2, vs1, vd |
vfredmin.vs vd, vs2, vs1, vm # Minimum value
|
vfredmax.vs | vs2, vs1, vd |
vfredmax.vs vd, vs2, vs1, vm # Maximum value
|
v / _vector_widening_floating_point_reduction_instructions
- Vector Reduction Operations / 15.4. Vector Widening Floating-Point Reduction Instructions
Operation | Arguments | Description |
vfwredsum.vs | vs2, vs1, vd |
vfwredsum.vs vd, vs2, vs1, vm # Unordered sum
|
vfwredosum.vs | vs2, vs1, vd |
vfwredosum.vs vd, vs2, vs1, vm # Ordered sum
|
v / _vector_floating_point_dot_product_instruction
- Divided Element Extension (‘Zvediv’) / 19.4. Vector Floating-Point Dot Product Instruction
Operation | Arguments | Description |
vfdot.vv | vs2, vs1, vd |
The floating-point dot-product reduction vfdot.vv performs an element-wise multiplication between the source sub-elements, then accumulates the results into the destination vector element.
vfdot.vv vd, vs2, vs1, vm # Vector-vector
vfdot.vv vd, vs2, vs1, vm # vd[i][31:0] += vs2[i][31:16] * vs1[i][31:16] + vs2[i][15:0] * vs1[i][15:0]
vfdot.vv v1, v2, v3 # v1[i][31:0] += v2[i][31:16]*v3[i][31:16] + v2[i][15:0]*v3[i][15:0]
|
v / _vector_single_width_integer_add_and_subtract
- Vector Integer Arithmetic Instructions / 12.1. Vector Single-Width Integer Add and Subtract
Operation | Arguments | Description |
vadd.vx | vs2, rs1, vd |
vadd.vv vd, vs2, vs1, vm # Vector-vector
vadd.vx vd, vs2, rs1, vm # vector-scalar
vadd.vi vd, vs2, imm, vm # vector-immediate
|
vsub.vx | vs2, rs1, vd |
vsub.vv vd, vs2, vs1, vm # Vector-vector
vsub.vx vd, vs2, rs1, vm # vector-scalar
|
vrsub.vx | vs2, rs1, vd |
vrsub.vx vd, vs2, rs1, vm # vd[i] = rs1 - vs2[i]
vrsub.vi vd, vs2, imm, vm # vd[i] = imm - vs2[i]
|
vadd.vv | vs2, vs1, vd |
vadd.vv vd, vs2, vs1, vm # Vector-vector
vadd.vx vd, vs2, rs1, vm # vector-scalar
vadd.vi vd, vs2, imm, vm # vector-immediate
|
vsub.vv | vs2, vs1, vd |
vsub.vv vd, vs2, vs1, vm # Vector-vector
vsub.vx vd, vs2, rs1, vm # vector-scalar
|
vadd.vi | vs2, simm5, vd |
vadd.vv vd, vs2, vs1, vm # Vector-vector
vadd.vx vd, vs2, rs1, vm # vector-scalar
vadd.vi vd, vs2, imm, vm # vector-immediate
|
vrsub.vi | vs2, simm5, vd |
vrsub.vx vd, vs2, rs1, vm # vd[i] = rs1 - vs2[i]
vrsub.vi vd, vs2, imm, vm # vd[i] = imm - vs2[i]
|
v / _vector_integer_min_max_instructions
- Vector Integer Arithmetic Instructions / 12.8. Vector Integer Min/Max Instructions
Operation | Arguments | Description |
vminu.vx | vs2, rs1, vd |
vminu.vv vd, vs2, vs1, vm # Vector-vector
vminu.vx vd, vs2, rs1, vm # vector-scalar
|
vmin.vx | vs2, rs1, vd |
vmin.vv vd, vs2, vs1, vm # Vector-vector
vmin.vx vd, vs2, rs1, vm # vector-scalar
|
vmaxu.vx | vs2, rs1, vd |
vmaxu.vv vd, vs2, vs1, vm # Vector-vector
vmaxu.vx vd, vs2, rs1, vm # vector-scalar
|
vmax.vx | vs2, rs1, vd |
vmax.vv vd, vs2, vs1, vm # Vector-vector
vmax.vx vd, vs2, rs1, vm # vector-scalar
|
vminu.vv | vs2, vs1, vd |
vminu.vv vd, vs2, vs1, vm # Vector-vector
vminu.vx vd, vs2, rs1, vm # vector-scalar
|
vmin.vv | vs2, vs1, vd |
vmin.vv vd, vs2, vs1, vm # Vector-vector
vmin.vx vd, vs2, rs1, vm # vector-scalar
|
vmaxu.vv | vs2, vs1, vd |
vmaxu.vv vd, vs2, vs1, vm # Vector-vector
vmaxu.vx vd, vs2, rs1, vm # vector-scalar
|
vmax.vv | vs2, vs1, vd |
vmax.vv vd, vs2, vs1, vm # Vector-vector
vmax.vx vd, vs2, rs1, vm # vector-scalar
|
v / _vector_bitwise_logical_instructions
- Vector Integer Arithmetic Instructions / 12.4. Vector Bitwise Logical Instructions
Operation | Arguments | Description |
vand.vx | vs2, rs1, vd |
vand.vv vd, vs2, vs1, vm # Vector-vector
vand.vx vd, vs2, rs1, vm # vector-scalar
vand.vi vd, vs2, imm, vm # vector-immediate
|
vor.vx | vs2, rs1, vd |
vor.vv vd, vs2, vs1, vm # Vector-vector
vor.vx vd, vs2, rs1, vm # vector-scalar
vor.vi vd, vs2, imm, vm # vector-immediate
|
vxor.vx | vs2, rs1, vd |
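Note: with an immediate of -1, the vector-immediate form of vxor performs a bitwise NOT; assemblers can provide this as the pseudoinstruction vnot.v.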
vxor.vv vd, vs2, vs1, vm # Vector-vector
vxor.vx vd, vs2, rs1, vm # vector-scalar
vxor.vi vd, vs2, imm, vm # vector-immediate
|
vand.vv | vs2, vs1, vd |
vand.vv vd, vs2, vs1, vm # Vector-vector
vand.vx vd, vs2, rs1, vm # vector-scalar
vand.vi vd, vs2, imm, vm # vector-immediate
|
vor.vv | vs2, vs1, vd |
vor.vv vd, vs2, vs1, vm # Vector-vector
vor.vx vd, vs2, rs1, vm # vector-scalar
vor.vi vd, vs2, imm, vm # vector-immediate
|
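vxor.vv | vs2, vs1, vd |
Note: with an immediate of -1, the vector-immediate form of vxor performs a bitwise NOT; assemblers can provide this as the pseudoinstruction vnot.v.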
vxor.vv vd, vs2, vs1, vm # Vector-vector
vxor.vx vd, vs2, rs1, vm # vector-scalar
vxor.vi vd, vs2, imm, vm # vector-immediate
|
vand.vi | vs2, simm5, vd |
vand.vv vd, vs2, vs1, vm # Vector-vector
vand.vx vd, vs2, rs1, vm # vector-scalar
vand.vi vd, vs2, imm, vm # vector-immediate
|
vor.vi | vs2, simm5, vd |
vor.vv vd, vs2, vs1, vm # Vector-vector
vor.vx vd, vs2, rs1, vm # vector-scalar
vor.vi vd, vs2, imm, vm # vector-immediate
|
vxor.vi | vs2, simm5, vd |
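Note: with an immediate of -1, the vector-immediate form of vxor performs a bitwise NOT; assemblers can provide this as the pseudoinstruction vnot.v.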
vxor.vv vd, vs2, vs1, vm # Vector-vector
vxor.vx vd, vs2, rs1, vm # vector-scalar
vxor.vi vd, vs2, imm, vm # vector-immediate
|
v / _vector_register_gather_instruction
- Vector Permutation Instructions / 17.4. Vector Register Gather Instruction
Operation | Arguments | Description |
vrgather.vx | vs2, rs1, vd |
For any vrgather instruction, the destination vector register group cannot overlap with the source vector register groups, including the mask register if the operation is masked, otherwise an illegal instruction exception is raised.
Note: When SEW=8, vrgather.vv can only reference vector elements 0-255.
vrgather.vv vd, vs2, vs1, vm # vd[i] = (vs1[i] >= VLMAX) ? 0 : vs2[vs1[i]];
vrgather.vx vd, vs2, rs1, vm # vd[i] = (x[rs1] >= VLMAX) ? 0 : vs2[x[rs1]]
vrgather.vi vd, vs2, uimm, vm # vd[i] = (uimm >= VLMAX) ? 0 : vs2[uimm]
|
vrgather.vv | vs2, vs1, vd |
For any vrgather instruction, the destination vector register group cannot overlap with the source vector register groups, including the mask register if the operation is masked, otherwise an illegal instruction exception is raised.
Note: When SEW=8, vrgather.vv can only reference vector elements 0-255.
vrgather.vv vd, vs2, vs1, vm # vd[i] = (vs1[i] >= VLMAX) ? 0 : vs2[vs1[i]];
vrgather.vx vd, vs2, rs1, vm # vd[i] = (x[rs1] >= VLMAX) ? 0 : vs2[x[rs1]]
vrgather.vi vd, vs2, uimm, vm # vd[i] = (uimm >= VLMAX) ? 0 : vs2[uimm]
|
vrgather.vi | vs2, simm5, vd |
For any vrgather instruction, the destination vector register group cannot overlap with the source vector register groups, including the mask register if the operation is masked, otherwise an illegal instruction exception is raised.
Note: When SEW=8, vrgather.vv can only reference vector elements 0-255.
vrgather.vv vd, vs2, vs1, vm # vd[i] = (vs1[i] >= VLMAX) ? 0 : vs2[vs1[i]];
vrgather.vx vd, vs2, rs1, vm # vd[i] = (x[rs1] >= VLMAX) ? 0 : vs2[x[rs1]]
vrgather.vi vd, vs2, uimm, vm # vd[i] = (uimm >= VLMAX) ? 0 : vs2[uimm]
|
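The gather semantics given in the comments above can be modelled in a few lines of Python. This is only a sketch: the function names are hypothetical, masking and vstart are ignored, and vs2 is assumed to hold exactly VLMAX elements.

```python
# Hypothetical reference model of vrgather; indices are unsigned element indices.
def vrgather_vv(vs2, vs1, vlmax):
    """vd[i] = (vs1[i] >= VLMAX) ? 0 : vs2[vs1[i]]"""
    return [0 if idx >= vlmax else vs2[idx] for idx in vs1]

def vrgather_vx(vs2, x_rs1, vl, vlmax):
    """vd[i] = (x[rs1] >= VLMAX) ? 0 : vs2[x[rs1]], i.e. a splat of one element."""
    value = 0 if x_rs1 >= vlmax else vs2[x_rs1]
    return [value] * vl
```

Out-of-range indices yield 0 rather than trapping, which is why the model needs VLMAX as an explicit argument.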
v / _vector_slide_instructions
- Vector Permutation Instructions / 17.3. Vector Slide Instructions
Operation | Arguments | Description |
vslideup.vx | vs2, rs1, vd |
Note vslideup and vslidedown
For all of the vslideup, vslidedown, vslide1up, and vslide1down instructions, if vstart ≥ vl, the instruction performs no operation and leaves the destination vector register unchanged.
|
vslideup.vi | vs2, simm5, vd |
Note vslideup and vslidedown
For all of the vslideup, vslidedown, vslide1up, and vslide1down instructions, if vstart ≥ vl, the instruction performs no operation and leaves the destination vector register unchanged.
|
v / _vector_slidedown_instructions
- Vector Permutation Instructions / 17.3. Vector Slide Instructions
Operation | Arguments | Description |
vslidedown.vx | vs2, rs1, vd |
For vslidedown, the value in vl specifies the number of destination elements that are written.
vslidedown.vx vd, vs2, rs1, vm # vd[i] = vs2[i+rs1]
vslidedown.vi vd, vs2, uimm[4:0], vm # vd[i] = vs2[i+uimm]
vslidedown behavior for source elements for element i in slide
vslidedown behavior for destination element i in slide
|
vslidedown.vi | vs2, simm5, vd |
For vslidedown, the value in vl specifies the number of destination elements that are written.
vslidedown.vx vd, vs2, rs1, vm # vd[i] = vs2[i+rs1]
vslidedown.vi vd, vs2, uimm[4:0], vm # vd[i] = vs2[i+uimm]
vslidedown behavior for source elements for element i in slide
vslidedown behavior for destination element i in slide
|
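The vslidedown entries above state that vl gives the number of destination elements written and that vd[i] = vs2[i+OFFSET]. A rough Python sketch follows, assuming an unmasked operation with vstart = 0. It also assumes that source indices at or beyond VLMAX read as 0, which is my reading of the vector specification rather than something stated in this list. The function name is hypothetical, and only the first vl destination elements are returned.

```python
# Hypothetical model of vslidedown.vx / vslidedown.vi (unmasked, vstart = 0).
def vslidedown(vs2, offset, vl):
    """Return the first vl destination elements; len(vs2) plays the role of VLMAX."""
    vlmax = len(vs2)
    return [vs2[i + offset] if i + offset < vlmax else 0 for i in range(vl)]
```

Elements at or past vl are left out of the result because their handling depends on the tail policy, which this sketch does not model.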
v / _vector_integer_add_with_carry_subtract_with_borrow_instructions
- Vector Integer Arithmetic Instructions / 12.3. Vector Integer Add-with-Carry / Subtract-with-Borrow Instructions
Operation | Arguments | Description |
vadc.vxm | vs2, rs1, vd |
vadc and vsbc add or subtract the source operands and the carry-in or borrow-in, and write the result to vector register vd. Due to encoding constraints, the carry input must come from the implicit v0 register.
For vadc and vsbc, an illegal instruction exception is raised if the destination vector register is v0 and LMUL > 1.
vadc.vvm vd, vs2, vs1, v0 # Vector-vector
vadc.vxm vd, vs2, rs1, v0 # Vector-scalar
vadc.vim vd, vs2, imm, v0 # Vector-immediate
vadc.vvm v4, v4, v8, v0 # Calc new sum
|
vmadc.vxm | vs2, rs1, vd |
vmadc and vmsbc add or subtract the source operands, optionally add the carry-in or subtract the borrow-in if masked (vm=0), and write the resulting carry-out or borrow-out to mask register vd.
For vmadc and vmsbc, an illegal instruction exception is raised if the destination vector register overlaps a source vector register group and LMUL > 1.
vmadc.vvm vd, vs2, vs1, v0 # Vector-vector
vmadc.vxm vd, vs2, rs1, v0 # Vector-scalar
vmadc.vim vd, vs2, imm, v0 # Vector-immediate
vmadc.vv vd, vs2, vs1 # Vector-vector, no carry-in
vmadc.vx vd, vs2, rs1 # Vector-scalar, no carry-in
vmadc.vi vd, vs2, imm # Vector-immediate, no carry-in
vmadc.vvm v1, v4, v8, v0 # Get carry into temp register v1
|
vsbc.vxm | vs2, rs1, vd |
The subtract with borrow instruction vsbc performs the equivalent function to support long word arithmetic for subtraction
vsbc.vvm vd, vs2, vs1, v0 # Vector-vector
vsbc.vxm vd, vs2, rs1, v0 # Vector-scalar
|
vmsbc.vxm | vs2, rs1, vd |
For vmsbc, the borrow is defined to be 1 iff the difference, prior to truncation, is negative.
vmsbc.vvm vd, vs2, vs1, v0 # Vector-vector
vmsbc.vxm vd, vs2, rs1, v0 # Vector-scalar
vmsbc.vv vd, vs2, vs1 # Vector-vector, no borrow-in
vmsbc.vx vd, vs2, rs1 # Vector-scalar, no borrow-in
|
vadc.vvm | vs2, vs1, vd |
vadc and vsbc add or subtract the source operands and the carry-in or borrow-in, and write the result to vector register vd. Due to encoding constraints, the carry input must come from the implicit v0 register.
For vadc and vsbc, an illegal instruction exception is raised if the destination vector register is v0 and LMUL > 1.
vadc.vvm vd, vs2, vs1, v0 # Vector-vector
vadc.vxm vd, vs2, rs1, v0 # Vector-scalar
vadc.vim vd, vs2, imm, v0 # Vector-immediate
vadc.vvm v4, v4, v8, v0 # Calc new sum
|
vmadc.vvm | vs2, vs1, vd |
vmadc and vmsbc add or subtract the source operands, optionally add the carry-in or subtract the borrow-in if masked (vm=0), and write the resulting carry-out or borrow-out to mask register vd.
For vmadc and vmsbc, an illegal instruction exception is raised if the destination vector register overlaps a source vector register group and LMUL > 1.
vmadc.vvm vd, vs2, vs1, v0 # Vector-vector
vmadc.vxm vd, vs2, rs1, v0 # Vector-scalar
vmadc.vim vd, vs2, imm, v0 # Vector-immediate
vmadc.vv vd, vs2, vs1 # Vector-vector, no carry-in
vmadc.vx vd, vs2, rs1 # Vector-scalar, no carry-in
vmadc.vi vd, vs2, imm # Vector-immediate, no carry-in
vmadc.vvm v1, v4, v8, v0 # Get carry into temp register v1
|
vsbc.vvm | vs2, vs1, vd |
The subtract with borrow instruction vsbc performs the equivalent function to support long word arithmetic for subtraction
vsbc.vvm vd, vs2, vs1, v0 # Vector-vector
vsbc.vxm vd, vs2, rs1, v0 # Vector-scalar
|
vmsbc.vvm | vs2, vs1, vd |
For vmsbc, the borrow is defined to be 1 iff the difference, prior to truncation, is negative.
vmsbc.vvm vd, vs2, vs1, v0 # Vector-vector
vmsbc.vxm vd, vs2, rs1, v0 # Vector-scalar
vmsbc.vv vd, vs2, vs1 # Vector-vector, no borrow-in
vmsbc.vx vd, vs2, rs1 # Vector-scalar, no borrow-in
|
vadc.vim | vs2, simm5, vd |
vadc and vsbc add or subtract the source operands and the carry-in or borrow-in, and write the result to vector register vd. Due to encoding constraints, the carry input must come from the implicit v0 register.
For vadc and vsbc, an illegal instruction exception is raised if the destination vector register is v0 and LMUL > 1.
vadc.vvm vd, vs2, vs1, v0 # Vector-vector
vadc.vxm vd, vs2, rs1, v0 # Vector-scalar
vadc.vim vd, vs2, imm, v0 # Vector-immediate
vadc.vvm v4, v4, v8, v0 # Calc new sum
|
vmadc.vim | vs2, simm5, vd |
vmadc and vmsbc add or subtract the source operands, optionally add the carry-in or subtract the borrow-in if masked (vm=0), and write the resulting carry-out or borrow-out to mask register vd.
For vmadc and vmsbc, an illegal instruction exception is raised if the destination vector register overlaps a source vector register group and LMUL > 1.
vmadc.vvm vd, vs2, vs1, v0 # Vector-vector
vmadc.vxm vd, vs2, rs1, v0 # Vector-scalar
vmadc.vim vd, vs2, imm, v0 # Vector-immediate
vmadc.vv vd, vs2, vs1 # Vector-vector, no carry-in
vmadc.vx vd, vs2, rs1 # Vector-scalar, no carry-in
vmadc.vi vd, vs2, imm # Vector-immediate, no carry-in
vmadc.vvm v1, v4, v8, v0 # Get carry into temp register v1
|
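The pairing of vmadc (carry-out into a mask register) with vadc (sum including carry-in) is what makes multi-word addition possible. The Python sketch below models that pairing for numbers split into two SEW-bit limbs per element. All function names are hypothetical, the carry vector is a plain list of 0/1 values standing in for the v0 mask, and LMUL and overlap restrictions are not modelled.

```python
# Hypothetical model: elements are unsigned integers below 2**sew,
# carry_in is a list of 0/1 values standing in for the v0 mask register.
def vadc(vs2, vs1, carry_in, sew):
    """Sum of both operands plus carry-in, truncated to SEW bits."""
    mask = (1 << sew) - 1
    return [(a + b + c) & mask for a, b, c in zip(vs2, vs1, carry_in)]

def vmadc(vs2, vs1, carry_in, sew):
    """Carry-out (0 or 1) of the same addition."""
    return [(a + b + c) >> sew for a, b, c in zip(vs2, vs1, carry_in)]

def add_two_limbs(a_lo, a_hi, b_lo, b_hi, sew):
    """Element-wise add of two-limb numbers, low limb first."""
    zero = [0] * len(a_lo)
    carry = vmadc(a_lo, b_lo, zero, sew)   # get carry out of the low limbs
    lo = vadc(a_lo, b_lo, zero, sew)       # low limb of the sum
    hi = vadc(a_hi, b_hi, carry, sew)      # high limb absorbs the carry
    return lo, hi
```

The carry is computed before the sum because, in the assembly sequence, the sum may overwrite one of its own source registers.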
v / _vector_integer_merge_instructions
- Vector Integer Arithmetic Instructions / 12.15. Vector Integer Merge Instructions
Operation | Arguments | Description |
vmerge.vxm | vs2, rs1, vd |
The vmerge instructions are always masked (vm=0).
vmerge.vvm vd, vs2, vs1, v0 # vd[i] = v0[i].LSB ? vs1[i] : vs2[i]
vmerge.vxm vd, vs2, rs1, v0 # vd[i] = v0[i].LSB ? x[rs1] : vs2[i]
vmerge.vim vd, vs2, imm, v0 # vd[i] = v0[i].LSB ? imm : vs2[i]
|
vmerge.vvm | vs2, vs1, vd |
The vmerge instructions are always masked (vm=0).
vmerge.vvm vd, vs2, vs1, v0 # vd[i] = v0[i].LSB ? vs1[i] : vs2[i]
vmerge.vxm vd, vs2, rs1, v0 # vd[i] = v0[i].LSB ? x[rs1] : vs2[i]
vmerge.vim vd, vs2, imm, v0 # vd[i] = v0[i].LSB ? imm : vs2[i]
|
vmerge.vim | vs2, simm5, vd |
The vmerge instructions are always masked (vm=0).
vmerge.vvm vd, vs2, vs1, v0 # vd[i] = v0[i].LSB ? vs1[i] : vs2[i]
vmerge.vxm vd, vs2, rs1, v0 # vd[i] = v0[i].LSB ? x[rs1] : vs2[i]
vmerge.vim vd, vs2, imm, v0 # vd[i] = v0[i].LSB ? imm : vs2[i]
|
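The merge rule in the comments above (take the first-source operand where the mask bit is set, otherwise vs2) can be sketched in Python as follows. The function names are hypothetical, and the mask is modelled as a list whose least-significant bit per element is the mask bit, mirroring the v0[i].LSB notation used in this list.

```python
# Hypothetical model of vmerge; v0 is a list of mask elements, only bit 0 is used.
def vmerge_vvm(vs2, vs1, v0):
    """vd[i] = v0[i].LSB ? vs1[i] : vs2[i]"""
    return [b if (m & 1) else a for a, b, m in zip(vs2, vs1, v0)]

def vmerge_vxm(vs2, scalar, v0):
    """vd[i] = v0[i].LSB ? scalar : vs2[i] (also covers the .vim form)."""
    return [scalar if (m & 1) else a for a, m in zip(vs2, v0)]
```

Merging the constant 1 into a vector of zeros, for example, turns a mask into a vector of 0/1 element values.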
v / _vector_integer_move_instructions
- Vector Integer Arithmetic Instructions / 12.16. Vector Integer Move Instructions
Operation | Arguments | Description |
vmv.v.x | rs1, vd |
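This instruction copies the vs1, rs1, or immediate operand to the first vl locations of the destination vector register group. Note: mask values can be widened into SEW-width elements using the sequence vmv.v.i vd, 0; vmerge.vim vd, vd, 1, v0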
vmv.v.v vd, vs1 # vd[i] = vs1[i]
vmv.v.x vd, rs1 # vd[i] = rs1
vmv.v.i vd, imm # vd[i] = imm
|
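vmv.v.v | vs1, vd |
This instruction copies the vs1, rs1, or immediate operand to the first vl locations of the destination vector register group. Note: mask values can be widened into SEW-width elements using the sequence vmv.v.i vd, 0; vmerge.vim vd, vd, 1, v0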
vmv.v.v vd, vs1 # vd[i] = vs1[i]
vmv.v.x vd, rs1 # vd[i] = rs1
vmv.v.i vd, imm # vd[i] = imm
|
vmv.v.i | simm5, vd |
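This instruction copies the vs1, rs1, or immediate operand to the first vl locations of the destination vector register group. Note: mask values can be widened into SEW-width elements using the sequence vmv.v.i vd, 0; vmerge.vim vd, vd, 1, v0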
vmv.v.v vd, vs1 # vd[i] = vs1[i]
vmv.v.x vd, rs1 # vd[i] = rs1
vmv.v.i vd, imm # vd[i] = imm
|
vmv.s.x | rs1, vd |
The vmv.s.x instruction copies the scalar integer register rs1 to element 0 of the destination vector register; unlike the vmv.v.* forms, it does not write the remaining elements up to vl.
vmv.s.x vd, rs1 # vd[0] = x[rs1]
|
v / _vector_single_width_saturating_add_and_subtract
- Vector Fixed-Point Arithmetic Instructions / 13.1. Vector Single-Width Saturating Add and Subtract
Operation | Arguments | Description |
vsaddu.vx | vs2, rs1, vd |
vsaddu.vv vd, vs2, vs1, vm # Vector-vector
vsaddu.vx vd, vs2, rs1, vm # vector-scalar
vsaddu.vi vd, vs2, imm, vm # vector-immediate
|
vsadd.vx | vs2, rs1, vd |
vsadd.vv vd, vs2, vs1, vm # Vector-vector
vsadd.vx vd, vs2, rs1, vm # vector-scalar
vsadd.vi vd, vs2, imm, vm # vector-immediate
|
vssubu.vx | vs2, rs1, vd |
vssubu.vv vd, vs2, vs1, vm # Vector-vector
vssubu.vx vd, vs2, rs1, vm # vector-scalar
|
vssub.vx | vs2, rs1, vd |
vssub.vv vd, vs2, vs1, vm # Vector-vector
vssub.vx vd, vs2, rs1, vm # vector-scalar
|
vsaddu.vv | vs2, vs1, vd |
vsaddu.vv vd, vs2, vs1, vm # Vector-vector
vsaddu.vx vd, vs2, rs1, vm # vector-scalar
vsaddu.vi vd, vs2, imm, vm # vector-immediate
|
vsadd.vv | vs2, vs1, vd |
vsadd.vv vd, vs2, vs1, vm # Vector-vector
vsadd.vx vd, vs2, rs1, vm # vector-scalar
vsadd.vi vd, vs2, imm, vm # vector-immediate
|
vssubu.vv | vs2, vs1, vd |
vssubu.vv vd, vs2, vs1, vm # Vector-vector
vssubu.vx vd, vs2, rs1, vm # vector-scalar
|
vssub.vv | vs2, vs1, vd |
vssub.vv vd, vs2, vs1, vm # Vector-vector
vssub.vx vd, vs2, rs1, vm # vector-scalar
|
vsaddu.vi | vs2, simm5, vd |
vsaddu.vv vd, vs2, vs1, vm # Vector-vector
vsaddu.vx vd, vs2, rs1, vm # vector-scalar
vsaddu.vi vd, vs2, imm, vm # vector-immediate
|
vsadd.vi | vs2, simm5, vd |
vsadd.vv vd, vs2, vs1, vm # Vector-vector
vsadd.vx vd, vs2, rs1, vm # vector-scalar
vsadd.vi vd, vs2, imm, vm # vector-immediate
|
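The entries above list only operand forms. As a hedged illustration of what "saturating" means for these four instructions, the Python sketch below clamps the exact result to the representable SEW-bit range instead of wrapping. The function names are hypothetical, and the vxsat flag that the hardware sets on saturation is not modelled.

```python
# Hypothetical model: unsigned forms take values in [0, 2**sew - 1],
# signed forms take values in [-2**(sew-1), 2**(sew-1) - 1].
def _sat_unsigned(x, sew):
    return max(0, min(x, (1 << sew) - 1))

def _sat_signed(x, sew):
    return max(-(1 << (sew - 1)), min(x, (1 << (sew - 1)) - 1))

def vsaddu(vs2, vs1, sew):
    return [_sat_unsigned(a + b, sew) for a, b in zip(vs2, vs1)]

def vsadd(vs2, vs1, sew):
    return [_sat_signed(a + b, sew) for a, b in zip(vs2, vs1)]

def vssubu(vs2, vs1, sew):
    return [_sat_unsigned(a - b, sew) for a, b in zip(vs2, vs1)]

def vssub(vs2, vs1, sew):
    return [_sat_signed(a - b, sew) for a, b in zip(vs2, vs1)]
```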
v / _vector_single_width_averaging_add_and_subtract
- Vector Fixed-Point Arithmetic Instructions / 13.2. Vector Single-Width Averaging Add and Subtract
Operation | Arguments | Description |
vaadd.vx | vs2, rs1, vd |
For vaaddu, vaadd, and vasub, there can be no overflow in the result.
vaadd.vv vd, vs2, vs1, vm # roundoff_signed(vs2[i] + vs1[i], 1)
vaadd.vx vd, vs2, rs1, vm # roundoff_signed(vs2[i] + x[rs1], 1)
|
vasub.vx | vs2, rs1, vd |
vasub.vv vd, vs2, vs1, vm # roundoff_signed(vs2[i] - vs1[i], 1)
vasub.vx vd, vs2, rs1, vm # roundoff_signed(vs2[i] - x[rs1], 1)
|
vaadd.vv | vs2, vs1, vd |
For vaaddu, vaadd, and vasub, there can be no overflow in the result.
vaadd.vv vd, vs2, vs1, vm # roundoff_signed(vs2[i] + vs1[i], 1)
vaadd.vx vd, vs2, rs1, vm # roundoff_signed(vs2[i] + x[rs1], 1)
|
vasub.vv | vs2, vs1, vd |
vasub.vv vd, vs2, vs1, vm # roundoff_signed(vs2[i] - vs1[i], 1)
vasub.vx vd, vs2, rs1, vm # roundoff_signed(vs2[i] - x[rs1], 1)
|
vaadd.vi | vs2, simm5, vd |
For vaaddu, vaadd, and vasub, there can be no overflow in the result.
vaadd.vv vd, vs2, vs1, vm # roundoff_signed(vs2[i] + vs1[i], 1)
vaadd.vx vd, vs2, rs1, vm # roundoff_signed(vs2[i] + x[rs1], 1)
|
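The comments above define the averaging forms through roundoff_signed(x, 1), whose rounding increment depends on the fixed-point rounding mode. The Python sketch below is one reading of that definition. The mode names (rnu, rne, rdn, rod) and the increment rules are taken from my understanding of the vxrm definition in the vector specification and should be treated as an assumption relative to this list; the function names are hypothetical.

```python
# Hypothetical model: v is an unbounded Python int, so the intermediate
# sum a + b cannot overflow, matching the "no overflow" remark above.
def roundoff_signed(v, d, vxrm="rnu"):
    """Arithmetic right shift of v by d bits plus a mode-dependent increment."""
    if d == 0:
        return v
    bit_d = (v >> d) & 1                    # v[d]
    bit_dm1 = (v >> (d - 1)) & 1            # v[d-1]
    below_dm1 = v & ((1 << (d - 1)) - 1)    # v[d-2:0]
    below_d = v & ((1 << d) - 1)            # v[d-1:0]
    if vxrm == "rnu":      # round to nearest, ties up
        r = bit_dm1
    elif vxrm == "rne":    # round to nearest, ties to even
        r = bit_dm1 & int(below_dm1 != 0 or bit_d == 1)
    elif vxrm == "rdn":    # round down (truncate)
        r = 0
    elif vxrm == "rod":    # round to odd
        r = int(bit_d == 0 and below_d != 0)
    else:
        raise ValueError("unknown rounding mode: " + vxrm)
    return (v >> d) + r

def vaadd(vs2, vs1, vxrm="rnu"):
    return [roundoff_signed(a + b, 1, vxrm) for a, b in zip(vs2, vs1)]

def vasub(vs2, vs1, vxrm="rnu"):
    return [roundoff_signed(a - b, 1, vxrm) for a, b in zip(vs2, vs1)]
```

Python's right shift on negative integers is arithmetic, which is what makes the bit extraction work unchanged for signed inputs.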
v / _vector_single_width_bit_shift_instructions
- Vector Integer Arithmetic Instructions / 12.5. Vector Single-Width Bit Shift Instructions
Operation | Arguments | Description |
vsll.vx | vs2, rs1, vd |
vsll.vv vd, vs2, vs1, vm # Vector-vector
vsll.vx vd, vs2, rs1, vm # vector-scalar
vsll.vi vd, vs2, uimm, vm # vector-immediate
|
vsra.vx | vs2, rs1, vd |
vsra.vv vd, vs2, vs1, vm # Vector-vector
vsra.vx vd, vs2, rs1, vm # vector-scalar
vsra.vi vd, vs2, uimm, vm # vector-immediate
|
vsll.vv | vs2, vs1, vd |
vsll.vv vd, vs2, vs1, vm # Vector-vector
vsll.vx vd, vs2, rs1, vm # vector-scalar
vsll.vi vd, vs2, uimm, vm # vector-immediate
|
vsra.vv | vs2, vs1, vd |
vsra.vv vd, vs2, vs1, vm # Vector-vector
vsra.vx vd, vs2, rs1, vm # vector-scalar
vsra.vi vd, vs2, uimm, vm # vector-immediate
|
vsll.vi | vs2, simm5, vd |
vsll.vv vd, vs2, vs1, vm # Vector-vector
vsll.vx vd, vs2, rs1, vm # vector-scalar
vsll.vi vd, vs2, uimm, vm # vector-immediate
|
vsra.vi | vs2, simm5, vd |
vsra.vv vd, vs2, vs1, vm # Vector-vector
vsra.vx vd, vs2, rs1, vm # vector-scalar
vsra.vi vd, vs2, uimm, vm # vector-immediate
|
v / _vector_single_width_fractional_multiply_with_rounding_and_saturation
- Vector Fixed-Point Arithmetic Instructions / 13.3. Vector Single-Width Fractional Multiply with Rounding and Saturation
Operation | Arguments | Description |
vsmul.vx | vs2, rs1, vd |
vsmul.vv vd, vs2, vs1, vm # vd[i] = clip(roundoff_signed(vs2[i]*vs1[i], SEW-1))
vsmul.vx vd, vs2, rs1, vm # vd[i] = clip(roundoff_signed(vs2[i]*x[rs1], SEW-1))
|
vsmul.vv | vs2, vs1, vd |
vsmul.vv vd, vs2, vs1, vm # vd[i] = clip(roundoff_signed(vs2[i]*vs1[i], SEW-1))
vsmul.vx vd, vs2, rs1, vm # vd[i] = clip(roundoff_signed(vs2[i]*x[rs1], SEW-1))
|
v / _vector_single_width_scaling_shift_instructions
- Vector Fixed-Point Arithmetic Instructions / 13.4. Vector Single-Width Scaling Shift Instructions
Operation | Arguments | Description |
vssrl.vx | vs2, rs1, vd |
The scaling right shifts have both zero-extending (vssrl) and sign-extending (vssra) forms.
vssrl.vv vd, vs2, vs1, vm # vd[i] = roundoff_unsigned(vs2[i], vs1[i])
vssrl.vx vd, vs2, rs1, vm # vd[i] = roundoff_unsigned(vs2[i], x[rs1])
vssrl.vi vd, vs2, uimm, vm # vd[i] = roundoff_unsigned(vs2[i], uimm)
|
vssra.vx | vs2, rs1, vd |
vssra.vv vd, vs2, vs1, vm # vd[i] = roundoff_signed(vs2[i],vs1[i])
vssra.vx vd, vs2, rs1, vm # vd[i] = roundoff_signed(vs2[i], x[rs1])
vssra.vi vd, vs2, uimm, vm # vd[i] = roundoff_signed(vs2[i], uimm)
|
vssrl.vv | vs2, vs1, vd |
The scaling right shifts have both zero-extending (vssrl) and sign-extending (vssra) forms.
vssrl.vv vd, vs2, vs1, vm # vd[i] = roundoff_unsigned(vs2[i], vs1[i])
vssrl.vx vd, vs2, rs1, vm # vd[i] = roundoff_unsigned(vs2[i], x[rs1])
vssrl.vi vd, vs2, uimm, vm # vd[i] = roundoff_unsigned(vs2[i], uimm)
|
vssra.vv | vs2, vs1, vd |
vssra.vv vd, vs2, vs1, vm # vd[i] = roundoff_signed(vs2[i],vs1[i])
vssra.vx vd, vs2, rs1, vm # vd[i] = roundoff_signed(vs2[i], x[rs1])
vssra.vi vd, vs2, uimm, vm # vd[i] = roundoff_signed(vs2[i], uimm)
|
vssrl.vi | vs2, simm5, vd |
The scaling right shifts have both zero-extending (vssrl) and sign-extending (vssra) forms.
vssrl.vv vd, vs2, vs1, vm # vd[i] = roundoff_unsigned(vs2[i], vs1[i])
vssrl.vx vd, vs2, rs1, vm # vd[i] = roundoff_unsigned(vs2[i], x[rs1])
vssrl.vi vd, vs2, uimm, vm # vd[i] = roundoff_unsigned(vs2[i], uimm)
|
vssra.vi | vs2, simm5, vd |
vssra.vv vd, vs2, vs1, vm # vd[i] = roundoff_signed(vs2[i],vs1[i])
vssra.vx vd, vs2, rs1, vm # vd[i] = roundoff_signed(vs2[i], x[rs1])
vssra.vi vd, vs2, uimm, vm # vd[i] = roundoff_signed(vs2[i], uimm)
|
v / _vector_narrowing_integer_right_shift_instructions
- Vector Integer Arithmetic Instructions / 12.6. Vector Narrowing Integer Right Shift Instructions
Operation | Arguments | Description |
vnsrl.vx | vs2, rs1, vd |
vnsrl.wv vd, vs2, vs1, vm # vector-vector
vnsrl.wx vd, vs2, rs1, vm # vector-scalar
vnsrl.wi vd, vs2, uimm, vm # vector-immediate
|
vnsrl.vv | vs2, vs1, vd |
vnsrl.wv vd, vs2, vs1, vm # vector-vector
vnsrl.wx vd, vs2, rs1, vm # vector-scalar
vnsrl.wi vd, vs2, uimm, vm # vector-immediate
|
vnsrl.vi | vs2, simm5, vd |
vnsrl.wv vd, vs2, vs1, vm # vector-vector
vnsrl.wx vd, vs2, rs1, vm # vector-scalar
vnsrl.wi vd, vs2, uimm, vm # vector-immediate
|
v / sec-narrowing
- Vector Arithmetic Instruction Formats / 11.3. Narrowing Vector Arithmetic Instructions
Operation | Arguments | Description |
vnsra.vx | vs2, rs1, vd |
The double-width source vector register group is signified by a w in the source operand suffix (e.g., vnsra.wv).
|
vnsra.vv | vs2, vs1, vd |
The double-width source vector register group is signified by a w in the source operand suffix (e.g., vnsra.wv).
|
vnsra.vi | vs2, simm5, vd |
The double-width source vector register group is signified by a w in the source operand suffix (e.g., vnsra.wv).
|
v / _vector_narrowing_fixed_point_clip_instructions
- Vector Fixed-Point Arithmetic Instructions / 13.5. Vector Narrowing Fixed-Point Clip Instructions
Operation | Arguments | Description |
vnclipu.vx | vs2, rs1, vd |
For vnclipu/vnclip, the rounding mode is specified in the vxrm CSR. For vnclipu, the shifted rounded source value is treated as an unsigned integer and saturates if the result would overflow the destination viewed as an unsigned integer.
vnclipu.wv vd, vs2, vs1, vm # vd[i] = clip(roundoff_unsigned(vs2[i], vs1[i]))
vnclipu.wx vd, vs2, rs1, vm # vd[i] = clip(roundoff_unsigned(vs2[i], x[rs1]))
vnclipu.wi vd, vs2, uimm, vm # vd[i] = clip(roundoff_unsigned(vs2[i], uimm5))
|
vnclip.vx | vs2, rs1, vd |
The vnclip instructions are used to pack a fixed-point value into a narrower destination.
For vnclip, the shifted rounded source value is treated as a signed integer and saturates if the result would overflow the destination viewed as a signed integer.
vnclip.wv vd, vs2, vs1, vm # vd[i] = clip(roundoff_signed(vs2[i], vs1[i]))
vnclip.wx vd, vs2, rs1, vm # vd[i] = clip(roundoff_signed(vs2[i], x[rs1]))
vnclip.wi vd, vs2, uimm, vm # vd[i] = clip(roundoff_signed(vs2[i], uimm5))
|
vnclipu.vv | vs2, vs1, vd |
For vnclipu/vnclip, the rounding mode is specified in the vxrm CSR. For vnclipu, the shifted rounded source value is treated as an unsigned integer and saturates if the result would overflow the destination viewed as an unsigned integer.
vnclipu.wv vd, vs2, vs1, vm # vd[i] = clip(roundoff_unsigned(vs2[i], vs1[i]))
vnclipu.wx vd, vs2, rs1, vm # vd[i] = clip(roundoff_unsigned(vs2[i], x[rs1]))
vnclipu.wi vd, vs2, uimm, vm # vd[i] = clip(roundoff_unsigned(vs2[i], uimm5))
|
vnclip.vv | vs2, vs1, vd |
The vnclip instructions are used to pack a fixed-point value into a narrower destination.
For vnclip, the shifted rounded source value is treated as a signed integer and saturates if the result would overflow the destination viewed as a signed integer.
vnclip.wv vd, vs2, vs1, vm # vd[i] = clip(roundoff_signed(vs2[i], vs1[i]))
vnclip.wx vd, vs2, rs1, vm # vd[i] = clip(roundoff_signed(vs2[i], x[rs1]))
vnclip.wi vd, vs2, uimm, vm # vd[i] = clip(roundoff_signed(vs2[i], uimm5))
|
vnclipu.vi | vs2, simm5, vd |
For vnclipu/vnclip, the rounding mode is specified in the vxrm CSR. For vnclipu, the shifted rounded source value is treated as an unsigned integer and saturates if the result would overflow the destination viewed as an unsigned integer.
vnclipu.wv vd, vs2, vs1, vm # vd[i] = clip(roundoff_unsigned(vs2[i], vs1[i]))
vnclipu.wx vd, vs2, rs1, vm # vd[i] = clip(roundoff_unsigned(vs2[i], x[rs1]))
vnclipu.wi vd, vs2, uimm, vm # vd[i] = clip(roundoff_unsigned(vs2[i], uimm5))
|
vnclip.vi | vs2, simm5, vd |
The vnclip instructions are used to pack a fixed-point value into a narrower destination.
For vnclip, the shifted rounded source value is treated as a signed integer and saturates if the result would overflow the destination viewed as a signed integer.
vnclip.wv vd, vs2, vs1, vm # vd[i] = clip(roundoff_signed(vs2[i], vs1[i]))
vnclip.wx vd, vs2, rs1, vm # vd[i] = clip(roundoff_signed(vs2[i], x[rs1]))
vnclip.wi vd, vs2, uimm, vm # vd[i] = clip(roundoff_signed(vs2[i], uimm5))
|
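The clip instructions combine three steps: shift right, round, then saturate to the narrower SEW-bit destination. A minimal Python sketch of that pipeline is shown below. The function names are hypothetical, only the round-to-nearest-up (rnu) and round-down (rdn) modes are modelled, and the hardware detail that only the low bits of the shift amount are used is ignored.

```python
# Hypothetical model; "wide" holds the 2*SEW-wide source values as Python ints.
def _roundoff(v, d, vxrm):
    if vxrm not in ("rnu", "rdn"):
        raise ValueError("only rnu and rdn are modelled in this sketch")
    if d == 0:
        return v
    r = (v >> (d - 1)) & 1 if vxrm == "rnu" else 0
    return (v >> d) + r

def vnclip(wide, shift, sew, vxrm="rnu"):
    """Signed: vd[i] = clip(roundoff_signed(vs2[i], shift[i]))"""
    lo, hi = -(1 << (sew - 1)), (1 << (sew - 1)) - 1
    return [max(lo, min(_roundoff(v, s, vxrm), hi)) for v, s in zip(wide, shift)]

def vnclipu(wide, shift, sew, vxrm="rnu"):
    """Unsigned: vd[i] = clip(roundoff_unsigned(vs2[i], shift[i]))"""
    hi = (1 << sew) - 1
    return [min(_roundoff(v, s, vxrm), hi) for v, s in zip(wide, shift)]
```

Rounding is applied before saturation, so a value that rounds up past the destination maximum is still clamped.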
v / _vector_widening_integer_reduction_instructions
- Vector Reduction Operations / 15.2. Vector Widening Integer Reduction Instructions
Operation | Arguments | Description |
vwredsumu.vs | vs2, vs1, vd |
The unsigned vwredsumu.vs instruction zero-extends the SEW-wide vector elements before summing them, then adds the 2*SEW-width scalar element, and stores the result in a 2*SEW-width scalar element.
vwredsumu.vs vd, vs2, vs1, vm # 2*SEW = 2*SEW + sum(zero-extend(SEW))
|
vwredsum.vs | vs2, vs1, vd |
The vwredsum.vs instruction sign-extends the SEW-wide vector elements before summing them.
vwredsum.vs vd, vs2, vs1, vm # 2*SEW = 2*SEW + sum(sign-extend(SEW))
|
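The difference between the two widening reductions is only how each SEW-wide element is extended before it is added to the 2*SEW-wide scalar. The Python sketch below illustrates this with raw bit patterns. The function names are hypothetical, masking is ignored, and wrapping the result to 2*SEW bits is an assumption about overflow behaviour.

```python
# Hypothetical model; elements are raw bit patterns (non-negative ints).
def _sign_extend(x, bits):
    x &= (1 << bits) - 1
    return x - (1 << bits) if x >> (bits - 1) else x

def vwredsumu(vs2, vs1_0, sew):
    """2*SEW result = vs1[0] + sum(zero-extend(vs2[i]))"""
    total = vs1_0 + sum(e & ((1 << sew) - 1) for e in vs2)
    return total & ((1 << (2 * sew)) - 1)

def vwredsum(vs2, vs1_0, sew):
    """2*SEW result = vs1[0] + sum(sign-extend(vs2[i]))"""
    total = _sign_extend(vs1_0, 2 * sew) + sum(_sign_extend(e, sew) for e in vs2)
    return total & ((1 << (2 * sew)) - 1)
```

With SEW=8, the element 0xFF contributes 255 to the unsigned sum but -1 to the signed sum.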
v / _vector_integer_dot_product_instruction
- Divided Element Extension (‘Zvediv’) / 19.3. Vector Integer Dot-Product Instruction
Operation | Arguments | Description |
vdotu.vv | vs2, vs1, vd |
vdotu.vv vd, vs2, vs1, vm # Vector-vector
|
vdot.vv | vs2, vs1, vd |
The integer dot-product reduction vdot.vv performs an element-wise multiplication between the source sub-elements, then accumulates the results into the destination vector element.
vdot.vv vd, vs2, vs1, vm # Vector-vector
vdot.vv vd, vs2, vs1, vm # vd[i][31:0] += vs2[i][31:0] * vs1[i][31:0]
vdot.vv vd, vs2, vs1, vm # vd[i][31:0] += vs2[i][31:16] * vs1[i][31:16] + vs2[i][15:0] * vs1[i][15:0]
vdot.vv vd, vs2, vs1, vm # vd[i][31:0] += vs2[i][31:24] * vs1[i][31:24] + vs2[i][23:16] * vs1[i][23:16] + vs2[i][15:8] * vs1[i][15:8] + vs2[i][7:0] * vs1[i][7:0]
|
v / _vector_single_width_integer_reduction_instructions
- Vector Reduction Operations / 15.1. Vector Single-Width Integer Reduction Instructions
Operation | Arguments | Description |
vredsum.vs | vs2, vs1, vd |
vredsum.vs vd, vs2, vs1, vm # vd[0] = sum( vs1[0] , vs2[*] )
|
vredand.vs | vs2, vs1, vd |
vredand.vs vd, vs2, vs1, vm # vd[0] = and( vs1[0] , vs2[*] )
|
vredor.vs | vs2, vs1, vd |
vredor.vs vd, vs2, vs1, vm # vd[0] = or( vs1[0] , vs2[*] )
|
vredxor.vs | vs2, vs1, vd |
vredxor.vs vd, vs2, vs1, vm # vd[0] = xor( vs1[0] , vs2[*] )
|
vredminu.vs | vs2, vs1, vd |
vredminu.vs vd, vs2, vs1, vm # vd[0] = minu( vs1[0] , vs2[*] )
|
vredmin.vs | vs2, vs1, vd |
vredmin.vs vd, vs2, vs1, vm # vd[0] = min( vs1[0] , vs2[*] )
|
vredmaxu.vs | vs2, vs1, vd |
vredmaxu.vs vd, vs2, vs1, vm # vd[0] = maxu( vs1[0] , vs2[*] )
|
vredmax.vs | vs2, vs1, vd |
vredmax.vs vd, vs2, vs1, vm # vd[0] = max( vs1[0] , vs2[*] )
|
v / _vector_compress_instruction
- Vector Permutation Instructions / 17.5. Vector Compress Instruction
Operation | Arguments | Description |
vcompress.vm | vs2, vs1, vd |
vcompress is encoded as an unmasked instruction (vm=1).
A trap on a vcompress instruction is always reported with a vstart of 0. Executing a vcompress instruction with a non-zero vstart raises an illegal instruction exception.
Note: vcompress is one of the more difficult instructions to restart with a non-zero vstart, so the assumption is that implementations will choose not to do that and will instead restart from element 0. This does mean that elements in the destination register after vstart will already have been updated.
vcompress.vm vd, vs2, vs1 # Compress into vd elements of vs2 where vs1 is enabled
Example use of vcompress instruction
vcompress.vm v2, v1, v0
|
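The compress operation packs the elements of vs2 whose mask bit in vs1 is set into the lowest-numbered elements of vd. A small Python sketch follows. The function name is hypothetical, lists are ordered from element 0 upward, and the remaining destination elements are left unchanged here, which is an assumption about the tail policy rather than something stated in this list.

```python
# Hypothetical model of vcompress.vm; vd must be as long as vs2.
def vcompress(vs2, vs1_mask, vd):
    """Pack enabled elements of vs2 at the start of vd; keep the rest of vd."""
    packed = [e for e, m in zip(vs2, vs1_mask) if m & 1]
    return packed + vd[len(packed):]
```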
v / _vector_integer_comparison_instructions
- Vector Integer Arithmetic Instructions / 12.7. Vector Integer Comparison Instructions
Operation | Arguments | Description |
vmandnot.mm | vs2, vs1, vd |
expansion: vmslt{u}.vx vt, va, x; vmandnot.mm vd, vd, vt
|
vmxor.mm | vs2, vs1, vd |
expansion: vmslt{u}.vx vd, va, x, v0.t; vmxor.mm vd, vd, v0
|
vmnand.mm | vs2, vs1, vd |
expansion: vmslt{u}.vx vd, va, x; vmnand.mm vd, vd, vd
|
v / _vector_floating_point_compare_instructions
- Vector Floating-Point Instructions / 14.11. Vector Floating-Point Compare Instructions
Operation | Arguments | Description |
vmand.mm | vs2, vs1, vd |
Note: It is tempting to mask the second vmfeq instruction and remove the vmand instruction, but this more efficient sequence incorrectly fails to raise the invalid exception when an element of va contains a quiet NaN and the corresponding element in vb contains a signaling NaN.
vmand.mm v0, v0, v1 # Only set where A and B are ordered,
|
v / sec-mask-register-logical
- Vector Mask Instructions / 16.1. Vector Mask-Register Logical Instructions
Operation | Arguments | Description |
vmor.mm | vs2, vs1, vd |
vmor.mm vd, vs2, vs1 # vd[i] = vs2[i].LSB || vs1[i].LSB
|
vmornot.mm | vs2, vs1, vd |
vmornot.mm vd, src2, src1
vmornot.mm vd, src1, src2
vmornot.mm vd, vs2, vs1 # vd[i] = vs2[i].LSB || !vs1[i].LSB
|
vmnor.mm | vs2, vs1, vd |
vmnor.mm vd, src1, src2
vmnor.mm vd, vs2, vs1 # vd[i] = !(vs2[i].LSB || vs1[i].LSB)
|
vmxnor.mm | vs2, vs1, vd |
vmxnor.mm vd, src1, src2
vmxnor.mm vd, vd, vd
vmxnor.mm vd, vs2, vs1 # vd[i] = !(vs2[i].LSB ^^ vs1[i].LSB)
vmset.m vd => vmxnor.mm vd, vd, vd # Set mask register
|
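The eight mask-register logical operations act on the least-significant bit of each mask element. The Python sketch below tabulates them as a hedged reference. The dictionary and function names are hypothetical, and operand order follows the (vs2, vs1) convention used in the comments above.

```python
# Hypothetical model; each lambda receives (vs2 bit, vs1 bit) as 0/1 ints.
MASK_OPS = {
    "vmand":    lambda a, b: a & b,
    "vmnand":   lambda a, b: 1 - (a & b),
    "vmandnot": lambda a, b: a & (1 - b),
    "vmxor":    lambda a, b: a ^ b,
    "vmor":     lambda a, b: a | b,
    "vmnor":    lambda a, b: 1 - (a | b),
    "vmornot":  lambda a, b: a | (1 - b),
    "vmxnor":   lambda a, b: 1 - (a ^ b),
}

def mask_op(name, vs2, vs1):
    """Apply one mask-logical operation element by element."""
    op = MASK_OPS[name]
    return [op(a & 1, b & 1) for a, b in zip(vs2, vs1)]
```

Applying vmxnor to a register and itself yields all ones regardless of the register's contents, which is why it serves as the expansion of vmset.m.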
v / __code_vfirst_code_find_first_set_mask_bit
- Vector Mask Instructions / set-before-first mask bit
Operation | Arguments | Description |
vmsbf.m | vs2, vd |
vmsbf.m vd, vs2, vm # Set-before-first mask bit
|
v / __code_vmsif_m_code_set_including_first_mask_bit
- Vector Mask Instructions / set-only-first mask bit
Operation | Arguments | Description |
vmsof.m | vs2, vd |
vmsof.m vd, vs2, vm # Set-only-first mask bit
|
v / __code_vmsbf_m_code_set_before_first_mask_bit
- Vector Mask Instructions / set-including-first mask bit
Operation | Arguments | Description |
vmsif.m | vs2, vd |
vmsif.m vd, vs2, vm # Set-including-first mask bit
|
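The three instructions vmsbf.m, vmsif.m, and vmsof.m differ only in how they treat the position of the first set bit in the source mask. The Python sketch below is an unmasked model based on my reading of the vector specification; the function names are hypothetical and masks are lists of 0/1 values ordered from element 0 upward.

```python
# Hypothetical unmasked model of the set-before/including/only-first operations.
def _first_set(mask):
    """Index of the first set bit, or len(mask) if no bit is set."""
    for i, m in enumerate(mask):
        if m & 1:
            return i
    return len(mask)

def vmsbf(vs2):
    """Set-before-first: 1 for every element strictly before the first set bit."""
    first = _first_set(vs2)
    return [1 if i < first else 0 for i in range(len(vs2))]

def vmsif(vs2):
    """Set-including-first: as vmsbf, plus the first set bit itself."""
    first = _first_set(vs2)
    return [1 if i <= first else 0 for i in range(len(vs2))]

def vmsof(vs2):
    """Set-only-first: 1 only at the first set bit."""
    first = _first_set(vs2)
    return [1 if i == first else 0 for i in range(len(vs2))]
```

When the source mask has no set bit, vmsbf and vmsif produce all ones and vmsof produces all zeros.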
v / _vector_iota_instruction
- Vector Mask Instructions / 16.8. Vector Iota Instruction
Operation | Arguments | Description |
viota.m | vs2, vd |
The viota.m instruction reads a source vector mask register and writes to each element of the destination vector register group the sum of all the least-significant bits of elements in the mask register whose index is less than the element, i.e., a parallel prefix sum of the mask values.
Traps on viota.m are always reported with a vstart of 0, and execution is always restarted from the beginning when resuming after a trap handler.
The viota.m instruction can be combined with memory scatter instructions (indexed stores) to perform vector compress functions.
viota.m vd, vs2, vm
viota.m v4, v2 # Unmasked
viota.m v4, v2, v0.t # Masked
viota.m v16, v0 # Get destination offsets of active elements
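The prefix-sum behavior, and its use for compression, can be sketched in a few lines of Python. This is an illustrative model under stated assumptions, not the specification's definition: the operation is unmasked, the mask is a list of 0/1 values, and the `compress` helper is a hypothetical stand-in for the combination of viota.m with an indexed store.

```python
# Illustrative model only: unmasked viota.m over a list of 0/1 mask bits.

def viota(vs2):
    # Element i receives the count of set mask bits at indices below i.
    out, running = [], 0
    for bit in vs2:
        out.append(running)
        running += bit & 1
    return out

def compress(values, mask):
    # Scatter each active element to the offset computed by viota,
    # packing the active elements contiguously.
    offsets = viota(mask)
    packed = [None] * sum(mask)
    for value, bit, offset in zip(values, mask, offsets):
        if bit:
            packed[offset] = value
    return packed
```

For example, `viota([1, 0, 0, 1, 0, 0, 0, 1])` yields `[0, 1, 1, 1, 2, 2, 2, 2]`, and `compress([10, 20, 30, 40], [1, 0, 1, 1])` yields `[10, 30, 40]`.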
|
v / _vector_element_index_instruction
- Vector Mask Instructions / 16.9. Vector Element Index Instruction
Operation | Arguments | Description |
vid.v | vs2, vd |
The vid.v instruction writes each element's index to the destination vector register group, from 0 to vl-1.
Note: The vid.v instruction can be implemented using the same datapath as viota.m but with an implicit set mask source.
vid.v vd, vm # Write element ID to destination.
|
v / _vector_integer_divide_instructions
- Vector Integer Arithmetic Instructions / 12.10. Vector Integer Divide Instructions
Operation | Arguments | Description |
vdivu.vv | vs2, vs1, vd |
vdivu.vv vd, vs2, vs1, vm # Vector-vector
vdivu.vx vd, vs2, rs1, vm # vector-scalar
|
vdiv.vv | vs2, vs1, vd |
vdiv.vv vd, vs2, vs1, vm # Vector-vector
vdiv.vx vd, vs2, rs1, vm # vector-scalar
|
vremu.vv | vs2, vs1, vd |
vremu.vv vd, vs2, vs1, vm # Vector-vector
vremu.vx vd, vs2, rs1, vm # vector-scalar
|
vrem.vv | vs2, vs1, vd |
vrem.vv vd, vs2, vs1, vm # Vector-vector
vrem.vx vd, vs2, rs1, vm # vector-scalar
|
vdivu.vx | vs2, rs1, vd |
vdivu.vv vd, vs2, vs1, vm # Vector-vector
vdivu.vx vd, vs2, rs1, vm # vector-scalar
|
vdiv.vx | vs2, rs1, vd |
vdiv.vv vd, vs2, vs1, vm # Vector-vector
vdiv.vx vd, vs2, rs1, vm # vector-scalar
|
vremu.vx | vs2, rs1, vd |
vremu.vv vd, vs2, vs1, vm # Vector-vector
vremu.vx vd, vs2, rs1, vm # vector-scalar
|
vrem.vx | vs2, rs1, vd |
vrem.vv vd, vs2, vs1, vm # Vector-vector
vrem.vx vd, vs2, rs1, vm # vector-scalar
|
v / _vector_single_width_integer_multiply_instructions
- Vector Integer Arithmetic Instructions / 12.9. Vector Single-Width Integer Multiply Instructions
Operation | Arguments | Description |
vmulhu.vv | vs2, vs1, vd |
vmulhu.vv vd, vs2, vs1, vm # Vector-vector
vmulhu.vx vd, vs2, rs1, vm # vector-scalar
|
vmulhsu.vv | vs2, vs1, vd |
vmulhsu.vv vd, vs2, vs1, vm # Vector-vector
vmulhsu.vx vd, vs2, rs1, vm # vector-scalar
|
vmulh.vv | vs2, vs1, vd |
Note: The vmulh* opcodes perform simple fractional multiplies, but with no option to scale, round, and/or saturate the result.
The definition of vmulh, vmulhu, and vmulhsu could be changed to use the vxrm rounding mode when discarding the low half of the product.
vmulh.vv vd, vs2, vs1, vm # Vector-vector
vmulh.vx vd, vs2, rs1, vm # vector-scalar
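To make "discarding the low half of the product" concrete, the following Python sketch models a single element of each high-half multiply. It is an assumption-laden illustration: SEW defaults to 8 bits for readability, operands are passed as raw bit patterns, and the helper `_signed` is a hypothetical name. The operand order follows the assembly syntax above (first argument models vs2, second models vs1 or rs1).

```python
# Illustrative model only: one element, SEW-bit operands as raw bit patterns.

def _signed(x, sew):
    # Reinterpret a SEW-bit pattern as a two's-complement integer.
    x &= (1 << sew) - 1
    return x - (1 << sew) if x >> (sew - 1) else x

def vmulh(vs2, vs1, sew=8):
    # signed * signed, keep the upper SEW bits of the 2*SEW-bit product
    product = _signed(vs2, sew) * _signed(vs1, sew)
    return (product >> sew) & ((1 << sew) - 1)

def vmulhu(vs2, vs1, sew=8):
    # unsigned * unsigned, keep the upper SEW bits
    mask = (1 << sew) - 1
    return (((vs2 & mask) * (vs1 & mask)) >> sew) & mask

def vmulhsu(vs2, vs1, sew=8):
    # signed(vs2) * unsigned(vs1), keep the upper SEW bits
    mask = (1 << sew) - 1
    product = _signed(vs2, sew) * (vs1 & mask)
    return (product >> sew) & mask
```

The low half is simply truncated: the model applies no rounding or saturation, which matches the limitation described in the note.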
|
vmulhu.vx | vs2, rs1, vd |
vmulhu.vv vd, vs2, vs1, vm # Vector-vector
vmulhu.vx vd, vs2, rs1, vm # vector-scalar
|
vmulhsu.vx | vs2, rs1, vd |
vmulhsu.vv vd, vs2, vs1, vm # Vector-vector
vmulhsu.vx vd, vs2, rs1, vm # vector-scalar
|
vmulh.vx | vs2, rs1, vd |
Note: The vmulh* opcodes perform simple fractional multiplies, but with no option to scale, round, and/or saturate the result.
The definition of vmulh, vmulhu, and vmulhsu could be changed to use the vxrm rounding mode when discarding the low half of the product.
vmulh.vv vd, vs2, vs1, vm # Vector-vector
vmulh.vx vd, vs2, rs1, vm # vector-scalar
|
v / _vector_single_width_integer_multiply_add_instructions
- Vector Integer Arithmetic Instructions / 12.12. Vector Single-Width Integer Multiply-Add Instructions
Operation | Arguments | Description |
vmadd.vv | vs2, vs1, vd |
vmadd.vv vd, vs1, vs2, vm # vd[i] = (vs1[i] * vd[i]) + vs2[i]
vmadd.vx vd, rs1, vs2, vm # vd[i] = (x[rs1] * vd[i]) + vs2[i]
|
vnmsub.vv | vs2, vs1, vd |
Similarly for the "vnmsub" opcode
vnmsub.vv vd, vs1, vs2, vm # vd[i] = -(vs1[i] * vd[i]) + vs2[i]
vnmsub.vx vd, rs1, vs2, vm # vd[i] = -(x[rs1] * vd[i]) + vs2[i]
|
vmacc.vv | vs2, vs1, vd |
The integer multiply-add instructions are destructive and are provided in two forms, one that overwrites the addend or minuend ( vmacc , vnmsac ) and one that overwrites the first multiplicand ( vmadd , vnmsub ).
vmacc.vv vd, vs1, vs2, vm # vd[i] = +(vs1[i] * vs2[i]) + vd[i]
vmacc.vx vd, rs1, vs2, vm # vd[i] = +(x[rs1] * vs2[i]) + vd[i]
|
vnmsac.vv | vs2, vs1, vd |
vnmsac.vv vd, vs1, vs2, vm # vd[i] = -(vs1[i] * vs2[i]) + vd[i]
vnmsac.vx vd, rs1, vs2, vm # vd[i] = -(x[rs1] * vs2[i]) + vd[i]
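The difference between the two destructive forms is easiest to see side by side. The Python sketch below transcribes the four element-wise formulas from the comments above; it assumes unmasked execution, SEW = 8 by default, and results that wrap to SEW bits. The helper `_wrap` is a hypothetical name used only for this illustration.

```python
# Illustrative model only: vector-vector forms, unmasked, results wrap to SEW bits.

def _wrap(x, sew):
    return x & ((1 << sew) - 1)

def vmacc(vd, vs1, vs2, sew=8):
    # vd[i] = +(vs1[i] * vs2[i]) + vd[i]   (overwrites the addend)
    return [_wrap(a * b + d, sew) for d, a, b in zip(vd, vs1, vs2)]

def vnmsac(vd, vs1, vs2, sew=8):
    # vd[i] = -(vs1[i] * vs2[i]) + vd[i]   (overwrites the minuend)
    return [_wrap(-(a * b) + d, sew) for d, a, b in zip(vd, vs1, vs2)]

def vmadd(vd, vs1, vs2, sew=8):
    # vd[i] = (vs1[i] * vd[i]) + vs2[i]    (overwrites a multiplicand)
    return [_wrap(a * d + b, sew) for d, a, b in zip(vd, vs1, vs2)]

def vnmsub(vd, vs1, vs2, sew=8):
    # vd[i] = -(vs1[i] * vd[i]) + vs2[i]   (overwrites a multiplicand)
    return [_wrap(-(a * d) + b, sew) for d, a, b in zip(vd, vs1, vs2)]
```

With `vd = [1, 2]`, `vs1 = [3, 4]`, `vs2 = [5, 6]`, vmacc gives `[16, 26]` while vmadd gives `[8, 14]`, because the old value of vd plays a different role in each form.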
|
vmadd.vx | vs2, rs1, vd |
vmadd.vv vd, vs1, vs2, vm # vd[i] = (vs1[i] * vd[i]) + vs2[i]
vmadd.vx vd, rs1, vs2, vm # vd[i] = (x[rs1] * vd[i]) + vs2[i]
|
vnmsub.vx | vs2, rs1, vd |
Similarly for the "vnmsub" opcode
vnmsub.vv vd, vs1, vs2, vm # vd[i] = -(vs1[i] * vd[i]) + vs2[i]
vnmsub.vx vd, rs1, vs2, vm # vd[i] = -(x[rs1] * vd[i]) + vs2[i]
|
vmacc.vx | vs2, rs1, vd |
The integer multiply-add instructions are destructive and are provided in two forms, one that overwrites the addend or minuend ( vmacc , vnmsac ) and one that overwrites the first multiplicand ( vmadd , vnmsub ).
vmacc.vv vd, vs1, vs2, vm # vd[i] = +(vs1[i] * vs2[i]) + vd[i]
vmacc.vx vd, rs1, vs2, vm # vd[i] = +(x[rs1] * vs2[i]) + vd[i]
|
vnmsac.vx | vs2, rs1, vd |
vnmsac.vv vd, vs1, vs2, vm # vd[i] = -(vs1[i] * vs2[i]) + vd[i]
vnmsac.vx vd, rs1, vs2, vm # vd[i] = -(x[rs1] * vs2[i]) + vd[i]
|
v / _vector_widening_integer_add_subtract
- Vector Integer Arithmetic Instructions / 12.2. Vector Widening Integer Add/Subtract
Operation | Arguments | Description |
vwaddu.vv | vs2, vs1, vd |
vwaddu.vv vd, vs2, vs1, vm # vector-vector
vwaddu.vx vd, vs2, rs1, vm # vector-scalar
vwaddu.wv vd, vs2, vs1, vm # vector-vector
vwaddu.wx vd, vs2, rs1, vm # vector-scalar
|
vwadd.vv | vs2, vs1, vd |
Assembly pseudoinstructions can be defined as vwcvt.x.x.v vd,vs,vm = vwadd.vx vd,vs,x0,vm and vwcvtu.x.x.v vd,vs,vm = vwaddu.vx vd,vs,x0,vm.
vwadd.vv vd, vs2, vs1, vm # vector-vector
vwadd.vx vd, vs2, rs1, vm # vector-scalar
vwadd.wv vd, vs2, vs1, vm # vector-vector
vwadd.wx vd, vs2, rs1, vm # vector-scalar
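The pseudoinstruction definitions work because adding zero in a widening add leaves only the sign- or zero-extension. A minimal Python sketch of the vector-scalar forms illustrates this. It assumes SEW = 8 by default, unmasked execution, a scalar operand treated as a SEW-bit value, and a destination of 2*SEW bits; `_sext` and the function names are hypothetical.

```python
# Illustrative model only: SEW-bit sources, 2*SEW-bit results, unmasked.

def _sext(x, bits):
    # Reinterpret a bits-wide pattern as a two's-complement integer.
    x &= (1 << bits) - 1
    return x - (1 << bits) if x >> (bits - 1) else x

def vwadd_vx(vs2, x, sew=8):
    # Signed widening add: sign-extend both operands, add at 2*SEW bits.
    wide_mask = (1 << (2 * sew)) - 1
    return [(_sext(a, sew) + _sext(x, sew)) & wide_mask for a in vs2]

def vwaddu_vx(vs2, x, sew=8):
    # Unsigned widening add: zero-extend both operands, add at 2*SEW bits.
    mask, wide_mask = (1 << sew) - 1, (1 << (2 * sew)) - 1
    return [((a & mask) + (x & mask)) & wide_mask for a in vs2]

def vwcvt_x_x_v(vs, sew=8):
    # vwcvt.x.x.v vd,vs,vm = vwadd.vx vd,vs,x0,vm (x0 reads as zero)
    return vwadd_vx(vs, 0, sew)

def vwcvtu_x_x_v(vs, sew=8):
    # vwcvtu.x.x.v vd,vs,vm = vwaddu.vx vd,vs,x0,vm
    return vwaddu_vx(vs, 0, sew)
```

In this model the 8-bit pattern 0xFF widens to 0xFFFF with the signed conversion and to 0x00FF with the unsigned one.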
|
vwsubu.vv | vs2, vs1, vd |
vwsubu.vv vd, vs2, vs1, vm # vector-vector
vwsubu.vx vd, vs2, rs1, vm # vector-scalar
vwsubu.wv vd, vs2, vs1, vm # vector-vector
vwsubu.wx vd, vs2, rs1, vm # vector-scalar
|
vwsub.vv | vs2, vs1, vd |
vwsub.vv vd, vs2, vs1, vm # vector-vector
vwsub.vx vd, vs2, rs1, vm # vector-scalar
vwsub.wv vd, vs2, vs1, vm # vector-vector
vwsub.wx vd, vs2, rs1, vm # vector-scalar
|
vwaddu.wv | vs2, vs1, vd |
vwaddu.vv vd, vs2, vs1, vm # vector-vector
vwaddu.vx vd, vs2, rs1, vm # vector-scalar
vwaddu.wv vd, vs2, vs1, vm # vector-vector
vwaddu.wx vd, vs2, rs1, vm # vector-scalar
|
vwadd.wv | vs2, vs1, vd |
Assembly pseudoinstructions can be defined as vwcvt.x.x.v vd,vs,vm = vwadd.vx vd,vs,x0,vm and vwcvtu.x.x.v vd,vs,vm = vwaddu.vx vd,vs,x0,vm.
vwadd.vv vd, vs2, vs1, vm # vector-vector
vwadd.vx vd, vs2, rs1, vm # vector-scalar
vwadd.wv vd, vs2, vs1, vm # vector-vector
vwadd.wx vd, vs2, rs1, vm # vector-scalar
|
vwsubu.wv | vs2, vs1, vd |
vwsubu.vv vd, vs2, vs1, vm # vector-vector
vwsubu.vx vd, vs2, rs1, vm # vector-scalar
vwsubu.wv vd, vs2, vs1, vm # vector-vector
vwsubu.wx vd, vs2, rs1, vm # vector-scalar
|
vwsub.wv | vs2, vs1, vd |
vwsub.vv vd, vs2, vs1, vm # vector-vector
vwsub.vx vd, vs2, rs1, vm # vector-scalar
vwsub.wv vd, vs2, vs1, vm # vector-vector
vwsub.wx vd, vs2, rs1, vm # vector-scalar
|
vwaddu.vx | vs2, rs1, vd |
vwaddu.vv vd, vs2, vs1, vm # vector-vector
vwaddu.vx vd, vs2, rs1, vm # vector-scalar
vwaddu.wv vd, vs2, vs1, vm # vector-vector
vwaddu.wx vd, vs2, rs1, vm # vector-scalar
|
vwadd.vx | vs2, rs1, vd |
Assembly pseudoinstructions can be defined as vwcvt.x.x.v vd,vs,vm = vwadd.vx vd,vs,x0,vm and vwcvtu.x.x.v vd,vs,vm = vwaddu.vx vd,vs,x0,vm.
vwadd.vv vd, vs2, vs1, vm # vector-vector
vwadd.vx vd, vs2, rs1, vm # vector-scalar
vwadd.wv vd, vs2, vs1, vm # vector-vector
vwadd.wx vd, vs2, rs1, vm # vector-scalar
|
vwsubu.vx | vs2, rs1, vd |
vwsubu.vv vd, vs2, vs1, vm # vector-vector
vwsubu.vx vd, vs2, rs1, vm # vector-scalar
vwsubu.wv vd, vs2, vs1, vm # vector-vector
vwsubu.wx vd, vs2, rs1, vm # vector-scalar
|
vwsub.vx | vs2, rs1, vd |
vwsub.vv vd, vs2, vs1, vm # vector-vector
vwsub.vx vd, vs2, rs1, vm # vector-scalar
vwsub.wv vd, vs2, vs1, vm # vector-vector
vwsub.wx vd, vs2, rs1, vm # vector-scalar
|
vwaddu.wx | vs2, rs1, vd |
vwaddu.vv vd, vs2, vs1, vm # vector-vector
vwaddu.vx vd, vs2, rs1, vm # vector-scalar
vwaddu.wv vd, vs2, vs1, vm # vector-vector
vwaddu.wx vd, vs2, rs1, vm # vector-scalar
|
vwadd.wx | vs2, rs1, vd |
Assembly pseudoinstructions can be defined as vwcvt.x.x.v vd,vs,vm = vwadd.vx vd,vs,x0,vm and vwcvtu.x.x.v vd,vs,vm = vwaddu.vx vd,vs,x0,vm.
vwadd.vv vd, vs2, vs1, vm # vector-vector
vwadd.vx vd, vs2, rs1, vm # vector-scalar
vwadd.wv vd, vs2, vs1, vm # vector-vector
vwadd.wx vd, vs2, rs1, vm # vector-scalar
|
vwsubu.wx | vs2, rs1, vd |
vwsubu.vv vd, vs2, vs1, vm # vector-vector
vwsubu.vx vd, vs2, rs1, vm # vector-scalar
vwsubu.wv vd, vs2, vs1, vm # vector-vector
vwsubu.wx vd, vs2, rs1, vm # vector-scalar
|
vwsub.wx | vs2, rs1, vd |
vwsub.vv vd, vs2, vs1, vm # vector-vector
vwsub.vx vd, vs2, rs1, vm # vector-scalar
vwsub.wv vd, vs2, vs1, vm # vector-vector
vwsub.wx vd, vs2, rs1, vm # vector-scalar
|
v / _vector_widening_integer_multiply_instructions
- Vector Integer Arithmetic Instructions / 12.11. Vector Widening Integer Multiply Instructions
Operation | Arguments | Description |
vwmulu.vv | vs2, vs1, vd |
vwmulu.vv vd, vs2, vs1, vm # vector-vector
vwmulu.vx vd, vs2, rs1, vm # vector-scalar
|
vwmulsu.vv | vs2, vs1, vd |
vwmulsu.vv vd, vs2, vs1, vm # vector-vector
vwmulsu.vx vd, vs2, rs1, vm # vector-scalar
|
vwmulu.vx | vs2, rs1, vd |
vwmulu.vv vd, vs2, vs1, vm # vector-vector
vwmulu.vx vd, vs2, rs1, vm # vector-scalar
|
vwmulsu.vx | vs2, rs1, vd |
vwmulsu.vv vd, vs2, vs1, vm # vector-vector
vwmulsu.vx vd, vs2, rs1, vm # vector-scalar
|
v / _vector_widening_integer_multiply_add_instructions
- Vector Integer Arithmetic Instructions / 12.13. Vector Widening Integer Multiply-Add Instructions
Operation | Arguments | Description |
vwmaccu.vv | vs2, vs1, vd |
vwmaccu.vv vd, vs1, vs2, vm # vd[i] = +(vs1[i] * vs2[i]) + vd[i]
vwmaccu.vx vd, rs1, vs2, vm # vd[i] = +(x[rs1] * vs2[i]) + vd[i]
|
vwmacc.vv | vs2, vs1, vd |
vwmacc.vv vd, vs1, vs2, vm # vd[i] = +(vs1[i] * vs2[i]) + vd[i]
vwmacc.vx vd, rs1, vs2, vm # vd[i] = +(x[rs1] * vs2[i]) + vd[i]
|
vwmaccsu.vv | vs2, vs1, vd |
vwmaccsu.vv vd, vs1, vs2, vm # vd[i] = +(signed(vs1[i]) * unsigned(vs2[i])) + vd[i]
vwmaccsu.vx vd, rs1, vs2, vm # vd[i] = +(signed(x[rs1]) * unsigned(vs2[i])) + vd[i]
|
vwmaccu.vx | vs2, rs1, vd |
vwmaccu.vv vd, vs1, vs2, vm # vd[i] = +(vs1[i] * vs2[i]) + vd[i]
vwmaccu.vx vd, rs1, vs2, vm # vd[i] = +(x[rs1] * vs2[i]) + vd[i]
|
vwmacc.vx | vs2, rs1, vd |
vwmacc.vv vd, vs1, vs2, vm # vd[i] = +(vs1[i] * vs2[i]) + vd[i]
vwmacc.vx vd, rs1, vs2, vm # vd[i] = +(x[rs1] * vs2[i]) + vd[i]
|
vwmaccsu.vx | vs2, rs1, vd |
vwmaccsu.vv vd, vs1, vs2, vm # vd[i] = +(signed(vs1[i]) * unsigned(vs2[i])) + vd[i]
vwmaccsu.vx vd, rs1, vs2, vm # vd[i] = +(signed(x[rs1]) * unsigned(vs2[i])) + vd[i]
|
vwmaccus.vx | vs2, rs1, vd |
vwmaccus.vx vd, rs1, vs2, vm # vd[i] = +(unsigned(x[rs1]) * signed(vs2[i])) + vd[i]
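The four widening multiply-accumulate variants differ only in how each multiplicand is extended before the multiply. The Python sketch below models that under stated assumptions: SEW = 8 by default, a 2*SEW-bit accumulator that wraps, unmasked execution, and hypothetical helper names `_ext` and `_wmacc`.

```python
# Illustrative model only: SEW-bit multiplicands, 2*SEW-bit accumulator, unmasked.

def _ext(x, sew, signed):
    # Zero-extend, or sign-extend when `signed` is true.
    x &= (1 << sew) - 1
    return x - (1 << sew) if signed and x >> (sew - 1) else x

def _wmacc(vd, op1, vs2, sew, op1_signed, vs2_signed):
    wide_mask = (1 << (2 * sew)) - 1
    return [(_ext(a, sew, op1_signed) * _ext(b, sew, vs2_signed) + d) & wide_mask
            for d, a, b in zip(vd, op1, vs2)]

def vwmaccu(vd, vs1, vs2, sew=8):
    # unsigned * unsigned
    return _wmacc(vd, vs1, vs2, sew, False, False)

def vwmacc(vd, vs1, vs2, sew=8):
    # signed * signed
    return _wmacc(vd, vs1, vs2, sew, True, True)

def vwmaccsu(vd, vs1, vs2, sew=8):
    # signed(vs1) * unsigned(vs2)
    return _wmacc(vd, vs1, vs2, sew, True, False)

def vwmaccus_vx(vd, x, vs2, sew=8):
    # unsigned(x[rs1]) * signed(vs2); only a vector-scalar form is listed
    return _wmacc(vd, [x] * len(vs2), vs2, sew, False, True)
```

Feeding the same bit pattern 0xFF into both multiplicands gives 0xFE01 for vwmaccu, 0x0001 for vwmacc, and 0xFF01 for vwmaccsu when the accumulator starts at zero.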
|
v / _vector_slide1up
- Vector Permutation Instructions / 17.3. Vector Slide Instructions
Operation | Arguments | Description |
vslide1up.vx | vs2, rs1, vd |
The vslide1up instruction places the x register argument at location 0 of the destination vector register group, provided that element 0 is active; otherwise the destination element is unchanged.
The vslide1up instruction requires that the destination vector register group does not overlap the source vector register group or the mask register.
vslide1up.vx vd, vs2, rs1, vm # vd[0]=x[rs1], vd[i+1] = vs2[i]
vslide1up behavior
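As a rough illustration of the behavior described above, the following Python sketch models vslide1up.vx on plain lists. It assumes vstart is 0, that inactive (masked-off) destination elements keep their old values, and that elements at or beyond vl are left untouched; the function name and signature are hypothetical.

```python
# Illustrative model only: vstart = 0, inactive elements left unchanged.

def vslide1up(vd, vs2, x, vl, mask=None):
    out = list(vd)
    for i in range(vl):
        if mask is not None and not mask[i]:
            continue  # inactive destination element keeps its old value
        # vd[0] = x[rs1], vd[i+1] = vs2[i]
        out[i] = x if i == 0 else vs2[i - 1]
    return out
```

For example, `vslide1up([0, 0, 0, 0], [1, 2, 3, 4], 9, 4)` returns `[9, 1, 2, 3]`. The model returns a fresh list, so it does not capture the hardware requirement that the destination must not overlap the source or mask register.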
|
v / _vector_slide1down_instruction
- Vector Permutation Instructions / 17.3. Vector Slide Instructions
Operation | Arguments | Description |
vslide1down.vx | vs2, rs1, vd |
The vslide1down instruction copies the first vl-1 active element values from index i+1 in the source vector register group to index i in the destination vector register group.
The vslide1down instruction places the x register argument at location vl-1 in the destination vector register, provided that element vl-1 is active; otherwise the destination element is unchanged.
Note: The vslide1down instruction can be used to load values into a vector register without using memory and without disturbing other vector registers.
This provides a path for debuggers to modify the contents of a vector register, albeit slowly, with multiple repeated vslide1down invocations.
vslide1down.vx vd, vs2, rs1, vm # vd[i] = vs2[i+1], vd[vl-1]=x[rs1]
vslide1down behavior
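The debugger use case mentioned above can be sketched with a small Python model. The assumptions are the same kind as elsewhere in this list: vstart is 0, inactive elements keep their old values, and registers are plain lists. Both function names are hypothetical, and `load_via_slides` only illustrates the idea of repeated invocations rather than any real debugger interface.

```python
# Illustrative model only: vstart = 0, inactive elements left unchanged.

def vslide1down(vd, vs2, x, vl, mask=None):
    out = list(vd)
    for i in range(vl):
        if mask is not None and not mask[i]:
            continue  # inactive destination element keeps its old value
        # vd[i] = vs2[i+1], vd[vl-1] = x[rs1]
        out[i] = x if i == vl - 1 else vs2[i + 1]
    return out

def load_via_slides(values):
    # Push one scalar per invocation; after vl slides the register
    # holds `values` in order, without touching memory.
    vl = len(values)
    reg = [0] * vl
    for value in values:
        reg = vslide1down(reg, reg, value, vl)
    return reg
```

Each call shifts the existing contents down by one element and appends the new scalar at index vl-1, so loading vl values takes vl invocations.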
|
custom
custom /
/
Operation | Arguments | Description |
@custom0 | rd, rs1, imm12 | |
@custom0.rs1 | rd, rs1, imm12 | |
@custom0.rs1.rs2 | rd, rs1, imm12 | |
@custom0.rd | rd, rs1, imm12 | |
@custom0.rd.rs1 | rd, rs1, imm12 | |
@custom0.rd.rs1.rs2 | rd, rs1, imm12 | |
@custom1 | rd, rs1, imm12 | |
@custom1.rs1 | rd, rs1, imm12 | |
@custom1.rs1.rs2 | rd, rs1, imm12 | |
@custom1.rd | rd, rs1, imm12 | |
@custom1.rd.rs1 | rd, rs1, imm12 | |
@custom1.rd.rs1.rs2 | rd, rs1, imm12 | |
@custom2 | rd, rs1, imm12 | |
@custom2.rs1 | rd, rs1, imm12 | |
@custom2.rs1.rs2 | rd, rs1, imm12 | |
@custom2.rd | rd, rs1, imm12 | |
@custom2.rd.rs1 | rd, rs1, imm12 | |
@custom2.rd.rs1.rs2 | rd, rs1, imm12 | |
@custom3 | rd, rs1, imm12 | |
@custom3.rs1 | rd, rs1, imm12 | |
@custom3.rs1.rs2 | rd, rs1, imm12 | |
@custom3.rd | rd, rs1, imm12 | |
@custom3.rd.rs1 | rd, rs1, imm12 | |
@custom3.rd.rs1.rs2 | rd, rs1, imm12 | |
csr
csr / csr-instructions
“Zicsr”, Control and Status Register (CSR) Instructions, Version 2.0 / CSR Instructions
Operation | Arguments | Description |
csrrw | rd, rs1, imm12 |
The CSRRW (Atomic Read/Write CSR) instruction atomically swaps values in the CSRs and integer registers
CSRRW reads the old value of the CSR, zero-extends the value to XLEN bits, then writes it to integer register rd
A CSRRW with rs1 = x0 will attempt to write zero to the destination CSR.
The assembler pseudoinstruction to write a CSR, CSRW csr, rs1 , is encoded as CSRRW x0, csr, rs1 , while CSRWI csr, uimm , is encoded as CSRRWI x0, csr, uimm .
|
csrrs | rd, rs1, imm12 |
The CSRRS (Atomic Read and Set Bits in CSR) instruction reads the value of the CSR, zero-extends the value to XLEN bits, and writes it to integer register rd
For both CSRRS and CSRRC, if rs1 = x0 , then the instruction will not write to the CSR at all, and so shall not cause any of the side effects that might otherwise occur on a CSR write, such as raising illegal instruction exceptions on accesses to read-only CSRs
Both CSRRS and CSRRC always read the addressed CSR and cause any read side effects regardless of rs1 and rd fields
The CSRRS and CSRRC instructions have the same behavior, so they are shown as CSRR
The assembler pseudoinstruction to read a CSR, CSRR rd, csr , is encoded as CSRRS rd, csr, x0
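The read-modify-write behavior of the register forms can be sketched with a toy CSR file in Python. This is an illustration under assumptions, not an implementation of any real core: the `CsrFile` class, its write counter, and the function signatures are hypothetical, XLEN defaults to 32, and the address 0x340 used in the usage note is only an example value. The `rs1_index` parameter models the register specifier, since the write is suppressed when rs1 is x0 itself, not merely when the register holds zero.

```python
# Illustrative model only: a toy CSR file that counts writes.

class CsrFile:
    def __init__(self, xlen=32):
        self.mask = (1 << xlen) - 1
        self.regs = {}
        self.writes = 0  # stands in for "write side effects"

    def read(self, csr):
        return self.regs.get(csr, 0)

    def write(self, csr, value):
        self.regs[csr] = value & self.mask
        self.writes += 1

def csrrw(f, csr, rs1_value):
    # Swap: return the old CSR value and write the register value.
    old = f.read(csr)
    f.write(csr, rs1_value)
    return old

def csrrs(f, csr, rs1_index, rs1_value):
    # Read, then set the bits given in rs1; no write at all if rs1 is x0.
    old = f.read(csr)
    if rs1_index != 0:
        f.write(csr, old | rs1_value)
    return old

def csrrc(f, csr, rs1_index, rs1_value):
    # Read, then clear the bits given in rs1; no write at all if rs1 is x0.
    old = f.read(csr)
    if rs1_index != 0:
        f.write(csr, old & ~rs1_value)
    return old

def csrr(f, csr):
    # CSRR rd, csr is encoded as CSRRS rd, csr, x0
    return csrrs(f, csr, 0, 0)
```

For instance, after `csrrw(f, 0x340, 0b1010)`, a call to `csrr(f, 0x340)` returns `0b1010` and leaves the write counter unchanged, which is why CSRR is safe to use on read-only CSRs.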
|
csrrc | rd, rs1, imm12 |
The CSRRC (Atomic Read and Clear Bits in CSR) instruction reads the value of the CSR, zero-extends the value to XLEN bits, and writes it to integer register rd
|
csrrwi | rd, rs1, imm12 |
The CSRRWI, CSRRSI, and CSRRCI variants are similar to CSRRW, CSRRS, and CSRRC respectively, except they update the CSR using an XLEN-bit value obtained by zero-extending a 5-bit unsigned immediate (uimm[4:0]) field encoded in the rs1 field instead of a value from an integer register
For CSRRWI, if rd = x0 , then the instruction shall not read the CSR and shall not cause any of the side effects that might occur on a CSR read
|
csrrsi | rd, rs1, imm12 |
For CSRRSI and CSRRCI, if the uimm[4:0] field is zero, then these instructions will not write to the CSR, and shall not cause any of the side effects that might otherwise occur on a CSR write
Both CSRRSI and CSRRCI will always read the CSR and cause any read side effects regardless of rd and rs1 fields.
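The conditions under which the immediate forms read or write the CSR can be summarized in a short Python sketch. It only restates the rules given above for CSRRWI, CSRRSI, and CSRRCI; the function names and the tuple return convention are hypothetical, and XLEN defaults to 32.

```python
# Illustrative model only: which accesses an immediate-form CSR instruction performs.

def csr_imm_access(op, rd, uimm):
    """Return (reads_csr, writes_csr) for csrrwi / csrrsi / csrrci."""
    if not 0 <= uimm < 32:
        raise ValueError("uimm is a 5-bit unsigned immediate")
    if op == "csrrwi":
        # No read (and no read side effects) when rd is x0; always writes.
        return (rd != 0, True)
    if op in ("csrrsi", "csrrci"):
        # Always reads; no write when the immediate is zero.
        return (True, uimm != 0)
    raise ValueError("unknown op: " + op)

def csr_imm_new_value(op, old, uimm, xlen=32):
    """Value written to the CSR, with uimm zero-extended to XLEN bits."""
    mask = (1 << xlen) - 1
    if op == "csrrwi":
        return uimm & mask
    if op == "csrrsi":
        return (old | uimm) & mask
    if op == "csrrci":
        return (old & ~uimm) & mask
    raise ValueError("unknown op: " + op)
```

Because the immediate is only 5 bits wide, the set and clear forms can only affect the low five bits of the CSR.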
|
supervisor
sstatus | sec:satp |
supervisor / sstatus
Supervisor-Level ISA, Version 1.12 / Supervisor CSRs
Operation | Arguments | Description |
sret |
When an SRET instruction (see Section
When a trap is taken into supervisor mode, SPIE is set to SIE, and SIE is set to 0. When an SRET instruction is executed, SIE is set to SPIE, then SPIE is set to 1.
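The SIE/SPIE hand-off amounts to a two-entry stack of interrupt-enable bits. A minimal Python sketch, modelling only these two sstatus fields as a dictionary (the function names and the dictionary representation are assumptions made for the illustration):

```python
# Illustrative model only: just the SIE and SPIE bits of sstatus.

def take_trap_to_s_mode(sstatus):
    # On trap entry: SPIE is set to SIE, and SIE is set to 0.
    new = dict(sstatus)
    new["SPIE"] = new["SIE"]
    new["SIE"] = 0
    return new

def sret(sstatus):
    # On SRET: SIE is set to SPIE, then SPIE is set to 1.
    new = dict(sstatus)
    new["SIE"] = new["SPIE"]
    new["SPIE"] = 1
    return new
```

Starting from `{"SIE": 1, "SPIE": 0}`, a trap yields `{"SIE": 0, "SPIE": 1}`, and the matching SRET restores `SIE` to 1, so interrupts are disabled for the duration of the handler and re-enabled on return.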
|
supervisor / sec:satp
Supervisor-Level ISA, Version 1.12 / Supervisor CSRs
Operation | Arguments | Description |
sfence.vma | rs1, rs2 |
If the new address space's page tables have been modified, or if an ASID is reused, it may be necessary to execute an SFENCE.VMA instruction (see Section
|
hypervisor
sec:hinterruptregs | sec:tinst vals | hypervisor status register hstatus | wfi in virtual operating modes | sec:hgatp |
hypervisor / sec:hinterruptregs
Hypervisor Extension, Version 0.5 / Hypervisor and Virtual Supervisor CSRs
Operation | Arguments | Description |
or | rd, rs1, rs2 |
VS-level external interrupts are made pending based on the logical-OR of:
When hip is read with a CSR instruction, the value of the VSEIP bit returned in the rd destination register is the logical-OR of all the sources listed above
|
hypervisor / sec:tinst-vals
Hypervisor Extension, Version 0.5 / Traps
Operation | Arguments | Description |
sb | imm12hi, rs1, rs2, imm12lo |
For a standard store instruction that is not a compressed instruction and is one of SB, SH, SW, SD, FSW, FSD, or FSQ, the transformed instruction has the format shown in Figure
Transformed noncompressed store instruction (SB, SH, SW, SD, FSW, FSD, or FSQ)
|
hypervisor / hypervisor-status-register-hstatus
Hypervisor Extension, Version 0.5 / Hypervisor and Virtual Supervisor CSRs
Operation | Arguments | Description |
mret |
An MRET or SRET instruction that changes the operating mode to U-mode, VS-mode, or VU-mode also sets SPRV=0.
|
hypervisor / wfi-in-virtual-operating-modes
Hypervisor Extension, Version 0.5 / WFI in Virtual Operating Modes
Operation | Arguments | Description |
wfi |
Executing instruction WFI when V=1 causes an illegal instruction exception, unless it completes within an implementation-specific, bounded time limit.
The behavior required of WFI in VS-mode and VU-mode is the same as required of it in U-mode when S-mode exists.
|
hypervisor / sec:hgatp
Hypervisor Extension, Version 0.5 / Hypervisor and Virtual Supervisor CSRs
Operation | Arguments | Description |
hfence.gvma | rs1, rs2 |
If the new virtual machine's guest physical page tables have been modified, it may be necessary to execute an HFENCE.GVMA instruction (see Section
|