RISC-V Assembler: Multiply Divide
The base RISC-V instruction set includes integer add, subtract, and logical operations. Integer multiply and divide instructions form the optional M extension. RISC-V extensions allow the customisation of a CPU design, from tiny microcontrollers to powerful server chips. Making multiplication and division optional keeps the base instruction set simple and reduces the size of the smallest RISC-V core. This post includes a brief overview of common RISC-V extensions.
In the last few years, we’ve seen an explosion of RISC-V CPU designs on FPGA and ASIC, including the RP2350 found on the Raspberry Pi Pico 2. Thankfully, RISC-V is ideal for assembly programming with its compact, easy-to-learn instruction set. This series will help you learn and understand 32-bit RISC-V instructions and programming.
Multiply
When you multiply two 32-bit integers, you get a 64-bit product.
The mul instruction calculates the lower 32 bits of the product:
mul rd, rs1, rs2 # rd = rs1 * rs2 (lower 32 bits)
You can use mul for signed and unsigned numbers, just as you would with add and sub.
li t0, 2 # t0 = 2
li t1, 46 # t1 = 46
li t2, 10 # t2 = 10
mul t3, t0, t0 # t3 = 2 * 2 = 4
mul t4, t0, t1 # t4 = 2 * 46 = 92
mul t4, t4, t2 # t4 = 92 * 10 = 920 ; t4 is a source and the destination
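To see why one instruction serves both signed and unsigned operands, here is a small sketch with a negative value; the registers and values are chosen for illustration. The lower 32 bits of the product are the same whichever way you interpret the operands:

```asm
li t0, -3 # t0 = -3 (0xFFFFFFFD; 4294967293 if read as unsigned)
li t1, 5 # t1 = 5
mul t2, t0, t1 # t2 = 0xFFFFFFF1 ; -15 as signed, or the lower 32 bits of 4294967293 * 5
```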
ProTip: You can use shift instructions to multiply and divide by powers of two.
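As a rough sketch of the shift approach (values are illustrative): slli multiplies by a power of two, srli divides an unsigned number, and srai divides a signed number. Note that srai rounds towards negative infinity, whereas div rounds towards zero, so the two disagree for negative numbers that do not divide exactly.

```asm
li t0, 12 # t0 = 12
slli t1, t0, 3 # t1 = 12 * 8 = 96
srli t2, t0, 2 # t2 = 12 / 4 = 3 (unsigned)
li t3, -7 # t3 = -7
srai t4, t3, 1 # t4 = -4 ; div would give -7 / 2 = -3
```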
Often, you only care about the lower 32 bits of the product, so mul is enough. If you need the full 64-bit product, you need to know whether your operands are signed or unsigned.
There are three possible combinations and three multiply-high instructions:
- mulh - signed × signed
- mulhu - unsigned × unsigned
- mulhsu - signed × unsigned (rs1 is signed, rs2 is unsigned)
All three instructions have the same form:
mulh rd, rs1, rs2 # rd = rs1 * rs2 (upper 32 bits)
It might seem unnecessary to have an instruction for signed × unsigned, but mulhsu improves the performance of multi-word multiplication.
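The three instructions differ only in how they interpret the operands. A quick illustration (the values are my own choice) multiplies 0xFFFFFFFF by itself, which is -1 when signed and 4294967295 when unsigned:

```asm
li t0, -1 # t0 = 0xFFFFFFFF
mulh t1, t0, t0 # -1 * -1 = 1 ; t1 = 0x00000000
mulhu t2, t0, t0 # 4294967295 * 4294967295 = 0xFFFFFFFE00000001 ; t2 = 0xFFFFFFFE
mulhsu t3, t0, t0 # -1 * 4294967295 = -4294967295 ; t3 = 0xFFFFFFFF
mul t4, t0, t0 # t4 = 0x00000001 ; lower 32 bits are the same in all three cases
```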
Example function to calculate the full 64-bit product of signed integers:
# 32-bit signed integer multiplication returning 64-bit product
# arguments:
# a0: x
# a1: y
# return:
# a0: x*y lower 32 bits
# a1: x*y upper 32 bits
#
mul_signed_full:
mulh t0, a1, a0
mul a0, a1, a0
mv a1, t0
ret
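Multi-word multiplication builds on the same instructions. The following is a sketch of a 64-bit × 64-bit multiply that keeps the lower 64 bits of the product; the function name mul64 and the register layout (lower word first) are assumptions for illustration. Because only the lower 64 bits are kept, the same code should work for signed and unsigned values.

```asm
# 64-bit integer multiplication returning lower 64 bits of product
# arguments:
# a0: x lower 32 bits
# a1: x upper 32 bits
# a2: y lower 32 bits
# a3: y upper 32 bits
# return:
# a0: x*y lower 32 bits
# a1: x*y upper 32 bits
#
mul64:
mul t0, a0, a3 # t0 = x_lo * y_hi (lower 32 bits)
mul t1, a1, a2 # t1 = x_hi * y_lo (lower 32 bits)
mulhu t2, a0, a2 # t2 = x_lo * y_lo (upper 32 bits)
mul a0, a0, a2 # a0 = x_lo * y_lo (lower 32 bits)
add t0, t0, t1 # sum the cross products
add a1, t0, t2 # a1 = cross products + upper half of x_lo * y_lo
ret
```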
Divide
Division is straightforward. The div instruction performs signed integer division, rounding towards zero, while rem calculates the remainder, which takes the sign of the dividend.
div rd, rs1, rs2 # rd = rs1 / rs2
rem rd, rs1, rs2 # rd = rs1 % rs2
divu and remu work the same way, but treat the operands as unsigned.
divu rd, rs1, rs2 # rd = rs1 / rs2 (unsigned)
remu rd, rs1, rs2 # rd = rs1 % rs2 (unsigned)
li t0, 2 # t0 = 2
li t1, 46 # t1 = 46
li t2, 10 # t2 = 10
div t3, t1, t0 # t3 = 46 / 2 = 23
div t4, t1, t2 # t4 = 46 / 10 = 4 ; rounds towards zero
rem t5, t1, t2 # t5 = 46 % 10 = 6
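Negative operands show the rounding and the difference between the signed and unsigned instructions. In this sketch (values chosen for illustration), the same bit pattern gives different results depending on the instruction:

```asm
li t0, -7 # t0 = -7 (0xFFFFFFF9)
li t1, 2 # t1 = 2
div t2, t0, t1 # t2 = -7 / 2 = -3 ; rounds towards zero
rem t3, t0, t1 # t3 = -7 % 2 = -1 ; remainder has the sign of the dividend
divu t4, t0, t1 # t4 = 4294967289 / 2 = 2147483644 (0x7FFFFFFC)
remu t5, t0, t1 # t5 = 4294967289 % 2 = 1
```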
If you want the quotient and the remainder, then it can be faster to use mul to calculate the remainder. It depends on the speed of the division and whether the CPU fuses the div and rem instructions.
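A sketch of both approaches, reusing t1 = 46 and t2 = 10 from the example above. Which is faster depends on the CPU, so treat this as an illustration rather than a recommendation:

```asm
# approach 1: div followed by rem with the same operands, which some CPUs may fuse
div t3, t1, t2 # t3 = 46 / 10 = 4
rem t4, t1, t2 # t4 = 46 % 10 = 6

# approach 2: calculate the remainder from the quotient using mul and sub
div t3, t1, t2 # t3 = 46 / 10 = 4
mul t5, t3, t2 # t5 = 4 * 10 = 40
sub t5, t1, t5 # t5 = 46 - 40 = 6
```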
Divide by Zero
RISC-V doesn’t raise an exception on divide by zero. The quotient of a division by zero is all 1s, 0xFFFFFFFF in hexadecimal. For unsigned numbers, this is the largest integer; for signed numbers, this is -1. The remainder of a division by zero is the unmodified dividend.
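A short sketch of division by zero (register choices are illustrative). It also shows signed overflow, the other special case defined by the M extension: dividing the most negative integer by -1 returns the dividend as the quotient and zero as the remainder, again without an exception.

```asm
li t0, 46 # t0 = 46
li t1, 0 # t1 = 0
div t2, t0, t1 # t2 = 0xFFFFFFFF (-1)
divu t3, t0, t1 # t3 = 0xFFFFFFFF (4294967295)
rem t4, t0, t1 # t4 = 46 ; remainder is the dividend
li t5, -2147483648 # t5 = 0x80000000 ; most negative 32-bit integer
li t6, -1 # t6 = -1
div a0, t5, t6 # a0 = 0x80000000 ; signed overflow
rem a1, t5, t6 # a1 = 0
```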
Use beqz if you need to catch divide by zero; see branch zero.
beqz t2, div_by_zero # skip the division if the divisor is zero
div t0, t1, t2
# continue normal execution
j div_done # jump over the handler
div_by_zero:
# handle divide by zero here
div_done:
FPGA Support
FPGAs include DSP blocks that perform low-latency multiplication, and synthesis tools can infer multiplication; see Multiplication with DSPs. Integer division requires its own implementation; see Division in Verilog.
RISC-V Extensions
The 32-bit base instruction set RV32I contains 40 instructions, most of which we’ve met in previous posts. RISC-V extensions add functionality to the base instruction set. Things started simply with these “classic” extensions:
- M - multiplication and division (covered in this post)
- A - atomic
- F - single-precision floating point
- D - double-precision floating point
- C - compressed
For example, a 32-bit CPU with M and C extensions is described as RV32IMC.
When developing hardware on FPGA, you can choose the CPU and extensions you want. For example, PicoRV32 optionally supports M and C extensions, while VexRiscv optionally supports M, A, F, D, and C extensions.
Complexity Intensifies?
As single letters became scarce, new general-purpose extensions started using the Z prefix.
CSR instructions and the instruction-fetch fence (fence.i) were originally in the base instruction set but have been moved to:
- Zicsr - control and status registers (CSR)
- Zifencei - instruction-fetch fence (fence.i)
Other general-purpose extensions include the four bit-manipulation extensions: Zba, Zbb, Zbc, and Zbs, and three extensions for floating point using integer registers: Zfinx, Zdinx, and Zhinx.
Time will tell how well the RISC-V ecosystem evolves, but I fear the simplicity and elegance of the original RISC-V approach will be buried under a mountain of extensions. Will compilers and library developers have to grapple with thousands of possible extension combinations?
What’s Next?
The next post is the RISC-V Assembler Cheat Sheet for a summary of 32-bit instructions. Or check out all my FPGA & RISC-V Tutorials.
References
- RISC-V Technical Specifications (riscv.org)