RISC-V Assembler: Multiply Divide

Published 17 May 2024 · Updated 04 Oct 2024

The base RISC-V instruction set includes integer add, subtract, and logical operations. Integer multiply and divide instructions form the optional M extension. RISC-V extensions allow the customisation of a CPU design, from tiny microcontrollers to powerful server chips. Making multiplication and division optional keeps the base instruction set simple and reduces the size of the smallest RISC-V core. This post includes a brief overview of common RISC-V extensions.

In the last few years, we’ve seen an explosion of RISC-V CPU designs on FPGA and ASIC, including the RP2350 found on the Raspberry Pi Pico 2. Thankfully, RISC-V is ideal for assembly programming with its compact, easy-to-learn instruction set. This series will help you learn and understand 32-bit RISC-V instructions and programming.

Multiply

When you multiply two 32-bit integers, you get a 64-bit product.

The mul instruction calculates the lower 32 bits of the product:

mul rd, rs1, rs2  # rd = rs1 * rs2 (lower 32 bits)

You can use mul for signed and unsigned numbers, just as you would with add and sub.

li  t0, 2       # t0 =  2
li  t1, 46      # t1 = 46
li  t2, 10      # t2 = 10

mul t3, t0, t0  # t3 =  2 *  2 =   4
mul t4, t0, t1  # t4 =  2 * 46 =  92
mul t4, t4, t2  # t4 = 92 * 10 = 920  ; t4 is a source and the destination
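mul ignores the sign because the lower 32 bits of a product are the same whether you read the operands as signed or unsigned. Here's a quick sketch with a negative operand, assuming RV32IM and GNU assembler syntax (register choices are arbitrary):

```asm
li  t0, -3      # t0 =  -3 (0xFFFFFFFD)
li  t1, 7       # t1 =   7
mul t2, t0, t1  # t2 = -21 (0xFFFFFFEB)
```

Read as unsigned, 0xFFFFFFFD × 7 has the same lower 32 bits, 0xFFFFFFEB.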

ProTip: You can use shift instructions to multiply and divide by powers of two.
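A sketch of the ProTip, assuming RV32IM and arbitrary t registers: slli by n multiplies by 2ⁿ, and srli divides an unsigned value by 2ⁿ. For signed values, srai rounds towards negative infinity rather than towards zero like div, so the two disagree for negative numbers that aren't exact multiples.

```asm
li   t0, 46        # t0 = 46
slli t1, t0, 3     # t1 = 46 * 8 = 368
srli t2, t0, 1     # t2 = 46 / 2 =  23  ; unsigned divide by 2

li   t3, -7        # t3 = -7
li   t4, 2         # t4 =  2
srai t5, t3, 1     # t5 = -4  ; srai rounds towards negative infinity
div  t6, t3, t4    # t6 = -3  ; div rounds towards zero
```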

Often, you only care about the lower 32 bits of the product, so mul is enough. If you need the full 64-bit product, you need to know the sign of your operands.

Three multiply-high instructions cover the sign combinations; for unsigned × signed, swap the operands of mulhsu:

mulh - signed × signed
mulhu - unsigned × unsigned
mulhsu - signed (rs1) × unsigned (rs2)

All three instructions have the same form:

mulh rd, rs1, rs2  # rd = rs1 * rs2 (upper 32 bits)
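To see why the sign matters, this sketch multiplies the same bit patterns with each instruction, assuming RV32IM and GNU assembler syntax. 0xFFFFFFFF is -1 when signed but 4294967295 when unsigned. The lower word from mul is identical in every case; only the upper word changes.

```asm
li     t0, -1       # t0 = 0xFFFFFFFF: -1 signed, 4294967295 unsigned
li     t1, 2        # t1 = 2
mul    t2, t0, t1   # t2 = 0xFFFFFFFE  ; lower 32 bits, same for all variants
mulh   t3, t0, t1   # t3 = 0xFFFFFFFF  ; -1 * 2 = -2, upper word all 1s
mulhu  t4, t0, t1   # t4 = 0x00000001  ; 4294967295 * 2 = 0x1_FFFFFFFE
mulhsu t5, t0, t1   # t5 = 0xFFFFFFFF  ; rs1 signed (-1) * rs2 unsigned (2)
mulhsu t6, t1, t0   # t6 = 0x00000001  ; rs1 signed (2) * rs2 unsigned (4294967295)
```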

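It might seem unnecessary to have an instruction for signed × unsigned, but mulhsu speeds up multi-word signed multiplication: only the most significant word of a multi-word signed number carries the sign, so the lower words must be treated as unsigned.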
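As a sketch of multi-word use, the hypothetical function below multiplies a signed 64-bit value x (held in a1:a0, upper word in a1) by a signed 32-bit value y (in a2) and keeps the lower 64 bits of the product. The lower word of x is unsigned while y is signed, which is exactly the case mulhsu handles; mulhu would give the wrong upper word whenever y is negative. The name mul64x32 and the register convention are assumptions for illustration.

```asm
# Hypothetical: signed 64-bit x * signed 32-bit y, lower 64 bits of product
#   arguments:
#       a0: x lower 32 bits (unsigned)
#       a1: x upper 32 bits (signed)
#       a2: y
#   return:
#       a0: x*y bits 0-31
#       a1: x*y bits 32-63
#
mul64x32:
    mul     t1, a1, a2  # t1 = lower 32 bits of x_hi * y
    mulhsu  t0, a2, a0  # t0 = upper 32 bits of y (signed) * x_lo (unsigned)
    mul     a0, a2, a0  # a0 = lower 32 bits of y * x_lo
    add     a1, t0, t1  # a1 = upper half of y * x_lo + lower half of x_hi * y
    ret
```

For example, x = 1 (a1 = 0, a0 = 1) and y = -1 return a1:a0 = 0xFFFFFFFF:0xFFFFFFFF, which is -1; replacing mulhsu with mulhu would return 0x00000000:0xFFFFFFFF instead.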

Example function to calculate the full 64-bit product of signed integers:

# 32-bit signed integer multiplication returning 64-bit product
#   arguments:
#       a0: x
#       a1: y
#   return:
#       a0: x*y lower 32 bits
#       a1: x*y upper 32 bits
#
mul_signed_full:
    mulh    t0, a1, a0  # t0 = upper 32 bits of signed product
    mul     a0, a1, a0  # a0 = lower 32 bits of product
    mv      a1, t0      # a1 = upper 32 bits
    ret
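For unsigned operands, a sketch of the same function only needs mulh swapped for mulhu; the name mul_unsigned_full is hypothetical. Putting the multiply-high first, with the same source registers in the same order as the following mul and a destination that isn't a source, matches the sequence the RISC-V spec recommends so a CPU can fuse the pair into a single multiplication.

```asm
# 32-bit unsigned integer multiplication returning 64-bit product
#   arguments:
#       a0: x
#       a1: y
#   return:
#       a0: x*y lower 32 bits
#       a1: x*y upper 32 bits
#
mul_unsigned_full:
    mulhu   t0, a1, a0  # t0 = upper 32 bits of unsigned product
    mul     a0, a1, a0  # a0 = lower 32 bits (same for signed and unsigned)
    mv      a1, t0      # a1 = upper 32 bits
    ret
```

With a0 = 0xFFFFFFFF and a1 = 2, this returns a1:a0 = 0x00000001:0xFFFFFFFE, while mul_signed_full returns 0xFFFFFFFF:0xFFFFFFFE for the same inputs.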

Divide

Division is straightforward. The div instruction performs signed integer division, rounding towards zero, while rem calculates the remainder; a nonzero remainder has the same sign as the dividend.

div rd, rs1, rs2  # rd = rs1 / rs2
rem rd, rs1, rs2  # rd = rs1 % rs2

divu and remu work the same way, but treat the operands as unsigned.

divu rd, rs1, rs2  # rd = rs1 / rs2 (unsigned)
remu rd, rs1, rs2  # rd = rs1 % rs2 (unsigned)

li  t0, 2       # t0 =  2
li  t1, 46      # t1 = 46
li  t2, 10      # t2 = 10

div t3, t1, t0  # t3 = 46 /  2 = 23
div t4, t1, t2  # t4 = 46 / 10 =  4  ; rounds towards zero
rem t5, t1, t2  # t5 = 46 % 10 =  6
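With a negative dividend, signed and unsigned division give very different answers, and a nonzero rem result takes the sign of the dividend. A sketch, assuming RV32IM and arbitrary t registers:

```asm
li   t0, -7      # t0 = -7 (0xFFFFFFF9 = 4294967289 unsigned)
li   t1, 2       # t1 =  2

div  t2, t0, t1  # t2 = -7 / 2 = -3  ; rounds towards zero
rem  t3, t0, t1  # t3 = -7 % 2 = -1  ; sign follows the dividend
divu t4, t0, t1  # t4 = 4294967289 / 2 = 2147483644 (0x7FFFFFFC)
remu t5, t0, t1  # t5 = 4294967289 % 2 = 1
```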

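If you want both the quotient and the remainder, it can be faster to calculate the remainder from the quotient with mul and sub (remainder = dividend - quotient × divisor) than to do a second division with rem. It depends on the speed of the division and whether the CPU fuses the div and rem instructions.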
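Both approaches can be sketched as follows, assuming RV32IM and arbitrary t registers. The first pair follows the div-then-rem sequence the RISC-V spec recommends for fusion (same source registers in the same order, and the quotient register isn't a source); the second rebuilds the remainder from the quotient. Because div rounds towards zero, dividend - quotient × divisor matches what rem returns, even when dividing by zero.

```asm
li  t1, 46       # t1 = dividend
li  t2, 10       # t2 = divisor

# Option 1: div then rem with the same operands
div t3, t1, t2   # t3 = 46 / 10 = 4
rem t4, t1, t2   # t4 = 46 % 10 = 6

# Option 2: derive the remainder from the quotient
div t3, t1, t2   # t3 = 4
mul t5, t3, t2   # t5 = 4 * 10 = 40
sub t4, t1, t5   # t4 = 46 - 40 = 6
```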

Divide by Zero

RISC-V doesn’t raise an exception on divide by zero. Instead, the quotient is all 1s, 0xFFFFFFFF in hexadecimal: for unsigned numbers, this is the largest integer; for signed numbers, it’s -1. The remainder is the dividend (rs1), unchanged.
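The special-case results can be checked with a short sketch, assuming RV32IM and GNU assembler syntax. Signed overflow, dividing the most negative integer by -1, doesn't trap either: the quotient is the dividend and the remainder is zero.

```asm
li   t0, 46          # t0 = 46
div  t1, t0, zero    # t1 = -1 (0xFFFFFFFF)  ; quotient is all 1s
divu t2, t0, zero    # t2 = 4294967295 (0xFFFFFFFF)
rem  t3, t0, zero    # t3 = 46  ; remainder is the dividend
remu t4, t0, zero    # t4 = 46

lui  t5, 0x80000     # t5 = 0x80000000 = -2147483648
li   t6, -1          # t6 = -1
div  t1, t5, t6      # t1 = -2147483648  ; signed overflow, no exception
rem  t2, t5, t6      # t2 = 0
```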

Use beqz if you need to catch divide by zero; see branch zero.

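    beqz t2, div_by_zero  # branch if the divisor is zero
    div  t0, t1, t2       # t0 = t1 / t2
    # continue normal execution
    j    div_done         # skip the divide-by-zero handler

div_by_zero:
    # handle divide by zero here

div_done: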

FPGA Support

FPGAs include DSP blocks that perform low-latency multiplication, and synthesis tools can infer multiplication; see Multiplication with DSPs. Integer division requires its own implementation; see Division in Verilog.

RISC-V Extensions

The 32-bit base instruction set, RV32I, contains 40 instructions, most of which we’ve met in previous posts. RISC-V extensions add functionality to the base instruction set. Things started simply with these “classic” extensions:

M - multiplication and division (covered in this post)
A - atomic
F - single-precision floating point
D - double-precision floating point
C - compressed

For example, a 32-bit CPU with M and C extensions is described as RV32IMC.

When developing hardware on FPGA, you can choose the CPU and extensions you want. For example, PicoRV32 optionally supports M and C extensions, while VexRiscv optionally supports M, A, F, D, and C extensions.

Complexity Intensifies?

As single letters became scarce, new general-purpose extensions started using the Z prefix.

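The CSR instructions and the instruction-fetch fence were originally in the base instruction set but have been moved to separate extensions (the ordinary fence instruction remains in RV32I):

Zicsr - control and status register (CSR) instructions
Zifencei - instruction-fetch fence (fence.i)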

Other general-purpose extensions include the four bit-manipulation extensions: Zba, Zbb, Zbc, and Zbs, and three extensions for floating point using integer registers: Zfinx, Zdinx, and Zhinx.
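The Zba extension overlaps with this post: its shift-and-add instructions (sh1add, sh2add, sh3add) multiply by small constants without using mul. A sketch, assuming a toolchain and CPU with Zba enabled; shNadd rd, rs1, rs2 computes rs2 + (rs1 << N).

```asm
# assumes Zba is available
li     t0, 46          # t0 = 46
sh1add t1, t0, t0      # t1 = 46 + (46 << 1) = 138  ; × 3
sh2add t2, t0, t0      # t2 = 46 + (46 << 2) = 230  ; × 5
sh3add t3, t0, t0      # t3 = 46 + (46 << 3) = 414  ; × 9
slli   t4, t2, 1       # t4 = 230 << 1       = 460  ; × 10
```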

Time will tell how well the RISC-V ecosystem evolves, but I fear the simplicity and elegance of the original RISC-V approach will be buried under a mountain of extensions. Will compilers and library developers have to grapple with thousands of possible extension combinations?

What’s Next?

The next post is the RISC-V Assembler Cheat Sheet for a summary of 32-bit instructions.
