RISC-V Assembler: Branch Set

Published 19 Mar 2024 · Updated 04 Oct 2024

This RISC-V assembler post covers branch and set instructions, such as beq, bltu, bgez, and slt. RISC-V takes a different approach to branching, even compared to other RISC processors. We’ll also cover the zero register, program counter, condition codes, and multi-word addition. Branch and set instructions are included in RV32I, the base integer instruction set.

In the last few years, we’ve seen an explosion of RISC-V CPU designs on FPGA and ASIC, including the RP2350 found on the Raspberry Pi Pico 2. Thankfully, RISC-V is ideal for assembly programming with its compact, easy-to-learn instruction set. This series will help you learn and understand 32-bit RISC-V instructions and programming.

Branch

Conditional branches control the flow of execution in a program. A conditional branch jumps to another program address if a condition is true. In high-level programming languages, this can take the form of a for-loop, if-then-else, or switch/case statement.

RISC-V is unusual because branch instructions include the comparison and branch target in one instruction. This makes branching simple and fast, but it has trade-offs, which we will consider later.

Let’s start by looking at the six regular branch instructions:

beq   # equal
bne   # not equal

blt   # less than
bgt   # greater than
ble   # less than or equal to
bge   # greater than or equal to

Branch instructions have a consistent format and always compare two registers:

branch rs1, rs2, imm

Here, rs1 and rs2 are the registers to compare, and imm is the immediate offset from the program counter (discussed shortly).

These instructions use signed comparisons: a register with the contents 0xFFFFFFFF is treated as -1. We’ll cover unsigned comparisons in the next section.

Branch offsets are signed 12-bit immediates, counted in units of two bytes. RISC-V instructions are four bytes long, so why are offsets in units of two bytes? Compressed instructions (to be covered in a future post) are only two bytes long, so a branch target can sit on any two-byte boundary.

Offsets are sign extended, so you can easily branch backwards in your code. With 12-bit offsets in units of two bytes, branch instructions have a range of ±4 KiB (from -4096 to +4094 bytes, relative to the branch instruction).

In practice, you never write an offset directly; you use a label instead.

For example, we can create a wait loop with bne:

    li  t0, 1000  # time to wait
.L_timer:  # local label
    lw  t1, TIMER_WAIT(t6)  # load hardware timer into t1
    bne t0, t1, .L_timer    # branch (loop) if t1 isn't equal to t0

ProTip: .L_name is a common naming convention for local labels.

Some of these branch instructions are pseudoinstructions, but this doesn’t matter to you as a programmer: they always assemble to one instruction. For example, bgt (greater than) assembles to blt (less than) with the source registers swapped. Use whichever branch instruction you prefer, and let the assembler worry about the underlying instruction.
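As a quick sketch of that swap, each pseudoinstruction below is followed by the base instruction it assembles to. The registers and the .L_target label are placeholders chosen for illustration.

```asm
    bgt  t0, t1, .L_target   # pseudoinstruction: branch if t0 > t1
    blt  t1, t0, .L_target   # what the assembler emits: branch if t1 < t0

    ble  t0, t1, .L_target   # pseudoinstruction: branch if t0 <= t1
    bge  t1, t0, .L_target   # what the assembler emits: branch if t1 >= t0
.L_target:
```

If you disassemble the result, expect to see only blt and bge.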

Branch Unsigned

If your numbers are unsigned, add a “u” to the end of the instruction name:

bltu  # less than unsigned
bgtu  # greater than unsigned
bleu  # less than or equal to unsigned
bgeu  # greater than or equal to unsigned

Equal and not equal aren’t affected by sign, so there aren’t unsigned versions of them.

With unsigned comparison, a register with the contents 0xFFFFFFFF is treated as 4294967295.
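One place the unsigned branches are handy is a bounds check. The sketch below assumes a0 holds a signed index and a1 holds a length below 2³¹; the in_range name and its labels are hypothetical. Because a negative index looks like a very large unsigned number, a single bgeu rejects both negative and too-large indices.

```asm
# in_range: return 1 in a0 if 0 <= index < length, otherwise 0
#   a0: index (may be negative)
#   a1: length (assumed to be below 2^31)
in_range:
    bgeu a0, a1, .L_out_of_range  # unsigned: a negative index compares as >= length
    li   a0, 1                    # index is within bounds
    ret
.L_out_of_range:
    li   a0, 0                    # index is negative or too large
    ret
```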

Branch Zero

You often want to compare a register to zero, for example, to check for the end of a loop or null-terminated string. RISC-V provides a set of handy pseudoinstructions for this:

beqz  # equal to zero
bnez  # not equal to zero

bltz  # less than zero
bgtz  # greater than zero
blez  # less than or equal to zero
bgez  # greater than or equal to zero

They’re the same as the regular branch instructions with a “z” at the end of the instruction name.

With these instructions, you only specify a single register because the second register is x0:

beqz rs1, imm

For example, you could implement the absolute function with bgez:

abs:
    bgez a0, .L_abs_end  # branch to .L_abs_end if a0 is greater or equal to zero
    neg a0, a0  # make negative a0 value positive
.L_abs_end:
    ret  # return from function (a0 holds the return value)

We cover the neg instruction under subtraction and learn about calling functions in the next post.
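The null-terminated string case mentioned above might look like the following sketch. The strlen name and register choices are assumptions for illustration, and lbu (load byte unsigned) comes from the load instructions rather than this post.

```asm
# strlen: count the bytes before the terminating zero
#   a0: address of a null-terminated string
#   return a0: length in bytes
strlen:
    mv   t0, a0              # t0 walks along the string
.L_strlen_loop:
    lbu  t1, 0(t0)           # load the current byte
    addi t0, t0, 1           # step to the next byte
    bnez t1, .L_strlen_loop  # keep going until we load a zero byte
    sub  a0, t0, a0          # t0 is now one past the terminator
    addi a0, a0, -1          # so subtract one to exclude it
    ret
```

Putting the bnez at the bottom of the loop keeps it to a single branch per byte.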

The Power of Zero

RISC-V dedicates the zero (x0) register to the number zero. At first glance, this appears wasteful, but zero is used in many places, and having it permanently available simplifies the instruction set. Other architectures, such as MIPS and ARM64, have a zero register, and mainframe computers, such as the CDC 6600 and IBM System/360, used a zero register in the 1960s!

As we’ve seen, many branch pseudoinstructions use the zero register, but you’ll find the zero register used across RISC-V.
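To illustrate, here are a few common pseudoinstructions with the x0-based instruction each one stands for in the comment. The register choices and the .L_skip label are arbitrary, and nop is a standard pseudoinstruction that isn't otherwise covered in this post.

```asm
    nop                # addi x0, x0, 0     (result is discarded)
    li   t0, 42        # addi t0, x0, 42    (0 + 42, for small immediates)
    neg  t1, t0        # sub  t1, x0, t0    (0 - t0)
    beqz t1, .L_skip   # beq  t1, x0, .L_skip
.L_skip:
```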

Branch Instruction Summary

The following table summarises all 16 RISC-V branch (pseudo)instructions:

Comparison	Registers	Signed	Unsigned	Zero
equal (eq)	rs1 = rs2	beq	beq	beqz
not equal (ne)	rs1 ≠ rs2	bne	bne	bnez
less than (lt)	rs1 < rs2	blt	bltu	bltz
greater than (gt)	rs1 > rs2	bgt	bgtu	bgtz
less or equal (le)	rs1 ≤ rs2	ble	bleu	blez
greater or equal (ge)	rs1 ≥ rs2	bge	bgeu	bgez

NB. equal and not equal are the same for signed and unsigned comparisons.

Program Counter

Branch offsets are relative to the program counter (PC). The CPU uses the program counter to track its location in the code, for example when fetching the next instruction. Usually, the CPU adds four to the PC during instruction execution: addresses are in bytes, and each uncompressed RISC-V instruction is four bytes long. When a branch is taken, the CPU updates the program counter to point to the branch target address instead.
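To make this concrete, here is a small annotated listing. The addresses are hypothetical: they assume the code starts at 0x1000 and contains no compressed instructions.

```asm
# address 0x1000
    beqz a0, .L_done   # taken: pc = 0x1000 + 8 = 0x1008; not taken: pc = 0x1000 + 4 = 0x1004
# address 0x1004
    addi a0, a0, -1    # not a branch: pc = 0x1004 + 4 = 0x1008
# address 0x1008
.L_done:
    ret
```

The offset of 8 bytes is measured from the address of the branch instruction itself.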

Learn more about memory addresses, alignment, and addressing modes.

ProTip: In x86 land, the PC is known as the instruction pointer (IP), a much more descriptive name.

auipc

The program counter is also used to calculate the address of memory locations. RISC-V includes an instruction to help with position-independent code: auipc (add upper immediate to PC). auipc works like lui (load upper immediate), but adds the 20-bit immediate, shifted left by 12 bits, to the program counter and writes the result to rd. The program counter itself is unchanged.

auipc rd, imm  # rd = pc + (imm << 12)

With auipc, you can use PC-relative addressing to reach a symbol anywhere in 32-bit memory space. For example, combine auipc and load instructions to load data from a distant memory location or auipc and jalr to far call a function.
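As a sketch of the load case, GNU-style assemblers provide the %pcrel_hi and %pcrel_lo operators for pairing auipc with a following instruction. The counter variable and load_counter function are hypothetical names. In everyday code you would more likely write a pseudoinstruction such as la or call and let the assembler produce a similar pair.

```asm
    .data
counter:                  # hypothetical variable, for illustration only
    .word 42

    .text
# load_counter: return the value of counter in a0
load_counter:
.L_pcrel_counter:
    auipc t0, %pcrel_hi(counter)               # t0 = pc + upper part of the offset to counter
    lw    a0, %pcrel_lo(.L_pcrel_counter)(t0)  # add the lower 12 bits and load the word
    ret
```

Note that %pcrel_lo takes the label of the auipc instruction, not the symbol, because the offset is relative to that instruction's address.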

The program counter (pc) is not one of the general-purpose registers, so you can’t access it directly. However, you can copy the pc register using auipc with an immediate of zero:

auipc t0, 0  # copy program counter into register t0

Set

Earlier, we noted that RISC-V combines comparison and branching into a single instruction. However, sometimes you want to compare then do something other than branch.

Most CPUs set condition codes or status flags such as zero, carry, and overflow based on the result of arithmetic or a dedicated compare instruction. These condition codes can be used for branching, but also for arithmetic and general comparisons.

RISC-V doesn’t have condition codes, but the set instructions can handle many of the same situations, such as checking for zero, carry, or overflow. Set instructions compare two registers or a register to an immediate and write 1 to the destination register if the comparison is true, or 0 if it is false.

There are only four set instructions, all variants of set less than:

slt  rd, rs1, rs2   # set less than:                     rd = rs1 < rs2
sltu rd, rs1, rs2   # set less than unsigned:            rd = rs1 < rs2 (unsigned)

slti  rd, rs1, imm  # set less than immediate:           rd = rs1 < imm
sltiu rd, rs1, imm  # set less than immediate unsigned:  rd = rs1 < imm (unsigned)

These immediates are 12-bit sign-extended values that can represent -2048 to 2047 inclusive. sltiu sign-extends its immediate too, then compares the result as an unsigned number. See arithmetic sign extension for further details.

There aren’t standard pseudoinstructions for “set greater than”, which seems like an oversight. Recent versions of GCC do allow sgt, but this isn’t supported by other assemblers.
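Where sgt isn’t available, you can get the same result from slt with the operands swapped, mirroring the way bgt maps to blt. A minimal sketch, with arbitrary register choices:

```asm
    slt  t2, t1, t0   # t2 = (t0 > t1), signed:   the same as asking whether t1 < t0
    sltu t3, t1, t0   # t3 = (t0 > t1), unsigned: the same as asking whether t1 < t0
```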

li   t0, 2       # t0 =  2
li   t1, -2      # t1 = -2
li   t2, 42      # t2 = 42

slt  t3, t0, t2  # t3 = 1 because 2 < 42
sltu t4, t0, t2  # t4 = 1 because 2 < 42

slt  t5, t1, t2  # t5 = 1 because -2 < 42
sltu t6, t1, t2  # t6 = 0 because 4294967294 > 42

Note how treating t1 as unsigned produces a completely different result! Negative numbers are stored using two’s complement, with -1 being 0xFFFFFFFF. Treating 0xFFFFFFFF as unsigned, we get 2³²-1 or 4,294,967,295. Likewise, -2 is stored as 0xFFFFFFFE, which is 4,294,967,294 when treated as unsigned.

li   t0, 2        # t0 =  2
li   t1, -2       # t1 = -2

slti  t3, t0, -1  # t3 = 0 because 2 > -1
sltiu t4, t0, -1  # t4 = 1 because 2 < 4294967295

slti  t5, t1, -1  # t5 = 1 because -2 < -1
sltiu t6, t1, -1  # t6 = 1 because 4294967294 < 4294967295
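Set instructions can also stand in for an overflow flag. The sketch below follows the general signed-addition check suggested in the RISC-V specification: the sum should be less than one operand if, and only if, the other operand is negative. The add_checked name and the convention of returning a flag in a1 are assumptions for illustration, and xor comes from the logical instructions.

```asm
# add_checked: signed 32-bit addition with overflow detection
#   a0, a1: values to add
#   return a0: sum
#   return a1: 1 if the addition overflowed, otherwise 0
add_checked:
    add  t0, a0, a1   # t0 = sum
    slti t1, a1, 0    # t1 = 1 if the second operand is negative
    slt  t2, t0, a0   # t2 = 1 if the sum is less than the first operand
    xor  a1, t1, t2   # the two flags disagree only when the sum overflowed
    mv   a0, t0       # return the sum
    ret
```

For example, adding 1 to 0x7FFFFFFF wraps to a negative sum, so t2 is 1 while t1 is 0, and the function reports an overflow.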

Set Less Than or Equal To

For “less than or equal to” you need two instructions.

For example, to set if a0 is less than or equal to a1:

slt     t0, a1, a0
xori    t0, t0, 1

We check if a1 is less than a0, then invert the set bit with xori because:

(a0 <= a1) == !(a1 < a0)

Learn more about xor and xori (exclusive OR) in my post on logical instructions.
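The same invert-the-result pattern covers the other missing comparisons. A short sketch, with arbitrary register choices:

```asm
    # t0 = (a0 >= a1), signed: the opposite of a0 < a1
    slt  t0, a0, a1
    xori t0, t0, 1

    # t1 = (a0 <= a1), unsigned: the opposite of a1 < a0
    sltu t1, a1, a0
    xori t1, t1, 1
```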

Set Zero

RISC-V provides pseudoinstructions for comparing with zero:

seqz rd, rs  # set equal zero:         rd = rs == 0
snez rd, rs  # set not equal zero:     rd = rs != 0
sltz rd, rs  # set less than zero:     rd = rs < 0
sgtz rd, rs  # set greater than zero:  rd = rs > 0

Examples of set zero comparisons:

li   t0, -2  # t0 =  -2

seqz t3, t0  # t3 = 0 because -2 != 0
snez t4, t0  # t4 = 1 because -2 != 0
sltz t5, t0  # t5 = 1 because -2 <  0
sgtz t6, t0  # t6 = 0 because -2 <= 0
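Like the branch-zero pseudoinstructions, these are thin wrappers around the four set instructions. The comments below show the base instruction each one stands for; the registers are arbitrary.

```asm
    seqz t3, t0   # sltiu t3, t0, 1    (only 0 is below 1 when unsigned)
    snez t4, t0   # sltu  t4, x0, t0   (0 is below every non-zero unsigned value)
    sltz t5, t0   # slt   t5, t0, x0   (t0 < 0, signed)
    sgtz t6, t0   # slt   t6, x0, t0   (0 < t0, signed)
```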

Multi-Word Addition

You can use a set instruction to carry out multi-word addition in place of a carry flag. For example, to add 64-bit integers on 32-bit RISC-V:

# 64-bit integer addition
#   arguments:
#       a0: x lower 32 bits
#       a1: x upper 32 bits
#       a2: y lower 32 bits
#       a3: y upper 32 bits
#   return:
#       a0: x+y lower 32 bits
#       a1: x+y upper 32 bits
#
add64:
    add  a0, a0, a2  # add lower 32 bits
    add  t0, a1, a3  # add upper 32 bits
    sltu t1, a0, a2  # if lower 32-bit sum < a2 then set t1=1 (carry bit)
    add  a1, t0, t1  # upper 32 bits of answer (upper sum + carry bit)
    ret

If the sum of the lower 32 bits is less than a2 (comparing as unsigned numbers), the addition wrapped around, so we need to carry a bit. The sltu instruction tests for this and sets t1=1 when it occurs. We add t1 to the sum of the upper 32 bits to create the correct answer.
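The same idea works for subtraction, with sltu detecting a borrow instead of a carry. The sub64 routine below is a sketch that assumes the same register layout as add64; it is not part of any standard library.

```asm
# 64-bit integer subtraction (x - y)
#   arguments and return registers as for add64
sub64:
    sltu t1, a0, a2  # borrow needed if x lower < y lower (test before a0 changes)
    sub  a0, a0, a2  # subtract lower 32 bits
    sub  a1, a1, a3  # subtract upper 32 bits
    sub  a1, a1, t1  # subtract the borrow bit
    ret
```

Here the comparison has to come first, because the lower subtraction overwrites a0.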

32-bit CPUs with a carry flag can add 64 bits in just two instructions. For example:

# multi-word addition on arm
add64_arm:
    adds r0, r0, r2  # add and set flags (including carry)
    adc  r1, r1, r3  # add with carry
    bx   lr

At first glance, RISC-V’s approach seems inferior. However, avoiding condition codes simplifies hardware design, especially on modern out-of-order CPUs. Some RISC-V CPUs can also fuse multiple instructions into one internally, so the set instruction in a multi-word add doesn’t necessarily increase the number of operations executed.

CPU design is a trade-off. While a load-store architecture and zero register are almost universally admired, not everyone appreciates RISC-V’s lack of condition codes.

To learn more about RISC-V addition and subtraction, see the post on arithmetic.

What’s Next?

The next post looks at RISC-V Jump and Function Instructions, including the stack and RISC-V ABI. You can also check out the RISC-V Assembler Cheat Sheet.

References

RISC-V Technical Specifications (riscv.org)