Project F

RISC-V Assembler: Branch Set


This post looks at RISC-V branch and set instructions, such as beq, bltu, bgez, and slt. RISC-V takes a different approach to branching, even compared to other RISC processors. We’ll also cover the zero register, program counter, condition codes, and multi-word addition. Branch and set instructions are included in RV32I, the base integer instruction set. New to the assembler series? Check out the first part on RISC-V arithmetic instructions.

In the last few years, we’ve seen an explosion of RISC-V CPU designs, especially on FPGA. Thankfully, RISC-V is ideal for assembly programming with its compact, easy-to-learn instruction set. This series will help you learn and understand 32-bit RISC-V instructions (RV32) and the RISC-V ABI.

Share your thoughts with @WillFlux on Mastodon or Twitter. If you like what I do, sponsor me. 🙏

Branch

Conditional branches control the flow of execution in a program. A conditional branch jumps to another program address if a condition is true. In high-level programming languages, this can take the form of a for loop, if-then-else, or switch/case statement.

RISC-V is unusual because branch instructions include the comparison and branch destination in one instruction. This makes branching simple, but it has trade-offs, which we will consider later.

Let’s start by looking at the six regular branch instructions:

beq   # equal
bne   # not equal

blt   # less than
bgt   # greater than
ble   # less than or equal to
bge   # greater than or equal to

Branch instructions have a consistent format and always compare two registers:

branch rs1, rs2, offset

Where rs1 and rs2 are the registers to compare, and offset is the program counter offset. The program counter (PC) points to the next instruction to execute. We’ll discuss the PC in more detail later in this post.

These instructions use signed comparisons: a register with the contents 0xFFFFFFFF is treated as -1. We’ll cover unsigned comparisons in the next section.

Branch offsets are signed 12-bit immediates but in units of two bytes. RISC-V instructions are four bytes long, so why are offsets in units of two bytes? Compressed instructions are only two bytes long, so branch offsets need to be in units of two bytes.

Offsets are sign-extended, so you can easily branch backwards in your code. With 12-bit offsets in units of two bytes, branch instructions have a range of ±4 KiB (-4096 to +4094 bytes).

In practice, you never write an offset directly; you use a label instead.

For example, we can create a wait loop with bne:

    li  t0, 1000  # time to wait
.L_timer:  # local label
    lw  t1, TIMER_WAIT(t6)  # load hardware timer into t1
    bne t0, t1, .L_timer    # branch (loop) if t1 isn't equal to t0

ProTip: .L_name is a common naming convention for local labels.
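
If a branch target ends up outside the ±4 KiB range, one common workaround is to invert the condition and branch over an unconditional jump, which reaches much further (±1 MiB for j). A minimal sketch, where far_target and .L_skip are hypothetical labels; some assemblers may apply this rewrite for you:

```asm
    # Intended: beq t0, t1, far_target   (but far_target is out of branch range)
    bne t0, t1, .L_skip   # inverted condition: skip the jump when t0 != t1
    j   far_target        # unconditional jump reaches the distant label
.L_skip:
    # execution continues here when t0 != t1
```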

Some of these branch instructions are pseudoinstructions, but as a programmer, this doesn’t matter: they always assemble to one instruction. For example, bgt (greater than) assembles to blt (less than) with the operands swapped. Use whichever branch instruction you prefer, and let the assembler worry about the underlying instruction.
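
To make the operand swap concrete, the sketch below pairs each pseudoinstruction with the base instruction it assembles to. The label .L_target is hypothetical:

```asm
# pseudoinstruction       # base instruction emitted by the assembler
bgt  t0, t1, .L_target    # blt  t1, t0, .L_target  (t0 > t1  is  t1 < t0)
ble  t0, t1, .L_target    # bge  t1, t0, .L_target  (t0 <= t1 is  t1 >= t0)
.L_target:
```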

Unsigned Branching

If your numbers are unsigned, you add a “u” to the end of the instruction name:

bltu  # less than unsigned
bgtu  # greater than unsigned
bleu  # less than or equal to unsigned
bgeu  # greater than or equal to unsigned

Equal and not equal aren’t affected by sign, so there aren’t unsigned versions of them.

With unsigned comparison, a register with the contents 0xFFFFFFFF is treated as 4294967295.
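
One practical use of unsigned branches is range checking. Because a negative index looks like a very large unsigned number, a single bgeu can reject both negative and too-large values. This is a sketch under assumptions: check_index and its labels are hypothetical, a0 holds the index, a1 holds the length, and the length is below 2^31:

```asm
# check_index: returns 1 in a0 if 0 <= index < length, otherwise 0
#   a0: index (signed, may be negative)
#   a1: length (assumed to be less than 2^31)
check_index:
    bgeu a0, a1, .L_out_of_range  # taken if index >= length as unsigned,
                                  # which also catches a negative index
    li   a0, 1                    # in range
    ret
.L_out_of_range:
    li   a0, 0                    # out of range
    ret
```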

Branching with Zero

You often want to compare a register to zero, for example, to check for the end of a loop or null-terminated string. RISC-V provides a set of handy pseudoinstructions for this:

beqz  # equal to zero
bnez  # not equal to zero

bltz  # less than zero
bgtz  # greater than zero
blez  # less than or equal to zero
bgez  # greater than or equal to zero

They’re the same as the regular branch instructions with a “z” at the end of the instruction name.

With these instructions, you only specify a single register because the second register is x0:

beqz rs1, offset

For example, you could implement the absolute function with bgez:

abs:
    bgez a0, .L_abs_end  # branch to .L_abs_end if a0 is greater than or equal to zero
    neg a0, a0  # make negative a0 value positive
.L_abs_end:
    ret  # return from function (a0 holds the return value)

We cover the neg instruction under subtraction. In the next post, we’ll examine functions in detail.
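
The null-terminated string case mentioned earlier could look something like the following sketch of a string-length routine. The name strlen and its labels are chosen here for illustration; lbu (load byte unsigned) and j (jump) are covered in other posts in the series:

```asm
# strlen: count the bytes before the null terminator
#   a0: address of string
#   returns length in a0
strlen:
    mv   t0, a0              # t0 = pointer to current character
.L_strlen_loop:
    lbu  t1, 0(t0)           # load the byte at the pointer
    beqz t1, .L_strlen_end   # null terminator found: leave the loop
    addi t0, t0, 1           # advance to the next character
    j    .L_strlen_loop      # repeat
.L_strlen_end:
    sub  a0, t0, a0          # length = end pointer - start pointer
    ret
```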

The Power of Zero

RISC-V dedicates the register x0 to zero. At first glance, this appears wasteful, but zero is used in many places, and having it permanently available simplifies the instruction set. Other architectures, such as MIPS and ARM64, have a zero register, and mainframe computers, such as the CDC 6600 and IBM System/360, used a zero register in the 1960s!

As we’ve seen, many branch pseudoinstructions use the zero register, but you’ll find the zero register used across RISC-V.
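
As an illustration, here are a few common pseudoinstructions alongside the base instructions they stand for; each one relies on x0. The label .L_done is hypothetical:

```asm
# pseudoinstruction    # base instruction
nop                    # addi x0, x0, 0    (writes to x0 are discarded)
li   t0, 42            # addi t0, x0, 42   (small constants only)
neg  t0, t1            # sub  t0, x0, t1
beqz t0, .L_done       # beq  t0, x0, .L_done
j    .L_done           # jal  x0, .L_done  (return address discarded)
.L_done:
```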

Branch Instruction Summary

The following table summarises all 16 RISC-V branch (pseudo)instructions:

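| Comparison            | Registers | Signed | Unsigned | Zero |
|-----------------------|-----------|--------|----------|------|
| equal (eq)            | rs1 = rs2 | beq    | beq      | beqz |
| not equal (ne)        | rs1 ≠ rs2 | bne    | bne      | bnez |
| less (lt)             | rs1 < rs2 | blt    | bltu     | bltz |
| greater (gt)          | rs1 > rs2 | bgt    | bgtu     | bgtz |
| less or equal (le)    | rs1 ≤ rs2 | ble    | bleu     | blez |
| greater or equal (ge) | rs1 ≥ rs2 | bge    | bgeu     | bgez |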

NB. equal and not equal are the same for signed and unsigned comparisons.

Program Counter

Branch offsets are relative to the program counter (PC). The program counter points to the next instruction the CPU will execute. Usually, the CPU adds 4 to the PC when executing an instruction: addresses are in bytes, and each instruction is 4 bytes long. When you take a branch, the CPU instead sets the program counter to the address of the branch instruction plus the offset.
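
To make the offset arithmetic concrete, here is a hypothetical fragment. The addresses in the comments are invented for illustration and assume every instruction is four bytes long. The branch target is the address of the branch instruction itself plus the offset:

```asm
    beq  a0, a1, .L_same   # at 0x0100: offset = 0x010C - 0x0100 = +12
    addi a0, a0, 1         # at 0x0104: runs only if the branch is not taken
    addi a1, a1, 1         # at 0x0108: runs only if the branch is not taken
.L_same:
    add  a2, a0, a1        # at 0x010C: branch target
```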

Learn more about memory addresses, alignment, and addressing modes.

ProTip: In x86 land, the PC is known as the instruction pointer (IP), a much more descriptive name.

auipc

RISC-V includes an instruction to help with position-independent code: auipc (add upper immediate to PC). auipc works like lui (load upper immediate), but instead of loading the shifted 20-bit immediate directly, it adds it to the program counter and writes the result to the destination register.

auipc rd, imm  # rd = pc + (imm << 12)

With auipc, you can use PC-relative addressing to reach a symbol anywhere in 32-bit memory space. For example, combine auipc and load instructions to load data from a distant memory location.
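
A sketch of such a PC-relative load follows. It assumes a GNU or LLVM assembler, which provide the %pcrel_hi and %pcrel_lo relocation operators; my_data and .L_pcrel_hi are hypothetical names:

```asm
.L_pcrel_hi:
    auipc t0, %pcrel_hi(my_data)           # t0 = pc + upper part of the offset to my_data
    lw    t1, %pcrel_lo(.L_pcrel_hi)(t0)   # add the lower 12 bits of the offset and load the word
```

Note that %pcrel_lo takes the label of the auipc instruction rather than the symbol, because the offset is measured from that instruction's address. In practice, pseudoinstructions such as la generate this kind of pair for you.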

The program counter is not one of the general-purpose registers, so you can’t access it directly. However, you can copy the PC using auipc with an immediate of zero:

auipc t0, 0  # copy program counter into register t0

Set

Earlier, we noted that RISC-V handles comparison and branching in a single instruction. This works well for branching, but you don’t always want to compare and then branch.

Most CPUs use condition codes or status flags such as zero, carry, and overflow. These condition codes can be used for branching, but also for arithmetic and general comparisons.

RISC-V doesn’t have condition codes, but the set instructions can handle many of the same situations, such as checking for zero, carry, or overflow. The set instructions compare two registers or a register to an immediate and write 1 to the destination register if the comparison is true and 0 otherwise.

There are only four set instructions, all variants of set less than:

slt  rd, rs1, rs2  # set less than:                     rd = rs1 < rs2
sltu rd, rs1, rs2  # set less than unsigned:            rd = rs1 < rs2 (unsigned)

slti  rd, rs1, imm  # set less than immediate:           rd = rs1 < imm
sltiu rd, rs1, imm  # set less than immediate unsigned:  rd = rs1 < imm (unsigned)

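These immediates are 12-bit sign-extended values that can represent -2048 to 2047 inclusive. Note that sltiu also sign-extends its immediate before comparing it as unsigned, so an immediate of -1 becomes 4294967295. See arithmetic sign extension for further details.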

There aren’t standard pseudoinstructions for “set greater than”, which seems like an oversight. Recent versions of GCC do allow sgt, but this isn’t supported by other assemblers.
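
If your assembler lacks sgt, you can get the same result by swapping the operands of slt or sltu, much as the assembler does for bgt. A minimal sketch:

```asm
slt  t0, a1, a0   # t0 = (a1 < a0), which is the same as (a0 > a1), signed
sltu t1, a1, a0   # t1 = (a0 > a1), unsigned
```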

Register-register set examples:

li   t0, 2       # t0 =  2
li   t1, -2      # t1 = -2
li   t2, 42      # t2 = 42

slt  t3, t0, t2  # t3 = 1 because 2 < 42
sltu t4, t0, t2  # t4 = 1 because 2 < 42

slt  t5, t1, t2  # t5 = 1 because -2 < 42
sltu t6, t1, t2  # t6 = 0 because 4294967294 > 42

Note how treating t1 as unsigned produces a completely different result! Negative numbers are stored using two’s complement, with -1 being 0xFFFFFFFF. Treating 0xFFFFFFFF as unsigned, we get 2^32-1 or 4,294,967,295. Likewise, -2 is 0xFFFFFFFE, or 4,294,967,294 when treated as unsigned.

Register-immediate set examples:

li   t0, 2        # t0 =  2
li   t1, -2       # t1 = -2

slti  t3, t0, -1  # t3 = 0 because 2 > -1
sltiu t4, t0, -1  # t4 = 1 because 2 < 4294967295

slti  t5, t1, -1  # t5 = 1 because -2 < -1
sltiu t6, t1, -1  # t6 = 1 because 4294967294 < 4294967295
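
Set instructions can also stand in for an overflow flag. The following sketch checks a signed addition for overflow; the label .L_overflow is hypothetical. It relies on the fact that, without overflow, the sum is less than one operand exactly when the other operand is negative:

```asm
    add  t0, t1, t2           # t0 = t1 + t2 (may wrap around)
    slti t3, t2, 0            # t3 = 1 if t2 is negative
    slt  t4, t0, t1           # t4 = 1 if the sum is less than t1
    bne  t3, t4, .L_overflow  # the two disagree only when the add overflowed
```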

Set Less Than or Equal To

For “less than or equal to” you need two instructions.

For example, to set if a0 is less than or equal to a1:

slt     t0, a1, a0
xori    t0, t0, 1

We check if a1 is less than a0, then invert the set bit with xori because:

(a0 <= a1) == !(a1 < a0)

Learn more about xor and xori (exclusive OR) in my post on logical instructions.
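
The same two-instruction pattern covers other comparisons. As a sketch, here are "greater than or equal to" and a test for two registers being equal:

```asm
# t0 = (a0 >= a1), signed: invert "a0 < a1"
slt   t0, a0, a1
xori  t0, t0, 1

# t1 = (a0 == a1): the xor result is zero only when both registers match
xor   t1, a0, a1
sltiu t1, t1, 1     # unsigned "less than 1" is true only for zero
```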

Set Zero

RISC-V provides pseudoinstructions for comparing with zero:

seqz rd, rs  # set equal zero:         rd = rs == 0
snez rd, rs  # set not equal zero:     rd = rs != 0
sltz rd, rs  # set less than zero:     rd = rs < 0
sgtz rd, rs  # set greater than zero:  rd = rs > 0

Examples of set zero comparisons:

li   t0, -2  # t0 =  -2

seqz t3, t0  # t3 = 0 because -2 != 0
snez t4, t0  # t4 = 1 because -2 != 0
sltz t5, t0  # t5 = 1 because -2 <  0
sgtz t6, t0  # t6 = 0 because  0 > -2
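
For reference, each of these pseudoinstructions stands for a single set instruction, shown in the comments of this sketch:

```asm
# pseudoinstruction   # base instruction
seqz t3, t0           # sltiu t3, t0, 1   (unsigned t0 < 1 only when t0 == 0)
snez t4, t0           # sltu  t4, x0, t0  (unsigned 0 < t0 whenever t0 != 0)
sltz t5, t0           # slt   t5, t0, x0  (signed t0 < 0)
sgtz t6, t0           # slt   t6, x0, t0  (signed 0 < t0)
```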

Multi-Word Addition

You can use a set instruction to carry out multi-word addition in place of a carry flag. For example, to add 64-bit integers on 32-bit RISC-V:

# 64-bit integer addition
#   arguments:
#       a0: x lower 32 bits
#       a1: x upper 32 bits
#       a2: y lower 32 bits
#       a3: y upper 32 bits
#   return:
#       a0: x+y lower 32 bits
#       a1: x+y upper 32 bits
#
add64:
    add  a0, a0, a2  # add lower 32 bits
    add  t0, a1, a3  # add upper 32 bits
    sltu t1, a0, a2  # if lower 32-bit sum < a2 then set t1=1 (carry bit)
    add  a1, t0, t1  # upper 32 bits of answer (upper sum + carry bit)
    ret

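If the sum of the lower 32 bits is less than a2, then we need to carry a bit: unsigned addition wraps around on overflow, so the sum can only be smaller than one of its operands when a carry occurred. The sltu instruction tests for this and sets t1=1 when it occurs. We add t1 to the sum of the upper 32 bits to create the correct answer.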
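
The same idea extends to subtraction, where sltu detects a borrow rather than a carry. This sketch uses the same register convention as add64; the name sub64 is chosen here for illustration:

```asm
# 64-bit integer subtraction: (a1:a0) = (a1:a0) - (a3:a2)
sub64:
    sltu t1, a0, a2  # borrow = 1 if x lower < y lower (test before a0 changes)
    sub  a0, a0, a2  # subtract lower 32 bits
    sub  a1, a1, a3  # subtract upper 32 bits
    sub  a1, a1, t1  # subtract the borrow from the upper 32 bits
    ret
```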

32-bit CPUs with a carry flag can add 64 bits in just two instructions. For example:

# multi-word addition on arm
add64_arm:
    adds r0, r0, r2  # add and set flags (including carry)
    adc  r1, r1, r3  # add with carry
    bx   lr

At first glance, RISC-V’s approach seems inferior. However, avoiding condition codes simplifies hardware design, especially on modern out-of-order CPUs. Some RISC-V CPUs can also fuse multiple instructions into one internally, so the set instruction in a multi-word add doesn’t necessarily increase the number of instructions executed.

CPU design is a trade-off. While a load-store architecture and zero register are almost universally admired, not everyone appreciates RISC-V’s lack of condition codes.

To learn more about RISC-V addition and subtraction, see the post on arithmetic.

What’s Next?

We’ve almost completed our tour of the RISC-V base instruction set. The next instalment of RISC-V Assembler is all about jumping, functions, and the ABI (coming soon). In the meantime, read my posts on RISC-V Arithmetic, Logical, Shift, and Load Store.

Check out my FPGA & RISC-V Tutorials and my series on early Macintosh History.

References