RISC-V Assembler: Branch Set
This RISC-V assembler post covers branch and set instructions, such as beq, bltu, bgez, and slt. RISC-V takes a different approach to branching, even compared to other RISC processors. We’ll also cover the zero register, program counter, condition codes, and multi-word addition. Branch and set instructions are included in RV32I, the base integer instruction set.
In the last few years, we’ve seen an explosion of RISC-V CPU designs on FPGA and ASIC, including the RP2350 found on the Raspberry Pi Pico 2. Thankfully, RISC-V is ideal for assembly programming with its compact, easy-to-learn instruction set. This series will help you learn and understand 32-bit RISC-V instructions and programming.
Branch
Conditional branches control the flow of execution in a program. A conditional branch jumps to another program address if a condition is true. In high-level programming languages, this can take the form of a for-loop, if-then-else, or switch/case statement.
RISC-V is unusual because branch instructions include the comparison and branch target in one instruction. This makes branching simple and fast, but it has trade-offs, which we will consider later.
Let’s start by looking at the six regular branch instructions:
beq # equal
bne # not equal
blt # less than
bgt # greater than
ble # less than or equal to
bge # greater than or equal to
Branch instructions have a consistent format and always compare two registers:
branch rs1, rs2, imm
Where rs1 and rs2 are the registers to compare, and imm is the immediate offset to the program counter (discussed shortly).
These instructions use signed comparisons: a register with the contents 0xFFFFFFFF is treated as -1. We’ll cover unsigned comparisons in the next section.
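To make the format concrete, here is a minimal sketch of an if-then-else built from a single signed branch. The function name max and the label .L_max_end are made up for illustration; arguments and the return value are assumed to use a0 and a1, as in the other examples in this post.

```asm
# max: return the larger of two signed 32-bit integers
# arguments:
#   a0: x
#   a1: y
# return:
#   a0: max(x, y)
max:
    bge a0, a1, .L_max_end   # x >= y (signed): a0 already holds the answer
    mv  a0, a1               # otherwise y is larger, so copy it into a0
.L_max_end:
    ret                      # return from function
```

Because the comparison is signed, calling this with a0 = 0xFFFFFFFF (-1) and a1 = 1 returns 1.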
Branch offsets are signed 12-bit immediates but in units of two bytes. RISC-V instructions are four bytes long, so why are offsets in units of two bytes? Compressed instructions (to be covered in a future post) are only two bytes long, so branch offsets need to be in units of two bytes.
Offsets are sign-extended, so you can easily branch backwards in your code. With 12-bit offsets in units of two bytes, branch instructions have a range of ±4 KiB (-4096 to +4094 bytes from the branch instruction).
In practice, you never write an offset directly; you use a label instead.
For example, we can create a wait loop with bne:
li t0, 1000 # time to wait
.L_timer: # local label
lw t1, TIMER_WAIT(t6) # load hardware timer into t1
bne t0, t1, .L_timer # branch (loop) if t1 isn't equal to t0
ProTip: .L_name is a common naming convention for local labels.
Some of these branch instructions are pseudoinstructions, but this doesn’t matter to you as a programmer: they always assemble to one instruction. For example, bgt (greater than) assembles to blt (less than) with the source registers swapped. Use whichever branch instruction you prefer, and let the assembler worry about the underlying instruction.
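As a quick illustration of the operand swap, each pseudoinstruction below is followed by a comment showing the real instruction it should assemble to. The label .L_target is just a placeholder.

```asm
.L_target:
    bgt t0, t1, .L_target    # assembles as: blt t1, t0, .L_target
    ble t0, t1, .L_target    # assembles as: bge t1, t0, .L_target
```

You can check this for yourself by disassembling the object file, which should show only blt and bge.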
Branch Unsigned
If your numbers are unsigned, add a “u” to the end of the instruction name:
bltu # less than unsigned
bgtu # greater than unsigned
bleu # less than or equal to unsigned
bgeu # greater than or equal to unsigned
Equal and not equal aren’t affected by sign, so there aren’t unsigned versions of them.
With unsigned comparison, a register with the contents 0xFFFFFFFF is treated as 4294967295.
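One handy consequence of unsigned comparison is that a single branch can check both ends of a range. The sketch below assumes a hypothetical in_bounds function that takes an index in a0 and a length in a1 (with the length below 2^31); the names are for illustration only.

```asm
# in_bounds: check that 0 <= index < length
# arguments:
#   a0: index (signed)
#   a1: length
# return:
#   a0: 1 if the index is in range, otherwise 0
in_bounds:
    bgeu a0, a1, .L_out_of_range   # unsigned index >= length: out of range
    li   a0, 1                     # index is in range
    ret
.L_out_of_range:
    li   a0, 0                     # index is negative or too large
    ret
```

A negative index such as -1 is 0xFFFFFFFF, which bgeu treats as 4294967295, so it fails the same test as an index that is too large.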
Branch Zero
You often want to compare a register to zero, for example, to check for the end of a loop or null-terminated string. RISC-V provides a set of handy pseudoinstructions for this:
beqz # equal to zero
bnez # not equal to zero
bltz # less than zero
bgtz # greater than zero
blez # less than or equal to zero
bgez # greater than or equal to zero
They’re the same as the regular branch instructions with a “z” at the end of the instruction name.
With these instructions, you only specify a single register because the second register is x0:
beqz rs1, imm
For example, you could implement the absolute function with bgez:
abs:
bgez a0, .L_abs_end # branch to .L_abs_end if a0 is greater or equal to zero
neg a0, a0 # make negative a0 value positive
.L_abs_end:
ret # return from function (a0 holds the return value)
We cover the neg instruction under subtraction and learn about calling functions in the next post.
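The null-terminated string case mentioned earlier can be sketched with beqz. This is an illustrative routine rather than a tuned one: the name str_len is made up, lbu (load byte unsigned) comes from the load and store post, and j (jump) is covered in the next post.

```asm
# str_len: length of a null-terminated string
# arguments:
#   a0: address of the string
# return:
#   a0: number of bytes before the terminating zero
str_len:
    mv   t0, a0               # t0 = pointer to the current character
.L_str_len_loop:
    lbu  t1, 0(t0)            # load one byte from the string
    beqz t1, .L_str_len_end   # a zero byte marks the end of the string
    addi t0, t0, 1            # advance to the next character
    j    .L_str_len_loop      # keep scanning
.L_str_len_end:
    sub  a0, t0, a0           # length = end address - start address
    ret
```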
The Power of Zero
RISC-V dedicates the zero (x0) register to the number zero. At first glance, this appears wasteful, but zero is used in many places and having it permanently available simplifies the instruction set. Other architectures, such as MIPS and ARM64, have a zero register, and mainframe computers, such as the CDC 6600 and IBM System/360, used a zero register in the 1960s!
As we’ve seen, many branch pseudoinstructions use the zero register, but you’ll find the zero register used across RISC-V.
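To give a flavour of this, here are a few pseudoinstructions with the zero-register instruction each one stands for shown in the comment. The register choices and the .L_done label are arbitrary.

```asm
    li   t0, 5          # addi t0, zero, 5
    neg  t1, t0         # sub  t1, zero, t0
    beqz t1, .L_done    # beq  t1, zero, .L_done
    nop                 # addi zero, zero, 0 (writes to zero are discarded)
.L_done:
```

Without a zero register, each of these would need its own opcode or an extra instruction to load zero first.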
Branch Instruction Summary
The following table summarises all 16 RISC-V branch (pseudo)instructions:
Comparison | Registers | Signed | Unsigned | Zero |
---|---|---|---|---|
equal (eq) | rs1 = rs2 | beq | beq | beqz |
not equal (ne) | rs1 ≠ rs2 | bne | bne | bnez |
less than (lt) | rs1 < rs2 | blt | bltu | bltz |
greater than (gt) | rs1 > rs2 | bgt | bgtu | bgtz |
less or equal (le) | rs1 ≤ rs2 | ble | bleu | blez |
greater or equal (ge) | rs1 ≥ rs2 | bge | bgeu | bgez |
NB. equal and not equal are the same for signed and unsigned comparisons.
Program Counter
Branch offsets are relative to the program counter (PC). The CPU uses the program counter to track its location in the code, for example when fetching the next instruction. Usually, the CPU adds four to the PC during instruction execution: addresses are in bytes, and each RISC-V instruction is four bytes long. When a branch is taken, the CPU updates the program counter to point to the branch target address instead.
Learn more about memory addresses, alignment, and addressing modes.
ProTip: In x86 land, the PC is known as the instruction pointer (IP), a much more descriptive name.
auipc
The program counter is also used to calculate the address of memory locations. RISC-V includes an instruction to help with position-independent code: auipc (add upper immediate to PC). Like lui (load upper immediate), auipc takes a 20-bit immediate and shifts it left by 12 bits, but it adds the result to the program counter instead of loading it directly.
auipc rd, imm # rd = pc + (imm << 12)
With auipc, you can use PC-relative addressing to reach a symbol anywhere in 32-bit memory space. For example, combine auipc and load instructions to load data from a distant memory location or auipc and jalr to far call a function.
The program counter (pc) is not one of the general-purpose registers, so you can’t access it directly. However, you can copy the pc register using auipc with an immediate of zero:
auipc t0, 0 # copy program counter into register t0
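As a sketch of the PC-relative load described above, the GNU and LLVM assemblers let you split a symbol's offset with the %pcrel_hi and %pcrel_lo relocation operators. Here my_var is a hypothetical word-sized variable defined elsewhere, and .L_pcrel_hi is a label I've added to mark the auipc instruction.

```asm
.L_pcrel_hi:
    auipc t0, %pcrel_hi(my_var)            # t0 = pc + upper 20 bits of the offset to my_var
    lw    t1, %pcrel_lo(.L_pcrel_hi)(t0)   # add the lower 12 bits and load the word
```

Note that %pcrel_lo takes the label of the auipc instruction, not the symbol itself, because the offset is relative to the PC at that instruction. In practice, you'd usually let the assembler generate this pair for you with a pseudoinstruction such as la (load address).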
Set
Earlier, we noted that RISC-V combines comparison and branching into a single instruction. However, sometimes you want to compare then do something other than branch.
Most CPUs set condition codes or status flags such as zero, carry, and overflow based on the result of arithmetic or a dedicated compare instruction. These condition codes can be used for branching, but also for arithmetic and general comparisons.
RISC-V doesn’t have condition codes, but the set instructions can handle many of the same situations, such as checking for zero, carry, or overflow. Set instructions compare two registers or a register to an immediate and write 1 to the destination register if the comparison is true, or 0 if it is false.
There are only four set instructions, all variants of set less than:
slt rd, rs1, rs2 # set less than: rd = rs1 < rs2
sltu rd, rs1, rs2 # set less than unsigned: rd = rs1 < rs2 (unsigned)
slti rd, rs1, imm # set less than immediate: rd = rs1 < imm
sltiu rd, rs1, imm # set less than immediate unsigned: rd = rs1 < imm (unsigned)
These immediates are 12-bit sign-extended values that can represent -2048 to 2047 inclusive. Note that sltiu also sign-extends its immediate before comparing it as unsigned, so an immediate of -1 is treated as 4294967295. See arithmetic sign extension for further details.
There aren’t standard pseudoinstructions for “set greater than”, which seems like an oversight. Recent versions of GCC do allow sgt, but this isn’t supported by other assemblers.
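If you want to stay portable, you can get “set greater than” from the existing instructions by swapping the source registers, just as bgt does for branches. A minimal sketch, with arbitrary register choices:

```asm
    slt  t0, a1, a0     # t0 = a1 < a0, which is the same as a0 > a1 (signed)
    sltu t1, a1, a0     # t1 = a1 < a0, which is the same as a0 > a1 (unsigned)
```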
Register-register set examples:
li t0, 2 # t0 = 2
li t1, -2 # t1 = -2
li t2, 42 # t2 = 42
slt t3, t0, t2 # t3 = 1 because 2 < 42
sltu t4, t0, t2 # t4 = 1 because 2 < 42
slt t5, t1, t2 # t5 = 1 because -2 < 42
sltu t6, t1, t2 # t6 = 0 because 4294967294 > 42
Note how treating t1 as unsigned produces a completely different result! Negative numbers are stored using two’s complement, with -1 being 0xFFFFFFFF. Treating 0xFFFFFFFF as unsigned we get 2^32-1 or 4,294,967,295. Likewise, -2 is stored as 0xFFFFFFFE, which is 4,294,967,294 when treated as unsigned.
Register-immediate set examples:
li t0, 2 # t0 = 2
li t1, -2 # t1 = -2
slti t3, t0, -1 # t3 = 0 because 2 > -1
sltiu t4, t0, -1 # t4 = 1 because 2 < 4294967295
slti t5, t1, -1 # t5 = 1 because -2 < -1
sltiu t6, t1, -1 # t6 = 1 because 4294967294 < 4294967295
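Set instructions can also stand in for an overflow flag. The sketch below follows the general signed-overflow check suggested in the RISC-V specification, adapted to leave a flag in a register instead of branching; the register choices are arbitrary and xor is covered in the logical post.

```asm
    add  t0, t1, t2     # t0 = t1 + t2 (the result may wrap around)
    slti t3, t2, 0      # t3 = 1 if t2 is negative
    slt  t4, t0, t1     # t4 = 1 if the sum is less than t1
    xor  t5, t3, t4     # t5 = 1 if the addition overflowed
```

Adding a negative number should make the sum smaller than t1, and adding a non-negative number should not. If t3 and t4 disagree, the signed result has overflowed.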
Set Less Than or Equal To
For “less than or equal to” you need two instructions.
For example, to set if a0 is less than or equal to a1:
slt t0, a1, a0
xori t0, t0, 1
We check if a1 is less than a0, then invert the set bit with xori because:
(a0 <= a1) == !(a1 < a0)
Learn more about xor and xori (exclusive OR) in my post on logical instructions.
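The same invert trick gives “greater than or equal to”, and xor can be paired with a set instruction to test for equality. A sketch with arbitrary register choices (seqz is one of the set zero pseudoinstructions covered next):

```asm
    # t0 = (a0 >= a1), signed
    slt  t0, a0, a1     # t0 = a0 < a1
    xori t0, t0, 1      # invert: t0 = !(a0 < a1)

    # t1 = (a0 == a1)
    xor  t1, a0, a1     # t1 is zero only if a0 and a1 are identical
    seqz t1, t1         # t1 = 1 if the xor result was zero
```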
Set Zero
RISC-V provides pseudoinstructions for comparing with zero:
seqz rd, rs # set equal zero: rd = rs == 0
snez rd, rs # set not equal zero: rd = rs != 0
sltz rd, rs # set less than zero: rd = rs < 0
sgtz rd, rs # set greater than zero: rd = rs > 0
Examples of set zero comparisons:
li t0, -2 # t0 = -2
seqz t3, t0 # t3 = 0 because -2 != 0
snez t4, t0 # t4 = 1 because -2 != 0
sltz t5, t0 # t5 = 1 because -2 < 0
sgtz t6, t0 # t6 = 0 because -2 is not greater than 0
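Like the branch zero pseudoinstructions, these are thin wrappers around the four set less than instructions and the zero register. Each line below shows the pseudoinstruction with its underlying instruction in the comment:

```asm
    seqz t3, t0     # sltiu t3, t0, 1     (unsigned t0 < 1 is only true when t0 is 0)
    snez t4, t0     # sltu  t4, zero, t0  (unsigned 0 < t0 is true unless t0 is 0)
    sltz t5, t0     # slt   t5, t0, zero
    sgtz t6, t0     # slt   t6, zero, t0
```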
Multi-Word Addition
You can use a set instruction to carry out multi-word addition in place of a carry flag. For example, to add 64-bit integers on 32-bit RISC-V:
# 64-bit integer addition
# arguments:
# a0: x lower 32 bits
# a1: x upper 32 bits
# a2: y lower 32 bits
# a3: y upper 32 bits
# return:
# a0: x+y lower 32 bits
# a1: x+y upper 32 bits
#
add64:
add a0, a0, a2 # add lower 32 bits
add t0, a1, a3 # add upper 32 bits
sltu t1, a0, a2 # if lower 32-bit sum < a2 then set t1=1 (carry bit)
add a1, t0, t1 # upper 32 bits of answer (upper sum + carry bit)
ret
If the sum of the lower 32 bits is less than a2, the unsigned addition has wrapped around past 2^32, so we need to carry a bit. The sltu instruction tests for this and sets t1=1 when it occurs. We add t1 to the sum of the upper 32 bits to create the correct answer.
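Subtraction works in a similar way, with sltu detecting a borrow instead of a carry. This sub64 routine is a sketch I've written to mirror add64, using the same argument layout; it isn't taken from a library.

```asm
# 64-bit integer subtraction
# arguments:
#   a0: x lower 32 bits
#   a1: x upper 32 bits
#   a2: y lower 32 bits
#   a3: y upper 32 bits
# return:
#   a0: x-y lower 32 bits
#   a1: x-y upper 32 bits
#
sub64:
    sltu t1, a0, a2     # if x lower < y lower then set t1=1 (borrow bit)
    sub  a0, a0, a2     # subtract lower 32 bits
    sub  a1, a1, a3     # subtract upper 32 bits
    sub  a1, a1, t1     # upper 32 bits of answer (upper difference - borrow bit)
    ret
```

The borrow test has to come before the lower subtraction because that subtraction overwrites a0.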
32-bit CPUs with a carry flag can add 64 bits in just two instructions. For example:
# multi-word addition on arm
add64_arm:
adds r0, r0, r2 # add and set flags (including carry)
adc r1, r1, r3 # add with carry
bx lr
At first glance, RISC-V’s approach seems inferior. However, avoiding condition codes simplifies hardware design, especially on modern out-of-order CPUs. Some RISC-V CPUs can also fuse multiple instructions into one internally, so the set instruction in a multi-word add doesn’t necessarily increase the number of instructions executed.
CPU design is a trade-off. While a load-store architecture and zero register are almost universally admired, not everyone appreciates RISC-V’s lack of condition codes.
To learn more about RISC-V addition and subtraction, see the post on arithmetic.
What’s Next?
The next post looks at RISC-V Jump and Function Instructions, including the stack and RISC-V ABI.
Check out the RISC-V Assembler Cheat Sheet and all my FPGA & RISC-V Tutorials.