RISC-V Assembler: Jump and Function
This RISC-V assembler post begins by examining the RISC-V jump instructions: jal and jalr. Jump instructions are the basis of functions, so we’ll then dig into function calls, the RISC-V ABI, calling convention, and how to use the stack. Jump instructions are included in RV32I, the base integer instruction set. You can also check out my other RISC-V posts.
In the last few years, we’ve seen an explosion of RISC-V CPU designs, especially on FPGA. Thankfully, RISC-V is ideal for assembly programming with its compact, easy-to-learn instruction set. This series will help you learn and understand 32-bit RISC-V instructions and programming.
Share your thoughts with @WillFlux on Mastodon or X. If you like what I do, sponsor me. 🙏
Jump
The main operation of the jump instructions is to update the program counter (PC).
The PC keeps track of the CPU’s location in the code. Usually, the CPU adds 4 to the PC when executing an instruction, as each instruction is 4 bytes long. However, with a jump instruction, the CPU updates the PC to point at the jump target instead.
The jump instructions are unconditional. For conditional jumps, see my post on branching.
Just the Two of Us
At one level, this post is about two instructions: jal (jump and link) and jalr (jump and link register). But these two instructions are exciting because they enable functions (also known as subroutines or procedure calls).
jal rd, imm # rd = pc+4; pc += imm
jalr rd, rs1, imm # rd = pc+4; pc = rs1+imm
Before updating the PC, a jump instruction writes the address of the following instruction into a register. By saving this return address, we can return to it and continue execution where we left off.
jal uses a 20-bit signed immediate for the jump destination, while jalr uses a register plus 12-bit signed offset in a similar way to the load and store instructions.
jal range is ±1MiB, in units of two bytes for greater range with compressed instruction support. jalr range is ±2KiB, but you can combine jalr with lui or auipc to reach any 32-bit address.
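As a rough sketch of the lui and jalr combination, the snippet below jumps to an absolute address by loading the upper 20 bits with lui and supplying the lower 12 bits as the jalr offset. The label far_target is hypothetical, and the %hi and %lo operators assume a GNU-style assembler.

```asm
# far_target is a hypothetical label, assumed to be out of jal range
    lui   t0, %hi(far_target)        # t0 = upper 20 bits of the address
    jalr  zero, %lo(far_target)(t0)  # pc = t0 + lower 12 bits; return address discarded
```

The %hi operator adjusts the upper bits to compensate for the sign extension of the 12-bit offset, so the two halves always add up to the full address.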
Jump!
Before we tackle functions (subroutines), let’s consider a plain old jump:
j imm # pc += imm
For example, use j in an infinite loop:
.L_forever:
la a0, message # load address of message
call printstr # call a function (discussed below)
j .L_forever # jump to .L_forever label
You could also use the j instruction in case/switch code.
The assembler translates j into jal with the return address register set to zero (x0).
Use j for your unconditional jumps in preference to branches. Jumps make your intent clear, have greater range, and avoid branch prediction.
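To make the translation concrete, here's a small sketch showing j next to its jal equivalent, plus the always-taken branch you might otherwise be tempted to write. The label .L_done is made up for illustration, and only the first line ever executes, since each of the three transfers control on its own.

```asm
    j    .L_done              # pseudoinstruction
    jal  zero, .L_done        # what the assembler emits for j: return address discarded in x0
    beq  zero, zero, .L_done  # also always taken, but a branch only reaches ±4KiB
.L_done:
    ebreak                    # stop execution
```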
Functions
To call a function, we must jump to a new address while remembering where we came from. The jal instruction can do this. We need to choose where to save the return address. By convention, this is the x1 register, known in the ABI as the return address register or ra.
The use of ra for the return address is part of the standard RISC-V ABI (application binary interface). The ABI ensures programs written by different programmers and with different tools can interoperate. For example, the ABI allows a program written in C to call a function written in assembler.
How do we get back once we’ve finished executing our function? We have the return address in ra, so jalr (jump and link register) can take us back.
Let’s take a look at a trivial example, calling a function that adds two integers:
li a0, 7 # 1st argument in a0
li a1, 8 # 2nd argument in a1
jal ra, add_int # save address in register ra (x1) and jump to label add_int
ebreak # stop execution
add_int:
add a0, a0, a1 # a0 = a0 + a1
jalr zero, 0(ra) # jump to address in register ra with 0 offset
Does this seem unnecessarily fiddly? If we always use the ra register for the return address, why do we need to provide it? Plus, it’s not immediately obvious what the purpose of these jump instructions is.
Pseudoinstructions call and ret to the rescue!
call label # call function at 'label', saving return address in ra
ret # return from function using address in ra
This makes for simpler and clearer code:
li a0, 7 # 1st argument in a0
li a1, 8 # 2nd argument in a1
call add_int # call function
ebreak # stop execution
add_int:
add a0, a0, a1 # a0 = a0 + a1
ret # return from function
Far Calls
In the above example we used jal to call our function, but it’s limited to ±1MiB relative to the PC. For far calls we can combine jalr with auipc to reach anywhere in 32-bit memory space. Use the call pseudoinstruction and the assembler and linker will choose the correct instruction(s) for you.
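For the curious, a far call written out by hand looks something like the following. This is a sketch of the two-instruction sequence, assuming GNU assembler syntax for %pcrel_hi and %pcrel_lo; far_func is a hypothetical function name.

```asm
# far_func is a hypothetical function anywhere in the 32-bit address space
1:  auipc ra, %pcrel_hi(far_func)   # ra = pc + upper 20 bits of the offset to far_func
    jalr  ra, %pcrel_lo(1b)(ra)     # pc = ra + lower 12 bits of offset; ra = return address
```

Using ra as both source and destination works because jalr reads the base register before writing the return address. In practice, writing call far_func is simpler and less error-prone.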
In and Out
Most functions take arguments and return something: this is where the a0-a7 registers come in.
Before calling a function, put the first argument in a0, the second argument in a1, and so on. When it comes time to return a result, the function puts it in a0. Just like the convention of using ra for the return address, this ensures different code can easily work together.
We have already seen an example of arguments and return values with add_int (above).
Functions Calling Functions
Functions that don’t call other functions are known as leaf functions. After a leaf function executes it uses the return address in ra to return.
However, functions can easily call other functions and it’s here things get interesting. When we call a function, the call instruction writes the return address into ra, overwriting the previous return address!
A function that calls a function must save its own return address before making the function call. We save the existing value of ra on the stack.
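Here's a sketch of the bug this rule prevents, using two made-up functions, outer_broken and inner. It deliberately does the wrong thing, so don't copy it:

```asm
# deliberately broken: outer_broken and inner are hypothetical names
outer_broken:
    call inner          # overwrites ra with the address of the ret below
    ret                 # BUG: ra now points at this ret, so we jump to ourselves forever

inner:
    ret                 # leaf function: returns to the ret in outer_broken
```

The original return address of outer_broken is lost the moment call executes, so the function can never get back to its caller.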
Stack
The stack is an area of memory set aside for use by functions and local variables (not discussed here).
The stack pointer (sp) points to the bottom of the stack, which grows downwards to lower addresses. After the CPU is reset, startup code typically sets the stack pointer to the very top of memory.
We allocate memory on the stack by decrementing the stack pointer sp. We can then save registers onto the stack using the sw (store word) instruction. See Load Store for coverage of RISC-V memory instructions.
For example, function fun_one calls function fun_two, so it must save its return address on the stack:
fun_one:
addi sp, sp, -16 # allocate 16 bytes on stack
sw ra, 12(sp) # store return address on stack
# do some fun stuff
call fun_two # call another function
# do some more fun stuff
lw ra, 12(sp) # load return address from stack
addi sp, sp, 16 # restore stack pointer
ret # return from fun_one
Note how we store and then later load the return address using the same offset (12) from the stack pointer.
Stack Alignment
Why did we allocate 16 bytes on the stack when our return address is only 4 bytes long?
The RISC-V calling convention says:
The stack grows downwards (towards lower addresses) and the stack pointer shall be aligned to a 128-bit boundary upon procedure entry.
This is another example of the ABI ensuring interoperability. We ensure all data types are correctly aligned by aligning the stack pointer to 16 bytes.
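As an illustration of rounding up, suppose a hypothetical function fun_five needs to save ra plus s0-s3: five words, or 20 bytes. A sketch that keeps sp aligned would allocate 32 bytes, the next multiple of 16:

```asm
# fun_five is a hypothetical function that saves five registers
fun_five:
    addi sp, sp, -32    # 20 bytes needed, rounded up to a multiple of 16
    sw   ra, 28(sp)     # store return address
    sw   s0, 24(sp)     # store saved registers
    sw   s1, 20(sp)
    sw   s2, 16(sp)
    sw   s3, 12(sp)     # offsets 0-11 are unused padding
    # function body goes here
    lw   s3, 12(sp)     # restore in any order, using the same offsets
    lw   s2, 16(sp)
    lw   s1, 20(sp)
    lw   s0, 24(sp)
    lw   ra, 28(sp)
    addi sp, sp, 32     # restore stack pointer
    ret
```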
Ignorance is Bliss
A function doesn’t know what happened before it was called or what will happen after it returns. A function caller doesn’t know what happens inside a function, just what it passes in and gets returned. A well-written function is an example of a black box.
RISC-V gives us 32 general-purpose registers, but if every function used them indiscriminately, they’d overwrite each other’s data. We can solve this problem by pushing existing registers onto the stack before calling the function and popping them off the stack after the function returns. However, pushing registers onto the stack makes functions slower. A simple function could spend more CPU cycles pushing and popping registers than doing useful work.
The RISC-V ABI lets us have fast functions while preserving some register values.
There are three main categories of general-purpose registers:
- saved registers: s0-s11 - keep their value across function calls (preserved)
- argument registers: a0-a7 - for passing arguments and the return value (not preserved)
- temporary registers: t0-t6 - for internal function use (not preserved)
Understanding how to handle preserved and non-preserved registers is critical to writing RISC-V assembler. I’d go so far as to say it’s the most important skill beyond a basic knowledge of the instructions. Getting it right results in fast, elegant code. Getting it wrong leads to subtle bugs and much frustration. A good start is to always use ABI names for registers; otherwise, it’s really difficult to remember which registers you need to save!
- A function using a preserved register must restore its original value before returning.
- A function using a non-preserved register must assume it’s changed by a function call.
Let’s look at both cases in a little more detail.
Preserved Registers
Preserved registers must be restored to their original value before returning from a function call. If your function uses preserved registers, such as s0-s11, save them on the stack before using them.
For example, fun_foo uses s1-s4, so it saves them on the stack like this:
fun_foo:
addi sp, sp, -16 # allocate space on stack
sw s1, 12(sp) # store saved registers on stack
sw s2, 8(sp)
sw s3, 4(sp)
sw s4, 0(sp)
# we're now free to use s1-s4
# implement incredible algorithm here
lw s1, 12(sp) # restore saved registers from stack
lw s2, 8(sp)
lw s3, 4(sp)
lw s4, 0(sp)
addi sp, sp, 16 # restore stack pointer
ret
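Saved registers earn their keep when a value has to survive a function call. The following sketch assumes a hypothetical function get_value that returns an integer in a0. The equally hypothetical sum_two calls it twice and adds the results, parking the first result in s0 because the second call is free to overwrite a0 and the t registers.

```asm
# sum_two and get_value are hypothetical; get_value is assumed to return an integer in a0
sum_two:
    addi sp, sp, -16    # allocate space on stack
    sw   ra, 12(sp)     # save return address (we call other functions)
    sw   s0, 8(sp)      # save s0 before using it
    call get_value      # first result in a0
    mv   s0, a0         # keep first result in a saved register
    call get_value      # second result in a0; the callee must preserve s0
    add  a0, a0, s0     # return the sum in a0
    lw   s0, 8(sp)      # restore s0 for our caller
    lw   ra, 12(sp)     # load return address from stack
    addi sp, sp, 16     # restore stack pointer
    ret
```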
Other Registers
With non-preserved registers, you can do what you want, but so can other functions. After you call another function, you must assume the values of the a and t registers have changed.
For example, I’ve written a function to initialize my graphics display. The background colour is passed to gfx_setup in a0. However, before I set the background colour I need to call frame_wait.
The frame_wait function could overwrite a0, so I preserve it on the stack. Of course, I also need to save ra on the stack before calling another function, leading to this design:
gfx_setup:
addi sp, sp, -16 # allocate space on stack
sw ra, 12(sp) # save return address onto stack
sw a0, 8(sp) # save a0 (background colour)
call frame_wait # wait for blanking before graphics setup
li t6, GFX_HWREG # graphics engine address
# background colour
lw a0, 8(sp) # load background colour (a0) from stack
sw a0, DISP_BGRD(t6) # set background colour
# other graphics setup here...
lw ra, 12(sp) # load return address from stack
addi sp, sp, 16 # restore stack pointer
ret
Functions that don’t call other functions (leaf functions) don’t have to worry about non-preserved registers changing. When writing leaf functions, stick to the t and a registers, and you won’t have to save anything to the stack: simple and fast.
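For instance, a leaf function can be as small as this sketch. The function avg_int is invented for illustration; it averages two signed integers and ignores overflow of the intermediate sum.

```asm
# avg_int is a hypothetical leaf function: a0 = (a0 + a1) / 2
avg_int:
    add  t0, a0, a1     # t0 = a0 + a1 (overflow ignored in this sketch)
    srai a0, t0, 1      # arithmetic shift right by 1: halve, rounding towards negative infinity
    ret                 # only t and a registers touched, so no stack use
```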
Many Arguments
In the rare event your function needs more than eight arguments, you can pass them on the stack.
The RISC-V calling convention says:
The first argument passed on the stack is located at offset zero of the stack pointer on function entry; following arguments are stored at correspondingly higher addresses.
For example, a function with 10 arguments receives the first eight arguments in a0-a7 and could handle the remaining arguments like this:
fun_ten:
lw t0, 0(sp) # load 9th argument off stack into t0
lw t1, 4(sp) # load 10th argument off stack into t1
# the first 8 arguments are in a0-a7
NB. We use an offset of 4 for the 10th argument because each of these arguments occupies one word (four bytes) on the stack.
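On the calling side, the extra arguments need to be on the stack before the call. Here's a sketch of a hypothetical caller, call_ten, that passes the values 1 to 10 to fun_ten. It assumes all ten arguments are 32-bit integers and allocates one 32-byte block: 16 bytes for the outgoing arguments at the bottom, and 16 bytes for its own saved ra above them.

```asm
# call_ten is a hypothetical caller passing 1..10 to fun_ten
call_ten:
    addi sp, sp, -32    # 16 bytes for outgoing arguments + 16 for our own frame
    sw   ra, 28(sp)     # save return address
    li   t0, 9
    sw   t0, 0(sp)      # 9th argument at offset 0
    li   t0, 10
    sw   t0, 4(sp)      # 10th argument at offset 4
    li   a0, 1          # first eight arguments in a0-a7
    li   a1, 2
    li   a2, 3
    li   a3, 4
    li   a4, 5
    li   a5, 6
    li   a6, 7
    li   a7, 8
    call fun_ten        # sp is 16-byte aligned and points at the 9th argument
    lw   ra, 28(sp)     # load return address from stack
    addi sp, sp, 32     # restore stack pointer
    ret
```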
64-bit Variables
RV32 is a 32-bit architecture, but sometimes you need to work with 64-bit values, such as file offsets or UNIX time. In this case, you can combine pairs of registers, such as a0 and a1.
The following function performs 64-bit subtraction, including handling the carry bit:
# 64-bit integer subtraction
# arguments:
# a0: x lower 32 bits
# a1: x upper 32 bits
# a2: y lower 32 bits
# a3: y upper 32 bits
# return:
# a0: x-y lower 32 bits
# a1: x-y upper 32 bits
#
sub64:
sltu t0, a0, a2 # if a0 < a2 then set t0=1 (carry/borrow bit)
sub a1, a1, a3 # sub upper 32 bits
sub a1, a1, t0 # sub carry bit from upper 32 bits of answer
sub a0, a0, a2 # sub lower 32 bits
ret
Learn more about multi-word addition with set instructions.
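By way of comparison, 64-bit addition can be sketched in the same style. The function name add64 is my own for this example; it takes the same register layout as sub64, but the carry is detected after adding the lower words, since an unsigned sum that wraps around ends up smaller than either operand.

```asm
# 64-bit integer addition (hypothetical companion to sub64)
# arguments: a0/a1 = x lower/upper 32 bits, a2/a3 = y lower/upper 32 bits
# return:    a0/a1 = x+y lower/upper 32 bits
add64:
    add  a0, a0, a2     # add lower 32 bits
    sltu t0, a0, a2     # if sum < y lower then it wrapped: t0 = 1 (carry)
    add  a1, a1, a3     # add upper 32 bits
    add  a1, a1, t0     # add carry to upper 32 bits of answer
    ret
```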
RV32 ABI Registers
Let’s finish by taking a look at all 32 ABI registers.
ABI Name | Register | Description | Preserved |
---|---|---|---|
zero | x0 | always 0 (zero) | n/a |
ra | x1 | return address | no |
sp | x2 | stack pointer | yes |
gp | x3 | global pointer* | n/a |
tp | x4 | thread pointer* | n/a |
t0 | x5 | temporary | no |
t1 | x6 | temporary | no |
t2 | x7 | temporary | no |
fp (s0) | x8 | frame pointer† | yes |
s1 | x9 | saved register | yes |
a0 | x10 | function argument‡ | no |
a1 | x11 | function argument‡ | no |
a2 | x12 | function argument | no |
a3 | x13 | function argument | no |
a4 | x14 | function argument | no |
a5 | x15 | function argument | no |
a6 | x16 | function argument | no |
a7 | x17 | function argument | no |
s2 | x18 | saved register | yes |
s3 | x19 | saved register | yes |
s4 | x20 | saved register | yes |
s5 | x21 | saved register | yes |
s6 | x22 | saved register | yes |
s7 | x23 | saved register | yes |
s8 | x24 | saved register | yes |
s9 | x25 | saved register | yes |
s10 | x26 | saved register | yes |
s11 | x27 | saved register | yes |
t3 | x28 | temporary | no |
t4 | x29 | temporary | no |
t5 | x30 | temporary | no |
t6 | x31 | temporary | no |
*Let the compiler/linker use the global gp and thread tp pointers; ignore them in your own code.
†The frame pointer fp supports local variables but can be used as a regular saved register.
‡Argument registers a0 and a1 also handle the function return value.
What’s Next?
The next post looks at RISC-V Multiply and Divide Instructions and RISC-V extensions.
If you enjoyed this post, please sponsor me. Sponsors help me create more FPGA and RISC-V projects for everyone, and they get early access to blog posts and source code. 🙏
Check out the RISC-V Assembler Cheat Sheet and all my FPGA & RISC-V Tutorials.
References
- RISC-V Technical Specifications (riscv.org)