RISC-V Assembler: Jump and Function

Published 30 Apr 2024 · Updated 14 Jun 2024

This RISC-V assembler post begins by examining the RISC-V jump instructions: jal and jalr. Jump instructions are the basis of functions, so we’ll then dig into function calls, the RISC-V ABI, calling convention, and how to use the stack. Jump instructions are included in RV32I, the base integer instruction set. You can also check out my other RISC-V posts.

In the last few years, we’ve seen an explosion of RISC-V CPU designs, especially on FPGA. Thankfully, RISC-V is ideal for assembly programming with its compact, easy-to-learn instruction set. This series will help you learn and understand 32-bit RISC-V instructions and programming.

Share your thoughts with @WillFlux on Mastodon or X. If you like what I do, sponsor me. 🙏

Jump

The main operation of the jump instructions is to update the program counter (PC).

The PC keeps track of the CPU’s location in the code. Usually, the CPU adds 4 to the PC when executing an instruction, as each instruction is 4 bytes long. However, with a jump instruction, the CPU updates the PC to point at the jump target instead.

The jump instructions are unconditional. For conditional jumps, see my post on branching.

Just the Two of Us

At one level, this post is about two instructions: jal (jump and link) and jalr (jump and link register). But these two instructions are exciting because they enable functions (also known as subroutines or procedure calls).

jal  rd, imm       # rd = pc+4; pc += imm
jalr rd, rs1, imm  # rd = pc+4; pc = rs1+imm

Before updating the PC, a jump instruction writes the address of the following instruction into a register. By saving this return address, we can return to it and continue execution where we left off.

jal uses a 20-bit signed immediate for the jump destination, while jalr uses a register plus 12-bit signed offset in a similar way to the load and store instructions.

The jal immediate counts in units of two bytes rather than one, giving a range of ±1MiB while still allowing jumps to 16-bit compressed instructions.

jalr range is ±2KiB, but you can combine jalr with lui or auipc to reach any 32-bit address.
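
As a rough sketch of the lui pairing, the fragment below jumps to an absolute 32-bit address. The label far_target is hypothetical, and %hi/%lo are GNU assembler relocation operators, so other assemblers may spell this differently.

```asm
# Sketch: jump to an absolute 32-bit address with lui + jalr.
# far_target is a hypothetical label; %hi/%lo assume the GNU assembler.
    lui  t0, %hi(far_target)         # t0 = upper 20 bits of the target address
    jalr zero, %lo(far_target)(t0)   # pc = t0 + lower 12 bits; rd = zero, so no return address is saved
```

The %hi operator compensates for the sign extension of the 12-bit offset, so the two halves always add up to the full address.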

Jump!

Before we tackle functions (subroutines), let’s consider a plain old jump:

j imm  # pc += imm

For example, use j in an infinite loop:

.L_forever:
    la  a0, message  # load address of message
    call printstr    # call a function (discussed below)
    j .L_forever     # jump to .L_forever label

You could also use the j instruction in case/switch code.
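
A minimal sketch of what that might look like: a three-way switch on a0 that leaves its result in a1. The labels and the values 10 and 20 are made up for illustration, and beq is a conditional branch (covered in the branching post).

```asm
# Sketch: switch on a0, result in a1 (labels and values are illustrative).
    li   t0, 1
    beq  a0, t0, .L_case1   # a0 == 1?
    li   t0, 2
    beq  a0, t0, .L_case2   # a0 == 2?
    li   a1, 0              # default case
    j    .L_end             # skip the other cases
.L_case1:
    li   a1, 10
    j    .L_end             # skip case 2
.L_case2:
    li   a1, 20             # last case falls through to .L_end
.L_end:
```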

The assembler translates j into jal with the return address register set to zero (x0).

Use j for your unconditional jumps in preference to branches. Jumps make your intent clear, have greater range, and have no condition for the CPU to mispredict.

Functions

To call a function, we must jump to a new address while remembering where we came from. The jal instruction can do this. We need to choose where to save the return address. By convention, this is the x1 register, known in the ABI as the return address register or ra.

The use of ra for the return address is part of the standard RISC-V ABI (application binary interface). The ABI ensures programs written by different programmers and with different tools can interoperate. For example, the ABI allows a program written in C to call a function written in assembler.

How do we get back once we’ve finished executing our function? We have the return address in ra, so jalr (jump and link register) can take us back.

Let’s take a look at a trivial example, calling a function that adds two integers:

li  a0, 7  # 1st argument in a0
li  a1, 8  # 2nd argument in a1
jal ra, add_int  # save address in register ra (x1) and jump to label add_int

ebreak  # stop execution

add_int:
    add  a0, a0, a1   # a0 = a0 + a1
    jalr zero, 0(ra)  # jump to address in register ra with 0 offset

Does this seem unnecessarily fiddly? If we always use the ra register for the return address, why do we need to provide it? Plus, it’s not immediately obvious what the purpose of these jump instructions is.

Pseudoinstructions call and ret to the rescue!

call label  # call function at 'label', saving return address in ra
ret         # return from function using address in ra

This makes for simpler and clearer code:

li a0, 7  # 1st argument in a0
li a1, 8  # 2nd argument in a1
call add_int  # call function

ebreak  # stop execution

add_int:
    add  a0, a0, a1  # a0 = a0 + a1
    ret  # return from function

Far Calls

In the first add_int example we used jal to call our function, but it’s limited to ±1MiB relative to the PC. For far calls we can combine jalr with auipc to reach anywhere in 32-bit memory space. Use the call pseudoinstruction and the toolchain will choose the correct instruction(s) for you.
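
For the curious, a far call can be written out by hand roughly as follows. This is a sketch assuming GNU assembler syntax (%pcrel_hi and %pcrel_lo); far_function is a hypothetical label.

```asm
# Sketch: PC-relative far call written out by hand.
# far_function is a hypothetical label; %pcrel_hi/%pcrel_lo assume the GNU assembler.
1:  auipc ra, %pcrel_hi(far_function)   # ra = pc + upper 20 bits of the offset
    jalr  ra, %pcrel_lo(1b)(ra)         # pc = ra + lower 12 bits; ra = return address
```

In everyday code there's little reason to write this yourself: call far_function says the same thing and is easier to read.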

In and Out

Most functions take arguments and return something: this is where the a0-a7 registers come in.

Before calling a function, we put the first argument in a0, the second argument in a1, and so on. When it comes time to return a result, the function puts it in a0. Just like the convention of using ra for the return address, this ensures different code can easily work together.

We have already seen an example of arguments and return values with add_int (above).
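
The same pattern extends to more arguments. Here is a small sketch with three; sum3 is a hypothetical function name and the values are arbitrary.

```asm
# Sketch: three arguments in a0-a2, result returned in a0.
    li   a0, 1        # 1st argument
    li   a1, 2        # 2nd argument
    li   a2, 3        # 3rd argument
    call sum3         # on return, a0 = 1 + 2 + 3 = 6

    ebreak            # stop execution

sum3:
    add  a0, a0, a1   # a0 = 1st + 2nd
    add  a0, a0, a2   # a0 = a0 + 3rd
    ret               # result is returned in a0
```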

Functions Calling Functions

Functions that don’t call other functions are known as leaf functions. After a leaf function executes, it uses the return address in ra to return.

However, functions can easily call other functions and it’s here things get interesting. When we call a function, the call instruction writes the return address into ra, overwriting the previous return address!

A function that calls a function must save its own return address before making the function call. We save the existing value of ra on the stack.
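
To see the problem concretely, here is a sketch of what goes wrong when ra isn't saved. The names broken_fun and helper are hypothetical.

```asm
# Sketch of the bug: ra is not saved before the nested call.
# broken_fun and helper are hypothetical names.
broken_fun:
    call helper   # overwrites ra with the address of the next instruction
    ret           # ra now points at this ret, so it jumps to itself forever

helper:
    ret           # leaf function: returns to broken_fun correctly
```

The original return address of broken_fun is lost the moment call executes, so there's no way back to its caller.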

Stack

The stack is an area of memory set aside for use by functions and local variables (not discussed here).

The stack pointer (sp) points to the bottom of the stack (its lowest used address), which grows downwards to lower addresses. After reset, startup code typically sets the stack pointer to the very top of memory.
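
A minimal sketch of such a setup for a bare-metal program is shown here. The symbols _stack_top and main are assumptions: _stack_top would typically come from a linker script, and main stands in for your own code.

```asm
# Sketch: minimal bare-metal startup that sets up the stack before calling code.
# _stack_top and main are hypothetical symbols.
_start:
    la   sp, _stack_top   # point sp at the top of the stack region (16-byte aligned)
    call main             # functions can now use the stack
1:  j    1b               # if main returns, spin here
```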

We allocate memory on the stack by decrementing the stack pointer sp. We can then save registers onto the stack using the sw (store word) instruction. See Load Store for coverage of RISC-V memory instructions.

For example, function fun_one calls function fun_two, so it must save its return address on the stack:

fun_one:
    addi sp, sp, -16  # allocate 16 bytes on stack
    sw   ra, 12(sp)   # store return address on stack

    # do some fun stuff

    call fun_two  # call another function

    # do some more fun stuff

    lw   ra, 12(sp)  # load return address from stack
    addi sp, sp, 16  # restore stack pointer

    ret  # return from fun_one

Note how we store and then later load the return address at the same offset (12) from the stack pointer.

Stack Alignment

Why did we allocate 16 bytes on the stack when our return address is only 4 bytes long?

The RISC-V calling convention says:

The stack grows downwards (towards lower addresses) and the stack pointer shall be aligned to a 128-bit boundary upon procedure entry.

This is another example of the ABI ensuring interoperability. We ensure all data types are correctly aligned by aligning the stack pointer to 16 bytes.
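
The same rounding applies to larger frames. As a sketch, a function that saves five words (20 bytes) allocates 32 bytes, the next multiple of 16. The names fun_five and fun_other are hypothetical, and s0-s3 are saved registers (covered below).

```asm
# Sketch: saving five words (20 bytes) still allocates a multiple of 16.
# fun_five and fun_other are hypothetical names.
fun_five:
    addi sp, sp, -32   # 20 bytes needed, rounded up to 32
    sw   ra, 28(sp)
    sw   s0, 24(sp)
    sw   s1, 20(sp)
    sw   s2, 16(sp)
    sw   s3, 12(sp)    # bytes 0-11 of the frame are unused padding

    call fun_other     # sp is still 16-byte aligned at this call

    lw   s3, 12(sp)
    lw   s2, 16(sp)
    lw   s1, 20(sp)
    lw   s0, 24(sp)
    lw   ra, 28(sp)
    addi sp, sp, 32    # restore stack pointer
    ret
```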

Ignorance is Bliss

A function doesn’t know what happened before it was called or what will happen after it returns. A function caller doesn’t know what happens inside a function, just what it passes in and gets returned. A well-written function is an example of a black box.

RISC-V gives us 32 general-purpose registers, but if every function used them indiscriminately, they’d overwrite each other’s data. We can solve this problem by pushing existing registers onto the stack before calling the function and popping them off the stack after the function returns. However, pushing registers onto the stack makes functions slower. A simple function could spend more CPU cycles pushing and popping registers than doing useful work.

The RISC-V ABI lets us have fast functions while preserving some register values.

There are three main categories of general-purpose registers:

saved registers: s0-s11 - keep their value across function calls (preserved)
argument registers: a0-a7 - for passing arguments and the return value (not preserved)
temporary registers: t0-t6 - for internal function use (not preserved)

Understanding how to handle preserved and non-preserved registers is critical to writing RISC-V assembler. I’d go so far as to say it’s the most important skill beyond a basic knowledge of the instructions. Getting it right results in fast, elegant code. Getting it wrong leads to subtle bugs and much frustration. A good start is to always use ABI names for registers; otherwise, it’s really difficult to remember which registers you need to save!

A function using a preserved register must restore its original value before returning.
A function using a non-preserved register must assume it’s changed by a function call.

Let’s look at both cases in a little more detail.

Preserved Registers

Preserved registers must be restored to their original value before returning from a function call. If your function uses preserved registers, such as s0-s11, save them on the stack before using them.

For example, fun_foo uses s1-s4, so it saves them on the stack like this:

fun_foo:
    addi sp, sp, -16  # allocate space on stack
    sw   s1, 12(sp)   # store saved registers on stack
    sw   s2,  8(sp)
    sw   s3,  4(sp)
    sw   s4,  0(sp)

    # we're now free to use s1-s4
    # implement incredible algorithm here

    lw   s1, 12(sp)  # restore saved registers from stack
    lw   s2,  8(sp)
    lw   s3,  4(sp)
    lw   s4,  0(sp)
    addi sp, sp, 16  # restore stack pointer
    ret

Other Registers

With non-preserved registers, you can do what you want, but so can other functions. After you call another function, you must assume the values of the a and t registers have changed.

For example, I’ve written a function to initialize my graphics display. The background colour is passed to gfx_setup in a0. However, before I set the background colour I need to call frame_wait.

The frame_wait function could overwrite a0, so I preserve it on the stack. Of course, I also need to save ra on the stack before calling another function, leading to this design:

gfx_setup:
    addi sp, sp, -16  # allocate space on stack
    sw   ra, 12(sp)   # save return address onto stack
    sw   a0,  8(sp)   # save a0 (background colour)

    call frame_wait  # wait for blanking before graphics setup

    li t6, GFX_HWREG  # graphics engine address

    # background colour
    lw a0, 8(sp)  # load background colour (a0) from stack
    sw a0, DISP_BGRD(t6)  # set background colour

    # other graphics setup here...

    lw   ra, 12(sp)  # load return address from stack
    addi sp, sp, 16  # restore stack pointer
    ret
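
An alternative sketch keeps the colour in a saved register instead of reloading it from the stack. This variant isn't part of the original design: the name gfx_setup_alt is hypothetical, and it swaps the a0 save for an s0 save, which tends to pay off when the value is needed across several calls.

```asm
# Sketch: keep the background colour in s0 across the call.
# gfx_setup_alt is a hypothetical name; GFX_HWREG and DISP_BGRD are as above.
gfx_setup_alt:
    addi sp, sp, -16  # allocate space on stack
    sw   ra, 12(sp)   # save return address onto stack
    sw   s0,  8(sp)   # save caller's s0 before we use it
    mv   s0, a0       # keep background colour in a preserved register

    call frame_wait   # frame_wait must leave s0 unchanged

    li   t6, GFX_HWREG       # graphics engine address
    sw   s0, DISP_BGRD(t6)   # set background colour

    lw   s0,  8(sp)   # restore caller's s0
    lw   ra, 12(sp)   # load return address from stack
    addi sp, sp, 16   # restore stack pointer
    ret
```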

Functions that don’t call other functions (leaf functions) don’t have to worry about non-preserved registers changing. When writing leaf functions, stick to t and a registers; then you don’t have to save anything to the stack: simple and fast.
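
As a small sketch of such a leaf function, the following swaps two words in memory using only a and t registers. The name swap_words is hypothetical.

```asm
# Sketch: leaf function that swaps two words in memory.
# swap_words is a hypothetical name; a0 and a1 hold the addresses of the two words.
swap_words:
    lw   t0, 0(a0)   # t0 = first word
    lw   t1, 0(a1)   # t1 = second word
    sw   t1, 0(a0)   # store second word at first address
    sw   t0, 0(a1)   # store first word at second address
    ret              # no stack use: ra was never overwritten
```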

Many Arguments

In the rare event your function needs more than eight arguments, you can pass them on the stack.

The RISC-V calling convention says:

The first argument passed on the stack is located at offset zero of the stack pointer on function entry; following arguments are stored at correspondingly higher addresses.

For example, a function with 10 arguments receives the first eight arguments in a0-a7 and could handle the remaining arguments like this:

fun_ten:
    lw t0, 0(sp)  # load 9th argument off stack into t0
    lw t1, 4(sp)  # load 10th argument off stack into t1

    # the first 8 arguments are in a0-a7

NB. We use an offset of 4 for the 10th argument because each argument here is one word (four bytes), so the 10th sits directly above the 9th.
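
The calling side might look something like the sketch below. It assumes the fragment sits inside a function that has already saved ra, and the argument values 1-10 are placeholders.

```asm
# Sketch: call site for fun_ten, inside a function that has already saved ra.
# The argument values 1-10 are placeholders.
    addi sp, sp, -16   # room for two stack arguments, keeping 16-byte alignment
    li   t0, 9
    sw   t0, 0(sp)     # 9th argument at offset 0
    li   t0, 10
    sw   t0, 4(sp)     # 10th argument at offset 4

    li   a0, 1         # first eight arguments in a0-a7
    li   a1, 2
    li   a2, 3
    li   a3, 4
    li   a4, 5
    li   a5, 6
    li   a6, 7
    li   a7, 8
    call fun_ten

    addi sp, sp, 16    # release the stack arguments
```

The caller both allocates and releases the space for stack arguments, so fun_ten can read them without adjusting sp.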

64-bit Variables

RV32 is a 32-bit architecture, but sometimes you need to work with 64-bit values, such as file offsets or UNIX time. In this case, you can combine pairs of registers, such as a0 and a1.

The following function performs 64-bit subtraction, including handling the carry bit:

# 64-bit integer subtraction
#   arguments:
#       a0: x lower 32 bits
#       a1: x upper 32 bits
#       a2: y lower 32 bits
#       a3: y upper 32 bits
#   return:
#       a0: x-y lower 32 bits
#       a1: x-y upper 32 bits
#
sub64:
    sltu    t0, a0, a2  # if a0 < a2 then set t0=1 (carry bit)
    sub     a1, a1, a3  # sub upper 32 bits
    sub     a1, a1, t0  # sub carry bit from upper 32 bits of answer
    sub     a0, a0, a2  # sub lower 32 bits
    ret

Learn more about multi-word addition with set instructions.
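
For comparison, 64-bit addition follows the same pattern, with the carry added rather than subtracted. The sketch below uses the same register layout as sub64; the name add64 is hypothetical.

```asm
# Sketch: 64-bit integer addition (add64 is a hypothetical name)
#   arguments: a0/a1 = x lower/upper 32 bits, a2/a3 = y lower/upper 32 bits
#   return:    a0/a1 = x+y lower/upper 32 bits
add64:
    add     a0, a0, a2  # add lower 32 bits
    sltu    t0, a0, a2  # if sum < y lower then set t0=1 (carry bit)
    add     a1, a1, a3  # add upper 32 bits
    add     a1, a1, t0  # add carry bit to upper 32 bits of answer
    ret
```

For example, adding 1 to 0x00000000FFFFFFFF wraps the lower word to 0 and sets the carry, so the result is a0 = 0 and a1 = 1.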

RV32 ABI Registers

Let’s finish by taking a look at all 32 ABI registers.

ABI Name	Register	Description	Preserved
zero	x0	always 0 (zero)	n/a
ra	x1	return address	no
sp	x2	stack pointer	yes
gp	x3	global pointer*	n/a
tp	x4	thread pointer*	n/a
t0	x5	temporary	no
t1	x6	temporary	no
t2	x7	temporary	no
fp (s0)	x8	frame pointer†	yes
s1	x9	saved register	yes
a0	x10	function argument‡	no
a1	x11	function argument‡	no
a2	x12	function argument	no
a3	x13	function argument	no
a4	x14	function argument	no
a5	x15	function argument	no
a6	x16	function argument	no
a7	x17	function argument	no
s2	x18	saved register	yes
s3	x19	saved register	yes
s4	x20	saved register	yes
s5	x21	saved register	yes
s6	x22	saved register	yes
s7	x23	saved register	yes
s8	x24	saved register	yes
s9	x25	saved register	yes
s10	x26	saved register	yes
s11	x27	saved register	yes
t3	x28	temporary	no
t4	x29	temporary	no
t5	x30	temporary	no
t6	x31	temporary	no

*Let the compiler/linker use the global gp and thread tp pointers; ignore them in your own code.
†The frame pointer fp supports local variables but can be used as a regular saved register.
‡Argument registers a0 and a1 also handle the function return value.

What’s Next?

The next post looks at RISC-V Multiply and Divide Instructions and RISC-V extensions.

If you enjoyed this post, please sponsor me. Sponsors help me create more FPGA and RISC-V projects for everyone, and they get early access to blog posts and source code. 🙏

Check out the RISC-V Assembler Cheat Sheet and all my FPGA & RISC-V Tutorials.

References

RISC-V Technical Specifications (riscv.org)