Project F

RISC-V Assembler: Jump and Function


This post begins by examining the RISC-V jump instructions: jal and jalr. Jump instructions are the basis of functions, so we’ll then dig into function calls, the RISC-V ABI, calling convention, and how to use the stack. Jump instructions are included in RV32I, the base integer instruction set. New to the RISC-V assembler series? Check out the first part on arithmetic instructions.

In the last few years, we’ve seen an explosion of RISC-V CPU designs, especially on FPGA. Thankfully, RISC-V is ideal for assembly programming with its compact, easy-to-learn instruction set. This series will help you learn and understand 32-bit RISC-V instructions (RV32) and the RISC-V ABI.

Share your thoughts with @WillFlux on Mastodon or Twitter. If you like what I do, sponsor me. 🙏

Jump

The main operation of the jump instructions is to update the program counter (PC).

The PC points to the next instruction the CPU will execute. Usually, the CPU adds 4 to the PC when executing an instruction, as each instruction is 4 bytes long. However, with a jump instruction, the CPU updates the PC to point at the jump destination instead.

The jump instructions are unconditional. For conditional jumps, see my separate post on branching.

Just the Two of Us

At one level, this post is about two instructions: jal and jalr. But these two instructions are exciting because they enable functions (also known as subroutines or procedure calls).

jal   # jump and link
jalr  # jump and link register

Before updating the PC, a jump instruction writes the address of the following instruction into a register. By saving this return address, we can return to it and continue execution where we left off.

jal uses a 20-bit signed immediate for the jump destination, while jalr uses a register plus 12-bit signed offset in a similar way to load/store instructions. See arithmetic sign extension for a reminder of how sign extension works.

jal has a range of ±1 MiB relative to the PC. The immediate counts in units of two bytes to support compressed instructions, so the 20-bit field covers a 21-bit byte offset.
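
To see where the ±1 MiB figure comes from, here is a small Python sketch of the arithmetic. The helper name signed_imm_range is my own invention for illustration; it isn't part of any RISC-V tool.

```python
def signed_imm_range(bits, unit=1):
    """Return (lowest, highest) byte offset for a signed immediate.

    bits: width of the two's-complement immediate field
    unit: bytes per immediate step (2 for jal)
    """
    lowest = -(1 << (bits - 1)) * unit
    highest = ((1 << (bits - 1)) - 1) * unit
    return lowest, highest

MIB = 1024 * 1024
jal_low, jal_high = signed_imm_range(20, unit=2)
print(jal_low, jal_high)  # -1048576 1048574
```

The forward limit is two bytes short of 1 MiB because a two's-complement field has one more negative value than positive.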

jalr offsets are only ±2KiB, but you can combine jalr with lui to reach anywhere in 32-bit address space, or with auipc to reach any PC-relative address. See my previous coverage of auipc.
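
Because the jalr offset is sign-extended, splitting an absolute address into a lui part and a jalr part needs a little care: when bit 11 of the address is set, the upper part must be rounded up by one. The Python sketch below shows one way to do the split. The function names are hypothetical, and in practice the assembler does this for you.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` of value as a two's-complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value

def split_hi_lo(addr):
    """Split a 32-bit address into a 20-bit upper part and a signed 12-bit offset."""
    lo = sign_extend(addr, 12)           # offset for jalr, -2048..2047
    hi = ((addr - lo) >> 12) & 0xFFFFF   # upper 20 bits for lui
    return hi, lo

def rebuild(hi, lo):
    """The lui value plus the jalr offset, wrapped to 32 bits.

    (jalr additionally clears bit 0 of the result; not modelled here.)
    """
    return ((hi << 12) + lo) & 0xFFFFFFFF

hi, lo = split_hi_lo(0x12345FFC)
print(hex(hi), lo)  # 0x12346 -4
```

Here the upper part is 0x12346 rather than 0x12345, and the offset of -4 brings the sum back to the address we wanted.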

Jump!

Before we tackle more advanced uses for jump, let’s consider a plain jump with no way back using:

j  # jump to label (no return)

For example, use j in an infinite loop:

.L_forever:
    la  a0, message  # load address of message
    call printstr    # call a function (discussed below)
    j .L_forever     # jump to .L_forever label

You could also use the j instruction in case/switch code.

The assembler translates j into jal with the return address register set to x0 (zero).

ProTip: Use j for unconditional jumps. It makes your intent clear and has a greater range than branches.

Functions

To call a function, we must jump to a new address while remembering where we came from. The jal instruction can do this. We need to choose where to save the return address. By convention, this is the x1 register, known in the ABI as the return address register or ra.

The use of ra for the return address is part of the standard RISC-V ABI (application binary interface). The ABI ensures programs written by different programmers and with different tools can interoperate. For example, the ABI allows a program written in C to call a function written in assembler.

How do we get back once we’ve finished executing our function? We have the return address in ra, so the jalr (jump and link register) instruction can take us back.

Let’s take a look at a trivial example, calling a function that adds two integers:

li  a0, 7  # 1st argument in a0
li  a1, 8  # 2nd argument in a1
jal ra, add_int  # save address in register ra (x1) and jump to label add_int

ebreak  # stop execution

add_int:
    add  a0, a0, a1  # a0 = a0 + a1
    jalr zero, 0(ra)   # jump to address in register ra

Does this seem unnecessarily fiddly? If we always use the ra register for the return address, why do we need to provide it? And it’s not immediately obvious what the jump instructions are doing.

Pseudoinstructions call and ret to the rescue!

call label  # call function at 'label', saving return address in ra
ret         # return from function using address in ra

This makes for simpler and clearer code:

li a0, 7  # 1st argument in a0
li a1, 8  # 2nd argument in a1
call add_int  # call function

ebreak  # stop execution

add_int:
    add  a0, a0, a1  # a0 = a0 + a1
    ret  # return from function
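
If you'd like to trace what happens to ra and the PC, the following Python sketch steps through the explicit jal/jalr version of the add_int example. It's a toy model, not a real simulator: I assume the program starts at address 0, that each li is a single 4-byte instruction, and that labels are already resolved to addresses.

```python
# Hypothetical mini-simulator: just enough RV32I to run the add_int example.
ZERO, RA, A0, A1 = 0, 1, 10, 11  # x-register numbers for the ABI names used

ADD_INT = 16  # address of add_int, assuming the program starts at 0
program = [
    ("li", A0, 7),          # 0:  li   a0, 7
    ("li", A1, 8),          # 4:  li   a1, 8
    ("jal", RA, ADD_INT),   # 8:  jal  ra, add_int
    ("ebreak",),            # 12: ebreak
    ("add", A0, A0, A1),    # 16: add  a0, a0, a1
    ("jalr", ZERO, RA, 0),  # 20: jalr zero, 0(ra)
]

def run(program):
    regs = [0] * 32
    pc = 0
    while True:
        op, *args = program[pc // 4]
        next_pc = pc + 4
        if op == "ebreak":
            return regs, pc
        if op == "li":
            rd, value = args
        elif op == "add":
            rd, rs1, rs2 = args
            value = (regs[rs1] + regs[rs2]) & 0xFFFFFFFF
        elif op == "jal":
            rd, target = args
            value = pc + 4              # link: address of following instruction
            next_pc = target
        elif op == "jalr":
            rd, rs1, offset = args
            value = pc + 4
            next_pc = (regs[rs1] + offset) & 0xFFFFFFFE  # bit 0 is cleared
        else:
            raise ValueError(f"unsupported instruction: {op}")
        if rd != ZERO:                  # writes to x0 are discarded
            regs[rd] = value
        pc = next_pc

regs, stop_pc = run(program)
print(regs[A0], regs[RA], stop_pc)  # 15 12 12
```

The result 15 is in a0, ra holds 12 (the address of ebreak), and execution stops at that same address after the return.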

In and Out

Most functions take arguments and return something: this is where the a0-a7 registers come in.

Before calling a function, put the first argument in a0, the second argument in a1, and so on. When it comes time to return a result, the function puts it in a0. Just like the convention of using ra for the return address, this ensures different code can easily work together.

We have already seen an example of arguments and return values with add_int (above).

Functions Calling Functions

Functions that don’t call other functions are known as leaf functions. After a leaf function executes, it uses the return address in ra to return.

However, functions can easily call other functions and it’s here things get interesting. When we call a function, the call instruction writes the return address into ra, overwriting the previous return address!

A function that calls a function must save its own return address before making the function call. We save the existing value of ra on the stack.

Stack

The stack is an area of memory set aside for use by functions and local variables (not discussed here).

The stack pointer (sp) points to the lowest address currently in use on the stack, which grows downwards to lower addresses. After reset, startup code typically sets the stack pointer to the very top of memory.

We allocate memory on the stack by decrementing the stack pointer sp. We can then save registers onto the stack using the sw (store word) instruction. See Load Store for coverage of RISC-V memory instructions.

For example, function fun_one calls function fun_two, so it must save its return address on the stack:

fun_one:
    addi sp, sp, -16  # allocate 16 bytes on stack
    sw   ra, 12(sp)   # store return address on stack

    # do some fun stuff

    call fun_two  # call another function

    # do some more fun stuff

    lw   ra, 12(sp)  # load return address from stack
    addi sp, sp, 16  # restore stack pointer

    ret  # return from fun_one

Note how we store and then later load the return address at the same offset (12) from the stack pointer.
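
The save and restore in fun_one can be modelled in a few lines of Python. This is only a sketch: the starting stack pointer and the two return addresses are made-up values, and call here models nothing but the overwrite of ra.

```python
# Hypothetical model: registers and memory as plain Python values.
mem = {}                          # byte address -> stored word
regs = {"sp": 0x1000, "ra": 0}    # starting sp is an arbitrary assumption

def call(return_address):
    """Model the link step of a call: ra is overwritten."""
    regs["ra"] = return_address

def fun_one():
    regs["sp"] -= 16                     # addi sp, sp, -16
    mem[regs["sp"] + 12] = regs["ra"]    # sw   ra, 12(sp)
    call(0x2222)                         # call fun_two (clobbers ra)
    clobbered = regs["ra"]
    regs["ra"] = mem[regs["sp"] + 12]    # lw   ra, 12(sp)
    regs["sp"] += 16                     # addi sp, sp, 16
    return clobbered

call(0x1111)             # whoever calls fun_one links ra = 0x1111
clobbered = fun_one()
print(hex(clobbered), hex(regs["ra"]), hex(regs["sp"]))  # 0x2222 0x1111 0x1000
```

Inside fun_one, ra briefly holds 0x2222, but by the time it returns both ra and sp are back to the values they had on entry.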

Stack Alignment

Why did we allocate 16 bytes on the stack when our return address is only 4 bytes long?

The RISC-V calling convention says:

The stack grows downwards (towards lower addresses) and the stack pointer shall be aligned to a 128-bit boundary upon procedure entry.

This is another example of the ABI ensuring interoperability. We ensure all data types are correctly aligned by aligning the stack pointer to 16 bytes.
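
In practice, this means rounding the space you need up to the next multiple of 16. A quick Python sketch of that rounding, using a helper name (frame_size) I've made up for illustration:

```python
STACK_ALIGN = 16  # bytes: the 128-bit boundary from the calling convention

def frame_size(bytes_needed):
    """Round a frame's size up to the next multiple of STACK_ALIGN."""
    return (bytes_needed + STACK_ALIGN - 1) & ~(STACK_ALIGN - 1)

for needed in (4, 16, 20):
    print(needed, "->", frame_size(needed))
# 4 -> 16
# 16 -> 16
# 20 -> 32
```

So a single saved word still costs 16 bytes of stack, and a fifth saved word pushes the frame to 32 bytes.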

Ignorance is Bliss

A function doesn’t know what happened before it was called or what will happen after it returns. A function caller doesn’t know what happens inside a function, just what it passes in and gets back. A well-written function is an example of a black box.

RISC-V gives us 32 general-purpose registers, but if every function used them indiscriminately, they’d overwrite each other’s data. We can solve this problem by pushing existing register values onto the stack. However, pushing values onto the stack makes functions slower. A simple function could spend more CPU cycles pushing and popping values from the stack than doing useful work.

The RISC-V ABI lets us have fast functions while preserving some register values.

There are three main categories of general-purpose registers:

  • saved registers: s0-s11 - keep their value across function calls (preserved)
  • argument registers: a0-a7 - for passing arguments and the return value (not preserved)
  • temporary registers: t0-t6 - for internal function use (not preserved)

Understanding how to handle preserved and non-preserved registers is critical to writing RISC-V assembler. I’d go so far as to say it’s the most important skill beyond a basic knowledge of the instructions. Getting it right results in fast, elegant code. Getting it wrong leads to subtle bugs and much frustration. A good start is to always use ABI names for registers; otherwise, it’s really difficult to remember which registers you need to save!

  • A function using a preserved register must restore its original value before returning.
  • A function using a non-preserved register must assume it’s changed by a function call.

Let’s look at both cases in a little more detail.

Preserved Registers

Preserved registers must be restored to their original value before returning from a function call. If your function uses preserved registers, such as s0-s11, save their existing values on the stack.

For example, fun_foo uses s1-s4, so it saves them on the stack like this:

fun_foo:
    addi sp, sp, -16  # allocate space on stack
    sw   s1, 12(sp)   # store saved registers on stack
    sw   s2,  8(sp)
    sw   s3,  4(sp)
    sw   s4,  0(sp)

    # we're now free to use s1-s4
    # implement incredible algorithm here

    lw   s1, 12(sp)  # restore saved registers from stack
    lw   s2,  8(sp)
    lw   s3,  4(sp)
    lw   s4,  0(sp)
    addi sp, sp, 16  # restore stack pointer
    ret

Other Registers

With non-preserved registers, you can do what you want, but so can other functions. After you call another function, you must assume the values of the a and t registers have changed.

For example, I’ve written a function to initialize my graphics display. The background colour is passed to gfx_setup in a0. However, before I set the background colour I need to call frame_wait.

The frame_wait function could overwrite a0, so I preserve it on the stack. Of course, I also need to save ra on the stack before calling another function, leading to this design:

gfx_setup:
    addi sp, sp, -16  # allocate space on stack
    sw   ra, 12(sp)   # save return address onto stack
    sw   a0,  8(sp)   # save a0 (background colour)

    call frame_wait  # wait for blanking before graphics setup

    li t6, GFX_HWREG  # graphics engine address

    # background colour
    lw a0, 8(sp)  # load background colour (a0) from stack
    sw a0, DISP_BGRD(t6)  # set background colour

    # other graphics setup here...

    lw   ra, 12(sp)  # load return address from stack
    addi sp, sp, 16  # restore stack pointer
    ret

Functions that don’t call other functions (leaf functions) don’t have to worry about non-preserved registers changing. When writing leaf functions, stick to t and a registers, then you don’t have to save anything to the stack: simple and fast.

Many Arguments

In the rare event your function needs more than eight arguments, you can pass them on the stack.

The RISC-V calling convention says:

The first argument passed on the stack is located at offset zero of the stack pointer on function entry; following arguments are stored at correspondingly higher addresses.

For example, a function with 10 arguments receives the first eight arguments in a0-a7 and could handle the remaining arguments like this:

fun_ten:
    lw t0, 0(sp)  # load 9th argument off stack into t0
    lw t1, 4(sp)  # load 10th argument off stack into t1

    # the first 8 arguments are in a0-a7

NB. We use an offset of 4 for the 10th argument because we’re loading a word (four bytes).
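
The same rule extends to any argument number. The Python sketch below works out where the callee finds its n-th argument, assuming every argument is a single 32-bit word (larger or aggregate types follow extra rules not covered here). The function name arg_location is hypothetical.

```python
NUM_ARG_REGS = 8   # a0-a7
WORD_BYTES = 4     # one RV32 word

def arg_location(n):
    """Where the callee finds its n-th argument (counting from 1).

    Assumes every argument is a single 32-bit word.
    """
    if n < 1:
        raise ValueError("arguments are numbered from 1")
    if n <= NUM_ARG_REGS:
        return f"a{n - 1}"
    offset = (n - NUM_ARG_REGS - 1) * WORD_BYTES
    return f"{offset}(sp)"

print([arg_location(n) for n in (1, 8, 9, 10)])
# ['a0', 'a7', '0(sp)', '4(sp)']
```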

64-bit Variables

RV32 is a 32-bit architecture, but sometimes you need to work with 64-bit values, such as file offsets or UNIX time. In this case, you can combine pairs of registers, such as a0 and a1.

The following function performs 64-bit subtraction, including handling the carry (borrow) bit between the lower and upper words:

# 64-bit integer subtraction
#   arguments:
#       a0: x lower 32 bits
#       a1: x upper 32 bits
#       a2: y lower 32 bits
#       a3: y upper 32 bits
#   return:
#       a0: x-y lower 32 bits
#       a1: x-y upper 32 bits
#
sub64:
    sltu    t0, a0, a2  # if a0 < a2 then set t0=1 (carry bit)
    sub     a1, a1, a3  # sub upper 32 bits
    sub     a1, a1, t0  # sub carry bit from upper 32 bits of answer
    sub     a0, a0, a2  # sub lower 32 bits
    ret

Learn more on multi-word addition with set instructions.
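
One way to convince yourself sub64 is right is to mirror its four instructions in Python and compare against ordinary big-integer subtraction. This is a sketch for checking the logic, with 32-bit wrap-around modelled by masking; split64 is a helper I've added for the test.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF

def sub64(a0, a1, a2, a3):
    """Mirror sub64: (a1:a0) - (a3:a2), each register a 32-bit word."""
    t0 = 1 if a0 < a2 else 0      # sltu t0, a0, a2
    a1 = (a1 - a3) & MASK32       # sub  a1, a1, a3
    a1 = (a1 - t0) & MASK32       # sub  a1, a1, t0
    a0 = (a0 - a2) & MASK32       # sub  a0, a0, a2
    return a0, a1

def split64(value):
    """Split a 64-bit value into (lower, upper) 32-bit words."""
    value &= MASK64
    return value & MASK32, value >> 32

# 0x1_0000_0000 - 1 needs a borrow from the upper word
lo, hi = sub64(*split64(0x1_0000_0000), *split64(1))
print(hex(hi), hex(lo))  # 0x0 0xffffffff
```

Note that sltu must run before the final sub overwrites a0, which is why the lower words are subtracted last.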

RV32 ABI Registers

Let’s finish by taking a look at all 32 ABI registers.

ABI Name  Register  Description          Preserved
zero      x0        always 0 (zero)      n/a
ra        x1        return address       no
sp        x2        stack pointer        yes
gp        x3        global pointer*      n/a
tp        x4        thread pointer*      n/a
t0        x5        temporary            no
t1        x6        temporary            no
t2        x7        temporary            no
fp (s0)   x8        frame pointer†       yes
s1        x9        saved register       yes
a0        x10       function argument‡   no
a1        x11       function argument‡   no
a2        x12       function argument    no
a3        x13       function argument    no
a4        x14       function argument    no
a5        x15       function argument    no
a6        x16       function argument    no
a7        x17       function argument    no
s2        x18       saved register       yes
s3        x19       saved register       yes
s4        x20       saved register       yes
s5        x21       saved register       yes
s6        x22       saved register       yes
s7        x23       saved register       yes
s8        x24       saved register       yes
s9        x25       saved register       yes
s10       x26       saved register       yes
s11       x27       saved register       yes
t3        x28       temporary            no
t4        x29       temporary            no
t5        x30       temporary            no
t6        x31       temporary            no

*Let the compiler/linker use the global gp and thread tp pointers; ignore them in your own code.
†The frame pointer fp supports local variables but can be used as a regular saved register.
‡Argument registers a0 and a1 also handle the function return value.
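
The register numbering isn't contiguous for the s and t groups, which is easy to trip over when reading disassembly that uses x names. As a memory aid, here's a Python sketch of the mapping in the table; the function names abi_name and preserved are my own.

```python
def abi_name(x):
    """ABI name for integer register x0-x31 (s0 doubles as fp)."""
    fixed = {0: "zero", 1: "ra", 2: "sp", 3: "gp", 4: "tp"}
    if x in fixed:
        return fixed[x]
    if 5 <= x <= 7:
        return f"t{x - 5}"     # t0-t2
    if 8 <= x <= 9:
        return f"s{x - 8}"     # s0-s1
    if 10 <= x <= 17:
        return f"a{x - 10}"    # a0-a7
    if 18 <= x <= 27:
        return f"s{x - 16}"    # s2-s11
    if 28 <= x <= 31:
        return f"t{x - 25}"    # t3-t6
    raise ValueError("expected a register number from 0 to 31")

def preserved(name):
    """Preserved column from the table: 'yes', 'no', or 'n/a'."""
    if name in ("zero", "gp", "tp"):
        return "n/a"
    if name.startswith("s"):   # sp and s0-s11
        return "yes"
    return "no"

print(abi_name(8), abi_name(18), abi_name(31))  # s0 s2 t6
```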

What’s Next?

If you enjoyed this post, please sponsor me. Sponsors help me create more FPGA and RISC-V projects for everyone, and they get early access to blog posts and source code. 🙏

Other posts in this series include: Arithmetic, Load Store, and Branch Set instructions. Or check out my FPGA & RISC-V Tutorials and my series on early Macintosh History.

References