Project F

RISC-V Assembler: Compiler Explorer


The Godbolt Compiler Explorer is a fantastic tool for assembler programmers. In this post, I show you how to use Compiler Explorer to generate RISC-V assembly code and offer some ideas to make best use of this tool.

In the last few years, we’ve seen an explosion of RISC-V CPU designs on FPGA and ASIC, including the RP2350 found on the Raspberry Pi Pico 2. Thankfully, RISC-V is ideal for assembly programming with its compact, easy-to-learn instruction set. This series will help you learn and understand 32-bit RISC-V instructions and programming.

RISC-V Assembler: Arithmetic | Logical | Shift | Load and Store | Branch and Set | Jump and Function | Multiply and Divide | Compiler Explorer | Assembler Cheat Sheet

Getting Started with Compiler Explorer

The Godbolt Compiler Explorer lives at godbolt.org.

Compiler Explorer lets you see the results of compiling C, C++, Rust and other high-level languages in your browser. Change your high-level code and watch the generated assembly update immediately. This is invaluable when experimenting and learning, and Compiler Explorer even shows you which assembly instructions correspond to which parts of your high-level code.

I will focus on C and 32-bit RISC-V (RV32), but much of this advice applies to other languages and architectures. My examples use the Hazard3 RISC-V CPU because it’s both open source and available in a low-cost microcontroller, the Raspberry Pi RP2350.

Compiler Explorer Interface

You write your high-level code on the left, and the assembler appears on the right. You can choose your programming language, compiler, and compiler options from the controls at the top of the panes. Click Output to see compiler output, including errors and warnings.

Choosing a Compiler

For 32-bit RISC-V, you can choose from many versions of GCC and Clang. While I normally use the GNU assembler (gas) to assemble my RISC-V designs, Clang often generates more readable assembler. The beauty of Compiler Explorer is that you can easily use both and compare them.

Add your chosen compilers to your favourites; otherwise, you’ll be doing a lot of scrolling! You do this by clicking on the compiler drop-down menu and selecting the stars next to your chosen compilers.

Functions and Optimisation

The best way to experiment with simple designs is to write a function. That way, the inputs and outputs are clear, and you can plainly see what’s happening.

By default, the generated code is unoptimised. This is probably not what you want: unoptimised code sets up a stack frame in every function and moves arguments through memory, making it harder to see what your algorithm is doing.

Consider this trivial C function that squares a number:

int square(int num) {
    return num * num;
}

In Clang 18.1, without optimisation, you get 11 instructions!

Compiler Explorer without optimisation

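The extra instructions make sense once you know that sp is the stack pointer, ra is the return address, and s0 is the frame pointer: the function builds a stack frame, saves registers, and stores num to the stack before reading it back. However, unless you're learning about functions, these instructions just get in the way.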

If we add -O to the compiler options (top right of the window), we get more readable code:

Compiler Explorer with optimisation

As expected, squaring a number requires a single multiply instruction, mul, followed by ret to return.

You can optimise further with -O2 or for size with -Os. See GCC Options That Control Optimization.

Compiler Options

Beyond optimisation, you can pass many more options to the compiler.

For RISC-V, the two key options are:

  • -mabi=ABI-string
  • -march=ISA-string

The ABI specifies the integer and floating-point calling convention. There are three standard 32-bit conventions, all of which have 32-bit int, long, and pointers but differ in how floating-point values are passed (a fourth, ilp32e, is for the reduced RV32E base):

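  • ilp32 - 32-bit with no floating-point registers in the calling convention (soft floats)
  • ilp32f - 32-bit with single-precision values passed in floating-point registers
  • ilp32d - 32-bit with single- and double-precision values passed in floating-point registers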
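You can check which ABI the compiler is targeting from its predefined macros. This sketch assumes the __riscv_float_abi_* macros from the RISC-V C API specification, which recent GCC and Clang define; rv32_abi_name is a hypothetical helper. Paste it into Compiler Explorer and change -mabi to see which string survives.

```c
#include <stddef.h>

// Hypothetical helper: names the 32-bit RISC-V ABI selected with -mabi,
// using the __riscv_float_abi_* macros from the RISC-V C API.
// Returns NULL when not compiling for 32-bit RISC-V.
const char *rv32_abi_name(void) {
#if defined(__riscv) && __riscv_xlen == 32
#  if defined(__riscv_float_abi_double)
    return "ilp32d";  // float and double arguments in FP registers
#  elif defined(__riscv_float_abi_single)
    return "ilp32f";  // float arguments in FP registers
#  elif defined(__riscv_float_abi_soft)
    return "ilp32";   // no FP registers in the calling convention
#  else
    return "unknown";
#  endif
#else
    return NULL;      // some other target
#endif
}
```

With -O, only one return survives preprocessing, so the function should reduce to loading the address of a single string.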

The ISA specifies the size (32 or 64-bit) and RISC-V extensions to use. There are many possible combinations; here are a few of the 32-bit possibilities:

  • rv32i - base 32-bit integer instructions
  • rv32im - 32-bit integer with multiply extension
  • rv32imac - 32-bit integer with multiply, atomic, and compressed extensions

For Hazard3, which lacks floating-point hardware but supports bit-manipulation extensions:

  • -mabi=ilp32
  • -march=rv32imac_zicsr_zifencei_zba_zbb_zbkb_zbs
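Each enabled extension also defines a predefined macro, so a small function like the hypothetical one below shows whether your -march string took effect. It assumes the __riscv_mul, __riscv_atomic, __riscv_compressed, and __riscv_zb* macros defined by recent GCC and Clang; older compilers may not define the bit-manipulation ones.

```c
// Hypothetical helper: returns a bit mask of the RISC-V extensions enabled
// by -march, based on the compiler's predefined macros (0 on other targets).
unsigned rv_extensions(void) {
    unsigned mask = 0;
#ifdef __riscv_mul
    mask |= 1u << 0;  // M: multiply and divide
#endif
#ifdef __riscv_atomic
    mask |= 1u << 1;  // A: atomics
#endif
#ifdef __riscv_compressed
    mask |= 1u << 2;  // C: compressed instructions
#endif
#ifdef __riscv_zba
    mask |= 1u << 3;  // Zba: address generation
#endif
#ifdef __riscv_zbb
    mask |= 1u << 4;  // Zbb: basic bit manipulation
#endif
#ifdef __riscv_zbs
    mask |= 1u << 5;  // Zbs: single-bit instructions
#endif
    return mask;
}
```

With the Hazard3 options above, the whole function should fold to loading the constant 63 (all six bits set); drop the _zb* suffixes from -march and it should fall to 7.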

For a complete list of RISC-V compiler options, see GCC RISC-V Options (works with GCC & Clang).

Compiler support for RISC-V extensions has evolved rapidly in recent years, so if the compiler rejects your -march string or some code fails to compile, make sure your compiler is new enough.

ProTip: Another handy option is -fno-inline, which prevents functions from being inlined, so small helper functions still appear as separate calls in the output.
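If you only want to keep one function out of line, GCC and Clang also accept the noinline function attribute. In this sketch (the function names are illustrative), you should see sum_of_squares call square32 twice with -O, rather than folding the multiplies into its own body.

```c
#include <stdint.h>

// noinline (a GCC/Clang attribute) keeps this function out of line,
// so it appears under its own label in the assembly listing
__attribute__((noinline)) int32_t square32(int32_t num) {
    return num * num;
}

int32_t sum_of_squares(int32_t a, int32_t b) {
    return square32(a) + square32(b);  // two calls to square32
}
```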

C Types

The sizes of C's basic types depend on the architecture and ABI, and C sets a low bar for acceptable implementations. For example, int is signed and need only represent the range −32767 to +32767 (at least 16 bits).

If you’re doing anything vaguely numerical, you want to be precise with your types using stdint.h.

Types you might want to use include:

  • signed: int8_t, int16_t, int32_t, int64_t
  • unsigned: uint8_t, uint16_t, uint32_t, uint64_t
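One way to make these assumptions explicit is with C11 static assertions, which fail at compile time if a type isn't the width you expect. The ILP32-specific checks in this sketch assume the __riscv and __riscv_xlen macros that GCC and Clang define for RISC-V targets.

```c
#include <limits.h>
#include <stdint.h>

// Exact-width types are the same size on every target that provides them
_Static_assert(sizeof(int32_t) == 4, "int32_t is 32 bits");
_Static_assert(sizeof(uint64_t) == 8, "uint64_t is 64 bits");

// The basic types only have minimum ranges
_Static_assert(INT_MAX >= 32767, "int is at least 16 bits");
_Static_assert(LONG_MAX >= 2147483647L, "long is at least 32 bits");

#if defined(__riscv) && __riscv_xlen == 32
// All the 32-bit RISC-V ABIs are ILP32: int, long, and pointers are 32 bits
_Static_assert(sizeof(int) == 4, "ILP32 int");
_Static_assert(sizeof(long) == 4, "ILP32 long");
_Static_assert(sizeof(void *) == 4, "ILP32 pointer");
#endif
```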

For example, compare 32-bit and 64-bit addition on 32-bit RISC-V:

#include <stdint.h>
int32_t add32(int32_t a, int32_t b) {
    return a + b;
}

int64_t add64(int64_t a, int64_t b) {
    return a + b;
}

32-bit and 64-bit Addition in Compiler Explorer

Learn more about RISC-V set instructions and multi-word addition.
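To see what the compiler has to do for add64 on a 32-bit machine, here is a sketch of multi-word addition using only 32-bit operations. It's an illustration rather than the compiler's literal output: the carry out of the low word is detected with an unsigned comparison (the sum wrapped around if it's smaller than an operand), which is what RISC-V's sltu instruction computes. The function name add64_by_parts is hypothetical.

```c
#include <stdint.h>

// Hypothetical illustration: 64-bit addition using only 32-bit arithmetic,
// as a 32-bit CPU without a carry flag has to do it
uint64_t add64_by_parts(uint64_t a, uint64_t b) {
    uint32_t a_lo = (uint32_t)a, a_hi = (uint32_t)(a >> 32);
    uint32_t b_lo = (uint32_t)b, b_hi = (uint32_t)(b >> 32);

    uint32_t lo = a_lo + b_lo;          // add low words (wraps modulo 2^32)
    uint32_t carry = lo < b_lo;         // wrapped iff sum < operand (sltu)
    uint32_t hi = a_hi + b_hi + carry;  // add high words plus carry

    return ((uint64_t)hi << 32) | lo;
}
```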

C's rules for type conversion can lead to unexpected results. For example, the full product of two 32-bit values needs up to 64 bits, but C performs the multiplication in the type of its operands, so you only get a 64-bit result if you cast at least one input to 64 bits before multiplying. Contrast the generated code for these two functions:

#include <stdint.h>
int64_t mul64_broken(int32_t x, int32_t y) {
    return x * y;  // 32-bit multiply: high bits lost (signed overflow is undefined)
}

int64_t mul64(int32_t x, int32_t y) {
    return (int64_t)x * (int64_t)y;  // widen first: full 64-bit product
}
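Widening also gives you the upper half of a product. This sketch (the function name is hypothetical) returns the high 32 bits of the signed 64-bit product; with the M extension enabled and -O, you should see GCC or Clang emit a single mulh instruction for it. It assumes >> on a negative value is an arithmetic shift, which GCC and Clang guarantee but C leaves implementation-defined.

```c
#include <stdint.h>

// Hypothetical helper: high 32 bits of the full signed 64-bit product.
// Widening one operand is enough; the other is converted automatically.
int32_t mulh32(int32_t x, int32_t y) {
    int64_t product = (int64_t)x * y;
    return (int32_t)(product >> 32);  // arithmetic shift on GCC/Clang
}
```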

64-bit Multiplication in Compiler Explorer

RISC-V Extensions

You can easily explore the impact of different RISC-V extensions on code generation.

In this example, let’s reverse the order of bytes in a 32-bit word:

#include <stdint.h>
// unsigned: left-shifting a negative int32_t is undefined behaviour in C
uint32_t endian_swap(uint32_t word) {
    return ((word>>24) & 0xFF) |
        ((word<<8) & 0xFF0000) |
        ((word>>8) & 0xFF00) |
        ((word<<24) & 0xFF000000);
}

The result from Clang 18.1 with -O:

endian_swap:
        srli    a1, a0, 8
        lui     a2, 16
        addi    a2, a2, -256
        and     a1, a1, a2
        srli    a3, a0, 24
        or      a1, a1, a3
        and     a2, a2, a0
        slli    a2, a2, 8
        slli    a0, a0, 24
        or      a0, a0, a2
        or      a0, a0, a1
        ret

If we use the extensions supported by Hazard3, the generated code is rather shorter.

Clang 18.1 with -O -mabi=ilp32 -march=rv32imac_zicsr_zifencei_zba_zbb_zbkb_zbs:

endian_swap:
        rev8    a0, a0
        ret

The Zbb (basic bit-manipulation) extension includes the rev8 instruction, which reverses the bytes in a word, converting between big-endian and little-endian byte order.
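You don't have to write the shifts and masks yourself: GCC and Clang provide __builtin_bswap32, a compiler extension rather than standard C, which the compiler expands to whatever the target supports. As a sketch, the function below should compile to the same rev8 with the Hazard3 options and to a shift-and-mask sequence without Zbb.

```c
#include <stdint.h>

// Compiler builtin (GCC/Clang): reverses the byte order of a 32-bit value
uint32_t endian_swap_builtin(uint32_t word) {
    return __builtin_bswap32(word);
}
```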

Comparing Architectures

Compiler Explorer is a great way to compare and contrast architectures and instruction sets. Let’s look at a couple of simple functions compiled for different architectures.

#include <stdint.h>

int64_t add64(int64_t a, int64_t b) {
    return a + b;
}

int32_t greater(int32_t a, int32_t b) {
    if (a > b) return 1;
    else return 0;
}

32-bit RISC-V (Clang 18.1 with -O):

add64:
        add     a1, a1, a3
        add     a0, a0, a2
        sltu    a2, a0, a2
        add     a1, a1, a2
        ret

greater:
        slt     a0, a1, a0
        ret

armv7 (Clang 18.1 with -O):

add64:
        adds    r0, r2, r0
        adc     r1, r3, r1
        bx      lr

greater:
        mov     r2, #0
        cmp     r0, r1
        movgt   r2, #1
        mov     r0, r2
        bx      lr

Or how about the Intel 486 (GCC 14.2 with -O -m32 -march=i486):

add64:
        mov     eax, DWORD PTR [esp+12]
        mov     edx, DWORD PTR [esp+16]
        add     eax, DWORD PTR [esp+4]
        adc     edx, DWORD PTR [esp+8]
        ret
greater:
        mov     eax, DWORD PTR [esp+8]
        cmp     DWORD PTR [esp+4], eax
        setg    al
        and     eax, 255
        ret

You can pass all the usual compiler options to select specific architectures. Refer to your chosen compiler's documentation for details.

Compiler Explorer supports an impressive collection of instruction sets: 6502, aarch64, amd64 (inc. i386), arm32, avr, c6x, ebpf, kvx, loongarch, m68k, mips, mrisc32, msp430, powerpc, riscv32, riscv64, s390x, sh (SuperH), sparc, vax, wasm32, and xtensa! I lost a fair few hours learning about some of the more obscure ones.

What’s Next?

Check out the RISC-V Assembler Cheat Sheet and my FPGA & RISC-V Tutorials.

Share your thoughts with me on Mastodon or X. If you enjoy my work, please sponsor me. Sponsors help me create new projects for everyone, and they get early access to blog posts and source code. 🙏