RISC-V Assembler: Load Store
This RISC-V assembler post covers load and store instructions, such as lw, sw, and lbu. We also cover memory alignment, addressing modes, and loading symbol addresses. Load and store instructions are included in RV32I, the base integer instruction set.
In the last few years, we’ve seen an explosion of RISC-V CPU designs on FPGA and ASIC, including the RP2350 found on the Raspberry Pi Pico 2. Thankfully, RISC-V is ideal for assembly programming with its compact, easy-to-learn instruction set. This series will help you learn and understand 32-bit RISC-V instructions and programming.
Load-Store Architecture
RISC-V is a load-store architecture: load and store instructions access memory, while other instructions work with CPU registers. A load reads a value from memory into a register. A store writes a value from a register into memory.
Data Sizes
RV32 is a 32-bit architecture, and all arithmetic is performed on 32-bit words (there’s no “byte add” instruction, for example). However, loads and stores also support 8 and 16-bit data because programmers commonly work with smaller values, such as text.
RISC-V uses consistent names and one-letter abbreviations for data sizes:
- b - byte - 8 bits
- h - half word - 16 bits (2 bytes)
- w - word - 32 bits (4 bytes)
- d - double word - 64 bits (8 bytes)
Being familiar with these one-letter abbreviations is a great help in understanding loads and stores.
ProTip: A word is always 32 bits wide, even on 64-bit RISC-V (RV64).
Load
32-bit RISC-V has five memory load instructions:
lw # rd = mem[rs1+imm] ; load word
lh # rd = mem[rs1+imm][0:15] ; load half word
lhu # rd = mem[rs1+imm][0:15] ; load half word unsigned
lb # rd = mem[rs1+imm][0:7] ; load byte
lbu # rd = mem[rs1+imm][0:7] ; load byte unsigned
NB. I cover load immediate (li) under arithmetic.
Load instructions have a consistent format that we’ll illustrate with load word:
lw rd, imm(rs1)
Where rd is the destination register, source register rs1 holds the memory address, and imm is an address offset. The offset is a 12-bit signed immediate, so it can reach addresses -2048 to +2047 bytes from the address in rs1.
For example, if we want to load the word at address 0x140 into register t0:
li t6, 0x140 # load the immediate 0x140 (address) into register t6
lw t0, 0(t6) # load word from memory address in t6 with 0 byte offset
t0 is loaded with the word at address 0x140.
To load the next word, we increase the address by 4 because addresses are in units of bytes:
lw t1, 4(t6) # load word from memory address in t6 with 4 byte offset
t1 is loaded with the word at address 0x144.
Halves and bytes work in the same way, but the value is sign-extended:
lh t2, 6(t6) # load sign-extended half from memory address in t6 with 6 byte offset
lb t3, 7(t6) # load sign-extended byte from memory address in t6 with 7 byte offset
t2 is loaded with the half word at address 0x146. t3 is loaded with the byte at address 0x147.
Thanks to sign extension, a byte in memory with the value -1 retains the correct value when loaded into a register. See arithmetic sign extension for a reminder of how sign extension works.
RISC-V includes unsigned load half and byte to handle unsigned data, such as UTF-8 text:
lhu t4, 8(t6) # load zero-extended half from memory address in t6 with 8 byte offset
lbu t5, 10(t6) # load zero-extended byte from memory address in t6 with 10 byte offset
Unsigned loads zero-fill the upper part of the register.
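If the difference between signed and unsigned loads isn’t clear, this Python sketch models what ends up in the 32-bit register. The helper names sign_extend and zero_extend are mine, not part of any RISC-V tool.

```python
def sign_extend(value, bits):
    """Treat the low `bits` of value as two's complement and widen to 32 bits."""
    value &= (1 << bits) - 1
    if value & (1 << (bits - 1)):      # sign bit set?
        value -= 1 << bits
    return value & 0xFFFFFFFF          # register contents as unsigned 32-bit


def zero_extend(value, bits):
    """Keep the low `bits` of value; the upper register bits become zero."""
    return value & ((1 << bits) - 1)


byte_in_memory = 0xFF                  # -1 as a signed byte, 255 as unsigned
print(f"lb : 0x{sign_extend(byte_in_memory, 8):08X}")   # lb : 0xFFFFFFFF
print(f"lbu: 0x{zero_extend(byte_in_memory, 8):08X}")   # lbu: 0x000000FF
```

The same byte in memory produces -1 with lb and 255 with lbu, which is why you need to pick the load that matches your data.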
Store
The store instructions are straightforward because there’s no need to worry about sign extension:
sw # mem[rs1+imm] = rs2 ; store word
sh # mem[rs1+imm][0:15] = rs2[0:15] ; store half word
sb # mem[rs1+imm][0:7] = rs2[0:7] ; store byte
Store instructions look like the equivalent load instructions:
sw rs2, imm(rs1)
Where rs2 is the source register, rs1 holds the memory address, and imm is an address offset.
Note how the source register is the first operand. This keeps the syntax of stores consistent with loads, but it differs from other RISC-V instructions, where the first operand is the destination. This is best seen with some examples:
li t0, 42 # load the immediate 42 into register t0
li t6, 0x140 # load the immediate 0x140 (address) into register t6
sw t0, 0(t6) # store the word in t0 to memory address in t6 with 0 byte offset
Memory location 0x140 now contains a word with the value 42 (0x0000002A).
If we want to zero a word of memory, we can store the zero (x0) register to it:
sw zero, 4(t6) # store 0 to memory address in t6 with 4 byte offset
Halves and bytes work in the same way, storing the least significant 16 or 8 bits to memory:
sw zero, 4(t6) # store 0 to memory address in t6 with 4 byte offset
li t0, 0xFACE # load the immediate 0xFACE into register t0
sh t0, 4(t6) # store half from t0 to memory address in t6 with 4 byte offset
sb t0, 6(t6) # store byte from t0 to memory address in t6 with 6 byte offset
What state is our memory now in? A good way to think about this is to ask what happens if we load a word from memory address 0x144.
The answer hinges on RISC-V being little endian. A little-endian CPU stores the least significant byte at the lowest address. x86 is also little endian, as is ARM in almost all practical configurations.
Our sh instruction puts the least significant byte, 0xCE, at address 0x144 and the most significant byte, 0xFA, at address 0x145.
The following sb instruction puts 0xCE at address 0x146. 0x147 is still zero from the previous “sw zero” instruction.
lw t1, 4(t6) # load word from memory address in t6 with 4 byte offset
After this load, t1 contains 0x00CEFACE.
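You can replay this sequence with a few lines of Python. This is only a model of little-endian memory, assuming a small zeroed bytearray stands in for RAM; the store and load_word helpers are hypothetical names of my own.

```python
memory = bytearray(0x200)              # small zeroed memory for the example


def store(addr, value, size):
    """Write the low `size` bytes of value, least significant byte first."""
    masked = value & ((1 << (8 * size)) - 1)
    memory[addr:addr + size] = masked.to_bytes(size, "little")


def load_word(addr):
    """Read four bytes and assemble them little-endian, like lw."""
    return int.from_bytes(memory[addr:addr + 4], "little")


t6 = 0x140
store(t6 + 4, 0, 4)                    # sw zero, 4(t6)
t0 = 0xFACE                            # li t0, 0xFACE
store(t6 + 4, t0, 2)                   # sh t0, 4(t6)
store(t6 + 6, t0, 1)                   # sb t0, 6(t6)

print(memory[0x144:0x148].hex(" "))    # ce fa ce 00
print(f"0x{load_word(t6 + 4):08X}")    # 0x00CEFACE
```

Reading the bytes from the lowest address upwards gives ce fa ce 00, which is the word 0x00CEFACE once the least significant byte is placed first.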
Most of the time, you’ll be accessing data as either words or bytes, in which case you needn’t worry about RISC-V being little endian.
This is a cursory look at endianness, but there’s plenty of material online. Wikipedia’s Endianness article is a decent place to start.
Load Symbol Address
The load and store instructions require a memory address, but what if you want to reference a symbol? This sounds too abstract, so let’s look at a concrete example: “Hello, World!”.
We put our greeting string in the data section with the .ascii assembler directive:
.section .data
.balign 4
greeting:
.ascii "Hello, World!\0" # null-terminated string
Imagine a function called print_string that displays a null-terminated string. We need to pass the address of our greeting string from the data section, but we don’t know the address!
The la (load address) pseudoinstruction comes to our rescue:
la rd, symbol
Loading our symbol address is simple:
la a0, greeting # load address of greeting label in the data section
call print_string # call print_string function
Note how we pass the (first) argument to a function in register a0. See my post covering function calls for an explanation of the RISC-V calling convention.
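This post doesn’t implement print_string, but here’s a purely hypothetical Python sketch of what such a function does with the address in a0: read one byte at a time, as lbu would, until it finds the null terminator. The address 0x1000 and the dictionary standing in for memory are made up for the example.

```python
DATA_BASE = 0x1000                     # made-up address the linker might pick
memory = {DATA_BASE + i: b for i, b in enumerate(b"Hello, World!\0")}


def read_c_string(a0):
    """Collect bytes starting at address a0 until the null terminator."""
    chars = []
    while True:
        byte = memory[a0]              # what lbu would fetch
        if byte == 0:
            break
        chars.append(chr(byte))
        a0 += 1                        # next byte: addresses count bytes
    return "".join(chars)


greeting = DATA_BASE                   # the value `la a0, greeting` would produce
print(read_c_string(greeting))         # Hello, World!
```

The point is that the function only ever sees a number; the la pseudoinstruction is what turns the greeting label into that number.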
Memory Addresses
RISC-V uses byte addressing, the norm for all general-purpose CPUs. With byte addressing, you can access an individual byte in memory even with a 32 or 64-bit CPU.
We’re so used to thinking of data sizes in bytes that we rarely stop to think about it, but there’s no fundamental reason we should divide data into 8-bit chunks. If a CPU is 32-bit, why not address memory in units of 32-bit words? Word addressing would be simpler, and a 32-bit CPU could access 16 GiB of memory vs 4 GiB with byte addressing.
However, the dominant UTF-8 text encoding is byte-based and with good reason. CPU performance depends on cache hits, so efficient storage of frequently used data, such as text, is essential.
The upshot of byte addressing is that if you want to move to the next word, you must add 4 to the address. On 64-bit CPUs, you add 8 to get to the next double word. Accidentally adding 1, rather than 4, to a memory address is a common source of bugs in my personal experience. 😅
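The numbers above are easy to check in Python. This sketch assumes the one-letter size abbreviations from earlier in the post; the element_address helper is a name I’ve made up for illustration.

```python
SIZES = {"b": 1, "h": 2, "w": 4, "d": 8}   # bytes per data size


def element_address(base, index, size):
    """Byte address of element `index` in an array starting at `base`."""
    return base + index * SIZES[size]


GIB = 2**30
print(2**32 * 1 // GIB)    # 4  GiB reachable with 32-bit byte addresses
print(2**32 * 4 // GIB)    # 16 GiB if each address named a 32-bit word

print(hex(element_address(0x140, 1, "w")))  # 0x144: the next word
print(hex(element_address(0x140, 1, "d")))  # 0x148: the next double word
```

Multiplying the index by the element size is exactly the step that’s easy to forget when you increment an address by hand.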
Remember, the load and store memory offset is a signed 12-bit value, so you can access memory locations between -2048 and +2047 bytes from the base address in the register.
Memory Alignment
RISC-V doesn’t require data to be naturally aligned; for example, words don’t have to be on a 4-byte boundary. However, not all CPUs support misaligned memory access, and it’s invariably slower on those that do support it. I strongly recommend using natural alignment for your data.
Your code should be correctly aligned by the compiler. You can align your data with the GNU assembler’s .balign directive.
For example, to align the word with the label “foo” to a 4-byte boundary:
.section .data
.balign 4
foo:
.word 0
NB. The alignment directive applies to the label, so it must appear before the label (not the data)!
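To see what natural alignment means in terms of addresses, here’s a short Python sketch. It models the rounding that an alignment directive performs on the current address; align_up and is_aligned are illustrative names of mine, and the boundary is assumed to be a power of two.

```python
def align_up(address, boundary):
    """Round address up to the next multiple of boundary (a power of two)."""
    return (address + boundary - 1) & ~(boundary - 1)


def is_aligned(address, boundary):
    """True if address is a multiple of boundary."""
    return address % boundary == 0


print(hex(align_up(0x141, 4)))   # 0x144: next 4-byte boundary after 0x141
print(is_aligned(0x144, 4))      # True:  a word at 0x144 is naturally aligned
print(is_aligned(0x146, 4))      # False: a word at 0x146 is misaligned
```

An address that’s already on the boundary is left unchanged, so aligning costs nothing when the data happens to be in the right place.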
Addressing Modes
An addressing mode is how the CPU calculates a memory address. With x86 and 68K, the smart use of addressing modes is critical to writing good code. With RISC-V, addressing modes aren’t really a thing. I will stick my neck out a little and say RISC-V has three addressing modes, but it’s not something you usually need to consider.
- Register Offset (AKA Displacement on x86) - load, store, and jalr instructions
- PC Relative - auipc, jal, and branch instructions
- Absolute (AKA Immediate) - lui
PC is the program counter, which keeps track of where the CPU is within the code. In x86 land, this is known as the instruction pointer (IP), which is frankly a much better name. We’ll learn more about the program counter when we discuss branches.
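For comparison, here’s my rough Python model of the three calculations in the list above. The function names and example values are my own, and the sketch ignores instruction encoding details such as how the immediates are packed.

```python
def register_offset(rs1, imm):
    """Base register plus 12-bit signed offset, e.g. lw rd, imm(rs1)."""
    return (rs1 + imm) & 0xFFFFFFFF


def pc_relative(pc, offset):
    """Program counter plus signed offset, e.g. jal and branches."""
    return (pc + offset) & 0xFFFFFFFF


def upper_immediate(imm20):
    """lui places a 20-bit immediate in bits 31:12; the low 12 bits are zero."""
    return (imm20 << 12) & 0xFFFFFFFF


print(hex(register_offset(0x140, 4)))        # 0x144
print(hex(pc_relative(0x8000_0010, -16)))    # 0x80000000
print(hex(upper_immediate(0x12345)))         # 0x12345000
```

All three are a single addition or shift, which is a big part of why RISC-V addressing is so easy to reason about.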
Including variations, the Motorola 68000 has 14 addressing modes! One example is address register indirect with post-increment. These help you write compact assembly code but complicate the CPU design. I love 68000 assembler, but I appreciate the simplicity of RISC-V.
What’s Next?
The next post looks at RISC-V Branch and Set Instructions, including the zero register.
Check out the RISC-V Assembler Cheat Sheet and my FPGA & RISC-V Tutorials.