Isle Bitmap Graphics

Published 31 Aug 2025 (DRAFT)

Generating pixels on the fly, racing the beam, as we did in the previous Display chapter, is fun, but it limits creativity and the types of programs we can design. In this chapter, we'll introduce bitmap graphics, where pixels on screen are held in memory instead of being generated directly. We create video memory, paint our graphics into this vram, then read each pixel for display when needed.

If you're new to the project, read Isle FPGA Computer for an introduction.

Memory/Resolution/Colour

I was determined that Isle would run on small FPGAs, so it's accessible and affordable to as many people as possible. However, I also wanted dedicated vram using FPGA block ram (bram), instead of graphics sharing memory with the CPU. Using dedicated bram for video memory avoids contention with the CPU, and also lets us do some pretty cool things with colour depth and clock domains.

With bram, we can have two memory ports, so it's no problem if the graphics engine is drawing shapes at the same time as the display reads pixels. Each bram port can run at a different frequency, so we can support different display modes (1366x768, 1024x768, 1280x720) without changing our system clock or handling tricky clock domain crossing issues.

FPGA bram is ideal for our video memory, but it's also in short supply, especially on smaller FPGAs, so I've chosen to design Isle with 64 KiB of vram. That sounds tiny, but there's a surprising amount we can do with 64 KiB, especially as block ram is so flexible.

Isle Resolutions

These are the default Isle bitmap resolutions with 64 KiB vram:

These display modes are enough for many types of program, including GUIs and games. Resolution and colour depth are flexible, so don't feel constrained by my choices, though bear in mind how much video memory you'll need. Display modes explains the resolution and colour choices in more detail.
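As a rough check on video memory, the sketch below computes the bytes a bitmap needs at a given colour depth and compares them with the 64 KiB of vram. The function names and example modes are illustrative, not taken from the Isle sources; 336x192 is the size of the crocus test image used later in this chapter. The sketch ignores any alignment of lines or buffers to word boundaries.

```python
VRAM_BYTES = 64 * 1024  # Isle vram size: 64 KiB


def bitmap_bytes(width, height, bpp):
    """Return the bytes needed for a width x height bitmap at bpp bits per pixel."""
    if bpp not in (1, 2, 4, 8):
        raise ValueError("bpp must be 1, 2, 4, or 8")
    return (width * height * bpp + 7) // 8


def fits_in_vram(width, height, bpp, buffers=1):
    """True if the requested number of buffers fits in vram."""
    return buffers * bitmap_bytes(width, height, bpp) <= VRAM_BYTES


# Example modes chosen for illustration only
for w, h, bpp in [(336, 192, 4), (672, 384, 2), (512, 384, 4)]:
    size = bitmap_bytes(w, h, bpp)
    print(f"{w}x{h} @ {bpp} bpp: {size} bytes, fits={fits_in_vram(w, h, bpp)}")
```

With these numbers, a 16-colour 336x192 bitmap takes 32,256 bytes, so two buffers of that size fit in vram, whereas a 16-colour 512x384 bitmap does not fit at all.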

VRAM

Isle vram has a 32-bit data bus, matching our CPU data bus for high performance and simple addressing. However, pixels are 1, 2, 4, or 8 bits. If we used regular memory for our vram, updating a pixel would be complex and time-consuming: we'd need to read a 32-bit word, mask out the bits we want to change, then write it back. Thanks to the flexibility of block ram, we can instead have a 32-bit write mask to control exactly which bits get updated when writing to memory. For example, with a 4 colour bitmap, we can write a single 2-bit pixel or up to 16 pixels at once.
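The effect of the write mask can be modelled in a few lines of Python. This is a behavioural sketch, not the hardware: the function names are made up for illustration, and it assumes pixel 0 sits in the least significant bits of the word, which matches the extraction logic in ch02.v later in this chapter.

```python
WORD_MASK = 0xFFFF_FFFF  # 32-bit vram word


def masked_write(old_word, data, wmask):
    """Model a bit-masked write: only bits set in wmask take their value from data."""
    return ((old_word & ~wmask) | (data & wmask)) & WORD_MASK


def pixel_write(old_word, pix_id, colour, bpp):
    """Write one pixel of bpp bits at position pix_id within a 32-bit word."""
    pix_mask = (1 << bpp) - 1
    shift = pix_id * bpp
    return masked_write(old_word, (colour & pix_mask) << shift, pix_mask << shift)


word = 0x0000_0000
word = pixel_write(word, pix_id=3, colour=0b10, bpp=2)  # a single 2-bit pixel
print(f"{word:08X}")  # 00000080
word = masked_write(word, 0x5555_5555, WORD_MASK)  # all 16 2-bit pixels at once
print(f"{word:08X}")  # 55555555
```

In hardware the read-modify-write disappears: the mask goes straight to the bram, so a pixel update is a single write.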

I'll discuss bram in general, and the vram design in particular, in a forthcoming post on Block RAM.

The Verilog vram design is surprisingly simple; see mem/vram.v (ref doc):

module vram #(
    parameter WORD=0,       // machine word size (bits)
    parameter ADDRW=0,      // address width ≥14 (bits)
    parameter FILE_BMAP=""  // optional initial bitmap to load
    ) (
    input  wire clk_sys,                // system clock
    input  wire clk_pix,                // pixel clock
    input  wire [WORD-1:0] wmask_sys,   // system write mask
    input  wire [ADDRW-1:0] addr_sys,   // system word address
    input  wire [WORD-1:0] din_sys,     // system data in
    output reg  [WORD-1:0] dout_sys,    // system data out
    input  wire [ADDRW-1:0] addr_disp,  // display word address
    output reg  [WORD-1:0] dout_disp    // display data out
    );

    localparam DEPTH=2**ADDRW;
    reg [WORD-1:0] vram_mem [0:DEPTH-1];

    initial begin
        if (FILE_BMAP != 0) begin
            $display("Load bitmap file '%s' into vram.", FILE_BMAP);
            $readmemh(FILE_BMAP, vram_mem);
        end
    end

    // system port (read-write, write_mode: no change)
    integer i;
    always @(posedge clk_sys) begin
        if (~|wmask_sys) dout_sys <= vram_mem[addr_sys];
        for (i=0; i<WORD; i=i+1) begin
            if (wmask_sys[i]) vram_mem[addr_sys][i] <= din_sys[i];
        end
    end

    // display port (read-only with output register)
    reg [WORD-1:0] dout_disp_reg;
    always @(posedge clk_pix) begin
        dout_disp_reg <= vram_mem[addr_disp];
        dout_disp <= dout_disp_reg;
    end
endmodule

Indexed Colour

Each pixel is represented by an index into the 15-bit colour palette. So, while the number of colours is limited, you still have a wide choice of colours. We send the colour index to the colour lookup table (CLUT), which returns the 15-bit RGB (RGB555) colour.
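A colour lookup can be sketched in Python as a list indexed by pixel value. The channel order (red in the top five bits, then green, then blue) follows the `{paint_r, paint_g, paint_b}` assignment in ch02.v; the palette contents and function names below are hypothetical and only for illustration.

```python
def pack_rgb555(r, g, b):
    """Pack three 5-bit channels into a 15-bit colour, red in the top bits."""
    return ((r & 0x1F) << 10) | ((g & 0x1F) << 5) | (b & 0x1F)


def unpack_rgb555(colour):
    """Split a 15-bit colour into (r, g, b) 5-bit channels."""
    return (colour >> 10) & 0x1F, (colour >> 5) & 0x1F, colour & 0x1F


def clut_lookup(palette, index):
    """Return the (r, g, b) colour for a pixel's colour index."""
    return unpack_rgb555(palette[index])


# Hypothetical 4-entry palette for a 2-bit (4 colour) canvas
palette = [
    pack_rgb555(0, 0, 0),
    pack_rgb555(14, 8, 17),
    pack_rgb555(12, 19, 31),
    pack_rgb555(31, 30, 6),
]

print(clut_lookup(palette, 3))  # (31, 30, 6)
```

A 2-bit pixel can only select one of four entries, but each entry can be any of the 32,768 RGB555 colours.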

The CLUT module has a similar dual port design to vram, but without the write mask: mem/clut.v (ref doc).

Canvas & Buffer

We could set up a fixed relationship between memory and screen: memory address 0 is pixel (0,0), address 1 is pixel (1,0), and so on. However, this is inflexible and creates problems when our bitmap size and resolution don't match the screen. A canvas is a simple abstraction that gives us more control over bitmap display.

A canvas displays a bitmap image at a particular location on the screen, known as the canvas window. For example, we can display a 256x256 bitmap starting at coordinates (32,48) and ending at (287, 303), so it's offset from the edge of the display.

Canvases support scaling in hardware, so a bitmap can fill the screen even when its resolution is lower than the display's. For example, if we scale a 512x384 bitmap by a factor of two, it'll fill a 1024x768 screen.
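The window and scaling examples above can be expressed as a small Python model that maps a display coordinate to a bitmap coordinate. This is a sketch of the idea, not the Isle hardware: the function name is made up, and it assumes the window end is exclusive (start plus size), which is how ch02.v computes `win_end` later in this chapter.

```python
def canvas_coord(dx, dy, win_start, win_end, scale):
    """Map display coordinate (dx, dy) to a bitmap coordinate.

    win_start is inclusive and win_end is exclusive (an assumption of this sketch).
    scale is a pair of integer factors (sx, sy).
    Returns None when the coordinate is outside the canvas window.
    """
    (x0, y0), (x1, y1) = win_start, win_end
    if not (x0 <= dx < x1 and y0 <= dy < y1):
        return None  # outside the window: the display shows the background
    return (dx - x0) // scale[0], (dy - y0) // scale[1]


# 256x256 bitmap at (32,48), unscaled: the last visible pixel is (287,303)
print(canvas_coord(287, 303, (32, 48), (288, 304), (1, 1)))  # (255, 255)
print(canvas_coord(288, 48, (32, 48), (288, 304), (1, 1)))   # None

# 512x384 bitmap scaled by two fills a 1024x768 screen
print(canvas_coord(1023, 767, (0, 0), (1024, 768), (2, 2)))  # (511, 383)
```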

The actual pixel data is held in a canvas buffer in vram, and a canvas can have multiple buffers, commonly used for double buffering. With double buffering, we're free to paint whatever we like in one buffer, while the display shows the other. When painting is complete, we swap buffers. We'll see examples of double buffering in later chapters. Also in later chapters, we'll see how you combine multiple canvases of different resolutions, or overlay one canvas on top of another for animation and parallax effects.

|-------------------------------|
|  win_start            display |
|        *----------|           |
|        |  canvas  |           |
|        |          |           |
|        |----------*           |
|                    win_end    |
|                               |
|-------------------------------|

How canvas position is controlled by window signals (draft diagram).

Bitmap Display Chain

There are three steps to displaying a bitmap graphic (module):

  1. Calculate the pixel address in memory (canv_disp_agu)
  2. Read the pixel data from memory (vram)
  3. Read the RGB display colour from the palette (clut)

You can see these steps in the chapter 2 root design book/ch02/ch02.v (discussed in more detail shortly). We've already discussed the vram and clut, but address generation is the vital first step.

Each of the three steps in the bitmap display chain takes 2 cycles in the Isle design, so the latency is 6 cycles.
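To see how the stage latencies add up, here is a toy Python model that treats each step as a fixed two-cycle delay and tags each request with the cycle it was issued on. It models timing only, with no addresses or pixel data, and the `Stage` class is hypothetical rather than part of Isle.

```python
from collections import deque


class Stage:
    """A pipeline stage that outputs its input a fixed number of cycles later."""

    def __init__(self, cycles):
        self.regs = deque([None] * cycles, maxlen=cycles)

    def clock(self, value):
        out = self.regs[0]       # oldest value leaves the stage
        self.regs.append(value)  # newest value enters; maxlen drops the oldest
        return out


agu, vram, clut = Stage(2), Stage(2), Stage(2)

outputs = []
for cycle in range(10):
    # Each stage feeds the next: address -> pixel data -> colour
    outputs.append(clut.clock(vram.clock(agu.clock(cycle))))

print(outputs)  # [None, None, None, None, None, None, 0, 1, 2, 3]
```

The request issued on cycle 0 emerges on cycle 6, so the address for a pixel has to be generated six cycles before that pixel is due on screen.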

Canvas Display Address

The display needs to know which pixel to show for each display coordinate (dx,dy). We need to account for the canvas window position, scaling, the bit depth, and the latency of the different stages in the bitmap display chain. The canvas display Address Generation Unit (AGU) handles this.

Isle VRAM has a 32-bit data bus, but each canvas pixel might be 1, 2, 4, or 8 bits. The AGU calculates the VRAM address and where in the 32-bit word to find a particular pixel; this position is the pixel ID. For example, with a 4-bit (16 colour) canvas, there are eight pixels in each 32-bit word.
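The core arithmetic can be sketched in Python for a linear pixel index. This is a simplification under stated assumptions: pixels are packed row by row with no padding, and the window position, scaling, base address, and latency compensation handled by the real AGU are all left out. The shift value matches the `5 - $clog2(CANV_BPP)` expression in ch02.v; the function name is made up.

```python
WORD = 32  # vram word size (bits)


def pixel_location(pixel_index, bpp):
    """Return (word_address, pix_id) for the nth pixel of a packed bitmap."""
    # log2(pixels per word): 5, 4, 3, 2 for 1, 2, 4, 8 bits per pixel
    shift = (WORD.bit_length() - 1) - (bpp.bit_length() - 1)
    return pixel_index >> shift, pixel_index & ((1 << shift) - 1)


# 4-bit canvas: eight pixels per word
print(pixel_location(7, 4))  # (0, 7)
print(pixel_location(8, 4))  # (1, 0)

# Pixel (10, 1) of a 336-pixel-wide, 4-bit bitmap
print(pixel_location(1 * 336 + 10, 4))  # (43, 2)
```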

I'm not going to dig into the AGU details in this blog post, but you can see the Verilog source and reference in git: gfx/canv_disp_agu.v (ref doc).

Testing

I wanted to test the bitmap display logic end-to-end. It's easy to make mistakes with latency in the display chain, leading to pixels appearing in the wrong place or even the wrong colour. I created a test graphic, which has different coloured pixels in the corners of the image. We can use this test graphic in two ways: in an automated test bench and as a visual check of the display on dev boards.

You can find the end-to-end test in isle/hardware/book/test and instructions on running tests in isle/docs/verilog-tests.md.

Here's a sample test run, showing some example results (trimmed for brevity):

$ make ch02
...
Load bitmap file '../../../res/bitmap/latency/latency.mem' into vram.
Load palette file '../../../res/bitmap/latency/latency_palette.mem' into clut.
4389244.00ns INFO     cocotb.ch02                        RGB(14, 8,17) at (   0,   0)
4389284.00ns INFO     cocotb.ch02                        RGB(12,19,31) at (   1,   0)
4389324.00ns INFO     cocotb.ch02                        RGB(31,30, 6) at (   2,   0)
...
16672004.00ns INFO     cocotb.ch02                        RGB(31,30, 6) at ( 669, 383)
16672044.00ns INFO     cocotb.ch02                        RGB(14, 8,17) at ( 670, 383)
16672084.00ns INFO     cocotb.ch02                        RGB(12,19,31) at ( 671, 383)
16704084.00ns INFO     cocotb.regression                  pixel_colour passed

Image is Everything

We don't (yet) have a CPU or a graphics engine, but we can load an image into vram at build time using the Verilog $readmemh system task. A sample image lets us visually confirm bitmap graphics output is working, and also allows for automated testing (discussed above). We load the colour palette for the image into the CLUT, which performs the colour palette lookups.

Purple and white crocuses growing from green grass.
Test crocus image (336x192) included with Isle.

Creating Your Own Images

You can convert your own images into $readmemh format using img2fmem. You can find img2fmem in the Project F FPGA Tools repo. Ensure your image has the correct dimensions before conversion. img2fmem does not resize images; use your image editor to do this.

For example, to convert crocus.png to a 4 bit (16 colour) image with a 15-bit palette packed into 32-bit words (as required by Isle):

img2fmem.py crocus.png 4 mem 15 32

For details on installation and command-line options, see the img2fmem README.
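To illustrate what "packed into 32-bit words" means, here is a minimal Python sketch that packs colour indices into words and formats them as hex text that $readmemh can load. It is not img2fmem's code, and the function names are made up. The pixel order (first pixel in the least significant bits) is chosen to match the extraction logic in ch02.v, so check the img2fmem README for the tool's actual output format.

```python
def pack_pixels(pixels, bpp, word_bits=32):
    """Pack colour indices into words, first pixel in the least significant bits."""
    per_word = word_bits // bpp
    pix_mask = (1 << bpp) - 1
    words = []
    for start in range(0, len(pixels), per_word):
        word = 0
        for pix_id, colour in enumerate(pixels[start:start + per_word]):
            word |= (colour & pix_mask) << (pix_id * bpp)
        words.append(word)
    return words


def to_mem_lines(words, word_bits=32):
    """Format words as fixed-width hex strings, one word per line."""
    return [f"{w:0{word_bits // 4}X}" for w in words]


# Nine 4-bit pixels: eight fill the first word, the ninth starts a second word
print(to_mem_lines(pack_pixels([1, 2, 3, 4, 5, 6, 7, 8, 9], bpp=4)))
# ['87654321', '00000009']
```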

Complete Design

For this chapter, I've created a simple design that shows the 16 colour crocus image with a 1366x768 display. However, all the hardware is in place to support 2, 4, 16, and 256 colour images at different resolutions. Create your own images and experiment with parameters in the top module for your board.

Each board has its own top module:

But it's in the chapter 2 root module that the interesting logic happens:

module ch02 #(
    parameter BPC=5,              // bits per colour channel
    parameter CORDW=16,           // signed coordinate width (bits)
    parameter DISPLAY_MODE=0,     // display mode (see display.v for modes)
    parameter BG_COLR=0,          // background colour (RGB555)
    parameter FILE_BMAP="",       // initial bitmap file for framebuffer
    parameter FILE_PAL="",        // initial palette for CLUT
    parameter CANV_BPP=0,         // canvas bits per pixel (4=16 colours)
    parameter CANV_SCALE=16'd0,   // canvas scaling factor
    parameter WIN_WIDTH=16'd0,    // canvas window width (pixels)
    parameter WIN_HEIGHT=16'd0,   // canvas window height (lines)
    parameter WIN_STARTX=16'd0,   // canvas window horizontal position (pixels)
    parameter WIN_STARTY=16'd0    // canvas window vertical position (lines)
    ) (
    input  wire clk,                        // system clock
    input  wire rst,                        // reset
    output reg  signed [CORDW-1:0] disp_x,  // horizontal display position
    output reg  signed [CORDW-1:0] disp_y,  // vertical display position
    output reg  disp_hsync,                 // horizontal display sync
    output reg  disp_vsync,                 // vertical display sync
    output reg  disp_de,                    // display data enable
    output reg  disp_frame,                 // high for one cycle at frame start
    output reg  [BPC-1:0] disp_r,           // red display channel
    output reg  [BPC-1:0] disp_g,           // green display channel
    output reg  [BPC-1:0] disp_b            // blue display channel
    );

    // vram - 16K x 32-bit (64 KiB) with bit write
    //   NB. Due to bit write, minimum depth is 64 KiB with 18 Kb bram
    localparam VRAM_ADDRW = 14;  // vram address width (bits)

    // internal system params
    localparam WORD = 32;  // machine word size (bits)
    localparam CIDX_ADDRW = 8;   // colour index address width 2^8 = 256 colours
    localparam COLRW = 3 * BPC;  // colour width across three channels (bits)
    localparam CANV_SHIFTW = 3;  // max shift is 5 bits (2^5 = 32 bits)
    localparam PIX_IDW=$clog2(WORD);  // pixel ID width (bits)

    // display signals
    wire signed [CORDW-1:0] dx, dy;
    wire hsync, vsync, de;
    wire frame_start, line_start;


    //
    // Video RAM (vram)
    //

    wire [VRAM_ADDRW-1:0] vram_addr_disp;
    wire [WORD-1:0] vram_dout_disp;

    vram #(
        .WORD(WORD),
        .ADDRW(VRAM_ADDRW),
        .FILE_BMAP(FILE_BMAP)
        ) vram_inst (
        .clk_sys(),
        .clk_pix(clk),
        .wmask_sys(),
        .addr_sys(),
        .din_sys(),
        .dout_sys(),
        .addr_disp(vram_addr_disp),
        .dout_disp(vram_dout_disp)
    );


    //
    // Canvas Display Address
    //

    localparam BMAP_LAT = 6;  // bitmap display latency: agu(2) + vram(2) + clut(2)
    wire [CANV_SHIFTW-1:0] disp_addr_shift;  // address shift based on canvas bits per pixel
    wire [VRAM_ADDRW-1:0] disp_addr;  // pixel memory address
    wire [$clog2(WORD)-1:0] disp_pix_id;  // pixel ID within word
    wire canv_paint;

    assign disp_addr_shift = 5 - $clog2(CANV_BPP);

    canv_disp_agu #(
        .CORDW(CORDW),
        .WORD(WORD),
        .ADDRW(VRAM_ADDRW),
        .BMAP_LAT(BMAP_LAT),
        .SHIFTW(CANV_SHIFTW)
    ) canv_disp_agu_inst (
        .clk_pix(clk),
        .rst_pix(rst),
        .frame_start(frame_start),
        .line_start(line_start),
        .dx(dx),
        .dy(dy),
        .addr_base(0),  // fixed base address for now
        .addr_shift(disp_addr_shift),
        .win_start({WIN_STARTY, WIN_STARTX}),
        .win_end({WIN_HEIGHT + WIN_STARTY, WIN_WIDTH + WIN_STARTX}),
        .scale({CANV_SCALE, CANV_SCALE}),
        .addr(disp_addr),
        .pix_id(disp_pix_id),
        .paint(canv_paint)
    );


    //
    // CLUT
    //

    reg  [CIDX_ADDRW-1:0] clut_addr_disp;
    wire [COLRW-1:0] clut_dout_disp;

    clut #(
        .ADDRW(CIDX_ADDRW),
        .DATAW(COLRW),
        .FILE_PAL(FILE_PAL)
    ) clut_inst (
        .clk_sys(),
        .clk_pix(clk),
        .we_sys(),
        .addr_sys(),
        .din_sys(),
        .dout_sys(),
        .addr_disp(clut_addr_disp),
        .dout_disp(clut_dout_disp)
    );


    //
    // Display Controller
    //

    display #(
        .CORDW(CORDW),
        .MODE(DISPLAY_MODE)
    ) display_inst (
        .clk_pix(clk),
        .rst_pix(rst),
        .hres(),
        .vres(),
        .dx(dx),
        .dy(dy),
        .hsync(hsync),
        .vsync(vsync),
        .de(de),
        .frame_start(frame_start),
        .line_start(line_start)
    );


    //
    // Painting & Display Output
    //

    assign vram_addr_disp = disp_addr;

    // vram read takes two cycles; delay disp_pix_id to match
    reg [PIX_IDW-1:0] pix_id_p1, pix_id_p2;
    always @(posedge clk) begin
        pix_id_p1 <= disp_pix_id;
        pix_id_p2 <= pix_id_p1;
    end

    // select pixel ID from word depending on colour depth
    reg [CIDX_ADDRW-1:0] pcidx_1, pcidx_2, pcidx_4, pcidx_8;
    always @(*) begin
        pcidx_1 = (vram_dout_disp >> pix_id_p2)        & 'b1;
        pcidx_2 = (vram_dout_disp >> (pix_id_p2 << 1)) & 'b11;
        pcidx_4 = (vram_dout_disp >> (pix_id_p2 << 2)) & 'b1111;
        pcidx_8 = (vram_dout_disp >> (pix_id_p2 << 3)) & 'b11111111;
        case (CANV_BPP)
            1: clut_addr_disp = pcidx_1;
            2: clut_addr_disp = pcidx_2;
            4: clut_addr_disp = pcidx_4;
            8: clut_addr_disp = pcidx_8;
            default: clut_addr_disp = pcidx_4;
        endcase
    end

    reg [BPC-1:0] paint_r, paint_g, paint_b;
    always @(*) {paint_r, paint_g, paint_b} = canv_paint ? clut_dout_disp : BG_COLR;

    // register display signals
    always @(posedge clk) begin
        disp_x <= dx;
        disp_y <= dy;
        disp_hsync <= hsync;
        disp_vsync <= vsync;
        disp_de <= de;
        disp_frame <= frame_start;
        disp_r <= (de) ? paint_r : 'h0;  // paint colour but black in blanking
        disp_g <= (de) ? paint_g : 'h0;
        disp_b <= (de) ? paint_b : 'h0;
    end
endmodule

The system port on vram and clut isn't used in this chapter, but will support the graphics engine and CPU.

Extracting a Pixel

In the Painting & Display Output section of ch02.v we extract the pixel colour index from the VRAM data before passing it to the colour lookup table.

A canvas can be 1, 2, 4, or 8 bits per pixel. In this chapter, the bit depth is hard-coded at design time, but when we introduce a CPU you'll be able to change it at runtime. To handle different bit depths, we extract the pixel for all potential bit depths then select the one we want.

For example, let's say we want the pixel with ID 7 (the eighth pixel, as IDs count from zero) from the 32-bit data returned from vram. For a 1-bit canvas, we right shift the vram data by 7 bits (7 × 1) then AND it with 1 to select a single bit. For a 4-bit canvas, we right shift the vram data by 28 bits (7 × 4), then AND it with 15 (1111 in binary) to select four bits.
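The same shift-and-mask arithmetic can be checked in Python. This is a software sketch of the idea rather than a translation of the Verilog: it computes only the selected bit depth, and the function names and test word are made up for illustration. The fallback to 4 bits mirrors the default branch of the case statement in ch02.v.

```python
def extract_pixel(word, pix_id, bpp):
    """Return the colour index of pixel pix_id from a 32-bit vram word."""
    return (word >> (pix_id * bpp)) & ((1 << bpp) - 1)


def select_index(word, pix_id, canv_bpp):
    """Extract a pixel for the canvas bit depth, falling back to 4 bits."""
    bpp = canv_bpp if canv_bpp in (1, 2, 4, 8) else 4
    return extract_pixel(word, pix_id, bpp)


word = 0xA000_0080  # illustrative vram data

print(select_index(word, 7, 1))  # shift by 7, AND with 1 -> 1
print(select_index(word, 7, 4))  # shift by 28, AND with 15 -> 10
```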

Purple and white crocuses on monitor with dev board.
16-colour crocus image displayed at 1366x768 from Lakritz dev board.

Ready to Draw

This chapter is pretty dry, but with our bitmap graphics in place, we're ready to introduce our 2D graphics engine Earthrise. Earthrise is a simple processor that decodes and executes graphics instructions for pixels, lines, triangles, rects, and circles. Look out for the Earthrise designs and blog post coming soon.

Next step: 2D Drawing (under construction) or Isle Index

You can sponsor me to support Isle development and get early access to new chapters and designs.