28 October 2020

Hardware Sprites

Welcome back to Exploring FPGA Graphics. In the previous part, we recreated Pong. In this part, we learn how to create colourful animated graphics with hardware sprites. Hardware sprites maintain much of the simplicity of our Pong design while offering much greater creative freedom. In the next part, we’ll create a demo that gives a taste of what’s possible with sprites.

In this series, we explore graphics at the hardware level and get a feel for the power of FPGAs. We start by learning how displays work, before racing the beam with Pong, starfields and sprites, simulating life with bitmaps, drawing lines and triangles, and finally creating simple 3D models. I’ll be writing and revising this series throughout 2020. New to the series? Start with Exploring FPGA Graphics.

Updated 2020-11-25. Get in touch with @WillFlux or open an issue on GitHub.

Series Outline

  • Exploring FPGA Graphics - learn how displays work and animate simple shapes
  • FPGA Pong - race the beam to create the arcade classic
  • Hardware Sprites (this post) - fast, colourful graphics with minimal resources
  • FPGA Ad Astra - demo with hardware sprites and animated starfields
  • Framebuffers - driving the display from a bitmap in memory
  • Life on Screen - the screen comes alive with Conway’s Game of Life

More parts to follow.

Requirements

For this series, you need an FPGA board with video output. We’ll be working at 640x480, so pretty much any video output will do. You should be comfortable with programming your FPGA board and reasonably familiar with Verilog.

We’ll be demoing with two boards: the iCEBreaker (iCE40) and the Arty (Xilinx).

Source

The SystemVerilog designs featured in this series are available from the projf-explore repo on GitHub. The designs are open source hardware under the permissive MIT licence, but this blog is subject to normal copyright restrictions.

What is a Sprite?

A sprite is a graphics object that can be moved and animated independently of the background and other sprites. Hardware sprites use dedicated logic for drawing, and until the mid-90s they were an essential part of computer graphics. Hardware sprites are a good fit for an FPGA as they’re easy to control, and we can scale them to fit our game design: whether we want hundreds of tiny sprites or a few huge ones. Hardware sprites are also useful for cursors or pointers in professional applications, providing a responsive UI without complex screen redrawing.

A Simple Sprite

We’re going to start with a small 8x8 pixel sprite with just two colours.

We’ll load the sprite into FPGA memory using a simple text format: each line consists of eight 1s or 0s separated by spaces. I’m going to start with the letter ‘F’ and a full stop (period) as a sprite. It’s a simple, asymmetric design, making it easier to spot bugs (such as incorrect orientation or missing pixels):

F.

The text file to initialize the sprite memory looks like this [letter_f.mem]:

1 1 1 1 1 1 0 0
1 1 0 0 0 0 0 0
1 1 0 0 0 0 0 0
1 1 1 1 1 0 0 0
1 1 0 0 0 0 0 0
1 1 0 0 0 0 0 0
1 1 0 0 0 0 1 1
0 0 0 0 0 0 1 1

We read the binary text format into FPGA memory with $readmemb (note the ‘b’ for binary); you can see this in action in the sprite_v1 module listing, below. If you want to know more about loading data into memory, see Initialize Memory in Verilog.
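Before building for hardware, it can help to check that the memory file loads in the orientation you expect. The following is a minimal simulation-only sketch: the module name tb_print_sprite is hypothetical (it isn't part of the repo), and it assumes letter_f.mem is in the simulator's working directory.

```systemverilog
// Hypothetical simulation-only module: not part of the original design files.
module tb_print_sprite;
    localparam WIDTH  = 8;  // graphic width in pixels
    localparam HEIGHT = 8;  // graphic height in pixels

    logic memory [WIDTH*HEIGHT];  // 1 bit per pixel

    initial begin
        $readmemb("letter_f.mem", memory);  // file location is an assumption
        for (int y = 0; y < HEIGHT; y++) begin
            for (int x = 0; x < WIDTH; x++) begin
                // print set pixels as '#' and clear pixels as '.'
                $write("%s", memory[y*WIDTH + x] ? "#" : ".");
            end
            $write("\n");
        end
        $finish;
    end
endmodule
```

If the file is read correctly, the printed grid should match the rows of letter_f.mem, with the first line of the file at the top.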

Simple Sprite Drawing

Before starting to design hardware, we should consider what steps we go through in drawing a sprite at position X,Y on the screen (where X,Y is the top left of the sprite). The screen is drawn from the top left, a line at a time, so the steps are:

  1. Wait for the screen to reach the vertical sprite position (Y)
  2. Start sprite
  3. Wait for the horizontal sprite position (X)
  4. Draw a line of sprite pixels
  5. If we’re not done, then go to step 3
  6. Sprite is complete

This process is well represented by our old friend, the finite state machine (FSM). In fact, our first sprite design is little more than a simple finite state machine and a small memory array [sprite_v1.sv]:

module sprite_v1 #(
    parameter WIDTH=8,            // graphic width in pixels
    parameter HEIGHT=8,           // graphic height in pixels
    parameter SPR_FILE="",        // file to load sprite graphic from
    parameter CORDW=10,           // screen coordinate width in bits
    parameter DEPTH=WIDTH*HEIGHT  // depth of memory array holding graphic
    ) (
    input  wire logic clk,               // clock
    input  wire logic rst,               // reset
    input  wire logic start,             // start control
    input  wire logic [CORDW-1:0] sx,    // horizontal screen position
    input  wire logic [CORDW-1:0] sprx,  // horizontal sprite position
    output      logic pix                // pixel colour to draw
    );

    logic memory [DEPTH];  // 1 bit per pixel

    // load sprite graphic into memory (binary text format)
    initial begin
        if (SPR_FILE != 0) begin
            $display("Creating sprite from file '%s'.", SPR_FILE);
            $readmemb(SPR_FILE, memory);
        end
    end

    // position within memory array
    logic [$clog2(DEPTH)-1:0] pos;

    // position within sprite
    logic [$clog2(WIDTH)-1:0]  ox;
    logic [$clog2(HEIGHT)-1:0] oy;

    enum {
        IDLE,       // awaiting start signal
        START,      // prepare for new sprite drawing
        AWAIT_POS,  // await horizontal position
        DRAW,       // draw pixel
        NEXT_LINE   // prepare for next sprite line
    } state, state_next;

    always_ff @(posedge clk) begin
        state <= state_next;  // advance to next state

        if (state == START) begin
            oy <= 0;
            pos <= 0;
        end

        if (state == AWAIT_POS) begin
            ox <= 0;
        end

        if (state == DRAW) begin
            ox <= ox + 1;
            pos <= pos + 1;
        end

        if (state == NEXT_LINE) begin
            oy <= oy + 1;
        end

        if (rst) begin
            state <= IDLE;
            ox <= 0;
            oy <= 0;
            pos <= 0;
        end
    end

    // output current pixel colour when drawing
    always_comb begin
        pix = (state == DRAW) ? memory[pos] : 0;
    end

    // create status signals
    logic last_pixel, last_line;
    always_comb begin
        last_pixel = (ox == WIDTH-1);
        last_line  = (oy == HEIGHT-1);
    end

    // determine next state
    always_comb begin
        case(state)
            IDLE:       state_next = start ? START : IDLE;
            START:      state_next = AWAIT_POS;
            AWAIT_POS:  state_next = (sx == sprx) ? DRAW : AWAIT_POS;
            DRAW:       state_next = !last_pixel ? DRAW : (!last_line ? NEXT_LINE : IDLE);
            NEXT_LINE:  state_next = AWAIT_POS;
            default:    state_next = IDLE;
        endcase
    end
endmodule

The module does nothing until it receives a start signal, then it draws the sprite line-by-line. We maintain a separate value pos for the memory array, rather than using an expensive multiplication to calculate the address from ox and oy.

DRAW: state_next = !last_pixel ? DRAW : (!last_line ? NEXT_LINE : IDLE);

In the next-state logic, we’ve nested a conditional operator. Nested conditional operators are confusing, so best avoided. However, in this case, I think it reads naturally. If we’ve not reached the last pixel, keep drawing. If we’ve not arrived at the final line, go to the next line. Otherwise idle.

To see our static sprite in action, we need a top module to drive it. The main parts of the sprite top module should be familiar to you from the previous parts: it uses the same clock generator and display timings as Exploring FPGA Graphics.

iCE40 version shown below:

module top_sprite_v1 (
    input  wire logic clk_12m,      // 12 MHz clock
    input  wire logic btn_rst,      // reset button (active high)
    output      logic dvi_clk,      // DVI pixel clock
    output      logic dvi_hsync,    // DVI horizontal sync
    output      logic dvi_vsync,    // DVI vertical sync
    output      logic dvi_de,       // DVI data enable
    output      logic [3:0] dvi_r,  // 4-bit DVI red
    output      logic [3:0] dvi_g,  // 4-bit DVI green
    output      logic [3:0] dvi_b   // 4-bit DVI blue
    );

    // generate pixel clock
    logic clk_pix;
    logic clk_locked;
    clock_gen clock_640x480 (
       .clk(clk_12m),
       .rst(btn_rst),
       .clk_pix,
       .clk_locked
    );

    // display timings
    localparam CORDW = 10;  // screen coordinate width in bits
    logic [CORDW-1:0] sx, sy;
    logic de;
    display_timings timings_640x480 (
        .clk_pix,
        .rst(!clk_locked),  // wait for clock lock
        .sx,
        .sy,
        .hsync(dvi_hsync),
        .vsync(dvi_vsync),
        .de
    );

    // sprite
    localparam SPR_WIDTH  = 8;  // width in pixels
    localparam SPR_HEIGHT = 8;  // number of lines
    localparam SPR_FILE = "../res/simple/letter_f.mem";
    logic spr_start;
    logic spr_pix;

    // draw sprite at position
    localparam DRAW_X = 16;
    localparam DRAW_Y = 16;

    // signal to start sprite drawing
    always_comb begin
        spr_start = (sy == DRAW_Y && sx == 0);
    end

    sprite_v1 #(
        .WIDTH(SPR_WIDTH),
        .HEIGHT(SPR_HEIGHT),
        .SPR_FILE(SPR_FILE)
    ) spr_instance (
        .clk(clk_pix),
        .rst(!clk_locked),
        .start(spr_start),
        .sx,
        .sprx(DRAW_X),
        .pix(spr_pix)
    );

    // DVI clock output
    SB_IO #(
        .PIN_TYPE(6'b010000)
    ) dvi_clk_buf (
        .PACKAGE_PIN(dvi_clk),
        .CLOCK_ENABLE(1'b1),
        .OUTPUT_CLK(clk_pix),
        .D_OUT_0(1'b0),
        .D_OUT_1(1'b1)
    );

    // DVI output
    always_ff @(posedge clk_pix) begin
        dvi_de <= de;
        dvi_r <= (de && spr_pix) ? 4'hF: 4'h0;
        dvi_g <= (de && spr_pix) ? 4'hC: 4'h0;
        dvi_b <= (de && spr_pix) ? 4'h0: 4'h0;
    end
endmodule

The sprite is rendered in colour #FFCC00 in the video output.

We start the sprite drawing with the following logic (first pixel on line DRAW_Y):

spr_start = (sy == DRAW_Y && sx == 0);

We pass the horizontal position of the screen, sx, to the sprite module, so it can wait for the correct horizontal drawing position.

Building the Designs
In the Hardware Sprites section of the git repo, you’ll find the design files, a makefile for iCEBreaker, a Vivado project for Arty, and instructions for building the designs for both boards.

Program your board, and you should see a small golden letter ‘F’ and a dot towards the top left of the screen. From these tiny beginnings, mighty sprites will grow.

One Off

The observant amongst you might have noticed that our horizontal sprite position is off by one. The problem is our state machine waits for the screen to reach the sprite position, then it switches to drawing for the next pixel. You can see that in the next-state logic:

AWAIT_POS: state_next = (sx == sprx) ? DRAW : AWAIT_POS;
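The offset is easy to confirm in simulation. This is a rough testbench sketch rather than anything from the repo: tb_sprite_v1 is a hypothetical name, the sx and sy counters stand in for the display timings module, and letter_f.mem is assumed to be in the working directory.

```systemverilog
// Hypothetical testbench: tb_sprite_v1 is not part of the original design files.
module tb_sprite_v1;
    localparam CORDW = 10;        // screen coordinate width in bits
    localparam H_RES_FULL = 800;  // pixels per line including blanking
    localparam logic [CORDW-1:0] DRAW_X = 16;  // requested sprite position

    logic clk = 0;
    logic rst = 1;
    logic start, pix;
    logic [CORDW-1:0] sx = 0;  // stand-in for horizontal screen position
    logic [CORDW-1:0] sy = 0;  // stand-in for vertical screen position

    always #5 clk = ~clk;  // free-running simulation clock

    sprite_v1 #(
        .SPR_FILE("letter_f.mem")  // file location is an assumption
    ) dut (
        .clk,
        .rst,
        .start,
        .sx,
        .sprx(DRAW_X),
        .pix
    );

    // start sprite at the beginning of line 1
    always_comb start = (sy == 1 && sx == 0);

    // sweep sx along each line and report where pixels are set
    always @(posedge clk) begin
        if (pix) $display("sy=%0d: pixel set at sx=%0d", sy, sx);
        if (sx == H_RES_FULL-1) begin
            sx <= 0;
            sy <= sy + 1;
        end else begin
            sx <= sx + 1;
        end
    end

    initial begin
        repeat (2) @(posedge clk);
        rst <= 0;         // release reset
        wait (sy == 10);  // eight sprite lines plus a little margin
        $finish;
    end
endmodule
```

With the sprite requested at position 16, the first set pixel on each line should be reported at sx=17, because DRAW is only entered on the clock after sx matches sprx.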

The obvious thing is to wait for the screen position sx to be the pixel before the one we want to draw. However, if we want to draw at horizontal coordinate 0, the pixel before is at position 799 on the line before, so we need to correct both our horizontal and vertical sprite positions.

Update the start signal logic in the top module; [xc7/top_sprite_v2.sv] or [ice40/top_sprite_v2.sv]:

    // size of screen with and without blanking
    localparam H_RES_FULL = 800;
    localparam V_RES_FULL = 525;
    localparam H_RES = 640;
    localparam V_RES = 480;

    // draw sprite at position
    localparam DRAW_X = 0;
    localparam DRAW_Y = 0;

    // start sprite in blanking of line before first line drawn
    logic [CORDW-1:0] draw_y_cor;  // corrected for wrapping
    always_comb begin
        draw_y_cor = (DRAW_Y == 0) ? V_RES_FULL - 1 : DRAW_Y - 1;
        spr_start = (sy == draw_y_cor && sx == H_RES);
    end

And the AWAIT_POS logic in the sprite module, creating [sprite_v2.sv]:

    // create status signals and correct horizontal position
    logic last_pixel, last_line;
    logic [CORDW-1:0] sprx_cor;
    always_comb begin
        last_pixel = (ox == WIDTH-1);
        last_line  = (oy == HEIGHT-1);
        sprx_cor = (sprx == 0) ? H_RES_FULL - 1 : sprx - 1;
    end

    // determine next state
    always_comb begin
        case(state)
            IDLE:       state_next = start ? START : IDLE;
            START:      state_next = AWAIT_POS;
            AWAIT_POS:  state_next = (sx == sprx_cor) ? DRAW : AWAIT_POS;
            DRAW:       state_next = !last_pixel ? DRAW : (!last_line ? NEXT_LINE : IDLE);
    // ...

Build the v2 designs, and you should see the F dot sprite at the very top left of the screen.

Scale Up

Now we can position our sprite correctly, it’s time to make it bigger. We can make larger sprites by increasing the size of the design. However, it’s also useful to be able to scale our sprites up when drawing them. A scaled-up sprite will be blocky, but it uses few resources and allows a design to work at different screen resolutions.

To scale our sprite, we count additional screen pixels and lines when drawing the sprite using cnt_x and cnt_y respectively. The new module is [sprite_v3.sv]:

module sprite_v3 #(
    parameter WIDTH=8,            // graphic width in pixels
    parameter HEIGHT=8,           // graphic height in pixels
    parameter SCALE_X=1,          // sprite width scale-factor
    parameter SCALE_Y=1,          // sprite height scale-factor
    parameter SPR_FILE="",        // file to load sprite graphic from
    parameter CORDW=10,           // screen coordinate width in bits
    parameter H_RES_FULL=800,     // horizontal screen resolution inc. blanking
    parameter DEPTH=WIDTH*HEIGHT  // depth of memory array holding graphic
    ) (
    input  wire logic clk,               // clock
    input  wire logic rst,               // reset
    input  wire logic start,             // start control
    input  wire logic [CORDW-1:0] sx,    // horizontal screen position
    input  wire logic [CORDW-1:0] sprx,  // horizontal sprite position
    output      logic pix                // pixel colour to draw
    );

    logic memory [DEPTH];  // 1 bit per pixel

    // load sprite graphic into memory (binary text format)
    initial begin
        if (SPR_FILE != 0) begin
            $display("Creating sprite from file '%s'.", SPR_FILE);
            $readmemb(SPR_FILE, memory);
        end
    end

    // position within memory array
    logic [$clog2(DEPTH)-1:0] pos;

    // position within sprite
    logic [$clog2(WIDTH)-1:0]  ox;
    logic [$clog2(HEIGHT)-1:0] oy;

    // scale counters
    logic [$clog2(SCALE_X)-1:0] cnt_x;
    logic [$clog2(SCALE_Y)-1:0] cnt_y;

    enum {
        IDLE,       // awaiting start signal
        START,      // prepare for new sprite drawing
        AWAIT_POS,  // await horizontal position
        DRAW,       // draw pixel
        NEXT_LINE   // prepare for next sprite line
    } state, state_next;

    always_ff @(posedge clk) begin
        state <= state_next;  // advance to next state

        if (state == START) begin
            oy <= 0;
            cnt_y <= 0;
            pos <= 0;
        end

        if (state == AWAIT_POS) begin
            ox <= 0;
            cnt_x <= 0;
        end

        if (state == DRAW) begin
            if (SCALE_X <= 1 || cnt_x == SCALE_X-1) begin
                ox <= ox + 1;
                cnt_x <= 0;
                pos <= pos + 1;
            end else begin
                cnt_x <= cnt_x + 1;
            end
        end

        if (state == NEXT_LINE) begin
            if (SCALE_Y <= 1 || cnt_y == SCALE_Y-1) begin
                oy <= oy + 1;
                cnt_y <= 0;
            end else begin
                cnt_y <= cnt_y + 1;
                pos <= pos - WIDTH;  // go back to start of line
            end
        end

        if (rst) begin
            state <= IDLE;
            ox <= 0;
            oy <= 0;
            cnt_x <= 0;
            cnt_y <= 0;
            pos <= 0;
        end
    end

    // output current pixel colour when drawing
    always_comb begin
        pix = (state == DRAW) ? memory[pos] : 0;
    end

    // create status signals and correct horizontal position
    logic last_pixel, last_line;
    logic [CORDW-1:0] sprx_cor;
    always_comb begin
        last_pixel = (ox == WIDTH-1 && cnt_x == SCALE_X-1);
        last_line  = (oy == HEIGHT-1 && cnt_y == SCALE_Y-1);
        sprx_cor = (sprx == 0) ? H_RES_FULL - 1 : sprx - 1;
    end

    // determine next state
    always_comb begin
        case(state)
            IDLE:       state_next = start ? START : IDLE;
            START:      state_next = AWAIT_POS;
            AWAIT_POS:  state_next = (sx == sprx_cor) ? DRAW : AWAIT_POS;
            DRAW:       state_next = !last_pixel ? DRAW : (!last_line ? NEXT_LINE : IDLE);
            NEXT_LINE:  state_next = AWAIT_POS;
            default:    state_next = IDLE;
        endcase
    end
endmodule

We can then drive this with a small change to our top module:
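The top-module listing isn't reproduced here, so the following is only a sketch of what the change might look like. It's a fragment that replaces the sprite parameters and instance inside the v2 top module, not a complete module. The localparam names SPR_SCALE_X and SPR_SCALE_Y match those used in the hedgehog top module later in this post, and the scale factor of 8 is an arbitrary choice.

```systemverilog
    // sprite (fragment for the body of the top module)
    localparam SPR_WIDTH   = 8;  // width in pixels
    localparam SPR_HEIGHT  = 8;  // number of lines
    localparam SPR_SCALE_X = 8;  // width scale-factor (arbitrary example value)
    localparam SPR_SCALE_Y = 8;  // height scale-factor (arbitrary example value)
    localparam SPR_FILE = "../res/simple/letter_f.mem";
    logic spr_start;
    logic spr_pix;

    sprite_v3 #(
        .WIDTH(SPR_WIDTH),
        .HEIGHT(SPR_HEIGHT),
        .SCALE_X(SPR_SCALE_X),
        .SCALE_Y(SPR_SCALE_Y),
        .SPR_FILE(SPR_FILE)
    ) spr_instance (
        .clk(clk_pix),
        .rst(!clk_locked),
        .start(spr_start),
        .sx,
        .sprx(DRAW_X),
        .pix(spr_pix)
    );
```

The start signal and display output logic should be able to stay as they were in v2, since scaling is handled entirely inside the sprite module.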

Build the v3 design with scaling. This design has hard-coded scale parameters, SCALE_X and SCALE_Y, but these could easily be made inputs to the module to allow for changes at run time.

Motion

It’s time we got our sprites moving. Now we have our basic sprite working, I’m going to use a new design, which I’m charitably calling a flying saucer:

Flying Saucer

The memory initialization file is [saucer.mem]:

0 0 1 1 1 1 0 0
0 1 1 0 0 1 1 0
1 1 0 1 1 0 1 1
1 0 1 1 1 1 0 1
1 0 1 1 1 1 0 1
1 1 0 1 1 0 1 1
0 1 1 0 0 1 1 0
0 0 1 1 1 1 0 0

I’m sure you can come up with something better: create your own design following the same format as above. You don’t have to limit yourself to 8x8 pixels, just be sure to update the width and height in the top module (see below).

To make it easy to build your own design I’ve added an empty sprite with the name [user.mem]. If you replace this file in res/simple/ with your own design it will automatically be included in projects and makefiles. See the Hardware Sprites section of the git repo for build instructions.

To move our sprite, I’ve borrowed the horizontal bouncing logic from our first part, yet again, to create [top_sprite_v3a.sv].
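The bouncing logic isn't listed in this post, so here is one way it could be written. Treat it as an illustrative sketch rather than the code from the repo: the module name bounce_x, its parameters, and the dx direction flag are all hypothetical. The animate input is assumed to pulse high for one clock tick per frame, for example when sy == V_RES and sx == 0.

```systemverilog
// Hypothetical helper module: bounce a sprite between the left and right screen edges.
module bounce_x #(
    parameter CORDW=10,   // screen coordinate width in bits
    parameter H_RES=640,  // visible horizontal resolution
    parameter SPR_W=8,    // drawn sprite width in pixels (graphic width x scale)
    parameter SPEED=2     // pixels moved per frame
    ) (
    input  wire logic clk,               // pixel clock
    input  wire logic rst,               // reset
    input  wire logic animate,           // high for one tick per frame
    output      logic [CORDW-1:0] sprx   // horizontal sprite position
    );

    logic dx;  // direction: 0 is right, 1 is left

    always_ff @(posedge clk) begin
        if (animate) begin
            if (dx == 0) begin  // moving right
                if (sprx + SPEED >= H_RES - SPR_W) begin
                    sprx <= H_RES - SPR_W;  // stop at right edge
                    dx <= 1;                // and turn around
                end else begin
                    sprx <= sprx + SPEED;
                end
            end else begin  // moving left
                if (sprx <= SPEED) begin
                    sprx <= 0;  // stop at left edge
                    dx <= 0;    // and turn around
                end else begin
                    sprx <= sprx - SPEED;
                end
            end
        end

        if (rst) begin
            sprx <= 0;
            dx <= 0;
        end
    end
endmodule
```

The sprx output would then replace the fixed DRAW_X value on the sprite instance. Remember that SPR_W needs to be the scaled width, otherwise a scaled-up sprite will run off the right of the screen before turning.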

Build this version of the project. You should see your sprite bounce back and forth across the screen. If you created your own sprite, remember to update the sprite filename, SPR_FILE, in top_sprite_v3a.sv. You can also tweak the sprite height, width, and scale as you like.

Colourful?

The introduction promised “colourful animated graphics”: it’s time to make good on this by increasing our colour depth. Rather than continue to inflict my drawing skills on you, I’m using the adorable hedgehog from the Amiga platformer, Superfrog.

Hedgehog

This graphic is 32x20 pixels in size, so has 640 pixels. The original Amiga game uses 32 colours, of which the hedgehog uses ten, plus one transparent colour. To allow for 11 colours we need four bits per pixel.

The memory requirement for this sprite is: 32 x 20 x 4 = 2,560 bits, or 2.5 kilobits.
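The same arithmetic can be left to the tools, which is handy when you change the sprite size or colour count. A small sketch, assuming a hypothetical simulation-only module called sprite_mem_calc; $clog2 gives the number of bits needed to index the colours.

```systemverilog
// Hypothetical module for checking the memory arithmetic in simulation.
module sprite_mem_calc;
    localparam SPR_WIDTH  = 32;  // width in pixels
    localparam SPR_HEIGHT = 20;  // number of lines
    localparam COLOURS    = 11;  // ten drawing colours plus transparent
    localparam COLR_BITS  = $clog2(COLOURS);          // bits per pixel
    localparam SPR_PIXELS = SPR_WIDTH * SPR_HEIGHT;   // pixels in graphic
    localparam SPR_BITS   = SPR_PIXELS * COLR_BITS;   // total memory bits

    initial begin
        $display("%0d pixels x %0d bits = %0d bits", SPR_PIXELS, COLR_BITS, SPR_BITS);
        $finish;
    end
endmodule
```

For the hedgehog this reports 640 pixels at 4 bits each, giving 2560 bits.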

The memory file is similar to that for our monochrome sprites, but instead of pixels being 0 or 1, they’re 0 to F. I use a tool called img2fmem to convert a PNG of the image to this text format: hedgehog.mem. To read this hex text data into memory we use $readmemh (note the ‘h’ for hex).

Quick Aside: Indexed Colour
Indexed colour, where each pixel stores an index into a small palette rather than a full colour value, was common in older computers. For example, the original Amiga chipset supported 32 colours from a possible 4,096: very similar to our design! The GIF and PNG formats still make use of this approach to squeeze the best quality out of 256-colour images.

More Memory

If we’re going to make larger sprites with more colour bits, we’re going to need to rethink our memory design. A simple register array suffices for small designs but is a resource hog and timing disaster for larger sprites. FPGAs include block RAM (BRAM), which is ideal for memories of a few hundred bits to a few tens of kilobits.

Our sprite designs don’t change at runtime, so we can create a ROM using block RAM. The ROM takes a clock and address as input, and outputs the requested data the following clock cycle. This extra latency has implications for correct positioning of our sprite, which we’ll discuss shortly. Check out FPGA Memory Types to learn more about BRAM and other memory.

Synchronous ROM using BRAM [rom_sync.sv]:

module rom_sync #(
    parameter WIDTH=8,
    parameter DEPTH=256,
    parameter INIT_F="",
    localparam ADDRW=$clog2(DEPTH)
    ) (
    input wire logic clk,
    input wire logic [ADDRW-1:0] addr,
    output     logic [WIDTH-1:0] data
    );

    logic [WIDTH-1:0] memory [DEPTH];

    initial begin
        if (INIT_F != 0) begin
            $display("Creating rom_sync from init file '%s'.", INIT_F);
            $readmemh(INIT_F, memory);
        end
    end

    always_ff @(posedge clk) begin
        data <= memory[addr];
    end
endmodule

Once you’ve created the appropriate memory module, we can create the final sprite module for this part [sprite.sv]:

module sprite #(
    parameter WIDTH=8,         // graphic width in pixels
    parameter HEIGHT=8,        // graphic height in pixels
    parameter SCALE_X=1,       // sprite width scale-factor
    parameter SCALE_Y=1,       // sprite height scale-factor
    parameter COLR_BITS=4,     // bits per pixel (2^4=16 colours)
    parameter CORDW=10,        // screen coordinate width in bits
    parameter H_RES_FULL=800,  // horizontal screen resolution inc. blanking
    parameter ADDRW=6          // width of graphic memory address bus
    ) (
    input  wire logic clk,                      // clock
    input  wire logic rst,                      // reset
    input  wire logic start,                    // start control
    input  wire logic [CORDW-1:0] sx,           // horizontal screen position
    input  wire logic [CORDW-1:0] sprx,         // horizontal sprite position
    input  wire logic [COLR_BITS-1:0] data_in,  // data from external memory
    output      logic [ADDRW-1:0] pos,          // sprite pixel position
    output      logic [COLR_BITS-1:0] pix,      // pixel colour to draw
    output      logic draw,                     // signal sprite is drawing
    output      logic done                      // signal sprite drawing is complete
    );

    // position within sprite
    logic [$clog2(WIDTH)-1:0]  ox;
    logic [$clog2(HEIGHT)-1:0] oy;

    // scale counters
    logic [$clog2(SCALE_X)-1:0] cnt_x;
    logic [$clog2(SCALE_Y)-1:0] cnt_y;

    enum {
        IDLE,       // awaiting start signal
        START,      // prepare for new sprite drawing
        AWAIT_POS,  // await horizontal position
        DRAW,       // draw pixel
        NEXT_LINE,  // prepare for next sprite line
        DONE        // set done signal, then go idle
    } state, state_next;

    always_ff @(posedge clk) begin
        state <= state_next;  // advance to next state

        if (state == START) begin
            done <= 0;
            oy <= 0;
            cnt_y <= 0;
            pos <= 0;
        end

        if (state == AWAIT_POS) begin
            ox <= 0;
            cnt_x <= 0;
        end

        if (state == DRAW) begin
            if (SCALE_X <= 1 || cnt_x == SCALE_X-1) begin
                ox <= ox + 1;
                cnt_x <= 0;
                pos <= pos + 1;
            end else begin
                cnt_x <= cnt_x + 1;
            end
        end

        if (state == NEXT_LINE) begin
            if (SCALE_Y <= 1 || cnt_y == SCALE_Y-1) begin
                oy <= oy + 1;
                cnt_y <= 0;
            end else begin
                cnt_y <= cnt_y + 1;
                pos <= pos - WIDTH;  // go back to start of line
            end
        end

        if (state == DONE) begin
            done <= 1;
        end

        if (rst) begin
            state <= IDLE;
            ox <= 0;
            oy <= 0;
            cnt_x <= 0;
            cnt_y <= 0;
            pos <= 0;
            done <= 0;
        end
    end

    // output current pixel colour when drawing
    always_comb begin
        pix = (state == DRAW) ? data_in : 0;
    end

    // create status signals and correct horizontal position
    logic last_pixel, last_line;
    logic [CORDW-1:0] sprx_cor;
    always_comb begin
        last_pixel = (ox == WIDTH-1 && cnt_x == SCALE_X-1);
        last_line  = (oy == HEIGHT-1 && cnt_y == SCALE_Y-1);
        draw = (state == DRAW);

        // BRAM adds an extra cycle of latency
        case (sprx)
            0: sprx_cor = H_RES_FULL - 2;
            1: sprx_cor = H_RES_FULL - 1;
            default: sprx_cor = sprx - 2;
        endcase
    end

    // determine next state
    always_comb begin
        case(state)
            IDLE:       state_next = start ? START : IDLE;
            START:      state_next = AWAIT_POS;
            AWAIT_POS:  state_next = (sx == sprx_cor) ? DRAW : AWAIT_POS;
            DRAW:       state_next = !last_pixel ? DRAW : (!last_line ? NEXT_LINE : DONE);
            NEXT_LINE:  state_next = AWAIT_POS;
            DONE:       state_next = IDLE;
            default:    state_next = IDLE;
        endcase
    end
endmodule

There are several significant changes to the sprite module compared to v3:

External Memory

We’ve moved the memory interface outside the sprite module. The sprite module sends the desired position using the pos output and receives the pixel data on the data_in input. In the next part, we’ll take this a step further; multiple sprites will share one memory interface.

Latency

The block RAM adds an extra cycle of latency, so we need to subtract two from the horizontal position rather than one. This complicates the wrapping a little, so we use a case statement to handle it:

// BRAM adds an extra cycle of latency
case (sprx)
    0: sprx_cor = H_RES_FULL - 2;
    1: sprx_cor = H_RES_FULL - 1;
    default: sprx_cor = sprx - 2;
endcase
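If the latency changes again (say, with a registered memory output), the case statement grows by one entry each time. An alternative is to compute the wrapped position from a latency parameter. This is a sketch of that idea, not the approach used in sprite.sv; the module name sprx_correct and the LAT parameter are hypothetical.

```systemverilog
// Hypothetical helper: correct sprite position for LAT cycles of latency.
module sprx_correct #(
    parameter CORDW=10,        // screen coordinate width in bits
    parameter H_RES_FULL=800,  // horizontal screen resolution inc. blanking
    parameter LAT=2            // cycles between position match and first pixel
    ) (
    input  wire logic [CORDW-1:0] sprx,     // requested sprite position
    output      logic [CORDW-1:0] sprx_cor  // position the sprite should wait for
    );

    always_comb begin
        if (sprx >= LAT) begin
            sprx_cor = sprx - LAT;
        end else begin
            sprx_cor = H_RES_FULL - (LAT - sprx);  // wrap into end of previous line
        end
    end
endmodule
```

With LAT=2 this gives 798 for position 0, 799 for position 1, and sprx - 2 otherwise, the same as the case statement. With LAT=1 it matches the v2 correction.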

Drawing & Done

For better control and reuse of sprite instances, we’ve added two new signals: draw is high when the sprite is drawing pixels, and done indicates the sprite is complete.

A Refined Palette

Our boards have 12-bit colour output, supporting 4,096 colours. We can map our 11 sprite colours to any of these using a colour lookup table (CLUT). We populate the colour lookup table using a simple text file [hedgehog_palette.mem]:

CCC AAA 888 874 763 651 540 330 111 F0F 000

Hedgehog Palette

The hedgehog sprite has ten drawing colours, with an additional colour F0F (magenta) used for transparency. These colours are 12-bit in hex format: RGB; the same as a web colour hex triplet.

CLUT & Display Output

The logic for the CLUT is straightforward: we only have 11 colours, so a simple memory array suffices. The Xilinx version is shown below:

    // Colour Lookup Table
    logic [11:0] clut [11];  // 11 x 12-bit colour palette entries
    initial begin
        $display("Loading palette '%s' into CLUT.", SPR_PALETTE);
        $readmemh(SPR_PALETTE, clut);  // load palette into CLUT
    end

    // map colour index to palette using CLUT
    logic pix_trans;                // pixel transparent?
    logic [3:0] red, green, blue;   // pixel colour components
    always_comb begin
        pix_trans = (spr_pix == SPR_TRANS);
        {red, green, blue} = clut[spr_pix];
    end

    // VGA output
    always_ff @(posedge clk_pix) begin
        vga_r <= (de && spr_draw && !pix_trans) ? red   : 4'h0;
        vga_g <= (de && spr_draw && !pix_trans) ? green : 4'h0;
        vga_b <= (de && spr_draw && !pix_trans) ? blue  : 4'h0;
    end

We take the colour index provided by the sprite module, spr_pix, and use it to look up the red, green, and blue components of the pixel colour. We also check to see whether the pixel colour matches the transparent colour specified in SPR_TRANS: we don’t want to draw anything if the pixel is transparent.

Top Hedgehog

We’re now ready to draw our hedgehog using a new top module:

Animation

Our hedgehog looks like it’s on ice. To complete our sprite design we need to add animation support so the hedgehog can move its legs. The required change is surprisingly small; we just need to load all three hedgehog images into memory and offset the memory position to choose which image to display.

Walking Hedgehog

Quick Aside: Your Own Colour Graphics
You’ll learn how to create your own memory files from images using img2fmem later in this series.

The animated sprite graphic has three images stacked vertically, so each image is contiguous in memory [hedgehog_walk.mem]:

Hedgehog Frames

Every display frame we increment the counter cnt_anim. When the counter hits specific values, we update the sprite base address, spr_base_addr, to select a different image in the sprite graphic.

    // sprite frame selector
    logic [5:0] cnt_anim;  // count from 0-63
    always_ff @(posedge clk_pix) begin
        if (animate) begin
            // select sprite frame
            cnt_anim <= cnt_anim + 1;
            case (cnt_anim)
                0: spr_base_addr <= 0;
                15: spr_base_addr <= SPR_PIXELS;
                31: spr_base_addr <= 0;
                47: spr_base_addr <= 2 * SPR_PIXELS;
                default: spr_base_addr <= spr_base_addr;
            endcase
            // ...

The ROM address uses the sprite base address, spr_base_addr, to select the right image:

    // sprite ROM
    logic [COLR_BITS-1:0] spr_rom_data;
    logic [SPR_ADDRW-1:0] spr_rom_addr, spr_base_addr;
    rom_sync #(
        .WIDTH(COLR_BITS),
        .DEPTH(SPR_DEPTH),
        .INIT_F(SPR_FILE)
    ) spr_rom (
        .clk(clk_pix),
        .addr(spr_base_addr + spr_rom_addr),
        .data(spr_rom_data)
    );

Our completed top module animates the hedgehog at approximately four frames per second:

Xilinx version shown below:

module top_hedgehog (
    input  wire logic clk_100m,     // 100 MHz clock
    input  wire logic btn_rst,      // reset button (active low)
    output      logic vga_hsync,    // horizontal sync
    output      logic vga_vsync,    // vertical sync
    output      logic [3:0] vga_r,  // 4-bit VGA red
    output      logic [3:0] vga_g,  // 4-bit VGA green
    output      logic [3:0] vga_b   // 4-bit VGA blue
    );

    // generate pixel clock
    logic clk_pix;
    logic clk_locked;
    clock_gen clock_640x480 (
       .clk(clk_100m),
       .rst(!btn_rst),  // reset button is active low
       .clk_pix,
       .clk_locked
    );

    // display timings
    localparam CORDW = 10;  // screen coordinate width in bits
    logic [CORDW-1:0] sx, sy;
    logic de;
    display_timings timings_640x480 (
        .clk_pix,
        .rst(!clk_locked),  // wait for clock lock
        .sx,
        .sy,
        .hsync(vga_hsync),
        .vsync(vga_vsync),
        .de
    );

    // size of screen with and without blanking
    localparam H_RES_FULL = 800;
    localparam V_RES_FULL = 525;
    localparam H_RES = 640;
    localparam V_RES = 480;

    logic animate;  // high for one clock tick at start of blanking
    always_comb animate = (sy == V_RES && sx == 0);

    // sprite
    localparam SPR_WIDTH    = 32;   // width in pixels
    localparam SPR_HEIGHT   = 20;   // number of lines
    localparam SPR_SCALE_X  = 4;    // width scale-factor
    localparam SPR_SCALE_Y  = 4;    // height scale-factor
    localparam COLR_BITS    = 4;    // bits per pixel (2^4=16 colours)
    localparam SPR_TRANS    = 9;    // transparent palette entry
    localparam SPR_FRAMES   = 3;    // number of frames in graphic
    localparam SPR_FILE     = "hedgehog_walk.mem";
    localparam SPR_PALETTE  = "hedgehog_palette.mem";

    localparam SPR_PIXELS = SPR_WIDTH * SPR_HEIGHT;
    localparam SPR_DEPTH  = SPR_PIXELS * SPR_FRAMES;
    localparam SPR_ADDRW  = $clog2(SPR_DEPTH);

    logic spr_start, spr_draw;
    logic [COLR_BITS-1:0] spr_pix;

    // sprite ROM
    logic [COLR_BITS-1:0] spr_rom_data;
    logic [SPR_ADDRW-1:0] spr_rom_addr, spr_base_addr;
    rom_sync #(
        .WIDTH(COLR_BITS),
        .DEPTH(SPR_DEPTH),
        .INIT_F(SPR_FILE)
    ) spr_rom (
        .clk(clk_pix),
        .addr(spr_base_addr + spr_rom_addr),
        .data(spr_rom_data)
    );

    // draw sprite at position
    localparam SPR_SPEED_X = 2;
    localparam SPR_SPEED_Y = 0;
    logic [CORDW-1:0] sprx, spry;

    // sprite frame selector
    logic [5:0] cnt_anim;  // count from 0-63
    always_ff @(posedge clk_pix) begin
        if (animate) begin
            // select sprite frame
            cnt_anim <= cnt_anim + 1;
            case (cnt_anim)
                0: spr_base_addr <= 0;
                15: spr_base_addr <= SPR_PIXELS;
                31: spr_base_addr <= 0;
                47: spr_base_addr <= 2 * SPR_PIXELS;
                default: spr_base_addr <= spr_base_addr;
            endcase

            // walk right-to-left (correct position for screen width)
            sprx <= (sprx > SPR_SPEED_X) ? sprx - SPR_SPEED_X :
                                           H_RES_FULL - (SPR_SPEED_X - sprx);
        end
        if (!clk_locked) begin
            sprx <= 0;
            spry <= 200;
        end
    end

    // start sprite in blanking of line before first line drawn
    logic [CORDW-1:0] spry_cor;  // corrected for wrapping
    always_comb begin
        spry_cor = (spry == 0) ? V_RES_FULL - 1 : spry - 1;
        spr_start = (sy == spry_cor && sx == H_RES);
    end

    sprite #(
        .WIDTH(SPR_WIDTH),
        .HEIGHT(SPR_HEIGHT),
        .COLR_BITS(COLR_BITS),
        .SCALE_X(SPR_SCALE_X),
        .SCALE_Y(SPR_SCALE_Y),
        .ADDRW(SPR_ADDRW)
        ) spr_instance (
        .clk(clk_pix),
        .rst(!clk_locked),
        .start(spr_start),
        .sx,
        .sprx,
        .data_in(spr_rom_data),
        .pos(spr_rom_addr),
        .pix(spr_pix),
        .draw(spr_draw),
        .done()
    );

    // Colour Lookup Table
    logic [11:0] clut [11];  // 11 x 12-bit colour palette entries
    initial begin
        $display("Loading palette '%s' into CLUT.", SPR_PALETTE);
        $readmemh(SPR_PALETTE, clut);  // load palette into CLUT
    end

    // map colour index to palette using CLUT
    logic pix_trans;                // pixel transparent?
    logic [3:0] red, green, blue;   // pixel colour components
    always_comb begin
        pix_trans = (spr_pix == SPR_TRANS);
        {red, green, blue} = clut[spr_pix];
    end

    // VGA output
    always_ff @(posedge clk_pix) begin
        vga_r <= (de && spr_draw && !pix_trans) ? red   : 4'h0;
        vga_g <= (de && spr_draw && !pix_trans) ? green : 4'h0;
        vga_b <= (de && spr_draw && !pix_trans) ? blue  : 4'h0;
    end
endmodule

Efficient

You can get a feel for how efficient hardware sprites are by looking at timings and resource usage for the final animated hedgehog. On the iCEBreaker the timing estimate is a healthy 24.27 ns (41.20 MHz) with two BRAMs and just 195 LUTs used for the complete design. These designs haven’t been optimised, so there’s no doubt scope for further improvement. Whatever your FPGA board, hardware sprites are an efficient way to create graphics.

Explore

I hope you enjoyed this instalment of Exploring FPGA Graphics, but nothing beats creating your own designs. Here are a few sprite suggestions:

  • Design your own 1-bit space ship and asteroid sprites
  • Use buttons to control the position of a sprite on screen
  • Replace SCALE_X and SCALE_Y with input signals so the sprite scale can be adjusted at runtime
  • Draw the numbers 0-9 as small pixel sprites and add a score to FPGA Pong

Feedback is most welcome; you can get in touch with @WillFlux or open an issue on GitHub.

Next Time

In the next part, we’ll create a demo using hardware sprites and animated starfields in FPGA Ad Astra.

©2020 Will Green, Project F