28 October 2020

Hardware Sprites

Welcome back to Exploring FPGA Graphics. In the previous part, we recreated Pong. In this part, we learn how to create colourful animated graphics with hardware sprites. Hardware sprites maintain much of the simplicity of our Pong design while offering much greater creative freedom. In the next part, we’ll create a demo that gives a taste of what’s possible with sprites.

In this series, we explore graphics at the hardware level and get a feel for the power of FPGAs. We’ll learn how displays work, race the beam with Pong, animate starfields and sprites, paint Michelangelo’s David, simulate life with bitmaps, draw lines and shapes, and create smooth animation with double buffering. New to the series? Start with Intro to FPGA Graphics.

You can watch an FPGA Graphics demo reel with designs from across this series.

Updated 2021-10-19. Get in touch with @WillFlux or open an issue on GitHub.

Series Outline

  • Intro to FPGA Graphics - draw your first FPGA graphics
  • Pong - race the beam to create the arcade classic
  • Hardware Sprites (this post) - fast, colourful graphics with minimal logic
  • Ad Astra - graphics demo with starfields and hardware sprites
  • Framebuffers - driving the display from a bitmap in memory
  • Life on Screen - the screen comes alive with Conway’s Game of Life
  • Lines and Triangles - drawing lines and triangles with a framebuffer
  • 2D Shapes - filling shapes and drawing pictures
  • Animated Shapes - animation and double-buffering

Requirements

For this series, you need an FPGA board with video output. We’ll be working at 640x480, so pretty much any video output will do. It helps to be comfortable with programming your FPGA board and reasonably familiar with Verilog.

We’ll be demoing with two boards: the iCEBreaker and the Arty.

Source

The SystemVerilog designs featured in this series are available from the projf-explore git repo under the open-source MIT licence: build on them to your heart’s content. The rest of the blog content is subject to standard copyright restrictions: don’t republish it without permission.

What is a Sprite?

A sprite is a graphics object that can be moved and animated independently of the background and other sprites. Hardware sprites use dedicated logic for drawing, and until the mid-90s they were an essential part of computer graphics. Hardware sprites are a good fit for an FPGA as they’re easy to control, and we can scale them to fit our game design: whether we want hundreds of tiny sprites or a few huge ones. Hardware sprites are also useful for cursors or pointers in professional applications, providing a responsive UI without complex screen redrawing.

A Simple Sprite

We’re going to start with a small 8x8 pixel sprite with just two colours.

We’ll load the sprite into FPGA memory using a simple text format: each line consists of eight 1s or 0s separated by spaces. I’m going to start with the letter ‘F’ and a full stop (period) as a sprite. It’s a simple, asymmetric design, which makes it easier to spot bugs (such as incorrect orientation or missing pixels):

F.

The text file to initialize the sprite memory looks like this [letter_f.mem]:

1 1 1 1 1 1 0 0
1 1 0 0 0 0 0 0
1 1 0 0 0 0 0 0
1 1 1 1 1 0 0 0
1 1 0 0 0 0 0 0
1 1 0 0 0 0 0 0
1 1 0 0 0 0 1 1
0 0 0 0 0 0 1 1

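We read this text format into FPGA memory with a system task: $readmemb reads binary values (note the ‘b’ for binary), while $readmemh reads hex. The rom_async module, below, uses $readmemh, which handles this file equally well because a single 0 or 1 has the same value in hex as in binary. If you want to know more about loading data into memory, see Initialize Memory in Verilog.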
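
To get a feel for the file format, here is a minimal Python sketch that reads the text in the same spirit as the Verilog loader (whitespace-separated values, stored in order) and prints the sprite as ASCII art. It is a simplification: it ignores features such as comments and address markers, and the names parse_mem and render are made up for this illustration rather than taken from the projf-explore repo.

```python
LETTER_F = """\
1 1 1 1 1 1 0 0
1 1 0 0 0 0 0 0
1 1 0 0 0 0 0 0
1 1 1 1 1 0 0 0
1 1 0 0 0 0 0 0
1 1 0 0 0 0 0 0
1 1 0 0 0 0 1 1
0 0 0 0 0 0 1 1
"""

def parse_mem(text, base=2):
    """Return the whitespace-separated values in text as a flat list of ints."""
    return [int(token, base) for token in text.split()]

def render(pixels, width):
    """Return one string per sprite line, using '#' for 1 and '.' for 0."""
    rows = [pixels[i:i + width] for i in range(0, len(pixels), width)]
    return ["".join("#" if p else "." for p in row) for row in rows]

pixels = parse_mem(LETTER_F)
print("\n".join(render(pixels, width=8)))
```

Printing the sprite this way is a quick check that the file has the right number of values and that the graphic is the right way round before you build the design.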

Simple Sprite Drawing

Before starting to design hardware, we should consider what steps we go through in drawing a sprite at position X,Y on the screen (where X,Y is the top left of the sprite). The screen is drawn from the top left, a line at a time, so the steps are:

  1. Wait for the screen to reach the vertical sprite position (Y)
  2. Start sprite
  3. Wait for the horizontal sprite position (X)
  4. Draw a line of sprite pixels
  5. If we’re not done, then go to step 3
  6. Sprite is complete

This process is well represented by our old friend, the finite state machine (FSM). In fact, our first sprite design is little more than a simple finite state machine with a small ROM for the graphic [sprite_v1.sv]:

module sprite_v1 #(
    parameter WIDTH=8,            // graphic width in pixels
    parameter HEIGHT=8,           // graphic height in pixels
    parameter SPR_FILE="",        // file to load sprite graphic from
    parameter CORDW=16,           // screen coordinate width in bits
    parameter DEPTH=WIDTH*HEIGHT  // depth of memory array holding graphic
    ) (
    input  wire logic clk,               // clock
    input  wire logic rst,               // reset
    input  wire logic start,             // start control
    input  wire logic signed [CORDW-1:0] sx,    // horizontal screen position
    input  wire logic signed [CORDW-1:0] sprx,  // horizontal sprite position
    output      logic pix                // pixel colour to draw
    );

    // sprite graphic ROM
    logic [$clog2(DEPTH)-1:0] spr_rom_addr;  // pixel position
    logic spr_rom_data;  // pixel colour
    rom_async #(
        .WIDTH(1),  // 1 bit per pixel
        .DEPTH(DEPTH),
        .INIT_F(SPR_FILE)
    ) spr_rom (
        .addr(spr_rom_addr),
        .data(spr_rom_data)
    );

    // position within sprite
    logic [$clog2(WIDTH)-1:0]  ox;
    logic [$clog2(HEIGHT)-1:0] oy;

    enum {
        IDLE,       // awaiting start signal
        START,      // prepare for new sprite drawing
        AWAIT_POS,  // await horizontal position
        DRAW,       // draw pixel
        NEXT_LINE   // prepare for next sprite line
    } state, state_next;

    always_ff @(posedge clk) begin
        state <= state_next;  // advance to next state

        case (state)
            START: begin
                oy <= 0;
                spr_rom_addr <= 0;
            end
            AWAIT_POS: ox <= 0;
            DRAW: begin
                ox <= ox + 1;
                spr_rom_addr <= spr_rom_addr + 1;
            end
            NEXT_LINE: oy <= oy + 1;
        endcase

        if (rst) begin
            state <= IDLE;
            ox <= 0;
            oy <= 0;
            spr_rom_addr <= 0;
        end
    end

    // output current pixel colour when drawing
    always_comb pix = (state == DRAW) ? spr_rom_data : 0;

    // create status signals
    logic last_pixel, last_line;
    always_comb begin
        last_pixel = (ox == WIDTH-1);
        last_line  = (oy == HEIGHT-1);
    end

    // determine next state
    always_comb begin
        case (state)
            IDLE:       state_next = start ? START : IDLE;
            START:      state_next = AWAIT_POS;
            AWAIT_POS:  state_next = (sx == sprx-1) ? DRAW : AWAIT_POS;
            DRAW:       state_next = !last_pixel ? DRAW :
                                     (!last_line ? NEXT_LINE : IDLE);
            NEXT_LINE:  state_next = AWAIT_POS;
            default:    state_next = IDLE;
        endcase
    end
endmodule

The module does nothing until it receives a start signal, then it awaits the sprite position:

AWAIT_POS: state_next = (sx == sprx-1) ? DRAW : AWAIT_POS;

We wait for the pixel before the start of the sprite, so that we start drawing in the right place on the next cycle. Once drawing, the next state depends on how far through the sprite we are:

DRAW: state_next = !last_pixel ? DRAW : (!last_line ? NEXT_LINE : IDLE);

In the next-state logic, we’ve nested a conditional operator. Nested conditional operators are confusing, so best avoided. However, in this case, I think it reads naturally. If we’ve not reached the last pixel, keep drawing. Otherwise, if we’ve not arrived at the final line, go to the next line. Otherwise idle.
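
If you’d like to experiment with the state machine without building hardware, the following is a rough cycle-by-cycle Python model of sprite_v1. It is a sketch under a few assumptions: the function name simulate_sprite_v1 is invented for this example, the start pulse arrives on the first cycle of the first line, each line runs from sx = -160 to 639 (matching the display timings discussed later in this post), and the graphic depth is a power of two so the address wraps as it would in hardware.

```python
IDLE, START, AWAIT_POS, DRAW, NEXT_LINE = range(5)

F_ROWS = ["11111100", "11000000", "11000000", "11111000",
          "11000000", "11000000", "11000011", "00000011"]
rom = [int(c) for row in F_ROWS for c in row]  # flat list, like the ROM contents

def simulate_sprite_v1(rom, width, height, sprx, h_sta=-160, h_end=639):
    """Clock the FSM for `height` screen lines; return (sx, line) of lit pixels."""
    ox_mask = (1 << (width - 1).bit_length()) - 1  # ox is $clog2(WIDTH) bits wide
    state, ox, oy, addr = IDLE, 0, 0, 0
    lit = []
    for line in range(height):
        for sx in range(h_sta, h_end + 1):
            start = (line == 0 and sx == h_sta)  # assumed one-cycle start pulse
            # combinational logic sees the current register values
            if state == DRAW and rom[addr]:
                lit.append((sx, line))
            last_pixel = (ox == width - 1)
            last_line = (oy == height - 1)
            if state == IDLE:
                state_next = START if start else IDLE
            elif state == START:
                state_next = AWAIT_POS
            elif state == AWAIT_POS:
                state_next = DRAW if sx == sprx - 1 else AWAIT_POS
            elif state == DRAW:
                if not last_pixel:
                    state_next = DRAW
                else:
                    state_next = IDLE if last_line else NEXT_LINE
            else:  # NEXT_LINE
                state_next = AWAIT_POS
            # registers all update together on the clock edge
            if state == START:
                oy, addr = 0, 0
            elif state == AWAIT_POS:
                ox = 0
            elif state == DRAW:
                ox = (ox + 1) & ox_mask
                addr = (addr + 1) % len(rom)
            elif state == NEXT_LINE:
                oy += 1
            state = state_next
    return lit

lit = simulate_sprite_v1(rom, width=8, height=8, sprx=16)
print([sx for sx, line in lit if line == 0])  # [16, 17, 18, 19, 20, 21]
```

The first lit pixel lands exactly at sprx because the model, like the module, leaves AWAIT_POS one cycle early. Try changing the comparison to `sx == sprx` to see the whole sprite shift one pixel to the right.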

The graphic itself is held in an asynchronous ROM. An asynchronous ROM uses logic (LUTs); it’s a good choice for a small ROM such as this [rom_async.sv]:

module rom_async #(
    parameter WIDTH=8,
    parameter DEPTH=256,
    parameter INIT_F="",
    localparam ADDRW=$clog2(DEPTH)
    ) (
    input wire logic [ADDRW-1:0] addr,
    output     logic [WIDTH-1:0] data
    );

    logic [WIDTH-1:0] memory [DEPTH];

    initial begin
        if (INIT_F != 0) begin
            $display("Creating rom_async from init file '%s'.", INIT_F);
            $readmemh(INIT_F, memory);
        end
    end

    always_comb data = memory[addr];
endmodule

Note the lack of a clock: this ROM is entirely combinational. You’ll see a synchronous ROM, using BRAM, later in this post.

Revisiting Display Timings

Didn’t we already sort out display timings at the start of the series? Yes, we did, but there are good reasons to change how we treat screen coordinates before we start drawing sprites.

Take a look at the new [display_timings_480p.sv] module, then we’ll discuss the changes:

module display_timings_480p #(
    CORDW=16,   // signed coordinate width (bits)
    H_RES=640,  // horizontal resolution (pixels)
    V_RES=480,  // vertical resolution (lines)
    H_FP=16,    // horizontal front porch
    H_SYNC=96,  // horizontal sync
    H_BP=48,    // horizontal back porch
    V_FP=10,    // vertical front porch
    V_SYNC=2,   // vertical sync
    V_BP=33,    // vertical back porch
    H_POL=0,    // horizontal sync polarity (0:neg, 1:pos)
    V_POL=0     // vertical sync polarity (0:neg, 1:pos)
    ) (
    input  wire logic clk_pix,  // pixel clock
    input  wire logic rst,      // reset
    output      logic hsync,    // horizontal sync
    output      logic vsync,    // vertical sync
    output      logic de,       // data enable (low in blanking intervals)
    output      logic frame,    // high at start of frame
    output      logic line,     // high at start of active line
    output      logic signed [CORDW-1:0] sx,  // horizontal screen position
    output      logic signed [CORDW-1:0] sy   // vertical screen position
    );

    // horizontal timings
    localparam signed H_STA  = 0 - H_FP - H_SYNC - H_BP;    // horizontal start
    localparam signed HS_STA = H_STA + H_FP;                // sync start
    localparam signed HS_END = HS_STA + H_SYNC;             // sync end
    localparam signed HA_STA = 0;                           // active start
    localparam signed HA_END = H_RES - 1;                   // active end

    // vertical timings
    localparam signed V_STA  = 0 - V_FP - V_SYNC - V_BP;    // vertical start
    localparam signed VS_STA = V_STA + V_FP;                // sync start
    localparam signed VS_END = VS_STA + V_SYNC;             // sync end
    localparam signed VA_STA = 0;                           // active start
    localparam signed VA_END = V_RES - 1;                   // active end

    logic signed [CORDW-1:0] x, y;  // screen position

    // generate horizontal and vertical syncs with correct polarity
    always_ff @(posedge clk_pix) begin
        hsync <= H_POL ? (x > HS_STA && x <= HS_END)
                      : ~(x > HS_STA && x <= HS_END);
        vsync <= V_POL ? (y > VS_STA && y <= VS_END)
                      : ~(y > VS_STA && y <= VS_END);
    end

    // control signals
    always_ff @(posedge clk_pix) begin
        de    <= (y >= VA_STA && x >= HA_STA);
        frame <= (y == V_STA  && x == H_STA);
        line  <= (y >= VA_STA && x == H_STA);
        if (rst) frame <= 0;  // don't assert frame in reset
    end

    // calculate horizontal and vertical screen position
    always_ff @(posedge clk_pix) begin
        if (x == HA_END) begin  // last pixel on line?
            x <= H_STA;
            y <= (y == VA_END) ? V_STA : y + 1;  // last line on screen?
        end else begin
            x <= x + 1;
        end
        if (rst) begin
            x <= H_STA;
            y <= V_STA;
        end
    end

    // align screen position with sync and control signals
    always_ff @ (posedge clk_pix) begin
        sx <= x;
        sy <= y;
        if (rst) begin
            sx <= H_STA;
            sy <= V_STA;
        end
    end
endmodule

The three significant changes are:

  1. The blanking intervals occur before the active drawing area
  2. The use of signed coordinates for screen position
  3. All signals are registered for improved timing

Imagine we want to draw a sprite at the far-left of the screen. We need to start the drawing process before the first sprite pixel, for example, to load pixels from RAM. If horizontal blanking occurs at the end of the line, we need to start the sprite on the line before. If we want to draw at the top of the screen, there is no previous line, so we need to start in the previous frame. Dealing with these edge cases complicates drawing needlessly.

If blanking before the active area is so handy, why didn’t we do this before? Putting the blanking interval first means the first visible pixel is no longer (0,0); for VGA, it moves to (160,45). We can add an offset to all our coordinates, which is slightly annoying, but the Amiga successfully used this approach. However, this creates issues when handling different resolutions: the Amiga worked around this by always using low-resolution sprite coordinates.
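
The (160,45) figure follows directly from the blanking parameters in the timings module. As a quick illustration, this Python snippet adds them up and applies the offset to a sprite position; the helper name to_offset_coords is hypothetical and only exists for this example.

```python
# 640x480 blanking parameters from display_timings_480p (pixels and lines)
H_FP, H_SYNC, H_BP = 16, 96, 48
V_FP, V_SYNC, V_BP = 10, 2, 33

h_blank = H_FP + H_SYNC + H_BP  # pixels of horizontal blanking
v_blank = V_FP + V_SYNC + V_BP  # lines of vertical blanking

# with blanking first and unsigned coordinates, this is the first visible pixel
first_visible = (h_blank, v_blank)

def to_offset_coords(x, y):
    """Map a screen position (0,0 = top-left) to blanking-first unsigned coordinates."""
    return (x + h_blank, y + v_blank)

print(first_visible)             # (160, 45)
print(to_offset_coords(16, 16))  # (176, 61)
```

Every position in the design would need this adjustment, and the offsets change with each display mode, which is the annoyance described above.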

There is a better way if we’re prepared to accept signed coordinates. We retain (0,0) as the top-left of the screen, while blanking occurs at negative coordinates. If we adopt 16-bit signed coordinates, we can handle any plausible screen size and an (X,Y) coordinate pair fits cleanly into a 32-bit word. Signed signals are a bit of a pain in Verilog, but I’ve found the slight inconvenience more than worth it when working with sprites and framebuffers.
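
To make the signed scheme concrete, the sketch below evaluates the same start-position expressions as the timings module and packs a coordinate pair into a single 32-bit word. The word layout (x in the upper 16 bits, y in the lower) is an assumption chosen for this example, as are the helper names pack_xy and unpack_xy; the post doesn’t specify a layout.

```python
import struct

H_FP, H_SYNC, H_BP = 16, 96, 48
V_FP, V_SYNC, V_BP = 10, 2, 33

# same expressions as the localparams in display_timings_480p
H_STA = 0 - H_FP - H_SYNC - H_BP  # first position on a line
V_STA = 0 - V_FP - V_SYNC - V_BP  # first line of a frame

def pack_xy(x, y):
    """Pack signed 16-bit x and y into one unsigned 32-bit word (x in the high half)."""
    return struct.unpack(">I", struct.pack(">hh", x, y))[0]

def unpack_xy(word):
    """Recover the signed (x, y) pair from a 32-bit word."""
    return struct.unpack(">hh", struct.pack(">I", word))

word = pack_xy(H_STA, V_STA)
print(H_STA, V_STA)     # -160 -45
print(f"{word:08X}")    # FF60FFD3
print(unpack_xy(word))  # (-160, -45)
```

Negative values are stored in two’s complement, so the blanking region and the visible screen share one number line with (0,0) still at the top-left.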

To abstract the size of blanking intervals, I’ve added a couple of signals for the start of the frame and active line. For example, if you’re animating a sprite, you can move it when you receive a frame. We register control signals, so they don’t make timing closure harder.

Vivado users can test the new display timings with xc7/display_timings_480p_tb.sv.

Let’s Draw

With our new display timings in hand, it’s time to see our static sprite in action:

iCE40 version shown below:

module top_sprite_v1 (
    input  wire logic clk_12m,      // 12 MHz clock
    input  wire logic btn_rst,      // reset button (active high)
    output      logic dvi_clk,      // DVI pixel clock
    output      logic dvi_hsync,    // DVI horizontal sync
    output      logic dvi_vsync,    // DVI vertical sync
    output      logic dvi_de,       // DVI data enable
    output      logic [3:0] dvi_r,  // 4-bit DVI red
    output      logic [3:0] dvi_g,  // 4-bit DVI green
    output      logic [3:0] dvi_b   // 4-bit DVI blue
    );

    // generate pixel clock
    logic clk_pix;
    logic clk_locked;
    clock_gen_480p clock_pix_inst (
       .clk(clk_12m),
       .rst(btn_rst),
       .clk_pix,
       .clk_locked
    );

    // display timings
    localparam CORDW = 16;
    logic signed [CORDW-1:0] sx, sy;
    logic hsync, vsync;
    logic de, line;
    display_timings_480p #(.CORDW(CORDW)) display_timings_inst (
        .clk_pix,
        .rst(!clk_locked),  // wait for pixel clock lock
        .sx,
        .sy,
        .hsync,
        .vsync,
        .de,
        .frame(),
        .line
    );

    // sprite
    localparam SPR_WIDTH  = 8;  // width in pixels
    localparam SPR_HEIGHT = 8;  // number of lines
    localparam SPR_FILE = "../res/simple/letter_f.mem";
    logic spr_start;
    logic spr_pix;

    // draw sprite at position
    localparam DRAW_X = 16;
    localparam DRAW_Y = 16;

    // signal to start sprite drawing
    always_comb spr_start = (line && sy == DRAW_Y);

    sprite_v1 #(
        .WIDTH(SPR_WIDTH),
        .HEIGHT(SPR_HEIGHT),
        .SPR_FILE(SPR_FILE)
    ) spr_instance (
        .clk(clk_pix),
        .rst(!clk_locked),
        .start(spr_start),
        .sx,
        .sprx(DRAW_X),
        .pix(spr_pix)
    );

    // colours
    logic [3:0] red, green, blue;
    always_comb begin
        red   = (de && spr_pix) ? 4'hF: 4'h0;
        green = (de && spr_pix) ? 4'hC: 4'h0;
        blue  = (de && spr_pix) ? 4'h0: 4'h0;
    end

    // Output DVI clock: 180° out of phase with other DVI signals
    SB_IO #(
        .PIN_TYPE(6'b010000)  // PIN_OUTPUT_DDR
    ) dvi_clk_io (
        .PACKAGE_PIN(dvi_clk),
        .OUTPUT_CLK(clk_pix),
        .D_OUT_0(1'b0),
        .D_OUT_1(1'b1)
    );

    // Output DVI signals
    SB_IO #(
        .PIN_TYPE(6'b010100)  // PIN_OUTPUT_REGISTERED
    ) dvi_signal_io [14:0] (
        .PACKAGE_PIN({dvi_hsync, dvi_vsync, dvi_de, dvi_r, dvi_g, dvi_b}),
        .OUTPUT_CLK(clk_pix),
        .D_OUT_0({hsync, vsync, de, red, green, blue}),
        .D_OUT_1()
    );
endmodule

We start the sprite drawing with the following logic (line is high at the start of active lines):

always_comb spr_start = (line && sy == DRAW_Y);

We pass the horizontal position of the screen, sx, to the sprite module, so it can wait for the correct horizontal drawing position.

Building the Designs
In the Hardware Sprites section of the git repo, you’ll find the design files, a makefile for iCEBreaker, a Vivado project for Arty, and instructions for building the designs for both boards.

Program your board, and you should see a small golden letter ‘F’ and a dot towards the top left of the screen. From these tiny beginnings, mighty sprites will grow.

Scaling Up

Now we can position our sprite correctly, it’s time to make it bigger. We can make larger sprites by increasing the size of the design. However, it’s also useful to be able to scale our sprites up when drawing them. A scaled-up sprite is blocky, but it uses few resources and allows a design to work at different screen resolutions.

To scale our sprite, we count additional screen pixels and lines when drawing the sprite using cnt_x and cnt_y respectively. The new module is [sprite_v2.sv]:

module sprite_v2 #(
    parameter WIDTH=8,            // graphic width in pixels
    parameter HEIGHT=8,           // graphic height in pixels
    parameter SCALE_X=1,          // sprite width scale-factor
    parameter SCALE_Y=1,          // sprite height scale-factor
    parameter SPR_FILE="",        // file to load sprite graphic from
    parameter CORDW=16,           // screen coordinate width in bits
    parameter DEPTH=WIDTH*HEIGHT  // depth of memory array holding graphic
    ) (
    input  wire logic clk,               // clock
    input  wire logic rst,               // reset
    input  wire logic start,             // start control
    input  wire logic signed [CORDW-1:0] sx,    // horizontal screen position
    input  wire logic signed [CORDW-1:0] sprx,  // horizontal sprite position
    output      logic pix                // pixel colour to draw
    );

    // sprite graphic ROM
    logic [$clog2(DEPTH)-1:0] spr_rom_addr;  // pixel position
    logic spr_rom_data;  // pixel colour
    rom_async #(
        .WIDTH(1),  // 1 bit per pixel
        .DEPTH(DEPTH),
        .INIT_F(SPR_FILE)
    ) spr_rom (
        .addr(spr_rom_addr),
        .data(spr_rom_data)
    );

    // position within sprite
    logic [$clog2(WIDTH)-1:0]  ox;
    logic [$clog2(HEIGHT)-1:0] oy;

    // scale counters
    logic [$clog2(SCALE_X)-1:0] cnt_x;
    logic [$clog2(SCALE_Y)-1:0] cnt_y;

    enum {
        IDLE,       // awaiting start signal
        START,      // prepare for new sprite drawing
        AWAIT_POS,  // await horizontal position
        DRAW,       // draw pixel
        NEXT_LINE   // prepare for next sprite line
    } state, state_next;

    always_ff @(posedge clk) begin
        state <= state_next;  // advance to next state

        case (state)
            START: begin
                oy <= 0;
                cnt_y <= 0;
                spr_rom_addr <= 0;
            end
            AWAIT_POS: begin
                ox <= 0;
                cnt_x <= 0;
            end
            DRAW: begin
                if (SCALE_X <= 1 || cnt_x == SCALE_X-1) begin
                    ox <= ox + 1;
                    cnt_x <= 0;
                    spr_rom_addr <= spr_rom_addr + 1;
                end else begin
                    cnt_x <= cnt_x + 1;
                end
            end
            NEXT_LINE: begin
                if (SCALE_Y <= 1 || cnt_y == SCALE_Y-1) begin
                    oy <= oy + 1;
                    cnt_y <= 0;
                end else begin
                    cnt_y <= cnt_y + 1;
                    spr_rom_addr <= spr_rom_addr - WIDTH;  // restart line
                end
            end
        endcase

        if (rst) begin
            state <= IDLE;
            ox <= 0;
            oy <= 0;
            cnt_x <= 0;
            cnt_y <= 0;
            spr_rom_addr <= 0;
        end
    end

    // output current pixel colour when drawing
    always_comb pix = (state == DRAW) ? spr_rom_data : 0;

    // create status signals
    logic last_pixel, last_line;
    always_comb begin
        last_pixel = (ox == WIDTH-1  && cnt_x == SCALE_X-1);
        last_line  = (oy == HEIGHT-1 && cnt_y == SCALE_Y-1);
    end

    // determine next state
    always_comb begin
        case (state)
            IDLE:       state_next = start ? START : IDLE;
            START:      state_next = AWAIT_POS;
            AWAIT_POS:  state_next = (sx == sprx-1) ? DRAW : AWAIT_POS;
            DRAW:       state_next = !last_pixel ? DRAW :
                                    (!last_line ? NEXT_LINE : IDLE);
            NEXT_LINE:  state_next = AWAIT_POS;
            default:    state_next = IDLE;
        endcase
    end
endmodule

We can then drive this with a small change to our top module:

Build the v2 design with scaling. This design has hard-coded scale parameters, SCALE_X and SCALE_Y, but these could easily be made module inputs to allow for changes at run time.
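
The effect of the two scale counters can be previewed in software. This Python sketch reproduces the result of scaling, repeating each pixel SCALE_X times and each line SCALE_Y times, rather than the clocked mechanism itself; scale_sprite is a made-up helper for illustration.

```python
def scale_sprite(pixels, width, scale_x=1, scale_y=1):
    """Return the sprite as drawn on screen: a list of lines, each a list of pixels."""
    rows = [pixels[i:i + width] for i in range(0, len(pixels), width)]
    screen = []
    for row in rows:
        # each source pixel is held for scale_x screen pixels (the job of cnt_x)
        line = [p for p in row for _ in range(scale_x)]
        # each source line is drawn scale_y times (the job of cnt_y)
        screen.extend(list(line) for _ in range(scale_y))
    return screen

# a tiny 2x2 checker pattern, scaled 2x horizontally and 3x vertically
scaled = scale_sprite([1, 0, 0, 1], width=2, scale_x=2, scale_y=3)
for line in scaled:
    print("".join("#" if p else "." for p in line))
```

Note that the graphic memory is unchanged: only the number of screen pixels covered by each stored pixel grows, which is why scaling costs so little.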

Letter F

Motion

It’s time we got our sprites moving. Now we have our basic sprite working, I’m going to use a new design, which I’m charitably calling a flying saucer:

Flying Saucer

The memory initialization file is [saucer.mem]:

0 0 1 1 1 1 0 0
0 1 1 0 0 1 1 0
1 1 0 1 1 0 1 1
1 0 1 1 1 1 0 1
1 0 1 1 1 1 0 1
1 1 0 1 1 0 1 1
0 1 1 0 0 1 1 0
0 0 1 1 1 1 0 0

I’m sure you can come up with something better: create your own design following the same format as above. You don’t have to limit yourself to 8x8 pixels, just be sure to update the width and height in the top module (see below).
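
If typing rows of 1s and 0s gets tedious, one option is to sketch your design as ASCII art and convert it. The following Python helper is only a suggestion, not a tool from the repo: art_to_mem is a hypothetical name, and it treats ‘#’ as a set pixel and any other character as clear.

```python
def art_to_mem(art):
    """Convert ASCII art ('#' = 1, anything else = 0) to .mem text plus (width, height)."""
    rows = [row for row in art.splitlines() if row]
    width = len(rows[0])
    if any(len(row) != width for row in rows):
        raise ValueError("all sprite lines must be the same width")
    lines = [" ".join("1" if c == "#" else "0" for c in row) for row in rows]
    return "\n".join(lines) + "\n", (width, len(rows))

SAUCER = """\
..####..
.##..##.
##.##.##
#.####.#
#.####.#
##.##.##
.##..##.
..####..
"""

mem_text, (width, height) = art_to_mem(SAUCER)
print(mem_text, end="")
print(width, height)  # 8 8
```

The returned width and height are the values to use for the sprite width and height in the top module.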

To make it easy to build your own design I’ve added an empty sprite with the name [user.mem]. If you replace this file in res/simple/ with your own design it will automatically be included in projects and makefiles. See the Hardware Sprites section of the git repo for build instructions.

To move our sprite, I’ve borrowed the horizontal bouncing logic from our first part, yet again, to create:

Build this version of the project. You should see your sprite bounce back and forth across the screen. If you created your own sprite, remember to update the sprite filename, SPR_FILE, in top_sprite_v2a.sv. You can also tweak the sprite height, width, and scale as you like.

Colourful?

The introduction promised “colourful animated graphics”: it’s time to make good on this by increasing our colour depth. Rather than continue to inflict my drawing skills on you, I’m using the adorable hedgehog from the Amiga platformer, Superfrog.

Hedgehog

The hedgehog graphic is 32x20, so has a total of 640 pixels. The original Amiga game uses 32 colours, of which the hedgehog uses ten, plus one transparent colour. To allow for 11 colours we need four bits per pixel.

The memory requirement for this sprite is: 32 x 20 x 4 = 2,560 bits, or 2.5 kilobits.
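
The same sum works for any sprite you design. Here is a small Python sketch of the arithmetic; sprite_memory_bits is a name invented for this example.

```python
def sprite_memory_bits(width, height, colours):
    """Return (bits per pixel, total bits) for an indexed-colour sprite."""
    # smallest number of bits that can hold indices 0 to colours-1
    bits_per_pixel = max(1, (colours - 1).bit_length())
    return bits_per_pixel, width * height * bits_per_pixel

bpp, total = sprite_memory_bits(32, 20, colours=11)
print(bpp, total, total / 1024)          # 4 2560 2.5
print(sprite_memory_bits(8, 8, colours=2))  # (1, 64)
```

The second line is the monochrome letter ‘F’ from earlier: 64 bits in total, which is why a LUT-based ROM was fine for it.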

The memory initialization file is similar to our monochrome sprites, but instead of pixels being 0 or 1, they’re 0 to F. I use a tool called img2fmem (discussed later) to convert the sprite image to [hedgehog.mem]. To read this hex text data into memory we use $readmemh (note the ‘h’ for hex).

Indexed Colour
Storing a small colour index for each pixel, then looking up the full colour in a palette, is known as indexed colour. This design was common in older computers: for example, the original Amiga chipset supported 32 colours from a possible 4,096, very similar to our design! The GIF and PNG formats still use this approach to squeeze the best quality out of 256-colour images.

More Memory

If we’re going to use larger sprites, we need to rethink our memory design. An async ROM suffices for small designs but is a resource hog and timing disaster for larger sprites. FPGAs include block RAM (BRAM), which is ideal for memories of a few hundred bits to a few tens of kilobits.

Our sprite designs don’t change at runtime, so we can create a ROM using BRAM. The ROM takes a clock and address as input, and outputs the requested data on the following clock cycle. This extra latency has implications for correct positioning of our sprite, which we’ll discuss shortly. Check out FPGA Memory Types to learn more about BRAM and other memory.

Synchronous ROM using BRAM [rom_sync.sv]:

module rom_sync #(
    parameter WIDTH=8,
    parameter DEPTH=256,
    parameter INIT_F="",
    localparam ADDRW=$clog2(DEPTH)
    ) (
    input wire logic clk,
    input wire logic [ADDRW-1:0] addr,
    output     logic [WIDTH-1:0] data
    );

    logic [WIDTH-1:0] memory [DEPTH];

    initial begin
        if (INIT_F != 0) begin
            $display("Creating rom_sync from init file '%s'.", INIT_F);
            $readmemh(INIT_F, memory);
        end
    end

    always_ff @(posedge clk) begin
        data <= memory[addr];
    end
endmodule
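
The difference between the two ROM styles is easiest to see side by side. This is a simplified Python model, not a simulation of real BRAM: the class names AsyncRom and SyncRom are invented here, and the synchronous output is shown as None until the first clock edge because its initial value is unknown.

```python
class AsyncRom:
    """Combinational read: data follows the address within the same cycle."""
    def __init__(self, memory):
        self.memory = list(memory)

    def read(self, addr):
        return self.memory[addr]

class SyncRom:
    """Registered read: data reflects the address present at the previous clock edge."""
    def __init__(self, memory):
        self.memory = list(memory)
        self.data = None  # output register is unknown until the first clock edge

    def clock(self, addr):
        """Model one rising clock edge with `addr` on the address bus."""
        self.data = self.memory[addr]

memory = [10, 11, 12, 13]
arom, srom = AsyncRom(memory), SyncRom(memory)
trace = []
for cycle, addr in enumerate([0, 1, 2, 3]):
    # values visible during this cycle, before the edge that ends it
    trace.append((cycle, addr, arom.read(addr), srom.data))
    srom.clock(addr)

for row in trace:
    print(row)  # (cycle, address, async data, sync data)
```

In every cycle the synchronous ROM is one step behind the asynchronous one, so any logic consuming its data has to request each pixel a cycle earlier.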

Once you’ve created the appropriate memory module, we can create our final sprite module [sprite.sv]:

module sprite #(
    parameter WIDTH=8,         // graphic width in pixels
    parameter HEIGHT=8,        // graphic height in pixels
    parameter SCALE_X=1,       // sprite width scale-factor
    parameter SCALE_Y=1,       // sprite height scale-factor
    parameter COLR_BITS=4,     // bits per pixel (2^4=16 colours)
    parameter CORDW=16,        // screen coordinate width in bits
    parameter ADDRW=6          // width of graphic memory address bus
    ) (
    input  wire logic clk,                      // clock
    input  wire logic rst,                      // reset
    input  wire logic start,                    // start control
    input  wire logic signed [CORDW-1:0] sx,    // horizontal screen position
    input  wire logic signed [CORDW-1:0] sprx,  // horizontal sprite position
    input  wire logic [COLR_BITS-1:0] data_in,  // data from external memory
    output      logic [ADDRW-1:0] pos,          // sprite pixel position
    output      logic [COLR_BITS-1:0] pix,      // pixel colour to draw
    output      logic drawing,                  // sprite is drawing
    output      logic done                      // sprite drawing is complete
    );

    // position within sprite
    logic [$clog2(WIDTH)-1:0]  ox;
    logic [$clog2(HEIGHT)-1:0] oy;

    // scale counters
    logic [$clog2(SCALE_X)-1:0] cnt_x;
    logic [$clog2(SCALE_Y)-1:0] cnt_y;

    enum {
        IDLE,       // awaiting start signal
        START,      // prepare for new sprite drawing
        AWAIT_POS,  // await horizontal position
        DRAW,       // draw pixel
        NEXT_LINE,  // prepare for next sprite line
        DONE        // set done signal, then go idle
    } state, state_next;

    always_ff @(posedge clk) begin
        state <= state_next;  // advance to next state

        case (state)
            START: begin
                done <= 0;
                oy <= 0;
                cnt_y <= 0;
                pos <= 0;
            end
            AWAIT_POS: begin
                ox <= 0;
                cnt_x <= 0;
            end
            DRAW: begin
                if (SCALE_X <= 1 || cnt_x == SCALE_X-1) begin
                    ox <= ox + 1;
                    cnt_x <= 0;
                    pos <= pos + 1;
                end else begin
                    cnt_x <= cnt_x + 1;
                end
            end
            NEXT_LINE: begin
                if (SCALE_Y <= 1 || cnt_y == SCALE_Y-1) begin
                    oy <= oy + 1;
                    cnt_y <= 0;
                end else begin
                    cnt_y <= cnt_y + 1;
                    pos <= pos - WIDTH;  // go back to start of line
                end
            end
            DONE: done <= 1;
        endcase

        if (rst) begin
            state <= IDLE;
            ox <= 0;
            oy <= 0;
            cnt_x <= 0;
            cnt_y <= 0;
            pos <= 0;
            done <= 0;
        end
    end

    // output current pixel colour when drawing
    always_comb pix = (state == DRAW) ? data_in : 0;

    // create status signals
    logic last_pixel, last_line;
    always_comb begin
        last_pixel = (ox == WIDTH-1  && cnt_x == SCALE_X-1);
        last_line  = (oy == HEIGHT-1 && cnt_y == SCALE_Y-1);
        drawing = (state == DRAW);
    end

    // determine next state
    always_comb begin
        case (state)
            IDLE:       state_next = start ? START : IDLE;
            START:      state_next = AWAIT_POS;
            AWAIT_POS:  state_next = (sx == sprx-2) ? DRAW : AWAIT_POS;  // BRAM
            DRAW:       state_next = !last_pixel ? DRAW :
                                    (!last_line ? NEXT_LINE : DONE);
            NEXT_LINE:  state_next = AWAIT_POS;
            DONE:       state_next = IDLE;
            default:    state_next = IDLE;
        endcase
    end
endmodule

There are three changes of note:

External Memory

We’ve moved the memory interface outside the sprite module. The sprite module sends the desired position using the pos output and receives the pixel data on the data_in input. In the next part, we’ll take this a step further: multiple sprites will share one memory interface.

Latency

The block RAM adds an additional cycle of latency. With the async ROM we waited for sprx-1 in AWAIT_POS; we now wait for sprx-2, subtracting two from the horizontal position rather than one.

Drawing & Done

For better control and reuse of sprite instances, we’ve added two new signals: drawing is high when the sprite is drawing pixels, and done indicates the sprite is complete.

A Refined Palette

Our boards have 12-bit colour output, supporting 4,096 colours. We can map our 11 sprite colours to any of these using a colour lookup table (CLUT). We populate the colour lookup table using a simple text file [hedgehog_palette.mem]:

CCC AAA 888 874 763 651 540 330 111 F0F 000

Hedgehog Palette

The hedgehog sprite has ten drawing colours, with an additional colour F0F (magenta) used for transparency. These colours are 12-bit in hex format: RGB; the same as a web colour hex triplet.
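
To preview a palette on your computer, you can split each 12-bit entry into its three 4-bit components and expand them to a web colour. The sketch below is one way to do it in Python; parse_palette and to_web_hex are hypothetical helper names for this example.

```python
PALETTE_TEXT = "CCC AAA 888 874 763 651 540 330 111 F0F 000"

def parse_palette(text):
    """Return a list of (r, g, b) tuples, each component in the range 0-15."""
    palette = []
    for entry in text.split():
        value = int(entry, 16)
        palette.append(((value >> 8) & 0xF, (value >> 4) & 0xF, value & 0xF))
    return palette

def to_web_hex(rgb):
    """Expand 4-bit components to 8 bits by repeating each hex digit (0xC -> 0xCC)."""
    return "#" + "".join(f"{c * 17:02X}" for c in rgb)

palette = parse_palette(PALETTE_TEXT)
print(len(palette))            # 11
print(palette[3])              # (8, 7, 4)
print(to_web_hex(palette[3]))  # #887744
print(to_web_hex(palette[9]))  # #FF00FF
```

Multiplying a 4-bit value by 17 is the same as repeating its hex digit, which is how the short web form #874 expands to #887744.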

CLUT & Display Output

The logic for the CLUT is straightforward: we only have 11 colours, so an async ROM suffices:

    // colour lookup table (ROM) 11x12-bit entries
    logic [11:0] clut_colr;
    rom_async #(
        .WIDTH(12),
        .DEPTH(11),
        .INIT_F(SPR_PALETTE)
    ) clut (
        .addr(spr_pix),
        .data(clut_colr)
    );

    // map sprite colour index to palette using CLUT and incorporate background
    logic spr_trans;  // sprite pixel transparent?
    logic [3:0] red_spr, green_spr, blue_spr;  // sprite colour components
    logic [3:0] red_bg,  green_bg,  blue_bg;   // background colour components
    logic [3:0] red, green, blue;              // final colour
    always_comb begin
        spr_trans = (spr_pix == SPR_TRANS);
        {red_spr, green_spr, blue_spr} = clut_colr;
        {red_bg,  green_bg,  blue_bg}  = 12'h260;
        red   = (spr_drawing && !spr_trans) ? red_spr   : red_bg;
        green = (spr_drawing && !spr_trans) ? green_spr : green_bg;
        blue  = (spr_drawing && !spr_trans) ? blue_spr  : blue_bg;
    end

We take the colour index provided by the sprite module, spr_pix, and use it to look up the red, green, and blue components of the pixel colour. We also check whether the pixel colour matches the transparent colour specified in SPR_TRANS: we don’t want to draw the sprite if the pixel is transparent. This version has a solid green background specified by the hex colour 12'h260.
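
The lookup and transparency test boil down to a few lines, modelled here in Python for a single pixel. One assumption to flag: SPR_TRANS is set to 9 because that is where F0F sits in the palette file shown earlier; the actual parameter value isn’t given in this post. The function name output_colour is also just for illustration.

```python
PALETTE = [0xCCC, 0xAAA, 0x888, 0x874, 0x763, 0x651,
           0x540, 0x330, 0x111, 0xF0F, 0x000]
SPR_TRANS = 9    # assumed: index of F0F (magenta) in the palette above
BG_COLR = 0x260  # solid green background

def output_colour(spr_pix, spr_drawing):
    """Return the 12-bit colour sent to the display for one pixel."""
    clut_colr = PALETTE[spr_pix]           # colour lookup table
    spr_trans = (spr_pix == SPR_TRANS)     # sprite pixel transparent?
    return clut_colr if (spr_drawing and not spr_trans) else BG_COLR

print(f"{output_colour(3, True):03X}")   # 874
print(f"{output_colour(9, True):03X}")   # 260
print(f"{output_colour(3, False):03X}")  # 260
```

The background shows through in two cases: when the sprite isn’t drawing at all, and when it is drawing but the current pixel holds the transparent index.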

Top Hedgehog

We’re now ready to draw our hedgehog using a new top module:

Animation

Our hedgehog looks like it’s on ice. To complete our sprite design, we need to add animation support so the hedgehog can move its legs. The required change is surprisingly small: we just need to load all three hedgehog images into memory and offset the memory position to choose which image to display.

Walking Hedgehog

Your Own Colour Graphics
You’ll learn how to create your own memory files from images using img2fmem later in this series.

The animated sprite graphic has three images stacked vertically, so each image is contiguous in memory [hedgehog_walk.mem]:

Hedgehog Frames
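
Because the images are stacked vertically, the start of each image is a whole multiple of the image size. The simulation sketch below works through the arithmetic for our 32x20 pixel hedgehog. It assumes pixels are stored row by row within each image; frame_addr_demo and pixel_addr are hypothetical names used only for illustration.

```systemverilog
// Sketch only: frame_addr_demo and pixel_addr are illustrative names.
module frame_addr_demo;
    localparam SPR_WIDTH  = 32;
    localparam SPR_HEIGHT = 20;
    localparam SPR_FRAMES = 3;
    localparam SPR_PIXELS = SPR_WIDTH * SPR_HEIGHT;   // 640 pixels per image
    localparam SPR_DEPTH  = SPR_PIXELS * SPR_FRAMES;  // 1920 pixels in total

    // address of pixel (x,y) within image img, assuming row-major order
    function automatic int pixel_addr(input int img, input int x, input int y);
        return img * SPR_PIXELS + y * SPR_WIDTH + x;
    endfunction

    initial begin
        $display("%0d", pixel_addr(0, 0, 0));    // 0: first pixel of image 0
        $display("%0d", pixel_addr(1, 0, 0));    // 640: first pixel of image 1
        $display("%0d", pixel_addr(2, 31, 19));  // 1919: last pixel of image 2
        $display("%0d", $clog2(SPR_DEPTH));      // 11: address bits needed
        $finish;
    end
endmodule
```

Selecting an image is therefore just a matter of adding 0, 640, or 1280 to the address within the image.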

Every display frame we increment the counter cnt_anim. When the counter hits specific values, we update the sprite base address, spr_base_addr, to select a different image in the sprite graphic.

    // sprite frame selector
    logic [5:0] cnt_anim;  // count from 0-63
    always_ff @(posedge clk_pix) begin
        if (frame) begin
            // select sprite frame
            cnt_anim <= cnt_anim + 1;
            case (cnt_anim)
                0: spr_base_addr <= 0;
                15: spr_base_addr <= SPR_PIXELS;
                31: spr_base_addr <= 0;
                47: spr_base_addr <= 2 * SPR_PIXELS;
                default: spr_base_addr <= spr_base_addr;
            endcase

            // ...
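
If you want to watch the frame selector on its own, the following simulation sketch may help. It is an illustration rather than part of the design: anim_demo is a hypothetical name, and each rising clock edge stands in for one display frame, so the frame signal is left out.

```systemverilog
// Sketch only: anim_demo is an illustrative testbench name.
module anim_demo;
    localparam SPR_PIXELS = 32 * 20;                // pixels per image
    localparam SPR_ADDRW  = $clog2(3 * SPR_PIXELS); // address width for three images

    logic clk = 0;
    always #5 clk = ~clk;  // each rising edge stands in for one display frame

    logic [5:0] cnt_anim = 0;
    logic [SPR_ADDRW-1:0] spr_base_addr = 0;
    always_ff @(posedge clk) begin
        cnt_anim <= cnt_anim + 1;
        case (cnt_anim)
            0:  spr_base_addr <= 0;
            15: spr_base_addr <= SPR_PIXELS;
            31: spr_base_addr <= 0;
            47: spr_base_addr <= 2 * SPR_PIXELS;
            default: spr_base_addr <= spr_base_addr;
        endcase
    end

    // report each change of image
    always @(spr_base_addr)
        $display("cnt_anim=%0d base=%0d", cnt_anim, spr_base_addr);

    initial begin
        repeat (128) @(posedge clk);  // two passes through the counter
        $finish;
    end
endmodule
```

The base address should step through 640, 0, 1280, 0 and then repeat. Both registers update on the same clock edge, so each printed counter value is one higher than the matching case value. Assuming a 60 Hz display, an image change roughly every 16 frames works out at 60/16 = 3.75 changes per second.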

The ROM address adds the sprite base address, spr_base_addr, to the address from the sprite module, spr_rom_addr, to select the right image:

    // sprite graphic ROM
    logic [COLR_BITS-1:0] spr_rom_data;
    logic [SPR_ADDRW-1:0] spr_rom_addr, spr_base_addr;
    rom_sync #(
        .WIDTH(COLR_BITS),
        .DEPTH(SPR_DEPTH),
        .INIT_F(SPR_FILE)
    ) spr_rom (
        .clk(clk_pix),
        .addr(spr_base_addr + spr_rom_addr),
        .data(spr_rom_data)
    );

Next we want to make our hedgehog’s world a little more interesting by bringing in the sky. We do this by creating horizontal bars of colour in the background:

    // background colour
    logic [11:0] bg_colr;
    always_ff @(posedge clk_pix) begin
        if (line) begin
            if      (sy == 0)   bg_colr <= 12'h239;
            else if (sy == 80)  bg_colr <= 12'h24A;
            else if (sy == 140) bg_colr <= 12'h25B;
            else if (sy == 190) bg_colr <= 12'h26C;
            else if (sy == 230) bg_colr <= 12'h27D;
            else if (sy == 265) bg_colr <= 12'h29E;
            else if (sy == 295) bg_colr <= 12'h2BF;
            else if (sy == 320) bg_colr <= 12'h260;
        end
    end

We move our hedgehog down the screen to walk on the ground:

    if (!clk_locked) begin
        sprx <= H_RES;
        spry <= 240;
    end

Hedgehog walking under the sky

Our completed top module animates the hedgehog at approximately four frames per second:

Arty version shown below:

module top_hedgehog (
    input  wire logic clk_100m,     // 100 MHz clock
    input  wire logic btn_rst,      // reset button (active low)
    output      logic vga_hsync,    // horizontal sync
    output      logic vga_vsync,    // vertical sync
    output      logic [3:0] vga_r,  // 4-bit VGA red
    output      logic [3:0] vga_g,  // 4-bit VGA green
    output      logic [3:0] vga_b   // 4-bit VGA blue
    );

    // generate pixel clock
    logic clk_pix;
    logic clk_locked;
    clock_gen_480p clock_pix_inst (
       .clk(clk_100m),
       .rst(!btn_rst),  // reset button is active low
       .clk_pix,
       .clk_locked
    );

    // display timings
    localparam CORDW = 16;
    logic signed [CORDW-1:0] sx, sy;
    logic hsync, vsync;
    logic de, frame, line;
    display_timings_480p #(.CORDW(CORDW)) display_timings_inst (
        .clk_pix,
        .rst(!clk_locked),  // wait for pixel clock lock
        .sx,
        .sy,
        .hsync,
        .vsync,
        .de,
        .frame,
        .line
    );

    // sprite
    localparam SPR_WIDTH    = 32;   // width in pixels
    localparam SPR_HEIGHT   = 20;   // number of lines
    localparam SPR_SCALE_X  = 4;    // width scale-factor
    localparam SPR_SCALE_Y  = 4;    // height scale-factor
    localparam COLR_BITS    = 4;    // bits per pixel (2^4=16 colours)
    localparam SPR_TRANS    = 9;    // transparent palette entry
    localparam SPR_FRAMES   = 3;    // number of frames in graphic
    localparam SPR_FILE     = "hedgehog_walk.mem";
    localparam SPR_PALETTE  = "hedgehog_palette.mem";

    localparam SPR_PIXELS = SPR_WIDTH * SPR_HEIGHT;
    localparam SPR_DEPTH  = SPR_PIXELS * SPR_FRAMES;
    localparam SPR_ADDRW  = $clog2(SPR_DEPTH);

    logic spr_start, spr_drawing;
    logic [COLR_BITS-1:0] spr_pix;

    // sprite graphic ROM
    logic [COLR_BITS-1:0] spr_rom_data;
    logic [SPR_ADDRW-1:0] spr_rom_addr, spr_base_addr;
    rom_sync #(
        .WIDTH(COLR_BITS),
        .DEPTH(SPR_DEPTH),
        .INIT_F(SPR_FILE)
    ) spr_rom (
        .clk(clk_pix),
        .addr(spr_base_addr + spr_rom_addr),
        .data(spr_rom_data)
    );

    // draw sprite at position
    localparam H_RES = 640;
    localparam SPR_SPEED_X = 2;
    logic signed [CORDW-1:0] sprx, spry;

    // sprite frame selector
    logic [5:0] cnt_anim;  // count from 0-63
    always_ff @(posedge clk_pix) begin
        if (frame) begin
            // select sprite frame
            cnt_anim <= cnt_anim + 1;
            case (cnt_anim)
                0: spr_base_addr <= 0;
                15: spr_base_addr <= SPR_PIXELS;
                31: spr_base_addr <= 0;
                47: spr_base_addr <= 2 * SPR_PIXELS;
                default: spr_base_addr <= spr_base_addr;
            endcase

            // walk right-to-left: -132 covers sprite width and within blanking
            sprx <= (sprx > -132) ? sprx - SPR_SPEED_X : H_RES;
        end
        if (!clk_locked) begin
            sprx <= H_RES;
            spry <= 240;
        end
    end

    // signal to start sprite drawing
    always_comb spr_start = (line && sy == spry);

    sprite #(
        .WIDTH(SPR_WIDTH),
        .HEIGHT(SPR_HEIGHT),
        .COLR_BITS(COLR_BITS),
        .SCALE_X(SPR_SCALE_X),
        .SCALE_Y(SPR_SCALE_Y),
        .ADDRW(SPR_ADDRW)
        ) spr_instance (
        .clk(clk_pix),
        .rst(!clk_locked),
        .start(spr_start),
        .sx,
        .sprx,
        .data_in(spr_rom_data),
        .pos(spr_rom_addr),
        .pix(spr_pix),
        .drawing(spr_drawing),
        .done()
    );

    // background colour
    logic [11:0] bg_colr;
    always_ff @(posedge clk_pix) begin
        if (line) begin
            if      (sy == 0)   bg_colr <= 12'h239;
            else if (sy == 80)  bg_colr <= 12'h24A;
            else if (sy == 140) bg_colr <= 12'h25B;
            else if (sy == 190) bg_colr <= 12'h26C;
            else if (sy == 230) bg_colr <= 12'h27D;
            else if (sy == 265) bg_colr <= 12'h29E;
            else if (sy == 295) bg_colr <= 12'h2BF;
            else if (sy == 320) bg_colr <= 12'h260;
        end
    end

    // colour lookup table (ROM) 11x12-bit entries
    logic [11:0] clut_colr;
    rom_async #(
        .WIDTH(12),
        .DEPTH(11),
        .INIT_F(SPR_PALETTE)
    ) clut (
        .addr(spr_pix),
        .data(clut_colr)
    );

    // map sprite colour index to palette using CLUT and incorporate background
    logic spr_trans;  // sprite pixel transparent?
    logic [3:0] red_spr, green_spr, blue_spr;  // sprite colour components
    logic [3:0] red_bg,  green_bg,  blue_bg;   // background colour components
    logic [3:0] red, green, blue;              // final colour
    always_comb begin
        spr_trans = (spr_pix == SPR_TRANS);
        {red_spr, green_spr, blue_spr} = clut_colr;
        {red_bg,  green_bg,  blue_bg}  = bg_colr;
        red   = (spr_drawing && !spr_trans) ? red_spr   : red_bg;
        green = (spr_drawing && !spr_trans) ? green_spr : green_bg;
        blue  = (spr_drawing && !spr_trans) ? blue_spr  : blue_bg;
    end

    // VGA output
    always_ff @(posedge clk_pix) begin
        vga_hsync <= hsync;
        vga_vsync <= vsync;
        vga_r <= de ? red   : 4'h0;
        vga_g <= de ? green : 4'h0;
        vga_b <= de ? blue  : 4'h0;
    end
endmodule

Explore

I hope you enjoyed this instalment of Exploring FPGA Graphics, but nothing beats creating your own designs. Here are a few sprite suggestions:

  • Design your own 1-bit space ship and asteroid sprites
  • Use buttons to control the position of sprites on screen
  • Add additional hedgehog sprite instances to the final design
  • Draw the numbers 0-9 as sprites and use them to score Pong

Next Time

In the next part, we’ll create a demo using hardware sprites and animated starfields in FPGA Ad Astra.

Constructive feedback is always welcome. Get in touch with @WillFlux or open an issue on GitHub.

©2021 Will Green, Project F