17 March 2021

FPGA Shapes

Welcome back to Exploring FPGA Graphics. This time we’re going to build on our work in lines and triangles by drawing more shapes and filling them in before using our framebuffer to animate them.

In this series, we explore graphics at the hardware level and get a feel for the power of FPGAs. We’ll learn how displays work, race the beam with Pong, animate starfields and sprites, paint Michelangelo’s David, simulate life with bitmaps, draw lines and shapes, and finally render simple 3D models. New to the series? Start with Exploring FPGA Graphics.

April 2021: Draft post - designs for iCEBreaker and filled triangles will be added soon.

Updated 2021-04-17. Get in touch with @WillFlux or open an issue on GitHub.

Series Outline

  • Exploring FPGA Graphics - learn how displays work and animate simple shapes
  • FPGA Pong - race the beam to create the arcade classic
  • Hardware Sprites - fast, colourful graphics with minimal resources
  • FPGA Ad Astra - demo with hardware sprites and animated starfields
  • Framebuffers - driving the display from a bitmap in memory
  • Life on Screen - the screen comes alive with Conway’s Game of Life
  • Lines and Triangles - drawing lines and triangles with a framebuffer
  • FPGA Shapes (this post) - filling and animating shapes
  • Simple 3D - models and wireframe rendering (draft coming soon)

Requirements

For this series, you need an FPGA board with video output. We’ll be working at 640x480, so pretty much any video output will do. It helps to be comfortable with programming your FPGA board and reasonably familiar with Verilog.

We’ll be demoing with these boards:

  • Arty (XC7)
  • iCEBreaker (iCE40)

Source

The SystemVerilog designs featured in this series are available from the projf-explore git repo under the open-source MIT licence: build on them to your heart’s content. The rest of the blog content is subject to standard copyright restrictions: don’t republish it without permission.

The Rectangle

Let’s kick things off by creating a module for drawing rectangles; it’s similar to the triangle module and provides a helpful stepping stone to drawing filled shapes.

Create module [draw_rectangle.sv]:

module draw_rectangle #(parameter CORDW=16) (  // signed coordinate width
    input  wire logic clk,             // clock
    input  wire logic rst,             // reset
    input  wire logic start,           // start rectangle drawing
    input  wire logic oe,              // output enable
    input  wire logic signed [CORDW-1:0] x0,  // vertex 0 - horizontal position
    input  wire logic signed [CORDW-1:0] y0,  // vertex 0 - vertical position
    input  wire logic signed [CORDW-1:0] x1,  // vertex 1 - horizontal position
    input  wire logic signed [CORDW-1:0] y1,  // vertex 1 - vertical position
    output      logic signed [CORDW-1:0] x,   // horizontal drawing position
    output      logic signed [CORDW-1:0] y,   // vertical drawing position
    output      logic drawing,         // rectangle is drawing
    output      logic done             // rectangle complete (high for one tick)
    );

    logic [1:0] line_id;  // current line (0, 1, 2, or 3)
    logic line_start;     // start drawing line
    logic line_done;      // finished drawing current line?

    // current line coordinates
    logic signed [CORDW-1:0] lx0, ly0;  // point 0 position
    logic signed [CORDW-1:0] lx1, ly1;  // point 1 position

    enum {IDLE, INIT, DRAW} state;
    initial state = IDLE;  // needed for Yosys
    always @(posedge clk) begin
        line_start <= 0;
        case (state)
            INIT: begin  // register coordinates
                if (line_id == 2'd0) begin  // (x0,y0) (x1,y0)
                    lx0 <= x0; ly0 <= y0;
                    lx1 <= x1; ly1 <= y0;
                end else if (line_id == 2'd1) begin  // (x1,y0) (x1,y1)
                    lx0 <= x1; ly0 <= y0;
                    lx1 <= x1; ly1 <= y1;
                end else if (line_id == 2'd2) begin  // (x1,y1) (x0,y1)
                    lx0 <= x1; ly0 <= y1;
                    lx1 <= x0; ly1 <= y1;
                end else begin  // (x0,y1) (x0,y0)
                    lx0 <= x0; ly0 <= y1;
                    lx1 <= x0; ly1 <= y0;
                end
                state <= DRAW;
                line_start <= 1;
            end
            DRAW: begin
                if (line_done) begin
                    if (line_id == 3) begin  // final line
                        done <= 1;
                        state <= IDLE;
                    end else begin
                        line_id <= line_id + 1;
                        state <= INIT;
                    end
                end
            end
            default: begin  // IDLE
                done <= 0;
                if (start) begin
                    line_id <= 0;
                    state <= INIT;
                end
            end
        endcase

        if (rst) begin
            line_id <= 0;
            line_start <= 0;
            done <= 0;
            state <= IDLE;
        end
    end

    draw_line #(.CORDW(CORDW)) draw_line_inst (
        .clk,
        .rst,
        .start(line_start),
        .oe,
        .x0(lx0),
        .y0(ly0),
        .x1(lx1),
        .y1(ly1),
        .x,
        .y,
        .drawing,
        .done(line_done)
    );
endmodule

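This module has similar I/O and a similar state machine to draw_triangle, but a rectangle needs only two pairs of coordinates to define it: opposite corners (x0,y0) and (x1,y1). The INIT state uses these to set up each of the four lines in turn.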
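As a quick check of the corner order, here’s a small Python model (not synthesizable and not part of the repo) of the INIT state: it lists the endpoints draw_rectangle passes to draw_line for each line_id. The function name rectangle_edges is ours, for illustration only.

```python
# Python model of draw_rectangle's INIT state: the line endpoints
# (lx0, ly0, lx1, ly1) sent to draw_line for each line_id.
def rectangle_edges(x0, y0, x1, y1):
    return [
        (x0, y0, x1, y0),  # line_id 0: (x0,y0) to (x1,y0)
        (x1, y0, x1, y1),  # line_id 1: (x1,y0) to (x1,y1)
        (x1, y1, x0, y1),  # line_id 2: (x1,y1) to (x0,y1)
        (x0, y1, x0, y0),  # line_id 3: (x0,y1) back to (x0,y0)
    ]

edges = rectangle_edges(80, 60, 240, 180)
for line_id, edge in enumerate(edges):
    print(line_id, edge)

# Each line ends where the next one starts, so the outline is closed.
for i, edge in enumerate(edges):
    assert edge[2:] == edges[(i + 1) % 4][:2]
```

Because all four lines come from the same two corners, the outline closes on itself whichever corner you pass first.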

To demo our rectangle drawing, we’re going to draw a series of rectangles inside each other using each colour (except black). The top module is similar to the one we used in the previous part to draw lines and triangles:

Building the Designs
In the FPGA Shapes section of the git repo, you’ll find the design files, a makefile for iCEBreaker, a Vivado project for Arty, and instructions for building the designs for both boards.

The Arty version of top_rectangles looks like this:

module top_rectangles (
    input  wire logic clk_100m,     // 100 MHz clock
    input  wire logic btn_rst,      // reset button (active low)
    output      logic vga_hsync,    // horizontal sync
    output      logic vga_vsync,    // vertical sync
    output      logic [3:0] vga_r,  // 4-bit VGA red
    output      logic [3:0] vga_g,  // 4-bit VGA green
    output      logic [3:0] vga_b   // 4-bit VGA blue
    );

    // generate pixel clock
    logic clk_pix;
    logic clk_locked;
    clock_gen_480p clock_pix_inst (
       .clk(clk_100m),
       .rst(!btn_rst),  // reset button is active low
       .clk_pix,
       .clk_locked
    );

    // display timings
    localparam CORDW = 16;
    logic hsync, vsync;
    logic de, frame, line;
    display_timings_480p #(.CORDW(CORDW)) display_timings_inst (
        .clk_pix,
        .rst(!clk_locked),  // wait for pixel clock lock
        .sx(),
        .sy(),
        .hsync,
        .vsync,
        .de,
        .frame,
        .line
    );

    logic frame_sys;  // start of new frame in system clock domain
    xd xd_frame (.clk_i(clk_pix), .clk_o(clk_100m),
                 .rst_i(1'b0), .rst_o(1'b0), .i(frame), .o(frame_sys));

    // framebuffer (FB)
    localparam FB_WIDTH   = 320;
    localparam FB_HEIGHT  = 240;
    localparam FB_CIDXW   = 4;
    localparam FB_CHANW   = 4;
    localparam FB_SCALE   = 2;
    localparam FB_IMAGE   = "";
    localparam FB_PALETTE = "16_colr_4bit_palette.mem";

    logic fb_we;
    logic signed [CORDW-1:0] fbx, fby;  // framebuffer coordinates
    logic [FB_CIDXW-1:0] fb_cidx;
    logic [FB_CHANW-1:0] fb_red, fb_green, fb_blue;  // colours for display

    framebuffer #(
        .WIDTH(FB_WIDTH),
        .HEIGHT(FB_HEIGHT),
        .CIDXW(FB_CIDXW),
        .CHANW(FB_CHANW),
        .SCALE(FB_SCALE),
        .F_IMAGE(FB_IMAGE),
        .F_PALETTE(FB_PALETTE)
    ) fb_inst (
        .clk_sys(clk_100m),
        .clk_pix,
        .rst_sys(1'b0),
        .rst_pix(1'b0),
        .de,
        .frame,
        .line,
        .we(fb_we),
        .x(fbx),
        .y(fby),
        .cidx(fb_cidx),
        .clip(),
        .red(fb_red),
        .green(fb_green),
        .blue(fb_blue)
    );

    // draw rectangles in framebuffer
    localparam SHAPE_CNT=15;  // number of shapes to draw
    logic [3:0] shape_id;  // shape identifier
    logic signed [CORDW-1:0] rx0, ry0, rx1, ry1;  // shape coords
    logic draw_start, drawing, draw_done;  // drawing signals

    // draw state machine
    enum {IDLE, INIT, DRAW, DONE} state;
    initial state = IDLE;  // needed for Yosys
    always @(posedge clk_100m) begin
        draw_start <= 0;
        case (state)
            INIT: begin  // register coordinates and colour
                draw_start <= 1;
                state <= DRAW;
                rx0 <=  80 + shape_id;
                ry0 <=  60 + shape_id;
                rx1 <= 240 - shape_id;
                ry1 <= 180 - shape_id;
                fb_cidx <= shape_id + 1;  // skip 1st colour (black)
            end
            DRAW: if (draw_done) begin
                if (shape_id == SHAPE_CNT-1) begin
                    state <= DONE;
                end else begin
                    shape_id <= shape_id + 1;
                    state <= INIT;
                end
            end
            DONE: state <= DONE;
            default: if (frame_sys) state <= INIT;  // IDLE
        endcase
    end

    // control drawing output enable - wait 300 frames, then 1 pixel/frame
    localparam DRAW_WAIT = 300;
    logic [$clog2(DRAW_WAIT)-1:0] cnt_draw_wait;
    logic draw_oe;
    always_ff @(posedge clk_100m) begin
        draw_oe <= 0;
        if (frame_sys) begin
            if (cnt_draw_wait != DRAW_WAIT-1) begin
                cnt_draw_wait <= cnt_draw_wait + 1;
            end else draw_oe <= 1;
        end
    end

    draw_rectangle #(.CORDW(CORDW)) draw_rectangle_inst (
        .clk(clk_100m),
        .rst(1'b0),
        .start(draw_start),
        .oe(draw_oe),
        .x0(rx0),
        .y0(ry0),
        .x1(rx1),
        .y1(ry1),
        .x(fbx),
        .y(fby),
        .drawing,
        .done(draw_done)
    );

    // write to framebuffer when drawing
    always_comb fb_we = drawing;

    // reading from FB takes one cycle: delay display signals to match
    logic hsync_p1, vsync_p1;
    always_ff @(posedge clk_pix) begin
        hsync_p1 <= hsync;
        vsync_p1 <= vsync;
    end

    // VGA output
    always_ff @(posedge clk_pix) begin
        vga_hsync <= hsync_p1;
        vga_vsync <= vsync_p1;
        vga_r <= fb_red;
        vga_g <= fb_green;
        vga_b <= fb_blue;
    end
endmodule
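The INIT state shrinks each rectangle by one pixel on every side and uses colour index shape_id + 1. A short Python sketch of those coordinates follows; nested_rectangles and outline_writes are hypothetical helper names, and the pixel count assumes draw_line writes both endpoints of every line, so each corner is written twice.

```python
# Python model (not synthesizable) of top_rectangles' INIT state.
SHAPE_CNT = 15

def nested_rectangles():
    """(rx0, ry0, rx1, ry1, fb_cidx) for each shape_id."""
    shapes = []
    for shape_id in range(SHAPE_CNT):
        shapes.append((80 + shape_id, 60 + shape_id,    # top-left moves in
                       240 - shape_id, 180 - shape_id,  # bottom-right moves in
                       shape_id + 1))                   # skip colour 0 (black)
    return shapes

def outline_writes(rx0, ry0, rx1, ry1):
    # Assumes draw_line writes both endpoints, so corners are written twice.
    w, h = rx1 - rx0, ry1 - ry0
    return 2 * (w + 1) + 2 * (h + 1)

shapes = nested_rectangles()
total = sum(outline_writes(*s[:4]) for s in shapes)
print(shapes[0], shapes[-1], total)
```

Under that assumption, the 15 outlines take 7,620 pixel writes. If draw_line advances one pixel per enabled cycle, draw_oe allows one pixel per frame after the 300-frame wait, so drawing takes roughly two minutes at 60 Hz and you can watch the rectangles appear.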

Filled Rectangle

For a filled rectangle, we draw one horizontal line for every row, working from the top edge down to the bottom edge until the shape is complete:

Create module [draw_rectangle_fill.sv]:

module draw_rectangle_fill #(parameter CORDW=16) (  // signed coordinate width
    input  wire logic clk,             // clock
    input  wire logic rst,             // reset
    input  wire logic start,           // start rectangle drawing
    input  wire logic oe,              // output enable
    input  wire logic signed [CORDW-1:0] x0,  // vertex 0 - horizontal position
    input  wire logic signed [CORDW-1:0] y0,  // vertex 0 - vertical position
    input  wire logic signed [CORDW-1:0] x1,  // vertex 1 - horizontal position
    input  wire logic signed [CORDW-1:0] y1,  // vertex 1 - vertical position
    output      logic signed [CORDW-1:0] x,   // horizontal drawing position
    output      logic signed [CORDW-1:0] y,   // vertical drawing position
    output      logic drawing,         // rectangle is drawing
    output      logic done             // rectangle complete (high for one tick)
    );

    // filled rectangle has as many lines as it is tall: abs(y1-y0)+1
    logic signed [CORDW-1:0] line_id;  // current line
    logic line_start;  // start drawing line
    logic line_done;   // finished drawing current line?

    // sort input Y coordinates so we always draw top-to-bottom
    logic signed [CORDW-1:0] y0s, y1s;  // y-coordinates sorted: y0s <= y1s
    always_comb begin
        y0s = (y0 > y1) ? y1 : y0;
        y1s = (y0 > y1) ? y0 : y1;  // last line
    end

    // line coordinates - horizontal lines, so only one y-value
    logic signed [CORDW-1:0] lx0, lx1, ly;

    enum {IDLE, INIT, DRAW} state;
    initial state = IDLE;  // needed for Yosys
    always @(posedge clk) begin
        line_start <= 0;
        case (state)
            INIT: begin  // register coordinates
                // x-coordinates don't change for a given filled rectangle
                lx0 <= (x0 > x1) ? x1 : x0;  // draw left-to-right
                lx1 <= (x0 > x1) ? x0 : x1;
                ly <= y0s + line_id;  // vertical position
                state <= DRAW;
                line_start <= 1;
            end
            DRAW: begin
                if (line_done) begin
                    if (ly == y1s) begin
                        done <= 1;
                        state <= IDLE;
                    end else begin
                        line_id <= line_id + 1;
                        state <= INIT;
                    end
                end
            end
            default: begin  // IDLE
                done <= 0;
                if (start) begin
                    line_id <= 0;
                    state <= INIT;
                end
            end
        endcase

        if (rst) begin
            line_id <= 0;
            line_start <= 0;
            done <= 0;
            state <= IDLE;
        end
    end

    draw_line #(.CORDW(CORDW)) draw_line_inst (
        .clk,
        .rst,
        .start(line_start),
        .oe,
        .x0(lx0),
        .y0(ly),
        .x1(lx1),
        .y1(ly),
        .x,
        .y,
        .drawing,
        .done(line_done)
    );
endmodule
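To see the order in which spans are drawn, here’s a Python model (illustrative only; fill_spans is a made-up name) of the lines draw_rectangle_fill requests. It sorts the corners the same way the module does, so passing the bottom-right corner first still draws top-to-bottom and left-to-right.

```python
# Python model of draw_rectangle_fill: the horizontal spans (lx0, lx1, ly)
# it hands to draw_line, in drawing order.
def fill_spans(x0, y0, x1, y1):
    lx0, lx1 = min(x0, x1), max(x0, x1)  # draw left-to-right
    y0s, y1s = min(y0, y1), max(y0, y1)  # draw top-to-bottom
    spans = []
    line_id = 0
    while True:
        ly = y0s + line_id               # INIT: vertical position of this line
        spans.append((lx0, lx1, ly))
        if ly == y1s:                    # DRAW: last line finished
            return spans
        line_id += 1

spans = fill_spans(160, 140, 80, 60)  # corners given bottom-right first
print(len(spans), spans[0], spans[-1])
```

A rectangle whose corners are 80 rows apart therefore takes 81 lines, because both the top and bottom rows are drawn.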

For a simple demo of overlapping squares using the filled rectangle module, create a new top module:

The Arty version of top_rectangles_fill looks like this:

module top_rectangles_fill (
    input  wire logic clk_100m,     // 100 MHz clock
    input  wire logic btn_rst,      // reset button (active low)
    output      logic vga_hsync,    // horizontal sync
    output      logic vga_vsync,    // vertical sync
    output      logic [3:0] vga_r,  // 4-bit VGA red
    output      logic [3:0] vga_g,  // 4-bit VGA green
    output      logic [3:0] vga_b   // 4-bit VGA blue
    );

    // generate pixel clock
    logic clk_pix;
    logic clk_locked;
    clock_gen_480p clock_pix_inst (
       .clk(clk_100m),
       .rst(!btn_rst),  // reset button is active low
       .clk_pix,
       .clk_locked
    );

    // display timings
    localparam CORDW = 16;
    logic hsync, vsync;
    logic de, frame, line;
    display_timings_480p #(.CORDW(CORDW)) display_timings_inst (
        .clk_pix,
        .rst(!clk_locked),  // wait for pixel clock lock
        .sx(),
        .sy(),
        .hsync,
        .vsync,
        .de,
        .frame,
        .line
    );

    logic frame_sys;  // start of new frame in system clock domain
    xd xd_frame (.clk_i(clk_pix), .clk_o(clk_100m),
                 .rst_i(1'b0), .rst_o(1'b0), .i(frame), .o(frame_sys));

    // framebuffer (FB)
    localparam FB_WIDTH   = 320;
    localparam FB_HEIGHT  = 240;
    localparam FB_CIDXW   = 4;
    localparam FB_CHANW   = 4;
    localparam FB_SCALE   = 2;
    localparam FB_IMAGE   = "";
    localparam FB_PALETTE = "16_colr_4bit_palette.mem";

    logic fb_we;
    logic signed [CORDW-1:0] fbx, fby;  // framebuffer coordinates
    logic [FB_CIDXW-1:0] fb_cidx;
    logic [FB_CHANW-1:0] fb_red, fb_green, fb_blue;  // colours for display

    framebuffer #(
        .WIDTH(FB_WIDTH),
        .HEIGHT(FB_HEIGHT),
        .CIDXW(FB_CIDXW),
        .CHANW(FB_CHANW),
        .SCALE(FB_SCALE),
        .F_IMAGE(FB_IMAGE),
        .F_PALETTE(FB_PALETTE)
    ) fb_inst (
        .clk_sys(clk_100m),
        .clk_pix,
        .rst_sys(1'b0),
        .rst_pix(1'b0),
        .de,
        .frame,
        .line,
        .we(fb_we),
        .x(fbx),
        .y(fby),
        .cidx(fb_cidx),
        .clip(),
        .red(fb_red),
        .green(fb_green),
        .blue(fb_blue)
    );

    // draw filled rectangles in framebuffer
    localparam SHAPE_CNT=15;  // number of shapes to draw
    logic [3:0] shape_id;  // shape identifier
    logic signed [CORDW-1:0] rx0, ry0, rx1, ry1;  // shape coords
    logic draw_start, drawing, draw_done;  // drawing signals

    // draw state machine
    enum {IDLE, INIT, DRAW, DONE} state;
    initial state = IDLE;  // needed for Yosys
    always @(posedge clk_100m) begin
        draw_start <= 0;
        case (state)
            INIT: begin  // register coordinates and colour
                draw_start <= 1;
                state <= DRAW;
                rx0 <=  80 + 4 * shape_id;
                ry0 <=  60 + 4 * shape_id;
                rx1 <= 160 + 4 * shape_id;
                ry1 <= 140 + 4 * shape_id;
                fb_cidx <= shape_id + 1;  // skip 1st colour (black)
            end
            DRAW: if (draw_done) begin
                if (shape_id == SHAPE_CNT-1) begin
                    state <= DONE;
                end else begin
                    shape_id <= shape_id + 1;
                    state <= INIT;
                end
            end
            DONE: state <= DONE;
            default: if (frame_sys) state <= INIT;  // IDLE
        endcase
    end

    draw_rectangle_fill #(.CORDW(CORDW)) draw_rectangle_inst (
        .clk(clk_100m),
        .rst(1'b0),
        .start(draw_start),
        .oe(1'b1),
        .x0(rx0),
        .y0(ry0),
        .x1(rx1),
        .y1(ry1),
        .x(fbx),
        .y(fby),
        .drawing,
        .done(draw_done)
    );

    // write to framebuffer when drawing
    always_comb fb_we = drawing;

    // reading from FB takes one cycle: delay display signals to match
    logic hsync_p1, vsync_p1;
    always_ff @(posedge clk_pix) begin
        hsync_p1 <= hsync;
        vsync_p1 <= vsync;
    end

    // VGA output
    always_ff @(posedge clk_pix) begin
        vga_hsync <= hsync_p1;
        vga_vsync <= vsync_p1;
        vga_r <= fb_red;
        vga_g <= fb_green;
        vga_b <= fb_blue;
    end
endmodule
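Each square spans 81 pixels on a side (rx0 to rx0+80 inclusive) and sits 4 pixels further right and down than the one before, so later squares partly cover earlier ones. The Python sketch below is a painter’s-algorithm model of which colour index ends up at a given pixel; it assumes the framebuffer starts as colour index 0 everywhere, and square and colour_at are hypothetical names.

```python
# Painter's-algorithm model of top_rectangles_fill (not synthesizable).
def square(shape_id):
    s = 4 * shape_id
    return (80 + s, 60 + s, 160 + s, 140 + s, shape_id + 1)  # coords, colour

def colour_at(x, y, shape_cnt=15):
    colour = 0  # assumed initial framebuffer contents
    for shape_id in range(shape_cnt):
        x0, y0, x1, y1, c = square(shape_id)
        if x0 <= x <= x1 and y0 <= y <= y1:
            colour = c  # later squares overwrite earlier ones
    return colour

print(colour_at(100, 80), colour_at(200, 180), colour_at(300, 230))
```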

Filled Triangle

Design in progress

Blazing a Trail

Back at the start of the series, we animated bouncing squares; now we’re going to do it with a framebuffer.

Our framebuffer remembers every square we’ve drawn, so the screen gradually fills with stripes of colour. While this is a fun effect, it’s not usually what you want.

There are three approaches we can take to move an object around the screen cleanly:

  1. Use Hardware Sprites - suitable for simple 2D graphics
  2. Use a blitter to cut out and move a framebuffer region - fast and effective for 2D objects
  3. Clear the framebuffer and draw from scratch - versatile, but requires plenty of bandwidth

For this post, we’ll go with option 3 and combine it with double buffering.
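The difference between keeping old drawing and clearing first is easy to see in a tiny software model. This Python sketch (a 32x24 buffer and a 4x4 square, both arbitrary choices for illustration) moves a square over five frames and counts the coloured pixels left at the end.

```python
WIDTH, HEIGHT = 32, 24  # tiny framebuffer, just for illustration

def new_fb():
    return [[0] * WIDTH for _ in range(HEIGHT)]

def draw_square(fb, x, y, size, colour):
    for row in range(y, y + size):
        for col in range(x, x + size):
            if 0 <= col < WIDTH and 0 <= row < HEIGHT:  # clip to buffer
                fb[row][col] = colour

def animate(frames, clear):
    fb = new_fb()
    for frame in range(frames):
        if clear:
            fb = new_fb()  # option 3: start each frame from a blank buffer
        draw_square(fb, x=2 * frame, y=frame, size=4, colour=1)
    return sum(p != 0 for row in fb for p in row)  # coloured pixels left

print(animate(5, clear=False), animate(5, clear=True))
```

Without clearing, the earlier squares remain as a trail (56 pixels here); with clearing, only the latest 16-pixel square is left.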

Double Buffering

If we draw in the framebuffer while the display controller is reading it, we’ll get tearing. We could limit ourselves to drawing during the vertical blanking interval, but even with the generous blanking of 640x480, that leaves less than 10% of each frame for drawing. And we still need to clear the framebuffer every frame, which takes a considerable share of that time, even at 320x180.
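To put numbers on this, here’s a back-of-the-envelope calculation in Python. It assumes standard 640x480 60 Hz timing (800x525 pixel clocks per frame at a nominal 25.175 MHz) and, for the clear, one framebuffer write per 100 MHz system clock cycle at the 320x240 size our designs use.

```python
# Back-of-the-envelope timing, assuming standard 640x480 60 Hz timing.
H_TOTAL, V_TOTAL, V_ACTIVE = 800, 525, 480  # pixel clocks per line, lines per frame
PIX_CLK_HZ = 25.175e6   # nominal 640x480 pixel clock (assumed)
SYS_CLK_HZ = 100e6      # system clock; one framebuffer write per cycle (assumed)
FB_WIDTH, FB_HEIGHT = 320, 240

blank_lines = V_TOTAL - V_ACTIVE                   # 45 lines of vertical blanking
blank_fraction = blank_lines / V_TOTAL             # share of each frame
blank_time_s = blank_lines * H_TOTAL / PIX_CLK_HZ  # duration of vertical blanking
clear_time_s = FB_WIDTH * FB_HEIGHT / SYS_CLK_HZ   # time to clear every pixel once

print(f"blanking: {blank_fraction:.1%} of frame, {blank_time_s * 1e3:.2f} ms")
print(f"clear:    {clear_time_s * 1e3:.3f} ms")
```

Under these assumptions, blanking is about 8.6% of each frame (roughly 1.43 ms), and clearing 320x240 pixels takes about 0.77 ms: over half the blanking interval before we’ve drawn anything.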

To draw all the time, we can double our framebuffer: we draw in one half while the display controller reads from the other, then swap the two halves at the start of each frame. That way, we can draw continuously and still avoid screen tearing. The downsides are that we need twice the memory and get an extra frame of latency.
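The memory cost is simple arithmetic. This Python sketch computes it for the 320x240, 4-bit colour index framebuffer used in these designs, along with the BRAM depth and address width that framebuffer_db derives from FB_DEPTH; fb_bits and clog2 are hypothetical helpers.

```python
def clog2(n):
    """Ceiling log2 for n >= 1, like SystemVerilog's $clog2."""
    return (n - 1).bit_length()

def fb_bits(width, height, cidxw, buffers=1):
    """Bits needed to store a framebuffer of colour indices."""
    return width * height * cidxw * buffers

FB_PIXELS = 320 * 240
FB_DEPTH = 2 * FB_PIXELS  # framebuffer_db keeps both buffers in one BRAM

print(fb_bits(320, 240, 4), fb_bits(320, 240, 4, buffers=2))
print(FB_DEPTH, clog2(FB_PIXELS), clog2(FB_DEPTH))
```

Doubling takes us from 307,200 to 614,400 bits of memory and needs one extra address bit (18 rather than 17).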

Add the framebuffer module with double buffering - [framebuffer_db.sv]:

module framebuffer_db #(
    parameter CORDW=16,      // signed coordinate width (bits)
    parameter WIDTH=160,     // width of framebuffer in pixels
    parameter HEIGHT=120,    // height of framebuffer in pixels
    parameter CIDXW=4,       // colour index data width: 4=16, 8=256 colours
    parameter CHANW=4,       // width of RGB colour channels (4 or 8 bit)
    parameter SCALE=4,       // display output scaling factor (>=1)
    parameter F_IMAGE="",    // image file to load into framebuffer
    parameter F_PALETTE=""   // palette file to load into CLUT
    ) (
    input  wire logic clk_sys,    // system clock
    input  wire logic clk_pix,    // pixel clock
    input  wire logic rst_sys,    // reset (clk_sys)
    input  wire logic rst_pix,    // reset (clk_pix)
    input  wire logic de,         // data enable for display (clk_pix)
    input  wire logic frame,      // start a new frame (clk_pix)
    input  wire logic line,       // start a new screen line (clk_pix)
    input  wire logic we,         // write enable
    input  wire logic signed [CORDW-1:0] x,  // horizontal pixel coordinate
    input  wire logic signed [CORDW-1:0] y,  // vertical pixel coordinate
    input  wire logic [CIDXW-1:0] cidx,   // framebuffer colour index
    input  wire logic [CIDXW-1:0] bgidx,  // framebuffer background colour index
    input  wire logic clear,              // clear write buffer on frame start
    output      logic wready,             // ready for writing
    output      logic clip,               // pixel coordinate outside buffer
    output      logic [CHANW-1:0] red,    // colour output to display (clk_pix)
    output      logic [CHANW-1:0] green,  //     "    "    "    "    "
    output      logic [CHANW-1:0] blue    //     "    "    "    "    "
    );

    logic frame_sys;  // start of new frame in system clock domain
    xd xd_frame (.clk_i(clk_pix), .clk_o(clk_sys),
                 .rst_i(rst_pix), .rst_o(rst_sys), .i(frame), .o(frame_sys));

    // buffer selection
    logic front_buf;
    always @(posedge clk_sys) begin
        if (frame_sys) front_buf <= ~front_buf;  // swap every frame
        if (rst_sys) front_buf <= 0;
    end

    // framebuffer (FB)
    localparam FB_PIXELS = WIDTH * HEIGHT;
    localparam FB_DEPTH  = 2 * FB_PIXELS;  // double buffer
    localparam FB_ADDRW  = $clog2(FB_DEPTH);
    localparam FB_DATAW  = CIDXW;

    logic [FB_ADDRW-1:0] fb_addr_read, fb_addr_write;
    logic [FB_DATAW-1:0] fb_cidx_read, fb_cidx_read_p1;

    // write address components
    logic signed [CORDW-1:0] x_add;     // pixel position on line
    logic [FB_ADDRW-1:0] fb_addr_line;  // address of line for writing
    logic [FB_ADDRW-1:0] fb_addr_clr;   // address for clearing screen

    // write state machine
    enum {IDLE, INIT, CLR, ACTIVE} wstate;
    initial wstate = IDLE;  // needed for Yosys
    always @(posedge clk_sys) begin
        case (wstate)
            INIT: wstate <= (clear) ? CLR : ACTIVE;
            CLR: if (fb_addr_clr == FB_PIXELS-1) wstate <= ACTIVE;
            default: if (frame_sys) wstate <= INIT;  // IDLE or ACTIVE
        endcase
    end

    always_comb wready = (wstate == ACTIVE);

    // calculate write address from pixel coordinates (two stage: mul then add)
    // clearing uses the same two stages, so its address stays aligned with fb_we
    always_ff @(posedge clk_sys) begin
        fb_addr_line <= (wstate == CLR) ? fb_addr_clr : WIDTH * y;  // 1st stage
        x_add <= (wstate == CLR) ? 0 : x;  // save x for 2nd stage (0 when clearing)
        fb_addr_clr <= (wstate == INIT) ? 0 : fb_addr_clr + 1;
        fb_addr_write <= fb_addr_line + x_add;  // write address 2nd stage
    end

    // draw colour and write enable (delay to match address calculation)
    logic fb_we, we_in_p1;
    logic [FB_DATAW-1:0] fb_cidx_write, cidx_in_p1;
    always_ff @(posedge clk_sys) begin
        we_in_p1 <= (we || (wstate == CLR));  // write enable for input or clear
        cidx_in_p1 <= (wstate == CLR) ? bgidx : cidx;  // which draw colour?
        clip <= (wstate != CLR) && (y < 0 || y >= HEIGHT || x < 0 || x >= WIDTH);  // clipped? (never while clearing)
        fb_we <= (clip) ? 0 : we_in_p1;  // write enable if not clipped
        fb_cidx_write <= cidx_in_p1;
    end

    // add offset to read and write addresses to match buffer used
    logic [FB_ADDRW-1:0] fb_addr_read_offs, fb_addr_write_offs;
    always_comb begin  // this could be a performance bottleneck
        fb_addr_read_offs  = fb_addr_read  + ((front_buf) ? FB_PIXELS : 0);
        fb_addr_write_offs = fb_addr_write + ((front_buf) ? 0 : FB_PIXELS);
    end

    // framebuffer memory (BRAM)
    bram_sdp #(
        .WIDTH(FB_DATAW),
        .DEPTH(FB_DEPTH),
        .INIT_F(F_IMAGE)
    ) bram_inst (
        .clk_write(clk_sys),
        .clk_read(clk_sys),
        .we(fb_we),
        .addr_write(fb_addr_write_offs),
        .addr_read(fb_addr_read_offs),
        .data_in(fb_cidx_write),
        .data_out(fb_cidx_read)
    );

    // linebuffer (LB)
    localparam LB_SCALE = SCALE;  // scale (horizontal and vertical)
    localparam LB_LEN   = WIDTH;  // line length matches framebuffer
    localparam LB_BPC   = CHANW;  // bits per colour channel

    // Load data from FB into LB
    logic lb_data_req;  // LB requesting data
    logic [$clog2(LB_LEN+1)-1:0] cnt_h;  // count pixels in line to read
    always_ff @(posedge clk_sys) begin
        if (fb_addr_read < FB_PIXELS-1) begin
            if (lb_data_req) begin
                cnt_h <= 0;  // start new line
            end else if (cnt_h < LB_LEN) begin  // advance to start of next line
                cnt_h <= cnt_h + 1;
                fb_addr_read <= fb_addr_read + 1;
            end
        end else cnt_h <= LB_LEN;
        if (frame_sys) fb_addr_read <= 0;  // new frame
        if (rst_sys) begin
            fb_addr_read <= 0;
            cnt_h <= LB_LEN;  // don't start reading after reset
        end
    end

    // LB enable (not corrected for latency)
    logic lb_en_in, lb_en_out;
    always_comb lb_en_in  = cnt_h < LB_LEN;
    always_comb lb_en_out = de;

    // LB enable in: address calc and CLUT reg add three cycles of latency
    localparam LAT = 3;  // write latency
    logic [LAT-1:0] lb_en_in_sr;
    always @(posedge clk_sys) begin
        lb_en_in_sr <= {lb_en_in, lb_en_in_sr[LAT-1:1]};
        if (rst_sys) lb_en_in_sr <= 0;
    end

    // LB colour channels
    logic [LB_BPC-1:0] lb_in_0,  lb_in_1,  lb_in_2;
    logic [LB_BPC-1:0] lb_out_0, lb_out_1, lb_out_2;

    linebuffer #(
        .WIDTH(LB_BPC),   // data width of each channel
        .LEN(LB_LEN),     // length of line
        .SCALE(LB_SCALE)  // scaling factor (>=1)
        ) lb_inst (
        .clk_in(clk_sys),        // input clock
        .clk_out(clk_pix),       // output clock
        .rst_in(rst_sys),        // reset (clk_in)
        .rst_out(rst_pix),       // reset (clk_out)
        .data_req(lb_data_req),  // request input data (clk_in)
        .en_in(lb_en_in_sr[0]),  // enable input (clk_in)
        .en_out(lb_en_out),      // enable output (clk_out)
        .frame,                  // start a new frame (clk_out)
        .line,                   // start a new line (clk_out)
        .din_0(lb_in_0),         // data in (clk_in)
        .din_1(lb_in_1),
        .din_2(lb_in_2),
        .dout_0(lb_out_0),       // data out (clk_out)
        .dout_1(lb_out_1),
        .dout_2(lb_out_2)
    );

    // improve timing with register between BRAM and async ROM
    always @(posedge clk_sys) fb_cidx_read_p1 <= fb_cidx_read;

    // colour lookup table (ROM)
    localparam CLUTW = 3 * CHANW;
    logic [CLUTW-1:0] clut_colr;
    rom_async #(
        .WIDTH(CLUTW),
        .DEPTH(2**CIDXW),
        .INIT_F(F_PALETTE)
    ) clut (
        .addr(fb_cidx_read_p1),
        .data(clut_colr)
    );

    // map colour index to palette using CLUT and read into LB
    always_ff @(posedge clk_sys) {lb_in_2, lb_in_1, lb_in_0} <= clut_colr;

    logic lb_en_out_p1;  // LB enable out: reading from LB BRAM takes one cycle
    always_ff @(posedge clk_pix) lb_en_out_p1 <= lb_en_out;

    // colour output - combinational because top module should register
    always_comb begin
        red   = lb_en_out_p1 ? lb_out_2 : 0;
        green = lb_en_out_p1 ? lb_out_1 : 0;
        blue  = lb_en_out_p1 ? lb_out_0 : 0;
    end
endmodule

This double-buffered design is only 30 lines longer than the original framebuffer module.

There’s a test bench you can use to exercise the module with Vivado: [xc7/framebuffer_db_tb.sv].

Details of the changes will be added shortly.
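The heart of the double buffering is the address offset: read and write addresses always fall in opposite halves of one BRAM, and front_buf flips on every frame_sys. This Python model (write_addr and read_addr are illustrative names, not module signals) follows one pixel from being drawn to being displayed.

```python
WIDTH, HEIGHT = 320, 240
FB_PIXELS = WIDTH * HEIGHT

def write_addr(x, y, front_buf):
    # fb_addr_write_offs: drawing goes to the half the display isn't reading
    return y * WIDTH + x + (0 if front_buf else FB_PIXELS)

def read_addr(addr, front_buf):
    # fb_addr_read_offs: the display reads the front half
    return addr + (FB_PIXELS if front_buf else 0)

# Frame n (front_buf = 0): draw pixel (10, 20) into the back buffer.
w = write_addr(10, 20, front_buf=0)
# Frame n+1: frame_sys flips front_buf, so the display now reads that pixel.
r = read_addr(20 * WIDTH + 10, front_buf=1)
print(w, r)
```

The pixel drawn during one frame is read by the display in the next, which is where the extra frame of latency comes from.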

Hip to be Square Redux

We can cleanly animate a square using our new double-buffered framebuffer:

That seems like a lot of work to replicate what we did in a few lines back at the start of the series, but drawing shapes in a framebuffer is far more versatile.

Tunnel Vision

Using our new framebuffer and a few animated rectangles, we can create the illusion of flying down a tunnel:

  • Arty (XC7): [top_tunnel.sv]
  • iCEBreaker (iCE40): not yet available
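The animation relies on a neat loop: each square grows outward until it reaches where the next-larger square started. This Python check uses the step-0 top-left corners and per-step speeds of shapes 0 to 3 from the top_tunnel listing, assuming a 60 Hz display; top_left is a hypothetical helper.

```python
ANIM_CNT, ANIM_SPEED = 5, 5
FRAME_RATE = 60  # assumed display refresh rate (Hz)

# (x0 at step 0, y0 at step 0, pixels moved per step) for shapes 0-3
SHAPES = [(40, 0, 12), (80, 40, 8), (105, 65, 5), (125, 85, 4)]

def top_left(shape_id, step):
    x0, y0, speed = SHAPES[shape_id]
    return (x0 - step * speed, y0 - step * speed)

steps_per_second = FRAME_RATE / ANIM_SPEED      # animation steps per second
frames_per_colour_step = ANIM_CNT * ANIM_SPEED  # frames between colr_offs updates

# One step past the end of the cycle, each square reaches the position of the
# next-larger square at step 0; that's where the animation wraps.
for shape_id in range(1, len(SHAPES)):
    assert top_left(shape_id, ANIM_CNT) == top_left(shape_id - 1, 0)

print(steps_per_second, frames_per_colour_step)
```

Because colr_offs advances at the same moment cnt_anim wraps to 0, each square takes on the colour of the next-smaller square, which had just grown into its place, so the colours appear to flow outward without a jump.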

The Arty version of the tunnel looks like this:

module top_tunnel (
    input  wire logic clk_100m,     // 100 MHz clock
    input  wire logic btn_rst,      // reset button (active low)
    output      logic vga_hsync,    // horizontal sync
    output      logic vga_vsync,    // vertical sync
    output      logic [3:0] vga_r,  // 4-bit VGA red
    output      logic [3:0] vga_g,  // 4-bit VGA green
    output      logic [3:0] vga_b   // 4-bit VGA blue
    );

    // generate pixel clock
    logic clk_pix;
    logic clk_locked;
    clock_gen_480p clock_pix_inst (
       .clk(clk_100m),
       .rst(!btn_rst),  // reset button is active low
       .clk_pix,
       .clk_locked
    );

    // display timings
    localparam CORDW = 16;
    logic hsync, vsync;
    logic de, frame, line;
    display_timings_480p #(.CORDW(CORDW)) display_timings_inst (
        .clk_pix,
        .rst(!clk_locked),  // wait for pixel clock lock
        /* verilator lint_off PINCONNECTEMPTY */
        .sx(),
        .sy(),
        /* verilator lint_on PINCONNECTEMPTY */
        .hsync,
        .vsync,
        .de,
        .frame,
        .line
    );

    logic frame_sys;  // start of new frame in system clock domain
    xd xd_frame (.clk_i(clk_pix), .clk_o(clk_100m),
                 .rst_i(1'b0), .rst_o(1'b0), .i(frame), .o(frame_sys));

    // framebuffer (FB)
    localparam FB_WIDTH   = 320;
    localparam FB_HEIGHT  = 240;
    localparam FB_CIDXW   = 4;
    localparam FB_CHANW   = 4;
    localparam FB_SCALE   = 2;
    localparam FB_IMAGE   = "";
    localparam FB_PALETTE = "tunnel_16_colr_4bit_palette.mem";

    logic fb_we, fb_wready;
    logic signed [CORDW-1:0] fbx, fby;  // framebuffer coordinates
    logic [FB_CIDXW-1:0] fb_cidx;
    logic [FB_CHANW-1:0] fb_red, fb_green, fb_blue;  // colours for display

    framebuffer_db #(
        .WIDTH(FB_WIDTH),
        .HEIGHT(FB_HEIGHT),
        .CIDXW(FB_CIDXW),
        .CHANW(FB_CHANW),
        .SCALE(FB_SCALE),
        .F_IMAGE(FB_IMAGE),
        .F_PALETTE(FB_PALETTE)
    ) fb_inst (
        .clk_sys(clk_100m),
        .clk_pix(clk_pix),
        .rst_sys(1'b0),
        .rst_pix(1'b0),
        .de,
        .frame,
        .line,
        .we(fb_we),
        .x(fbx),
        .y(fby),
        .cidx(fb_cidx),
        .bgidx(0),
        .clear(0),  // tunnel doesn't need clearing
        .wready(fb_wready),
        /* verilator lint_off PINCONNECTEMPTY */
        .clip(),
        /* verilator lint_on PINCONNECTEMPTY */
        .red(fb_red),
        .green(fb_green),
        .blue(fb_blue)
    );

    // animation steps
    localparam ANIM_CNT=5;    // five different frames in animation
    localparam ANIM_SPEED=5;  // display each animation step five times (12 FPS)
    logic [$clog2(ANIM_CNT)-1:0] cnt_anim;
    logic [$clog2(ANIM_SPEED)-1:0] cnt_anim_speed;
    logic [FB_CIDXW-1:0] colr_offs;  // colour offset
    always @(posedge clk_100m) begin
        if (frame_sys) begin
            if (cnt_anim_speed == ANIM_SPEED-1) begin
                if (cnt_anim == ANIM_CNT-1) begin
                    cnt_anim <= 0;
                    colr_offs <= colr_offs + 1;
                end else cnt_anim <= cnt_anim + 1;
                cnt_anim_speed <= 0;
            end else cnt_anim_speed <= cnt_anim_speed + 1;
        end
    end

    // draw squares in framebuffer
    localparam SHAPE_CNT=7;  // number of shapes to draw
    logic [3:0] shape_id;  // shape identifier
    logic [CORDW-1:0] dx0, dy0, dx1, dy1;  // shape coords
    logic draw_start, drawing, draw_done;  // drawing signals

    // draw state machine
    enum {IDLE, INIT, CLEAR, DRAW, DONE} state;
    initial state = IDLE;  // needed for Yosys
    always @(posedge clk_100m) begin
        draw_start <= 0;
        case (state)
            INIT: begin  // register coordinates and colour
                if (fb_wready) begin
                    draw_start <= 1;
                    state <= DRAW;
                    case (shape_id)
                        4'd0: begin  // 12 pixels per anim step
                            dx0 <=  40 - cnt_anim * 12;
                            dy0 <=   0 - cnt_anim * 12;
                            dx1 <= 279 + cnt_anim * 12;
                            dy1 <= 239 + cnt_anim * 12;  // 240x240 square, centred like the others
                            fb_cidx <= colr_offs;
                        end
                        4'd1: begin  // 8 pixels per anim step
                            dx0 <=  80 - cnt_anim * 8;
                            dy0 <=  40 - cnt_anim * 8;
                            dx1 <= 239 + cnt_anim * 8;
                            dy1 <= 199 + cnt_anim * 8;
                            fb_cidx <= colr_offs + 1;
                        end
                        4'd2: begin  // 5 pixels per anim step
                            dx0 <= 105 - cnt_anim * 5;
                            dy0 <=  65 - cnt_anim * 5;
                            dx1 <= 214 + cnt_anim * 5;
                            dy1 <= 174 + cnt_anim * 5;
                            fb_cidx <= colr_offs + 2;
                        end
                        4'd3: begin  // 4 pixels per anim step
                            dx0 <= 125 - cnt_anim * 4;
                            dy0 <=  85 - cnt_anim * 4;
                            dx1 <= 194 + cnt_anim * 4;
                            dy1 <= 154 + cnt_anim * 4;
                            fb_cidx <= colr_offs + 3;
                        end
                        4'd4: begin  // 3 pixels per anim step
                            dx0 <= 140 - cnt_anim * 3;
                            dy0 <= 100 - cnt_anim * 3;
                            dx1 <= 179 + cnt_anim * 3;
                            dy1 <= 139 + cnt_anim * 3;
                            fb_cidx <= colr_offs + 4;
                        end
                        4'd5: begin  // 2 pixels per anim step
                            dx0 <= 150 - cnt_anim * 2;
                            dy0 <= 110 - cnt_anim * 2;
                            dx1 <= 169 + cnt_anim * 2;
                            dy1 <= 129 + cnt_anim * 2;
                            fb_cidx <= colr_offs + 5;
                        end
                        4'd6: begin  // 1 pixel per anim step
                            dx0 <= 155 - cnt_anim * 1;
                            dy0 <= 115 - cnt_anim * 1;
                            dx1 <= 164 + cnt_anim * 1;
                            dy1 <= 124 + cnt_anim * 1;
                            fb_cidx <= colr_offs + 6;
                        end
                        default: begin  // should never occur
                            dx0 <=  10; dy0 <=  10;
                            dx1 <=  20; dy1 <=  20;
                            fb_cidx <= 4'h7;  // white
                        end
                    endcase
                end
            end
            DRAW: if (draw_done) begin
                if (shape_id == SHAPE_CNT-1) begin
                    state <= DONE;
                end else begin
                    shape_id <= shape_id + 1;
                    state <= INIT;
                end
            end
            DONE: state <= IDLE;
            default: if (frame_sys) begin  // IDLE
                state <= INIT;
                shape_id <= 0;
            end
        endcase
    end

    draw_rectangle_fill #(.CORDW(CORDW)) draw_rectangle_inst (
        .clk(clk_100m),
        .rst(1'b0),
        .start(draw_start),
        .oe(1'b1),
        .x0(dx0),
        .y0(dy0),
        .x1(dx1),
        .y1(dy1),
        .x(fbx),
        .y(fby),
        .drawing,
        .done(draw_done)
    );

    // write to framebuffer when drawing
    always_comb fb_we = drawing;

    // reading from FB takes one cycle: delay display signals to match
    logic hsync_p1, vsync_p1;
    always_ff @(posedge clk_pix) begin
        hsync_p1 <= hsync;
        vsync_p1 <= vsync;
    end

    // VGA output
    always_ff @(posedge clk_pix) begin
        vga_hsync <= hsync_p1;
        vga_vsync <= vsync_p1;
        vga_r <= fb_red;
        vga_g <= fb_green;
        vga_b <= fb_blue;
    end
endmodule
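You can check the animation counters in simulation without the video timing or framebuffer. The testbench below is a minimal sketch under stated assumptions. The module name `tunnel_anim_tb` and the `pulse_frame` task are hypothetical. The testbench copies the counter logic from the design above rather than instantiating it, and it pulses `frame_sys` by hand once per simulated frame. Assuming the 60 Hz frame rate implied by the 12 FPS comment, the colour offset should advance every `ANIM_CNT * ANIM_SPEED` = 25 frames, about 2.4 times a second.

```systemverilog
// Hypothetical self-checking testbench: replicates the tunnel's animation
// counters and drives frame_sys by hand instead of using real video timing.
module tunnel_anim_tb;
    localparam ANIM_CNT   = 5;  // five different frames in animation
    localparam ANIM_SPEED = 5;  // each animation step lasts five frames

    logic clk = 0;
    logic frame_sys = 0;
    logic [$clog2(ANIM_CNT)-1:0]   cnt_anim = 0;
    logic [$clog2(ANIM_SPEED)-1:0] cnt_anim_speed = 0;
    logic [3:0] colr_offs = 0;  // colour offset (4 bits, like FB_CIDXW)

    always #5 clk = ~clk;  // 10 ns period, i.e. 100 MHz

    // counter logic copied from the tunnel design
    always @(posedge clk) begin
        if (frame_sys) begin
            if (cnt_anim_speed == ANIM_SPEED-1) begin
                if (cnt_anim == ANIM_CNT-1) begin
                    cnt_anim <= 0;
                    colr_offs <= colr_offs + 1;
                end else cnt_anim <= cnt_anim + 1;
                cnt_anim_speed <= 0;
            end else cnt_anim_speed <= cnt_anim_speed + 1;
        end
    end

    // hold frame_sys high for exactly one rising clock edge (one simulated frame)
    task automatic pulse_frame();
        @(negedge clk) frame_sys = 1;
        @(negedge clk) frame_sys = 0;
    endtask

    initial begin
        repeat (ANIM_CNT*ANIM_SPEED - 1) pulse_frame();  // frames 1 to 24
        assert (colr_offs == 0) else $fatal(1, "colour changed too early");
        pulse_frame();  // frame 25
        assert (colr_offs == 1) else $fatal(1, "colour should change on frame 25");
        repeat (75) pulse_frame();  // 100 frames in total
        assert (colr_offs == 4) else $fatal(1, "expected four colour changes");
        $display("colr_offs advances every %0d frames", ANIM_CNT*ANIM_SPEED);
        $finish;
    end
endmodule
```

Copying the counter logic keeps the test small and fast. The trade-off is that the copy can drift from the real design, so if you change the counters in the top module, update the testbench too.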

Explore

I hope you enjoyed this instalment of Exploring FPGA Graphics, but nothing beats creating your own designs. Here are a few suggestions to get you started:

Suggestions will be added soon.

Next Time

Next time we move into the third dimension with wireframe models, including the Blender monkey and Utah teapot (draft coming soon).

Constructive feedback is always welcome. Get in touch with @WillFlux or open an issue on GitHub.

©2021 Will Green, Project F