Project F

Lines and Triangles

Published · Updated

Welcome back to Exploring FPGA Graphics. It’s time to turn our attention to drawing. Most modern computer graphics come down to drawing triangles and colouring them in. So, it seems fitting to begin our drawing tour with triangles and the straight lines that form them. This post will implement Bresenham’s line algorithm in Verilog and create lines, triangles, and even a cube (our first sort-of 3D).

In this series, we learn about graphics at the hardware level and get a feel for the power of FPGAs. We’ll learn how screens work, play Pong, create starfields and sprites, paint Michelangelo’s David, draw lines and triangles, and animate characters and shapes. New to the series? Start with Beginning FPGA Graphics.

Share your thoughts with @WillFlux on Mastodon or Twitter. If you like what I do, sponsor me. 🙏

Series Outline

Requirements

You should be to run these designs on any recent FPGA board. I include everything you need for the iCEBreaker with 12-Bit DVI Pmod, Digilent Arty A7-35T with Pmod VGA, Digilent Nexys Video with on-board HDMI output, and Verilator Simulation with SDL. See requirements from Beginning FPGA Graphics for more details.

This post builds on Framebuffers; I recommend reading that post first if you haven’t already.

New Screen Resolution

Choosing your graphics resolution and colour depth are critical decisions in any graphics design. We want a high resolution for this post, but we want to continue using on-FPGA ram for simplicity. Adopting a 16:9 aspect ratio makes scaling on modern displays simpler while keeping memory usage manageable (320x180 < 216).

320 x 180 in 16 colours

  • 320 x 180 has 57,600 pixels
  • 16 colours require four bits per pixel
  • 256 kb (32 kilobytes) buffer required
  • Scale up 2x for 640x480 with letterbox
  • Scale up 4x for 1280x720 fullscreen

iCEBreaker: 160x90 in 4 colours

  • 160 x 90 has 14,400 pixels
  • 4 colours require two bits per pixel
  • 32 Kb (4 kilobytes) buffer required
  • Scale up 4x for 640x480 with letterbox
  • Scale up 8x for 1280x720 fullscreen

We use a lower resolution on iCEBreaker to continue storing the framebuffer in BRAM. iCE40UP SPRAM is large enough to handle 320x180 but I haven’t added support yet [issue #133]. If you’re already familiar with SPRAM, feel free to use the 320x180 designs with iCEBreaker: the new render modules (discussed later) make this straightforward.

An Address for Every Pixel

A framebuffer is two-dimensional, with an X and Y coordinate for every pixel. Memory addresses are one-dimensional. We need to map a framebuffer coordinate to a memory address [bitmap_addr.sv]:

module bitmap_addr #(
    parameter CORDW=16,  // signed coordinate width (bits)
    parameter ADDRW=24   // address width (bits)
    ) (
    input  wire logic clk,                      // clock
    input  wire logic signed [CORDW-1:0] bmpw,  // bitmap width
    input  wire logic signed [CORDW-1:0] bmph,  // bitmap height
    input  wire logic signed [CORDW-1:0] x,     // horizontal pixel coordinate
    input  wire logic signed [CORDW-1:0] y,     // vertical pixel coordinate
    input  wire logic signed [CORDW-1:0] offx,  // horizontal offset
    input  wire logic signed [CORDW-1:0] offy,  // vertical offset
    output      logic [ADDRW-1:0] addr,         // pixel memory address
    output      logic clip                      // pixel coordinate outside bitmap
    );

    logic signed [CORDW-1:0] addr_y1, addr_x1, addr_x2;
    logic [ADDRW-1:0] addr_mul;
    logic clip_t1;  // clip check temporary

    always_ff @(posedge clk) begin
        // step 1
        addr_y1 <= y + offy;
        addr_x1 <= x + offx;

        // step 2
        addr_mul <= bmpw * addr_y1;
        addr_x2  <= addr_x1;
        clip_t1  <= (addr_x1 < 0 || addr_x1 > bmpw-1 || addr_y1 < 0 || addr_y1 > bmph-1);

        // step 3
        clip <= clip_t1;
        addr <= addr_mul + addr_x2;
    end
endmodule

This module calculates the memory address for the position (x,y) with an optional offset (offx,offy). The offset lets you quickly move graphics without adjusting your drawing logic. The clip output signals when the requested coordinate is outside the bitmap.

The address calculation has a latency of three cycles; we will need to account for this when drawing.

Screen Coordinates: Up and Down
Our coordinate system has the origin (0,0) at the top-left of the screen, and the Y-coordinate increases down the screen. Many 3D systems, such as OpenGL, have the origin at the bottom-left, and the Y-coordinate increases up the screen.

Addressing Performance

We could have performed the calculation in one step:

addr <= bmpw * (y + offy) + x + offx;

But performance is much worse than the multi-step design. The address calculation becomes the performance bottleneck for our drawing designs.

This is a valuable reminder of how hardware design differs from software design.

The one-step calculation generates a hardware design that performs two additions and multiplication in a single clock cycle. Other things being equal, the more complex a single step, the lower the maximum frequency.

With software, the compiler generates a series of CPU instructions from your source code. Whether you write the calculation as one complex statement or multiple simpler statements has little to no impact on the code generated by the compiler.

From Point to Line

We can now draw a point, but we commonly want to draw a line between two points. Bresenham’s line algorithm is the definitive way to do this, and The Beauty of Bresenham’s Algorithm has just what we need: a clearly written version of the algorithm using integers.

Here’s the C design:

void plotLine(int x0, int y0, int x1, int y1)
{
   int dx =  abs(x1-x0), sx = x0<x1 ? 1 : -1;
   int dy = -abs(y1-y0), sy = y0<y1 ? 1 : -1;
   int err = dx+dy, e2; /* error value e_xy */

   for(;;){  /* loop */
      setPixel(x0,y0);
      if (x0==x1 && y0==y1) break;
      e2 = 2*err;
      if (e2 >= dy) { err += dy; x0 += sx; } /* e_xy+e_x > 0 */
      if (e2 <= dx) { err += dx; y0 += sy; } /* e_xy+e_y < 0 */
   }
}

For the hows and whys, read A Rasterizing Algorithm for Drawing Curves (PDF). Kudos to Alois Zingl.

From C to Verilog

There are two stages: calculating the initial values and running the algorithm in the loop.

As initial values, we need the difference between the start and end coordinates and the sign and absolute value of that difference. We can use a little combinational logic to determine the direction without having to calculate absolute values explicitly:

logic signed [CORDW:0] dx, dy;  // a bit wider as signed
logic right, down;  // drawing direction
always_comb begin
    right = (x0 < x1);
    down  = (y0 < y1);
    dx = right ? x1 - x0 : x0 - x1;  // dx =  abs(x1 - x0)
    dy = down  ? y0 - y1 : y1 - y0;  // dy = -abs(y1 - y0)
end

NB. The sign of dy is different from dx; check the C version of the algorithm to see what I mean.

Going Loopy

We can quickly bash out an always_ff block to cover the loop. But this isn’t software; a trap lurks to catch the unwary.

Rewriting the C directly in Verilog, we get the following dubious logic:

always_ff @(posedge clk) begin
    // ...

    if (e2 >= dy) begin
        x <= (right) ? x + 1 : x - 1;
        err <= err + dy;
    end
    if (e2 <= dx) begin
        y <= (down) ? y + 1 : y - 1;
        err <= err + dx;
    end
end

At first glance, it looks OK, and your tools will almost certainly build it without complaint. Experienced Verilog engineers are probably rolling their eyes, but it’s worth discussing when and why this fails.

Consider what happens if (e2 >= dy) and (e2 <= dx) are both true?

x and y are incremented correctly, but err <= err + dy; is ignored. Huh?!

The <= assignment is non-blocking, and non-blocking assignments happen in parallel. The Verilog standard says that if a variable has multiple non-blocking assignments, the last assignment wins.

We can’t calculate the error with just a combinatorial block either: the new error value depends on the previous one (we need to maintain state). Instead, we use a combinational block, with blocking assignment, to calculate the change in error, then add it to the previous value in a clocked always_ff block:

logic signed [CORDW:0] err, derr;
logic movx, movy;  // move in x and/or y required
always_comb begin
    movx = (2*err >= dy);
    movy = (2*err <= dx);
    derr = movx ? dy : 0;
    if (movy) derr = derr + dx;
end

always_ff @(posedge clk) begin
    // ...

    if (movx) x <= right ? x + 1 : x - 1;
    if (movy) y <= down  ? y + 1 : y - 1;
    err <= err + derr;
end

The two blocking assignments to derr happen one after the other.

Note how we’ve also eliminated the need for e2, replacing it with 2*err in our comparisons.

Our first attempt at a line drawing module:

module draw_line #(parameter CORDW=16) (  // framebuffer coord width in bits
    input  wire logic clk,             // clock
    input  wire logic rst,             // reset
    input  wire logic start,           // start line drawing
    input  wire logic signed [CORDW-1:0] x0,  // point 0 - horizontal position
    input  wire logic signed [CORDW-1:0] y0,  // point 0 - vertical position
    input  wire logic signed [CORDW-1:0] x1,  // point 1 - horizontal position
    input  wire logic signed [CORDW-1:0] y1,  // point 1 - vertical position
    output      logic signed [CORDW-1:0] x,   // horizontal drawing position
    output      logic signed [CORDW-1:0] y,   // vertical drawing position
    output      logic drawing,         // line is drawing
    output      logic done             // line complete (high for one tick)
    );

    // line properties
    logic signed [CORDW:0] dx, dy;  // a bit wider as signed
    logic right, down;  // drawing direction
    always_comb begin
        right = (x0 < x1);
        down  = (y0 < y1);
        dx = right ? x1 - x0 : x0 - x1;  // dx =  abs(x1 - x0)
        dy = down  ? y0 - y1 : y1 - y0;  // dy = -abs(y1 - y0)
    end

    // error values
    logic signed [CORDW:0] err, derr;
    logic movx, movy;  // move in x and/or y required
    always_comb begin
        movx = (2*err >= dy);
        movy = (2*err <= dx);
        derr = movx ? dy : 0;
        if (movy) derr = derr + dx;
    end

    // draw state machine
    enum {IDLE, DRAW} state;  // we're either idle or drawing
    always_comb drawing = (state == DRAW);

    always_ff @(posedge clk) begin
        case (state)
            DRAW: begin
                if (x == x1 && y == y1) begin
                    done <= 1;
                    state <= IDLE;
                end else begin
                    if (movx) x <= right ? x + 1 : x - 1;
                    if (movy) y <= down  ? y + 1 : y - 1;
                    err <= err + derr;
                end
            end
            default: begin  // IDLE
                done <= 0;
                if (start) begin
                    err <= dx + dy;
                    x <= x0;
                    y <= y0;
                    state <= DRAW;
                end
            end
        endcase

        if (rst) begin
            done <= 0;
            state <= IDLE;
        end
    end
endmodule

We’ve got a good start here, but our module has significant problems we should tackle.

Oh dear! I shall be too late!

Line drawing crops up all over the place; if it’s slow, it’ll be a significant performance bottleneck.

Our current line drawing design makes direct use of relatively complex combinational logic. For example, we use movy to control our vertical drawing. movy depends on dx, which depends on right. All these signals are purely combinational, with nothing stored in registers. Unsurprisingly, my tests showed this path was the limiting factor for line drawing speed.

Our first improvement is straightforward: we register dx and dy in an always_ff block. Even better, because dx and dy don’t change for a given line, we only have to do this once and don’t suffer a latency penalty:

always_comb begin
    right = (x0 < x1);
    down  = (y0 < y1);
end

always_ff @(posedge clk) begin
    dx <= right ? x1 - x0 : x0 - x1;  // dx =  abs(x1 - x0)
    dy <= down  ? y0 - y1 : y1 - y0;  // dy = -abs(y1 - y0)
end

We can further improve timing by removing the combinational derr and using dx and dy directly in the main always_ff block:

    DRAW: begin
        if (x == xb && y == yb) begin
            done <= 1;
            state <= IDLE;
        end else begin
            if (movx) begin
                x <= right ? x + 1 : x - 1;
                err <= err + dy;
            end
            if (movy) begin
                y <= y + 1;  // always down
                err <= err + dx;
            end
            if (movx && movy) begin
                x <= right ? x + 1 : x - 1;
                y <= y + 1;  // always down
                err <= err + dy + dx;
            end
        end
    end

This Verilog seems overly verbose compared to the combinational derr, but the timing is much better on simpler FPGAs, such as the iCE40. For example, the cube design we will discuss shortly improves from 22 MHz to 28 MHz with these changes.

With experience, you’ll get a feel for when registering a signal makes sense. For example, back in 2020, I learnt that iCE40 subtraction takes two layers of logic, making registering the initial line values all the more valuable. Both Vivado (Arty) and nextpnr (iCEBreaker) provide timing reports to help you improve the performance of your designs.

Breaking Symmetry

Bresenham’s line algorithm is not symmetrical: drawing from (x0,y0) to (x1,y1) is not necessarily the same as drawing from (x1,y1) to (x0,y0).

For example, I drew the triangle (2,2) (6,2) (4,6) clockwise then anticlockwise:

Two triangles that should be identical but aren’t

Variations in rendering may not matter if you’re drawing a single shape, but what happens if we draw two shapes next to each other? We don’t want any gaps between the shapes. To ensure one unique rendering of the line (x0,y0) to (x1,y1), we need a consistent way to order the points. I have chosen to draw down the screen; that is, with the y-coordinate increasing. To achieve this, we look at the y-coordinates and swap them if y0 is greater than y1.

That leaves horizontal lines: the y-coordinate is the same for both points in this case. However, it does not matter which direction we draw horizontal lines: Bresenham’s line algorithm is the same in both directions.

The swapping logic looks like this:

    // line properties
    logic swap;   // swap points to ensure y1 >= y0
    logic right;  // drawing direction
    logic signed [CORDW-1:0] xa, ya;  // start point
    logic signed [CORDW-1:0] xb, yb;  // end point
    logic signed [CORDW-1:0] x_end, y_end;  // register end point
    always_comb begin
        swap = (y0 > y1);  // swap points if y0 is below y1
        xa = swap ? x1 : x0;
        xb = swap ? x0 : x1;
        ya = swap ? y1 : y0;
        yb = swap ? y0 : y1;
    end

If we use these new combinational signals directly, our timing will suffer. To avoid this, we can register the end coordinate and drawing direction:

always_ff @(posedge clk) begin
    // ...
    x_end <= xb;
    y_end <= yb;

    // ...
    right <= (xa < xb);  // draw right to left?

Ready to Draw

We’re now ready to use our improved line drawing module [draw_line.sv]:

module draw_line #(parameter CORDW=16) (  // signed coordinate width
    input  wire logic clk,             // clock
    input  wire logic rst,             // reset
    input  wire logic start,           // start line drawing
    input  wire logic oe,              // output enable
    input  wire logic signed [CORDW-1:0] x0, y0,  // point 0
    input  wire logic signed [CORDW-1:0] x1, y1,  // point 1
    output      logic signed [CORDW-1:0] x,  y,   // drawing position
    output      logic drawing,         // actively drawing
    output      logic busy,            // drawing request in progress
    output      logic done             // drawing is complete (high for one tick)
    );

    // line properties
    logic swap;   // swap points to ensure y1 >= y0
    logic right;  // drawing direction
    logic signed [CORDW-1:0] xa, ya;  // start point
    logic signed [CORDW-1:0] xb, yb;  // end point
    logic signed [CORDW-1:0] x_end, y_end;  // register end point
    always_comb begin
        swap = (y0 > y1);  // swap points if y0 is below y1
        xa = swap ? x1 : x0;
        xb = swap ? x0 : x1;
        ya = swap ? y1 : y0;
        yb = swap ? y0 : y1;
    end

    // error values
    logic signed [CORDW:0] err;  // a bit wider as signed
    logic signed [CORDW:0] dx, dy;
    logic movx, movy;  // horizontal/vertical move required
    always_comb begin
        movx = (2*err >= dy);
        movy = (2*err <= dx);
    end

    // draw state machine
    enum {IDLE, INIT_0, INIT_1, DRAW} state;
    always_comb drawing = (state == DRAW && oe);

    always_ff @(posedge clk) begin
        case (state)
            DRAW: begin
                if (oe) begin
                    if (x == x_end && y == y_end) begin
                        state <= IDLE;
                        busy <= 0;
                        done <= 1;
                    end else begin
                        if (movx) begin
                            x <= right ? x + 1 : x - 1;
                            err <= err + dy;
                        end
                        if (movy) begin
                            y <= y + 1;  // always down
                            err <= err + dx;
                        end
                        if (movx && movy) begin
                            x <= right ? x + 1 : x - 1;
                            y <= y + 1;
                            err <= err + dy + dx;
                        end
                    end
                end
            end
            INIT_0: begin
                state <= INIT_1;
                dx <= right ? xb - xa : xa - xb;  // dx = abs(xb - xa)
                dy <= ya - yb;  // dy = -abs(yb - ya)
            end
            INIT_1: begin
                state <= DRAW;
                err <= dx + dy;
                x <= xa;
                y <= ya;
                x_end <= xb;
                y_end <= yb;
            end
            default: begin  // IDLE
                done <= 0;
                if (start) begin
                    state <= INIT_0;
                    right <= (xa < xb);  // draw right to left?
                    busy <= 1;
                end
            end
        endcase

        if (rst) begin
            state <= IDLE;
            busy <= 0;
            done <= 0;
        end
    end
endmodule

The pixel to draw is output as (x,y), and the line coordinates are input as (x0,y0) and (x1,y1). A high start signal begins drawing, and drawing completion is marked by done (high for one tick). An output enable signal, oe, allows you to pause drawing, which is handy for multiplexing memory access or slowing down the action to make it visible.

There’s a test bench you can use to exercise the module with Vivado: [xc7/draw_line_tb.sv].

We test several lines: steep and not steep, drawn upwards, downwards, left to right, and right to left, points, and the longest possible horizontal, vertical, and diagonal lines. A steep line is one in which the vertical change is larger than the horizontal.

Render Module

In previous posts, we created a separate top module for every demo. Now that our designs are getting a little more complex, breaking each demo rendering into a module makes sense. That way, we can reuse the same top module for multiple demos.

To support a wide range of FPGAs, I’ve created two sets of rendering modules:

  • 320x180 in 16 colours (256 kb per buffer) - used for Arty and Verilator
  • 160x90 in 4 colours (32 kb per buffer) - used for iCEBreaker

Our first rendering module draws a single diagonal line:

The 16-colour 320x180 version is shown below:

module render_line #(
    parameter CORDW=16,  // signed coordinate width (bits)
    parameter CIDXW=4,   // colour index width (bits)
    parameter SCALE=1    // drawing scale: 1=320x180, 2=640x360, 4=1280x720
    ) (
    input  wire logic clk,    // clock
    input  wire logic rst,    // reset
    input  wire logic oe,     // output enable
    input  wire logic start,  // start drawing
    output      logic signed [CORDW-1:0] x,  // horizontal draw position
    output      logic signed [CORDW-1:0] y,  // vertical draw position
    output      logic [CIDXW-1:0] cidx,  // pixel colour
    output      logic drawing,  // actively drawing
    output      logic done      // drawing is complete (high for one tick)
    );

    logic signed [CORDW-1:0] vx0, vy0, vx1, vy1;  // line coords
    logic draw_start, draw_done;  // drawing signals

    // draw state machine
    enum {IDLE, INIT, DRAW, DONE} state;
    always_ff @(posedge clk) begin
        case (state)
            INIT: begin  // register coordinates and colour
                vx0 <=  70; vy0 <=   0;
                vx1 <= 249; vy1 <= 179;
                cidx <= 'h3;  // colour index
                draw_start <= 1;
                state <= DRAW;
            end
            DRAW: begin
                draw_start <= 0;
                if (draw_done) state <= DONE;
            end
            DONE: state <= DONE;
            default: if (start) state <= INIT;  // IDLE
        endcase
        if (rst) state <= IDLE;
    end

    draw_line #(.CORDW(CORDW)) draw_line_inst (
        .clk,
        .rst,
        .start(draw_start),
        .oe,
        .x0(vx0 * SCALE),
        .y0(vy0 * SCALE),
        .x1(vx1 * SCALE),
        .y1(vy1 * SCALE),
        .x,
        .y,
        .drawing,
        .busy(),
        .done(draw_done)
    );

    always_comb done = (state == DONE);
endmodule

The render module takes a SCALE parameter with which you can scale up the drawing. For example, if you have a 640x360 framebuffer, set SCALE=2 to fill the screen with the 320x180/render_line.sv design.

Demo Driver

It’s time to get drawing with actual hardware.

Plugging our rendering module into the top design from framebuffers, we get our line demo:

Building the Designs
In the Lines and Triangles section of the git repo, you’ll find the design files, a makefile for iCEBreaker and Verilator, and a Vivado project for Xilinx-based boards. There are also build instructions for boards and simulations.

The Verilator version of top_demo looks like this:

module top_demo #(parameter CORDW=16) (  // signed coordinate width (bits)
    input  wire logic clk_pix,      // pixel clock
    input  wire logic rst_pix,      // sim reset
    output      logic signed [CORDW-1:0] sdl_sx,  // horizontal SDL position
    output      logic signed [CORDW-1:0] sdl_sy,  // vertical SDL position
    output      logic sdl_de,       // data enable (low in blanking interval)
    output      logic sdl_frame,    // high at start of frame
    output      logic [7:0] sdl_r,  // 8-bit red
    output      logic [7:0] sdl_g,  // 8-bit green
    output      logic [7:0] sdl_b   // 8-bit blue
    );

    // system clock is the same as pixel clock in simulation
    logic clk_sys, rst_sys;
    always_comb begin
        clk_sys = clk_pix;
        rst_sys = rst_pix;
    end

    // display sync signals and coordinates
    logic signed [CORDW-1:0] sx, sy;
    logic de, frame, line;
    display_480p #(.CORDW(CORDW)) display_inst (
        .clk_pix,
        .rst_pix,
        .sx,
        .sy,
        .hsync(),
        .vsync(),
        .de,
        .frame,
        .line
    );

    // library resource path
    localparam LIB_RES = "../../../lib/res";

    // colour parameters
    localparam CHANW = 4;        // colour channel width (bits)
    localparam COLRW = 3*CHANW;  // colour width: three channels (bits)
    localparam CIDXW = 4;        // colour index width (bits)
    localparam BG_COLR = 'h137;  // background colour
    localparam PAL_FILE = {LIB_RES,"/palettes/sweetie16_4b.mem"};  // palette file

    // framebuffer (FB)
    localparam FB_WIDTH  = 320;  // framebuffer width in pixels
    localparam FB_HEIGHT = 180;  // framebuffer height in pixels
    localparam FB_SCALE  =   2;  // framebuffer display scale (1-63)
    localparam FB_OFFX   =   0;  // horizontal offset
    localparam FB_OFFY   =  60;  // vertical offset
    localparam FB_PIXELS = FB_WIDTH * FB_HEIGHT;  // total pixels in buffer
    localparam FB_ADDRW  = $clog2(FB_PIXELS);  // address width
    localparam FB_DATAW  = CIDXW;  // colour bits per pixel

    // pixel read and write addresses and colours
    logic [FB_ADDRW-1:0] fb_addr_write, fb_addr_read;
    logic [FB_DATAW-1:0] fb_colr_write, fb_colr_read;
    logic fb_we;  // framebuffer write enable

    // framebuffer memory
    bram_sdp #(
        .WIDTH(FB_DATAW),
        .DEPTH(FB_PIXELS),
        .INIT_F("")
    ) bram_inst (
        .clk_write(clk_sys),
        .clk_read(clk_sys),
        .we(fb_we),
        .addr_write(fb_addr_write),
        .addr_read(fb_addr_read),
        .data_in(fb_colr_write),
        .data_out(fb_colr_read)
    );

    // display flags in system clock domain
    logic frame_sys, line_sys, line0_sys;
    xd xd_frame (.clk_src(clk_pix), .clk_dst(clk_sys),
        .flag_src(frame), .flag_dst(frame_sys));
    xd xd_line  (.clk_src(clk_pix), .clk_dst(clk_sys),
        .flag_src(line),  .flag_dst(line_sys));
    xd xd_line0 (.clk_src(clk_pix), .clk_dst(clk_sys),
        .flag_src(line && sy==FB_OFFY), .flag_dst(line0_sys));

    //
    // draw in framebuffer
    //

    // reduce drawing speed to make process visible
    localparam FRAME_WAIT = 200;  // wait this many frames to start drawing
    logic [$clog2(FRAME_WAIT)-1:0] cnt_frame_wait;
    logic draw_oe;  // draw requested
    always_ff @(posedge clk_sys) begin
        draw_oe <= 0;  // comment out to draw at full speed
        if (cnt_frame_wait != FRAME_WAIT-1) begin  // wait for initial frames
            if (frame_sys) cnt_frame_wait <= cnt_frame_wait + 1;
        end else if (frame_sys) draw_oe <= 1;  // draw one pixel per frame
    end

    // render line/edge/cube/triangles
    parameter DRAW_SCALE = 1;  // relative to framebuffer dimensions
    logic drawing;  // actively drawing
    logic clip;  // location is clipped
    logic signed [CORDW-1:0] drx, dry;  // draw coordinates
    render_line #(  // switch module name to change demo
        .CORDW(CORDW),
        .CIDXW(CIDXW),
        .SCALE(DRAW_SCALE)
    ) render_instance (
        .clk(clk_sys),
        .rst(rst_sys),
        .oe(draw_oe),
        .start(frame_sys),
        .x(drx),
        .y(dry),
        .cidx(fb_colr_write),
        .drawing,
        .done()
    );

    // calculate pixel address in framebuffer (three-cycle latency)
    bitmap_addr #(
        .CORDW(CORDW),
        .ADDRW(FB_ADDRW)
    ) bitmap_addr_instance (
        .clk(clk_sys),
        .bmpw(FB_WIDTH),
        .bmph(FB_HEIGHT),
        .x(drx),
        .y(dry),
        .offx(0),
        .offy(0),
        .addr(fb_addr_write),
        .clip
    );

    // delay write enable to match address calculation
    localparam LAT_ADDR = 3;  // latency (cycles)
    logic [LAT_ADDR-1:0] fb_we_sr;
    always_ff @(posedge clk_sys) begin
        fb_we_sr <= {drawing, fb_we_sr[LAT_ADDR-1:1]};
        if (rst_sys) fb_we_sr <= 0;
    end
    always_comb fb_we = fb_we_sr[0] && !clip;  // check for clipping

    //
    // read framebuffer for display output via linebuffer
    //

    // count lines for scaling via linebuffer
    logic [$clog2(FB_SCALE):0] cnt_lb_line;
    always_ff @(posedge clk_sys) begin
        if (line0_sys) cnt_lb_line <= 0;
        else if (line_sys) begin
            cnt_lb_line <= (cnt_lb_line == FB_SCALE-1) ? 0 : cnt_lb_line + 1;
        end
    end

    // which screen lines need linebuffer?
    logic lb_line;
    always_ff @(posedge clk_sys) begin
        if (line0_sys) lb_line <= 1;  // enable from sy==0
        if (frame_sys) lb_line <= 0;  // disable at frame start
    end

    // enable linebuffer input
    logic lb_en_in;
    logic [$clog2(FB_WIDTH)-1:0] cnt_lbx;  // horizontal pixel counter
    always_comb lb_en_in = (lb_line && cnt_lb_line == 0 && cnt_lbx < FB_WIDTH);

    // calculate framebuffer read address for linebuffer
    always_ff @(posedge clk_sys) begin
        if (line_sys) begin  // reset horizontal counter at start of line
            cnt_lbx <= 0;
        end else if (lb_en_in) begin  // increment address when LB enabled
            fb_addr_read <= fb_addr_read + 1;
            cnt_lbx <= cnt_lbx + 1;
        end
        if (frame_sys) fb_addr_read <= 0;  // reset address at frame start
    end

    // enable linebuffer output
    logic lb_en_out;
    localparam LAT_LB = 3;  // output latency compensation: lb_en_out+1, LB+1, CLUT+1
    always_ff @(posedge clk_pix) begin
        lb_en_out <= (sy >= FB_OFFY && sy < (FB_HEIGHT * FB_SCALE) + FB_OFFY
            && sx >= FB_OFFX - LAT_LB && sx < (FB_WIDTH * FB_SCALE) + FB_OFFX - LAT_LB);
    end

    // display linebuffer
    logic [FB_DATAW-1:0] lb_colr_out;
    linebuffer_simple #(
        .DATAW(FB_DATAW),
        .LEN(FB_WIDTH)
    ) linebuffer_instance (
        .clk_sys,
        .clk_pix,
        .line,
        .line_sys,
        .en_in(lb_en_in),
        .en_out(lb_en_out),
        .scale(FB_SCALE),
        .data_in(fb_colr_read),
        .data_out(lb_colr_out)
    );

    // colour lookup table (CLUT)
    logic [COLRW-1:0] fb_pix_colr;
    clut_simple #(
        .COLRW(COLRW),
        .CIDXW(CIDXW),
        .F_PAL(PAL_FILE)
        ) clut_instance (
        .clk_write(clk_pix),
        .clk_read(clk_pix),
        .we(0),
        .cidx_write(0),
        .cidx_read(lb_colr_out),
        .colr_in(0),
        .colr_out(fb_pix_colr)
    );

    // paint screen
    logic paint_area;  // area of screen to paint
    logic [CHANW-1:0] paint_r, paint_g, paint_b;  // colour channels
    always_comb begin
        paint_area = (sy >= FB_OFFY && sy < (FB_HEIGHT * FB_SCALE) + FB_OFFY
            && sx >= FB_OFFX && sx < (FB_WIDTH * FB_SCALE) + FB_OFFX);
        {paint_r, paint_g, paint_b} = paint_area ? fb_pix_colr : BG_COLR;
    end

    // display colour: paint colour but black in blanking interval
    logic [CHANW-1:0] display_r, display_g, display_b;
    always_comb {display_r, display_g, display_b} = (de) ? {paint_r, paint_g, paint_b} : 0;

    // SDL output (8 bits per colour channel)
    always_ff @(posedge clk_pix) begin
        sdl_sx <= sx;
        sdl_sy <= sy;
        sdl_de <= de;
        sdl_frame <= frame;
        sdl_r <= {2{display_r}};
        sdl_g <= {2{display_g}};
        sdl_b <= {2{display_b}};
    end
endmodule

The following section will examine the new module. Refer to Framebuffers if you’d like an explanation of the clocks, linebuffer, and the colour lookup table.

Draw Instance

The actual drawing is accomplished by the render_line instance a little after line 100. This is where you can adjust the drawing scale and switch to other rendering modules (discussed shortly).

    // render line/edge/cube/triangles
    parameter DRAW_SCALE = 1;  // relative to framebuffer dimensions
    logic drawing;  // actively drawing
    logic signed [CORDW-1:0] drx, dry;  // draw coordinates
    render_line #(  // switch module name to change demo
        .CORDW(CORDW),
        .CIDXW(CIDXW),
        .SCALE(DRAW_SCALE)
    ) render_instance (
        .clk(clk_sys),
        .rst(rst_sys),
        .oe(draw_oe),
        .start(frame_sys),
        .x(drx),
        .y(dry),
        .cidx(fb_colr_write),
        .drawing,
        .done()
    );

Drawing Speed

Our line-drawing algorithm runs at over 125+ MHz on the Arty. To see how the line is drawn, we can reduce the drawing speed to one pixel per frame (60 pixels per second):

    // reduce drawing speed to make process visible
    localparam FRAME_WAIT = 200;  // wait this many frames to start drawing
    logic [$clog2(FRAME_WAIT)-1:0] cnt_frame_wait;
    logic draw_oe;  // draw requested
    always_ff @(posedge clk_sys) begin
        draw_oe <= 0;  // comment out to draw at full speed
        if (cnt_frame_wait != FRAME_WAIT-1) begin  // wait for initial frames
            if (frame_sys) cnt_frame_wait <= cnt_frame_wait + 1;
        end else if (frame_sys) draw_oe <= 1;  // draw one pixel per frame
    end

The design waits for 200 frames (3.3 seconds) to give the monitor time to show the image before starting to draw. On iCEBreaker, the demo waits for 300 frames (5 seconds) before starting to draw. You can adjust this delay or remove it altogether if you prefer.

To draw at full speed after the delay, comment out draw_oe <= 0;.

Latency Correction

When we discussed address calculation we noted the need for latency correction. We use a shift register to delay the write-enable signal to match the address calculation:

    // delay write enable to match address calculation
    localparam LAT_ADDR = 3;  // latency (cycles)
    logic [LAT_ADDR-1:0] fb_we_sr;
    always_ff @(posedge clk_sys) begin
        fb_we_sr <= {drawing, fb_we_sr[LAT_ADDR-1:1]};
        if (rst_sys) fb_we_sr <= 0;
    end
    always_comb fb_we = fb_we_sr[0] && !clip;  // check for clipping

To avoid drawing artefacts, we disable fb_we if the position is outside the screen area (clipped).

That Ain’t No Cube

If we can draw one line, we can draw many!

Let’s draw a cube as you’ve probably doodled on paper; this requires nine lines.

The 16-colour 320x180 version is shown below:

module render_cube #(
    parameter CORDW=16,  // signed coordinate width (bits)
    parameter CIDXW=4,   // colour index width (bits)
    parameter SCALE=1    // drawing scale: 1=320x180, 2=640x360, 4=1280x720
    ) (
    input  wire logic clk,    // clock
    input  wire logic rst,    // reset
    input  wire logic oe,     // output enable
    input  wire logic start,  // start drawing
    output      logic signed [CORDW-1:0] x,  // horizontal draw position
    output      logic signed [CORDW-1:0] y,  // vertical draw position
    output      logic [CIDXW-1:0] cidx,  // pixel colour
    output      logic drawing,  // actively drawing
    output      logic done      // drawing is complete (high for one tick)
    );

    localparam LINE_CNT=9;  // number of lines to draw
    logic [$clog2(LINE_CNT):0] line_id;  // line identifier
    logic signed [CORDW-1:0] vx0, vy0, vx1, vy1;  // line coords
    logic draw_start, draw_done;  // drawing signals

    // draw state machine
    enum {IDLE, INIT, DRAW, DONE} state;
    always_ff @(posedge clk) begin
        case (state)
            INIT: begin  // register coordinates and colour
                draw_start <= 1;
                state <= DRAW;
                cidx <= 'h2;  // colour index
                case (line_id)
                    'd0: begin
                        vx0 <= 130; vy0 <=  60; vx1 <= 230; vy1 <=  60;
                    end
                    'd1: begin
                        vx0 <= 230; vy0 <=  60; vx1 <= 230; vy1 <= 160;
                    end
                    'd2: begin
                        vx0 <= 230; vy0 <= 160; vx1 <= 130; vy1 <= 160;
                    end
                    'd3: begin
                        vx0 <= 130; vy0 <= 160; vx1 <= 130; vy1 <=  60;
                    end
                    'd4: begin
                        vx0 <= 130; vy0 <= 160; vx1 <=  90; vy1 <= 120;
                    end
                    'd5: begin
                        vx0 <=  90; vy0 <= 120; vx1 <=  90; vy1 <=  20;
                    end
                    'd6: begin
                        vx0 <=  90; vy0 <=  20; vx1 <= 130; vy1 <=  60;
                    end
                    'd7: begin
                        vx0 <=  90; vy0 <=  20; vx1 <= 190; vy1 <=  20;
                    end
                    default: begin  // line_id=8
                        vx0 <= 190; vy0 <=  20; vx1 <= 230; vy1 <=  60;
                    end
                endcase
            end
            DRAW: begin
                draw_start <= 0;
                if (draw_done) begin
                    if (line_id == LINE_CNT-1) begin
                        state <= DONE;
                    end else begin
                        line_id <= line_id + 1;
                        state <= INIT;
                    end
                end
            end
            DONE: state <= DONE;
            default: if (start) state <= INIT;  // IDLE
        endcase
        if (rst) state <= IDLE;
    end

    draw_line #(.CORDW(CORDW)) draw_line_inst (
        .clk,
        .rst,
        .start(draw_start),
        .oe,
        .x0(vx0 * SCALE),
        .y0(vy0 * SCALE),
        .x1(vx1 * SCALE),
        .y1(vy1 * SCALE),
        .x,
        .y,
        .drawing,
        .busy(),
        .done(draw_done)
    );

    always_comb done = (state == DONE);
endmodule

Cube Demo

To see the cube demo, replace render_* with render_cube in your top module and rebuild.

Ersatz Cube

It looks like a cube, but it’s an ersatz cube. Our cube has no real depth; it cannot move in 3D space, nor can we apply realistic lighting. We’ll cover real 3D models in a future post, but for now, let’s turn our attention to the most critical shape in all of computer graphics: the triangle.

The Triangle

As you gaze upon the beautiful 4K vista from a AAA game in the 2020s, know this: it’s all triangles!

A triangle consists of three lines, so we could issue three requests to draw_line, but it’s so valuable that it deserves its own module [draw_triangle.sv]:

module draw_triangle #(parameter CORDW=16) (  // signed coordinate width
    input  wire logic clk,             // clock
    input  wire logic rst,             // reset
    input  wire logic start,           // start triangle drawing
    input  wire logic oe,              // output enable
    input  wire logic signed [CORDW-1:0] x0, y0,  // vertex 0
    input  wire logic signed [CORDW-1:0] x1, y1,  // vertex 1
    input  wire logic signed [CORDW-1:0] x2, y2,  // vertex 2
    output      logic signed [CORDW-1:0] x,  y,   // drawing position
    output      logic drawing,         // actively drawing
    output      logic busy,            // drawing request in progress
    output      logic done             // drawing is complete (high for one tick)
    );

    logic [1:0] line_id;  // current line (0, 1, or 2)
    logic line_start;     // start drawing line
    logic line_done;      // finished drawing current line?

    // current line coordinates
    logic signed [CORDW-1:0] lx0, ly0;  // point 0 position
    logic signed [CORDW-1:0] lx1, ly1;  // point 1 position

    // draw state machine
    enum {IDLE, INIT, DRAW} state;
    always_ff @(posedge clk) begin
        case (state)
            INIT: begin  // register coordinates
                state <= DRAW;
                line_start <= 1;
                if (line_id == 2'd0) begin  // (x0,y0) (x1,y1)
                    lx0 <= x0; ly0 <= y0;
                    lx1 <= x1; ly1 <= y1;
                end else if (line_id == 2'd1) begin  // (x1,y1) (x2,y2)
                    lx0 <= x1; ly0 <= y1;
                    lx1 <= x2; ly1 <= y2;
                end else begin  // (x2,y2) (x0,y0)
                    lx0 <= x2; ly0 <= y2;
                    lx1 <= x0; ly1 <= y0;
                end
            end
            DRAW: begin
                line_start <= 0;
                if (line_done) begin
                    if (line_id == 2) begin  // final line
                        state <= IDLE;
                        busy <= 0;
                        done <= 1;
                    end else begin
                        state <= INIT;
                        line_id <= line_id + 1;
                    end
                end
            end
            default: begin  // IDLE
                done <= 0;
                if (start) begin
                    state <= INIT;
                    line_id <= 0;
                    busy <= 1;
                end
            end
        endcase

        if (rst) begin
            state <= IDLE;
            line_id <= 0;
            line_start <= 0;
            busy <= 0;
            done <= 0;
        end
    end

    draw_line #(.CORDW(CORDW)) draw_line_inst (
        .clk,
        .rst,
        .start(line_start),
        .oe,
        .x0(lx0),
        .y0(ly0),
        .x1(lx1),
        .y1(ly1),
        .x,
        .y,
        .drawing,
        .busy(),
        .done(line_done)
    );
endmodule

There’s a test bench you can use to exercise the module with Vivado: [xc7/draw_triangle_tb.sv].

The Magic Number

Triangles Sim

Let’s create a render module to draw three triangles:

Replace render_* with render_triangles in your top module and rebuild.

We can draw millions of pixels per second, but drawing one per frame is fun to watch:

Drawing Triangles

Explore

I hope you enjoyed this instalment of Exploring FPGA Graphics, but nothing beats creating your own designs. Here are a few suggestions to get you started:

  • Experiment with different lines, triangles, and colours
  • What’s the most impressive thing you can draw with a handful of straight lines?
  • We drew a cube, but how about the other Platonic solids?
  • Draw a landscape with one-point perspective (YouTube example)

What’s Next?

Next time, in 2D Shapes, we’ll draw rectangles and circles as well as triangles, then learn to colour them in. Check out demos and tutorials for more FPGA projects.