28 January 2021

Lines and Triangles

Welcome back to Exploring FPGA Graphics. It’s time to turn our attention to drawing. Most modern computer graphics come down to drawing triangles and colouring them in. So, it seems fitting to begin our tour of drawing with triangles and the straight lines that form them. This post will implement Bresenham’s line algorithm in Verilog, creating lines, triangles, and even a cube (our first sort-of 3D graphics).

In this series, we explore graphics at the hardware level and get a feel for the power of FPGAs. We start by learning how displays work, before racing the beam with Pong, starfields and sprites, simulating life with bitmaps, drawing lines and triangles, and finally creating simple 3D models. I’ll be writing and revising this series throughout 2020 and 2021. New to the series? Start with Exploring FPGA Graphics.

Updated 2021-02-26. Get in touch with @WillFlux or open an issue on GitHub.

Series Outline

  • Exploring FPGA Graphics - learn how displays work and animate simple shapes
  • FPGA Pong - race the beam to create the arcade classic
  • Hardware Sprites - fast, colourful, graphics with minimal resources
  • FPGA Ad Astra - demo with hardware sprites and animated starfields
  • Framebuffers - driving the display from a bitmap in memory
  • Life on Screen - the screen comes alive with Conway’s Game of Life
  • Lines and Triangles (this post) - drawing lines and triangles with a framebuffer

More parts to follow.

Requirements

For this series, you need an FPGA board with video output. We’ll be working at 640x480, so pretty much any video output will do. It helps to be comfortable with programming your FPGA board and reasonably familiar with Verilog.

We’ll be demoing with these boards:

Source

The SystemVerilog designs featured in this series are available from the projf-explore repo on GitHub. The designs are open source hardware under the permissive MIT licence, but this blog is subject to normal copyright restrictions.

An Address for Every Pixel

Let’s start by reminding ourselves how a framebuffer works. A framebuffer memory location backs every pixel on screen. To update a pixel, we convert its coordinates into a memory address using the framebuffer dimensions.

For this post we’ll be using:

  • Arty: 320 x 240 pixels
  • iCEBreaker: 160 x 120 pixels

For example, to change the colour of pixel (42,21), we write to the following address:

  • Arty: (320 * 21) + 42 = 6,762
  • iCEBreaker: (160 * 21) + 42 = 3,402

If our pixel coordinates are (px, py) and the framebuffer width is hres, our Verilog is:

always_ff @(posedge clk) begin
    pix_addr <= (hres * py) + px;
end

We could have written this with combinatorial logic, but timing is better if you store the result in a register and accept one cycle of latency.

Many Colours?

In our previous framebuffer design, we loaded an image and picked a palette to match. Now we’re drawing; we want the freedom to choose from a good range of colours. However, we also want to leave enough ram for double-buffering when we start animating, so we’ll settle for four colours (2 bit) on iCEBreaker, and 16 colours (4 bit) on Arty.

A single framebuffer requires:

  • 4 * 320 * 240 = 307,200 bits (12 of 50 BRAMs on Arty)
  • 2 * 160 * 120 = 38,400 bits (10 of 30 BRAMs on iCEBreaker)

I have selected two palettes to get you started, but use anything you like.

16 Colour Palette

For the 16 colour palette I’ve chosen the PICO-8 palette (adjusted for 12-bit output):

PICO-8 Palette

We load the 16 colours into the colour lookup table (CLUT) ROM using a file: [16_colr_4bit_palette.mem].

000  // 0 - black
235  // 1 - dark-blue
825  // 2 - dark-purple
085  // 3 - dark-green
B53  // 4 - brown
655  // 5 - dark-grey
CCC  // 6 - light-grey
FFF  // 7 - white
F05  // 8 - red
FA0  // 9 - orange
FF2  // A - yellow
0E3  // B - green
3BF  // C - blue
87A  // D - indigo
F7B  // E - pink
FCA  // F - peach

4 Colour Palette

For the iCEBreaker palette I’ve chosen four of these colours:

000  // 0 - black
FA0  // 1 - orange
0E3  // 2 - green
3BF  // 3 - blue

We load the 4 colours into the CLUT ROM using a file: [4_colr_4bit_palette.mem].

From Point to Line

We can draw a point, but we want to draw a line between two points. Bresenham’s line algorithm is the definitive way to do this, and The Beauty of Bresenham’s Algorithm has just what we need: a clearly written version of the algorithm using integers.

Here’s the C design:

void plotLine(int x0, int y0, int x1, int y1)
{
   int dx =  abs(x1-x0), sx = x0<x1 ? 1 : -1;
   int dy = -abs(y1-y0), sy = y0<y1 ? 1 : -1;
   int err = dx+dy, e2; /* error value e_xy */

   for(;;){  /* loop */
      setPixel(x0,y0);
      if (x0==x1 && y0==y1) break;
      e2 = 2*err;
      if (e2 >= dy) { err += dy; x0 += sx; } /* e_xy+e_x > 0 */
      if (e2 <= dx) { err += dx; y0 += sy; } /* e_xy+e_y < 0 */
   }
}

For the hows and whys read A Rasterizing Algorithm for Drawing Curves (PDF). Kudos to Alois Zingl.

From C to Verilog

There are two stages to the algorithm: setting the initial values and running the algorithm in the loop.

As initial values, we need the difference between the start and end coordinates, and the sign and absolute value of that difference. Your first thought might be to mess around with two’s complement to determine abs(x1-x0), but we can use a little combinatorial logic, remembering to use logic signed as needed:

logic signed [CORDW:0] dx, dy;  // a bit wider as signed
logic right, down;  // drawing direction
always_comb begin
    right = (x0 < x1);
    down  = (y0 < y1);
    dx = right ? x1 - x0 : x0 - x1;  // dx =  abs(x1 - x0)
    dy = down  ? y0 - y1 : y1 - y0;  // dy = -abs(y1 - y0)
end

NB. The sign of dy is different from dx; check the C version of the algorithm to see what I mean.

Going Loopy

Next, we could quickly bash out an always_ff block to cover the loop. But this isn’t software; there’s a trap lurking to catch the unwary.

Rewriting the C in Verilog, we could end up with the following (dubious) logic:

always_ff @(posedge clk) begin
    // ...

    if (e2 >= dy) begin
        x <= (right) ? x + 1 : x - 1;
        err <= err + dy;
    end
    if (e2 <= dx) begin
        y <= (down) ? y + 1 : y - 1;
        err <= err + dx;
    end
end

At first glance, it looks OK, and your tools will almost certainly build it without complaint. Experienced Verilog engineers are probably rolling their eyes, but it’s worth thinking through why this won’t work.

Consider what happens if (e2 >= dy) and (e2 <= dx) are both true?

x and y are incremented correctly, but err <= err + dy; is ignored. Huh?!

The <= assignment is non-blocking and non-blocking assignments happen in parallel. The Verilog standard says that if a variable has multiple non-blocking assignments, the last assignment wins.

We can’t calculate the error with just a combinatorial block either: the new error value depends on the previous one, in other words we need to maintain state. Instead, we use a combinatorial block, with blocking assignment, to calculate the change in error, then add it to the previous value in a clocked always_ff block:

logic signed [CORDW:0] err, derr;
logic movx, movy;  // move in x and/or y required
always_comb begin
    movx = (2*err >= dy);
    movy = (2*err <= dx);
    derr = movx ? dy : 0;
    if (movy) derr = derr + dx;
end

always_ff @(posedge clk) begin
    // ...

    if (movx) x <= right ? x + 1 : x - 1;
    if (movy) y <= down  ? y + 1 : y - 1;
    err <= err + derr;
end

The two blocking assignments to derr happen one after the other. Alternatively, we could have calculated next_err in our always_comb block and assigned it to err in our always_ff; I tried this approach, but it used more LUTs.

Note how we’ve also eliminated the need for e2; replacing it with 2*err in our comparisons.

Ready to Draw

With the error logic fixed we’re ready to create our line drawing module [draw_line.sv]:

module draw_line #(parameter CORDW=10) (  // FB coord width in bits
    input  wire logic clk,             // clock
    input  wire logic rst,             // reset
    input  wire logic start,           // start line drawing
    input  wire logic oe,              // output enable
    input  wire logic [CORDW-1:0] x0,  // horizontal start position
    input  wire logic [CORDW-1:0] y0,  // vertical start position
    input  wire logic [CORDW-1:0] x1,  // horizontal end position
    input  wire logic [CORDW-1:0] y1,  // vertical end position
    output      logic [CORDW-1:0] x,   // horizontal drawing position
    output      logic [CORDW-1:0] y,   // vertical drawing position
    output      logic drawing,         // line is drawing
    output      logic done             // line complete (high for one tick)
    );

    // "constant" signals
    logic signed [CORDW:0] dx, dx_c, dy, dy_c;  // a bit wider as signed
    logic right, right_c, down, down_c;  // drawing direction
    always_comb begin
        right_c = (x0 < x1);
        down_c  = (y0 < y1);
        dx_c = right_c ? x1 - x0 : x0 - x1;  // dx_c =  abs(x1 - x0)
        dy_c = down_c  ? y0 - y1 : y1 - y0;  // dy_y = -abs(y1 - y0)
    end

    // error values
    logic signed [CORDW:0] err, derr;
    logic movx, movy;  // move in x and/or y required
    always_comb begin
        movx = (2*err >= dy);
        movy = (2*err <= dx);
        derr = movx ? dy : 0;
        if (movy) derr = derr + dx;
    end

    logic in_progress = 0;  // drawing calculation in progress
    always_comb begin
        drawing = 0;
        if (in_progress && oe) drawing = 1;
    end

    enum {IDLE, INIT, DRAW} state;
    always @(posedge clk) begin
        case (state)
            DRAW: begin
                if (x == x1 && y == y1) begin
                    in_progress <= 0;
                    done <= 1;
                    state <= IDLE;
                end else if (oe) begin
                    if (movx) x <= right ? x + 1 : x - 1;
                    if (movy) y <= down  ? y + 1 : y - 1;
                    err <= err + derr;
                end
            end
            INIT: begin
                err <= dx + dy;
                x <= x0;
                y <= y0;
                in_progress <= 1;
                state <= DRAW;
            end
            default: begin  // IDLE
                done <= 0;
                if (start) begin  // register "constant" signals
                    right <= right_c;
                    down  <= down_c;
                    dx <= dx_c;
                    dy <= dy_c;
                    state <= INIT;
                end
            end
        endcase

        if (rst) begin
            in_progress <= 0;
            done <= 0;
            state <= IDLE;
        end
    end
endmodule

The pixel to draw is output as (x,y) and the start and end coordinates are input as (x0,y0) and (x1,y1) respectively. A high start signal begins drawing and drawing completion is marked by done. An output enable signal, oe, allows you to pause drawing, handy for multiplexing memory access, or for slowing down the action to make it visible.

We register the “constant” signals to improve timing; this is particularly important on the iCE40 where subtraction takes two layers of logic.

There’s a test bench you can use to exercise the module for Xilinx: [xc7/draw_line_tb.sv].

We test different sorts of lines: steep and not steep, drawn upwards, downwards, left to right, and right to left, as well as points, and the longest possible horizontal, vertical, and diagonal lines. A steep line is one in which the vertical change is larger than the horizontal.

Top of the Line

It’s time to get drawing with actual hardware.

Create a new top module and build it for your board:

This design is similar to the top modules we used in the framebuffers post.

Building the Designs
In the Lines and Triangles section of the git repo, you’ll find the design files, a makefile for iCEBreaker, a Vivado project for Arty, and instructions for building the designs for both boards.

The XC7 version of top_line looks like this:

module top_line (
    input  wire logic clk_100m,     // 100 MHz clock
    input  wire logic btn_rst,      // reset button (active low)
    output      logic vga_hsync,    // horizontal sync
    output      logic vga_vsync,    // vertical sync
    output      logic [3:0] vga_r,  // 4-bit VGA red
    output      logic [3:0] vga_g,  // 4-bit VGA green
    output      logic [3:0] vga_b   // 4-bit VGA blue
    );

    // generate pixel clock
    logic clk_pix;
    logic clk_locked;
    clock_gen clock_640x480 (
       .clk(clk_100m),
       .rst(!btn_rst),  // reset button is active low
       .clk_pix,
       .clk_locked
    );

    // display timings
    localparam CORDW = 10;  // screen coordinate width in bits
    logic [CORDW-1:0] sx, sy;
    logic hsync, vsync, de;
    display_timings_480p timings_640x480 (
        .clk_pix,
        .rst(!clk_locked),  // wait for clock lock
        .sx,
        .sy,
        .hsync,
        .vsync,
        .de
    );

    // size of screen with and without blanking
    localparam H_RES_FULL = 800;
    localparam V_RES_FULL = 525;
    localparam H_RES = 640;
    localparam V_RES = 480;

    // vertical blanking interval (will move to display_timings soon)
    logic vbi;
    always_comb vbi = (sy == V_RES && sx == 0);

    // framebuffer (FB)
    localparam FB_WIDTH   = 320;
    localparam FB_HEIGHT  = 240;
    localparam FB_CORDW   = $clog2(FB_WIDTH);  // assumes WIDTH>=HEIGHT
    localparam FB_PIXELS  = FB_WIDTH * FB_HEIGHT;
    localparam FB_ADDRW   = $clog2(FB_PIXELS);
    localparam FB_DATAW   = 4;  // colour bits per pixel
    localparam FB_IMAGE   = "";
    localparam FB_PALETTE = "16_colr_4bit_palette.mem";

    logic fb_we;
    logic [FB_ADDRW-1:0] fb_addr_write, fb_addr_read;
    logic [FB_DATAW-1:0] fb_cidx_write;
    logic [FB_DATAW-1:0] fb_cidx_read, fb_cidx_read_1;

    bram_sdp #(
        .WIDTH(FB_DATAW),
        .DEPTH(FB_PIXELS),
        .INIT_F(FB_IMAGE)
    ) fb_inst (
        .clk_write(clk_pix),
        .clk_read(clk_pix),
        .we(fb_we),
        .addr_write(fb_addr_write),
        .addr_read(fb_addr_read),
        .data_in(fb_cidx_write),
        .data_out(fb_cidx_read_1)
    );

    // draw line in framebuffer
    logic [FB_CORDW-1:0] lx0, ly0, lx1, ly1;  // line start and end coords
    logic [FB_CORDW-1:0] px, py;  // line pixel drawing coordinates
    logic draw_start, drawing, draw_done;  // draw_line signals

    // draw state machine
    enum {IDLE, INIT, DRAW, DONE} state;
    initial state = IDLE;  // needed for Yosys
    always @(posedge clk_pix) begin
        draw_start <= 0;
        case (state)
            INIT: begin  // register coordinates and colour
                lx0 <=  40; ly0 <=   0;
                lx1 <= 279; ly1 <= 239;
                fb_cidx_write <= 4'h9;  // orange
                draw_start <= 1;
                state <= DRAW;
            end
            DRAW: if (draw_done) state <= DONE;
            DONE: state <= DONE;
            default: if (vbi) state <= INIT;  // IDLE
        endcase
    end

    draw_line #(.CORDW(FB_CORDW)) draw_line_inst (
        .clk(clk_pix),
        .rst(!clk_locked),
        .start(draw_start),
        .oe(1'b1),
        .x0(lx0),
        .y0(ly0),
        .x1(lx1),
        .y1(ly1),
        .x(px),
        .y(py),
        .drawing,
        .done(draw_done)
    );

    // pixel coordinate to memory address calculation takes one cycle
    always_ff @(posedge clk_pix) fb_we <= drawing;

    pix_addr #(
        .CORDW(FB_CORDW),
        .ADDRW(FB_ADDRW)
    ) pix_addr_inst (
        .clk(clk_pix),
        .hres(FB_WIDTH),
        .px,
        .py,
        .pix_addr(fb_addr_write)
    );

    // linebuffer (LB)
    localparam LB_SCALE = 2;       // scale (horizontal and vertical)
    localparam LB_LEN = FB_WIDTH;  // line length matches framebuffer
    localparam LB_BPC = 4;         // bits per colour channel

    // LB output to display
    logic lb_en_out;
    always_comb lb_en_out = de;  // Use 'de' for entire frame

    // Load data from FB into LB
    logic lb_data_req;  // LB requesting data
    logic [$clog2(LB_LEN+1)-1:0] cnt_h;  // count pixels in line to read
    always_ff @(posedge clk_pix) begin
        if (vbi) fb_addr_read <= 0;   // new frame
        if (lb_data_req && sy != V_RES-1) begin  // load next line of data...
            cnt_h <= 0;                          // ...if not on last line
        end else if (cnt_h < LB_LEN) begin  // advance to start of next line
            cnt_h <= cnt_h + 1;
            fb_addr_read <= fb_addr_read == FB_PIXELS-1 ? 0 : fb_addr_read + 1;
        end
    end

    // FB BRAM and CLUT pipeline adds three cycles of latency
    logic lb_en_in_2, lb_en_in_1, lb_en_in;
    always_ff @(posedge clk_pix) begin
        lb_en_in_2 <= (cnt_h < LB_LEN);
        lb_en_in_1 <= lb_en_in_2;
        lb_en_in <= lb_en_in_1;
    end

    // LB colour channels
    logic [LB_BPC-1:0] lb_in_0, lb_in_1, lb_in_2;
    logic [LB_BPC-1:0] lb_out_0, lb_out_1, lb_out_2;

    linebuffer #(
        .WIDTH(LB_BPC),     // data width of each channel
        .LEN(LB_LEN),       // length of line
        .SCALE(LB_SCALE)    // scaling factor (>=1)
        ) lb_inst (
        .clk_in(clk_pix),       // input clock
        .clk_out(clk_pix),      // output clock
        .data_req(lb_data_req), // request input data (clk_in)
        .en_in(lb_en_in),       // enable input (clk_in)
        .en_out(lb_en_out),     // enable output (clk_out)
        .vbi,                   // start of vertical blanking interval (clk_out)
        .din_0(lb_in_0),        // data in (clk_in)
        .din_1(lb_in_1),
        .din_2(lb_in_2),
        .dout_0(lb_out_0),      // data out (clk_out)
        .dout_1(lb_out_1),
        .dout_2(lb_out_2)
    );

    // improve timing with register between BRAM and async ROM
    always @(posedge clk_pix) begin
        fb_cidx_read <= fb_cidx_read_1;
    end

    // colour lookup table (ROM) 16x12-bit entries
    logic [11:0] clut_colr;
    rom_async #(
        .WIDTH(12),
        .DEPTH(16),
        .INIT_F(FB_PALETTE)
    ) clut (
        .addr(fb_cidx_read),
        .data(clut_colr)
    );

    // map colour index to palette using CLUT and read into LB
    always_ff @(posedge clk_pix) begin
        {lb_in_2, lb_in_1, lb_in_0} <= clut_colr;
    end

    // LB output adds one cycle of latency - need to correct display signals
    logic hsync_1, vsync_1, lb_en_out_1;
    always_ff @(posedge clk_pix) begin
        hsync_1 <= hsync;
        vsync_1 <= vsync;
        lb_en_out_1 <= lb_en_out;
    end

    // VGA output
    always_ff @(posedge clk_pix) begin
        vga_hsync <= hsync_1;
        vga_vsync <= vsync_1;
        vga_r <= lb_en_out_1 ? lb_out_2 : 4'h0;
        vga_g <= lb_en_out_1 ? lb_out_1 : 4'h0;
        vga_b <= lb_en_out_1 ? lb_out_0 : 4'h0;
    end
endmodule

Note how we set the line colour using the hex index value from the palette: 0x9 is orange.

The module for calculating the pixel address looks like this [pix_addr.sv]:

module pix_addr #(
    parameter CORDW=10,  // framebuffer coordinate width in bits
    parameter ADDRW=13   // width of memory address bus
    ) (
    input  wire logic clk,                  // clock
    input  wire logic [CORDW-1:0] hres,     // horizontal framebuffer resolution
    input  wire logic [CORDW-1:0] px,       // horizontal pixel position
    input  wire logic [CORDW-1:0] py,       // vertical pixel position
    output      logic [ADDRW-1:0] pix_addr  // pixel address
    );

    always_ff @(posedge clk) begin
        pix_addr <= (hres * py) + px;
    end
endmodule

pix_addr has only one line of logic, and a simple one at that, so a module seems like overkill. However, when you use a more complex memory set up, it helps to have this logic defined in one place.

Calculating the pixel address takes one cycle, so we need to delay write enable for the framebuffer by one cycle in the top modules: always_ff @(posedge clk_pix) fb_we <= drawing;.

That Ain’t No Cube

If we can draw one line, we can draw many! Let’s draw a cube as you’ve probably doodled on paper; this requires nine lines. To see how the drawing works, we’ve wired the drawing output enable to the animate signal. Each frame, one new pixel is drawn, with a delay of 300 frames to allow the monitor time to start showing the image.

The cube drawing part looks like this for iCE40:

    // draw cube in framebuffer
    localparam LINE_CNT=9;
    logic [3:0] line_id;  // line identifier
    logic [FB_CORDW-1:0] lx0, ly0, lx1, ly1;  // line start and end coords
    logic [FB_CORDW-1:0] px, py;  // line pixel drawing coordinates
    logic draw_start, drawing, draw_done;  // draw_line signals

    // draw state machine
    enum {IDLE, INIT, DRAW, DONE} state;
    initial state = IDLE;  // needed for Yosys
    always @(posedge clk_pix) begin
        draw_start <= 0;
        case (state)
            INIT: begin  // register coordinates and colour
                draw_start <= 1;
                state <= DRAW;
                fb_cidx_write <= 2'h2;  // green
                case (line_id)
                    4'd0: begin
                        lx0 <=  65; ly0 <=  45; lx1 <= 115; ly1 <=  45;
                    end
                    4'd1: begin
                        lx0 <= 115; ly0 <=  45; lx1 <= 115; ly1 <=  95;
                    end
                    4'd2: begin
                        lx0 <= 115; ly0 <=  95; lx1 <=  65; ly1 <=  95;
                    end
                    4'd3: begin
                        lx0 <=  65; ly0 <=  95; lx1 <=  65; ly1 <=  45;
                    end
                    4'd4: begin
                        lx0 <=  65; ly0 <=  95; lx1 <=  45; ly1 <=  75;
                    end
                    4'd5: begin
                        lx0 <=  45; ly0 <=  75; lx1 <=  45; ly1 <=  25;
                    end
                    4'd6: begin
                        lx0 <=  45; ly0 <=  25; lx1 <=  65; ly1 <=  45;
                    end
                    4'd7: begin
                        lx0 <=  45; ly0 <=  25; lx1 <=  95; ly1 <=  25;
                    end
                    4'd8: begin
                        lx0 <=  95; ly0 <=  25; lx1 <= 115; ly1 <=  45;
                    end
                    default: begin
                        lx0 <=   0; ly0 <=   0; lx1 <=   0; ly1 <=   0;
                    end
                endcase
            end
            DRAW: if (draw_done) begin
                if (line_id == LINE_CNT-1) begin
                    state <= DONE;
                end else begin
                    line_id <= line_id + 1;
                    state <= INIT;
                end
            end
            DONE: state <= DONE;
            default: if (vbi) state <= INIT;  // IDLE
        endcase
    end

    // control drawing output enable - wait 300 frames, then 1 pixel/frame
    localparam DRAW_WAIT = 300;
    logic [$clog2(DRAW_WAIT)-1:0] cnt_draw_wait;
    logic draw_oe;
    always_ff @(posedge clk_pix) begin
        draw_oe <= 0;
        if (vbi) begin
            if (cnt_draw_wait != DRAW_WAIT-1) begin
                cnt_draw_wait <= cnt_draw_wait + 1;
            end else draw_oe <= 1;
        end
    end

    draw_line #(.CORDW(FB_CORDW)) draw_line_inst (
        .clk(clk_pix),
        .rst(!clk_locked),
        .start(draw_start),
        .oe(draw_oe),
        .x0(lx0),
        .y0(ly0),
        .x1(lx1),
        .y1(ly1),
        .x(px),
        .y(py),
        .drawing,
        .done(draw_done)
    );

Ersatz Cube

It looks like a cube, but it’s an ersatz cube. Our cube has no real depth; it cannot move in 3D space, nor can we apply realistic lighting. We’ll cover real 3D models in a later post, but for now let’s turn our attention to the most critical shape in computer graphics: the triangle.

The Triangle

As you gaze upon the beautiful 4K vista from a AAA game in 2021, know this: it’s all triangles!

A triangle consists of three lines so that we could issue three draw_line commands, but it’s so useful, it deserves its own module [draw_triangle.sv]:

module draw_triangle #(parameter CORDW=10) (  // FB coord width in bits
    input  wire logic clk,             // clock
    input  wire logic rst,             // reset
    input  wire logic start,           // start triangle drawing
    input  wire logic oe,              // output enable
    input  wire logic [CORDW-1:0] x0,  // vertex 0 - horizontal position
    input  wire logic [CORDW-1:0] y0,  // vertex 0 - vertical position
    input  wire logic [CORDW-1:0] x1,  // vertex 1 - horizontal position
    input  wire logic [CORDW-1:0] y1,  // vertex 1 - vertical position
    input  wire logic [CORDW-1:0] x2,  // vertex 2 - horizontal position
    input  wire logic [CORDW-1:0] y2,  // vertex 2 - vertical position
    output      logic [CORDW-1:0] x,   // horizontal drawing position
    output      logic [CORDW-1:0] y,   // vertical drawing position
    output      logic drawing,         // triangle is drawing
    output      logic done             // triangle complete (high for one tick)
    );

    enum {IDLE, INIT, DRAW} state;

    localparam CNT_LINE = 3;  // triangle has three lines
    logic [$clog2(CNT_LINE)-1:0] line_id;  // current line
    logic line_start;  // start drawing line
    logic line_done;   // finished drawing current line?

    // line coordinates
    logic [CORDW-1:0] lx0, ly0;  // current line start position
    logic [CORDW-1:0] lx1, ly1;  // current line end position

    always @(posedge clk) begin
        line_start <= 0;
        case (state)
            INIT: begin  // register coordinates
                if (line_id == 2'd0) begin  // (x0,y0) -> (x1,y1)
                    lx0 <= x0; ly0 <= y0;
                    lx1 <= x1; ly1 <= y1;
                end else if (line_id == 2'd1) begin  // (x1,y1) -> (x2,y2)
                    lx0 <= x1; ly0 <= y1;
                    lx1 <= x2; ly1 <= y2;
                end else begin  // (x2,y2) -> (x0,y0)
                    lx0 <= x2; ly0 <= y2;
                    lx1 <= x0; ly1 <= y0;
                end
                state <= DRAW;
                line_start <= 1;
            end
            DRAW: begin
                if (line_done) begin
                    if (line_id == CNT_LINE-1) begin
                        done <= 1;
                        state <= IDLE;
                    end else begin
                        line_id <= line_id + 1;
                        state <= INIT;
                    end
                end
            end
            default: begin  // IDLE
                done <= 0;
                if (start) begin
                    line_id <= 0;
                    state <= INIT;
                end
            end
        endcase

        if (rst) begin
            line_id <= 0;
            line_start <= 0;
            done <= 0;
            state <= IDLE;
        end
    end

    draw_line #(.CORDW(CORDW)) draw_line_inst (
        .clk,
        .rst,
        .start(line_start),
        .oe,
        .x0(lx0),
        .y0(ly0),
        .x1(lx1),
        .y1(ly1),
        .x,
        .y,
        .drawing,
        .done(line_done)
    );
endmodule

There’s a test bench you can use to exercise the module for Xilinx: [xc7/draw_triangle_tb.sv].

And a similar top module can be used to draw a few triangles:

The triangle drawing part looks like this for XC7:

    // draw shapes in framebuffer
    localparam SHAPE_CNT=3;
    logic [1:0] shape_id;  // shape identifier
    logic [FB_CORDW-1:0] tx0, ty0, tx1, ty1, tx2, ty2;  // triangle coords
    logic [FB_CORDW-1:0] px, py;  // triangle pixel drawing coordinates
    logic draw_start, drawing, draw_done;  // draw_line signals

    // draw state machine
    enum {IDLE, INIT, DRAW, DONE} state;
    initial state = IDLE;  // needed for Yosys
    always @(posedge clk_pix) begin
        draw_start <= 0;
        case (state)
            INIT: begin  // register coordinates and colour
                draw_start <= 1;
                state <= DRAW;
                case (shape_id)
                    2'd0: begin
                        tx0 <=  20; ty0 <=  60;
                        tx1 <=  60; ty1 <= 180;
                        tx2 <= 110; ty2 <=  90;
                        fb_cidx_write <= 4'h2;  // dark purple
                    end
                    2'd1: begin
                        tx0 <=  70; ty0 <= 200;
                        tx1 <= 240; ty1 <= 100;
                        tx2 <= 170; ty2 <=  10;
                        fb_cidx_write <= 4'hC;  // blue
                    end
                    2'd2: begin
                        tx0 <=  60; ty0 <=  30;
                        tx1 <= 300; ty1 <=  80;
                        tx2 <= 160; ty2 <= 220;
                        fb_cidx_write <= 4'h9;  // orange
                    end
                    default: begin  // should never occur
                        tx0 <=   10; ty0 <=   10;
                        tx1 <=   10; ty1 <=   30;
                        tx2 <=   20; ty2 <=   20;
                        fb_cidx_write <= 4'h7;  // white
                    end
                endcase
            end
            DRAW: if (draw_done) begin
                if (shape_id == SHAPE_CNT-1) begin
                    state <= DONE;
                end else begin
                    shape_id <= shape_id + 1;
                    state <= INIT;
                end
            end
            DONE: state <= DONE;
            default: if (vbi) state <= INIT;  // IDLE
        endcase
    end

    // control drawing output enable - wait 300 frames, then 1 pixel/frame
    localparam DRAW_WAIT = 300;
    logic [$clog2(DRAW_WAIT)-1:0] cnt_draw_wait;
    logic draw_oe;
    always_ff @(posedge clk_pix) begin
        draw_oe <= 0;
        if (vbi) begin
            if (cnt_draw_wait != DRAW_WAIT-1) begin
                cnt_draw_wait <= cnt_draw_wait + 1;
            end else draw_oe <= 1;
        end
    end

    draw_triangle #(.CORDW(FB_CORDW)) draw_triangle_inst (
        .clk(clk_pix),
        .rst(!clk_locked),
        .start(draw_start),
        .oe(draw_oe),
        .x0(tx0),
        .y0(ty0),
        .x1(tx1),
        .y1(ty1),
        .x2(tx2),
        .y2(ty2),
        .x(px),
        .y(py),
        .drawing,
        .done(draw_done)
    );

We can draw millions of pixels per second, but drawing 60 per second (one per frame) is fun to watch:

Drawing Triangles

Explore

I hope you enjoyed this instalment of Exploring FPGA Graphics, but nothing beats creating your own designs. Here are a few suggestions to get you started:

  • Experiment with different lines, triangles, and colours
    • What’s the most impressive thing you can draw with a handful of straight lines?
  • We drew a cube, but how about the other Platonic solids?
  • Draw nested squares of different colours
    • Cycle the colours every frame to create the feeling of flying down a tunnel
  • Draw a landscape with one point perspective (YouTube example)

Next Time

Next time we’ll be expanding our repetoir of shapes and animating them using double buffering. Stay tuned.

Constructive feedback is always welcome. Get in touch with @WillFlux or open an issue on GitHub.

©2021 Will Green, Project F