Lines and Triangles
Welcome back to Exploring FPGA Graphics. It’s time to turn our attention to drawing. Most modern computer graphics come down to drawing triangles and colouring them in. So, it seems fitting to begin our drawing tour with triangles and the straight lines that form them. This post will implement Bresenham’s line algorithm in Verilog and create lines, triangles, and even a cube (our first sort-of 3D).
In this series, we learn about graphics at the hardware level and get a feel for the power of FPGAs. We’ll learn how screens work, play Pong, create starfields and sprites, paint Michelangelo’s David, draw lines and triangles, and animate characters and shapes. New to the series? Start with Beginning FPGA Graphics.
Series Outline
- Beginning FPGA Graphics - video signals and basic graphics
- Racing the Beam - simple demo effects with minimal logic
- FPGA Pong - recreate the classic arcade on an FPGA
- Display Signals - revisit display signals and meet colour palettes
- Hardware Sprites - fast, colourful graphics for games
- Framebuffers - bitmap graphics featuring Michelangelo’s David
- Lines and Triangles (this post) - drawing lines and triangles
- 2D Shapes - filled shapes and simple pictures
- Animated Shapes - animation and double-buffering
Requirements
You should be able to run these designs on any recent FPGA board. I include everything you need for the iCEBreaker with 12-Bit DVI Pmod, Digilent Arty A7-35T with Pmod VGA, Digilent Nexys Video with on-board HDMI output, and Verilator Simulation with SDL. See requirements from Beginning FPGA Graphics for more details.
This post builds on Framebuffers; I recommend reading that post first if you haven’t already.
New Screen Resolution
Choosing your graphics resolution and colour depth are critical decisions in any graphics design. We want a high resolution for this post, but we want to continue using on-FPGA RAM for simplicity. Adopting a 16:9 aspect ratio makes scaling on modern displays simpler while keeping memory usage manageable (320x180 < 2^16).
320 x 180 in 16 colours
- 320 x 180 has 57,600 pixels
- 16 colours require four bits per pixel
- 256 Kb (32 kilobytes) buffer required
- Scale up 2x for 640x480 with letterbox
- Scale up 4x for 1280x720 fullscreen
iCEBreaker: 160x90 in 4 colours
- 160 x 90 has 14,400 pixels
- 4 colours require two bits per pixel
- 32 Kb (4 kilobytes) buffer required
- Scale up 4x for 640x480 with letterbox
- Scale up 8x for 1280x720 fullscreen
We use a lower resolution on iCEBreaker to continue storing the framebuffer in BRAM. iCE40UP SPRAM is large enough to handle 320x180 but I haven’t added support yet [issue #133]. If you’re already familiar with SPRAM, feel free to use the 320x180 designs with iCEBreaker: the new render modules (discussed later) make this straightforward.
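The buffer sizes above are easy to check. Here's a minimal C sketch of the arithmetic; it isn't part of the project, fb_bits and fits_in_kb are hypothetical helper names, and it assumes a kilobit is 1,024 bits:

```c
#include <stdint.h>

/* Total bits needed for a framebuffer of the given size and colour depth. */
uint32_t fb_bits(uint32_t width, uint32_t height, uint32_t bits_per_pixel)
{
    return width * height * bits_per_pixel;
}

/* Non-zero if the buffer fits in the given number of kilobits
   (assumption: 1 Kb = 1,024 bits). */
int fits_in_kb(uint32_t bits, uint32_t kilobits)
{
    return bits <= kilobits * 1024u;
}
```

320x180 at four bits per pixel needs 230,400 bits, which fits in a 256 Kb buffer; 160x90 at two bits per pixel needs 28,800 bits, which fits in 32 Kb.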
An Address for Every Pixel
A framebuffer is two-dimensional, with an X and Y coordinate for every pixel. Memory addresses are one-dimensional. We need to map a framebuffer coordinate to a memory address [bitmap_addr.sv]:
module bitmap_addr #(
parameter CORDW=16, // signed coordinate width (bits)
parameter ADDRW=24 // address width (bits)
) (
input wire logic clk, // clock
input wire logic signed [CORDW-1:0] bmpw, // bitmap width
input wire logic signed [CORDW-1:0] bmph, // bitmap height
input wire logic signed [CORDW-1:0] x, // horizontal pixel coordinate
input wire logic signed [CORDW-1:0] y, // vertical pixel coordinate
input wire logic signed [CORDW-1:0] offx, // horizontal offset
input wire logic signed [CORDW-1:0] offy, // vertical offset
output logic [ADDRW-1:0] addr, // pixel memory address
output logic clip // pixel coordinate outside bitmap
);
logic signed [CORDW-1:0] addr_y1, addr_x1, addr_x2;
logic [ADDRW-1:0] addr_mul;
logic clip_t1; // clip check temporary
always_ff @(posedge clk) begin
// step 1
addr_y1 <= y + offy;
addr_x1 <= x + offx;
// step 2
addr_mul <= bmpw * addr_y1;
addr_x2 <= addr_x1;
clip_t1 <= (addr_x1 < 0 || addr_x1 > bmpw-1 || addr_y1 < 0 || addr_y1 > bmph-1);
// step 3
clip <= clip_t1;
addr <= addr_mul + addr_x2;
end
endmodule
This module calculates the memory address for the position (x,y) with an optional offset (offx,offy). The offset lets you quickly move graphics without adjusting your drawing logic. The clip output signals when the requested coordinate is outside the bitmap.
The address calculation has a latency of three cycles; we will need to account for this when drawing.
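If you'd like to sanity-check addresses in software, the same calculation can be sketched in C. This is a rough model rather than anything from the project: bitmap_addr_model and pixel_addr are hypothetical names, it ignores the three-cycle pipeline, and (unlike the hardware) it returns address 0 when the coordinate is clipped.

```c
#include <stdbool.h>
#include <stdint.h>

/* Result of the address calculation: memory address plus clip flag. */
typedef struct {
    uint32_t addr;  /* pixel memory address (only meaningful when !clip) */
    bool clip;      /* true if the coordinate is outside the bitmap */
} pixel_addr;

/* Software model of bitmap_addr (no pipeline; assumption: addr is 0 when clipped). */
pixel_addr bitmap_addr_model(int bmpw, int bmph, int x, int y, int offx, int offy)
{
    pixel_addr r;
    int ax = x + offx;  /* step 1: apply the offsets */
    int ay = y + offy;
    /* step 2: the same clip test as the Verilog */
    r.clip = (ax < 0 || ax > bmpw - 1 || ay < 0 || ay > bmph - 1);
    /* steps 2-3: multiply by the bitmap width, then add the x position */
    r.addr = r.clip ? 0 : (uint32_t)(bmpw * ay + ax);
    return r;
}
```

For a 320x180 bitmap, the bottom-right pixel (319,179) maps to address 57,599, the last of the 57,600 pixels.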
Screen Coordinates: Up and Down
Our coordinate system has the origin (0,0) at the top-left of the screen, and the Y-coordinate increases down the screen. Many 3D systems, such as OpenGL, have the origin at the bottom-left, and the Y-coordinate increases up the screen.
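If you're bringing coordinates over from a system with a bottom-left origin, converting between the two conventions is a one-liner. A small C sketch, where flip_y is a hypothetical helper and height is the bitmap height in pixels:

```c
/* Convert a Y coordinate with a bottom-left origin (Y increases upwards)
   to our top-left origin (Y increases downwards), or vice versa. */
int flip_y(int y_up, int height)
{
    return (height - 1) - y_up;
}
```

The conversion is its own inverse: applying it twice returns the original coordinate.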
Addressing Performance
We could have performed the calculation in one step:
addr <= bmpw * (y + offy) + x + offx;
But performance is much worse than the multi-step design. The address calculation becomes the performance bottleneck for our drawing designs.
This is a valuable reminder of how hardware design differs from software design.
The one-step calculation generates a hardware design that performs two additions and a multiplication in a single clock cycle. Other things being equal, the more complex a single step, the lower the maximum frequency.
With software, the compiler generates a series of CPU instructions from your source code. Whether you write the calculation as one complex statement or multiple simpler statements has little to no impact on the code generated by the compiler.
From Point to Line
We can now draw a point, but we commonly want to draw a line between two points. Bresenham’s line algorithm is the definitive way to do this, and The Beauty of Bresenham’s Algorithm has just what we need: a clearly written version of the algorithm using integers.
Here’s the C design:
void plotLine(int x0, int y0, int x1, int y1)
{
int dx = abs(x1-x0), sx = x0<x1 ? 1 : -1;
int dy = -abs(y1-y0), sy = y0<y1 ? 1 : -1;
int err = dx+dy, e2; /* error value e_xy */
for(;;){ /* loop */
setPixel(x0,y0);
if (x0==x1 && y0==y1) break;
e2 = 2*err;
if (e2 >= dy) { err += dy; x0 += sx; } /* e_xy+e_x > 0 */
if (e2 <= dx) { err += dx; y0 += sy; } /* e_xy+e_y < 0 */
}
}
For the hows and whys, read A Rasterizing Algorithm for Drawing Curves (PDF). Kudos to Alois Zingl.
From C to Verilog
There are two stages: calculating the initial values and running the algorithm in the loop.
As initial values, we need the difference between the start and end coordinates and the sign and absolute value of that difference. We can use a little combinational logic to determine the direction without having to calculate absolute values explicitly:
logic signed [CORDW:0] dx, dy; // a bit wider as signed
logic right, down; // drawing direction
always_comb begin
right = (x0 < x1);
down = (y0 < y1);
dx = right ? x1 - x0 : x0 - x1; // dx = abs(x1 - x0)
dy = down ? y0 - y1 : y1 - y0; // dy = -abs(y1 - y0)
end
NB. The sign of dy is different from dx; check the C version of the algorithm to see what I mean.
Going Loopy
We can quickly bash out an always_ff block to cover the loop. But this isn’t software; a trap lurks to catch the unwary.
Rewriting the C directly in Verilog, we get the following dubious logic:
always_ff @(posedge clk) begin
// ...
if (e2 >= dy) begin
x <= (right) ? x + 1 : x - 1;
err <= err + dy;
end
if (e2 <= dx) begin
y <= (down) ? y + 1 : y - 1;
err <= err + dx;
end
end
At first glance, it looks OK, and your tools will almost certainly build it without complaint. Experienced Verilog engineers are probably rolling their eyes, but it’s worth discussing when and why this fails.
Consider what happens if (e2 >= dy) and (e2 <= dx) are both true. x and y are updated correctly, but err <= err + dy; is ignored. Huh?!
The <= assignment is non-blocking, and non-blocking assignments happen in parallel. The Verilog standard says that if a variable has multiple non-blocking assignments in the same block, the last assignment wins.
We can’t calculate the error with just a combinational block either: the new error value depends on the previous one (we need to maintain state). Instead, we use a combinational block, with blocking assignment, to calculate the change in error, then add it to the previous value in a clocked always_ff block:
logic signed [CORDW:0] err, derr;
logic movx, movy; // move in x and/or y required
always_comb begin
movx = (2*err >= dy);
movy = (2*err <= dx);
derr = movx ? dy : 0;
if (movy) derr = derr + dx;
end
always_ff @(posedge clk) begin
// ...
if (movx) x <= right ? x + 1 : x - 1;
if (movy) y <= down ? y + 1 : y - 1;
err <= err + derr;
end
The two blocking assignments to derr happen one after the other.
Note how we’ve also eliminated the need for e2, replacing it with 2*err in our comparisons.
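To see the difference in numbers, we can model both versions of the error update in C. This is only a sketch of the assignment semantics, not a simulation; err_next_dubious and err_next_fixed are hypothetical names:

```c
#include <stdbool.h>

/* Model of the dubious always_ff block: each non-blocking assignment reads
   the OLD value of err, and the last assignment to execute wins. */
int err_next_dubious(int err, int dx, int dy)
{
    int e2 = 2 * err;
    int next = err;                 /* err holds its value if neither test passes */
    if (e2 >= dy) next = err + dy;  /* err <= err + dy; */
    if (e2 <= dx) next = err + dx;  /* err <= err + dx; replaces the line above */
    return next;
}

/* Model of the fix: work out the change in error first, then apply it once. */
int err_next_fixed(int err, int dx, int dy)
{
    bool movx = (2 * err >= dy);
    bool movy = (2 * err <= dx);
    int derr = movx ? dy : 0;
    if (movy) derr = derr + dx;
    return err + derr;
}
```

Take the first step of the line (0,0) to (1,2): dx is 1, dy is -2, and err starts at -1, so both tests pass. The fixed version gives the new error as -2, matching the C algorithm, while the dubious version gives 0 because the dy term is lost.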
Our first attempt at a line drawing module:
module draw_line #(parameter CORDW=16) ( // framebuffer coord width in bits
input wire logic clk, // clock
input wire logic rst, // reset
input wire logic start, // start line drawing
input wire logic signed [CORDW-1:0] x0, // point 0 - horizontal position
input wire logic signed [CORDW-1:0] y0, // point 0 - vertical position
input wire logic signed [CORDW-1:0] x1, // point 1 - horizontal position
input wire logic signed [CORDW-1:0] y1, // point 1 - vertical position
output logic signed [CORDW-1:0] x, // horizontal drawing position
output logic signed [CORDW-1:0] y, // vertical drawing position
output logic drawing, // line is drawing
output logic done // line complete (high for one tick)
);
// line properties
logic signed [CORDW:0] dx, dy; // a bit wider as signed
logic right, down; // drawing direction
always_comb begin
right = (x0 < x1);
down = (y0 < y1);
dx = right ? x1 - x0 : x0 - x1; // dx = abs(x1 - x0)
dy = down ? y0 - y1 : y1 - y0; // dy = -abs(y1 - y0)
end
// error values
logic signed [CORDW:0] err, derr;
logic movx, movy; // move in x and/or y required
always_comb begin
movx = (2*err >= dy);
movy = (2*err <= dx);
derr = movx ? dy : 0;
if (movy) derr = derr + dx;
end
// draw state machine
enum {IDLE, DRAW} state; // we're either idle or drawing
always_comb drawing = (state == DRAW);
always_ff @(posedge clk) begin
case (state)
DRAW: begin
if (x == x1 && y == y1) begin
done <= 1;
state <= IDLE;
end else begin
if (movx) x <= right ? x + 1 : x - 1;
if (movy) y <= down ? y + 1 : y - 1;
err <= err + derr;
end
end
default: begin // IDLE
done <= 0;
if (start) begin
err <= dx + dy;
x <= x0;
y <= y0;
state <= DRAW;
end
end
endcase
if (rst) begin
done <= 0;
state <= IDLE;
end
end
endmodule
We’ve got a good start here, but our module has significant problems we should tackle.
Oh dear! I shall be too late!
Line drawing crops up all over the place; if it’s slow, it’ll be a significant performance bottleneck.
Our current line drawing design makes direct use of relatively complex combinational logic. For example, we use movy to control our vertical drawing. movy depends on dx, which depends on right. All these signals are purely combinational, with nothing stored in registers. Unsurprisingly, my tests showed this path was the limiting factor for line drawing speed.
Our first improvement is straightforward: we register dx and dy in an always_ff block. Even better, because dx and dy don’t change for a given line, we only have to do this once and don’t suffer a latency penalty:
always_comb begin
right = (x0 < x1);
down = (y0 < y1);
end
always_ff @(posedge clk) begin
dx <= right ? x1 - x0 : x0 - x1; // dx = abs(x1 - x0)
dy <= down ? y0 - y1 : y1 - y0; // dy = -abs(y1 - y0)
end
We can further improve timing by removing the combinational derr and using dx and dy directly in the main always_ff block:
DRAW: begin
if (x == xb && y == yb) begin
done <= 1;
state <= IDLE;
end else begin
if (movx) begin
x <= right ? x + 1 : x - 1;
err <= err + dy;
end
if (movy) begin
y <= y + 1; // always down
err <= err + dx;
end
if (movx && movy) begin
x <= right ? x + 1 : x - 1;
y <= y + 1; // always down
err <= err + dy + dx;
end
end
end
This Verilog seems overly verbose compared to the combinational derr, but the timing is much better on simpler FPGAs, such as the iCE40. For example, the cube design we will discuss shortly improves from 22 MHz to 28 MHz with these changes.
With experience, you’ll get a feel for when registering a signal makes sense. For example, back in 2020, I learnt that iCE40 subtraction takes two layers of logic, making registering the initial line values all the more valuable. Both Vivado (Arty) and nextpnr (iCEBreaker) provide timing reports to help you improve the performance of your designs.
Breaking Symmetry
Bresenham’s line algorithm is not symmetrical: drawing from (x0,y0) to (x1,y1) is not necessarily the same as drawing from (x1,y1) to (x0,y0).
For example, I drew the triangle (2,2) (6,2) (4,6) clockwise then anticlockwise:
Variations in rendering may not matter if you’re drawing a single shape, but what happens if we draw two shapes next to each other? We don’t want any gaps between the shapes. To ensure one unique rendering of the line (x0,y0) to (x1,y1), we need a consistent way to order the points. I have chosen to draw down the screen; that is, with the y-coordinate increasing. To achieve this, we look at the y-coordinates and swap the points if y0 is greater than y1.
That leaves horizontal lines, where the y-coordinate is the same for both points. It does not matter which direction we draw these: for a horizontal line, Bresenham’s algorithm produces the same pixels in both directions.
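You can see the asymmetry for yourself with a few lines of C. The sketch below reuses the C algorithm from earlier, recording pixels in an array instead of calling setPixel. The names trace_line, same_both_ways, trace_line_ordered, and MAX_PIXELS are my own for illustration; the buffer size is an arbitrary choice for short test lines.

```c
#include <stdlib.h>

#define MAX_PIXELS 64  /* assumption: enough for the short lines tested here */

typedef struct { int x, y; } point;

/* Bresenham's line algorithm, recording pixels in out[]. Returns the pixel count. */
int trace_line(int x0, int y0, int x1, int y1, point out[MAX_PIXELS])
{
    int dx = abs(x1 - x0), sx = x0 < x1 ? 1 : -1;
    int dy = -abs(y1 - y0), sy = y0 < y1 ? 1 : -1;
    int err = dx + dy, e2, n = 0;
    for (;;) {
        if (n == MAX_PIXELS) break;  /* guard: line too long for the buffer */
        out[n].x = x0; out[n].y = y0; n++;
        if (x0 == x1 && y0 == y1) break;
        e2 = 2 * err;
        if (e2 >= dy) { err += dy; x0 += sx; }
        if (e2 <= dx) { err += dx; y0 += sy; }
    }
    return n;
}

/* Non-zero if drawing from point 0 to point 1 visits the same pixels
   as drawing from point 1 to point 0. */
int same_both_ways(int x0, int y0, int x1, int y1)
{
    point fwd[MAX_PIXELS], rev[MAX_PIXELS];
    int nf = trace_line(x0, y0, x1, y1, fwd);
    int nr = trace_line(x1, y1, x0, y0, rev);
    if (nf != nr) return 0;
    for (int i = 0; i < nf; i++) {
        /* compare forward pixel i with the matching pixel of the reversed line */
        if (fwd[i].x != rev[nf - 1 - i].x || fwd[i].y != rev[nf - 1 - i].y) return 0;
    }
    return 1;
}

/* Apply the ordering rule: always draw with the y-coordinate increasing. */
int trace_line_ordered(int x0, int y0, int x1, int y1, point out[MAX_PIXELS])
{
    if (y0 > y1) return trace_line(x1, y1, x0, y0, out);
    return trace_line(x0, y0, x1, y1, out);
}
```

The steep line (0,0) to (1,2) passes through (1,1) when drawn downwards but (0,1) when drawn upwards, so same_both_ways returns 0. A horizontal line such as (0,0) to (5,0) gives the same pixels either way. With the ordering rule applied, both requests for the steep line produce (0,0), (1,1), (1,2).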
The swapping logic looks like this:
// line properties
logic swap; // swap points to ensure y1 >= y0
logic right; // drawing direction
logic signed [CORDW-1:0] xa, ya; // start point
logic signed [CORDW-1:0] xb, yb; // end point
logic signed [CORDW-1:0] x_end, y_end; // register end point
always_comb begin
swap = (y0 > y1); // swap points if y0 is below y1
xa = swap ? x1 : x0;
xb = swap ? x0 : x1;
ya = swap ? y1 : y0;
yb = swap ? y0 : y1;
end
If we use these new combinational signals directly, our timing will suffer. To avoid this, we can register the end coordinate and drawing direction:
always_ff @(posedge clk) begin
// ...
x_end <= xb;
y_end <= yb;
// ...
right <= (xa < xb); // draw left to right?
end
Ready to Draw
We’re now ready to use our improved line drawing module [draw_line.sv]:
module draw_line #(parameter CORDW=16) ( // signed coordinate width
input wire logic clk, // clock
input wire logic rst, // reset
input wire logic start, // start line drawing
input wire logic oe, // output enable
input wire logic signed [CORDW-1:0] x0, y0, // point 0
input wire logic signed [CORDW-1:0] x1, y1, // point 1
output logic signed [CORDW-1:0] x, y, // drawing position
output logic drawing, // actively drawing
output logic busy, // drawing request in progress
output logic done // drawing is complete (high for one tick)
);
// line properties
logic swap; // swap points to ensure y1 >= y0
logic right; // drawing direction
logic signed [CORDW-1:0] xa, ya; // start point
logic signed [CORDW-1:0] xb, yb; // end point
logic signed [CORDW-1:0] x_end, y_end; // register end point
always_comb begin
swap = (y0 > y1); // swap points if y0 is below y1
xa = swap ? x1 : x0;
xb = swap ? x0 : x1;
ya = swap ? y1 : y0;
yb = swap ? y0 : y1;
end
// error values
logic signed [CORDW:0] err; // a bit wider as signed
logic signed [CORDW:0] dx, dy;
logic movx, movy; // horizontal/vertical move required
always_comb begin
movx = (2*err >= dy);
movy = (2*err <= dx);
end
// draw state machine
enum {IDLE, INIT_0, INIT_1, DRAW} state;
always_comb drawing = (state == DRAW && oe);
always_ff @(posedge clk) begin
case (state)
DRAW: begin
if (oe) begin
if (x == x_end && y == y_end) begin
state <= IDLE;
busy <= 0;
done <= 1;
end else begin
if (movx) begin
x <= right ? x + 1 : x - 1;
err <= err + dy;
end
if (movy) begin
y <= y + 1; // always down
err <= err + dx;
end
if (movx && movy) begin
x <= right ? x + 1 : x - 1;
y <= y + 1;
err <= err + dy + dx;
end
end
end
end
INIT_0: begin
state <= INIT_1;
dx <= right ? xb - xa : xa - xb; // dx = abs(xb - xa)
dy <= ya - yb; // dy = -abs(yb - ya)
end
INIT_1: begin
state <= DRAW;
err <= dx + dy;
x <= xa;
y <= ya;
x_end <= xb;
y_end <= yb;
end
default: begin // IDLE
done <= 0;
if (start) begin
state <= INIT_0;
right <= (xa < xb); // draw left to right?
busy <= 1;
end
end
endcase
if (rst) begin
state <= IDLE;
busy <= 0;
done <= 0;
end
end
endmodule
The pixel to draw is output as (x,y), and the line coordinates are input as (x0,y0) and (x1,y1). A high start signal begins drawing, and drawing completion is marked by done (high for one tick). An output enable signal, oe, allows you to pause drawing, which is handy for multiplexing memory access or slowing down the action to make it visible.
There’s a test bench you can use to exercise the module with Vivado: [xc7/draw_line_tb.sv].
We test several lines: steep and not steep, drawn upwards, downwards, left to right, and right to left, points, and the longest possible horizontal, vertical, and diagonal lines. A steep line is one in which the vertical change is larger than the horizontal.
Render Module
In previous posts, we created a separate top module for every demo. Now that our designs are getting a little more complex, breaking each demo rendering into a module makes sense. That way, we can reuse the same top module for multiple demos.
To support a wide range of FPGAs, I’ve created two sets of rendering modules:
- 320x180 in 16 colours (256 Kb per buffer) - used for Arty and Verilator
- 160x90 in 4 colours (32 Kb per buffer) - used for iCEBreaker
Our first rendering module draws a single diagonal line:
The 16-colour 320x180 version is shown below:
module render_line #(
parameter CORDW=16, // signed coordinate width (bits)
parameter CIDXW=4, // colour index width (bits)
parameter SCALE=1 // drawing scale: 1=320x180, 2=640x360, 4=1280x720
) (
input wire logic clk, // clock
input wire logic rst, // reset
input wire logic oe, // output enable
input wire logic start, // start drawing
output logic signed [CORDW-1:0] x, // horizontal draw position
output logic signed [CORDW-1:0] y, // vertical draw position
output logic [CIDXW-1:0] cidx, // pixel colour
output logic drawing, // actively drawing
output logic done // drawing is complete (high for one tick)
);
logic signed [CORDW-1:0] vx0, vy0, vx1, vy1; // line coords
logic draw_start, draw_done; // drawing signals
// draw state machine
enum {IDLE, INIT, DRAW, DONE} state;
always_ff @(posedge clk) begin
case (state)
INIT: begin // register coordinates and colour
vx0 <= 70; vy0 <= 0;
vx1 <= 249; vy1 <= 179;
cidx <= 'h3; // colour index
draw_start <= 1;
state <= DRAW;
end
DRAW: begin
draw_start <= 0;
if (draw_done) state <= DONE;
end
DONE: state <= DONE;
default: if (start) state <= INIT; // IDLE
endcase
if (rst) state <= IDLE;
end
draw_line #(.CORDW(CORDW)) draw_line_inst (
.clk,
.rst,
.start(draw_start),
.oe,
.x0(vx0 * SCALE),
.y0(vy0 * SCALE),
.x1(vx1 * SCALE),
.y1(vy1 * SCALE),
.x,
.y,
.drawing,
.busy(),
.done(draw_done)
);
always_comb done = (state == DONE);
endmodule
The render module takes a SCALE parameter with which you can scale up the drawing. For example, if you have a 640x360 framebuffer, set SCALE=2 to fill the screen with the 320x180/render_line.sv design.
Demo Driver
It’s time to get drawing with actual hardware.
Plugging our rendering module into the top design from framebuffers, we get our line demo:
- iCEBreaker (iCE40): ice40/top_demo.sv
- Arty (XC7): xc7/top_demo.sv
- Nexys Video (XC7): xc7-dvi/top_demo.sv
- Verilator Sim: sim/top_demo.sv
Building the Designs
In the Lines and Triangles section of the git repo, you’ll find the design files, a makefile for iCEBreaker and Verilator, and a Vivado project for Xilinx-based boards. There are also build instructions for boards and simulations.
The Verilator version of top_demo looks like this:
module top_demo #(parameter CORDW=16) ( // signed coordinate width (bits)
input wire logic clk_pix, // pixel clock
input wire logic rst_pix, // sim reset
output logic signed [CORDW-1:0] sdl_sx, // horizontal SDL position
output logic signed [CORDW-1:0] sdl_sy, // vertical SDL position
output logic sdl_de, // data enable (low in blanking interval)
output logic sdl_frame, // high at start of frame
output logic [7:0] sdl_r, // 8-bit red
output logic [7:0] sdl_g, // 8-bit green
output logic [7:0] sdl_b // 8-bit blue
);
// system clock is the same as pixel clock in simulation
logic clk_sys, rst_sys;
always_comb begin
clk_sys = clk_pix;
rst_sys = rst_pix;
end
// display sync signals and coordinates
logic signed [CORDW-1:0] sx, sy;
logic de, frame, line;
display_480p #(.CORDW(CORDW)) display_inst (
.clk_pix,
.rst_pix,
.sx,
.sy,
.hsync(),
.vsync(),
.de,
.frame,
.line
);
// library resource path
localparam LIB_RES = "../../../lib/res";
// colour parameters
localparam CHANW = 4; // colour channel width (bits)
localparam COLRW = 3*CHANW; // colour width: three channels (bits)
localparam CIDXW = 4; // colour index width (bits)
localparam BG_COLR = 'h137; // background colour
localparam PAL_FILE = {LIB_RES,"/palettes/sweetie16_4b.mem"}; // palette file
// framebuffer (FB)
localparam FB_WIDTH = 320; // framebuffer width in pixels
localparam FB_HEIGHT = 180; // framebuffer height in pixels
localparam FB_SCALE = 2; // framebuffer display scale (1-63)
localparam FB_OFFX = 0; // horizontal offset
localparam FB_OFFY = 60; // vertical offset
localparam FB_PIXELS = FB_WIDTH * FB_HEIGHT; // total pixels in buffer
localparam FB_ADDRW = $clog2(FB_PIXELS); // address width
localparam FB_DATAW = CIDXW; // colour bits per pixel
// pixel read and write addresses and colours
logic [FB_ADDRW-1:0] fb_addr_write, fb_addr_read;
logic [FB_DATAW-1:0] fb_colr_write, fb_colr_read;
logic fb_we; // framebuffer write enable
// framebuffer memory
bram_sdp #(
.WIDTH(FB_DATAW),
.DEPTH(FB_PIXELS),
.INIT_F("")
) bram_inst (
.clk_write(clk_sys),
.clk_read(clk_sys),
.we(fb_we),
.addr_write(fb_addr_write),
.addr_read(fb_addr_read),
.data_in(fb_colr_write),
.data_out(fb_colr_read)
);
// display flags in system clock domain
logic frame_sys, line_sys, line0_sys;
xd xd_frame (.clk_src(clk_pix), .clk_dst(clk_sys),
.flag_src(frame), .flag_dst(frame_sys));
xd xd_line (.clk_src(clk_pix), .clk_dst(clk_sys),
.flag_src(line), .flag_dst(line_sys));
xd xd_line0 (.clk_src(clk_pix), .clk_dst(clk_sys),
.flag_src(line && sy==FB_OFFY), .flag_dst(line0_sys));
//
// draw in framebuffer
//
// reduce drawing speed to make process visible
localparam FRAME_WAIT = 200; // wait this many frames to start drawing
logic [$clog2(FRAME_WAIT)-1:0] cnt_frame_wait;
logic draw_oe; // draw requested
always_ff @(posedge clk_sys) begin
draw_oe <= 0; // comment out to draw at full speed
if (cnt_frame_wait != FRAME_WAIT-1) begin // wait for initial frames
if (frame_sys) cnt_frame_wait <= cnt_frame_wait + 1;
end else if (frame_sys) draw_oe <= 1; // draw one pixel per frame
end
// render line/edge/cube/triangles
parameter DRAW_SCALE = 1; // relative to framebuffer dimensions
logic drawing; // actively drawing
logic clip; // location is clipped
logic signed [CORDW-1:0] drx, dry; // draw coordinates
render_line #( // switch module name to change demo
.CORDW(CORDW),
.CIDXW(CIDXW),
.SCALE(DRAW_SCALE)
) render_instance (
.clk(clk_sys),
.rst(rst_sys),
.oe(draw_oe),
.start(frame_sys),
.x(drx),
.y(dry),
.cidx(fb_colr_write),
.drawing,
.done()
);
// calculate pixel address in framebuffer (three-cycle latency)
bitmap_addr #(
.CORDW(CORDW),
.ADDRW(FB_ADDRW)
) bitmap_addr_instance (
.clk(clk_sys),
.bmpw(FB_WIDTH),
.bmph(FB_HEIGHT),
.x(drx),
.y(dry),
.offx(0),
.offy(0),
.addr(fb_addr_write),
.clip
);
// delay write enable to match address calculation
localparam LAT_ADDR = 3; // latency (cycles)
logic [LAT_ADDR-1:0] fb_we_sr;
always_ff @(posedge clk_sys) begin
fb_we_sr <= {drawing, fb_we_sr[LAT_ADDR-1:1]};
if (rst_sys) fb_we_sr <= 0;
end
always_comb fb_we = fb_we_sr[0] && !clip; // check for clipping
//
// read framebuffer for display output via linebuffer
//
// count lines for scaling via linebuffer
logic [$clog2(FB_SCALE):0] cnt_lb_line;
always_ff @(posedge clk_sys) begin
if (line0_sys) cnt_lb_line <= 0;
else if (line_sys) begin
cnt_lb_line <= (cnt_lb_line == FB_SCALE-1) ? 0 : cnt_lb_line + 1;
end
end
// which screen lines need linebuffer?
logic lb_line;
always_ff @(posedge clk_sys) begin
if (line0_sys) lb_line <= 1; // enable from sy==0
if (frame_sys) lb_line <= 0; // disable at frame start
end
// enable linebuffer input
logic lb_en_in;
logic [$clog2(FB_WIDTH)-1:0] cnt_lbx; // horizontal pixel counter
always_comb lb_en_in = (lb_line && cnt_lb_line == 0 && cnt_lbx < FB_WIDTH);
// calculate framebuffer read address for linebuffer
always_ff @(posedge clk_sys) begin
if (line_sys) begin // reset horizontal counter at start of line
cnt_lbx <= 0;
end else if (lb_en_in) begin // increment address when LB enabled
fb_addr_read <= fb_addr_read + 1;
cnt_lbx <= cnt_lbx + 1;
end
if (frame_sys) fb_addr_read <= 0; // reset address at frame start
end
// enable linebuffer output
logic lb_en_out;
localparam LAT_LB = 3; // output latency compensation: lb_en_out+1, LB+1, CLUT+1
always_ff @(posedge clk_pix) begin
lb_en_out <= (sy >= FB_OFFY && sy < (FB_HEIGHT * FB_SCALE) + FB_OFFY
&& sx >= FB_OFFX - LAT_LB && sx < (FB_WIDTH * FB_SCALE) + FB_OFFX - LAT_LB);
end
// display linebuffer
logic [FB_DATAW-1:0] lb_colr_out;
linebuffer_simple #(
.DATAW(FB_DATAW),
.LEN(FB_WIDTH)
) linebuffer_instance (
.clk_sys,
.clk_pix,
.line,
.line_sys,
.en_in(lb_en_in),
.en_out(lb_en_out),
.scale(FB_SCALE),
.data_in(fb_colr_read),
.data_out(lb_colr_out)
);
// colour lookup table (CLUT)
logic [COLRW-1:0] fb_pix_colr;
clut_simple #(
.COLRW(COLRW),
.CIDXW(CIDXW),
.F_PAL(PAL_FILE)
) clut_instance (
.clk_write(clk_pix),
.clk_read(clk_pix),
.we(0),
.cidx_write(0),
.cidx_read(lb_colr_out),
.colr_in(0),
.colr_out(fb_pix_colr)
);
// paint screen
logic paint_area; // area of screen to paint
logic [CHANW-1:0] paint_r, paint_g, paint_b; // colour channels
always_comb begin
paint_area = (sy >= FB_OFFY && sy < (FB_HEIGHT * FB_SCALE) + FB_OFFY
&& sx >= FB_OFFX && sx < (FB_WIDTH * FB_SCALE) + FB_OFFX);
{paint_r, paint_g, paint_b} = paint_area ? fb_pix_colr : BG_COLR;
end
// display colour: paint colour but black in blanking interval
logic [CHANW-1:0] display_r, display_g, display_b;
always_comb {display_r, display_g, display_b} = (de) ? {paint_r, paint_g, paint_b} : 0;
// SDL output (8 bits per colour channel)
always_ff @(posedge clk_pix) begin
sdl_sx <= sx;
sdl_sy <= sy;
sdl_de <= de;
sdl_frame <= frame;
sdl_r <= {2{display_r}};
sdl_g <= {2{display_g}};
sdl_b <= {2{display_b}};
end
endmodule
The following section will examine the new module. Refer to Framebuffers if you’d like an explanation of the clocks, linebuffer, and the colour lookup table.
Draw Instance
The actual drawing is accomplished by the render_line instance a little after line 100.
This is where you can adjust the drawing scale and switch to other rendering modules (discussed shortly).
// render line/edge/cube/triangles
parameter DRAW_SCALE = 1; // relative to framebuffer dimensions
logic drawing; // actively drawing
logic signed [CORDW-1:0] drx, dry; // draw coordinates
render_line #( // switch module name to change demo
.CORDW(CORDW),
.CIDXW(CIDXW),
.SCALE(DRAW_SCALE)
) render_instance (
.clk(clk_sys),
.rst(rst_sys),
.oe(draw_oe),
.start(frame_sys),
.x(drx),
.y(dry),
.cidx(fb_colr_write),
.drawing,
.done()
);
Drawing Speed
Our line-drawing algorithm runs at over 125 MHz on the Arty. To see how the line is drawn, we can reduce the drawing speed to one pixel per frame (60 pixels per second):
// reduce drawing speed to make process visible
localparam FRAME_WAIT = 200; // wait this many frames to start drawing
logic [$clog2(FRAME_WAIT)-1:0] cnt_frame_wait;
logic draw_oe; // draw requested
always_ff @(posedge clk_sys) begin
draw_oe <= 0; // comment out to draw at full speed
if (cnt_frame_wait != FRAME_WAIT-1) begin // wait for initial frames
if (frame_sys) cnt_frame_wait <= cnt_frame_wait + 1;
end else if (frame_sys) draw_oe <= 1; // draw one pixel per frame
end
The design waits for 200 frames (3.3 seconds) to give the monitor time to show the image before starting to draw. On iCEBreaker, the demo waits for 300 frames (5 seconds) before starting to draw. You can adjust this delay or remove it altogether if you prefer.
To draw at full speed after the delay, comment out the draw_oe <= 0; line.
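How long does a line take to draw at a given speed? With this algorithm, each step advances one pixel along the longer axis, so we can estimate the pixel count without running it. A rough C sketch, where line_pixels and draw_seconds are hypothetical helpers and the few setup cycles are ignored:

```c
#include <stdlib.h>

/* Pixels drawn for a line: one per step along the longer axis,
   plus the starting pixel. */
int line_pixels(int x0, int y0, int x1, int y1)
{
    int dx = abs(x1 - x0);
    int dy = abs(y1 - y0);
    return (dx > dy ? dx : dy) + 1;
}

/* Approximate drawing time in seconds at a given pixel rate. */
double draw_seconds(int pixels, double pixels_per_second)
{
    return pixels / pixels_per_second;
}
```

The diagonal line from render_line, (70,0) to (249,179), is 180 pixels long. At one pixel per frame and 60 frames per second, it takes about three seconds to appear. At one pixel per clock with a 125 MHz clock, the same line would take roughly 1.44 microseconds.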
Latency Correction
When we discussed address calculation we noted the need for latency correction. We use a shift register to delay the write-enable signal to match the address calculation:
// delay write enable to match address calculation
localparam LAT_ADDR = 3; // latency (cycles)
logic [LAT_ADDR-1:0] fb_we_sr;
always_ff @(posedge clk_sys) begin
fb_we_sr <= {drawing, fb_we_sr[LAT_ADDR-1:1]};
if (rst_sys) fb_we_sr <= 0;
end
always_comb fb_we = fb_we_sr[0] && !clip; // check for clipping
To avoid drawing artefacts, we disable fb_we if the position is outside the screen area (clipped).
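If shift registers are new to you, it can help to step through one in software. The C sketch below models a single clock edge of the write-enable delay; we_sr_clock and fb_we_model are hypothetical names, and the register is held in the low three bits of a byte:

```c
#include <stdint.h>

#define LAT_ADDR 3  /* latency (cycles), matching the address calculation */

/* One clock edge of the write-enable shift register.
   Mirrors: fb_we_sr <= {drawing, fb_we_sr[LAT_ADDR-1:1]}; */
uint8_t we_sr_clock(uint8_t sr, int drawing)
{
    uint8_t next = (uint8_t)(sr >> 1);  /* older bits move towards bit 0 */
    if (drawing) next |= (uint8_t)(1u << (LAT_ADDR - 1));  /* new bit enters at the top */
    return next;
}

/* Write enable: the oldest bit, masked by the clip flag. */
int fb_we_model(uint8_t sr, int clip)
{
    return (sr & 1u) && !clip;
}
```

A single drawing pulse enters at bit 2, moves to bit 1 on the next clock, and reaches bit 0 on the third, which is when the matching address and clip flag emerge from bitmap_addr.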
That Ain’t No Cube
If we can draw one line, we can draw many!
Let’s draw a cube as you’ve probably doodled on paper; this requires nine lines.
The 16-colour 320x180 version is shown below:
module render_cube #(
parameter CORDW=16, // signed coordinate width (bits)
parameter CIDXW=4, // colour index width (bits)
parameter SCALE=1 // drawing scale: 1=320x180, 2=640x360, 4=1280x720
) (
input wire logic clk, // clock
input wire logic rst, // reset
input wire logic oe, // output enable
input wire logic start, // start drawing
output logic signed [CORDW-1:0] x, // horizontal draw position
output logic signed [CORDW-1:0] y, // vertical draw position
output logic [CIDXW-1:0] cidx, // pixel colour
output logic drawing, // actively drawing
output logic done // drawing is complete (high for one tick)
);
localparam LINE_CNT=9; // number of lines to draw
logic [$clog2(LINE_CNT):0] line_id; // line identifier
logic signed [CORDW-1:0] vx0, vy0, vx1, vy1; // line coords
logic draw_start, draw_done; // drawing signals
// draw state machine
enum {IDLE, INIT, DRAW, DONE} state;
always_ff @(posedge clk) begin
case (state)
INIT: begin // register coordinates and colour
draw_start <= 1;
state <= DRAW;
cidx <= 'h2; // colour index
case (line_id)
'd0: begin
vx0 <= 130; vy0 <= 60; vx1 <= 230; vy1 <= 60;
end
'd1: begin
vx0 <= 230; vy0 <= 60; vx1 <= 230; vy1 <= 160;
end
'd2: begin
vx0 <= 230; vy0 <= 160; vx1 <= 130; vy1 <= 160;
end
'd3: begin
vx0 <= 130; vy0 <= 160; vx1 <= 130; vy1 <= 60;
end
'd4: begin
vx0 <= 130; vy0 <= 160; vx1 <= 90; vy1 <= 120;
end
'd5: begin
vx0 <= 90; vy0 <= 120; vx1 <= 90; vy1 <= 20;
end
'd6: begin
vx0 <= 90; vy0 <= 20; vx1 <= 130; vy1 <= 60;
end
'd7: begin
vx0 <= 90; vy0 <= 20; vx1 <= 190; vy1 <= 20;
end
default: begin // line_id=8
vx0 <= 190; vy0 <= 20; vx1 <= 230; vy1 <= 60;
end
endcase
end
DRAW: begin
draw_start <= 0;
if (draw_done) begin
if (line_id == LINE_CNT-1) begin
state <= DONE;
end else begin
line_id <= line_id + 1;
state <= INIT;
end
end
end
DONE: state <= DONE;
default: if (start) state <= INIT; // IDLE
endcase
if (rst) state <= IDLE;
end
draw_line #(.CORDW(CORDW)) draw_line_inst (
.clk,
.rst,
.start(draw_start),
.oe,
.x0(vx0 * SCALE),
.y0(vy0 * SCALE),
.x1(vx1 * SCALE),
.y1(vy1 * SCALE),
.x,
.y,
.drawing,
.busy(),
.done(draw_done)
);
always_comb done = (state == DONE);
endmodule
Cube Demo
To see the cube demo, replace render_* with render_cube in your top module and rebuild.
It looks like a cube, but it’s an ersatz cube. Our cube has no real depth; it cannot move in 3D space, nor can we apply realistic lighting. We’ll cover real 3D models in a future post, but for now, let’s turn our attention to the most critical shape in all of computer graphics: the triangle.
The Triangle
As you gaze upon the beautiful 4K vista from a AAA game in the 2020s, know this: it’s all triangles!
A triangle consists of three lines, so we could issue three requests to draw_line, but it’s so valuable that it deserves its own module [draw_triangle.sv]:
module draw_triangle #(parameter CORDW=16) ( // signed coordinate width
input wire logic clk, // clock
input wire logic rst, // reset
input wire logic start, // start triangle drawing
input wire logic oe, // output enable
input wire logic signed [CORDW-1:0] x0, y0, // vertex 0
input wire logic signed [CORDW-1:0] x1, y1, // vertex 1
input wire logic signed [CORDW-1:0] x2, y2, // vertex 2
output logic signed [CORDW-1:0] x, y, // drawing position
output logic drawing, // actively drawing
output logic busy, // drawing request in progress
output logic done // drawing is complete (high for one tick)
);
logic [1:0] line_id; // current line (0, 1, or 2)
logic line_start; // start drawing line
logic line_done; // finished drawing current line?
// current line coordinates
logic signed [CORDW-1:0] lx0, ly0; // point 0 position
logic signed [CORDW-1:0] lx1, ly1; // point 1 position
// draw state machine
enum {IDLE, INIT, DRAW} state;
always_ff @(posedge clk) begin
case (state)
INIT: begin // register coordinates
state <= DRAW;
line_start <= 1;
if (line_id == 2'd0) begin // (x0,y0) (x1,y1)
lx0 <= x0; ly0 <= y0;
lx1 <= x1; ly1 <= y1;
end else if (line_id == 2'd1) begin // (x1,y1) (x2,y2)
lx0 <= x1; ly0 <= y1;
lx1 <= x2; ly1 <= y2;
end else begin // (x2,y2) (x0,y0)
lx0 <= x2; ly0 <= y2;
lx1 <= x0; ly1 <= y0;
end
end
DRAW: begin
line_start <= 0;
if (line_done) begin
if (line_id == 2) begin // final line
state <= IDLE;
busy <= 0;
done <= 1;
end else begin
state <= INIT;
line_id <= line_id + 1;
end
end
end
default: begin // IDLE
done <= 0;
if (start) begin
state <= INIT;
line_id <= 0;
busy <= 1;
end
end
endcase
if (rst) begin
state <= IDLE;
line_id <= 0;
line_start <= 0;
busy <= 0;
done <= 0;
end
end
draw_line #(.CORDW(CORDW)) draw_line_inst (
.clk,
.rst,
.start(line_start),
.oe,
.x0(lx0),
.y0(ly0),
.x1(lx1),
.y1(ly1),
.x,
.y,
.drawing,
.busy(),
.done(line_done)
);
endmodule
There’s a test bench you can use to exercise the module with Vivado: [xc7/draw_triangle_tb.sv].
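The vertex selection in the INIT state follows a simple pattern: line n runs from vertex n to the next vertex, wrapping round to vertex 0 for the final line. In C, that pattern might be sketched like this (vertex, edge, and triangle_edge are hypothetical names, not part of the project):

```c
typedef struct { int x, y; } vertex;
typedef struct { vertex a, b; } edge;

/* Edge line_id (0, 1, or 2) of a triangle runs from vertex line_id
   to the next vertex, wrapping back to vertex 0 for the final edge. */
edge triangle_edge(const vertex v[3], int line_id)
{
    edge e;
    e.a = v[line_id];
    e.b = v[(line_id + 1) % 3];
    return e;
}
```

For the triangle (2,2) (6,2) (4,6), line 2 runs from (4,6) back to (2,2). The Verilog spells out the three cases with if statements rather than using a modulo.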
The Magic Number
Let’s create a render module to draw three triangles:
Replace render_* with render_triangles in your top module and rebuild.
We can draw millions of pixels per second, but drawing one per frame is fun to watch:
Explore
I hope you enjoyed this instalment of Exploring FPGA Graphics, but nothing beats creating your own designs. Here are a few suggestions to get you started:
- Experiment with different lines, triangles, and colours
- What’s the most impressive thing you can draw with a handful of straight lines?
- We drew a cube, but how about the other Platonic solids?
- Draw a landscape with one-point perspective (YouTube example)
What’s Next?
In 2D Shapes, we’ll draw rectangles and circles as well as triangles, then learn to colour them in. You can also check out my FPGA & RISC-V Tutorials.
Get in touch on Mastodon, Bluesky, or X. If you enjoy my work, please sponsor me. 🙏