Welcome back to Exploring FPGA Graphics. In the final part of our introductory graphics series, we’re looking at animation. We’ve already seen animation with hardware sprites, but double buffering gives us maximum creative freedom with fast, tear-free motion. We’ll be making extensive use of our designs from 2D Shapes, so have a look back at that post if you need a refresher on drawing shapes. This post was last updated in February 2022.
In this series, we explore graphics at the hardware level and get a feel for the power of FPGAs. We’ll learn how displays work, race the beam with Pong, animate starfields and sprites, paint Michelangelo’s David, simulate life with bitmaps, draw lines and shapes, and create smooth animation with double buffering. New to the series? Start with Intro to FPGA Graphics.
Get in touch: GitHub Issues, 1BitSquared Discord, @WillFlux (Mastodon), @WillFlux (Twitter)
Series Outline
- Beginning FPGA Graphics - video signals and basic graphics
- Racing the Beam - simple demos with minimal logic
- FPGA Pong - recreate the classic arcade on an FPGA
- Display Signals - revisit display signals and meet colour palettes
- Hardware Sprites - fast, colourful graphics for games
- Ad Astra - demo with starfields and hardware sprites
- Framebuffers - bitmap graphics featuring Michelangelo’s David
- Life on Screen - Conway’s Game of Life in logic
- Lines and Triangles - drawing lines and triangles
- 2D Shapes - filled shapes and simple pictures
- Animated Shapes (this post) - animation and double-buffering
Sponsor My Work
If you like what I do, consider sponsoring me on GitHub.
I love FPGAs and want to help more people discover and use them in their projects.
My hardware designs are open source, and my blog is advert free.
Requirements
For this series, you need an FPGA board with video output. We’ll be working at 640x480, so pretty much any video output will do. It helps to be comfortable with programming your FPGA board and reasonably familiar with Verilog.
We’ll be demoing with these boards:
- iCEBreaker (Lattice iCE40) with 12-Bit DVI Pmod
- Digilent Arty A7-35T (Xilinx Artix-7) with Pmod VGA
iCEBreaker Support
I am working on a double-buffered framebuffer using iCE40 SPRAM. iCEBreaker designs will be available later in 2022. Follow @WillFlux for updates.
Source
The SystemVerilog designs featured in this series are available from the projf-explore git repo under the open-source MIT licence: build on them to your heart’s content. The rest of the blog content is subject to standard copyright restrictions: don’t republish it without permission.
Blazing a Trail
Back in the very first part of this series, we animated bouncing squares; now we’re going to do it with a framebuffer. We take a filled square and bounce it around the screen, changing its colour every frame. This design uses a single framebuffer, hence the ‘sb’ in its name.
- Arty (XC7): [xc7/top_sb_bounce.sv]
- iCEBreaker (iCE40): not yet available
Building the Designs
In the Animated Shapes section of the git repo, you’ll find the design files, a makefile for iCEBreaker, a Vivado project for Arty, and instructions for building the designs for both boards.
The drawing part of the design looks like this in the Arty version:
// square coordinates
localparam Q1_SIZE = 80;
logic [CORDW-1:0] q1x, q1y; // position (top left)
logic q1dx, q1dy; // direction: 0 is right/down
logic [CORDW-1:0] q1s = 2; // speed in pixels/frame
always_ff @(posedge clk_100m) begin
    if (frame_sys) begin
        if (q1x >= FB_WIDTH - (Q1_SIZE + q1s)) begin // right edge
            q1dx <= 1;
            q1x <= q1x - q1s;
        end else if (q1x < q1s) begin // left edge
            q1dx <= 0;
            q1x <= q1x + q1s;
        end else q1x <= (q1dx) ? q1x - q1s : q1x + q1s;
        if (q1y >= FB_HEIGHT - (Q1_SIZE + q1s)) begin // bottom edge
            q1dy <= 1;
            q1y <= q1y - q1s;
        end else if (q1y < q1s) begin // top edge
            q1dy <= 0;
            q1y <= q1y + q1s;
        end else q1y <= (q1dy) ? q1y - q1s : q1y + q1s;
    end
end
// draw square in framebuffer
logic [CORDW-1:0] rx0, ry0, rx1, ry1; // shape coords
logic draw_start, drawing, draw_done; // drawing signals
// draw state machine
enum {IDLE, INIT, DRAW, DONE} state;
always_ff @(posedge clk_100m) begin
    case (state)
        INIT: begin // register coordinates and colour
            draw_start <= 1;
            state <= DRAW;
            rx0 <= q1x;
            ry0 <= q1y;
            rx1 <= q1x + Q1_SIZE;
            ry1 <= q1y + Q1_SIZE;
            fb_cidx <= fb_cidx + 1;
        end
        DRAW: begin
            draw_start <= 0;
            if (draw_done) state <= DONE;
        end
        DONE: state <= IDLE;
        default: if (frame_sys) state <= INIT; // IDLE
    endcase
end
draw_rectangle_fill #(.CORDW(CORDW)) draw_rectangle_inst (
    .clk(clk_100m),
    .rst(1'b0),
    .start(draw_start),
    .oe(!fb_busy), // draw when framebuffer isn't busy
    .x0(rx0),
    .y0(ry0),
    .x1(rx1),
    .y1(ry1),
    .x(fbx),
    .y(fby),
    .drawing,
    .busy(),
    .done(draw_done)
);
Our framebuffer remembers all the squares we’ve drawn, so the screen gradually fills with striped colour. While this is a fun effect, it’s not usually what you want.
Clean Movement
There are three approaches we can take to move an object around the screen cleanly:
- Use hardware sprites - suitable for simple 2D graphics
- Use a blitter to cut out and move a framebuffer region - effective for small 2D objects
- Clear the framebuffer and draw from scratch - versatile but requires plenty of bandwidth
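To put a rough number on that bandwidth, assume the drawing logic writes one pixel per cycle of the 100 MHz system clock used in the Arty designs, a 320x180 framebuffer (the size used by the double-buffered designs in this post), and a 60 Hz display. This is a back-of-the-envelope estimate, not a measurement:

$$
t_{\text{clear}} = \frac{320 \times 180\ \text{pixels}}{100 \times 10^{6}\ \text{pixels/s}} = 576\ \mu\text{s},
\qquad
\frac{t_{\text{clear}}}{t_{\text{frame}}} = \frac{576\ \mu\text{s}}{16.7\ \text{ms}} \approx 3.5\%
$$

Clearing is cheap at this resolution, but the cost grows with pixel count: a 640x480 buffer cleared the same way would take about 3.1 ms, roughly 18% of every frame, before any drawing starts.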
For this post, we’ll go with option 3, but more than that, we’ll also introduce double buffering.
Double Buffering
We can’t draw in a framebuffer while the display controller reads it; otherwise, we’ll get tearing. We could limit ourselves to drawing in the vertical blanking interval, but even at 640x480, with its generous blanking, we’d only be able to draw for less than 10% of the time. That’s not enough time to do anything interesting, especially as we need to clear the previous frame’s shapes from the framebuffer every frame.
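The 10% figure follows from standard 640x480 timing, which has 525 lines per frame, of which 480 are visible. Assuming a 60 Hz refresh:

$$
\frac{525 - 480}{525} = \frac{45}{525} \approx 8.6\%,
\qquad
8.6\% \times 16.7\ \text{ms} \approx 1.4\ \text{ms of drawing time per frame}
$$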
To draw all the time, we can double up on framebuffers: we draw in one while the display controller reads from the other, then swap their roles at the start of each frame. That way, we can draw continuously and still avoid screen tearing. The only downsides are that we need twice the memory and that new output takes an extra frame to become visible.
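The core mechanism is small. Here’s a minimal sketch, assuming both buffers share one memory with the top address bit selecting the buffer, and a one-cycle frame pulse in the same clock domain. The module and port names (db_select, addr_draw, addr_disp, mem_addr_read, mem_addr_write) are hypothetical, not taken from the repo:

```systemverilog
// Hypothetical sketch (not from the projf-explore repo): double-buffer
// address selection. Both buffers live in one memory of 2*2**ADDRW words;
// the extra top address bit chooses which buffer is being accessed.
module db_select #(
    parameter ADDRW=16  // address width of ONE buffer (assumed)
    ) (
    input  wire logic clk,                       // single clock domain (assumed)
    input  wire logic rst,                       // synchronous reset
    input  wire logic frame,                     // one-cycle pulse at frame start
    input  wire logic [ADDRW-1:0] addr_draw,     // per-buffer address being drawn
    input  wire logic [ADDRW-1:0] addr_disp,     // per-buffer address being displayed
    output      logic [ADDRW:0] mem_addr_write,  // full memory write address
    output      logic [ADDRW:0] mem_addr_read    // full memory read address
    );
    logic front_buf;  // which buffer the display is reading
    always_ff @(posedge clk) begin
        if (frame) front_buf <= ~front_buf;  // swap roles every frame
        if (rst) front_buf <= 0;
    end
    // display reads the front buffer; drawing writes the back buffer
    always_comb begin
        mem_addr_read  = {front_buf, addr_disp};
        mem_addr_write = {~front_buf, addr_draw};
    end
endmodule
```

Because the swap only flips one register bit, the logic cost is tiny; the real price is the second buffer’s worth of memory.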
Add the appropriate double buffering module for your board:
- Arty (XC7): [framebuffer_bram_db.sv]
- iCEBreaker (iCE40): not yet available
The BRAM version for the Arty is shown below:
module framebuffer_bram_db #(
    parameter CORDW=16, // signed coordinate width (bits)
    parameter WIDTH=320, // width of framebuffer in pixels
    parameter HEIGHT=180, // height of framebuffer in pixels
    parameter CIDXW=4, // colour index data width: 4=16, 8=256 colours
    parameter CHANW=4, // width of RGB colour channels (4 or 8 bit)
    parameter SCALE=4, // display output scaling factor (>=1)
    parameter F_IMAGE="", // image file to load into framebuffer
    parameter F_PALETTE="" // palette file to load into CLUT
) (
    input wire logic clk_sys, // system clock
    input wire logic clk_pix, // pixel clock
    input wire logic rst_sys, // reset (clk_sys)
    input wire logic rst_pix, // reset (clk_pix)
    input wire logic de, // data enable for display (clk_pix)
    input wire logic frame, // start a new frame (clk_pix)
    input wire logic line, // start a new screen line (clk_pix)
    input wire logic we, // write enable
    input wire logic signed [CORDW-1:0] x, // horizontal pixel coordinate
    input wire logic signed [CORDW-1:0] y, // vertical pixel coordinate
    input wire logic [CIDXW-1:0] cidx, // framebuffer colour index
    input wire logic [CIDXW-1:0] bgidx, // framebuffer background colour index
    input wire logic clear, // clear write buffer on frame start
    output logic busy, // busy with reading for display output
    output logic wready, // ready to accept writes (after clear)
    output logic clip, // pixel coordinate outside buffer
    output logic [CHANW-1:0] red, // colour output to display (clk_pix)
    output logic [CHANW-1:0] green, // " " " " "
    output logic [CHANW-1:0] blue // " " " " "
);
    logic frame_sys; // start of new frame in system clock domain
    xd xd_frame (.clk_i(clk_pix), .clk_o(clk_sys),
        .rst_i(rst_pix), .rst_o(rst_sys), .i(frame), .o(frame_sys));
    // buffer selection
    logic front_buf;
    always_ff @(posedge clk_sys) begin
        if (frame_sys) front_buf <= ~front_buf; // swap every frame
        if (rst_sys) front_buf <= 0;
    end
    // framebuffer (FB)
    localparam FB_PIXELS = WIDTH * HEIGHT;
    localparam FB_BUFSIZE = 2 ** $clog2(FB_PIXELS+1); // align buffers to power-of-two
    localparam FB_DEPTH = 2 * FB_BUFSIZE;
    localparam FB_ADDRW = $clog2(FB_BUFSIZE);
    localparam FB_DATAW = CIDXW;
    localparam FB_DUALPORT = 1; // separate read and write ports?
    logic [FB_ADDRW-1:0] fb_addr_read, fb_addr_write;
    logic [FB_DATAW-1:0] fb_cidx_read, fb_cidx_read_p1;
    // write address components
    logic signed [CORDW-1:0] x_add; // pixel position on line
    logic signed [FB_ADDRW-1:0] fb_addr_line; // address of line for drawing
    logic [FB_ADDRW-1:0] fb_addr_clr, fb_addr_clr_p1; // address for clearing screen
    // write state machine
    enum {IDLE, INIT, CLR, WAIT, ACTIVE} wstate;
    always_ff @(posedge clk_sys) begin
        case (wstate)
            INIT: begin
                wstate <= (clear) ? CLR : ACTIVE;
                fb_addr_clr_p1 <= 0;
            end
            CLR: begin // clear whole buffer
                if (fb_addr_clr_p1 == FB_BUFSIZE-1) wstate <= WAIT;
                fb_addr_clr_p1 <= fb_addr_clr_p1 + 1;
            end
            WAIT: wstate <= ACTIVE; // one cycle of latency before becoming active
            default: if (frame_sys) wstate <= INIT; // IDLE or ACTIVE
        endcase
        if (rst_sys) wstate <= IDLE;
    end
    // ready for drawing to begin
    always_comb wready = (wstate == ACTIVE);
    // calculate write address from pixel coordinates (two stages: multiply, then add and clear mux)
    always_ff @(posedge clk_sys) begin
        // first stage
        fb_addr_clr <= fb_addr_clr_p1;
        fb_addr_line <= WIDTH * y; // write address 1st stage (y could be negative)
        x_add <= x; // save x for write address 2nd stage
        // second stage
        fb_addr_write <= (wready) ? fb_addr_line + x_add : fb_addr_clr;
    end
    // draw colour and write enable (delay to match address calculation)
    logic fb_we, we_in_p1;
    logic [FB_DATAW-1:0] fb_cidx_write, cidx_in_p1;
    always_ff @(posedge clk_sys) begin
        // first stage
        we_in_p1 <= ((we && wready) || wstate == CLR); // draw or clear enables write
        cidx_in_p1 <= (wstate == CLR) ? bgidx : cidx; // which draw colour?
        clip <= (wstate == ACTIVE) && (y < 0 || y >= HEIGHT || x < 0 || x >= WIDTH); // clipped?
        // second stage
        fb_we <= (busy || clip) ? 0 : we_in_p1; // write if neither busy nor clipped
        fb_cidx_write <= cidx_in_p1;
    end
    // framebuffer memory (BRAM)
    bram_sdp #(
        .WIDTH(FB_DATAW),
        .DEPTH(FB_DEPTH),
        .INIT_F(F_IMAGE)
    ) bram_inst (
        .clk_write(clk_sys),
        .clk_read(clk_sys),
        .we(fb_we),
        .addr_write({~front_buf,fb_addr_write}),
        .addr_read({front_buf,fb_addr_read}),
        .data_in(fb_cidx_write),
        .data_out(fb_cidx_read)
    );
    // linebuffer (LB)
    localparam LB_SCALE = SCALE; // scale (horizontal and vertical)
    localparam LB_LEN = WIDTH; // line length matches framebuffer
    localparam LB_BPC = CHANW; // bits per colour channel
    logic lb_data_req; // LB requesting data
    logic [$clog2(LB_LEN+1)-1:0] cnt_h; // count pixels in line to read
    // LB enable (not corrected for latency)
    logic lb_en_in, lb_en_out;
    always_comb lb_en_in = cnt_h < LB_LEN;
    always_comb lb_en_out = de;
    // LB enable in: BRAM, address calc, and CLUT reg add three cycles of latency
    localparam LAT = 3; // write latency
    logic [LAT-1:0] lb_en_in_sr;
    always_ff @(posedge clk_sys) begin
        lb_en_in_sr <= {lb_en_in, lb_en_in_sr[LAT-1:1]};
        if (rst_sys) lb_en_in_sr <= 0;
    end
    // Load data from FB into LB
    always_ff @(posedge clk_sys) begin
        if (fb_addr_read < FB_PIXELS-1) begin
            if (lb_data_req) begin
                cnt_h <= 0; // start new line
                if (!FB_DUALPORT) busy <= 1; // set busy flag if not dual port
            end else if (cnt_h < LB_LEN) begin // advance to start of next line
                cnt_h <= cnt_h + 1;
                fb_addr_read <= fb_addr_read + 1;
            end
        end else cnt_h <= LB_LEN;
        if (frame_sys) begin
            fb_addr_read <= 0; // new frame
            busy <= 0; // LB reads don't cross frame boundary
        end
        if (rst_sys) begin
            fb_addr_read <= 0;
            busy <= 0;
            cnt_h <= LB_LEN; // don't start reading after reset
        end
        if (lb_en_in_sr == 3'b001) busy <= 0; // LB read done: last enabled pixel has reached the oldest stage (latency `LAT`)
    end
    // LB colour channels
    logic [LB_BPC-1:0] lb_in_0, lb_in_1, lb_in_2;
    logic [LB_BPC-1:0] lb_out_0, lb_out_1, lb_out_2;
    linebuffer #(
        .WIDTH(LB_BPC), // data width of each channel
        .LEN(LB_LEN), // length of line
        .SCALE(LB_SCALE) // scaling factor (>=1)
    ) lb_inst (
        .clk_in(clk_sys), // input clock
        .clk_out(clk_pix), // output clock
        .rst_in(rst_sys), // reset (clk_in)
        .rst_out(rst_pix), // reset (clk_out)
        .data_req(lb_data_req), // request input data (clk_in)
        .en_in(lb_en_in_sr[0]), // enable input (clk_in)
        .en_out(lb_en_out), // enable output (clk_out)
        .frame, // start a new frame (clk_out)
        .line, // start a new line (clk_out)
        .din_0(lb_in_0), // data in (clk_in)
        .din_1(lb_in_1),
        .din_2(lb_in_2),
        .dout_0(lb_out_0), // data out (clk_out)
        .dout_1(lb_out_1),
        .dout_2(lb_out_2)
    );
    // improve timing with register between BRAM and async ROM
    always_ff @(posedge clk_sys) fb_cidx_read_p1 <= fb_cidx_read;
    // colour lookup table (ROM)
    localparam CLUTW = 3 * CHANW;
    logic [CLUTW-1:0] clut_colr;
    rom_async #(
        .WIDTH(CLUTW),
        .DEPTH(2**CIDXW),
        .INIT_F(F_PALETTE)
    ) clut (
        .addr(fb_cidx_read_p1),
        .data(clut_colr)
    );
    // map colour index to palette using CLUT and read into LB
    always_ff @(posedge clk_sys) {lb_in_2, lb_in_1, lb_in_0} <= clut_colr;
    logic lb_en_out_p1; // LB enable out: reading from LB BRAM takes one cycle
    always_ff @(posedge clk_pix) lb_en_out_p1 <= lb_en_out;
    // colour output - combinational because top module should register
    always_comb begin
        red = lb_en_out_p1 ? lb_out_2 : 0;
        green = lb_en_out_p1 ? lb_out_1 : 0;
        blue = lb_en_out_p1 ? lb_out_0 : 0;
    end
endmodule
This double-buffered BRAM design is around 30 lines longer than the original framebuffer module.
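For the default parameters (WIDTH=320, HEIGHT=180, CIDXW=4), which the teleport demo also uses, the framebuffer localparams work out as follows:

$$
\begin{aligned}
\texttt{FB\_PIXELS} &= 320 \times 180 = 57\,600 \\
\texttt{FB\_BUFSIZE} &= 2^{\lceil \log_2 57\,601 \rceil} = 2^{16} = 65\,536 \\
\texttt{FB\_ADDRW} &= \log_2 65\,536 = 16 \\
\texttt{FB\_DEPTH} &= 2 \times 65\,536 = 131\,072 \\
\text{memory} &= 131\,072 \times 4\ \text{bits} = 524\,288\ \text{bits} = 512\ \text{Kib}
\end{aligned}
$$

By raw bit count, that’s a little over a quarter of the 1,800 Kb of block RAM in the Arty’s XC7A35T (actual BRAM tile usage depends on how the tools pack a 4-bit-wide memory). Rounding each buffer up from 57,600 to 65,536 locations leaves about 12% of the memory unused.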
There’s a test bench you can use to exercise the module with Vivado: [xc7/framebuffer_db_tb.sv].
Details of how double buffering works will be added soon.
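In the meantime, here’s one way to read the module, reconstructed from the code above rather than from the author’s own explanation. Because FB_BUFSIZE is a power of two, prepending front_buf or ~front_buf to the per-buffer address ({front_buf,fb_addr_read} and {~front_buf,fb_addr_write}) is equivalent to adding a buffer offset, with no adder required:

$$
\begin{aligned}
\text{addr}_{\text{read}} &= f \cdot 2^{\texttt{FB\_ADDRW}} + \texttt{fb\_addr\_read} \\
\text{addr}_{\text{write}} &= (1 - f) \cdot 2^{\texttt{FB\_ADDRW}} + \texttt{WIDTH} \cdot y + x
\end{aligned}
$$

where f is front_buf, which toggles on every frame_sys pulse. For example, with front_buf = 0 and the default 320x180 buffer, pixel (10, 2) is written to address 65,536 + 320 × 2 + 10 = 66,186, while the display reads addresses 0 to 57,599. When clear is set, the write state machine first fills the whole back buffer with bgidx and only then asserts wready, so drawing always starts from a blank buffer.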
Hip to be Square Redux
We can cleanly animate a square using our new double-buffered framebuffer:
- Arty (XC7): [xc7/top_db_bounce.sv]
- iCEBreaker (iCE40): not yet available
That seems like a lot of work to replicate what we did in a few lines back at the start of the series, but drawing shapes in a framebuffer is far more versatile.
Demos
To finish, try these double-buffered demos:
Shattered Cube
Back when we learnt about filled triangles, we built a cube; now we can tear it apart:
- Arty (XC7): [xc7/top_cube_pieces.sv]
- iCEBreaker (iCE40): not yet available
Teleport
Using our double buffer and a few animated rectangles, we can create a teleport effect:
- Arty (XC7): [xc7/top_teleport.sv]
- iCEBreaker (iCE40): not yet available
The Arty version of the teleport looks like this:
module top_teleport (
    input wire logic clk_100m, // 100 MHz clock
    input wire logic btn_rst, // reset button (active low)
    output logic vga_hsync, // horizontal sync
    output logic vga_vsync, // vertical sync
    output logic [3:0] vga_r, // 4-bit VGA red
    output logic [3:0] vga_g, // 4-bit VGA green
    output logic [3:0] vga_b // 4-bit VGA blue
);
    // generate pixel clock
    logic clk_pix;
    logic clk_locked;
    clock_gen_480p clock_pix_inst (
        .clk(clk_100m),
        .rst(!btn_rst), // reset button is active low
        .clk_pix,
        .clk_locked
    );
    // display sync signals and coordinates
    localparam CORDW = 16;
    logic signed [CORDW-1:0] sx, sy;
    logic hsync, vsync;
    logic frame, line;
    display_480p #(.CORDW(CORDW)) display_inst (
        .clk_pix,
        .rst(!clk_locked),
        .sx,
        .sy,
        .hsync,
        .vsync,
        .de(),
        .frame,
        .line
    );
    logic frame_sys; // start of new frame in system clock domain
    xd xd_frame (.clk_i(clk_pix), .clk_o(clk_100m),
        .rst_i(1'b0), .rst_o(1'b0), .i(frame), .o(frame_sys));
    // framebuffer (FB)
    localparam FB_WIDTH = 320;
    localparam FB_HEIGHT = 180;
    localparam FB_CIDXW = 4;
    localparam FB_CHANW = 4;
    localparam FB_SCALE = 2;
    localparam FB_IMAGE = "";
    localparam FB_PALETTE = "teleport_16_colr_4bit_palette.mem";
    logic fb_we, fb_busy, fb_wready;
    logic signed [CORDW-1:0] fbx, fby; // framebuffer coordinates
    logic [FB_CIDXW-1:0] fb_cidx;
    logic [FB_CHANW-1:0] fb_red, fb_green, fb_blue; // colours for display
    framebuffer_bram_db #(
        .WIDTH(FB_WIDTH),
        .HEIGHT(FB_HEIGHT),
        .CIDXW(FB_CIDXW),
        .CHANW(FB_CHANW),
        .SCALE(FB_SCALE),
        .F_IMAGE(FB_IMAGE),
        .F_PALETTE(FB_PALETTE)
    ) fb_inst (
        .clk_sys(clk_100m),
        .clk_pix(clk_pix),
        .rst_sys(1'b0),
        .rst_pix(1'b0),
        .de(sy >= 60 && sy < 420 && sx >= 0), // 16:9 letterbox
        .frame,
        .line,
        .we(fb_we),
        .x(fbx),
        .y(fby),
        .cidx(fb_cidx),
        .bgidx(4'h0),
        .clear(1'b0), // teleport doesn't need clearing
        .busy(fb_busy),
        .wready(fb_wready),
        .clip(),
        .red(fb_red),
        .green(fb_green),
        .blue(fb_blue)
    );
    // animation steps
    localparam ANIM_CNT=5; // five different frames in animation
    localparam ANIM_SPEED=4; // display each animation step four times (15 FPS)
    logic [$clog2(ANIM_CNT)-1:0] cnt_anim;
    logic [$clog2(ANIM_SPEED)-1:0] cnt_anim_speed;
    logic [FB_CIDXW-1:0] colr_offs; // colour offset
    always_ff @(posedge clk_100m) begin
        if (frame_sys) begin
            if (cnt_anim_speed == ANIM_SPEED-1) begin
                if (cnt_anim == ANIM_CNT-1) begin
                    cnt_anim <= 0;
                    colr_offs <= colr_offs + 1;
                end else cnt_anim <= cnt_anim + 1;
                cnt_anim_speed <= 0;
            end else cnt_anim_speed <= cnt_anim_speed + 1;
        end
    end
    // draw squares in framebuffer
    localparam SHAPE_CNT=7; // number of shapes to draw
    logic [3:0] shape_id; // shape identifier
    logic [CORDW-1:0] dx0, dy0, dx1, dy1; // shape coords
    logic draw_start, drawing, draw_done; // drawing signals
    // draw state machine
    enum {IDLE, INIT, CLEAR, DRAW, DONE} state;
    always_ff @(posedge clk_100m) begin
        case (state)
            INIT: begin // register coordinates and colour
                if (fb_wready) begin
                    draw_start <= 1;
                    state <= DRAW;
                    case (shape_id)
                        4'd0: begin // 12 pixels per anim step
                            dx0 <= 40 - (cnt_anim * 12);
                            dy0 <= 0 - (cnt_anim * 12);
                            dx1 <= 279 + (cnt_anim * 12);
                            dy1 <= 249 + (cnt_anim * 12);
                            fb_cidx <= colr_offs;
                        end
                        4'd1: begin // 8 pixels per anim step
                            dx0 <= 80 - (cnt_anim * 8);
                            dy0 <= 10 - (cnt_anim * 8);
                            dx1 <= 239 + (cnt_anim * 8);
                            dy1 <= 169 + (cnt_anim * 8);
                            fb_cidx <= colr_offs + 1;
                        end
                        4'd2: begin // 5 pixels per anim step
                            dx0 <= 105 - (cnt_anim * 5);
                            dy0 <= 35 - (cnt_anim * 5);
                            dx1 <= 214 + (cnt_anim * 5);
                            dy1 <= 144 + (cnt_anim * 5);
                            fb_cidx <= colr_offs + 2;
                        end
                        4'd3: begin // 4 pixels per anim step
                            dx0 <= 125 - (cnt_anim * 4);
                            dy0 <= 55 - (cnt_anim * 4);
                            dx1 <= 194 + (cnt_anim * 4);
                            dy1 <= 124 + (cnt_anim * 4);
                            fb_cidx <= colr_offs + 3;
                        end
                        4'd4: begin // 3 pixels per anim step
                            dx0 <= 140 - (cnt_anim * 3);
                            dy0 <= 70 - (cnt_anim * 3);
                            dx1 <= 179 + (cnt_anim * 3);
                            dy1 <= 109 + (cnt_anim * 3);
                            fb_cidx <= colr_offs + 4;
                        end
                        4'd5: begin // 2 pixels per anim step
                            dx0 <= 150 - (cnt_anim * 2);
                            dy0 <= 80 - (cnt_anim * 2);
                            dx1 <= 169 + (cnt_anim * 2);
                            dy1 <= 99 + (cnt_anim * 2);
                            fb_cidx <= colr_offs + 5;
                        end
                        4'd6: begin // 1 pixel per anim step
                            dx0 <= 155 - (cnt_anim * 1);
                            dy0 <= 85 - (cnt_anim * 1);
                            dx1 <= 164 + (cnt_anim * 1);
                            dy1 <= 94 + (cnt_anim * 1);
                            fb_cidx <= colr_offs + 6;
                        end
                        default: begin // should never occur
                            dx0 <= 10; dy0 <= 10;
                            dx1 <= 20; dy1 <= 20;
                            fb_cidx <= 4'h7; // white
                        end
                    endcase
                end
            end
            DRAW: begin
                draw_start <= 0;
                if (draw_done) begin
                    if (shape_id == SHAPE_CNT-1) begin
                        state <= DONE;
                    end else begin
                        shape_id <= shape_id + 1;
                        state <= INIT;
                    end
                end
            end
            DONE: state <= IDLE;
            default: if (frame_sys) begin // IDLE
                state <= INIT;
                shape_id <= 0;
            end
        endcase
    end
    draw_rectangle_fill #(.CORDW(CORDW)) draw_rectangle_inst (
        .clk(clk_100m),
        .rst(1'b0),
        .start(draw_start),
        .oe(!fb_busy), // draw when framebuffer isn't busy
        .x0(dx0),
        .y0(dy0),
        .x1(dx1),
        .y1(dy1),
        .x(fbx),
        .y(fby),
        .drawing,
        .busy(),
        .done(draw_done)
    );
    // write to framebuffer when drawing
    always_comb fb_we = drawing;
    // reading from FB takes one cycle: delay display signals to match
    logic hsync_p1, vsync_p1;
    always_ff @(posedge clk_pix) begin
        hsync_p1 <= hsync;
        vsync_p1 <= vsync;
    end
    // VGA output
    always_ff @(posedge clk_pix) begin
        vga_hsync <= hsync_p1;
        vga_vsync <= vsync_p1;
        vga_r <= fb_red;
        vga_g <= fb_green;
        vga_b <= fb_blue;
    end
endmodule
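Two bits of arithmetic in the teleport design are worth spelling out. First, the letterbox: the 320x180 framebuffer at FB_SCALE = 2 fills 640x360, leaving (480 − 360) / 2 = 60 blank lines above and below, hence sy >= 60 && sy < 420. Second, the animation: each shape i has a base rectangle (a_i, b_i, c_i, d_i), for example (40, 0, 279, 249) for shape 0, and a step s_i of 12, 8, 5, 4, 3, 2, or 1 pixels. With n = cnt_anim, it’s drawn at:

$$
(x_0,\ y_0,\ x_1,\ y_1) = (a_i - s_i n,\ \ b_i - s_i n,\ \ c_i + s_i n,\ \ d_i + s_i n), \qquad n \in \{0, 1, 2, 3, 4\}
$$

Assuming a 60 Hz display, ANIM_SPEED = 4 gives 60 / 4 = 15 steps per second, so one full five-step cycle lasts 20 frames (1/3 s) before colr_offs advances. At n = 4, shape 0 spans (−8, −48) to (327, 297), well beyond the buffer; the framebuffer’s clip check stops those off-buffer pixels from being written.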
In a Spin
Rotate a shape at the push of a button, with a little help from sine and cosine.
- Arty (XC7): [xc7/top_rotate.sv]
- iCEBreaker (iCE40): not yet available
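Rotating a point (x, y) about the origin by angle θ uses the standard rotation x′ = x·cos θ − y·sin θ and y′ = x·sin θ + y·cos θ. The following is a minimal combinational sketch of that step in fixed point. It isn’t taken from top_rotate.sv: the module name rotate_point, the fixed-point format for sine and cosine (FRAC fractional bits, so 256 represents 1.0 when FRAC = 8), and the idea that sin/cos come from elsewhere, such as a lookup table, are all assumptions.

```systemverilog
// Hypothetical sketch (not from top_rotate.sv): rotate point (x, y) about
// the origin using fixed-point sine and cosine. sin_t and cos_t are signed
// values scaled by 2**FRAC, e.g. FRAC=8 means 256 represents 1.0.
module rotate_point #(
    parameter CORDW=16,  // signed coordinate width (bits)
    parameter FRAC=8     // fractional bits in sin/cos (assumed)
    ) (
    input  wire logic signed [CORDW-1:0] x, y,         // point to rotate
    input  wire logic signed [FRAC+1:0] sin_t, cos_t,  // sin and cos of angle
    output      logic signed [CORDW-1:0] xr, yr        // rotated point
    );
    // full-precision results: wide enough for the sum of two products
    logic signed [CORDW+FRAC+2:0] xr_full, yr_full;
    always_comb begin
        xr_full = x * cos_t - y * sin_t;
        yr_full = x * sin_t + y * cos_t;
        xr = xr_full >>> FRAC;  // drop fractional bits (rounds toward -infinity)
        yr = yr_full >>> FRAC;
    end
endmodule
```

To rotate about a shape’s centre (cx, cy), subtract the centre before rotating and add it back afterwards. Because screen y increases downwards, a positive angle turns points clockwise on screen: with sin = 256 and cos = 0 (90°), the point (100, 0) maps to (0, 100).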
Explore
I hope you enjoyed this instalment of Exploring FPGA Graphics, but nothing beats creating your own designs. Here are a few suggestions to get you started:
Suggestions will be added soon.
What Next?
This is the end of the current series of FPGA Graphics. Watch out for a future series covering more advanced graphics. Until then, why not check out other posts from Project F.