Welcome back to Exploring FPGA Graphics. In this post we’re going to use the designs we created in Framebuffers to experiment with Conway’s Game of Life.
In this series, we explore graphics at the hardware level and get a feel for the power of FPGAs. We start by learning how displays work, before racing the beam with Pong, starfields and sprites, simulating life with bitmaps, drawing lines and triangles, and finally creating simple 3D models. I’ll be writing and revising this series throughout 2020 and 2021. New to the series? Start with Exploring FPGA Graphics.
Designs for iCEBreaker are not yet available pending a BRAM fix.
Updated 2021-01-09. Get in touch with @WillFlux or open an issue on GitHub.
He is Archimedes, Mick Jagger, Salvador Dalí, and Richard Feynman, all rolled into one.
The Guardian, John Horton Conway: the world’s most charismatic mathematician (2015)
Series Outline
- Exploring FPGA Graphics - learn how displays work and animate simple shapes
- FPGA Pong - race the beam to create the arcade classic
- Hardware Sprites - fast, colourful, graphics with minimal resources
- FPGA Ad Astra - demo with hardware sprites and animated starfields
- Framebuffers - driving the display from a bitmap in memory
- Life on Screen (this post) - the screen comes alive with Conway’s Game of Life
More parts to follow.
Requirements
For this series, you need an FPGA board with video output. We’ll be working at 640x480, so pretty much any video output will do. You should be comfortable with programming your FPGA board and reasonably familiar with Verilog.
We’ll be demoing with these boards (FPGA type):
- iCEBreaker (Lattice iCE40) with 12-Bit DVI Pmod (designs available soon)
- Digilent Arty A7-35T (Xilinx Artix-7) with Pmod VGA
Source
The SystemVerilog designs featured in this series are available from the projf-explore repo on GitHub. The designs are open source hardware under the permissive MIT licence, but this blog is subject to normal copyright restrictions.
Conway’s Life
John Conway was a remarkable mathematician, active in many fields, who sadly died in 2020. Conway is best known to the public for recreational mathematics with Martin Gardner in Scientific American. I remember playing Sprouts at school and trying to impress people by calculating the day of the week with the Doomsday rule. For now, I’ll be limiting myself to Game of Life, but I highly recommend learning more of Conway: The world’s most charismatic mathematician is an excellent place to start.
Game of Life first appeared in the Mathematical Games column in the October 1970 issue of Scientific American. Anyone can access the article at ibiblio.org, or you can see a scan of the original at JSTOR (requires login via academic institution or library).
The Rules of Life
The universe of the Game of Life is an infinite, two-dimensional, grid of cells. A cell is one of two states: dead or alive. Every cell interacts with its eight neighbours (those to the left, right, top, bottom, and the four diagonals).
The following rules determine the fate of a cell in the next generation:
- Survival: Every cell with two or three neighbours survives.
- Deaths:
a. Every cell with four or more neighbours dies from overpopulation.
b. Every cell with zero or one neighbours dies from isolation. - Births: Every empty cell with exactly three neighbours comes to life.
To understand this, it helps to see some practical examples. Common patterns within the Life universe have been given names. The diagram below shows two patterns: “Beehive” and “Beacon”.
Every live cell in the Beehive pattern has two or three neighbours: so, no cells die. No dead cells in Beehive have three neighbours: so, no cells come to life. This pattern is categorised as a “still life”, because it doesn’t change from one generation to the next.
The Beacon pattern is a bit more interesting: it oscillates between two states. Two dead cells in the centre of the pattern have three neighbours, so come to life. However, they then have four neighbours, so die the next generation due to overcrowding: this pattern repeats endlessly.
However, things start to get really interesting with slightly more complex patters. The following photo shows the Gosper glider gun, which repeatedly generates small repeating patterns called gliders.
It’s even possible to construct logic gates using gliders, and ultimately a universal Turing machine. As this post is primarily about FPGA graphics, we’re not going to dig further into the esoteric possibilities of Conway’s Life, but check out Wikipedia’s article on Conway’s Game of Life.
Life in Hardware
Now we have a basic understanding of Conway’s Life we’re going to create a hardware implementation in Verilog.
We start by rearranging the rules to simplify the design:
- If a cell is alive
a. if it has 2 or 3 neighbours, it’s alive next generation
b. otherwise, it’s dead next generation - If a cell is dead
a. if it has 3 neighbours, it’s alive next generation
b. otherwise, it’s dead next generation
Handily, a bitmap is a two-dimensional grid as required by Life, but our bitmap won’t be infinite! Instead, our bitmap wraps around top and bottom, and between left and right: this is known as toroidal array. Contrast this with a 2D map of Earth: which wraps around at the east and west, but not at the north and south.
Since a cell’s state depends on all those around it, we can’t calculate the new state of each cell sequentially in one bitmap. Instead, we have two buffers: one contains the current generation that we use to calculate the next generation. We swap buffers, repeating the process. This double buffering also allows us to safely update the display without risking any artefacts or tearing: the display is always being driven from a complete bitmap. Our life module is ignorant of the two buffers, this will be handled in the controlling top module, with reads and writes directed to different offsets.
Create a life module with the following design - life.sv:
module life #(
parameter WORLD_WIDTH=6,
parameter WORLD_HEIGHT=6,
parameter ADDRW=$clog2(WORLD_WIDTH * WORLD_HEIGHT)
) (
input wire logic clk,
input wire logic start,
input wire logic run,
output logic [ADDRW-1:0] id,
input wire logic r_status,
output logic w_status,
output logic we,
output logic done
);
// simulation parameters
localparam CELL_COUNT = WORLD_WIDTH * WORLD_HEIGHT; // total number of cells
localparam NEIGHBOURS_COUNT = 8; // number of neighbours each cell has
// number of alive neighbours (could be eight!)
logic [$clog2(NEIGHBOURS_COUNT+1)-1:0] neighbours_alive;
// internal cell and neighbour IDs
logic [ADDRW-1:0] cid, cid_next;
logic [ADDRW-1:0] nid; // adding nid_next would improve timing slack
logic [$clog2(NEIGHBOURS_COUNT)-1:0] npos, npos_next;
// simulation state (once started, only stops after updating a cell)
enum {IDLE, NEXT_CELL, NEIGHBOURS, CURRENT_CELL, UPDATE_CELL} state, state_next;
always_comb begin
case(state)
IDLE: state_next = (start && !done) ? NEXT_CELL : IDLE;
NEXT_CELL: begin
if (done) begin
state_next = IDLE;
end else if (run) begin
state_next = NEIGHBOURS;
end else begin
state_next = NEXT_CELL;
end
end
NEIGHBOURS: state_next = (npos == NEIGHBOURS_COUNT-1) ? CURRENT_CELL : NEIGHBOURS;
CURRENT_CELL: state_next = UPDATE_CELL;
UPDATE_CELL: state_next = NEXT_CELL;
default: state_next = IDLE;
endcase
end
always_ff @(posedge clk) begin
state <= state_next;
cid <= cid_next;
npos <= npos_next;
end
// simulation calculations
always_comb begin
we = (state == UPDATE_CELL) ? 1 : 0; // enable writing when updating
id = cid;
npos_next = npos;
cid_next = cid;
w_status = 0;
case(state)
IDLE: begin
cid_next = 0;
nid = 0;
npos_next = 0;
end
NEIGHBOURS: begin
// map neighbour index onto ID
case (npos)
3'd0: nid = cid - (WORLD_WIDTH + 1);
3'd1: nid = cid - WORLD_WIDTH;
3'd2: nid = cid - (WORLD_WIDTH - 1);
3'd3: nid = cid - 1;
3'd4: nid = cid + 1;
3'd5: nid = cid + (WORLD_WIDTH - 1);
3'd6: nid = cid + WORLD_WIDTH;
3'd7: nid = cid + (WORLD_WIDTH + 1);
endcase
// because the life universe wraps we need to correct for possible under/overflow
if (nid >= CELL_COUNT) begin
if (npos <= 3'd3) begin
nid = CELL_COUNT - (2**ADDRW - nid);
end else begin
nid = nid - CELL_COUNT;
end
end
npos_next = (state == NEIGHBOURS) ? npos + 1 : 0;
id = nid;
end
UPDATE_CELL: begin
if (r_status == 1) begin // if cell is currently alive
w_status = (neighbours_alive == 4'd2 || neighbours_alive == 4'd3) ? 1 : 0;
end else begin // or dead
w_status = (neighbours_alive == 4'd3) ? 1 : 0;
end
// ready for next cell
cid_next = (cid < CELL_COUNT-1) ? cid + 1 : 0;
end
// consider adding a default case
endcase
end
always_ff @(posedge clk) begin
case(state)
IDLE: begin
neighbours_alive <= 0;
done <= 0;
end
// BRAM takes one cycle to read data, so we need to offset by one cycle
NEIGHBOURS: if (npos >= 3'd1) neighbours_alive <= neighbours_alive + {3'b0, r_status};
CURRENT_CELL: neighbours_alive <= neighbours_alive + {3'b0, r_status};
UPDATE_CELL: begin
// prepare for next cell
if (cid < CELL_COUNT-1) begin
neighbours_alive <= 0;
end else begin
done <= 1;
end
end
endcase
end
endmodule
Our module is a simple finite state machine (FSM) that reads all eight neighbours for each cell. This isn’t the most efficient design in terms of memory access: you could reuse the previous neighbour reads at the cost of slightly more complex logic. I may improve this module at a future date.
The cell under consideration is cid
(between 0 and 4,799). npos
is the position of a neighbour relative to the cell we’re dealing with (1 is top left, 2 top middle, 3 top right etc.), while nid
is position in the bitmap of the neighbour (again between 0 and 4,799). Further details on the operation of the Life module will be added in later drafts.
To exercise this module we have a test bench for Vivado: life_tb.sv. If you have a Xilinx Vivado installed, try using the test bench with the included test seed files. You can find instructions for running the simulation in the source README.
Top Life
Now we have our simulation design, we’re ready to draw it with a framebuffer. We can use the design from the Framebuffers post, with the dimensions set to 80x60 and the depth to 1 bit. However, as already discussed, we need a double buffer to run the simulation; we’ve used one larger buffer and an offset to select the front and back portions. The front buffer is read by the display controller and life simulation, and the back buffer is written to by the simulation.
Quick Aside: iCE40 BRAM Ports
The iCE40 FPGA has pseudo dual-port BRAM; you can read from one port, and write to the other, but you can’t read and write to the same port. This explains our somewhat curious buffer design.
The linebuffer and life simulation need to share read access to the front buffer. We use a simple arbitration technique: the life simulation gets to read the buffer in any line when the linebuffer is not reading the framebuffer.
- Xilinx XC7: xc7/top_life.sv
- Lattice iCE40: awaiting BRAM fix
The Xilinx version is shown below:
module top_life (
input wire logic clk_100m, // 100 MHz clock
input wire logic btn_rst, // reset button (active low)
output logic vga_hsync, // horizontal sync
output logic vga_vsync, // vertical sync
output logic [3:0] vga_r, // 4-bit VGA red
output logic [3:0] vga_g, // 4-bit VGA green
output logic [3:0] vga_b // 4-bit VGA blue
);
localparam GEN_FRAMES = 15; // each generation lasts this many frames
localparam SEED_FILE = "simple_life.mem"; // seed to initiate universe with
// generate pixel clock
logic clk_pix;
logic clk_locked;
clock_gen clock_640x480 (
.clk(clk_100m),
.rst(!btn_rst), // reset button is active low
.clk_pix,
.clk_locked
);
// display timings
localparam CORDW = 10; // screen coordinate width in bits
logic [CORDW-1:0] sx, sy;
logic hsync, vsync, de;
display_timings_480p timings_640x480 (
.clk_pix,
.rst(!clk_locked), // wait for clock lock
.sx,
.sy,
.hsync,
.vsync,
.de
);
// size of screen with and without blanking
localparam H_RES_FULL = 800;
localparam V_RES_FULL = 525;
localparam H_RES = 640;
localparam V_RES = 480;
// framebuffer (FB)
localparam FB_COUNT = 2; // double buffered
localparam FB_WIDTH = 80;
localparam FB_HEIGHT = 60;
localparam FB_PIXELS = FB_WIDTH * FB_HEIGHT;
localparam FB_DEPTH = FB_COUNT * FB_PIXELS;
localparam FB_ADDRW = $clog2(FB_DEPTH);
localparam FB_DATAW = 1; // colour bits per pixel
localparam FB_IMAGE = SEED_FILE;
logic fb_we;
logic [FB_ADDRW-1:0] fb_addr_read, fb_addr_write;
logic [FB_DATAW-1:0] pix_in, pix_out;
bram_sdp #(
.WIDTH(FB_DATAW),
.DEPTH(FB_DEPTH),
.INIT_F(FB_IMAGE)
) fb_inst (
.clk_read(clk_pix),
.clk_write(clk_pix),
.we(fb_we),
.addr_write(fb_addr_write),
.addr_read(fb_addr_read),
.data_in(pix_in),
.data_out(pix_out)
);
// update frame counter and choose front buffer
logic life_start; // trigger next calculation
logic front_buffer; // which buffer to draw the display from
logic [$clog2(GEN_FRAMES)-1:0] cnt_frames;
always_ff @(posedge clk_pix) begin
if (sy == V_RES_FULL-1 && sx == H_RES_FULL-1)
cnt_frames <= cnt_frames + 1;
if (cnt_frames == GEN_FRAMES - 1) begin
front_buffer <= ~front_buffer;
cnt_frames <= 0;
life_start <= 1;
end else life_start <= 0;
end
logic life_run;
logic [FB_ADDRW-1:0] cell_id;
life #(
.WORLD_WIDTH(FB_WIDTH),
.WORLD_HEIGHT(FB_HEIGHT),
.ADDRW(FB_ADDRW)
) life_sim (
.clk(clk_pix),
.start(life_start),
.run(life_run),
.id(cell_id),
.r_status(pix_out),
.w_status(pix_in),
.we(fb_we),
.done()
);
// linebuffer (LB)
localparam LB_SCALE_V = 8; // scale vertical drawing
localparam LB_SCALE_H = 8; // scale horizontal drawing
localparam LB_LEN = H_RES / LB_SCALE_H; // line length
localparam LB_WIDTH = 4; // bits per colour channel
// LB data in from FB
logic [FB_ADDRW-1:0] lb_fb_addr;
logic lb_en_in, lb_en_in_1; // allow for BRAM latency correction
logic [LB_WIDTH-1:0] lb_in_0, lb_in_1, lb_in_2;
// correct vertical scale: if scale is 0, set to 1
logic [$clog2(LB_SCALE_V+1):0] scale_v_cor;
always_comb scale_v_cor = (LB_SCALE_V == 0) ? 1 : LB_SCALE_V;
// count screen lines for vertical scaling - read when cnt_scale_v==0
logic [$clog2(LB_SCALE_V):0] cnt_scale_v;
always_ff @(posedge clk_pix) begin
if (sx == 0)
cnt_scale_v <= (cnt_scale_v == scale_v_cor-1) ? 0 : cnt_scale_v + 1;
if (sy == V_RES_FULL-1) cnt_scale_v <= 0;
end
logic [$clog2(FB_WIDTH)-1:0] fb_h_cnt; // counter for FB pixels on line
always_ff @(posedge clk_pix) begin
if (sy == V_RES_FULL-1 && sx == H_RES-1) lb_fb_addr <= 0;
// reset horizontal counter at the start of blanking on reading lines
if (cnt_scale_v == 0 && sx == H_RES) begin
if (lb_fb_addr < FB_PIXELS-1) fb_h_cnt <= 0; // read all pixels?
end
// read each pixel on FB line and write to LB
if (fb_h_cnt < FB_WIDTH) begin
lb_en_in <= 1;
fb_h_cnt <= fb_h_cnt + 1;
lb_fb_addr <= lb_fb_addr + 1;
end else begin
lb_en_in <= 0;
end
// enable LB data in with latency correction
lb_en_in_1 <= lb_en_in;
end
// LB data out to display
logic [LB_WIDTH-1:0] lb_out_0, lb_out_1, lb_out_2;
linebuffer #(
.WIDTH(LB_WIDTH),
.LEN(LB_LEN)
) lb_inst (
.clk_in(clk_pix),
.clk_out(clk_pix),
.en_in(lb_en_in_1), // correct for BRAM latency
.en_out(sy < V_RES && sx < H_RES),
.rst_in(sx == H_RES), // reset at start of horizontal blanking
.rst_out(sx == H_RES),
.scale(LB_SCALE_H),
.data_in_0(lb_in_0),
.data_in_1(lb_in_1),
.data_in_2(lb_in_2),
.data_out_0(lb_out_0),
.data_out_1(lb_out_1),
.data_out_2(lb_out_2)
);
// sim can run when linebuffer is not using framebuffer
always_comb life_run = (cnt_scale_v != 0); // OK if FB line < blanking length
// framebuffer address control
always_comb begin
fb_addr_read = (life_run) ? cell_id : lb_fb_addr;
if (front_buffer == 1) fb_addr_read = fb_addr_read + FB_PIXELS;
fb_addr_write = (front_buffer == 1) ? cell_id : cell_id + FB_PIXELS;
end
// read framebuffer pixels into LB
always_ff @(posedge clk_pix) begin
{lb_in_2, lb_in_1, lb_in_0} <= pix_out ? 12'hFC0 : 12'h115;
end
// VGA output
always_ff @(posedge clk_pix) begin
vga_hsync <= hsync;
vga_vsync <= vsync;
vga_r <= de ? lb_out_2 : 4'h0;
vga_g <= de ? lb_out_1 : 4'h0;
vga_b <= de ? lb_out_0 : 4'h0;
end
endmodule
And we need a constraints file for our design:
- Xilinx XC7: arty.xdc
- Lattice iCE40: awaiting BRAM fix
You should be able to build this design and see the simple example with a beehive, blinker, toad, and beacon patterns.
Building the Designs
In the Life on Screen section of the git repo, you’ll find the design files, a makefile for iCEBreaker, a Vivado project for Arty, and instructions for building the designs for both boards.
If that works, try the Gosper glider gun by changing the seed file in top_life.sv
:
parameter SEED_FILE = "gosper_glider.mem"; // seed to initiate universe with
The Gosper pattern repeatedly generates small gliders that move down the screen. Our glider gun doesn’t continue indefinitely because our universe wraps around: the gliders interfere with the gun pattern, and the universe eventually breaks down, before settling into a pattern of simple oscillators and still lives. It’s straightforward to update the life module not to wrap around: imagine a world bordered by dead cells.
You can also change how fast the simulation runs by changing the value of GEN_FRAMES
(a value of 60 makes each generation last 1 second):
parameter GEN_FRAMES = 60; // each generation lasts this many frames
NB. The minimum value for GEN_FRAMES
is 2; if you set it to 1 the life simulation doesn’t have enough time to run correctly.
Explore
I hope you enjoyed this (draft) instalment of Exploring FPGA Graphics, but nothing beats creating your own designs. Here are a few suggestions to get you started:
- Experiment with your own Game of Life seed files
- Implement a Universal Turing Machine using Life: take a look at Paul Rendell’s Attic to get started
More ideas to follow…
Next Time
In the next part, we’ll learn about drawing lines and triangles: the basis of most 2D and 3D graphics. The next part is expected to be released in late January 2021.
Constructive feedback is always welcome. Get in touch with @WillFlux or open an issue on GitHub.