22 September 2020

Life on Screen

Welcome back to Exploring FPGA Graphics. In this post we’re going to use the designs we created in Framebuffers to experiment with Conway’s Game of Life.

In this series, we explore graphics at the hardware level and get a feel for the power of FPGAs. We’ll learn how displays work, race the beam with Pong, animate starfields and sprites, paint Michelangelo’s David, simulate life with bitmaps, draw lines and shapes, and finally render simple 3D models. New to the series? Start with Exploring FPGA Graphics.

April 2021: This will post will be updated to use the new framebuffer shortly.

Updated 2021-02-10. Get in touch with @WillFlux or open an issue on GitHub.

He is Archimedes, Mick Jagger, Salvador Dalí, and Richard Feynman, all rolled into one.
The Guardian, John Horton Conway: the world’s most charismatic mathematician (2015)

Series Outline

  • Exploring FPGA Graphics - learn how displays work and animate simple shapes
  • FPGA Pong - race the beam to create the arcade classic
  • Hardware Sprites - fast, colourful, graphics with minimal resources
  • FPGA Ad Astra - demo with hardware sprites and animated starfields
  • Framebuffers - driving the display from a bitmap in memory
  • Life on Screen (this post) - the screen comes alive with Conway’s Game of Life
  • Lines and Triangles - drawing lines and triangles with a framebuffer
  • FPGA Shapes - filling and animating shapes
  • Simple 3D - models and wireframe rendering (draft coming soon)

Requirements

For this series, you need an FPGA board with video output. We’ll be working at 640x480, so pretty much any video output will do. It helps to be comfortable with programming your FPGA board and reasonably familiar with Verilog.

We’ll be demoing with these boards (FPGA type):

Source

The SystemVerilog designs featured in this series are available from the projf-explore git repo under the open-source MIT licence: build on them to your heart’s content. The rest of the blog content is subject to standard copyright restrictions: don’t republish it without permission.

Conway’s Life

John Conway was a remarkable mathematician, active in many fields, who sadly died in 2020. Conway is best known to the public for recreational mathematics with Martin Gardner in Scientific American. I remember playing Sprouts at school and trying to impress people by calculating the day of the week with the Doomsday rule. For now, I’ll be limiting myself to Game of Life, but I highly recommend learning more of Conway: The world’s most charismatic mathematician is an excellent place to start.

Game of Life first appeared in the Mathematical Games column in the October 1970 issue of Scientific American. Anyone can access the article at ibiblio.org, or you can see a scan of the original at JSTOR (requires login via academic institution or library).

The Rules of Life

The universe of the Game of Life is an infinite, two-dimensional, grid of cells. A cell is one of two states: dead or alive. Every cell interacts with its eight neighbours (those to the left, right, top, bottom, and the four diagonals).

The following rules determine the fate of a cell in the next generation:

  1. Survival: Every cell with two or three neighbours survives.
  2. Deaths:
    a. Every cell with four or more neighbours dies from overpopulation.
    b. Every cell with zero or one neighbours dies from isolation.
  3. Births: Every empty cell with exactly three neighbours comes to life.

To understand this, it helps to see some practical examples. Common patterns within the Life universe have been given names. The diagram below shows two patterns: “Beehive” and “Beacon”.

Conway’s Life Examples

Every live cell in the Beehive pattern has two or three neighbours: so, no cells die. No dead cells in Beehive have three neighbours: so, no cells come to life. This pattern is categorised as a “still life”, because it doesn’t change from one generation to the next.

The Beacon pattern is a bit more interesting: it oscillates between two states. Two dead cells in the centre of the pattern have three neighbours, so come to life. However, they then have four neighbours, so die the next generation due to overcrowding: this pattern repeats endlessly.

However, things start to get really interesting with slightly more complex patters. The following photo shows the Gosper glider gun, which repeatedly generates small repeating patterns called gliders.

Gosper Glider Gun

It’s even possible to construct logic gates using gliders, and ultimately a universal Turing machine. As this post is primarily about FPGA graphics, we’re not going to dig further into the esoteric possibilities of Conway’s Life, but check out Wikipedia’s article on Conway’s Game of Life.

Life in Hardware

Now we have a basic understanding of Conway’s Life we’re going to create a hardware implementation in Verilog.

We start by rearranging the rules to simplify the design:

  1. If a cell is alive
    a. if it has 2 or 3 neighbours, it’s alive next generation
    b. otherwise, it’s dead next generation
  2. If a cell is dead
    a. if it has 3 neighbours, it’s alive next generation
    b. otherwise, it’s dead next generation

Handily, a bitmap is a two-dimensional grid as required by Life, but our bitmap won’t be infinite! Instead, our bitmap wraps around top and bottom, and between left and right: this is known as toroidal array. Contrast this with a 2D map of Earth: which wraps around at the east and west, but not at the north and south.

Since a cell’s state depends on all those around it, we can’t calculate the new state of each cell sequentially in one bitmap. Instead, we have two buffers: one contains the current generation that we use to calculate the next generation. We swap buffers, repeating the process. This double buffering also allows us to safely update the display without risking any artefacts or tearing: the display is always being driven from a complete bitmap. Our life module is ignorant of the two buffers, this will be handled in the controlling top module, with reads and writes directed to different offsets.

Create a life module with the following design - life.sv:

module life #(
    parameter WORLD_WIDTH=6,
    parameter WORLD_HEIGHT=6,
    parameter ADDRW=$clog2(WORLD_WIDTH * WORLD_HEIGHT)
    ) (
    input  wire logic clk,
    input  wire logic start,
    input  wire logic run,
    output      logic [ADDRW-1:0] id,
    input  wire logic r_status,
    output      logic w_status,
    output      logic we,
    output      logic done
    );

    // simulation parameters
    localparam CELL_COUNT = WORLD_WIDTH * WORLD_HEIGHT;  // total number of cells
    localparam NEIGHBOURS_COUNT = 8;  // number of neighbours each cell has

    // number of alive neighbours (could be eight!)
    logic [$clog2(NEIGHBOURS_COUNT+1)-1:0] neighbours_alive;

    // internal cell and neighbour IDs
    logic [ADDRW-1:0] cid, cid_next;
    logic [ADDRW-1:0] nid;  // adding nid_next would improve timing slack
    logic [$clog2(NEIGHBOURS_COUNT)-1:0] npos, npos_next;

    // simulation state (once started, only stops after updating a cell)
    enum {IDLE, NEXT_CELL, NEIGHBOURS, CURRENT_CELL, UPDATE_CELL} state, state_next;
    always_comb begin
        case(state)
            IDLE: state_next = (start && !done) ? NEXT_CELL : IDLE;
            NEXT_CELL: begin
                if (done) begin
                    state_next = IDLE;
                end else if (run) begin
                    state_next = NEIGHBOURS;
                end else begin
                    state_next = NEXT_CELL;
                end
            end
            NEIGHBOURS: state_next = (npos == NEIGHBOURS_COUNT-1) ? CURRENT_CELL : NEIGHBOURS;
            CURRENT_CELL: state_next = UPDATE_CELL;
            UPDATE_CELL: state_next = NEXT_CELL;
            default: state_next = IDLE;
        endcase
    end

    always_ff @(posedge clk) begin
        state <= state_next;
        cid <= cid_next;
        npos <= npos_next;
    end

    // simulation calculations
    always_comb begin
        we = (state == UPDATE_CELL) ? 1 : 0;  // enable writing when updating
        id = cid;
        npos_next = npos;
        cid_next = cid;
        w_status = 0;

        case(state)
            IDLE: begin
                cid_next = 0;
                nid = 0;
                npos_next = 0;
            end
            NEIGHBOURS: begin
                // map neighbour index onto ID
                case (npos)
                    3'd0: nid = cid - (WORLD_WIDTH + 1);
                    3'd1: nid = cid - WORLD_WIDTH;
                    3'd2: nid = cid - (WORLD_WIDTH - 1);
                    3'd3: nid = cid - 1;
                    3'd4: nid = cid + 1;
                    3'd5: nid = cid + (WORLD_WIDTH - 1);
                    3'd6: nid = cid + WORLD_WIDTH;
                    3'd7: nid = cid + (WORLD_WIDTH + 1);
                endcase

                // because the life universe wraps we need to correct for possible under/overflow
                if (nid >= CELL_COUNT) begin
                    if (npos <= 3'd3) begin
                        nid = CELL_COUNT - (2**ADDRW - nid);
                    end else begin
                        nid = nid - CELL_COUNT;
                    end
                end

                npos_next = (state == NEIGHBOURS) ? npos + 1 : 0;
                id = nid;
            end
            UPDATE_CELL: begin
                if (r_status == 1) begin  // if cell is currently alive
                    w_status = (neighbours_alive == 4'd2 || neighbours_alive == 4'd3) ? 1 : 0;
                end else begin  // or dead
                    w_status = (neighbours_alive == 4'd3) ? 1 : 0;
                end

                // ready for next cell
                cid_next = (cid < CELL_COUNT-1) ? cid + 1 : 0;
            end
            // consider adding a default case
        endcase
    end

    always_ff @(posedge clk) begin
        case(state)
            IDLE: begin
                neighbours_alive <= 0;
                done <= 0;
            end
            // BRAM takes one cycle to read data, so we need to offset by one cycle
            NEIGHBOURS: if (npos >= 3'd1) neighbours_alive <= neighbours_alive + {3'b0, r_status};
            CURRENT_CELL: neighbours_alive <= neighbours_alive + {3'b0, r_status};
            UPDATE_CELL: begin
                // prepare for next cell
                if (cid < CELL_COUNT-1) begin
                    neighbours_alive <= 0;
                end else begin
                    done <= 1;
                end
            end
        endcase
    end
endmodule

Our module is a simple finite state machine (FSM) that reads all eight neighbours for each cell. This isn’t the most efficient design in terms of memory access: you could reuse the previous neighbour reads at the cost of slightly more complex logic. I may improve this module at a future date.

The cell under consideration is cid (between 0 and 4,799). npos is the position of a neighbour relative to the cell we’re dealing with (1 is top left, 2 top middle, 3 top right etc.), while nid is position in the bitmap of the neighbour (again between 0 and 4,799). Further details on the operation of the Life module will be added in later drafts.

To exercise this module we have a test bench for Vivado: life_tb.sv. If you have a Xilinx Vivado installed, try using the test bench with the included test seed files. You can find instructions for running the simulation in the source README.

Top Life

Now we have our simulation design, we’re ready to draw it with a framebuffer. We can use the design from the Framebuffers post, with the dimensions set to 80x60 and the depth to 1 bit. However, as already discussed, we need a double buffer to run the simulation; we’ve used one larger buffer and an offset to select the front and back portions. The front buffer is read by the display controller and life simulation, and the back buffer is written to by the simulation.

Quick Aside: iCE40 BRAM Ports
The iCE40 FPGA has pseudo dual-port BRAM; you can read from one port, and write to the other, but you can’t read and write to the same port. This explains our somewhat curious buffer design.

The linebuffer and life simulation need to share read access to the front buffer. We use a simple arbitration technique: the life simulation gets to read the buffer in any line when the linebuffer is not reading the framebuffer.

The Xilinx version is shown below:

module top_life (
    input  wire logic clk_100m,     // 100 MHz clock
    input  wire logic btn_rst,      // reset button (active low)
    output      logic vga_hsync,    // horizontal sync
    output      logic vga_vsync,    // vertical sync
    output      logic [3:0] vga_r,  // 4-bit VGA red
    output      logic [3:0] vga_g,  // 4-bit VGA green
    output      logic [3:0] vga_b   // 4-bit VGA blue
    );

    localparam GEN_FRAMES = 15;  // each generation lasts this many frames
    localparam SEED_FILE = "simple_life.mem";  // seed to initiate universe

    // generate pixel clock
    logic clk_pix;
    logic clk_locked;
    clock_gen clock_640x480 (
       .clk(clk_100m),
       .rst(!btn_rst),  // reset button is active low
       .clk_pix,
       .clk_locked
    );

    // display timings
    localparam CORDW = 10;  // screen coordinate width in bits
    logic [CORDW-1:0] sx, sy;
    logic hsync, vsync, de;
    display_timings_480p timings_640x480 (
        .clk_pix,
        .rst(!clk_locked),  // wait for clock lock
        .sx,
        .sy,
        .hsync,
        .vsync,
        .de
    );

    // size of screen with and without blanking
    localparam H_RES_FULL = 800;
    localparam V_RES_FULL = 525;
    localparam H_RES = 640;
    localparam V_RES = 480;

    // vertical blanking interval (will move to display_timings soon)
    logic vbi;
    always_comb vbi = (sy == V_RES && sx == 0);

    // framebuffer (FB)
    localparam FB_COUNT  = 2;  // double buffered
    localparam FB_WIDTH  = 80;
    localparam FB_HEIGHT = 60;
    localparam FB_PIXELS = FB_WIDTH * FB_HEIGHT;
    localparam FB_DEPTH  = FB_COUNT * FB_PIXELS;
    localparam FB_ADDRW  = $clog2(FB_DEPTH);
    localparam FB_DATAW  = 1;  // colour bits per pixel
    localparam FB_IMAGE  = SEED_FILE;

    logic fb_we;
    logic [FB_ADDRW-1:0] fb_addr_read, fb_addr_write;
    logic [FB_DATAW-1:0] pix_in;
    logic [FB_DATAW-1:0] pix_out;

    bram_sdp #(
        .WIDTH(FB_DATAW),
        .DEPTH(FB_DEPTH),
        .INIT_F(FB_IMAGE)
    ) fb_inst (
        .clk_read(clk_pix),
        .clk_write(clk_pix),
        .we(fb_we),
        .addr_write(fb_addr_write),
        .addr_read(fb_addr_read),
        .data_in(pix_in),
        .data_out(pix_out)
    );

    // update frame counter and choose front buffer
    logic life_start;    // trigger next calculation
    logic front_buffer;  // which buffer to draw the display from
    logic [$clog2(GEN_FRAMES)-1:0] cnt_frames;
    always_ff @(posedge clk_pix) begin
        if (sy == V_RES_FULL-1 && sx == H_RES_FULL-1)
            cnt_frames <= cnt_frames + 1;
        if (cnt_frames == GEN_FRAMES - 1) begin
            front_buffer <= ~front_buffer;
            cnt_frames <= 0;
            life_start <= 1;
        end else life_start <= 0;
    end

    logic life_run;
    logic [FB_ADDRW-1:0] cell_id;
    life #(
        .WORLD_WIDTH(FB_WIDTH),
        .WORLD_HEIGHT(FB_HEIGHT),
        .ADDRW(FB_ADDRW)
    ) life_sim (
        .clk(clk_pix),
        .start(life_start),
        .run(life_run),
        .id(cell_id),
        .r_status(pix_out),
        .w_status(pix_in),
        .we(fb_we),
        /* verilator lint_off PINCONNECTEMPTY */
        .done()
        /* verilator lint_on PINCONNECTEMPTY */
    );

    // linebuffer (LB)
    localparam LB_SCALE = 8;       // scale (horizontal and vertical)
    localparam LB_LEN = FB_WIDTH;  // line length matches framebuffer
    localparam LB_BPC = 4;         // bits per colour channel

    // LB output to display
    logic lb_en_out;
    always_comb lb_en_out = de;  // Use 'de' for entire frame

    // Load data from FB into LB
    logic [FB_ADDRW-1:0] lb_fb_addr;
    logic lb_data_req;  // LB requesting data
    logic [$clog2(LB_LEN+1)-1:0] cnt_h;  // count pixels in line to read
    always_ff @(posedge clk_pix) begin
        if (vbi) lb_fb_addr <= 0;   // new frame
        if (lb_data_req && sy != V_RES-1) begin  // load next line of data...
            cnt_h <= 0;                          // ...if not on last line
        end else if (cnt_h < LB_LEN) begin  // advance to start of next line
            cnt_h <= cnt_h + 1;
            lb_fb_addr <= lb_fb_addr == FB_PIXELS-1 ? 0 : lb_fb_addr + 1;
        end
    end

    // FB BRAM and colour pipeline adds two cycles of latency
    logic lb_en_in_1, lb_en_in;
    always_ff @(posedge clk_pix) begin
        lb_en_in_1 <= (cnt_h < LB_LEN);
        lb_en_in <= lb_en_in_1;
    end

    // LB colour channels
    logic [LB_BPC-1:0] lb_in_0, lb_in_1, lb_in_2;
    logic [LB_BPC-1:0] lb_out_0, lb_out_1, lb_out_2;

    linebuffer #(
        .WIDTH(LB_BPC),     // data width of each channel
        .LEN(LB_LEN),       // length of line
        .SCALE(LB_SCALE)    // scaling factor (>=1)
        ) lb_inst (
        .clk_in(clk_pix),       // input clock
        .clk_out(clk_pix),      // output clock
        .data_req(lb_data_req), // request input data (clk_in)
        .en_in(lb_en_in),       // enable input (clk_in)
        .en_out(lb_en_out),     // enable output (clk_out)
        .vbi,                   // start of vertical blanking interval (clk_out)
        .din_0(lb_in_0),        // data in (clk_in)
        .din_1(lb_in_1),
        .din_2(lb_in_2),
        .dout_0(lb_out_0),      // data out (clk_out)
        .dout_1(lb_out_1),
        .dout_2(lb_out_2)
    );

    // sim can run when linebuffer is not using framebuffer
    always_comb life_run = (sy != V_RES && sy[2:0] != 3'b111);

    // framebuffer address control
    always_comb begin
        fb_addr_read = (life_run) ? cell_id : lb_fb_addr;
        if (front_buffer == 1) fb_addr_read = fb_addr_read + FB_PIXELS;
        fb_addr_write = (front_buffer == 1) ? cell_id : cell_id + FB_PIXELS;
    end

    // read framebuffer pixels into LB
    always_ff @(posedge clk_pix) begin
        {lb_in_2, lb_in_1, lb_in_0} <= pix_out ? 12'hFC0 : 12'h115;
    end

    // LB output adds one cycle of latency - need to correct display signals
    logic hsync_1, vsync_1, lb_en_out_1;
    always_ff @(posedge clk_pix) begin
        hsync_1 <= hsync;
        vsync_1 <= vsync;
        lb_en_out_1 <= lb_en_out;
    end

    // VGA output
    always_ff @(posedge clk_pix) begin
        vga_hsync <= hsync_1;
        vga_vsync <= vsync_1;
        vga_r <= lb_en_out_1 ? lb_out_2 : 4'h0;
        vga_g <= lb_en_out_1 ? lb_out_1 : 4'h0;
        vga_b <= lb_en_out_1 ? lb_out_0 : 4'h0;
    end
endmodule

And we need a constraints file for our design:

  • Xilinx XC7: arty.xdc
  • Lattice iCE40: awaiting BRAM fix

You should be able to build this design and see the simple example with a beehive, blinker, toad, and beacon patterns.

Building the Designs
In the Life on Screen section of the git repo, you’ll find the design files, a makefile for iCEBreaker, a Vivado project for Arty, and instructions for building the designs for both boards.

If that works, try the Gosper glider gun by changing the seed file in top_life.sv:

parameter SEED_FILE = "gosper_glider.mem";  // seed to initiate universe with

The Gosper pattern repeatedly generates small gliders that move down the screen. Our glider gun doesn’t continue indefinitely because our universe wraps around: the gliders interfere with the gun pattern, and the universe eventually breaks down, before settling into a pattern of simple oscillators and still lives. It’s straightforward to update the life module not to wrap around: imagine a world bordered by dead cells.

You can also change how fast the simulation runs by changing the value of GEN_FRAMES (a value of 60 makes each generation last 1 second):

parameter GEN_FRAMES  = 60;  // each generation lasts this many frames

NB. The minimum value for GEN_FRAMES is 2; if you set it to 1 the life simulation doesn’t have enough time to run correctly.

Explore

I hope you enjoyed this (draft) instalment of Exploring FPGA Graphics, but nothing beats creating your own designs. Here are a few suggestions to get you started:

  • Experiment with your own Game of Life seed files
  • Implement a Universal Turing Machine using Life: take a look at Paul Rendell’s Attic to get started

More ideas to follow…

Next Time

In part seven of this series, we’ll learn how to draw Lines and Triangles: the basis of 2D and 3D graphics.

Constructive feedback is always welcome. Get in touch with @WillFlux or open an issue on GitHub.

©2021 Will Green, Project F