30 October 2020

Framebuffers

Welcome back to Exploring FPGA Graphics. In the previous two parts, we worked with sprites, but as graphics become more complex, a different approach is needed. Instead of drawing directly to the screen, we draw to a framebuffer, which is read by the display controller. This post provides an introduction to framebuffers and how to scale and fade them Wolfenstein 3D style. In the next part, we’ll use a framebuffer to visualize a simulation of life.

In this series, we explore graphics at the hardware level and get a feel for the power of FPGAs. We start by learning how displays work, before racing the beam with Pong, starfields and sprites, simulating life with bitmaps, drawing lines and triangles, and finally creating simple 3D models. I’ll be writing and revising this series throughout 2020. New to the series? Start with Exploring FPGA Graphics.

Designs for iCEBreaker are coming once I sort out a BRAM issue.

Updated 2020-11-25. Get in touch with @WillFlux or open an issue on GitHub.

Series Outline

  • Exploring FPGA Graphics - learn how displays work and animate simple shapes
  • FPGA Pong - race the beam to create the arcade classic
  • Hardware Sprites - fast, colourful graphics with minimal resources
  • FPGA Ad Astra - demo with hardware sprites and animated starfields
  • Framebuffers (this post) - driving the display from a bitmap in memory
  • Life on Screen - the screen comes alive with Conway’s Game of Life

More parts to follow.

Requirements

For this series, you need an FPGA board with video output. We’ll be working at 640x480, so pretty much any video output will do. You should be comfortable with programming your FPGA board and reasonably familiar with Verilog.

We’ll be demoing with the iCEBreaker (iCE40) and Arty (XC7) boards.

Source

The SystemVerilog designs featured in this series are available from the projf-explore repo on GitHub. The designs are open source hardware under the permissive MIT licence, but this blog is subject to normal copyright restrictions.

Framebuffer

A framebuffer is an in-memory bitmap that drives pixels on screen. When you write to a memory location within the framebuffer, the corresponding pixel will change on screen. Using a framebuffer provides two big benefits: we’re free to create sophisticated graphics using whatever technique we like, and the setting of pixel colour is separated from the process driving the screen. The flexibility of a framebuffer comes at the cost of increased memory storage and latency.

A Small Buffer

Framebuffers require memory to hold the whole frame. To keep things simple, we want to be able to store our framebuffer in FPGA block memory (BRAM), rather than relying on external ram. The iCE40 FPGA on the iCEBreaker board has 120 Kb of block memory, so that’s what we’ll target for this post. You can learn more about block ram in FPGA Memory Types.

If we divide our 640x480 screen by four in both dimensions, we get 160x120. 160x120 is 19,200 pixels, so a monochrome framebuffer, at one bit per pixel, requires 18.75 kilobits of memory (19,200 / 1024 = 18.75).
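The arithmetic is easy to check in simulation. This minimal sketch (the module name fb_size is made up for illustration) works out the storage needed for 1-bit and 4-bit versions of a 160x120 framebuffer and compares the larger one with the 120 Kb of block memory on the iCE40:

```systemverilog
// Hypothetical simulation-only module: framebuffer storage arithmetic.
module fb_size;
    localparam FB_WIDTH  = 640 / 4;               // 160 pixels
    localparam FB_HEIGHT = 480 / 4;               // 120 lines
    localparam FB_PIXELS = FB_WIDTH * FB_HEIGHT;  // 19,200 pixels
    localparam BRAM_BITS = 120 * 1024;            // 120 Kb of iCE40 block memory

    localparam FB_BITS_1 = FB_PIXELS * 1;  // monochrome: 1 bit per pixel
    localparam FB_BITS_4 = FB_PIXELS * 4;  // 16 colours: 4 bits per pixel

    initial begin
        $display("Pixels: %0d", FB_PIXELS);
        $display("1-bit:  %0d bits (%.2f Kb)", FB_BITS_1, FB_BITS_1 / 1024.0);
        $display("4-bit:  %0d bits (%.2f Kb)", FB_BITS_4, FB_BITS_4 / 1024.0);
        $display("Spare with 4-bit: %0d bits", BRAM_BITS - FB_BITS_4);
    end
endmodule
```

Even at four bits per pixel, a 160x120 framebuffer needs 75 Kb, which leaves room within 120 Kb for other uses of block memory.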

A Small Bitmap

Michelangelo’s David

A small monochrome framebuffer calls for a striking image: I’ve chosen David by Michelangelo.

The version on the right is the original from Wikipedia; it has 64 shades of grey. I created the middle image by reducing the original to 16 colours using img2fmem with no dithering (we will discuss this tool later in the post). The monochrome image on the left was created by Gerbrant using Floyd-Steinberg dithering.

A Corner of the Screen

Let’s create a top module to output our framebuffer loaded with the monochrome image of David. The Xilinx version uses a new BRAM module, bram_sdp.sv, but is otherwise similar to other top designs we’ve used in previous parts.

top_david_v1 memory usage: 160x120x1: 1x36Kb BRAM on the XC7.

The Xilinx version is shown below:

module top_david_v1 (
    input  wire logic clk_100m,     // 100 MHz clock
    input  wire logic btn_rst,      // reset button (active low)
    output      logic vga_hsync,    // horizontal sync
    output      logic vga_vsync,    // vertical sync
    output      logic [3:0] vga_r,  // 4-bit VGA red
    output      logic [3:0] vga_g,  // 4-bit VGA green
    output      logic [3:0] vga_b   // 4-bit VGA blue
    );

    // generate pixel clock
    logic clk_pix;
    logic clk_locked;
    clock_gen clock_640x480 (
       .clk(clk_100m),
       .rst(!btn_rst),  // reset button is active low
       .clk_pix,
       .clk_locked
    );

    // display timings
    localparam CORDW = 10;  // screen coordinate width in bits
    logic [CORDW-1:0] sx, sy;
    display_timings timings_640x480 (
        .clk_pix,
        .rst(!clk_locked),  // wait for clock lock
        .sx,
        .sy,
        .hsync(vga_hsync),
        .vsync(vga_vsync),
        .de()
    );

    // size of screen with and without blanking
    localparam H_RES_FULL = 800;
    localparam V_RES_FULL = 525;
    localparam H_RES = 640;
    localparam V_RES = 480;

    // framebuffer
    localparam FB_WIDTH  = 160;
    localparam FB_HEIGHT = 120;
    localparam FB_PIXELS = FB_WIDTH * FB_HEIGHT;
    localparam FB_ADDRW  = $clog2(FB_PIXELS);
    localparam FB_DATAW  = 1;  // colour bits per pixel
    localparam FB_IMAGE  = "david_1bit.mem";

    logic [FB_ADDRW-1:0] fb_addr_read;
    logic [FB_DATAW-1:0] colr;

    bram_sdp #(
        .WIDTH(FB_DATAW),
        .DEPTH(FB_PIXELS),
        .INIT_F(FB_IMAGE)
    ) framebuffer (
        .clk_read(clk_pix),
        .clk_write(clk_pix),
        .we(0),
        .addr_write(),
        .addr_read(fb_addr_read),
        .data_in(),
        .data_out(colr)
    );

    // flag when framebuffer is active
    logic fb_active;
    always_comb fb_active = (sy < FB_HEIGHT && sx < FB_WIDTH);

    always_ff @(posedge clk_pix) begin
        if (sy == V_RES_FULL-1 && sx == H_RES_FULL-1) begin
            fb_addr_read <= 0;  // reset address at end of frame
        end else if (fb_active) begin
            fb_addr_read <= fb_addr_read + 1;
        end
    end

    // VGA output
    always_comb begin
        vga_r = (fb_active && colr) ? 4'hF : 4'h0;
        vga_g = (fb_active && colr) ? 4'hF : 4'h0;
        vga_b = (fb_active && colr) ? 4'hF : 4'h0;
    end
endmodule

Build this design and program your board. You should see the dithered image of David looking at you from the top left of your screen.
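The top module depends on bram_sdp.sv, which isn't listed in this post. As a rough guide, a simple dual-port BRAM with the same parameter and port names might look like the sketch below. This is an assumption based on how the module is instantiated above, not the actual file from the repo, and it's named bram_sdp_sketch to make that clear:

```systemverilog
// Sketch of a simple dual-port BRAM; NOT the actual bram_sdp.sv from the repo.
// Parameter and port names are taken from the instantiation in top_david_v1.
module bram_sdp_sketch #(
    parameter WIDTH=8,    // data width in bits
    parameter DEPTH=256,  // number of memory locations
    parameter INIT_F=""   // optional $readmemh file for initial contents
    ) (
    input  wire logic clk_write,
    input  wire logic clk_read,
    input  wire logic we,
    input  wire logic [$clog2(DEPTH)-1:0] addr_write,
    input  wire logic [$clog2(DEPTH)-1:0] addr_read,
    input  wire logic [WIDTH-1:0] data_in,
    output      logic [WIDTH-1:0] data_out
    );

    logic [WIDTH-1:0] memory [DEPTH];

    // load initial contents if a file name was supplied
    initial begin
        if (INIT_F != "") $readmemh(INIT_F, memory);
    end

    // write port
    always_ff @(posedge clk_write) begin
        if (we) memory[addr_write] <= data_in;
    end

    // read port: registered output gives one cycle of read latency
    always_ff @(posedge clk_read) begin
        data_out <= memory[addr_read];
    end
endmodule
```

The registered read port is the point to note: data for an address appears one clock after the address is presented, which is why designs reading from BRAM need to allow for latency.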

Building the Designs
In the Framebuffers section of the git repo, you’ll find the design files, a makefile for iCEBreaker, a Vivado project for Arty, and instructions for building the designs for both boards.

Casting Shade

We can increase the bits assigned to each pixel to support more colours, or in the case of this image of David, more shades. Our video output is 12-bit, which means there are 16 possible shades of grey.

As with the hedgehog graphic in Hardware Sprites, we load the palette from a file, david_palette.mem:

FFF EEE DDD CCC BBB AAA 999 888 777 666 555 444 333 222 111 000
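The greyscale palette follows a simple pattern: index 0 is white (FFF), index 15 is black (000), and all three channels are equal. Here's a minimal sketch of the same mapping as a function. The package and function names are made up for illustration; the designs in this post load the palette from a file instead.

```systemverilog
// Hypothetical package: computes the greyscale palette entry for an index.
package grey_pal;
    // map 4-bit colour index to 12-bit RGB: 0 -> FFF (white), 15 -> 000 (black)
    function automatic logic [11:0] grey(input logic [3:0] idx);
        logic [3:0] shade;
        shade = 4'hF - idx;            // invert index: low index is bright
        return {shade, shade, shade};  // equal red, green, and blue
    endfunction
endpackage

// simulation-only check against the first and last palette entries listed above
module grey_pal_tb;
    import grey_pal::*;
    initial begin
        assert (grey(4'd0)  == 12'hFFF) else $error("index 0 should be white");
        assert (grey(4'd15) == 12'h000) else $error("index 15 should be black");
        for (int i = 0; i < 16; i++) $display("%0d: %03h", i, grey(i[3:0]));
    end
endmodule
```

Returning {idx, idx, idx} instead would run the shades from black to white.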

Greyscale

Build the updated top module with 4-bit greyscale David:

top_david_v2 memory usage: 160x120x4: 4x36Kb BRAM on the XC7.

Warm Tones

Because the palette is separate from the image, we can quickly change it. In top_david_v2.sv, update localparam FB_PALETTE to reference david_palette_warm.mem and rebuild.

The warm palette looks like this:

FED EDC DCB CBA BA9 A98 987 876 765 654 543 432 321 210 100 000

Warm shades

There is also an inverted palette, david_palette_invert.mem, or you can create your own.

Quick Aside: Palettes
Wikipedia has a lovely List of color palettes for different systems.

Framebuffer Scaling

Our framebuffer is too small to fill a 640x480 screen, and modern monitors don’t support lower resolutions. To make our framebuffer fill the screen, we need to scale it up.

We’ve made our framebuffer an integer divisor of our display resolution, so scaling ought to be simple. However, practical scaling isn’t quite as simple as it first appears. For example, given our display coordinates sx and sy you could calculate the current framebuffer read address with:

always_comb fb_addr_read = sx / 4 + (sy / 4) * 160;  // ?!

This has at least three problems:

  1. It doesn’t account for memory latency: it takes one or more cycles to read BRAM
  2. It wastes memory bandwidth: every value is read 16 times per frame!
  3. It uses multiplication, which takes up valuable (probably DSP) logic
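The third problem can be tackled on its own: because 160 = 128 + 32, the multiplication reduces to two shifts and an add. The sketch below shows the idea, assuming a 160-pixel-wide framebuffer scaled up four times; the module and port names are illustrative and not from the repo. It does nothing about latency or wasted bandwidth.

```systemverilog
// Hypothetical module: framebuffer address without a multiplier.
// Assumes a 160-pixel-wide framebuffer scaled up four times.
module fb_addr_calc (
    input  wire logic [9:0] sx,    // screen x (0-639)
    input  wire logic [9:0] sy,    // screen y (0-479)
    output      logic [14:0] addr  // framebuffer address (0-19199)
    );

    logic [14:0] fx, fy;  // framebuffer coordinates
    always_comb begin
        fx = {7'b0, sx[9:2]};  // divide by 4 by dropping the low two bits
        fy = {7'b0, sy[9:2]};
        addr = (fy << 7) + (fy << 5) + fx;  // fy*128 + fy*32 = fy*160
    end
endmodule

// simulation-only check: bottom-right screen pixel maps to the last address
module fb_addr_calc_tb;
    logic [9:0] sx, sy;
    logic [14:0] addr;
    fb_addr_calc dut (.sx, .sy, .addr);
    initial begin
        sx = 639; sy = 479; #1;
        assert (addr == 19199) else $error("expected 19199, got %0d", addr);
    end
endmodule
```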

Using a linebuffer allows us to avoid all three of these problems. The linebuffer uses three simple dual-port BRAMs, one for each colour channel (red, green, blue). We copy a single 160-pixel line of the framebuffer into the linebuffer, then we read this out for four display lines before going back to the framebuffer for the next line. By making efficient use of memory bandwidth, we can better share memory with other systems, such as a CPU.

Add a linebuffer module - linebuffer.sv:

module linebuffer #(
    parameter WIDTH=8,   // data width of each channel
    parameter LEN=2048,  // length of line
    parameter SCALEW=6   // horizontal scaling width
    ) (
    input  wire logic clk_in,
    input  wire logic clk_out,
    input  wire logic en_in,
    input  wire logic en_out,
    input  wire logic rst_in,
    input  wire logic rst_out,
    input  wire logic [SCALEW-1:0] scale,
    input  wire logic [WIDTH-1:0] data_in_0, data_in_1, data_in_2,
    output      logic [WIDTH-1:0] data_out_0, data_out_1, data_out_2
    );

    logic [$clog2(LEN)-1:0] addr_in, addr_out;
    logic [SCALEW-1:0] cnt_scale;

    // correct scale: if scale is 0, set to 1
    logic [SCALEW-1:0] scale_cor;
    always_comb scale_cor = (scale == 0) ? 1 : scale;

    always_ff @(posedge clk_in) begin
        if (en_in) addr_in <= (addr_in == LEN-1) ? 0 : addr_in + 1;
        if (rst_in) addr_in <= 0;  // reset takes precedence
    end

    always_ff @(posedge clk_out) begin
        if (en_out) begin
            cnt_scale <= (cnt_scale == scale_cor-1) ? 0 : cnt_scale + 1;
            if (cnt_scale == scale_cor-1) addr_out <= (addr_out == LEN-1) ? 0 : addr_out + 1;
        end
        if (rst_out) begin  // reset takes precedence
            addr_out <= 0;
            cnt_scale <= 0;
        end
    end

    // channel 0
    bram_sdp #(.WIDTH(WIDTH), .DEPTH(LEN)) ch0 (
        .clk_write(clk_in),
        .clk_read(clk_out),
        .we(en_in),
        .addr_write(addr_in),
        .addr_read(addr_out),
        .data_in(data_in_0),
        .data_out(data_out_0)
    );

    // channel 1
    bram_sdp #(.WIDTH(WIDTH), .DEPTH(LEN)) ch1 (
        .clk_write(clk_in),
        .clk_read(clk_out),
        .we(en_in),
        .addr_write(addr_in),
        .addr_read(addr_out),
        .data_in(data_in_1),
        .data_out(data_out_1)
    );

    // channel 2
    bram_sdp #(.WIDTH(WIDTH), .DEPTH(LEN)) ch2 (
        .clk_write(clk_in),
        .clk_read(clk_out),
        .we(en_in),
        .addr_write(addr_in),
        .addr_read(addr_out),
        .data_in(data_in_2),
        .data_out(data_out_2)
    );
endmodule

The linebuffer module handles horizontal scaling, but not vertical scaling. Vertical scaling requires the repetition of lines; this is dealt with by the top module. We use three separate BRAM instances because colours are output separately to the display and 4-bit-wide buffers are a better fit for BRAM hardware than a single 12-bit-wide buffer.

Quick Aside: Clock Domain Crossing
The linebuffer module has another significant benefit: it allows the pixel clock to be different from the rest of the design. For example, you could draw on the framebuffer at 100 MHz (linebuffer data_in) and display it on a 720p60 screen with a 74.25 MHz pixel clock (linebuffer data_out). CDC is usually achieved with a FIFO, but the linebuffer does it while providing efficient scaling.

Top Scaling

To drive the linebuffer, we need to enable and reset its data inputs and outputs at appropriate times. This extract from the top module (source link below listing) shows scaling up by four times:

    // linebuffer (LB)
    localparam LB_SCALE_V = 4;                // factor to scale vertical drawing
    localparam LB_SCALE_H = 4;                // factor to scale horizontal drawing
    localparam LB_LEN  = 640 / LB_SCALE_H;    // line length
    localparam LB_WIDTH = 4;                  // bits per colour channel

    // LB data in from FB
    logic lb_en_in, lb_en_in_1;  // allow for BRAM latency correction
    logic [LB_WIDTH-1:0] lb_in_0, lb_in_1, lb_in_2;

    // correct vertical scale: if scale is 0, set to 1
    logic [$clog2(LB_SCALE_V+1):0] scale_v_cor;
    always_comb scale_v_cor = (LB_SCALE_V == 0) ? 1 : LB_SCALE_V;

    // count screen lines for vertical scaling - read when cnt_scale_v==0
    logic [$clog2(LB_SCALE_V):0] cnt_scale_v;
    always_ff @(posedge clk_pix) begin
        if (sx == 0) cnt_scale_v <= (cnt_scale_v == scale_v_cor-1) ? 0 : cnt_scale_v + 1;
        if (sy == V_RES_FULL-1) cnt_scale_v <= 0;  // reset count in final blanking line
    end

    logic [$clog2(FB_WIDTH)-1:0] fb_h_cnt;  // counter for FB pixels on line
    always_ff @(posedge clk_pix) begin
        if (sy == V_RES_FULL-1 && sx == H_RES-1) fb_addr_read <= 0;  // reset on last frame line

        // reset the horizontal counter at the start of blanking on reading lines
        if (cnt_scale_v == 0 && sx == H_RES) begin
            if (fb_addr_read < FB_PIXELS-1) fb_h_cnt <= 0;  // if we've not read all pixels
        end

        // read each pixel on FB line and write to LB
        if (fb_h_cnt < FB_WIDTH) begin
            lb_en_in <= 1;
            fb_h_cnt <= fb_h_cnt + 1;
            fb_addr_read <= fb_addr_read + 1;
        end else begin
            lb_en_in <= 0;
        end

        // enable LB data in with latency correction
        lb_en_in_1 <= lb_en_in;
    end

    // LB data out to display
    logic [LB_WIDTH-1:0] lb_out_0, lb_out_1, lb_out_2;

    linebuffer #(
        .WIDTH(LB_WIDTH),
        .LEN(LB_LEN)
        ) lb_inst (
        .clk_in(clk_pix),
        .clk_out(clk_pix),
        .en_in(lb_en_in_1),  // correct for BRAM latency
        .en_out(sy < V_RES && sx < H_RES),
        .rst_in(sx == H_RES),  // reset at start of horizontal blanking
        .rst_out(sx == H_RES),
        .scale(LB_SCALE_H),
        .data_in_0(lb_in_0),
        .data_in_1(lb_in_1),
        .data_in_2(lb_in_2),
        .data_out_0(lb_out_0),
        .data_out_1(lb_out_1),
        .data_out_2(lb_out_2)
    );

Build the updated top module with scaled David:

top_david memory usage: 160x120x4 + linebuffer: 4.5 x 36Kb BRAMs on the XC7.

Quick Aside: BRAM Optimization
You’d expect the linebuffer to use 1.5 BRAMs (36Kb), but because all three colour channels are the same for the greyscale palette, the linebuffer is reduced from 3x18Kb BRAMs to 1x18Kb. The warm palette does use 1.5 BRAMs for a total of 5.5.

The Blue Planet

This next section is for the Arty (XC7) only, sorry. The iCEBreaker doesn’t have enough BRAM for larger framebuffers. If there’s enough interest, I’ll consider creating a version using SPRAM.

Earth

Our design is nice and flexible, so it’s not hard to adapt for a larger, more colourful image. If you’ve got an Arty board, or an FPGA with at least 640 Kb of BRAM, try this image of Earth (courtesy of NASA):

There are three changes required to the top module to display the Earth.

  1. We increase the framebuffer size (320x240) and depth (6 bits for 64 colours):
    // framebuffer
    localparam FB_WIDTH   = 320;
    localparam FB_HEIGHT  = 240;
    localparam FB_PIXELS  = FB_WIDTH * FB_HEIGHT;
    localparam FB_ADDRW   = $clog2(FB_PIXELS);
    localparam FB_DATAW   = 6;  // colour bits per pixel
    localparam FB_IMAGE   = "earth.mem";
    localparam FB_PALETTE = "earth_palette.mem";
  2. We reduce the linebuffer scaling factor (from 4 to 2):
    // linebuffer
    localparam LB_SCALE_V = 2;                // factor to scale vertical drawing
    localparam LB_SCALE_H = 2;                // factor to scale horizontal drawing
    localparam LB_LEN   = 640 / LB_SCALE_H;   // line length
    localparam LB_WIDTH = 4;                  // bits per colour channel
  3. And we expand the colour lookup table to 64 entries:
    // colour lookup table (CLUT)
    logic [11:0] clut [64];  // 64 x 12-bit colour palette entries
    initial begin
        $display("Loading palette '%s' into CLUT.", FB_PALETTE);
        $readmemh(FB_PALETTE, clut);  // load palette into CLUT
    end

Try building the top_earth module:

top_earth memory usage: 320x240x6 + linebuffer: 19.5 x 36Kb BRAMs on the XC7.

Creating Your Own Images

You can easily create your own images using img2fmem. The script is written in Python and uses the Pillow image library to perform the conversion. You can find it in the Project F FPGA Tools repo. Make sure your images are exactly the same size as the framebuffer you’re using.

To convert an image called acme.png to 4-bit colour with 12-bit palette for use with $readmemh:

img2fmem.py acme.png 4 mem 12

For details on installation and command line options, see the img2fmem README.

Fizzle Out

So far we’ve read from our framebuffer, but we haven’t updated our display in any way. In FPGA Ad Astra, we used linear feedback shift registers (LFSRs) to create animated starfields. LFSRs have another graphical use: creating random dissolve effects, as used in Wolfenstein 3D and explained by Fabien Sanglard in his excellent post: Fizzlefade.

We can use LFSRs to dissolve our image of David using a mask. Our mask is another bitmap, but with only 1 bit per pixel. If a mask pixel is set to 0, then the framebuffer pixel colour is shown, otherwise we show red. By using a mask, rather than altering the original bitmap, we can fade the image back in again or apply other effects.

Xilinx version shown below:

module top_david_fizzle (
    input  wire logic clk_100m,     // 100 MHz clock
    input  wire logic btn_rst,      // reset button (active low)
    output      logic vga_hsync,    // horizontal sync
    output      logic vga_vsync,    // vertical sync
    output      logic [3:0] vga_r,  // 4-bit VGA red
    output      logic [3:0] vga_g,  // 4-bit VGA green
    output      logic [3:0] vga_b   // 4-bit VGA blue
    );

    // generate pixel clock
    logic clk_pix;
    logic clk_locked;
    clock_gen clock_640x480 (
       .clk(clk_100m),
       .rst(!btn_rst),  // reset button is active low
       .clk_pix,
       .clk_locked
    );

    // display timings
    localparam CORDW = 10;  // screen coordinate width in bits
    logic [CORDW-1:0] sx, sy;
    logic de;
    display_timings timings_640x480 (
        .clk_pix,
        .rst(!clk_locked),  // wait for clock lock
        .sx,
        .sy,
        .hsync(vga_hsync),
        .vsync(vga_vsync),
        .de
    );

    // size of screen with and without blanking
    localparam H_RES_FULL = 800;
    localparam V_RES_FULL = 525;
    localparam H_RES = 640;
    localparam V_RES = 480;

    // framebuffer (FB)
    localparam FB_WIDTH   = 160;
    localparam FB_HEIGHT  = 120;
    localparam FB_PIXELS  = FB_WIDTH * FB_HEIGHT;
    localparam FB_ADDRW   = $clog2(FB_PIXELS);
    localparam FB_DATAW   = 4;  // colour bits per pixel
    localparam FB_IMAGE   = "david.mem";
    localparam FB_PALETTE = "david_palette.mem";

    logic [FB_ADDRW-1:0] fb_addr_read;
    logic [FB_DATAW-1:0] colr_idx;

    bram_sdp #(
        .WIDTH(FB_DATAW),
        .DEPTH(FB_PIXELS),
        .INIT_F(FB_IMAGE)
    ) fb_inst (
        .clk_read(clk_pix),
        .clk_write(clk_pix),
        .we(0),
        .addr_write(),
        .addr_read(fb_addr_read),
        .data_in(),
        .data_out(colr_idx)
    );

    // fizzlebuffer (FZ)
    logic [FB_ADDRW-1:0] fz_addr_write;
    logic fz_en_in, fz_en_out;
    logic fz_we;

    bram_sdp #(
        .WIDTH(1),
        .DEPTH(FB_PIXELS),
        .INIT_F("")
    ) fz_inst (
        .clk_read(clk_pix),
        .clk_write(clk_pix),
        .we(fz_we),
        .addr_write(fz_addr_write),
        .addr_read(fb_addr_read),  // share read address with framebuffer
        .data_in(fz_en_in),
        .data_out(fz_en_out)
    );

    // 15-bit LFSR (160x120 < 2^15)
    logic lfsr_en;
    logic [14:0] lfsr;
    lfsr #(
        .LEN(15),
        .TAPS(15'b110000000000000)
    ) lfsr_fz (
        .clk(clk_pix),
        .rst(!clk_locked),
        .en(lfsr_en),
        .sreg(lfsr)
    );

    localparam FADE_WAIT = 240;   // wait for 240 frames before fading
    localparam FADE_RATE = 3200;  // update LFSR each time the counter reaches 3200 (every 3201 pixel clocks)
    logic [$clog2(FADE_WAIT)-1:0] cnt_fade_wait;
    logic [$clog2(FADE_RATE)-1:0] cnt_fade_rate;
    always_ff @(posedge clk_pix) begin
        if (sy == V_RES && sx == H_RES) begin  // start of blanking
            cnt_fade_wait <= (cnt_fade_wait != FADE_WAIT-1) ? cnt_fade_wait + 1 : cnt_fade_wait;
        end
        if (cnt_fade_wait == FADE_WAIT-1) begin
            cnt_fade_rate <= (cnt_fade_rate == FADE_RATE) ? 0 : cnt_fade_rate + 1;
        end
    end

    always_comb begin
        fz_addr_write = lfsr;
        if (cnt_fade_rate == FADE_RATE) begin
            lfsr_en = 1;
            fz_we = 1;
            fz_en_in = 1;
        end else begin
            lfsr_en = 0;
            fz_we = 0;
            fz_en_in = 0;
        end
    end

    // linebuffer (LB)
    localparam LB_SCALE_V = 4;                // factor to scale vertical drawing
    localparam LB_SCALE_H = 4;                // factor to scale horizontal drawing
    localparam LB_LEN  = 640 / LB_SCALE_H;    // line length
    localparam LB_WIDTH = 4;                  // bits per colour channel

    // LB data in from FB
    logic lb_en_in, lb_en_in_1;  // allow for BRAM latency correction
    logic [LB_WIDTH-1:0] lb_in_0, lb_in_1, lb_in_2;

    // correct vertical scale: if scale is 0, set to 1
    logic [$clog2(LB_SCALE_V+1):0] scale_v_cor;
    always_comb scale_v_cor = (LB_SCALE_V == 0) ? 1 : LB_SCALE_V;

    // count screen lines for vertical scaling - read when cnt_scale_v==0
    logic [$clog2(LB_SCALE_V):0] cnt_scale_v;
    always_ff @(posedge clk_pix) begin
        if (sx == 0) cnt_scale_v <= (cnt_scale_v == scale_v_cor-1) ? 0 : cnt_scale_v + 1;
        if (sy == V_RES_FULL-1) cnt_scale_v <= 0;  // reset count in final blanking line
    end

    logic [$clog2(FB_WIDTH)-1:0] fb_h_cnt;  // counter for FB pixels on line
    always_ff @(posedge clk_pix) begin
        if (sy == V_RES_FULL-1 && sx == H_RES-1) fb_addr_read <= 0;  // reset on last frame line

        // reset the horizontal counter at the start of blanking on reading lines
        if (cnt_scale_v == 0 && sx == H_RES) begin
            if (fb_addr_read < FB_PIXELS-1) fb_h_cnt <= 0;  // if we've not read all pixels
        end

        // read each pixel on FB line and write to LB
        if (fb_h_cnt < FB_WIDTH) begin
            lb_en_in <= 1;
            fb_h_cnt <= fb_h_cnt + 1;
            fb_addr_read <= fb_addr_read + 1;
        end else begin
            lb_en_in <= 0;
        end

        // enable LB data in with latency correction
        lb_en_in_1 <= lb_en_in;
    end

    // LB data out to display
    logic [LB_WIDTH-1:0] lb_out_0, lb_out_1, lb_out_2;

    linebuffer #(
        .WIDTH(LB_WIDTH),
        .LEN(LB_LEN)
        ) lb_inst (
        .clk_in(clk_pix),
        .clk_out(clk_pix),
        .en_in(lb_en_in_1),  // correct for BRAM latency
        .en_out(sy < V_RES && sx < H_RES),
        .rst_in(sx == H_RES),  // reset at start of horizontal blanking
        .rst_out(sx == H_RES),
        .scale(LB_SCALE_H),
        .data_in_0(lb_in_0),
        .data_in_1(lb_in_1),
        .data_in_2(lb_in_2),
        .data_out_0(lb_out_0),
        .data_out_1(lb_out_1),
        .data_out_2(lb_out_2)
    );

    // colour lookup table (CLUT)
    logic [11:0] clut [16];  // 16 x 12-bit colour palette entries
    initial begin
        $display("Loading palette '%s' into CLUT.", FB_PALETTE);
        $readmemh(FB_PALETTE, clut);  // load palette into CLUT
    end

    // map colour index to palette using CLUT and read into LB
    always_ff @(posedge clk_pix) begin
        {lb_in_2, lb_in_1, lb_in_0} <= !fz_en_out ? clut[colr_idx] : 12'hA00;
    end

    // VGA output
    always_comb begin
        vga_r = de ? lb_out_2 : 4'h0;
        vga_g = de ? lb_out_1 : 4'h0;
        vga_b = de ? lb_out_0 : 4'h0;
    end
endmodule

You may have noticed that the top-left pixel doesn’t change; this is because the LFSR produces every value except zero. To fix this, add some logic to specifically update the pixel at address zero when fading.
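One way to handle address zero is to write it on the first update, before the LFSR sequence begins. The sketch below is an illustration rather than code from the repo: the module name fizzle_addr, its ports, and the Galois-style LFSR inside it are all assumptions made for this example, though the length and taps match the 15-bit LFSR above.

```systemverilog
// Hypothetical helper: yields fizzle write addresses, including address zero.
// The Galois LFSR here is an assumption; the series' own lfsr module may differ.
module fizzle_addr #(
    parameter LEN=15,                   // LFSR length in bits
    parameter TAPS=15'b110000000000000  // taps used for the 15-bit LFSR
    ) (
    input  wire logic clk,
    input  wire logic rst,
    input  wire logic step,            // pulse high for one clock per update
    output      logic [LEN-1:0] addr,  // mask address to write
    output      logic we               // mask write enable
    );

    logic [LEN-1:0] sreg;  // LFSR state (never zero)
    logic zero_done;       // set once address zero has been written

    always_ff @(posedge clk) begin
        if (step) begin
            if (!zero_done) zero_done <= 1;  // first step handles address zero
            else sreg <= {1'b0, sreg[LEN-1:1]} ^ (sreg[0] ? TAPS : {LEN{1'b0}});
        end
        if (rst) begin  // reset takes precedence
            sreg <= {LEN{1'b1}};
            zero_done <= 0;
        end
    end

    always_comb begin
        we = step;
        addr = zero_done ? sreg : {LEN{1'b0}};  // zero first, then LFSR values
    end
endmodule
```

In top_david_fizzle, step would take the place of the cnt_fade_rate == FADE_RATE condition, with addr driving fz_addr_write and we driving fz_we.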

Explore

I hope you enjoyed this instalment of Exploring FPGA Graphics, but nothing beats creating your own designs. Here are a few suggestions to get you started:

  • Load your own picture into the framebuffer using img2fmem
  • Cross-fade between two images
  • Fade an image by adjusting the intensity of the palette entries
  • Detect edges and highlight them in another colour

Next Time

In the next part, we’ll implement Conway’s Game of Life and sine waves in Life on Screen.

©2020 Will Green, Project F