Welcome back to Exploring FPGA Graphics. In the previous two parts, we worked with sprites, but as graphics become more complex, a different approach is needed. Instead of drawing directly to the screen, we draw to a framebuffer, which is read out when required by the screen. This post provides an introduction to framebuffers and how to scale them. We’ll also learn how to fizzlefade graphics Wolfenstein 3D style. In the next part, we’ll use a framebuffer to visualize a simulation of life.
In this series, we explore graphics at the hardware level and get a feel for the power of FPGAs. We start by learning how displays work, before racing the beam with Pong, starfields and sprites, simulating life with bitmaps, drawing lines and triangles, and finally creating simple 3D models. I’ll be writing and revising this series throughout 2020 and 2021. New to the series? Start with Exploring FPGA Graphics.
Updated 2021-01-19. Get in touch with @WillFlux or open an issue on GitHub.
Series Outline
- Exploring FPGA Graphics - learn how displays work and animate simple shapes
- FPGA Pong - race the beam to create the arcade classic
- Hardware Sprites - fast, colourful, graphics with minimal resources
- FPGA Ad Astra - demo with hardware sprites and animated starfields
- Framebuffers (this post) - driving the display from a bitmap in memory
- Life on Screen - the screen comes alive with Conway’s Game of Life
More parts to follow.
Requirements
For this series, you need an FPGA board with video output. We’ll be working at 640x480, so pretty much any video output will do. You should be comfortable with programming your FPGA board and reasonably familiar with Verilog.
We’ll be demoing with these boards:
- iCEBreaker (Lattice iCE40) with 12-Bit DVI Pmod
- Digilent Arty A7-35T (Xilinx Artix-7) with Pmod VGA
Source
The SystemVerilog designs featured in this series are available from the projf-explore repo on GitHub. The designs are open source hardware under the permissive MIT licence, but this blog is subject to normal copyright restrictions.
Framebuffer
A framebuffer is an in-memory bitmap that drives pixels on screen. When you write to a memory location within the framebuffer, the corresponding pixel will change on screen. Using a framebuffer provides two big benefits: we’re free to create sophisticated graphics using whatever technique we like, and the setting of pixel colour is separated from the process driving the screen. The flexibility of a framebuffer comes at the cost of increased memory storage and latency.
A Small Buffer
A framebuffer requires enough memory to hold the complete frame. To keep things simple, we’re going to store our framebuffer in internal FPGA block memory (BRAM). The iCEBreaker’s iCE40 FPGA has thirty 4Kb BRAMs, for 120Kb in total, so that’s what we’ll target. You can learn more about block RAM in FPGA Memory Types.
If we divide our 640x480 screen by four in both dimensions, we get 160x120. That’s 19,200 pixels, so a monochrome framebuffer with one bit per pixel requires 19,200 bits, or 18.75 kilobits (19,200 / 1024 = 18.75).
We’ll start by wiring up our usual display timings with a framebuffer based on BRAM, then draw a simple horizontal line in it to confirm everything is working.
- Xilinx XC7: xc7/top_line.sv
- Lattice iCE40: ice40/top_line.sv
The Xilinx version is shown below:
module top_line (
input wire logic clk_100m, // 100 MHz clock
input wire logic btn_rst, // reset button (active low)
output logic vga_hsync, // horizontal sync
output logic vga_vsync, // vertical sync
output logic [3:0] vga_r, // 4-bit VGA red
output logic [3:0] vga_g, // 4-bit VGA green
output logic [3:0] vga_b // 4-bit VGA blue
);
// generate pixel clock
logic clk_pix;
logic clk_locked;
clock_gen clock_640x480 (
.clk(clk_100m),
.rst(!btn_rst), // reset button is active low
.clk_pix,
.clk_locked
);
// display timings
localparam CORDW = 10; // screen coordinate width in bits
logic [CORDW-1:0] sx, sy;
logic hsync, vsync;
display_timings_480p timings_640x480 (
.clk_pix,
.rst(!clk_locked), // wait for clock lock
.sx,
.sy,
.hsync,
.vsync,
.de()
);
// size of screen with and without blanking
localparam H_RES_FULL = 800;
localparam V_RES_FULL = 525;
localparam H_RES = 640;
localparam V_RES = 480;
// framebuffer
localparam FB_WIDTH = 160;
localparam FB_HEIGHT = 120;
localparam FB_PIXELS = FB_WIDTH * FB_HEIGHT;
localparam FB_ADDRW = $clog2(FB_PIXELS);
localparam FB_DATAW = 1; // colour bits per pixel
logic fb_we;
logic [FB_ADDRW-1:0] fb_addr_write, fb_addr_read;
logic [FB_DATAW-1:0] fb_colr_write, fb_colr_read;
bram_sdp #(
.WIDTH(FB_DATAW),
.DEPTH(FB_PIXELS)
) framebuffer (
.clk_write(clk_pix),
.clk_read(clk_pix),
.we(fb_we),
.addr_write(fb_addr_write),
.addr_read(fb_addr_read),
.data_in(fb_colr_write),
.data_out(fb_colr_read)
);
// draw a horizontal line at the top of the framebuffer
always_ff @(posedge clk_pix) begin
if (sy >= V_RES) begin // draw in blanking interval
if (fb_we == 0 && fb_addr_write != FB_WIDTH-1) begin
fb_colr_write <= 1;
fb_we <= 1;
end else if (fb_addr_write != FB_WIDTH-1) begin
fb_addr_write <= fb_addr_write + 1;
end else begin
fb_colr_write <= 0;
fb_we <= 0;
end
end
end
// determine when framebuffer is active for reading
logic fb_active;
always_comb fb_active = (sy < FB_HEIGHT && sx < FB_WIDTH);
// calculate framebuffer read address for output to display
always_ff @(posedge clk_pix) begin
if (sy == V_RES_FULL-1 && sx == H_RES_FULL-1) begin
fb_addr_read <= 0; // reset address at end of frame
end else if (fb_active) begin
fb_addr_read <= fb_addr_read + 1;
end
end
// VGA output
always_ff @(posedge clk_pix) begin
vga_hsync <= hsync;
vga_vsync <= vsync;
vga_r <= (fb_active && fb_colr_read) ? 4'hF : 4'h0;
vga_g <= (fb_active && fb_colr_read) ? 4'hF : 4'h0;
vga_b <= (fb_active && fb_colr_read) ? 4'hF : 4'h0;
end
endmodule
Build this design and you should see a white horizontal line at the top left of your screen. It’s only 160 pixels long, so it won’t go right across the screen. If you’re using a VGA monitor you may need to tweak your monitor settings to ensure the line is visible.
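The top module instantiates bram_sdp, but its source isn’t shown in this post. As a rough guide, a simple dual-port RAM with the same parameters and ports could be written along these lines. The module name bram_sdp_sketch and its body are assumptions for illustration; the module in the projf-explore repo may differ.

```systemverilog
// Sketch only: a simple dual-port RAM (one write port, one read port).
// Module name and body are assumptions; the ports mirror those used in this post.
module bram_sdp_sketch #(
    parameter WIDTH=8,    // data width in bits
    parameter DEPTH=256,  // number of memory locations
    parameter INIT_F=""   // optional $readmemh file ("" leaves memory uninitialised)
    ) (
    input  wire logic clk_write,
    input  wire logic clk_read,
    input  wire logic we,
    input  wire logic [$clog2(DEPTH)-1:0] addr_write,
    input  wire logic [$clog2(DEPTH)-1:0] addr_read,
    input  wire logic [WIDTH-1:0] data_in,
    output      logic [WIDTH-1:0] data_out
    );

    logic [WIDTH-1:0] memory [DEPTH];

    // load initial contents if a file name was supplied
    initial begin
        if (INIT_F != 0) $readmemh(INIT_F, memory);
    end

    // write port
    always_ff @(posedge clk_write) begin
        if (we) memory[addr_write] <= data_in;
    end

    // read port: registered, so data appears one clk_read cycle after the address
    always_ff @(posedge clk_read) begin
        data_out <= memory[addr_read];
    end
endmodule
```

Note the registered read: data_out lags addr_read by one clock, which is the BRAM latency we have to allow for when scaling the framebuffer later in this post.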
Building the Designs
In the Framebuffers section of the git repo, you’ll find the design files, a makefile for iCEBreaker, a Vivado project for Arty, and instructions for building the designs for both boards.
A Small Bitmap
Now that our framebuffer is wired up correctly, let’s load something into it. A small monochrome framebuffer calls for a striking image: I’ve chosen David by Michelangelo.
The version on the right is the original from Wikipedia; it has 64 shades of grey. I created the middle image by reducing the original to 16 colours using img2fmem with no dithering (we will discuss this tool later in the post). The monochrome image on the left was created by Gerbrant using Floyd-Steinberg dithering.
Create a new top module to load our monochrome image of David:
- Xilinx XC7: xc7/top_david_v1.sv
- Lattice iCE40: ice40/top_david_v1.sv
top_david_v1 BRAM usage: 1x36Kb on XC7 and 10x4Kb on iCE40
The iCE40 version is shown below:
module top_david_v1 (
input wire logic clk_12m, // 12 MHz clock
input wire logic btn_rst, // reset button (active high)
output logic dvi_clk, // DVI pixel clock
output logic dvi_hsync, // DVI horizontal sync
output logic dvi_vsync, // DVI vertical sync
output logic dvi_de, // DVI data enable
output logic [3:0] dvi_r, // 4-bit DVI red
output logic [3:0] dvi_g, // 4-bit DVI green
output logic [3:0] dvi_b // 4-bit DVI blue
);
// generate pixel clock
logic clk_pix;
logic clk_locked;
clock_gen clock_640x480 (
.clk(clk_12m),
.rst(btn_rst),
.clk_pix,
.clk_locked
);
// display timings
localparam CORDW = 10; // screen coordinate width in bits
logic [CORDW-1:0] sx, sy;
logic hsync, vsync, de;
display_timings_480p timings_640x480 (
.clk_pix,
.rst(!clk_locked), // wait for clock lock
.sx,
.sy,
.hsync,
.vsync,
.de
);
// size of screen with and without blanking
localparam H_RES_FULL = 800;
localparam V_RES_FULL = 525;
localparam H_RES = 640;
localparam V_RES = 480;
// framebuffer
localparam FB_WIDTH = 160;
localparam FB_HEIGHT = 120;
localparam FB_PIXELS = FB_WIDTH * FB_HEIGHT;
localparam FB_ADDRW = $clog2(FB_PIXELS);
localparam FB_DATAW = 1; // colour bits per pixel
localparam FB_IMAGE = "../res/david/david_1bit.mem";
logic fb_we;
logic [FB_ADDRW-1:0] fb_addr_write, fb_addr_read;
logic [FB_DATAW-1:0] fb_colr_write, fb_colr_read;
bram_sdp #(
.WIDTH(FB_DATAW),
.DEPTH(FB_PIXELS),
.INIT_F(FB_IMAGE)
) framebuffer (
.clk_write(clk_pix),
.clk_read(clk_pix),
.we(fb_we),
.addr_write(fb_addr_write),
.addr_read(fb_addr_read),
.data_in(fb_colr_write),
.data_out(fb_colr_read)
);
// draw a horizontal line at the top of the framebuffer
always_ff @(posedge clk_pix) begin
if (sy >= V_RES) begin // draw in blanking interval
if (fb_we == 0 && fb_addr_write != FB_WIDTH-1) begin
fb_colr_write <= 1;
fb_we <= 1;
end else if (fb_addr_write != FB_WIDTH-1) begin
fb_addr_write <= fb_addr_write + 1;
end else begin
fb_colr_write <= 0;
fb_we <= 0;
end
end
end
// flag when framebuffer is active for display
logic fb_active;
always_comb fb_active = (sy < FB_HEIGHT && sx < FB_WIDTH);
always_ff @(posedge clk_pix) begin
if (sy == V_RES_FULL-1 && sx == H_RES_FULL-1) begin
fb_addr_read <= 0; // reset address at end of frame
end else if (fb_active) begin
fb_addr_read <= fb_addr_read + 1;
end
end
logic [3:0] red, green, blue; // output colour
always_comb begin
red = (fb_active && fb_colr_read) ? 4'hF : 4'h0;
green = (fb_active && fb_colr_read) ? 4'hF : 4'h0;
blue = (fb_active && fb_colr_read) ? 4'hF : 4'h0;
end
// Output DVI clock: 180° out of phase with other DVI signals
SB_IO #(
.PIN_TYPE(6'b010000) // PIN_OUTPUT_DDR
) dvi_clk_io (
.PACKAGE_PIN(dvi_clk),
.OUTPUT_CLK(clk_pix),
.D_OUT_0(1'b0),
.D_OUT_1(1'b1)
);
// Output DVI signals
SB_IO #(
.PIN_TYPE(6'b010100) // PIN_OUTPUT_REGISTERED
) dvi_signal_io [14:0] (
.PACKAGE_PIN({dvi_hsync, dvi_vsync, dvi_de, dvi_r, dvi_g, dvi_b}),
.OUTPUT_CLK(clk_pix),
.D_OUT_0({hsync, vsync, de, red, green, blue}),
.D_OUT_1()
);
endmodule
Build this design and program your board. You should see the dithered image of David looking at you from the top left of your screen. We’re still drawing a white line at the top of the framebuffer, so this replaces the top line of the bitmap we’re loading.
iCE40: One Bit Too Many
The iCE40 version uses more block RAM than you might expect: 160x120 should fit into five 4Kb BRAMs, but the david_v1 design requires ten. This is because the iCE40 BRAMs are a minimum of two bits wide. When we expand the colour depth to 4-bit, the BRAM usage will be the expected twenty. Learn more in the Lattice ICE Technology Library.
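The BRAM counts can be estimated by tiling the framebuffer with fixed-shape blocks. The sketch below assumes the tools use the iCE40’s narrowest BRAM shape, 2048x2, and compares it with a hypothetical 4096x1 shape. The module and function names are made up for illustration.

```systemverilog
// Sketch only: estimate BRAM count for a memory built from fixed-shape blocks.
// Assumes the tools tile the memory as a grid of identical BRAMs.
module bram_estimate_sketch;
    // ceiling division for positive integers
    function automatic int ceil_div(input int a, input int b);
        return (a + b - 1) / b;
    endfunction

    // BRAMs needed: columns to cover the width times rows to cover the depth
    function automatic int brams(input int width, input int depth,
                                 input int bram_width, input int bram_depth);
        return ceil_div(width, bram_width) * ceil_div(depth, bram_depth);
    endfunction

    localparam FB_PIXELS = 160 * 120;  // 19,200 pixels

    initial begin
        // hypothetical 4096x1 BRAM: what you might expect for 1-bit pixels
        assert (brams(1, FB_PIXELS, 1, 4096) == 5);
        // 2048x2 BRAM: 1-bit pixels leave half of every block unused
        assert (brams(1, FB_PIXELS, 2, 2048) == 10);
        // 2048x2 BRAM: 4-bit pixels need two columns of ten blocks
        assert (brams(4, FB_PIXELS, 2, 2048) == 20);
        $display("BRAM estimates agree with the usage quoted in this post");
        $finish;
    end
endmodule
```

This is only an estimate: synthesis tools are free to pick other BRAM shapes, so check the utilisation report for your own build.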
Casting Shade
We can increase the bits assigned to each pixel to support more colours, or in the case of this image of David, more shades. Our video output is 12-bit (4 bits per colour channel), which means there are 16 possible shades of grey.
As with the hedgehog graphic in Hardware Sprites, we store the palette in a file: david_palette.mem.
FFF EEE DDD CCC BBB AAA 999 888 777 666 555 444 333 222 111 000
We load the palette file into a colour lookup table (CLUT) ROM, which is used to find the colour of each pixel for display. If you need a refresher on CLUTs, see A Refined Palette.
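As a reminder of the mechanism, this simulation sketch looks up a 4-bit colour index in a 16-entry table and splits the 12-bit result into red, green, and blue. The module clut_sketch is hypothetical, and it fills the table in a loop to stay self-contained, whereas the designs in this post load david_palette.mem into a ROM.

```systemverilog
// Sketch only: a 16-entry colour lookup table with 12-bit RGB entries.
// The table is filled in a loop so the example needs no external file.
module clut_sketch;
    logic [11:0] clut [16];         // 16 palette entries: 4 bits each for R, G, B
    logic [3:0]  cidx;              // colour index, as stored in the framebuffer
    logic [11:0] colr;              // colour read from the CLUT
    logic [3:0]  red, green, blue;  // separate colour channels
    logic [3:0]  shade;             // temporary used when filling the table

    // greyscale palette: index 0 is white (FFF), index 15 is black (000)
    initial begin
        for (int i = 0; i < 16; i++) begin
            shade = 4'd15 - i[3:0];
            clut[i] = {shade, shade, shade};
        end
    end

    // asynchronous lookup, then split into channels
    always_comb begin
        colr = clut[cidx];
        {red, green, blue} = colr;
    end

    // a few spot checks against the greyscale palette
    initial begin
        #1 cidx = 4'h0;
        #1 assert (colr == 12'hFFF);
        cidx = 4'h5;
        #1 assert (colr == 12'hAAA && red == 4'hA);
        cidx = 4'hF;
        #1 assert (colr == 12'h000);
        $display("CLUT lookups OK");
        $finish;
    end
endmodule
```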
Build the updated top module with 4-bit greyscale David:
- Xilinx XC7: xc7/top_david_v2.sv
- Lattice iCE40: ice40/top_david_v2.sv
top_david_v2 BRAM usage: 4x36Kb on XC7 and 20x4Kb on iCE40
Warm Tones
Because the palette is separate from the image, we can quickly change it. In top_david_v2.sv, update localparam FB_PALETTE to reference david_palette_warm.mem and rebuild.
The warm palette looks like this:
FED EDC DCB CBA BA9 A98 987 876 765 654 543 432 321 210 100 000
There is also an inverted palette, david_palette_invert.mem, or you can create your own.
Quick Aside: Palettes
Wikipedia has a lovely list of color palettes for different hardware systems.
Framebuffer Scaling
Our framebuffer is too small to fill a 640x480 screen, and modern monitors don’t support lower resolutions. To make our framebuffer fill the screen, we need to scale it up.
We’ve made our framebuffer an integer divisor of our display resolution, so scaling ought to be simple. However, practical scaling isn’t quite as simple as it first appears. For example, given our display coordinates sx and sy, you could calculate the current framebuffer read address with:
always_comb fb_read_addr = sx / 4 + (sy / 4) * 160; // ?!
This has at least three problems:
- It doesn’t account for memory latency: it takes one or more cycles to read BRAM
- It wastes memory bandwidth: every value is read 16 times per frame!
- It uses multiplication, which takes up valuable (probably DSP) logic
Using a linebuffer allows us to avoid all three of these problems. The linebuffer uses three simple dual-port BRAMs, one for each colour channel (red, green, blue). We copy a single 160-pixel line of the framebuffer into the linebuffer, then we read this out for four display lines before going back to the framebuffer for the next line. By making efficient use of memory bandwidth, we can better share memory with other systems, such as a CPU.
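The third problem is partly a matter of how the expression is written. Because the scale factor is a power of two and 160 = 128 + 32, the same address can be formed from shifts and adds, as this simulation sketch checks. The module and function names are hypothetical, and this does nothing for latency or bandwidth, which is where the linebuffer earns its keep.

```systemverilog
// Sketch only: the naive scaled read address without a divider or multiplier.
// Dividing by 4 drops the two low bits; 160 = 128 + 32 gives two shifts and an add.
module scale_addr_sketch;
    localparam FB_WIDTH = 160;

    function automatic logic [14:0] fb_addr(input logic [9:0] sx,
                                            input logic [9:0] sy);
        logic [14:0] fx, fy;
        fx = {7'b0, sx[9:2]};               // framebuffer column: sx / 4
        fy = {7'b0, sy[9:2]};               // framebuffer row:    sy / 4
        return fx + (fy << 7) + (fy << 5);  // fx + fy * 160
    endfunction

    // compare with the original formula for every visible pixel at 640x480
    initial begin
        for (int y = 0; y < 480; y++)
            for (int x = 0; x < 640; x++)
                assert (fb_addr(x[9:0], y[9:0]) == (x / 4) + (y / 4) * FB_WIDTH);
        $display("shift-and-add address checked for all visible pixels");
        $finish;
    end
endmodule
```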
Add a linebuffer module - linebuffer.sv:
module linebuffer #(
parameter WIDTH=8, // data width of each channel
parameter LEN=2048, // length of line
parameter SCALEW=6 // horizontal scaling width
) (
input wire logic clk_in,
input wire logic clk_out,
input wire logic en_in,
input wire logic en_out,
input wire logic rst_in,
input wire logic rst_out,
input wire logic [SCALEW-1:0] scale,
input wire logic [WIDTH-1:0] data_in_0, data_in_1, data_in_2,
output logic [WIDTH-1:0] data_out_0, data_out_1, data_out_2
);
logic [$clog2(LEN)-1:0] addr_in, addr_out;
logic [SCALEW-1:0] cnt_scale;
// correct scale: if scale is 0, set to 1
logic [SCALEW-1:0] scale_cor;
always_comb scale_cor = (scale == 0) ? 1 : scale;
always_ff @(posedge clk_in) begin
if (en_in) addr_in <= (addr_in == LEN-1) ? 0 : addr_in + 1;
if (rst_in) addr_in <= 0; // reset takes precedence
end
always_ff @(posedge clk_out) begin
if (en_out) begin
cnt_scale <= (cnt_scale == scale_cor-1) ? 0 : cnt_scale + 1;
if (cnt_scale == scale_cor-1)
addr_out <= (addr_out == LEN-1) ? 0 : addr_out + 1;
end
if (rst_out) begin // reset takes precedence
addr_out <= 0;
cnt_scale <= 0;
end
end
// channel 0
bram_sdp #(.WIDTH(WIDTH), .DEPTH(LEN)) ch0 (
.clk_write(clk_in),
.clk_read(clk_out),
.we(en_in),
.addr_write(addr_in),
.addr_read(addr_out),
.data_in(data_in_0),
.data_out(data_out_0)
);
// channel 1
bram_sdp #(.WIDTH(WIDTH), .DEPTH(LEN)) ch1 (
.clk_write(clk_in),
.clk_read(clk_out),
.we(en_in),
.addr_write(addr_in),
.addr_read(addr_out),
.data_in(data_in_1),
.data_out(data_out_1)
);
// channel 2
bram_sdp #(.WIDTH(WIDTH), .DEPTH(LEN)) ch2 (
.clk_write(clk_in),
.clk_read(clk_out),
.we(en_in),
.addr_write(addr_in),
.addr_read(addr_out),
.data_in(data_in_2),
.data_out(data_out_2)
);
endmodule
The linebuffer module handles horizontal scaling, but not vertical scaling. Vertical scaling requires the repetition of lines; this is dealt with by the top module. We use three separate BRAM instances because colours are output separately to the display and 4-bit-wide buffers are a better fit for BRAM hardware than a single 12-bit-wide buffer.
Quick Aside: Clock Domain Crossing
The linebuffer module has another significant benefit: it allows the pixel clock to be different from the rest of the design. For example, you could draw on the framebuffer at 100 MHz (linebuffer data_in) and display it on a 720p60 screen with a 74.25 MHz pixel clock (linebuffer data_out). CDC is usually achieved with a FIFO, but the linebuffer does it while providing efficient scaling.
Top Scaling
To drive the linebuffer, we need to enable and reset its data inputs and outputs at appropriate times. This extract from the top module (source link below listing) shows vertical scaling by a factor of four:
// linebuffer (LB)
localparam LB_SCALE_V = 4; // scale vertical drawing
localparam LB_SCALE_H = 4; // scale horizontal drawing
localparam LB_LEN = H_RES / LB_SCALE_H; // line length
localparam LB_WIDTH = 4; // bits per colour channel
// LB data in from FB
logic lb_en_in, lb_en_in_1; // allow for BRAM latency correction
logic [LB_WIDTH-1:0] lb_in_0, lb_in_1, lb_in_2;
// correct vertical scale: if scale is 0, set to 1
logic [$clog2(LB_SCALE_V+1):0] scale_v_cor;
always_comb scale_v_cor = (LB_SCALE_V == 0) ? 1 : LB_SCALE_V;
// count screen lines for vertical scaling - read when cnt_scale_v==0
logic [$clog2(LB_SCALE_V):0] cnt_scale_v;
always_ff @(posedge clk_pix) begin
if (sx == 0)
cnt_scale_v <= (cnt_scale_v == scale_v_cor-1) ? 0 : cnt_scale_v + 1;
if (sy == V_RES_FULL-1) cnt_scale_v <= 0;
end
logic [$clog2(FB_WIDTH)-1:0] fb_h_cnt; // counter for FB pixels on line
always_ff @(posedge clk_pix) begin
if (sy == V_RES_FULL-1 && sx == H_RES-1) fb_addr_read <= 0;
// reset horizontal counter at the start of blanking on reading lines
if (cnt_scale_v == 0 && sx == H_RES) begin
if (fb_addr_read < FB_PIXELS-1) fb_h_cnt <= 0; // read all pixels?
end
// read each pixel on FB line and write to LB
if (fb_h_cnt < FB_WIDTH) begin
lb_en_in <= 1;
fb_h_cnt <= fb_h_cnt + 1;
fb_addr_read <= fb_addr_read + 1;
end else begin
lb_en_in <= 0;
end
// enable LB data in with latency correction
lb_en_in_1 <= lb_en_in;
end
// LB data out to display
logic [LB_WIDTH-1:0] lb_out_0, lb_out_1, lb_out_2;
linebuffer #(
.WIDTH(LB_WIDTH),
.LEN(LB_LEN)
) lb_inst (
.clk_in(clk_pix),
.clk_out(clk_pix),
.en_in(lb_en_in_1), // correct for BRAM latency
.en_out(sy < V_RES && sx < H_RES),
.rst_in(sx == H_RES), // reset at start of horizontal blanking
.rst_out(sx == H_RES),
.scale(LB_SCALE_H),
.data_in_0(lb_in_0),
.data_in_1(lb_in_1),
.data_in_2(lb_in_2),
.data_out_0(lb_out_0),
.data_out_1(lb_out_1),
.data_out_2(lb_out_2)
);
Build the updated top module with scaled David:
- Xilinx XC7: xc7/top_david.sv
- Lattice iCE40: ice40/top_david.sv
top_david BRAM usage: 4.5x36Kb on XC7 and 23x4Kb on iCE40
Quick Aside: BRAM Optimization
You’d expect the linebuffer to use 3x18Kb BRAMs (1.5x36Kb) on the Xilinx FPGA, but because all three colour channels are the same for the greyscale palette, the linebuffers are optimised from 3x18Kb BRAMs to 1x18Kb. The warm palette does use 1.5x36Kb BRAMs for a total of 5.5.
Creating Your Own Images
You can easily create your own images using img2fmem. The script is written in Python and uses the Pillow image library to perform the conversion. You can find it in the Project F FPGA Tools repo. Make sure your images are the same dimensions as the framebuffer you’re using.
To convert an image called acme.png to 4-bit colour with a 12-bit palette for use with $readmemh:
img2fmem.py acme.png 4 mem 12
For details on installation and command line options, see the img2fmem README.
Fizzle Out
In FPGA Ad Astra, we used linear feedback shift registers (LFSRs) to create animated starfields. LFSRs have another graphical use: creating random dissolve effects, as used in Wolfenstein 3D and explained by Fabien Sanglard in his excellent post: Fizzlefade.
We can use LFSRs to dissolve our image of David using a mask. Our mask is another bitmap, but with only 1 bit per pixel. If a mask pixel is set to 0, then the framebuffer pixel colour is shown, otherwise we show red. By using a mask, rather than altering the original bitmap, we can fade the image back in again or apply other effects.
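To get a feel for how an LFSR picks pixels, the simulation sketch below steps a 15-bit Galois LFSR through one full period and counts how many values land inside the 160x120 framebuffer. The update rule in lfsr_next is an assumption about how the lfsr module behaves (shift right, then XOR in the taps when the low bit was set); the taps match those used in the fizzle listing in this post.

```systemverilog
// Sketch only: step a 15-bit Galois LFSR and count which pixels it reaches.
// The update rule is an assumption about how the lfsr module behaves.
module lfsr_cover_sketch;
    localparam LEN  = 15;
    localparam TAPS = 15'b110000000000000;
    localparam FB_PIXELS = 160 * 120;

    logic [LEN-1:0] sreg;  // LFSR state
    int period;            // steps taken before the seed repeats
    int pixels;            // states that are valid framebuffer addresses

    // one Galois LFSR step: shift right, XOR in taps when the low bit was set
    function automatic logic [LEN-1:0] lfsr_next(input logic [LEN-1:0] s);
        return {1'b0, s[LEN-1:1]} ^ (s[0] ? TAPS : {LEN{1'b0}});
    endfunction

    initial begin
        sreg = 15'h1;  // any non-zero seed will do
        period = 0;
        pixels = 0;
        do begin
            if (sreg < FB_PIXELS) pixels++;  // value lands inside the framebuffer
            sreg = lfsr_next(sreg);
            period++;
        end while (sreg != 15'h1);
        assert (period == 2**LEN - 1);     // every non-zero 15-bit value, once
        assert (pixels == FB_PIXELS - 1);  // every pixel except address zero
        $display("period=%0d pixels=%0d", period, pixels);
        $finish;
    end
endmodule
```

Under these assumptions, the LFSR never produces zero and also generates values beyond the end of the framebuffer, so a complete fade takes a full LFSR period rather than one update per pixel.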
As previously discussed, the iCE40 doesn’t support 1-bit wide BRAM, so we’ve had to reduce the horizontal resolution of the fizzle bitmap mask so the complete design still fits into the 30 iCE40 BRAMs.
- Xilinx XC7: xc7/top_david_fizzle.sv
- Lattice iCE40: ice40/top_david_fizzle.sv
top_david_fizzle BRAM usage: 6x36Kb on XC7 and 28x4Kb on iCE40
Xilinx version shown below:
module top_david_fizzle (
input wire logic clk_100m, // 100 MHz clock
input wire logic btn_rst, // reset button (active low)
output logic vga_hsync, // horizontal sync
output logic vga_vsync, // vertical sync
output logic [3:0] vga_r, // 4-bit VGA red
output logic [3:0] vga_g, // 4-bit VGA green
output logic [3:0] vga_b // 4-bit VGA blue
);
// generate pixel clock
logic clk_pix;
logic clk_locked;
clock_gen clock_640x480 (
.clk(clk_100m),
.rst(!btn_rst), // reset button is active low
.clk_pix,
.clk_locked
);
// display timings
localparam CORDW = 10; // screen coordinate width in bits
logic [CORDW-1:0] sx, sy;
logic hsync, vsync, de;
display_timings_480p timings_640x480 (
.clk_pix,
.rst(!clk_locked), // wait for clock lock
.sx,
.sy,
.hsync,
.vsync,
.de
);
// size of screen with and without blanking
localparam H_RES_FULL = 800;
localparam V_RES_FULL = 525;
localparam H_RES = 640;
localparam V_RES = 480;
// framebuffer (FB)
localparam FB_WIDTH = 160;
localparam FB_HEIGHT = 120;
localparam FB_PIXELS = FB_WIDTH * FB_HEIGHT;
localparam FB_ADDRW = $clog2(FB_PIXELS);
localparam FB_DATAW = 4; // colour bits per pixel
localparam FB_IMAGE = "david.mem";
localparam FB_PALETTE = "david_palette.mem";
logic fb_we;
logic [FB_ADDRW-1:0] fb_addr_write, fb_addr_read;
logic [FB_DATAW-1:0] fb_cidx_write, fb_cidx_read;
bram_sdp #(
.WIDTH(FB_DATAW),
.DEPTH(FB_PIXELS),
.INIT_F(FB_IMAGE)
) framebuffer (
.clk_read(clk_pix),
.clk_write(clk_pix),
.we(fb_we),
.addr_write(fb_addr_write),
.addr_read(fb_addr_read),
.data_in(fb_cidx_write),
.data_out(fb_cidx_read)
);
// draw a horizontal line at the top of the framebuffer
always_ff @(posedge clk_pix) begin
if (sy >= V_RES) begin // draw in blanking interval
if (fb_we == 0 && fb_addr_write != FB_WIDTH-1) begin
fb_cidx_write <= 4'h0; // first palette entry (white)
fb_we <= 1;
end else if (fb_addr_write != FB_WIDTH-1) begin
fb_addr_write <= fb_addr_write + 1;
end else begin
fb_we <= 0;
end
end
end
// fizzlebuffer (FZ)
logic [FB_ADDRW-1:0] fz_addr_write;
logic fz_en_in, fz_en_out;
logic fz_we;
bram_sdp #(
.WIDTH(1),
.DEPTH(FB_PIXELS),
.INIT_F("")
) fz_inst (
.clk_write(clk_pix),
.clk_read(clk_pix),
.we(fz_we),
.addr_write(fz_addr_write),
.addr_read(fb_addr_read), // share read address with FB
.data_in(fz_en_in),
.data_out(fz_en_out)
);
// 15-bit LFSR (160x120 < 2^15)
logic lfsr_en;
logic [14:0] lfsr;
lfsr #(
.LEN(15),
.TAPS(15'b110000000000000)
) lsfr_fz (
.clk(clk_pix),
.rst(!clk_locked),
.en(lfsr_en),
.sreg(lfsr)
);
localparam FADE_WAIT = 600; // wait for 600 frames before fading
localparam FADE_RATE = 3200; // update LFSR about every 3200 pixel clocks
logic [$clog2(FADE_WAIT)-1:0] cnt_fade_wait;
logic [$clog2(FADE_RATE)-1:0] cnt_fade_rate;
always_ff @(posedge clk_pix) begin
if (sy == V_RES && sx == H_RES) begin // start of blanking
cnt_fade_wait <= (cnt_fade_wait != FADE_WAIT-1) ?
cnt_fade_wait + 1 : cnt_fade_wait;
end
if (cnt_fade_wait == FADE_WAIT-1) begin
cnt_fade_rate <= (cnt_fade_rate == FADE_RATE) ?
0 : cnt_fade_rate + 1;
end
end
always_comb begin
fz_addr_write = lfsr;
if (cnt_fade_rate == FADE_RATE) begin
lfsr_en = 1;
fz_we = 1;
fz_en_in = 1;
end else begin
lfsr_en = 0;
fz_we = 0;
fz_en_in = 0;
end
end
// linebuffer (LB)
localparam LB_SCALE_V = 4; // scale vertical drawing
localparam LB_SCALE_H = 4; // scale horizontal drawing
localparam LB_LEN = H_RES / LB_SCALE_H; // line length
localparam LB_WIDTH = 4; // bits per colour channel
// LB data in from FB
logic lb_en_in, lb_en_in_1; // allow for BRAM latency correction
logic [LB_WIDTH-1:0] lb_in_0, lb_in_1, lb_in_2;
// correct vertical scale: if scale is 0, set to 1
logic [$clog2(LB_SCALE_V+1):0] scale_v_cor;
always_comb scale_v_cor = (LB_SCALE_V == 0) ? 1 : LB_SCALE_V;
// count screen lines for vertical scaling - read when cnt_scale_v==0
logic [$clog2(LB_SCALE_V):0] cnt_scale_v;
always_ff @(posedge clk_pix) begin
if (sx == 0)
cnt_scale_v <= (cnt_scale_v == scale_v_cor-1) ? 0 : cnt_scale_v + 1;
if (sy == V_RES_FULL-1) cnt_scale_v <= 0;
end
logic [$clog2(FB_WIDTH)-1:0] fb_h_cnt; // counter for FB pixels on line
always_ff @(posedge clk_pix) begin
if (sy == V_RES_FULL-1 && sx == H_RES-1) fb_addr_read <= 0;
// reset horizontal counter at the start of blanking on reading lines
if (cnt_scale_v == 0 && sx == H_RES) begin
if (fb_addr_read < FB_PIXELS-1) fb_h_cnt <= 0; // read all pixels?
end
// read each pixel on FB line and write to LB
if (fb_h_cnt < FB_WIDTH) begin
lb_en_in <= 1;
fb_h_cnt <= fb_h_cnt + 1;
fb_addr_read <= fb_addr_read + 1;
end else begin
lb_en_in <= 0;
end
// enable LB data in with latency correction
lb_en_in_1 <= lb_en_in;
end
// LB data out to display
logic [LB_WIDTH-1:0] lb_out_0, lb_out_1, lb_out_2;
linebuffer #(
.WIDTH(LB_WIDTH),
.LEN(LB_LEN)
) lb_inst (
.clk_in(clk_pix),
.clk_out(clk_pix),
.en_in(lb_en_in_1), // correct for BRAM latency
.en_out(sy < V_RES && sx < H_RES),
.rst_in(sx == H_RES), // reset at start of horizontal blanking
.rst_out(sx == H_RES),
.scale(LB_SCALE_H),
.data_in_0(lb_in_0),
.data_in_1(lb_in_1),
.data_in_2(lb_in_2),
.data_out_0(lb_out_0),
.data_out_1(lb_out_1),
.data_out_2(lb_out_2)
);
// colour lookup table (ROM) 16x12-bit entries
logic [11:0] clut_colr;
rom_async #(
.WIDTH(12),
.DEPTH(16),
.INIT_F(FB_PALETTE)
) clut (
.addr(fb_cidx_read),
.data(clut_colr)
);
// map colour index to palette using CLUT and read into LB
always_ff @(posedge clk_pix) begin
{lb_in_2, lb_in_1, lb_in_0} <= fz_en_out ? 12'hA00 : clut_colr;
end
// VGA output
always_ff @(posedge clk_pix) begin
vga_hsync <= hsync;
vga_vsync <= vsync;
vga_r <= de ? lb_out_2 : 4'h0;
vga_g <= de ? lb_out_1 : 4'h0;
vga_b <= de ? lb_out_0 : 4'h0;
end
endmodule
You may have noticed that the top-left pixel doesn’t change; this is because the LFSR produces every value except zero. To fix this, add some logic to specifically update the pixel at address zero when fading.
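One possible fix, sketched below, is to write address zero on the first fade update and only then start using the LFSR. The module fizzle_zero_sketch and the signals step and fz_zero_done are hypothetical; step stands in for the cnt_fade_rate == FADE_RATE condition in the listing above.

```systemverilog
// Sketch only: one way to include address zero in the fizzle.
// Output names follow the listing above; step and fz_zero_done are hypothetical.
module fizzle_zero_sketch (
    input  wire logic clk,
    input  wire logic rst,
    input  wire logic step,               // high for one cycle per fade update
    input  wire logic [14:0] lfsr,        // current LFSR value (never zero)
    output      logic lfsr_en,            // advance the LFSR
    output      logic fz_we,              // write enable for the fizzle mask
    output      logic [14:0] fz_addr_write
    );

    logic fz_zero_done;  // set once address zero has been written

    always_ff @(posedge clk) begin
        if (step) fz_zero_done <= 1;
        if (rst)  fz_zero_done <= 0;  // reset takes precedence
    end

    always_comb begin
        fz_we = step;
        // the first update writes address zero and holds the LFSR;
        // later updates write the LFSR address and advance it
        lfsr_en = step && fz_zero_done;
        fz_addr_write = fz_zero_done ? lfsr : 15'd0;
    end
endmodule
```

The first update is spent on address zero, so the fade takes one update longer than before.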
Explore
I hope you enjoyed this instalment of Exploring FPGA Graphics, but nothing beats creating your own designs. Here are a few suggestions to get you started:
- Load your own picture into the framebuffer using img2fmem
- Cross-fade between two images
- Fade an image by adjusting the intensity of the palette entries
- Try moving the image up and down the screen by changing the initial value of fb_addr_read
Next Time
In the next part, we’ll take our framebuffer and implement Conway’s Game of Life in Life on Screen. We’ll then move on to drawing lines and shapes in early 2021: stay tuned!
Constructive feedback is always welcome. Get in touch with @WillFlux or open an issue on GitHub.