20 May 2020

Beginning FPGA Graphics

Welcome to Exploring FPGA Graphics. In this series, we explore graphics at the hardware level and get a feel for the power of FPGAs. We’ll learn how screens work, play Pong, create starfields and sprites, paint Michelangelo’s David, simulate life, draw lines and triangles, and animate characters and shapes. Along the way, you’ll experience a range of designs and techniques, from memory and finite state machines to crossing clock domains and translating C algorithms into Verilog. This post was last updated in May 2022.

This post was completely revised in March 2022.

Get in touch: GitHub Issues, 1BitSquared Discord, @WillFlux (Mastodon), @WillFlux (Twitter)

Sponsor My Work
If you like what I do, consider sponsoring me on GitHub.
I love FPGAs and want to help more people discover and use them in their projects.
My hardware designs are open source, and my blog is advert free.

Requirements

For this series, you need an FPGA board with video output. We’ll be working at 640x480, so pretty much any video output will work. It helps to be comfortable with programming your FPGA board and reasonably familiar with Verilog.

We’ll be demoing the designs with two boards: the iCEBreaker (Lattice iCE40) and the Digilent Arty (Xilinx 7 Series).

An increasing number of the designs are also available as Verilator simulations.

iCEBreaker and Arty Dev Boards

Source

The SystemVerilog designs featured in this series are available from the projf-explore git repo under the open-source MIT licence: build on them to your heart’s content. The rest of the blog content is subject to standard copyright restrictions: don’t republish it without permission.

SystemVerilog
We’ll use a few choice features from SystemVerilog to make Verilog a little more pleasant. If you’re familiar with Verilog, you’ll have no trouble. All the SystemVerilog features used are compatible with recent versions of Verilator, Yosys, and Xilinx Vivado.

Space and Time

A screen is a miniature universe with its own space and time.

Seen from afar, a screen shows a smooth two-dimensional image. Up close, it breaks up into many individual blocks of colour: red, green, and blue. We hide this complexity behind the abstract idea of a pixel: the smallest part of the screen we can control. A typical HD screen is 1920 by 1080: two million pixels in total. Even a 640x480 display has more than 300,000 pixels.

A screen creates the illusion of movement by refreshing many times every second. At 60 Hz, a 1920x1080 screen draws 124 million pixels every second! The need to quickly handle so much data is a big part of the challenge of working with graphics at a hardware level.

Display connectors and cabling vary, but VGA, HDMI, and DisplayPort have a similar data design. There are three channels for colour, usually red, green, and blue, and horizontal and vertical sync signals. There may also be audio and configuration data, but that’s not important right now.

The red, green, and blue channels carry the colour of each pixel in turn. A screen begins a new line when it receives a horizontal sync and a new frame on a vertical sync. The sync signals are part of blanking intervals.

Blanking intervals allow time for the electron gun in cathode ray tubes (CRTs) to move to the following line (horizontal retrace) or top of the screen (vertical retrace). Modern digital displays have retained the blanking intervals and repurposed them to transmit audio and other data.

Check out Tim Hunkin’s Secret Life of the Television (1987) to see a CRT television cut in half and its inner workings revealed.

Raster scan on a CRT Monitor

Display Timings

A screen mode is defined by its display timings. Standard timings are set by VESA and the CTA.

In this series, we’ll use 640x480 at 60Hz. Almost all displays support 640x480, and its low resource requirements make it possible to work with even the smallest FPGAs.

Display timings for 640x480 at 60Hz (horizontal values in pixels, vertical values in lines):

Parameter        Horizontal   Vertical
Active Pixels    640          480
Front Porch      16           10
Sync Width       96           2
Back Porch       48           33
Total Blanking   160          45
Total Pixels     800          525
Sync Polarity    negative     negative

For other screen modes, see Video Timings: VGA, SVGA, 720p, 1080p.

The blanking interval has three parts: front porch, sync, and back porch. The front porch occurs before the sync signal, the back porch after.

If your screen showed all parts of the signal, it would look something like this:

Display Timings Visualized

Including blanking, we have a total of 800x525 pixels.

The refresh rate is 60 Hz, so the total number of pixels per second is:

800 x 525 x 60 = 25,200,000

Therefore, we want a pixel clock of 25.2 MHz.

Driving a Display

Having selected our display timings, we’re ready to create a video signal. There are four stages:

  1. Pixel Clock
  2. Display Signals
  3. Drawing Graphics
  4. Video Output (VGA, HDMI, DisplayPort)

Pixel Clock

We know we want a frequency of 25.2 MHz, but how to reach it?

FPGAs include phase-locked loops (PLLs) to generate custom clock frequencies. Alas, there isn’t a standard way to configure a PLL; we need a vendor-specific design.

I have provided implementations for the Arty (Xilinx 7 Series) and iCEBreaker (iCE40):

NB. The iCEBreaker can’t generate 25.2 MHz but runs fine at 25.125 MHz.

For other FPGA architectures, you’ll need to consult your vendor documentation. If you can’t reach 25.2 MHz exactly, 25 MHz or thereabouts should be fine.
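To see what a slightly different pixel clock means in practice, note that the refresh rate is the pixel clock divided by the total number of pixels per frame. The following simulation-only sketch prints the refresh rate for the three clocks mentioned above. The module, function, and parameter names are made up for illustration and are not part of the projf-explore repo.

```systemverilog
// Simulation-only sketch: refresh rate for a given pixel clock.
// Names here are illustrative, not from the projf-explore repo.
module refresh_rate_check;
    localparam real H_TOTAL = 800.0;  // total pixels per line (including blanking)
    localparam real V_TOTAL = 525.0;  // total lines per frame (including blanking)

    // refresh rate (Hz) = pixel clock (Hz) / total pixels per frame
    function automatic real refresh_hz(input real clk_hz);
        return clk_hz / (H_TOTAL * V_TOTAL);
    endfunction

    initial begin
        $display("25.200 MHz: %.2f Hz", refresh_hz(25.2e6));    // 60.00
        $display("25.125 MHz: %.2f Hz", refresh_hz(25.125e6));  // 59.82
        $display("25.000 MHz: %.2f Hz", refresh_hz(25.0e6));    // 59.52
        $finish;
    end
endmodule
```

All three results are within half a hertz of 60 Hz, which is why a clock of 25 MHz or thereabouts is usually good enough.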

CAUTION: CRT Monitors
Modern displays, including multisync CRTs, should be fine with a 25.2 or 25 MHz pixel clock. Fixed-frequency CRTs, such as the original IBM 85xx series, could be damaged by an out-of-spec signal. Use these designs at your own risk.

Display Signals

Next, we can generate sync signals from our pixel clock and display timings. We also want to report the current screen position to know when to draw things.

We do both of these things with a simple display module [simple_480p.sv]:

module simple_480p (
    input  wire logic clk_pix,   // pixel clock
    input  wire logic rst_pix,   // reset in pixel clock domain
    output      logic [9:0] sx,  // horizontal screen position
    output      logic [9:0] sy,  // vertical screen position
    output      logic hsync,     // horizontal sync
    output      logic vsync,     // vertical sync
    output      logic de         // data enable (low in blanking interval)
    );

    // horizontal timings
    parameter HA_END = 639;           // end of active pixels
    parameter HS_STA = HA_END + 16;   // sync starts after front porch
    parameter HS_END = HS_STA + 96;   // sync ends
    parameter LINE   = 799;           // last pixel on line (after back porch)

    // vertical timings
    parameter VA_END = 479;           // end of active pixels
    parameter VS_STA = VA_END + 10;   // sync starts after front porch
    parameter VS_END = VS_STA + 2;    // sync ends
    parameter SCREEN = 524;           // last line on screen (after back porch)

    always_comb begin
        hsync = ~(sx >= HS_STA && sx < HS_END);  // invert: negative polarity
        vsync = ~(sy >= VS_STA && sy < VS_END);  // invert: negative polarity
        de = (sx <= HA_END && sy <= VA_END);
    end

    // calculate horizontal and vertical screen position
    always_ff @(posedge clk_pix) begin
        if (sx == LINE) begin  // last pixel on line?
            sx <= 0;
            sy <= (sy == SCREEN) ? 0 : sy + 1;  // last line on screen?
        end else begin
            sx <= sx + 1;
        end
        if (rst_pix) begin
            sx <= 0;
            sy <= 0;
        end
    end
endmodule

ProTip: When a procedural block makes several assignments to the same variable, the last one wins in Verilog, so the reset overrides the values assigned to sx and sy earlier in the block.

sx and sy store the horizontal and vertical screen position. Counting starts at zero, so the maximum values are 799 for sx and 524 for sy, requiring 10 bits to hold the coordinates (2^10 = 1024).
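Rather than counting bits by hand, we could let the tools work out the width with the $clog2 system function. This is a minimal sketch; the module and the H_TOTAL, V_TOTAL, H_BITS, and V_BITS names are illustrative and don't appear in the original design.

```systemverilog
// Sketch: derive the coordinate width from the screen totals.
// Module and parameter names are illustrative.
module coord_width_check;
    localparam H_TOTAL = 800;  // sx counts 0-799
    localparam V_TOTAL = 525;  // sy counts 0-524

    localparam H_BITS = $clog2(H_TOTAL);  // 10, as 512 < 800 <= 1024
    localparam V_BITS = $clog2(V_TOTAL);  // 10, as 512 < 525 <= 1024
    localparam CORDW  = (H_BITS > V_BITS) ? H_BITS : V_BITS;  // wider of the two

    initial begin
        $display("CORDW = %0d bits", CORDW);
        $finish;
    end
endmodule
```

For 640x480 both counters need 10 bits, matching the [9:0] ports on simple_480p.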

For simplicity, we put blanking after the visible pixels; that way, (0,0) is the top-left visible pixel and (639,479) the bottom right.

The following diagram shows the display signals to scale with two 64x64 pixel squares drawn:

Simple Display Signals

de is data enable, which is low during the blanking interval (the grey area of the diagram above) and tells us when it’s safe to draw.

From the display timings, we know our sync polarity is negative for both hsync and vsync. Negative polarity means that a low voltage indicates a sync.

The following simulation shows the vertical sync starting at line 489. The vertical sync is low for two lines, as expected from the display timings. Note the horizontal sync at the end of each line.

Sync Signal Simulation

Test Benches

If you’re using Vivado, try exercising the designs with these test benches:

Some things to check:

  • What is the pixel clock period?
  • How long does the pixel clock take to lock?
  • Does a frame last exactly 1/60th of a second?
  • How much time does a single line last?
  • What are the maximum values of sx and sy when de is low?

You can find instructions for running the Vivado simulations in the source README.
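If you’d like to write a check of your own, here’s a sketch of a small self-checking test bench. It is not the test bench from the repo: the module name, clock period, and counters are assumptions made for illustration. It counts pixel clocks between the start of one vertical sync and the next, which should be 800 x 525 = 420,000. It uses delays, so it suits an event-driven simulator such as the one in Vivado.

```systemverilog
`timescale 1ns / 1ps

// Sketch of a self-checking test bench for simple_480p (not the repo's version).
// Counts pixel clocks between vsync falling edges; expects 800 x 525 = 420,000.
module simple_480p_frame_tb;
    localparam real CLK_PERIOD = 39.683;  // ns: approximately 25.2 MHz
    localparam FRAME_CLKS = 800 * 525;    // expected pixel clocks per frame

    logic clk_pix = 0;
    logic rst_pix = 1;
    logic [9:0] sx, sy;
    logic hsync, vsync, de;

    simple_480p display_inst (.*);  // the display module from this post

    always #(CLK_PERIOD / 2) clk_pix = ~clk_pix;

    // release reset away from the active clock edge to avoid a race
    initial begin
        repeat (4) @(negedge clk_pix);
        rst_pix = 0;
    end

    logic vsync_prev;   // vsync from the previous clock
    logic started = 0;  // set once the first vsync has been seen
    int cycles = 0;     // pixel clocks since the last vsync falling edge
    int frames = 0;     // complete frames measured

    always @(posedge clk_pix) begin
        vsync_prev <= vsync;
        if (vsync_prev && !vsync) begin  // falling edge: sync starts (negative polarity)
            if (started) begin  // skip the partial count after reset
                if (cycles == FRAME_CLKS)
                    $display("frame %0d: %0d clocks, as expected", frames + 1, cycles);
                else
                    $error("frame %0d: %0d clocks, expected %0d", frames + 1, cycles, FRAME_CLKS);
                frames <= frames + 1;
            end
            started <= 1;
            cycles <= 1;
        end else begin
            cycles <= cycles + 1;
        end
        if (frames == 2) $finish;  // two full frames is enough
    end
endmodule
```

Counting clocks rather than measuring time keeps the check independent of the exact clock period, so the same test works whether your board runs at 25.2, 25.125, or 25 MHz.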

Drawing Graphics

For our first design, we’re going to draw a square like this:

Square

We use the screen coordinates (sx,sy) to define a square in the centre of the screen:

logic square;
always_comb begin
    square = (sx > 220 && sx < 420) && (sy > 140 && sy < 340);
end
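If you want to experiment with the size of the square, one option is to calculate its edges from parameters. The sketch below wraps this in a small module; the module name, ports, and parameters are hypothetical and not part of the series designs.

```systemverilog
// Sketch: centred square with its size set by a parameter.
// Module, port, and parameter names are illustrative.
module centred_square #(
    parameter H_RES = 640,  // horizontal resolution
    parameter V_RES = 480,  // vertical resolution
    parameter SIZE  = 200   // square width and height in pixels
    ) (
    input  wire logic [9:0] sx,  // horizontal screen position
    input  wire logic [9:0] sy,  // vertical screen position
    output      logic square     // high inside the square
    );

    localparam X0 = (H_RES - SIZE) / 2;  // left edge: 220 with the defaults
    localparam Y0 = (V_RES - SIZE) / 2;  // top edge: 140 with the defaults

    always_comb begin
        square = (sx >= X0 && sx < X0 + SIZE) && (sy >= Y0 && sy < Y0 + SIZE);
    end
endmodule
```

Note that this version includes the first edge (>=), so the square is exactly SIZE pixels across. The comparison shown earlier excludes both edge values, giving a square of 199 pixels a side, a difference you’re unlikely to notice on screen.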

12-bit Colour

The VGA and DVI Pmods output 12-bit colour with three 4-bit channels: red, green, and blue.

We can represent a specific colour using a hex triplet:

  • #F00 - bright red
  • #FA0 - orange
  • #0E3 - bright green
  • #137 - dark blue
  • #FFF - white

In Verilog, hex literals use the letter h, so we can set our colours as follows:

logic [3:0] paint_r, paint_g, paint_b;
always_comb begin
    paint_r = (square) ? 4'hF : 4'h1;
    paint_g = (square) ? 4'hF : 4'h3;
    paint_b = (square) ? 4'hF : 4'h7;
end

We generate a separate signal for each colour channel, ready for video output.
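If you prefer to keep a colour as a single hex triplet, you can split it into channels with a concatenation on the left-hand side of an assignment. This is a sketch; the module name and the colr signal are assumptions for illustration.

```systemverilog
// Sketch: hold a colour as one 12-bit value and split it into channels.
// The module name and colr are illustrative.
module colour_split (
    input  wire logic [11:0] colr,    // 12-bit colour, e.g. 12'hFA0 for orange
    output      logic [3:0] paint_r,  // red: bits 11-8
    output      logic [3:0] paint_g,  // green: bits 7-4
    output      logic [3:0] paint_b   // blue: bits 3-0
    );

    always_comb {paint_r, paint_g, paint_b} = colr;
endmodule
```

For example, a colr of 12'h137 gives paint_r = 4'h1, paint_g = 4'h3, and paint_b = 4'h7: the dark blue used outside the square.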

Video Output

Video output works differently for each board and simulation, so we’ll cover them in turn.

Arty VGA

VGA output is straightforward. We register each signal to improve timing and avoid skew:

// VGA Pmod output
always_ff @(posedge clk_pix) begin
    vga_hsync <= hsync;
    vga_vsync <= vsync;
    if (de) begin
        vga_r <= paint_r;
        vga_g <= paint_g;
        vga_b <= paint_b;
    end else begin  // VGA colour should be black in blanking interval
        vga_r <= 4'h0;
        vga_g <= 4'h0;
        vga_b <= 4'h0;
    end
end

The VGA Pmod handles the conversion of digital colour signals into analogue voltages. You need to ensure the colour is black during blanking; otherwise, you’ll see corrupted output on some screens.

iCEBreaker DVI

The TFP410 chip on the DVI Pmod takes our colour and sync signals and encodes them into DVI using Transition-minimized differential signalling (TMDS).

We use the SB_IO primitive to produce high-quality output from the iCE40 FPGA. It’s not necessary to understand how SB_IO works for this series; use this snippet in your designs, and all will be well:

// DVI Pmod output
SB_IO #(
    .PIN_TYPE(6'b010100)  // PIN_OUTPUT_REGISTERED
) dvi_signal_io [14:0] (
    .PACKAGE_PIN({dvi_hsync, dvi_vsync, dvi_de, dvi_r, dvi_g, dvi_b}),
    .OUTPUT_CLK(clk_pix),
    .D_OUT_0({hsync, vsync, de, paint_r, paint_g, paint_b}),
    .D_OUT_1()
);

// DVI Pmod clock output: 180° out of phase with other DVI signals
SB_IO #(
    .PIN_TYPE(6'b010000)  // PIN_OUTPUT_DDR
) dvi_clk_io (
    .PACKAGE_PIN(dvi_clk),
    .OUTPUT_CLK(clk_pix),
    .D_OUT_0(1'b0),
    .D_OUT_1(1'b1)
);

Lattice SB_IO
The SB_IO primitive (with registered outputs) ensures our DVI signals are in sync when they leave the FPGA. The DVI clock is 180 degrees out of phase so that the TFP410 will sample the middle of the colour values. You can learn more about iCE primitives from the Lattice ICE Technology Library.

Verilator Sim

The simulation output is similar to the Arty VGA, but it expects eight bits per colour channel:

// SDL output (8 bits per colour channel)
always_ff @(posedge clk_pix) begin
    sdl_sx <= sx;
    sdl_sy <= sy;
    sdl_de <= de;
    sdl_r <= {2{paint_r}};  // double signal width from 4 to 8 bits
    sdl_g <= {2{paint_g}};
    sdl_b <= {2{paint_b}};
end
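Why replicate the four bits rather than pad with zeros? Padding would turn 4'hF into 8'hF0 (240), so white would never reach full brightness. Replication is the same as multiplying by 17, which maps 0-15 evenly onto 0-255. The following simulation-only sketch checks this for every level; the module and signal names are illustrative.

```systemverilog
// Simulation-only sketch: replicating a 4-bit value scales 0-15 onto 0-255.
// Module and signal names are illustrative.
module colour_widen_check;
    logic [3:0] c4;  // 4-bit colour level
    logic [7:0] c8;  // 8-bit colour level

    initial begin
        for (int i = 0; i < 16; i++) begin
            c4 = i[3:0];
            c8 = {2{c4}};  // same as (c4 * 16) + c4
            if (c8 != i * 17) $error("mismatch for level %0d", i);
        end
        $display("all 16 levels match: 4'h0 -> 8'h00, 4'h8 -> 8'h88, 4'hF -> 8'hFF");
        $finish;
    end
endmodule
```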

Square One

Bringing the four stages together, we have a complete top module:

All of these top modules are listed in full, below. See if you can match each of the four stages of driving a display with the Verilog for your top module.

Arty VGA Square

module top_square (
    input  wire logic clk_100m,     // 100 MHz clock
    input  wire logic btn_rst_n,    // reset button
    output      logic vga_hsync,    // VGA horizontal sync
    output      logic vga_vsync,    // VGA vertical sync
    output      logic [3:0] vga_r,  // 4-bit VGA red
    output      logic [3:0] vga_g,  // 4-bit VGA green
    output      logic [3:0] vga_b   // 4-bit VGA blue
    );

    // generate pixel clock
    logic clk_pix;
    logic clk_pix_locked;
    clock_480p clock_pix_inst (
       .clk_100m,
       .rst(!btn_rst_n),  // reset button is active low
       .clk_pix,
       .clk_pix_5x(),  // not used for VGA output
       .clk_pix_locked
    );

    // display sync signals and coordinates
    localparam CORDW = 10;  // screen coordinate width in bits
    logic [CORDW-1:0] sx, sy;
    logic hsync, vsync, de;
    simple_480p display_inst (
        .clk_pix,
        .rst_pix(!clk_pix_locked),  // wait for clock lock
        .sx,
        .sy,
        .hsync,
        .vsync,
        .de
    );

    // define a square with screen coordinates
    logic square;
    always_comb begin
        square = (sx > 220 && sx < 420) && (sy > 140 && sy < 340);
    end

    // paint colours: white inside square, blue outside
    logic [3:0] paint_r, paint_g, paint_b;
    always_comb begin
        paint_r = (square) ? 4'hF : 4'h1;
        paint_g = (square) ? 4'hF : 4'h3;
        paint_b = (square) ? 4'hF : 4'h7;
    end

    // VGA Pmod output
    always_ff @(posedge clk_pix) begin
        vga_hsync <= hsync;
        vga_vsync <= vsync;
        if (de) begin
            vga_r <= paint_r;
            vga_g <= paint_g;
            vga_b <= paint_b;
        end else begin  // VGA colour should be black in blanking interval
            vga_r <= 4'h0;
            vga_g <= 4'h0;
            vga_b <= 4'h0;
        end
    end
endmodule

iCEBreaker DVI Square

module top_square (
    input  wire logic clk_12m,      // 12 MHz clock
    input  wire logic btn_rst,      // reset button
    output      logic dvi_clk,      // DVI pixel clock
    output      logic dvi_hsync,    // DVI horizontal sync
    output      logic dvi_vsync,    // DVI vertical sync
    output      logic dvi_de,       // DVI data enable
    output      logic [3:0] dvi_r,  // 4-bit DVI red
    output      logic [3:0] dvi_g,  // 4-bit DVI green
    output      logic [3:0] dvi_b   // 4-bit DVI blue
    );

    // generate pixel clock
    logic clk_pix;
    logic clk_pix_locked;
    clock_480p clock_pix_inst (
       .clk_12m,
       .rst(btn_rst),
       .clk_pix,
       .clk_pix_locked
    );

    // display sync signals and coordinates
    localparam CORDW = 10;  // screen coordinate width in bits
    logic [CORDW-1:0] sx, sy;
    logic hsync, vsync, de;
    simple_480p display_inst (
        .clk_pix,
        .rst_pix(!clk_pix_locked),  // wait for clock lock
        .sx,
        .sy,
        .hsync,
        .vsync,
        .de
    );

    // define a square with screen coordinates
    logic square;
    always_comb begin
        square = (sx > 220 && sx < 420) && (sy > 140 && sy < 340);
    end

    // paint colours: white inside square, blue outside
    logic [3:0] paint_r, paint_g, paint_b;
    always_comb begin
        paint_r = (square) ? 4'hF : 4'h1;
        paint_g = (square) ? 4'hF : 4'h3;
        paint_b = (square) ? 4'hF : 4'h7;
    end

    // DVI Pmod output
    SB_IO #(
        .PIN_TYPE(6'b010100)  // PIN_OUTPUT_REGISTERED
    ) dvi_signal_io [14:0] (
        .PACKAGE_PIN({dvi_hsync, dvi_vsync, dvi_de, dvi_r, dvi_g, dvi_b}),
        .OUTPUT_CLK(clk_pix),
        .D_OUT_0({hsync, vsync, de, paint_r, paint_g, paint_b}),
        .D_OUT_1()
    );

    // DVI Pmod clock output: 180° out of phase with other DVI signals
    SB_IO #(
        .PIN_TYPE(6'b010000)  // PIN_OUTPUT_DDR
    ) dvi_clk_io (
        .PACKAGE_PIN(dvi_clk),
        .OUTPUT_CLK(clk_pix),
        .D_OUT_0(1'b0),
        .D_OUT_1(1'b1)
    );
endmodule

Verilator Sim

The Verilator simulation works a little differently; we output the coordinates, sdl_sx and sdl_sy, as well as the colour information.

module top_square #(parameter CORDW=10) (  // coordinate width
    input  wire logic clk_pix,             // pixel clock
    input  wire logic sim_rst,             // sim reset
    output      logic [CORDW-1:0] sdl_sx,  // horizontal SDL position
    output      logic [CORDW-1:0] sdl_sy,  // vertical SDL position
    output      logic sdl_de,              // data enable (low in blanking interval)
    output      logic [7:0] sdl_r,         // 8-bit red
    output      logic [7:0] sdl_g,         // 8-bit green
    output      logic [7:0] sdl_b          // 8-bit blue
    );

    // display sync signals and coordinates
    logic [CORDW-1:0] sx, sy;
    logic de;
    simple_480p display_inst (
        .clk_pix,
        .rst_pix(sim_rst),
        .sx,
        .sy,
        .hsync(),
        .vsync(),
        .de
    );

    // define a square with screen coordinates
    logic square;
    always_comb begin
        square = (sx > 220 && sx < 420) && (sy > 140 && sy < 340);
    end

    // paint colours: white inside square, blue outside
    logic [3:0] paint_r, paint_g, paint_b;
    always_comb begin
        paint_r = (square) ? 4'hF : 4'h1;
        paint_g = (square) ? 4'hF : 4'h3;
        paint_b = (square) ? 4'hF : 4'h7;
    end

    // SDL output (8 bits per colour channel)
    always_ff @(posedge clk_pix) begin
        sdl_sx <= sx;
        sdl_sy <= sy;
        sdl_de <= de;
        sdl_r <= {2{paint_r}};  // double signal width from 4 to 8 bits
        sdl_g <= {2{paint_g}};
        sdl_b <= {2{paint_b}};
    end
endmodule

NB. The Verilator simulation receives its pixel clock from the C++ wrapper, so there’s no pixel clock generation in this version.

Constraints

Before building the design, we need board constraints. The constraints map the pins on the FPGA to the signals in our design. For example, we need to know which FPGA pin connects to the reset button and which to the vertical sync.

Take a look at the constraints for your board:

The Verilator sim doesn’t require constraints.

Building

Each part of this series includes a README with build instructions. I have provided basic instructions here to get you started. If you need help with your board or tools, I recommend the Digilent Forum for Arty and 1BitSquared Discord for iCEBreaker.

Arty

We build Arty designs using Xilinx Vivado. To create a Vivado project, clone the projf-explore repo from GitHub. Then, start Vivado and run the following in the Tcl Console:

cd projf-explore/graphics/fpga-graphics/xc7/vivado
source ./create_project.tcl

This creates a Vivado project with all four designs from this post.

iCEBreaker

We build iCEBreaker designs with the open-source toolchain of Yosys, nextpnr, and IceStorm Tools. If you don’t already have these tools, see the README.

To build and program the square design, clone the projf-explore repo, then run the following in a shell:

cd projf-explore/graphics/fpga-graphics/ice40
make square
iceprog square.bin

If you have problems building the iCE40 designs, make sure you’re using Yosys 0.10 or later.

Verilator Simulation

For details on installing Verilator and building the design, see Verilog Simulation with Verilator and SDL and the Simulation README.

Flags

Our first design is not only a square but also the naval signal flag for the letter ‘P’ (the Blue Peter).

I have created designs for two more flags: Ethiopia and Sweden. Take a look at these examples, then have a go at drawing a flag yourself.

Flag of Ethiopia

The traditional flag of Ethiopia is a tricolour of green, yellow, and red.

Traditional Flag of Ethiopia

We only need the vertical screen coordinate, sy, to define this flag:

logic [3:0] paint_r, paint_g, paint_b;
always_comb begin
    if (sy < 160) begin  // top of flag is green
        paint_r = 4'h0;
        paint_g = 4'h9;
        paint_b = 4'h3;
    end else if (sy < 320) begin  // middle of flag is yellow
        paint_r = 4'hF;
        paint_g = 4'hE;
        paint_b = 4'h1;
    end else begin  // bottom of flag is red
        paint_r = 4'hE;
        paint_g = 4'h1;
        paint_b = 4'h2;
    end
end

You can find the full flag design in git:

Flag of Sweden

The flag of Sweden consists of a yellow Nordic cross on a blue background.

Flag of Sweden

The official flag has a ratio of 8:5, which equates to 640x400 on our screen:

logic [3:0] paint_r, paint_g, paint_b;
always_comb begin
    if (sy >= 400) begin  // black outside the flag area
        paint_r = 4'h0;
        paint_g = 4'h0;
        paint_b = 4'h0;
    end else if (sy >= 160 && sy < 240) begin  // yellow cross horizontal (80 lines)
        paint_r = 4'hF;
        paint_g = 4'hC;
        paint_b = 4'h0;
    end else if (sx >= 200 && sx < 280) begin  // yellow cross vertical (80 pixels)
        paint_r = 4'hF;
        paint_g = 4'hC;
        paint_b = 4'h0;
    end else begin  // blue flag background
        paint_r = 4'h0;
        paint_g = 4'h6;
        paint_b = 4'hA;
    end
end

You can find the full flag design in git:

Colour Test

No introduction to graphics hardware would be complete without a colour gradient. 12-bit graphics can handle 4,096 colours; this demo shows 256 of them.

Colour Gradient Test

For this example, I’ve kept the blue level constant while varying the red and green:

logic [3:0] paint_r, paint_g, paint_b;
always_comb begin
    if (sx < 256 && sy < 256) begin  // colour square in top-left 256x256 pixels
        paint_r = sx[7:4];  // 16 horizontal pixels of each red level
        paint_g = sy[7:4];  // 16 vertical pixels of each green level
        paint_b = 4'h4;     // constant blue level
    end else begin  // otherwise black
        paint_r = 4'h0;
        paint_g = 4'h0;
        paint_b = 4'h0;
    end
end

We select bits [7:4] from sx and sy, so the colour level changes every 16 pixels.

You can find the full design in git:

Next Time

I hope you enjoyed this introduction to FPGA Graphics. Next time, we’ll be Racing the Beam to create simple demos with our new graphics skills. Check out the site map for more FPGA projects and tutorials.

What’s Possible?

Here are some projects to inspire you:

©2022 Will Green, Project F