20 May 2020

FPGA Graphics

Welcome to Exploring FPGA Graphics. In this series, we explore graphics at the hardware level and get a feel for the power of FPGAs. We’ll learn how displays work, race the beam with Pong, animate starfields and sprites, paint Michelangelo’s David, simulate life with bitmaps, draw lines and shapes, and create smooth animation with double buffering. Along the way, you’ll experience a Smörgåsbord of designs and techniques, from BRAM and finite state machines to crossing clock domains and translating C algorithms into Verilog.

We begin by learning the basics of computer displays and drawing simple graphics.

You can watch an FPGA Graphics demo reel with designs from across this series.

Updated 2021-09-17. Get in touch with @WillFlux or open an issue on GitHub.

In all beginnings dwells a magic force
Hermann Hesse, Stages from The Glass Bead Game

Series Outline

  • FPGA Graphics (this post) - learn how displays work and draw your first graphics
  • Pong - race the beam to create the arcade classic
  • Hardware Sprites - fast, colourful graphics with minimal logic
  • Ad Astra - graphics demo with starfields and hardware sprites
  • Framebuffers - driving the display from a bitmap in memory
  • Life on Screen - the screen comes alive with Conway’s Game of Life
  • Lines and Triangles - drawing lines and triangles with a framebuffer
  • 2D Shapes - filling shapes and drawing pictures
  • Animated Shapes - animation and double-buffering

Requirements

For this series, you need an FPGA board with video output. We’ll be working at 640x480, so pretty much any video output will work. It helps to be comfortable with programming your FPGA board and reasonably familiar with Verilog.

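We’ll be demoing the designs with two boards:

  • Digilent Arty A7-35T (Xilinx XC7) with a VGA Pmod
  • 1BitSquared iCEBreaker (Lattice iCE40) with a DVI Pmod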

Neither of these boards has built-in video, but it’s easy to add with a VGA or DVI Pmod.

NB. The original Arty (without the A7) is the same as the Arty A7-35T for our purposes.

iCEBreaker and Arty Dev Boards

Source

The SystemVerilog designs featured in this series are available from the projf-explore git repo under the open-source MIT licence: build on them to your heart’s content. The rest of the blog content is subject to standard copyright restrictions: don’t republish it without permission.

SystemVerilog
We’ll use a few choice features from SystemVerilog to make Verilog a little more pleasant. If you’re familiar with Verilog, you’ll have no trouble. All the SystemVerilog features used are compatible with recent versions of Verilator, Yosys, and Xilinx Vivado.

Space and Time

A screen is a miniature universe with its own rules of space and time.

Seen from afar, a screen shows a smooth two-dimensional image. Up close, it breaks up into many individual blocks of primary colour: red, green, and blue. We hide this complexity behind the abstract idea of a pixel: the smallest part of the screen we can control. A typical HD screen is 1920 by 1080: two million pixels in total. Even an ancient 640x480 VGA screen has more than 300,000 pixels.

To create the illusion of movement, a screen must be redrawn many times every second. A 60 Hz refresh rate is typical of computer screens in 2021. At 60 Hz, a 1920x1080 screen is drawing 124 million pixels every second! The need to quickly handle so much data is a big part of the challenge of working with graphics at a hardware level. Ever wondered why you don’t see many microcontrollers hooked up to TVs?

Display connectors and cabling vary, but VGA, HDMI, and DisplayPort have a shared data design. There are three channels for colour, usually red, green, and blue, and there are horizontal and vertical sync signals. There may also be audio and configuration data, but that’s not important right now.

The red, green, and blue channels carry the colour of each pixel in turn. A screen begins a new line when it receives a horizontal sync and a new frame when it receives a vertical sync. The sync signals occur within the blanking intervals. Blanking intervals exist to give the electron gun in a cathode ray tube (CRT) time to move to the start of the following line (horizontal retrace) or back to the top of the screen (vertical retrace). Modern digital displays retain blanking intervals, repurposing them to transmit audio and other data.

Raster scan on a CRT Monitor

Display Timings

In this series, we’re going to use 640x480 at 60Hz. Almost all displays support 640x480, and its low resource requirements make it simple to work with on small FPGAs. The same principles apply at higher resolutions, such as 1280x720 and 4K.

We’ll use traditional horizontal and vertical timings based on the original VGA specification:

    640x480 Timings      HOR    VER
    -------------------------------
    Active Pixels        640    480
    Front Porch           16     10
    Sync Width            96      2
    Back Porch            48     33
    Blanking Total       160     45
    Total Pixels         800    525
    Sync Polarity        neg    neg

Learn more from Video Timings: VGA, SVGA, 720p, 1080p.

The blanking interval has three parts: front porch, sync, and back porch. The front porch occurs before the sync signal, the back porch after. Taking blanking into account, we have a total of 800x525 pixels. A typical LCD refreshes 60 times a second, so the number of pixels per second is 800 x 525 x 60 = 25,200,000, which equates to a pixel clock of 25.2 MHz.
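Working backwards from the pixel clock gives the duration of each pixel, line, and frame. These figures are handy when checking a simulation. This is simple arithmetic on the timing table above:

```latex
\begin{aligned}
t_\text{pix}   &= \frac{1}{25.2\ \text{MHz}} \approx 39.68\ \text{ns} \\
t_\text{line}  &= 800 \times t_\text{pix} \approx 31.75\ \mu\text{s} \\
t_\text{frame} &= 525 \times t_\text{line} = 420{,}000 \times t_\text{pix} = \tfrac{1}{60}\ \text{s} \approx 16.67\ \text{ms}
\end{aligned}
```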

If the different parts of the display timings were made visible, they’d look something like this:

Display Timings

CAUTION: CRT Monitors
Modern displays, including multisync CRTs, should be fine with a 25.2 or 25 MHz pixel clock. Fixed-frequency CRTs, such as the original IBM 85xx series, could be damaged by an out-of-spec signal. Use these designs at your own risk.

Running to Time

We’ve decided we need a 25.2 MHz pixel clock, but neither of our demo boards includes such a clock. To reach the required frequency, we’re going to use a phase-locked loop (PLL). Almost all FPGAs include one or more PLLs, but there isn’t a standard way to configure them in Verilog, so we have to use vendor-specific designs.

We have provided implementations for the Arty (XC7) and iCEBreaker (iCE40). For other FPGAs, you’ll need to consult your vendor documentation. If you can’t reach exactly 25.2 MHz, 25 MHz or thereabouts should be fine. The iCE40 can’t generate exactly 25.2 MHz from the iCEBreaker’s 12 MHz oscillator, but it works fine at 25.125 MHz.
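The frame is fixed at 800 x 525 = 420,000 pixels, so the refresh rate scales in proportion to the pixel clock. These alternative clocks therefore land just under 60 Hz. This is our own arithmetic, not a display requirement:

```latex
f_\text{refresh} = \frac{f_\text{pix}}{800 \times 525}
\qquad
\frac{25.125\ \text{MHz}}{420{,}000} \approx 59.82\ \text{Hz}
\qquad
\frac{25\ \text{MHz}}{420{,}000} \approx 59.52\ \text{Hz}
```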

Clock Generator Modules

These modules are part of the Project F Library of useful Verilog designs.
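The iCE40 side might look something like the minimal sketch below. It assumes the iCEBreaker’s 12 MHz oscillator arrives on a PLL-capable input pin, which lets us use the Lattice SB_PLL40_PAD primitive. The module name clock_gen_480p_sketch is hypothetical, and this is not the Project F library module. Its ports simply mirror how clock_gen_480p is instantiated in top_square. The divider values are chosen to give 12 × 67 / 32 = 25.125 MHz, and it’s worth confirming them with the IceStorm icepll utility before relying on them:

```systemverilog
// Hypothetical iCE40 pixel clock sketch: 12 MHz in, 25.125 MHz out.
// Not the Project F clock_gen_480p; ports mirror its use in top_square.
// With the SIMPLE feedback path:
//   f_out = f_in * (DIVF + 1) / ((DIVR + 1) * 2^DIVQ)
//         = 12 MHz * 67 / (1 * 32) = 25.125 MHz  (VCO = 804 MHz)
module clock_gen_480p_sketch (
    input  wire logic clk,        // 12 MHz board oscillator (PLL input pad)
    input  wire logic rst,        // reset (active high)
    output      logic clk_pix,    // pixel clock
    output      logic clk_locked  // pixel clock locked?
    );
    logic locked;
    SB_PLL40_PAD #(
        .FEEDBACK_PATH("SIMPLE"),
        .DIVR(4'b0000),         // DIVR = 0
        .DIVF(7'b1000010),      // DIVF = 66
        .DIVQ(3'b101),          // DIVQ = 5
        .FILTER_RANGE(3'b001)
    ) pll_inst (
        .PACKAGEPIN(clk),
        .PLLOUTGLOBAL(clk_pix),  // drive the global clock network
        .RESETB(!rst),           // PLL reset is active low
        .BYPASS(1'b0),
        .LOCK(locked)
    );

    // LOCK is asynchronous to clk_pix, so pass it through two flip-flops
    logic locked_sync;
    always_ff @(posedge clk_pix) begin
        locked_sync <= locked;
        clk_locked  <= locked_sync;
    end
endmodule
```

Holding clk_locked low until the PLL settles lets the display timings wait for a stable clock through .rst(!clk_locked).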

Simple Display Timings

Using our ~25 MHz pixel clock, we can generate timings for our 640x480 display. Creating display timings is straightforward: there’s one counter for horizontal position and one for vertical. We use these counters to decide on the correct time for sync signals.

Create a module for 640x480 60Hz display timings [simple_display_timings_480p.sv]:

module simple_display_timings_480p (
    input  wire logic clk_pix,   // pixel clock
    input  wire logic rst,       // reset
    output      logic [9:0] sx,  // horizontal screen position
    output      logic [9:0] sy,  // vertical screen position
    output      logic hsync,     // horizontal sync
    output      logic vsync,     // vertical sync
    output      logic de         // data enable (low in blanking interval)
    );

    // horizontal timings
    parameter HA_END = 639;           // end of active pixels
    parameter HS_STA = HA_END + 16;   // sync starts after front porch
    parameter HS_END = HS_STA + 96;   // sync ends
    parameter LINE   = 799;           // last pixel on line (after back porch)

    // vertical timings
    parameter VA_END = 479;           // end of active pixels
    parameter VS_STA = VA_END + 10;   // sync starts after front porch
    parameter VS_END = VS_STA + 2;    // sync ends
    parameter SCREEN = 524;           // last line on screen (after back porch)

    always_comb begin
        // sync runs from the pixel after HS_STA (or line after VS_STA)
        // up to and including HS_END (VS_END): 96 pixels and 2 lines
        hsync = ~(sx > HS_STA && sx <= HS_END);  // invert: negative polarity
        vsync = ~(sy > VS_STA && sy <= VS_END);  // invert: negative polarity
        de = (sx <= HA_END && sy <= VA_END);
    end

    // calculate horizontal and vertical screen position
    always_ff @(posedge clk_pix) begin
        if (sx == LINE) begin  // last pixel on line?
            sx <= 0;
            sy <= (sy == SCREEN) ? 0 : sy + 1;  // last line on screen?
        end else begin
            sx <= sx + 1;
        end
        if (rst) begin
            sx <= 0;
            sy <= 0;
        end
    end
endmodule

ProTip: In Verilog, the last assignment wins, so the reset overrides any earlier assignments to sx and sy in the same block.

sx and sy store the horizontal and vertical position. Their maximum values are 799 and 524 respectively, so we need 10 bits to hold them (2^10 = 1024).

de is data enable, which is low during the blanking interval: we use it to decide when to draw pixels.

Display modes vary in their sync polarity; for traditional 640x480, the polarity is negative for both hsync and vsync. Negative polarity means the voltage is normally high, with low voltage indicating a sync.

The following simulation shows the vertical sync low for two lines, as expected from our display timings. Note the horizontal sync at the end of every line. With these comparisons, vertical sync begins on line 490, immediately after the 10-line front porch (lines 480-489). Horizontal sync begins on pixel 656, after the 16-pixel front porch.

Sync Signal Simulation

Test Benches

You can exercise the designs with the included test benches (Xilinx only):

Some things to check:

  • What is the pixel clock period?
  • How long does the pixel clock take to lock?
  • Does a frame last exactly 1/60th of a second?
  • How much time does a single line last?
  • What are the maximum values of sx and sy when de is low?

You can find instructions for running the simulation in the source README.

Driving a Display

Now that we have our pixel clock and display signals, we’re ready to drive a display. The following diagram shows the basic architecture, with the three Verilog modules and their key signals:

Driving a Display

Square One

We’ll keep things simple for the first outing of our display logic: let’s draw a square. When the screen coordinates, sx and sy, are both less than 32, we draw orange; otherwise, we draw blue.

There are two versions of the square top module, one for each demo board:

We’ll review each in turn, then explain how to build the designs for your board.

Arty VGA

Take a look at the VGA output of the Arty square module, below. For each colour, we check whether we’re in the blanking interval (when de is 0). If we are in the blanking interval, we set the colour intensity to zero. Otherwise, we look at the value of q_draw: if true, we set the pixel to orange; if false, we set it to blue. The colour intensity must be zero in the blanking interval; otherwise, your display may be garbled or misaligned. We want the output signals to be registered and in sync with each other, so we use an always_ff with all five output signals.

module top_square (
    input  wire logic clk_100m,     // 100 MHz clock
    input  wire logic btn_rst,      // reset button (active low)
    output      logic vga_hsync,    // horizontal sync
    output      logic vga_vsync,    // vertical sync
    output      logic [3:0] vga_r,  // 4-bit VGA red
    output      logic [3:0] vga_g,  // 4-bit VGA green
    output      logic [3:0] vga_b   // 4-bit VGA blue
    );

    // generate pixel clock
    logic clk_pix;
    logic clk_locked;
    clock_gen_480p clock_pix_inst (
       .clk(clk_100m),
       .rst(!btn_rst),  // reset button is active low
       .clk_pix,
       .clk_locked
    );

    // display timings
    localparam CORDW = 10;  // screen coordinate width in bits
    logic [CORDW-1:0] sx, sy;
    logic hsync, vsync, de;
    simple_display_timings_480p display_timings_inst (
        .clk_pix,
        .rst(!clk_locked),  // wait for clock lock
        .sx,
        .sy,
        .hsync,
        .vsync,
        .de
    );

    // 32 x 32 pixel square
    logic q_draw;
    always_comb q_draw = (sx < 32 && sy < 32) ? 1 : 0;

    // VGA output
    always_ff @(posedge clk_pix) begin
        vga_hsync <= hsync;
        vga_vsync <= vsync;
        vga_r <= !de ? 4'h0 : (q_draw ? 4'hF : 4'h0);
        vga_g <= !de ? 4'h0 : (q_draw ? 4'h8 : 4'h8);
        vga_b <= !de ? 4'h0 : (q_draw ? 4'h0 : 4'hF);
    end
endmodule

Representing Colour
The colour output is 12-bit, which is 4 bits per colour channel.
We use a single hex digit from 0-F to represent the intensity of red, green, and blue.
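Each channel is one hex digit, so a whole colour fits in a 12-bit constant written as 12'hRGB. The sketch below produces the same orange and blue as top_square and forces blanking to black. square_colour_sketch is a hypothetical module, not part of the Project F designs:

```systemverilog
// Hypothetical sketch: square colours as packed 12-bit RGB constants.
module square_colour_sketch (
    input  wire logic de,           // data enable: low in blanking interval
    input  wire logic q_draw,       // high when the pixel is inside the square
    output      logic [3:0] r,      // 4-bit red
    output      logic [3:0] g,      // 4-bit green
    output      logic [3:0] b       // 4-bit blue
    );
    localparam logic [11:0] ORANGE = 12'hF80;  // red F, green 8, blue 0
    localparam logic [11:0] BLUE   = 12'h08F;  // red 0, green 8, blue F

    logic [11:0] colour;
    always_comb begin
        if (!de) colour = 12'h000;  // colour must be zero during blanking
        else colour = q_draw ? ORANGE : BLUE;
        {r, g, b} = colour;         // split into three 4-bit channels
    end
endmodule
```

Keeping colours as named constants makes it easy to experiment: change ORANGE or BLUE and every channel follows.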

iCEBreaker DVI

The iCE40 DVI implementation is similar, but the DVI output also requires the data enable (de) and the pixel clock (clk_pix). The TFP410 chip on the DVI Pmod takes our colour and sync signals and encodes them into DVI using transition-minimized differential signalling (TMDS).

module top_square (
    input  wire logic clk_12m,      // 12 MHz clock
    input  wire logic btn_rst,      // reset button (active high)
    output      logic dvi_clk,      // DVI pixel clock
    output      logic dvi_hsync,    // DVI horizontal sync
    output      logic dvi_vsync,    // DVI vertical sync
    output      logic dvi_de,       // DVI data enable
    output      logic [3:0] dvi_r,  // 4-bit DVI red
    output      logic [3:0] dvi_g,  // 4-bit DVI green
    output      logic [3:0] dvi_b   // 4-bit DVI blue
    );

    // generate pixel clock
    logic clk_pix;
    logic clk_locked;
    clock_gen_480p clock_pix_inst (
       .clk(clk_12m),
       .rst(btn_rst),
       .clk_pix,
       .clk_locked
    );

    // display timings
    localparam CORDW = 10;  // screen coordinate width in bits
    logic [CORDW-1:0] sx, sy;
    logic hsync, vsync, de;
    simple_display_timings_480p display_timings_inst (
        .clk_pix,
        .rst(!clk_locked),  // wait for clock lock
        .sx,
        .sy,
        .hsync,
        .vsync,
        .de
    );

    // 32 x 32 pixel square
    logic q_draw;
    always_comb q_draw = (sx < 32 && sy < 32) ? 1 : 0;

    // colours
    logic [3:0] red, green, blue;
    always_comb begin
        red   = !de ? 4'h0 : (q_draw ? 4'hF : 4'h0);
        green = !de ? 4'h0 : (q_draw ? 4'h8 : 4'h8);
        blue  = !de ? 4'h0 : (q_draw ? 4'h0 : 4'hF);
    end

    // Output DVI clock: 180° out of phase with other DVI signals
    SB_IO #(
        .PIN_TYPE(6'b010000)  // PIN_OUTPUT_DDR
    ) dvi_clk_io (
        .PACKAGE_PIN(dvi_clk),
        .OUTPUT_CLK(clk_pix),
        .D_OUT_0(1'b0),  // output not DDR because we disable rising edge out
        .D_OUT_1(1'b1)   // output 180° out of phase using falling edge out
    );

    // Output DVI signals
    SB_IO #(
        .PIN_TYPE(6'b010100)  // PIN_OUTPUT_REGISTERED
    ) dvi_signal_io [14:0] (
        .PACKAGE_PIN({dvi_hsync, dvi_vsync, dvi_de, dvi_r, dvi_g, dvi_b}),
        .OUTPUT_CLK(clk_pix),
        .D_OUT_0({hsync, vsync, de, red, green, blue}),
        .D_OUT_1()
    );
endmodule

Lattice SB_IO
The SB_IO primitive (with registered outputs) ensures our DVI signals are in sync when they leave the FPGA. The DVI clock is 180 degrees out of phase, so the TFP410 will sample the middle of the colour values. You can learn more about iCE primitives from the Lattice ICE Technology Library.

Constraints

Before building the design, we need board constraints. The constraints map the pins on the FPGA to the signals in our design. For example, we need to know which FPGA pin is connected to the oscillator. The Arty constraints use Xilinx’s XDC format, which is written in Tcl.

Take a look at the constraints for your board:

Building

Every part of this series includes a README with build instructions. I have included the basic instructions in this post to get you started. If you need help with your board or tools, I recommend the Digilent Forum for Arty and 1BitSquared Discord for iCEBreaker.

Arty

We build Arty designs using Xilinx Vivado. To create a Vivado project, clone the projf-explore repo from GitHub. Then, start Vivado and run the following in the Tcl Console:

cd projf-explore/graphics/fpga-graphics/xc7/vivado
source ./create_project.tcl

This will create a Vivado project with all the designs from this post. You can then build top_square from the Vivado user interface.

iCEBreaker

We build iCEBreaker designs with the open-source toolchain of Yosys, nextpnr, and IceStorm Tools. If you don’t already have these tools, see the README.

For example, to build and program top_square, clone the projf-explore repo, then run the following in a shell:

cd projf-explore/graphics/fpga-graphics/ice40
make top_square
iceprog top_square.bin

Hip to be Square

Once you’ve programmed your board, you should see something like this:

A Square

Try experimenting with your own square size, position, and colours.

Animation

To create a simple animation, we can update the position of the square every frame. If we move the square during active drawing, we risk screen tearing, so we create an animate signal at the start of the vertical blanking period.

We’re going to replicate the behaviour of the video display itself, scanning across then down the screen. The square “beam” disappears off the edge of the screen, like the signal in the blanking interval. Try rebuilding the design with top_beam:

The square animation logic looks like this:

    // size of screen with and without blanking
    localparam H_RES_FULL = 800;
    localparam V_RES_FULL = 525;
    localparam H_RES = 640;
    localparam V_RES = 480;

    logic animate;  // high for one clock tick at start of vertical blanking
    always_comb animate = (sy == V_RES && sx == 0);

    // square 'Q' - origin at top-left
    localparam Q_SIZE = 32;    // square size in pixels
    localparam Q_SPEED = 4;    // pixels moved per frame
    logic [CORDW-1:0] qx, qy;  // square position

    // update square position once per frame
    always_ff @(posedge clk_pix) begin
        if (animate) begin
            if (qx >= H_RES_FULL - Q_SIZE) begin
                qx <= 0;
                qy <= (qy >= V_RES_FULL - Q_SIZE) ? 0 : qy + Q_SIZE;
            end else begin
                qx <= qx + Q_SPEED;
            end
        end
    end

    // is square at current screen position?
    logic q_draw;
    always_comb begin
        q_draw = (sx >= qx) && (sx < qx + Q_SIZE)
              && (sy >= qy) && (sy < qy + Q_SIZE);
    end

Bounce!

Now that we can animate, we can start to create some interesting effects. By adding collision detection, we can bounce squares around the screen. If we draw three squares (red, green, and blue), we have a simple demo. While simple, it’s satisfying to watch the squares combine colours as they move around the screen.
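The colour mixing might work as in the sketch below. It assumes each square drives a single colour channel at full intensity. bounce_colours_sketch and its ports are hypothetical names, not taken from top_bounce. Because the channels are independent, overlapping squares add together:

```systemverilog
// Hypothetical sketch: one full-intensity channel per square.
module bounce_colours_sketch (
    input  wire logic de,         // data enable: low in blanking interval
    input  wire logic q1_draw,    // red square at this pixel?
    input  wire logic q2_draw,    // green square at this pixel?
    input  wire logic q3_draw,    // blue square at this pixel?
    output      logic [3:0] r,    // 4-bit red
    output      logic [3:0] g,    // 4-bit green
    output      logic [3:0] b     // 4-bit blue
    );
    always_comb begin
        // overlaps mix additively: red + green = yellow,
        // green + blue = cyan, red + blue = magenta, all three = white
        r = (de && q1_draw) ? 4'hF : 4'h0;
        g = (de && q2_draw) ? 4'hF : 4'h0;
        b = (de && q3_draw) ? 4'hF : 4'h0;
    end
endmodule
```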

Try rebuilding the design with top_bounce:

Bouncing Squares

Collision Detection

Collision detection is one of those things that seems trivial but has several subtleties. In our bounce module, each square checks for collisions in both horizontal and vertical directions. We’ll make use of this in the next part of this series on Pong, so it’s worth understanding.

Horizontal collision detection example from top_bounce:

    if (q1x >= H_RES - (Q1_SIZE + q1s)) begin  // right edge
        q1dx <= 1;
        q1x <= q1x - q1s;
    end else if (q1x < q1s) begin  // left edge
        q1dx <= 0;
        q1x <= q1x + q1s;
    end else q1x <= (q1dx) ? q1x - q1s : q1x + q1s;

  • H_RES - horizontal screen resolution
  • Q1_SIZE - size of 1st square
  • q1x - horizontal position of 1st square
  • q1dx - horizontal direction of 1st square
  • q1s - horizontal speed of 1st square

A couple of things to consider:

  1. What needs to change to make the left and right edge collision tests symmetrical?
  2. Why do we need to account for the speed of the square?

At first blush, it seems we can simplify this to the following, with a single position update for all situations:

    // questionable collision design
    if (q1x >= H_RES - (Q1_SIZE + q1s)) q1dx <= 1;
    if (q1x < q1s) q1dx <= 0;
    q1x <= (q1dx) ? q1x - q1s : q1x + q1s;

What’s the problem with this approach? Hint: logic in an always_ff block operates in parallel.

Can you suggest a change to the comparisons to make this simpler approach work?
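Each square repeats this logic for both axes, so one way to tidy it up is to wrap a single axis in a module. bounce_axis_sketch is a hypothetical example. It keeps the collision tests from top_bounce unchanged and adds a reset. Here dir plays the role of q1dx, and animate is the once-per-frame pulse from the animation section:

```systemverilog
// Hypothetical one-axis bounce sketch using the top_bounce collision tests.
module bounce_axis_sketch #(
    parameter CORDW = 10,   // coordinate width in bits
    parameter RES   = 640,  // active pixels along this axis
    parameter SIZE  = 32    // square size in pixels
    ) (
    input  wire logic clk,
    input  wire logic rst,
    input  wire logic animate,            // one-tick pulse per frame
    input  wire logic [CORDW-1:0] speed,  // pixels moved per frame
    output      logic [CORDW-1:0] pos,    // top-left position on this axis
    output      logic dir                 // 1: moving towards 0, 0: away
    );
    always_ff @(posedge clk) begin
        if (animate) begin
            if (pos >= RES - (SIZE + speed)) begin  // far edge
                dir <= 1;
                pos <= pos - speed;
            end else if (pos < speed) begin         // near edge
                dir <= 0;
                pos <= pos + speed;
            end else pos <= dir ? pos - speed : pos + speed;
        end
        if (rst) begin  // last assignment wins: reset overrides
            pos <= 0;
            dir <= 0;
        end
    end
endmodule
```

You would instantiate it twice per square, once per axis, with RES set to H_RES or V_RES.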

Explore

I hope you enjoyed the first instalment of Exploring FPGA Graphics. Nothing beats creating your own designs; here are a few suggestions to get you started:

  • Try drawing some country flags (many are composed entirely of squares and rectangles)
  • Animate the size of the squares so they grow and shrink
  • Add collision detection between squares, so they bounce off each other
  • Create a square Verilog module to avoid code duplication
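As a starting point for the last suggestion, the sketch below shows a hypothetical square_sketch module. It wraps the q_draw comparison from the animation code. The module and parameter names are our own, not from the projf-explore repo:

```systemverilog
// Hypothetical reusable square: is (sx, sy) inside a SIZE x SIZE square
// whose top-left corner is at (qx, qy)?
module square_sketch #(
    parameter CORDW = 10,  // screen coordinate width in bits
    parameter SIZE  = 32   // square size in pixels
    ) (
    input  wire logic [CORDW-1:0] sx,  // horizontal screen position
    input  wire logic [CORDW-1:0] sy,  // vertical screen position
    input  wire logic [CORDW-1:0] qx,  // square left edge
    input  wire logic [CORDW-1:0] qy,  // square top edge
    output      logic draw             // high when (sx, sy) is inside square
    );
    always_comb begin
        draw = (sx >= qx) && (sx < qx + SIZE)
            && (sy >= qy) && (sy < qy + SIZE);
    end
endmodule
```

Each square in top_bounce would then become one instance with its own qx, qy, and draw output.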

You could also learn how to create a graphical simulation with Verilator and SDL.

Sponsor Project F
If you like what I do, consider sponsoring me on GitHub.
I’ll use contributions to spend more time creating open-source FPGA designs and tutorials.

What’s Possible?

With a bit of experience, you’ll be creating fantastic FPGA projects of your own. To get an idea of what’s possible with a small FPGA and a little logic, take a look at:

Next Time

Next time we’ll put your new graphics skills to work recreating the arcade classic: Pong.

Constructive feedback is always welcome. Get in touch with @WillFlux or open an issue on GitHub.

©2021 Will Green, Project F