20 May 2020

Exploring FPGA Graphics

Welcome to Exploring FPGA Graphics. In this series, we explore graphics at the hardware level and get a feel for the power of FPGAs. We start by learning how displays work, before racing the beam with Pong, starfields and sprites, simulating life with bitmaps, drawing lines and triangles, and finally creating simple 3D models. I’ll be writing and revising this series throughout 2020 and 2021.

In this first post, we learn how computer displays work and animate simple shapes with an FPGA.

Updated 2021-01-08. Get in touch with @WillFlux or open an issue on GitHub.

In all beginnings dwells a magic force
Hermann Hesse, Stages from The Glass Bead Game

Series Outline

  • Exploring FPGA Graphics (this post) - learn how displays work and animate simple shapes
  • FPGA Pong - race the beam to create the arcade classic
  • Hardware Sprites - fast, colourful graphics with minimal resources
  • FPGA Ad Astra - demo with hardware sprites and animated starfields
  • Framebuffers - driving the display from a bitmap in memory
  • Life on Screen - the screen comes alive with Conway’s Game of Life

More parts to follow.

Requirements

For this series, you need an FPGA board with video output. We’ll be working at 640x480, so pretty much any video output will work. You should be comfortable with programming your FPGA board and reasonably familiar with Verilog. We’ll be demoing the designs with two boards:

  • Arty A7-35T (Xilinx 7 Series) with Pmod VGA
  • iCEBreaker (Lattice iCE40) with DVI Pmod

Source

The SystemVerilog designs featured in this series are available from the projf-explore repo on GitHub. The designs are open source hardware under the permissive MIT licence, but this blog is subject to normal copyright restrictions.

Quick Aside: SystemVerilog?!
We’ll be using a few choice features from SystemVerilog to make Verilog a little more pleasant (no laughing at the back). If you’re familiar with Verilog, you’ll have no trouble.

Space and Time

The screen you’re looking at is a little universe with its own rules of space and time.

Looking at a screen from afar, you see a smooth two-dimensional image. Look more closely, and you see many individual blocks: these are pixels, made up of red, green, and blue components. A typical high-definition image is 1920 pixels across and 1080 lines down: over 2 million pixels in total. Even a 640x480 image has over 300,000 pixels. The need to handle so much information so quickly is a big part of the challenge of working with graphics at a hardware level.

A VGA cable has five main signals: red, green, blue, horizontal sync, and vertical sync. There are no addressing signals to tell the screen where to draw pixels; the secret is time, defined by the sync signals. The red, green, and blue wires carry the colour of each pixel in turn. Each pixel lasts a fixed length of time; when the display receives a horizontal sync, it starts a new line; when it receives a vertical sync, it begins a new frame. Showing many frames in quick succession provides the illusion of a moving image.

The sync signals are part of blanking intervals. Originally designed to allow an electron gun to move to the next line or top of the screen, blanking intervals have been retained and repurposed in contemporary displays: HDMI uses them to transmit audio. The blanking interval has three parts: front porch, sync, and back porch.

Display Timings

In this series, we’re going to use 640x480 as our display resolution. Almost all displays support 640x480, and its low resource requirements make it simple to work with on small FPGAs. All the same principles apply at higher resolutions, such as 1280x720 or 4K.

We’ll use traditional horizontal and vertical timings, based on the original VGA monitor and adapter:

    640x480 Timings      HOR    VER
    -------------------------------
    Active Pixels        640    480
    Front Porch           16     10
    Sync Width            96      2
    Back Porch            48     33
    Blanking Total       160     45
    Total Pixels         800    525
    Sync Polarity        neg    neg

Learn more from Video Timings: VGA, SVGA, 720p, 1080p.

Taking blanking into account, we have a total of 800x525 pixels. A typical LCD refreshes 60 times a second, so the number of pixels per second is 800 x 525 x 60 = 25,200,000, which equates to a pixel clock of 25.2 MHz.
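We can check this arithmetic with a few lines of Python. This is a quick sketch and not part of the projf-explore designs. It derives the line and frame totals from the porch and sync widths in the table, then works out the pixel clock, line time, and frame time:

```python
# Derive the 640x480 pixel clock and line/frame durations from the timing table.
H_ACTIVE, H_FRONT, H_SYNC, H_BACK = 640, 16, 96, 48
V_ACTIVE, V_FRONT, V_SYNC, V_BACK = 480, 10, 2, 33
REFRESH_HZ = 60

h_total = H_ACTIVE + H_FRONT + H_SYNC + H_BACK  # pixels per line, including blanking
v_total = V_ACTIVE + V_FRONT + V_SYNC + V_BACK  # lines per frame, including blanking

pixel_clock_hz = h_total * v_total * REFRESH_HZ     # pixels per second
pixel_period_ns = 1e9 / pixel_clock_hz              # duration of one pixel
line_time_us = h_total * 1e6 / pixel_clock_hz       # duration of one line
frame_time_ms = v_total * h_total * 1e3 / pixel_clock_hz  # duration of one frame

print(f"{h_total}x{v_total} @ {REFRESH_HZ} Hz -> {pixel_clock_hz / 1e6:.1f} MHz")
print(f"pixel {pixel_period_ns:.2f} ns, line {line_time_us:.2f} us, frame {frame_time_ms:.2f} ms")
```

A frame comes out at exactly 1/60th of a second, as expected.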

CAUTION: CRT Monitors
Any modern display, including multisync CRTs, should be fine with a 25.2 or 25 MHz pixel clock. Fixed-frequency CRTs, such as the original IBM 85xx series, could be damaged by an out-of-spec signal. Use these designs at your own risk.

Running to Time

We’ve decided we need a 25.2 MHz pixel clock, but neither of our demo boards has a clock at that frequency. To reach it, we’re going to use a phase-locked loop (PLL). Almost all FPGAs include one or more PLLs, but there isn’t a standard way to configure them in Verilog, so we have to use vendor-specific designs.

We have provided implementations for Xilinx 7 Series (XC7) and Lattice iCE40; for other FPGAs, you’ll need to consult your vendor documentation. If you can’t reach 25.2 MHz exactly, then 25 MHz or thereabouts should be fine (but see the note about CRTs above). The iCE40 can’t generate exactly 25.2 MHz from the 12 MHz oscillator on iCEBreaker, but it works fine at 25.125 MHz.
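Why 25.125 MHz? The iCE40 PLL multiplies and divides its input by integers, so only certain output frequencies are reachable. The Python sketch below searches every divider combination for the closest match to a target. The formula, divider ranges, 10-133 MHz phase-detector range, and 533-1066 MHz VCO range are assumptions taken from Lattice’s iCE40 sysCLOCK PLL documentation and the icepll tool. Check your device documentation before relying on them.

```python
# Brute-force search for iCE40 PLL divider settings (SIMPLE feedback mode).
# ASSUMPTION: f_out = f_in * (DIVF + 1) / ((DIVR + 1) * 2**DIVQ), with the
# limits below as used by icepll; verify against your datasheet.
def ice40_pll_search(f_in_mhz, f_target_mhz):
    best = None
    for divr in range(16):                    # reference divider is DIVR + 1
        f_pfd = f_in_mhz / (divr + 1)
        if not 10 <= f_pfd <= 133:            # phase detector input range
            continue
        for divf in range(128):               # feedback multiplier is DIVF + 1
            f_vco = f_pfd * (divf + 1)
            if not 533 <= f_vco <= 1066:      # VCO operating range
                continue
            for divq in range(1, 7):          # output divider is 2**DIVQ
                f_out = f_vco / 2**divq
                err = abs(f_out - f_target_mhz)
                if best is None or err < best[0]:
                    best = (err, f_out, divr, divf, divq)
    _, f_out, divr, divf, divq = best
    return {"f_out": f_out, "DIVR": divr, "DIVF": divf, "DIVQ": divq}


print(ice40_pll_search(12, 25.2))
```

With the 12 MHz iCEBreaker oscillator, the closest reachable frequency under these assumptions is 12 MHz × 67 / 32 = 25.125 MHz (DIVR=0, DIVF=66, DIVQ=5).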

Clock Generator Modules

Display Timings Module

Using our ~25 MHz pixel clock, we can generate timings for our 640x480 display. Creating display timings is straightforward: there’s one counter for horizontal position and one for vertical. We use these counters to decide on the correct time for sync signals.

640x480 display timings generator [display_timings_480p.sv]:

module display_timings_480p (
    input  wire logic clk_pix,   // pixel clock
    input  wire logic rst,       // reset
    output      logic [9:0] sx,  // horizontal screen position
    output      logic [9:0] sy,  // vertical screen position
    output      logic hsync,     // horizontal sync
    output      logic vsync,     // vertical sync
    output      logic de         // data enable (low in blanking interval)
    );

    // horizontal timings
    parameter HA_END = 639;           // end of active pixels
    parameter HS_STA = HA_END + 16;   // sync starts after front porch
    parameter HS_END = HS_STA + 96;   // sync ends
    parameter LINE   = 799;           // last pixel on line (after back porch)

    // vertical timings
    parameter VA_END = 479;           // end of active pixels
    parameter VS_STA = VA_END + 10;   // sync starts after front porch
    parameter VS_END = VS_STA + 2;    // sync ends
    parameter SCREEN = 524;           // last line on screen (after back porch)

    always_comb begin
        // sync covers sx 656-751 and sy 490-491, matching the timing table
        hsync = ~(sx > HS_STA && sx <= HS_END);  // invert: negative polarity
        vsync = ~(sy > VS_STA && sy <= VS_END);  // invert: negative polarity
        de = (sx <= HA_END && sy <= VA_END);
    end

    // calculate horizontal and vertical screen position
    always_ff @ (posedge clk_pix) begin
        if (sx == LINE) begin  // last pixel on line?
            sx <= 0;
            sy <= (sy == SCREEN) ? 0 : sy + 1;  // last line on screen?
        end else begin
            sx <= sx + 1;
        end
        if (rst) begin
            sx <= 0;
            sy <= 0;
        end
    end
endmodule

ProTip: In an always_ff block, the last nonblocking assignment to a signal wins, so the reset overrides the earlier assignments to sx and sy.

sx and sy store the horizontal and vertical position. They count up to 799 and 524 respectively, so we need 10 bits to hold them (9 bits only reach 511; 2¹⁰ = 1024). de is data enable, which is low during the blanking interval: we use it to decide when to draw pixels.

Display modes vary in the polarity of their sync signals; for traditional 640x480, the polarity is negative for both hsync and vsync. Negative polarity means the voltage is mostly high, with low voltage indicating a sync signal.
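When checking the timing module in simulation, it helps to have an independent reference. The Python sketch below is derived only from the timing table, not translated from display_timings_480p. For each (sx, sy), it reports active-low sync and data enable, then lists which pixels and lines carry sync.

```python
# Reference model of 640x480 timing, built from the table (not from the RTL).
# Counters start at zero: pixels 0-639 and lines 0-479 are active.
H_ACTIVE, H_FRONT, H_SYNC, H_BACK = 640, 16, 96, 48
V_ACTIVE, V_FRONT, V_SYNC, V_BACK = 480, 10, 2, 33
H_TOTAL = H_ACTIVE + H_FRONT + H_SYNC + H_BACK  # 800 clocks per line
V_TOTAL = V_ACTIVE + V_FRONT + V_SYNC + V_BACK  # 525 lines per frame


def signals(sx, sy):
    """Return (hsync, vsync, de) at position (sx, sy); sync is active low."""
    h_sync_on = H_ACTIVE + H_FRONT <= sx < H_ACTIVE + H_FRONT + H_SYNC
    v_sync_on = V_ACTIVE + V_FRONT <= sy < V_ACTIVE + V_FRONT + V_SYNC
    de = sx < H_ACTIVE and sy < V_ACTIVE
    return int(not h_sync_on), int(not v_sync_on), int(de)


hsync_px = [sx for sx in range(H_TOTAL) if signals(sx, 0)[0] == 0]
vsync_ln = [sy for sy in range(V_TOTAL) if signals(0, sy)[1] == 0]
active = sum(signals(sx, sy)[2] for sy in range(V_TOTAL) for sx in range(H_TOTAL))

print(f"hsync low for sx {hsync_px[0]}-{hsync_px[-1]} ({len(hsync_px)} clocks)")
print(f"vsync low for sy {vsync_ln[0]}-{vsync_ln[-1]} ({len(vsync_ln)} lines)")
print(f"active pixels per frame: {active}")
```

A quick way to catch off-by-one errors in the sync comparisons is to compare these ranges with the sx and sy values where your simulation shows hsync and vsync going low.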

The following simulation shows the vertical sync starting on line 490 (counting from zero):

Sync Signal Simulation

Test Benches

You can exercise the designs with the included test benches (Xilinx only).

Some things to check:

  • What is the pixel clock period?
  • How long does the pixel clock take to lock?
  • Does a frame last exactly 1/60th of a second?
  • How much time does a single line last?
  • What are the maximum values of sx and sy when de is low?

You can find instructions for running the simulation in the source README.

Top Display

Now that we have our display signals, we’re ready to start drawing. To begin, we’re going to keep it simple and draw a coloured square. When the screen x and y coordinates are both less than 32, we draw in orange; otherwise, we use blue. Because our colour output has 4 bits per channel, we can use a single hex digit from 0-F to represent the intensity of red, green, and blue.
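Web colour notation uses 8 bits per channel. To name a 4-bit colour as #RRGGBB, we can repeat each hex digit, which is the same as multiplying by 17 and maps 0xF to 0xFF. Here is a small Python sketch of that conversion; rgb444_to_hex is a hypothetical helper name:

```python
# Convert a 4-bit-per-channel colour (one hex digit each for R, G, B)
# into 24-bit #RRGGBB notation by repeating each digit (0xF -> FF, 0x8 -> 88).
def rgb444_to_hex(r, g, b):
    for c in (r, g, b):
        if not 0 <= c <= 0xF:
            raise ValueError(f"channel value {c} does not fit in 4 bits")
    return "#" + "".join(f"{c:X}" * 2 for c in (r, g, b))


orange = rgb444_to_hex(0xF, 0x8, 0x0)  # square colour
blue = rgb444_to_hex(0x0, 0x8, 0xF)    # background colour
print(orange, blue)
```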

There are two versions of this top module, one for each demo board:

Arty VGA

Shown below is the version for Arty A7-35T (XC7) with Pmod VGA:

module top_square (
    input  wire logic clk_100m,     // 100 MHz clock
    input  wire logic btn_rst,      // reset button (active low)
    output      logic vga_hsync,    // horizontal sync
    output      logic vga_vsync,    // vertical sync
    output      logic [3:0] vga_r,  // 4-bit VGA red
    output      logic [3:0] vga_g,  // 4-bit VGA green
    output      logic [3:0] vga_b   // 4-bit VGA blue
    );

    // generate pixel clock
    logic clk_pix;
    logic clk_locked;
    clock_gen clock_640x480 (
       .clk(clk_100m),
       .rst(!btn_rst),  // reset button is active low
       .clk_pix,
       .clk_locked
    );

    // display timings
    localparam CORDW = 10;  // screen coordinate width in bits
    logic [CORDW-1:0] sx, sy;
    logic hsync, vsync, de;
    display_timings_480p timings_640x480 (
        .clk_pix,
        .rst(!clk_locked),  // wait for clock lock
        .sx,
        .sy,
        .hsync,
        .vsync,
        .de
    );

    // 32 x 32 pixel square
    logic q_draw;
    always_comb q_draw = (sx < 32 && sy < 32) ? 1 : 0;

    // VGA output
    always_ff @(posedge clk_pix) begin
        vga_hsync <= hsync;
        vga_vsync <= vsync;
        vga_r <= !de ? 4'h0 : (q_draw ? 4'hF : 4'h0);
        vga_g <= !de ? 4'h0 : (q_draw ? 4'h8 : 4'h8);
        vga_b <= !de ? 4'h0 : (q_draw ? 4'h0 : 4'hF);
    end
endmodule

Take a look at the VGA output section of the top module. For each colour, we check whether we’re in the blanking interval (when de is 0). If we are, we set the colour intensity to zero. Otherwise, we look at the value of q_draw: if true, we set the pixel to orange; if false, we set it to blue. The colour intensity must be zero in the blanking interval; otherwise, your display may be garbled or misaligned. We want the output signals to be registered and in sync with each other, so we assign all five of them in a single always_ff block.
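This colour logic is also easy to model in software. The Python sketch below mirrors the nested conditionals in top_square for one frame of 800x525 clocks. It ignores the one-cycle delay added by the output register. pixel_colour and de_480p are hypothetical names used for illustration.

```python
# Model of the top_square colour selection: 4-bit (red, green, blue) per clock.
Q_SIZE = 32  # square size in pixels, as in top_square


def de_480p(sx, sy):
    """Data enable: high only in the 640x480 active area."""
    return sx < 640 and sy < 480


def pixel_colour(sx, sy, de):
    if not de:                      # blanking: every channel must be zero
        return (0x0, 0x0, 0x0)
    q_draw = sx < Q_SIZE and sy < Q_SIZE
    return (0xF, 0x8, 0x0) if q_draw else (0x0, 0x8, 0xF)  # orange or blue


# Tally how many clocks of one frame output each colour.
counts = {}
for sy in range(525):
    for sx in range(800):
        c = pixel_colour(sx, sy, de_480p(sx, sy))
        counts[c] = counts.get(c, 0) + 1
print(counts)
```

One frame holds 1,024 orange pixels, 306,176 blue pixels, and 112,800 blanking clocks at zero intensity.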

iCEBreaker DVI

The version for iCEBreaker with the DVI Pmod is similar, but the DVI Pmod also needs the data enable (de) and the pixel clock (clk_pix). The TFP410 chip on the DVI Pmod takes our colour and sync signals and encodes them as DVI using transition-minimized differential signalling (TMDS).

module top_square (
    input  wire logic clk_12m,      // 12 MHz clock
    input  wire logic btn_rst,      // reset button (active high)
    output      logic dvi_clk,      // DVI pixel clock
    output      logic dvi_hsync,    // DVI horizontal sync
    output      logic dvi_vsync,    // DVI vertical sync
    output      logic dvi_de,       // DVI data enable
    output      logic [3:0] dvi_r,  // 4-bit DVI red
    output      logic [3:0] dvi_g,  // 4-bit DVI green
    output      logic [3:0] dvi_b   // 4-bit DVI blue
    );

    // generate pixel clock
    logic clk_pix;
    logic clk_locked;
    clock_gen clock_640x480 (
       .clk(clk_12m),
       .rst(btn_rst),
       .clk_pix,
       .clk_locked
    );

    // display timings
    localparam CORDW = 10;  // screen coordinate width in bits
    logic [CORDW-1:0] sx, sy;
    logic hsync, vsync, de;
    display_timings_480p timings_640x480 (
        .clk_pix,
        .rst(!clk_locked),  // wait for clock lock
        .sx,
        .sy,
        .hsync,
        .vsync,
        .de
    );

    // 32 x 32 pixel square
    logic q_draw;
    always_comb q_draw = (sx < 32 && sy < 32) ? 1 : 0;

    // colours
    logic [3:0] red, green, blue;
    always_comb begin
        red   = !de ? 4'h0 : (q_draw ? 4'hF : 4'h0);
        green = !de ? 4'h0 : (q_draw ? 4'h8 : 4'h8);
        blue  = !de ? 4'h0 : (q_draw ? 4'h0 : 4'hF);
    end

    // Output DVI clock: 180° out of phase with other DVI signals
    SB_IO #(
        .PIN_TYPE(6'b010000)  // PIN_OUTPUT_DDR
    ) dvi_clk_io (
        .PACKAGE_PIN(dvi_clk),
        .OUTPUT_CLK(clk_pix),
        .D_OUT_0(1'b0),  // output not DDR because we disable rising edge out
        .D_OUT_1(1'b1)   // output 180° out of phase using falling edge out
    );

    // Output DVI signals
    SB_IO #(
        .PIN_TYPE(6'b010100)  // PIN_OUTPUT_REGISTERED
    ) dvi_signal_io [14:0] (
        .PACKAGE_PIN({dvi_hsync, dvi_vsync, dvi_de, dvi_r, dvi_g, dvi_b}),
        .OUTPUT_CLK(clk_pix),
        .D_OUT_0({hsync, vsync, de, red, green, blue}),
        .D_OUT_1()
    );
endmodule

Quick Aside: SB_IO
To output the DVI signals correctly, we use the SB_IO primitive. Using SB_IO with registered outputs ensures all our signals are in sync when they leave the FPGA. With the DVI clock 180 degrees out of phase, the TFP410 samples the colour and sync signals in the middle of each pixel, when they are stable. You can learn more about Lattice iCE40 primitives from the Lattice iCE Technology Library.

Constraints

Combine the top_square, display_timings_480p, and clock_gen modules with suitable constraints, and you’re ready to drive a display. The constraints map the pins on the FPGA to signals in our design.

Building the Designs
In the Exploring FPGA Graphics section of the git repo, you’ll find the design files, a makefile for iCEBreaker, a Vivado project for Arty, and instructions for building the designs for both boards.

Let there be Pixels

Once you’ve programmed your board, you should see something like this (colours #0088FF and #FF8800):

A Square

Try experimenting with your own square size, position, and colours.

Animation

To create a simple animation, we can update the position of the square every frame. If we move the square during active drawing, we risk screen tearing. To avoid this, we create an animate signal that is high for one clock cycle at the start of the vertical blanking interval, and we only update the position then.

We’re going to replicate the behaviour of the video display itself, scanning across then down the screen. The square “beam” disappears off the edge of the screen, like the signal in the blanking interval. Try rebuilding the design with top_beam.

The square animation logic looks like this:

    // size of screen with and without blanking
    localparam H_RES_FULL = 800;
    localparam V_RES_FULL = 525;
    localparam H_RES = 640;
    localparam V_RES = 480;

    logic animate;  // high for one clock tick at start of blanking
    always_comb animate = (sy == V_RES && sx == 0);

    // square 'Q' - origin at top-left
    localparam Q_SIZE = 32;    // square size in pixels
    localparam Q_SPEED = 4;    // pixels moved per frame
    logic [CORDW-1:0] qx, qy;  // square position

    // update square position once per frame
    always_ff @(posedge clk_pix) begin
        if (animate) begin
            if (qx >= H_RES_FULL - Q_SIZE) begin
                qx <= 0;
                qy <= (qy >= V_RES_FULL - Q_SIZE) ? 0 : qy + Q_SIZE;
            end else begin
                qx <= qx + Q_SPEED;
            end
        end
    end

    // is square at current screen position?
    logic q_draw;
    always_comb begin
        q_draw = (sx >= qx) && (sx < qx + Q_SIZE)
              && (sy >= qy) && (sy < qy + Q_SIZE);
    end
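To get a feel for the speed of the beam, here’s a Python model of the update logic above, applied once per animate pulse. Values are masked to 10 bits to match CORDW. The frame-counting loop is only an illustration and is not part of top_beam.

```python
# Model of the top_beam square update, applied once per frame (animate high).
H_RES_FULL, V_RES_FULL = 800, 525  # screen size including blanking
Q_SIZE, Q_SPEED = 32, 4            # square size and pixels moved per frame
MASK = (1 << 10) - 1               # CORDW = 10: registers wrap at 1024


def update(qx, qy):
    """One animate tick of the beam square: returns the new (qx, qy)."""
    if qx >= H_RES_FULL - Q_SIZE:  # past the right edge: back to the left
        new_qy = 0 if qy >= V_RES_FULL - Q_SIZE else (qy + Q_SIZE) & MASK
        return 0, new_qy
    return (qx + Q_SPEED) & MASK, qy


# Count frames until the square returns to the top-left corner.
qx, qy, frames = 0, 0, 0
while True:
    qx, qy = update(qx, qy)
    frames += 1
    if (qx, qy) == (0, 0):
        break
print(f"{frames} frames, {frames / 60:.1f} s at 60 Hz")
```

The square visits 17 rows of 193 positions. A full sweep therefore takes 3,281 frames, just under 55 seconds at 60 Hz.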

Bounce!

Now that we can animate, we can start to create some interesting effects. By adding collision detection, we can bounce squares around the screen. With three squares, one each in red, green, and blue, we have a simple demo. While simple, it’s satisfying to watch the squares combine colours as they move around the screen.

Try rebuilding the design with top_bounce.

Bouncing Squares

Collision Detection

Collision detection is one of those things that seems trivial but has several subtleties. In our bounce module, each square checks for collisions in both horizontal and vertical directions. We’ll make use of this in the next part of this series on Pong, so it’s worth understanding.

Horizontal collision detection example from top_bounce:

    if (q1x >= H_RES - (Q1_SIZE + q1s)) begin  // right edge
        q1dx <= 1;
        q1x <= q1x - q1s;
    end else if (q1x < q1s) begin  // left edge
        q1dx <= 0;
        q1x <= q1x + q1s;
    end else q1x <= (q1dx) ? q1x - q1s : q1x + q1s;

  • H_RES - horizontal screen resolution
  • Q1_SIZE - size of 1st square
  • q1x - horizontal position of 1st square
  • q1dx - horizontal direction of 1st square (0: moving right, 1: moving left)
  • q1s - horizontal speed of 1st square
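Here’s a Python model of the horizontal collision logic above, stepped once per frame. The square size of 32, speed of 3, and start position of 100 are hypothetical values, because top_bounce’s own values aren’t shown here. Positions are masked to 10 bits, so an underflow shows up as a large position, as it would in hardware.

```python
# Model of the top_bounce horizontal collision logic, one step per frame.
# ASSUMPTION: Q1_SIZE, Q1_SPEED, and the start position are hypothetical.
H_RES = 640
Q1_SIZE = 32           # hypothetical size of 1st square
Q1_SPEED = 3           # hypothetical q1s
MASK = (1 << 10) - 1   # 10-bit registers wrap on underflow/overflow


def step(q1x, q1dx, q1s=Q1_SPEED):
    """Return the next (q1x, q1dx); q1dx = 1 means moving left."""
    if q1x >= H_RES - (Q1_SIZE + q1s):  # right edge
        return (q1x - q1s) & MASK, 1
    if q1x < q1s:                       # left edge
        return (q1x + q1s) & MASK, 0
    return ((q1x - q1s) if q1dx else (q1x + q1s)) & MASK, q1dx


q1x, q1dx = 100, 0
xs = []
for _ in range(10_000):
    q1x, q1dx = step(q1x, q1dx)
    xs.append(q1x)
print("min", min(xs), "max", max(xs))
```

With these values, the square bounces between x = 1 and x = 607 and never crosses either edge. To explore the questions below, try swapping in your own version of step() and checking whether the square still stays on screen.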

A couple of things to consider:

  1. What needs to change to make the left and right edge collision tests symmetrical?
  2. Why do we need to account for the speed of the square?

At first blush it seems we can simplify this to the following, with a single position update for all situations:

    // questionable collision design
    if (q1x >= H_RES - (Q1_SIZE + q1s)) q1dx <= 1;
    if (q1x < q1s) q1dx <= 0;
    q1x <= (q1dx) ? q1x - q1s : q1x + q1s;

What’s the problem with this approach? Hint: logic in an always_ff block operates in parallel.

Can you suggest a change to the comparisons to make this simpler approach work?

Explore

I hope you enjoyed the first instalment of Exploring FPGA Graphics. Nothing beats creating your own designs; here are a few suggestions to get you started:

  • Try drawing some country flags (many are composed of rectangular shapes)
  • Animate the size of the squares, so they grow and shrink
  • Add collision detection between squares, so they bounce off each other
  • Create a square Verilog module to avoid code duplication

Feedback is most welcome; you can get in touch with @WillFlux or open an issue on GitHub.

Next Time

Next time we’ll put your new graphics skills to work recreating the arcade classic: Pong.

©2021 Will Green, Project F