Welcome to Exploring FPGA Graphics. In this series, we explore graphics at the hardware level and get a feel for the power of FPGAs. We start by learning how displays work, before racing the beam with Pong, starfields and sprites, simulating life with bitmaps, drawing lines and triangles, and finally creating simple 3D models. I’ll be writing and revising this series throughout 2020 and 2021.
In this first post, we learn how computer displays work and animate simple shapes with an FPGA.
Updated 2021-01-08. Get in touch with @WillFlux or open an issue on GitHub.
In all beginnings dwells a magic force
Hermann Hesse, Stages from The Glass Bead Game
Series Outline
- Exploring FPGA Graphics (this post) - learn how displays work and animate simple shapes
- FPGA Pong - race the beam to create the arcade classic
- Hardware Sprites - fast, colourful graphics with minimal resources
- FPGA Ad Astra - demo with hardware sprites and animated starfields
- Framebuffers - driving the display from a bitmap in memory
- Life on Screen - the screen comes alive with Conway’s Game of Life
More parts to follow.
Requirements
For this series, you need an FPGA board with video output. We’ll be working at 640x480, so pretty much any video output will work. You should be comfortable with programming your FPGA board and reasonably familiar with Verilog. We’ll be demoing the designs with two boards:
- iCEBreaker (Lattice iCE40) with 12-Bit DVI Pmod
- Digilent Arty A7-35T (Xilinx Artix-7) with Pmod VGA
Source
The SystemVerilog designs featured in this series are available from the projf-explore repo on GitHub. The designs are open source hardware under the permissive MIT licence, but this blog is subject to normal copyright restrictions.
Quick Aside: SystemVerilog?!
We’ll be using a few choice features from SystemVerilog to make Verilog a little more pleasant (no laughing at the back). If you’re familiar with Verilog, you’ll have no trouble.
Space and Time
The screen you’re looking at is a little universe with its own rules of space and time.
Looking at a screen from afar, you see a smooth two-dimensional image. Look more closely, and you see many individual blocks: these are pixels, made up of red, green, and blue components. A typical high-definition image is 1920 pixels across and 1080 lines down: over 2 million pixels in total. Even a 640x480 image has over 300,000 pixels. The need to handle so much information so quickly is a big part of the challenge of working with graphics at a hardware level.
A VGA cable has five main signals: red, green, blue, horizontal sync, and vertical sync. There are no addressing signals to tell the screen where to draw pixels; the secret is time, defined by the sync signals. The red, green, and blue wires carry the colour of each pixel in turn. Each pixel lasts a fixed length of time; when the display receives a horizontal sync, it starts a new line; when it receives a vertical sync, it begins a new frame. Showing many frames in quick succession provides the illusion of a moving image.
The sync signals are part of blanking intervals. Originally designed to allow an electron gun to move to the next line or top of the screen, blanking intervals have been retained and repurposed in contemporary displays: HDMI uses them to transmit audio. The blanking interval has three parts: front porch, sync, and back porch.
Display Timings
In this series, we’re going to use 640x480 as our display resolution. Almost all displays support 640x480, and its low resource requirements make it simple to work with on small FPGAs. All the same principles apply at higher resolutions, such as 1280x720 or 4K.
We’ll use traditional horizontal and vertical timings, based on the original VGA monitor and adapter:
640x480 Timings      HOR    VER
-------------------------------
Active Pixels        640    480
Front Porch           16     10
Sync Width            96      2
Back Porch            48     33
Blanking Total       160     45
Total Pixels         800    525
Sync Polarity        neg    neg
Learn more from Video Timings: VGA, SVGA, 720p, 1080p.
Taking blanking into account, we have a total of 800x525 pixels. A typical LCD refreshes 60 times a second, so the number of pixels per second is 800 x 525 x 60 = 25,200,000, which equates to a pixel clock of 25.2 MHz.
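The arithmetic above can be checked in simulation. This is a minimal sketch rather than part of the series source: the module name pixel_clock_sketch and its localparam names are made up for illustration, and the numbers are taken from the timings table.

```systemverilog
// Simulation-only sketch (hypothetical module, not from the series source):
// derive the pixel clock from the 640x480 timings table.
module pixel_clock_sketch;
    // horizontal: active + front porch + sync + back porch
    localparam H_TOTAL = 640 + 16 + 96 + 48;  // 800 pixels per line
    // vertical: active + front porch + sync + back porch
    localparam V_TOTAL = 480 + 10 + 2 + 33;   // 525 lines per frame
    localparam FPS     = 60;                  // frames per second
    localparam PIX_HZ  = H_TOTAL * V_TOTAL * FPS;  // 25,200,000 Hz

    initial begin
        $display("pixels per frame: %0d", H_TOTAL * V_TOTAL);
        $display("pixel clock: %0d Hz", PIX_HZ);
        $finish;
    end
endmodule
```

Deriving the totals from the individual table entries, rather than hard-coding 800 and 525, makes it easy to try the same calculation for another display mode.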
CAUTION: CRT Monitors
Any modern display, including multisync CRTs, should be fine with a 25.2 or 25 MHz pixel clock. Fixed-frequency CRTs, such as the original IBM 85xx series, could be damaged by an out-of-spec signal. Use these designs at your own risk.
Running to Time
We’ve decided we need a pixel clock of 25.2 MHz, but neither of our demo boards has such a clock. To reach the required frequency, we’re going to use a phase-locked loop (PLL). Almost all FPGAs include one or more PLLs, but there isn’t a standard way to configure them in Verilog, so we have to use vendor-specific designs.
We have provided implementations for Xilinx 7 Series (XC7) and Lattice iCE40; for other FPGAs, you’ll need to consult your vendor documentation. If you can’t reach 25.2 MHz exactly, then 25 MHz or thereabouts should be fine (but see note about CRTs, above). The iCE40 can’t generate 25.2 MHz using the oscillators on iCEBreaker but works fine at 25.125 MHz.
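To see why a slightly different pixel clock is usually acceptable, it helps to look at the refresh rate it produces. The following simulation-only sketch divides each candidate clock by the 800 x 525 pixels in a frame; the module and function names are invented for this example.

```systemverilog
// Simulation-only sketch (hypothetical names): refresh rate for a few
// candidate pixel clocks, using the 800x525 total from the timings table.
module refresh_rate_sketch;
    localparam H_TOTAL = 800;  // pixels per line, including blanking
    localparam V_TOTAL = 525;  // lines per frame, including blanking

    // refresh rate in Hz for a pixel clock given in Hz
    function automatic real refresh_hz(input real pix_hz);
        return pix_hz / (H_TOTAL * V_TOTAL);
    endfunction

    initial begin
        $display("25.200 MHz: %.2f Hz", refresh_hz(25.2e6));
        $display("25.125 MHz: %.2f Hz", refresh_hz(25.125e6));
        $display("25.000 MHz: %.2f Hz", refresh_hz(25.0e6));
        $finish;
    end
endmodule
```

At 25.125 MHz the refresh rate is roughly 59.8 Hz, and at 25 MHz roughly 59.5 Hz: both within about 1% of the nominal 60 Hz.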
Clock Generator Modules
- Xilinx 7 Series: xc7/clock_gen.sv
- Lattice iCE40: ice40/clock_gen.sv
Display Timings Module
Using our ~25 MHz pixel clock, we can generate timings for our 640x480 display. Creating display timings is straightforward: there’s one counter for horizontal position and one for vertical. We use these counters to decide on the correct time for sync signals.
640x480 display timings generator [display_timings_480p.sv]:
module display_timings_480p (
    input  wire logic clk_pix,   // pixel clock
    input  wire logic rst,       // reset
    output      logic [9:0] sx,  // horizontal screen position
    output      logic [9:0] sy,  // vertical screen position
    output      logic hsync,     // horizontal sync
    output      logic vsync,     // vertical sync
    output      logic de         // data enable (low in blanking interval)
    );

    // horizontal timings
    parameter HA_END = 639;           // end of active pixels
    parameter HS_STA = HA_END + 16;   // sync starts after front porch
    parameter HS_END = HS_STA + 96;   // sync ends
    parameter LINE   = 799;           // last pixel on line (after back porch)

    // vertical timings
    parameter VA_END = 479;           // end of active pixels
    parameter VS_STA = VA_END + 10;   // sync starts after front porch
    parameter VS_END = VS_STA + 2;    // sync ends
    parameter SCREEN = 524;           // last line on screen (after back porch)

    always_comb begin
        hsync = ~(sx >= HS_STA && sx < HS_END);  // invert: negative polarity
        vsync = ~(sy >= VS_STA && sy < VS_END);  // invert: negative polarity
        de = (sx <= HA_END && sy <= VA_END);
    end

    // calculate horizontal and vertical screen position
    always_ff @ (posedge clk_pix) begin
        if (sx == LINE) begin  // last pixel on line?
            sx <= 0;
            sy <= (sy == SCREEN) ? 0 : sy + 1;  // last line on screen?
        end else begin
            sx <= sx + 1;
        end
        if (rst) begin
            sx <= 0;
            sy <= 0;
        end
    end
endmodule
ProTip: The last assignment wins in Verilog, so the reset overrides the earlier assignments to sx and sy.
sx and sy store the horizontal and vertical position; their maximum values are 799 and 524 respectively, so we need 10 bits to hold them (2^10 = 1024). de is data enable, which is low during the blanking interval: we use it to decide when to draw pixels.
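If you would rather not count bits by hand, the standard $clog2 system function can do it for you. The sketch below is only an illustration with made-up names; the timings module itself uses a fixed [9:0] width.

```systemverilog
// Simulation-only sketch (hypothetical names): work out how many bits are
// needed to count every pixel on a line and every line in a frame.
module coord_width_sketch;
    localparam H_TOTAL = 800;  // sx counts 0 to 799
    localparam V_TOTAL = 525;  // sy counts 0 to 524

    // $clog2(N) is the number of bits needed to represent 0 to N-1
    localparam H_BITS = $clog2(H_TOTAL);  // 10, since 512 < 800 <= 1024
    localparam V_BITS = $clog2(V_TOTAL);  // 10, since 512 < 525 <= 1024

    initial begin
        $display("sx needs %0d bits, sy needs %0d bits", H_BITS, V_BITS);
        $finish;
    end
endmodule
```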
Display modes vary in the polarity of their sync signals; for traditional 640x480, the polarity is negative for both hsync and vsync. Negative polarity means the voltage is mostly high, with low voltage indicating a sync signal.
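For display modes with positive sync polarity, one option is to make the polarity a parameter. The following is a sketch under that assumption, not the approach taken in the timings module above: H_POL and the module name are hypothetical, and the sync window uses the same values as display_timings_480p.

```systemverilog
// Sketch only: horizontal sync with selectable polarity.
// H_POL is a hypothetical parameter, not part of display_timings_480p.
module hsync_polarity_sketch #(
    parameter H_POL = 0  // 0 = negative polarity, 1 = positive polarity
    ) (
    input  wire logic [9:0] sx,  // horizontal screen position
    output      logic hsync      // horizontal sync
    );

    // same sync window as display_timings_480p
    localparam HS_STA = 639 + 16;     // sync starts after front porch
    localparam HS_END = HS_STA + 96;  // sync ends

    logic in_sync;  // high while sx is inside the sync window
    always_comb begin
        in_sync = (sx >= HS_STA && sx < HS_END);
        hsync = H_POL ? in_sync : !in_sync;  // negative polarity inverts
    end
endmodule
```

With H_POL left at 0, hsync is high for most of the line and goes low only inside the sync window, matching the negative polarity described above.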
The following simulation shows the vertical sync starting at the 490th line (counting starts at zero):
Test Benches
You can exercise the designs with the included test benches (Xilinx only):
- Clock Gen Test Bench (Xilinx 7 Series)
- Display Timings Test Bench (Xilinx 7 Series)
Some things to check:
- What is the pixel clock period?
- How long does the pixel clock take to lock?
- Does a frame last exactly 1/60th of a second?
- How much time does a single line last?
- What are the maximum values of sx and sy when de is low?
You can find instructions for running the simulation in the source README.
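If you want to write your own check, here is a minimal test bench sketch. It is not the test bench from the repo: the module name, the 40 ns clock period (a 25 MHz stand-in for the pixel clock), and the pixel counter are all assumptions made for this example. It relies on the display_timings_480p module shown earlier and counts how many clock ticks have de high during exactly one frame.

```systemverilog
`timescale 1ns / 1ps

// Hypothetical test bench sketch (not the one in the projf-explore repo):
// counts the pixels drawn with de high during exactly one frame.
module display_timings_480p_tb_sketch;
    localparam CLK_PERIOD = 40;         // 25 MHz stand-in pixel clock (ns)
    localparam FRAME_CLKS = 800 * 525;  // clock ticks per frame with blanking

    logic clk_pix = 0;
    logic rst;
    logic [9:0] sx, sy;
    logic hsync, vsync, de;

    display_timings_480p dut (.clk_pix, .rst, .sx, .sy, .hsync, .vsync, .de);

    always #(CLK_PERIOD / 2) clk_pix = ~clk_pix;

    // count clock ticks where data enable is high
    int active_pixels;
    always_ff @(posedge clk_pix) begin
        if (rst) active_pixels <= 0;
        else if (de) active_pixels <= active_pixels + 1;
    end

    initial begin
        rst = 1;
        repeat (2) @(negedge clk_pix);  // hold reset across two rising edges
        rst = 0;                        // release away from the rising edge
        repeat (FRAME_CLKS) @(posedge clk_pix);
        @(negedge clk_pix);             // let the final count update settle
        $display("active pixels in one frame: %0d", active_pixels);
        if (active_pixels != 640 * 480) $display("unexpected pixel count");
        $finish;
    end
endmodule
```

If the timings are correct, the count should equal 640 x 480 = 307,200. Changing reset on the falling edge avoids a race with the always_ff blocks that sample it on the rising edge.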
Top Display
Now we have our display signals we’re ready to start drawing. To begin, we’re going to keep it simple and draw a coloured square. When the screen x and y coordinates are both less than 32 we draw in orange; otherwise, we use blue. Because our colour output has 4 bits per channel, we can use a single hex digit from 0-F to represent the intensity of red, green, and blue.
There are two versions of this top module, one for each demo board:
- Xilinx XC7: xc7/top_square.sv
- Lattice iCE40: ice40/top_square.sv
Arty VGA
Shown below is the version for Arty A7-35T (XC7) with Pmod VGA:
module top_square (
    input  wire logic clk_100m,     // 100 MHz clock
    input  wire logic btn_rst,      // reset button (active low)
    output      logic vga_hsync,    // horizontal sync
    output      logic vga_vsync,    // vertical sync
    output      logic [3:0] vga_r,  // 4-bit VGA red
    output      logic [3:0] vga_g,  // 4-bit VGA green
    output      logic [3:0] vga_b   // 4-bit VGA blue
    );

    // generate pixel clock
    logic clk_pix;
    logic clk_locked;
    clock_gen clock_640x480 (
       .clk(clk_100m),
       .rst(!btn_rst),  // reset button is active low
       .clk_pix,
       .clk_locked
    );

    // display timings
    localparam CORDW = 10;  // screen coordinate width in bits
    logic [CORDW-1:0] sx, sy;
    logic hsync, vsync, de;
    display_timings_480p timings_640x480 (
        .clk_pix,
        .rst(!clk_locked),  // wait for clock lock
        .sx,
        .sy,
        .hsync,
        .vsync,
        .de
    );

    // 32 x 32 pixel square
    logic q_draw;
    always_comb q_draw = (sx < 32 && sy < 32) ? 1 : 0;

    // VGA output
    always_ff @(posedge clk_pix) begin
        vga_hsync <= hsync;
        vga_vsync <= vsync;
        vga_r <= !de ? 4'h0 : (q_draw ? 4'hF : 4'h0);
        vga_g <= !de ? 4'h0 : (q_draw ? 4'h8 : 4'h8);
        vga_b <= !de ? 4'h0 : (q_draw ? 4'h0 : 4'hF);
    end
endmodule
Take a look at the VGA output section of the top module. For each colour, we check whether we’re in the blanking interval (when de is 0). If we are in the blanking interval, we set the colour intensity to zero. Otherwise, we look at the value of q_draw: if true, we set the pixel to orange; if false, we set it to blue. The colour intensity must be zero in the blanking interval; otherwise, your display may be garbled or misaligned. We want the output signals to be registered, and in sync with each other, so we use an always_ff block with all five output signals.
iCEBreaker DVI
The version for iCE40 DVI is similar, but the DVI Pmod requires the data enable, de, and pixel clock, clk_pix, too. The TFP410 chip on the DVI Pmod takes our colour and sync signals and encodes them into DVI using transition-minimized differential signalling (TMDS).
module top_square (
    input  wire logic clk_12m,      // 12 MHz clock
    input  wire logic btn_rst,      // reset button (active high)
    output      logic dvi_clk,      // DVI pixel clock
    output      logic dvi_hsync,    // DVI horizontal sync
    output      logic dvi_vsync,    // DVI vertical sync
    output      logic dvi_de,       // DVI data enable
    output      logic [3:0] dvi_r,  // 4-bit DVI red
    output      logic [3:0] dvi_g,  // 4-bit DVI green
    output      logic [3:0] dvi_b   // 4-bit DVI blue
    );

    // generate pixel clock
    logic clk_pix;
    logic clk_locked;
    clock_gen clock_640x480 (
       .clk(clk_12m),
       .rst(btn_rst),
       .clk_pix,
       .clk_locked
    );

    // display timings
    localparam CORDW = 10;  // screen coordinate width in bits
    logic [CORDW-1:0] sx, sy;
    logic hsync, vsync, de;
    display_timings_480p timings_640x480 (
        .clk_pix,
        .rst(!clk_locked),  // wait for clock lock
        .sx,
        .sy,
        .hsync,
        .vsync,
        .de
    );

    // 32 x 32 pixel square
    logic q_draw;
    always_comb q_draw = (sx < 32 && sy < 32) ? 1 : 0;

    // colours
    logic [3:0] red, green, blue;
    always_comb begin
        red   = !de ? 4'h0 : (q_draw ? 4'hF : 4'h0);
        green = !de ? 4'h0 : (q_draw ? 4'h8 : 4'h8);
        blue  = !de ? 4'h0 : (q_draw ? 4'h0 : 4'hF);
    end

    // Output DVI clock: 180° out of phase with other DVI signals
    SB_IO #(
        .PIN_TYPE(6'b010000)  // PIN_OUTPUT_DDR
    ) dvi_clk_io (
        .PACKAGE_PIN(dvi_clk),
        .OUTPUT_CLK(clk_pix),
        .D_OUT_0(1'b0),  // output not DDR because we disable rising edge out
        .D_OUT_1(1'b1)   // output 180° out of phase using falling edge out
    );

    // Output DVI signals
    SB_IO #(
        .PIN_TYPE(6'b010100)  // PIN_OUTPUT_REGISTERED
    ) dvi_signal_io [14:0] (
        .PACKAGE_PIN({dvi_hsync, dvi_vsync, dvi_de, dvi_r, dvi_g, dvi_b}),
        .OUTPUT_CLK(clk_pix),
        .D_OUT_0({hsync, vsync, de, red, green, blue}),
        .D_OUT_1()
    );
endmodule
Quick Aside: SB_IO
To output the DVI signals correctly, we use the SB_IO primitive. Using SB_IO with registered output ensures all our signals are in sync when they leave the FPGA. By having the DVI clock 180 degrees out of phase, the TFP410 will sample the middle of the colour values. You can learn more about Lattice iCE primitives from the Lattice ICE Technology Library.
Constraints
Combine the top_square, display_timings_480p, and clock_gen modules with suitable constraints, and you’re ready to drive a display. The constraints map the pins on the FPGA to signals in our design:
- Arty Constraints: arty.xdc
- iCEBreaker Constraints: icebreaker.pcf
Building the Designs
In the Exploring FPGA Graphics section of the git repo, you’ll find the design files, a makefile for iCEBreaker, a Vivado project for Arty, and instructions for building the designs for both boards.
Let there be Pixels
Once you’ve programmed your board, you should see something like this (colours #0088FF and #FF8800):
Try experimenting with your own square size, position, and colours.
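When choosing your own colours, it can help to translate between the 4-bit channel values used in the design and familiar 24-bit hex colours. The sketch below assumes each 4-bit channel is expanded to 8 bits by repeating the digit, which is how 4'hF, 4'h8, 4'h0 corresponds to #FF8800. The function and module names are made up for this example.

```systemverilog
// Simulation-only sketch (hypothetical names): expand 4-bit colour channels
// to a 24-bit colour by repeating each hex digit (an assumed convention).
module colour_expand_sketch;
    function automatic logic [23:0] rgb444_to_rgb888(
        input logic [3:0] r, g, b
    );
        return {r, r, g, g, b, b};  // e.g. F,8,0 becomes FF,88,00
    endfunction

    initial begin
        // square colour from top_square: orange
        $display("square:     %h", rgb444_to_rgb888(4'hF, 4'h8, 4'h0));
        // background colour from top_square: blue
        $display("background: %h", rgb444_to_rgb888(4'h0, 4'h8, 4'hF));
        $finish;
    end
endmodule
```

Going the other way, pick a 24-bit colour whose channels have matching digits (or round to the nearest one) and keep one digit per channel.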
Animation
To create a simple animation, we can update the position of the square every frame. If we move the square during active drawing, we risk screen tearing, so we create an animate signal that is high for one clock tick at the start of the vertical blanking interval.
We’re going to replicate the behaviour of the video display itself, scanning across then down the screen. The square “beam” disappears off the edge of the screen, like the signal in the blanking interval. Try rebuilding the design with top_beam:
- Xilinx XC7: xc7/top_beam.sv
- Lattice iCE40: ice40/top_beam.sv
The square animation logic looks like this:
    // size of screen with and without blanking
    localparam H_RES_FULL = 800;
    localparam V_RES_FULL = 525;
    localparam H_RES = 640;
    localparam V_RES = 480;

    logic animate;  // high for one clock tick at start of blanking
    always_comb animate = (sy == V_RES && sx == 0);

    // square 'Q' - origin at top-left
    localparam Q_SIZE = 32;    // square size in pixels
    localparam Q_SPEED = 4;    // pixels moved per frame
    logic [CORDW-1:0] qx, qy;  // square position

    // update square position once per frame
    always_ff @(posedge clk_pix) begin
        if (animate) begin
            if (qx >= H_RES_FULL - Q_SIZE) begin
                qx <= 0;
                qy <= (qy >= V_RES_FULL - Q_SIZE) ? 0 : qy + Q_SIZE;
            end else begin
                qx <= qx + Q_SPEED;
            end
        end
    end

    // is square at current screen position?
    logic q_draw;
    always_comb begin
        q_draw = (sx >= qx) && (sx < qx + Q_SIZE)
              && (sy >= qy) && (sy < qy + Q_SIZE);
    end
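To get a feel for the pace of this animation, the position update can be replayed in a plain simulation loop. This sketch is not from the repo: it copies the update rule above into a hypothetical module, assumes the square starts at the origin, and counts frames until it returns there.

```systemverilog
// Simulation-only sketch (hypothetical module): replay the square position
// update once per frame and count frames until it is back at the origin.
module beam_period_sketch;
    localparam H_RES_FULL = 800;
    localparam V_RES_FULL = 525;
    localparam Q_SIZE  = 32;  // square size in pixels
    localparam Q_SPEED = 4;   // pixels moved per frame

    int qx = 0;      // square position, assumed to start at the origin
    int qy = 0;
    int frames = 0;  // frames elapsed

    initial begin
        do begin
            // same rule as the always_ff block, applied once per frame
            if (qx >= H_RES_FULL - Q_SIZE) begin
                qx = 0;
                qy = (qy >= V_RES_FULL - Q_SIZE) ? 0 : qy + Q_SIZE;
            end else begin
                qx = qx + Q_SPEED;
            end
            frames++;
        end while (qx != 0 || qy != 0);

        $display("frames per full scan: %0d", frames);
        $display("seconds at 60 Hz: %.1f", frames / 60.0);
        $finish;
    end
endmodule
```

Each pass across takes 193 frames and there are 17 rows, so a full scan should take 3,281 frames, or a little under 55 seconds at 60 Hz. Try changing Q_SPEED to see how the scan time responds.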
Bounce!
Now we can animate, we can start to create some interesting effects. By adding collision detection, we can bounce squares around the screen. If we create three squares: red, green, and blue, we have a simple demo. While simple, it’s satisfying to watch the squares combine colours as they move around the screen.
Try rebuilding the design with top_bounce:
- Xilinx XC7: xc7/top_bounce.sv
- Lattice iCE40: ice40/top_bounce.sv
Collision Detection
Collision detection is one of those things that seems trivial but has several subtleties. In our bounce module, each square checks for collisions in both horizontal and vertical directions. We’ll make use of this in the next part of this series on Pong, so it’s worth understanding.
Horizontal collision detection example from top_bounce:
    if (q1x >= H_RES - (Q1_SIZE + q1s)) begin  // right edge
        q1dx <= 1;
        q1x <= q1x - q1s;
    end else if (q1x < q1s) begin  // left edge
        q1dx <= 0;
        q1x <= q1x + q1s;
    end else q1x <= (q1dx) ? q1x - q1s : q1x + q1s;
- H_RES - horizontal screen resolution
- Q1_SIZE - size of 1st square
- q1x - horizontal position of 1st square
- q1dx - horizontal direction of 1st square
- q1s - horizontal speed of 1st square
A couple of things to consider:
- What needs to change to make the left and right edge collision tests symmetrical?
- Why do we need to account for the speed of the square?
At first blush it seems we can simplify this to the following, with a single position update for all situations:
// questionable collision design
if (q1x >= H_RES - (Q1_SIZE + q1s)) q1dx <= 1;
if (q1x < q1s) q1dx <= 0;
q1x <= (q1dx) ? q1x - q1s : q1x + q1s;
What’s the problem with this approach? Hint: logic in an always_ff block operates in parallel.
Can you suggest a change to the comparisons to make this simpler approach work?
Explore
I hope you enjoyed the first instalment of Exploring FPGA Graphics. Nothing beats creating your own designs; here are a few suggestions to get you started:
- Try drawing some country flags (many are composed of rectangular shapes)
- Animate the size of the squares, so they grow and shrink
- Add collision detection between squares, so they bounce off each other
- Create a square Verilog module to avoid code duplication
Feedback is most welcome; you can get in touch with @WillFlux or open an issue on GitHub.
Next Time
Next time we’ll put your new graphics skills to work recreating the arcade classic: Pong.