Ghosts 'n Goblins is a game for the ZX Spectrum released in 1986 by Keith Burkhill. It is a conversion of the arcade game of the same name by Capcom. The ZX Spectrum has a 3.5 MHz Z80 8-bit CPU and no graphics acceleration hardware (i.e. no hardware sprites or hardware scrolling). Considering these limitations, it is quite an achievement that this game manages to run an almost full-screen, 8-way smooth-scrolling playfield at good frame rates.
This article will try to explain the graphics routines used in this game.
Let's take a look at how the game paints a frame. In this video the emulation speed is reduced so we can see exactly the order in which things are painted. The screen is also filled with all pixels set, and the attributes are set to white on black, to fully show what is going on.
Here is another example from later in the same level.
And from the next level.
Some things to note:
Because of the strict top-to-bottom painting order, the game can "race the beam" and avoid the graphics tearing or flickering that would otherwise be a problem when painting directly to video memory. It takes very fast routines to be able to do that, so let's continue by examining how those routines work.
The hidden clip area works like this: the attributes for this area are set to black pixels on a black background, which means any graphics written there will not be visible. A sprite that is painted at the right edge of the scroll area does not need any specialized clipping code; it can simply paint into the black area. So why is there no border at the left edge? Here is the clever part: a sprite painted at the left edge will also paint its clipped portion into the right border, because when you decrease the address pointer at the left edge it ends up at the right edge 8 pixels above. Since no sprite is wider than 16 pixels (24 pixels when pre-rotated), a 16-pixel border is enough for all clipping.
The flying demon is 32 pixels wide but is painted as two separate 16-pixel-wide sprites. Top and bottom clipping is trivial: just call the regular sprite routines with the start address and the number of lines to paint adjusted to the vertically clipped portion.
This image shows the content of the clipping area when we turn on attributes.
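The left-edge wrap can be checked with a small model of the Spectrum display-file addressing. This is only a sketch: the helper names are made up for illustration, and it assumes the pointer is stepped with an 8-bit decrement of the low address byte (as `dec l` would do), which the article does not confirm.

```python
def screen_address(x_col, y):
    """Display-file address of byte column x_col (0-31) on pixel line y (0-191)."""
    return 0x4000 | ((y & 0xC0) << 5) | ((y & 0x07) << 8) | ((y & 0x38) << 2) | x_col

def decode(addr):
    """Inverse of screen_address: returns (x_col, y)."""
    a = addr - 0x4000
    y = ((a >> 5) & 0xC0) | ((a >> 8) & 0x07) | ((a >> 2) & 0x38)
    return a & 0x1F, y

def dec_l(addr):
    """Model of an 8-bit decrement of the low address byte (assumed, like `dec l`)."""
    return (addr & 0xFF00) | ((addr - 1) & 0xFF)

# Stepping left from column 0 on line 84 lands in column 31, 8 pixels higher.
left_edge = screen_address(0, 84)
wrapped_col, wrapped_y = decode(dec_l(left_edge))
print(wrapped_col, wrapped_y)
```

Note that this simple model only covers the general case; in the topmost character row of each screen third the low byte wraps differently, and how the game deals with that is not covered here.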
The game uses several different specialized sprite routines for maximum speed. Sprites that don't move are painted by ORing them onto the background (examples of this are the shooting plant and the princess). Most moving sprites are painted with masks that are ANDed onto the background for the best visual results, as this makes it possible to outline sprites to better distinguish them from the background.
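The difference between the two painting modes can be illustrated on a single screen byte. This is a minimal sketch, assuming the common convention that a mask bit of 1 keeps the background and 0 clears it; the function names are hypothetical and not taken from the game.

```python
def paint_or(background, graphic):
    """OR painting: sprite pixels are added, background pixels always survive."""
    return background | graphic

def paint_masked(background, mask, graphic):
    """Masked painting: AND clears a hole (including an outline), then OR adds the sprite."""
    return (background & mask) | graphic

background = 0b11111111   # fully set background byte
graphic    = 0b00011000   # two sprite pixels
mask       = 0b11000011   # hole is one pixel wider than the sprite on each side

print(f"{paint_or(background, graphic):08b}")            # sprite vanishes into the background
print(f"{paint_masked(background, mask, graphic):08b}")  # sprite is surrounded by a cleared outline
```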
The "paint masked pre-rotated 16-pixel width sprite with variable height" routine is located at address 0x8ea8. Since the sprites are pre-rotated, it updates 24 pixels in width. The graphics and mask data are interleaved. Here is the Z80 disassembly.
Things to note:
"inc l" moves the target 8 pixels to the right.
"inc h" moves the target one pixel down. This works within the same "row" of 8 lines; after that, the routine has to navigate to the next row. The ZX Spectrum has a peculiar video memory layout, but the method used here is very efficient.
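To make the "one pixel down" step concrete, here is a Python model of the widely used Spectrum idiom for it. It is a sketch of the general technique, not a transcription of the routine at 0x8ea8, and the function names are illustrative.

```python
def screen_address(x_col, y):
    """Display-file address of byte column x_col (0-31) on pixel line y (0-191)."""
    return 0x4000 | ((y & 0xC0) << 5) | ((y & 0x07) << 8) | ((y & 0x38) << 2) | x_col

def next_line_down(addr):
    """Move a display-file address one pixel line down."""
    h, l = addr >> 8, addr & 0xFF
    h = (h + 1) & 0xFF              # the cheap case: "inc h"
    if h & 0x07:                    # still inside the same 8-line character row
        return (h << 8) | l
    l += 0x20                       # step to the next character row
    if l > 0xFF:                    # carry: we crossed into the next screen third,
        return (h << 8) | (l & 0xFF)  # and h already points at it
    return ((h - 8) << 8) | l       # same third: undo the overflow in h

# Walking down from line 0 visits the same addresses as computing them directly.
addr = screen_address(5, 0)
for y in range(1, 192):
    addr = next_line_down(addr)
    assert addr == screen_address(5, y)
```

Seven out of eight steps take only the cheap first branch, which is why painting sprites line by line with "inc h" is so efficient.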
Sprites are stored in a compact form and are pre-rotated and mirrored into temporary buffers when the game starts. Sometimes this happens even during gameplay: for instance, when the player takes a hit and loses the armor you can notice a short delay, because the game then pre-rotates and mirrors the player graphics without armor. Pre-rotation is done in 2-pixel steps, so each sprite is stored in 4 rotations.
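The idea of pre-rotation and mirroring can be sketched on a single 16-pixel sprite row. The layout below (one integer per row, mask bits of 1 meaning "keep background") is an assumption made for illustration; the game's actual buffer format is not shown in this article.

```python
SHIFTS = (0, 2, 4, 6)  # 2-pixel steps, as described above

def mirror_row(row16):
    """Mirror a 16-pixel row horizontally."""
    return int(f"{row16:016b}"[::-1], 2)

def prerotate_graphic_row(row16):
    """Return four 24-pixel rows: the graphic shifted right by 0, 2, 4 and 6 pixels."""
    return [(row16 << 8) >> s for s in SHIFTS]

def prerotate_mask_row(mask16):
    """Same for a mask row, filling the shifted-in bits with 1 (keep background)."""
    rows = []
    for s in SHIFTS:
        shifted = ((mask16 << 8) | 0xFF) >> s
        fill = ((1 << s) - 1) << (24 - s)   # s leading one-bits
        rows.append(shifted | fill)
    return rows

for row in prerotate_graphic_row(0x8001):
    print(f"{row:024b}")
```

With all four shifts stored ahead of time, painting at any even horizontal position only needs a table lookup instead of bit shifting at runtime, at the cost of each 16-pixel row occupying 24 pixels.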
The routine that copies a masked sprite to the runtime format is located at address 0x6792 for the graphics and 0x6ae9 for the mask. The routine that mirrors a sprite is located at 0x69bb. The routine that rotates a masked sprite is located at address 0x6a8f for the graphics and 0x6b06 for the mask. They are straightforward, so the disassembly of those routines is omitted here.
These are the masked sprites that are in the game.
We know that the background is fully updated on every frame. If every pixel was unique, then the whole operation of reading 240x276 pixels from memory and writing them to video memory would simply be too slow, no matter how much we tried to optimize the code. So what can be done? We can't reduce the number of writes, but we can reduce the number of reads by exploiting two facts: the background is made of repeating patterns, and typically around 50% of it is empty space.
By identifying runs of empty space or of a repeating pattern, each run can be painted with a single call to a specialized routine.
Clearing empty parts of the background is done by specialized routines, one for each horizontal length of the block to be cleared, all the way from 2 columns up to the full scrolling view width of 30 columns (in 2-column steps, since we have a 2-column-wide clipping area). So for the 30-column empty area in the picture above we call the "clear screen line 30 column" routine.
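As a rough illustration of the run idea, the sketch below groups one row of background cells into runs and reports each run's length in columns. Everything here is hypothetical: the article does not say how the game stores its map or when the runs are computed, so the cell list, the 2-column cell width and the pattern ids are assumptions.

```python
from itertools import groupby

COLUMNS_PER_CELL = 2  # assumed: runs are measured in 2-column steps
EMPTY = 0             # assumed id for empty space

def find_runs(cells):
    """Group a row of pattern ids into (pattern_id, length_in_columns) runs."""
    return [(pattern, len(list(group)) * COLUMNS_PER_CELL)
            for pattern, group in groupby(cells)]

row = [EMPTY, EMPTY, EMPTY, 3, 3, EMPTY]
for pattern, columns in find_runs(row):
    kind = "clear" if pattern == EMPTY else f"paint pattern {pattern}"
    print(f"{kind}: {columns} columns")
```

Each run then maps to exactly one specialized routine selected by its length, so no per-byte decisions are needed while painting.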
Here is an example of such a routine that clears 12 columns of pixels:
There are 15 of these routines, one for each horizontal length, and they vary only in the number of push instructions in the inner loop.
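The effect of clearing with push can be modelled in a few lines. This sketch simulates the Z80 stack semantics (the stack pointer is decremented and two bytes are written per push) on a plain byte array; the function names and the choice of addresses are illustrative, not taken from the game.

```python
def push(memory, sp, value16):
    """Model of the Z80 push: writes high byte then low byte below sp, returns the new sp."""
    sp = (sp - 1) & 0xFFFF
    memory[sp] = value16 >> 8
    sp = (sp - 1) & 0xFFFF
    memory[sp] = value16 & 0xFF
    return sp

def clear_columns(memory, line_end, columns):
    """Clear `columns` bytes ending just before line_end, two bytes per push."""
    sp = line_end
    for _ in range(columns // 2):   # unrolled in the real routines
        sp = push(memory, sp, 0x0000)
    return sp

memory = bytearray([0xFF]) * 0x10000
sp = clear_columns(memory, 0x400C, 12)   # 12 columns need 6 pushes
print(hex(sp), memory[0x4000:0x400C].hex())
```

A push stores two bytes in 11 T-states, while writing a single byte with "ld (hl),a" followed by "inc l" also costs 11 T-states, so the stack roughly halves the cost per byte. It also explains why the routine lengths come in 2-column steps.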
Painting a repeating background pattern is done the same way. There are specialized routines for every 2-column increment in length so 15 routines in total.
Here is an example:
Notice that the stack is used both for reading pattern data and for writing to video memory, for maximum speed.
The graphics are pre-rotated in 2-pixel increments, and rotated in such a way that pixels shifted out at the right edge re-enter at the left edge. Thanks to this format there is no need to blend the background graphics in any way; they can simply be written to the screen as fast as possible. There are separate routines for painting the edges.
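The wrap-around rotation can be checked with a short model. The sketch assumes a pattern that is 16 pixels wide, which is an assumption made for illustration; the helper names are hypothetical.

```python
def rotate_right_16(pattern, shift):
    """Rotate a 16-pixel pattern right; pixels leaving on the right re-enter on the left."""
    shift %= 16
    return ((pattern >> shift) | (pattern << (16 - shift))) & 0xFFFF

def tile_bits(pattern, repeats):
    """Pixel string produced by writing the same 16-pixel pattern `repeats` times."""
    return f"{pattern:016b}" * repeats

pattern = 0x8001
rotated = rotate_right_16(pattern, 2)

# Tiling the rotated pattern gives the original tiling scrolled by 2 pixels,
# so the interior of a run needs no shifting or blending at paint time.
print(tile_bits(pattern, 2))
print(tile_bits(rotated, 2))
```

Only the two ends of a run contain partial patterns, which is consistent with the edges needing their own routines.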
Most of the optimizations are done using specialization. A lot of cycles are saved by writing specific routines for every case that would otherwise require expensive testing for various conditions. This does increase the code size: at least 35 different routines are used in combination to fill the screen. Transferring data using the hardware stack is also an important optimization.
I'm sure there are many more interesting techniques left to uncover in this game. Send me a message if there is something specific I should add, or if you want help with your own investigations.
Link to YouTube video where someone finishes the game (using cheats because the game is frustratingly hard and unfair to play, just like pretty much all 8-bit titles 😀).