We built an FPGA-based AI that uses video input from an NES console to automatically play the game Super Mario Bros. All of the video analysis and AI techniques are performed using Verilog-compiled hardware running on an Altera DE2 Cyclone board. The project combines NTSC decoding, VGA output, kernel-based pattern matching, real-time image manipulation, and NES controller emulation. Below, we explain the system design, our results, our conclusions, and useful appendices. The following video shows a demo of the system working.
Developing an AI to play Super Mario Bros. offers a challenging problem that is possible (though difficult) to accomplish through mere observation of the video output. Unlike similar AIs for Mario-like games (see the Mario AI Championship and learnfun/playfun by Tom Murphy), we do not have direct access to the game’s memory due to our choice of using an original, unmodified NES console. This means we lack basic information like Mario’s location, the map, the location of enemies, etc. All information used in the AI is derived visually, the same way a human would play the game.
We approached the idea believing that Super Mario Bros. is a fairly deterministic environment with side-scrolling that makes it straightforward to do edge and obstacle detection. Interfacing with the NES is also fairly straightforward, as the controller is nothing more than some buttons and a parallel-to-serial shift register integrated circuit. Importantly, we believed that the game would provide a challenging problem with a fairly large array of possible solutions.
At a high level, our AI works by reading each video frame into a buffer and analyzing it against a set of precomputed kernels and colors to look for areas of interest. The program then uses a kernel and color matching algorithm with an error threshold to attempt to identify the location of enemies, walls, and pipes, while keeping false positives to a minimum. With this information, the program makes decisions about when to make Mario jump and run so that he avoids enemies and obstacles. We chose to define "success" as Mario making it through the first level, alive, as quickly as possible. Mario makes no attempt to obtain coins or get powerups, though he frequently does so due to dumb luck.
Mario’s strategy is fairly simple: Walk to the right, and jump to avoid danger and/or obstacles. If an enemy is detected immediately to the right of Mario, he does a small jump (timed well for jumping on/over enemies, and bouncing on their heads for groups of enemies). If multiple enemies are detected, run faster before jumping (to try to clear a large group, and to decrease the chance of getting caught in between two enemies without time to jump away). If a pipe or a pit of doom is detected, Mario does a large jump to try to clear the pipe or the pit.
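The rules above can be sketched as a small decision function. This is a Python model for illustration only, not the Verilog used in the project; the names `Action` and `choose_action` are hypothetical, and the priority order (pipes and pits checked before enemies) is an assumption, since the text does not say which rule wins when several apply.

```python
from enum import Enum


class Action(Enum):
    WALK = "walk"                    # keep walking right
    SMALL_JUMP = "small_jump"        # short press of A
    RUN_THEN_JUMP = "run_then_jump"  # hold B to run, then jump
    LARGE_JUMP = "large_jump"        # long press of A


def choose_action(enemies_ahead: int, pipe_ahead: bool, pit_ahead: bool) -> Action:
    """Pick one action from the detector outputs for the current frame."""
    # Assumption: clearing a pipe or pit takes priority over enemies.
    if pipe_ahead or pit_ahead:
        return Action.LARGE_JUMP
    # Several enemies: build up speed first to clear the whole group.
    if enemies_ahead > 1:
        return Action.RUN_THEN_JUMP
    # A single enemy immediately to the right: small, well-timed jump.
    if enemies_ahead == 1:
        return Action.SMALL_JUMP
    return Action.WALK
```

For example, `choose_action(2, False, False)` selects `Action.RUN_THEN_JUMP`, while any frame with a pipe or pit ahead selects `Action.LARGE_JUMP`.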
Mario also detects the blocks comprising the squares at the end of the level and jumps accordingly. Sadly, he makes no attempt to earn 5000 points when jumping on the flagpole. He also is unaware of the warp pipe that allows you to skip most of the level.
To simplify some of our computations, we assume Mario is in the center of the screen, which is true since Mario never turns around or stops in our algorithm. This approach is sufficient to solve the relatively basic environment of world 1-1, but would obviously fail in more challenging environments like world 1-2. (We also didn’t develop the kernels necessary to detect the underground enemies and bricks, so our AI frequently beats world 1-1 only to repeatedly die by slamming into the first Goomba in world 1-2.)
Configuring the timing of Mario’s jumps and the thresholds for enemy/obstacle detection was quite a challenge, in part because measuring success is difficult and time-consuming. Sometimes Mario would consistently make it through most of the level before dying on an arrangement of four Goombas towards the end of the level. A slight tweak to the Goomba detection parameter might get him past that part, at the cost of consistently dying on the first Goomba. In other words, we had difficulty choosing parameters without overfitting to a particular obstacle or situation.
The NES produces a 240p NTSC signal, which we convert to VGA for display on an attached monitor. However, the converted NES video scrolled vertically at a constant rate, a problem that was not present when testing the same hardware with other NTSC video sources. After much experimentation and debugging, we believe that the problem arises because the ADV7181B video decoder on board the DE2 expects 262.5 scanlines in normal NTSC-M mode, while the NES’s PPU (Picture Processing Unit) produces only 262. The ADV7181B extracts sync pulses using a predictive algorithm, which claims to correct for improper or noisy sync generation but is unable to properly handle the PPU’s method of video generation. This problem could likely be corrected using a timebase corrector or other high-quality video processing equipment, but those options cost more than we were willing to spend. We did experiment with cheap external NTSC-to-VGA converters, but the resulting color distortion sacrificed information we needed to reliably detect obstacles and enemies. We also spent a lot of time trying different configurations of the ADV7181B (which has many settings that can be accessed using I2C) and modifying the DE2_TV module provided to us by Terasic. While this problem may be solvable (or at least correctable), we would encourage future 5760 groups to consider this carefully before embarking on another retro-video-game-console-related project.
Despite this problem, we chose to simply have our system deal with a constantly scrolling video source. We are fortunate that Super Mario Bros. is an entirely horizontally-scrolling game, so we primarily use information about the location of objects in the X direction, and can extract some information in the Y direction by comparing the location of objects to the location of the black horizontal bar that marks the end of the frame. Still, the rollover is a problem because for a brief period of time, some pixels are not visible on the screen. Since our system only uses the video on the screen, and because the exact timing of the rollover depends on when exactly the game is started, this creates a headache and some slightly random behavior, but is not an insurmountable problem.
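One way to picture how vertical position can be recovered from a scrolling image is to measure each object's distance below the black bar, wrapping around the bottom of the screen. The sketch below is a Python illustration under assumptions: the function name `rows_below_bar` is hypothetical, and the 480-row figure is simply the VGA output height mentioned elsewhere in this write-up, not a value taken from the project's code.

```python
def rows_below_bar(object_row: int, bar_row: int, screen_rows: int = 480) -> int:
    """Return how many rows an object sits below the black end-of-frame bar.

    Because the image scrolls, the bar can be anywhere on screen. Taking the
    difference modulo the screen height gives a position that does not depend
    on where the bar currently is.
    """
    return (object_row - bar_row) % screen_rows
```

With the bar at row 400, an object at row 450 is 50 rows below it, and an object at row 100 (wrapped past the bottom of the screen) is 180 rows below it.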
Another issue with reading video from the NES is that the PPU utilizes a “shortcut” method of modulation, causing vertical lines to appear slightly jagged and flicker. The conversion from 256x240 to 640x480 also results in different scale factors in the X and Y directions, which makes developing kernels for pattern recognition difficult as well.
We chose to implement this project using custom hardware rather than instantiating a NIOS II processor on the FPGA and then programming in C. While coding in C is simpler, the VGA controllers typically used with the NIOS are vastly different from the custom controller we used to display the NTSC signal from the NES. Using custom hardware also lets us access the VGA buffer directly, so we can read and write to the screen at the same time. It also allows greater parallelism, so that we can detect multiple types of obstacles simultaneously. It would not have been possible to read the video input and generate an updated output fast enough if we were using a NIOS II CPU.
Our design uses the DE2_TV project as a starting point to decode the input video signal from the NES. It then uses rolling buffers to store an 11x11 grid containing a thresholded version of each channel, which can then be used to match with predefined kernels to detect objects on the screen. The results of the detection are fed into our main AI logic, which includes an incrementing state machine to filter noise from each detector, and to make decisions about jumping based on detected objects and their relative positions on the screen.
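The noise-filtering idea can be illustrated with a counter that only reports a detection once enough kernel matches have accumulated within a frame. This is a Python model under assumptions, not the project's Verilog: the class name `DetectionCounter`, the per-frame reset, and the trigger value are all hypothetical.

```python
class DetectionCounter:
    """Counts kernel matches in a frame and fires only past a trigger count."""

    def __init__(self, trigger_count: int):
        self.trigger_count = trigger_count
        self.count = 0

    def update(self, hit: bool) -> None:
        # Saturating increment, as a fixed-width hardware counter would do.
        if hit and self.count < self.trigger_count:
            self.count += 1

    @property
    def triggered(self) -> bool:
        return self.count >= self.trigger_count

    def new_frame(self) -> None:
        # Assumption: counts are cleared at the start of every frame.
        self.count = 0
```

A single stray match leaves `triggered` false, so isolated false positives do not cause a jump; only repeated matches in the same frame do.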
In order to interface with the NES, it is necessary for the Altera DE2 to emulate an NES Controller. We also found that it was helpful to augment the board with a remote LED display that we could mount next to the monitor. The LEDs light up various colors to indicate when a pipe, goomba, or brick is causing the Mario AI to jump. The following schematic shows both of these pieces of additional hardware, and is referenced in the next two sub-sections:
The NES uses a very simple controller configuration: 8 buttons on the controller connect to a parallel-to-serial shift register, which clocks data into the NES based on a clock signal that is generated by the NES. The original NES used 4021 shift registers, which are now a bit harder to come by. Instead, we used a 74165N shift register, which is nearly identical, except that the polarity of the latch input is inverted. This was easily accommodated by adding a NOT gate on the latch line from the NES. GPIO pins from the Altera DE2 board emulate "active low" button press signals, which are sent to the parallel inputs of the shift register. Based on the CLK and LATCH signals from the NES, these button values are periodically shifted into the NES via the controller port. The 5V power from the controller port is used to power the shift register and NOT gate ICs. The FPGA sends 3.3V logic-level signals, but these exceed the threshold for logic-high detection on both ICs.
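The latch-and-shift behavior can be modeled in a few lines of Python. This is a behavioral sketch, not a description of the exact wiring: the class and function names are hypothetical, the bit order (A, B, Select, Start, Up, Down, Left, Right) is the standard NES controller read order, and shifting in a 1 after each bit is an assumption about how the serial input is tied.

```python
# Standard NES read order: the first bit out after a latch is the A button.
BUTTON_ORDER = ("A", "B", "Select", "Start", "Up", "Down", "Left", "Right")


class ControllerShiftRegister:
    """Behavioral model of the parallel-in, serial-out controller register."""

    def __init__(self):
        self.bits = [1] * 8  # active low: 1 means "not pressed"

    def latch(self, pressed: set) -> None:
        # LATCH from the NES loads all eight button lines in parallel.
        self.bits = [0 if name in pressed else 1 for name in BUTTON_ORDER]

    def data(self) -> int:
        # The serial output always shows the bit at the head of the register.
        return self.bits[0]

    def clock(self) -> None:
        # Each CLK pulse shifts one position; assume a 1 is shifted in.
        self.bits = self.bits[1:] + [1]


def read_controller(reg: ControllerShiftRegister, pressed: set) -> list:
    """Mimic the NES: one latch, then eight data reads with a clock after each."""
    reg.latch(pressed)
    out = []
    for _ in range(8):
        out.append(reg.data())
        reg.clock()
    return out
```

Calling `read_controller(ControllerShiftRegister(), {"A", "Right"})` yields a 0 in the first and last positions and 1 everywhere else, which is what the NES would see with A and Right held.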
A second circuit board, mounted next to the display monitor, assists in the debugging of code by using color-coded LEDs to indicate what scene triggers are causing the Mario AI to execute a "jump" command. The yellow LED lights up when Mario is jumping to avoid a pit or a wall. The red LED lights up when Mario is jumping to avoid an enemy. The green LED lights up to indicate that Mario is jumping to clear a pipe.
The kernel matching algorithm that we implemented requires an 11x11 binary bitmap representation of each sprite to be matched, one for each of the red, green, and blue channels. To automate generating the Verilog representations of these bitmaps, we created an Excel spreadsheet that allows the user to “draw” a bitmap for each channel of a sprite. Once the bitmap has been drawn, Excel formulas generate the corresponding Verilog code, which can then be copy-pasted into the Verilog program where appropriate. This spreadsheet is included in the GitHub repo where we are hosting the code for this project.
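The same bitmap-to-Verilog step could be scripted instead of done in a spreadsheet. The Python sketch below is an illustration only: the function name `bitmap_to_verilog`, the `localparam` output format, and the row-major bit order are assumptions, and the project's spreadsheet may emit a different Verilog form.

```python
def bitmap_to_verilog(name: str, rows: list) -> str:
    """Turn a square bitmap, given as strings of '0'/'1', into a Verilog constant.

    Bits are packed row by row, with the top-left pixel as the most
    significant bit (an assumed convention).
    """
    size = len(rows)
    if any(len(row) != size for row in rows):
        raise ValueError("bitmap must be square")
    if any(ch not in "01" for row in rows for ch in row):
        raise ValueError("bitmap may contain only '0' and '1'")
    bits = "".join(rows)
    total = size * size
    return f"localparam [{total - 1}:0] {name} = {total}'b{bits};"
```

For an 11x11 kernel this produces a single 121-bit constant per channel; a 2x2 bitmap `["10", "01"]` named `K` becomes `localparam [3:0] K = 4'b1001;`.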
There are a huge number of configuration options available to tweak the performance of the AI we developed. To make it easier to adjust the system to the user’s desired performance, we broke out the most important configuration settings as parameters in one section of the top-level design file. Some of the options include:
A series of modules, adapted from the cam_to_vga project, were used to facilitate reading the NTSC signal from the NES and outputting our modified frames over VGA. These modules perform the following conversion steps, in order:
First, an 11x11 kernel was devised for finding unique features of each obstacle in the red, green, and blue channels using the Excel spreadsheet mentioned earlier. The figure below shows a sample of the binary bitmaps that were used to identify the pipe corners, as an example.
To generate the circular buffers necessary to match regions of the screen with the different kernels, we adapted code graciously provided to us by the Cartoonifier group. The approach for an NxN buffer is to store the previous N lines in a circular buffer and to output the NxN grid of bits representing the square of pixels closest to the most recent one shifted in. We wrote a MATLAB script to generate a circular buffer of arbitrary size for the binary, thresholded channels we utilize, and settled on N=11.
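The buffering scheme described above can be modeled in software as a single shift register long enough to span N-1 full lines plus N pixels. This Python sketch is a behavioral illustration, not the generated Verilog: the class name `WindowBuffer` is hypothetical, and it ignores what happens when the window straddles the edge of a line.

```python
from collections import deque


class WindowBuffer:
    """Keeps just enough recent pixels to expose an n x n window."""

    def __init__(self, n: int, line_width: int):
        self.n = n
        self.line_width = line_width
        length = (n - 1) * line_width + n
        # Start filled with zeros; the oldest pixel falls off as new ones arrive.
        self.pixels = deque([0] * length, maxlen=length)

    def shift_in(self, value: int) -> None:
        self.pixels.append(value)

    def window(self) -> list:
        """Return the n x n block whose bottom-right corner is the newest pixel."""
        p = self.pixels
        return [
            [p[r * self.line_width + c] for c in range(self.n)]
            for r in range(self.n)
        ]
```

After every `shift_in`, `window()` returns the square of pixels ending at the most recent one, which is what the kernel comparators consume. In hardware all N*N taps are available in parallel on every pixel clock.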
The RGB channels of the VGA are then thresholded using a manually determined value that produces a clear black and white image containing enough detail to uniquely locate the obstacles. The obstacles are then located by comparing their kernels, which were determined using the same threshold, to the thresholded RGB channels.
Using the switches on the DE2 board, we allow configurable display of the thresholded RGB channels separately or combined. We also allow the user to enable or disable display of the results of the AI’s detection methods, redrawing the pixels near enemies as red, pipes as green, and bricks as yellow, corresponding to the colors of the debugging LED display.
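Thresholding and kernel comparison with an error tolerance can be sketched as follows. This is a Python illustration with hypothetical function names; the cutoff and the allowed number of mismatches are placeholders, not the values used in the project.

```python
def threshold_channel(values: list, cutoff: int) -> list:
    """Convert one color channel to a binary image: 1 where value >= cutoff."""
    return [[1 if v >= cutoff else 0 for v in row] for row in values]


def mismatch_count(window: list, kernel: list) -> int:
    """Count the positions where the binary window differs from the kernel."""
    return sum(
        w != k
        for window_row, kernel_row in zip(window, kernel)
        for w, k in zip(window_row, kernel_row)
    )


def matches(window: list, kernel: list, max_errors: int) -> bool:
    """Declare a match if the window differs from the kernel in few enough bits."""
    return mismatch_count(window, kernel) <= max_errors
```

Allowing a few mismatched bits makes detection tolerant of the jagged, flickering edges in the decoded video, at the price of more false positives as `max_errors` grows.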
One problem encountered was that if we rewrote large chunks of the VGA buffer or modified thresholds/display settings during normal operation, the decoded chroma and luma channels of the NTSC signal would flip. This problem also depended a bit on the way the design was synthesized, since it resulted from the system failing to redraw the screen in time to properly decode the VBLANK lines of the NTSC signal. To correct for this, since we knew the screen should always look a certain way (blue sky on the sides of the screen), we created a method of automatically correcting a chroma/luma flip if the sky suddenly turned pinkish-red (lovingly denoted as the “salmon screen of death”). This works well given a reasonable amount of timing mismatch. In very bad timing situations, the system will constantly flip chroma and luma (creating a horrible visual effect, but at least giving the AI enough information on alternating frames to work).
Emulating the NES controller is a fairly straightforward operation. Eight I/O pins on the DE2 board were designated as the buttons that you would normally find on an NES controller: A, B, Up, Down, Left, Right, Start, and Select. To "press" any of these buttons, the DE2 pulls that output line low, then back high again. Importantly, the duration of this simulated button press can affect how Mario behaves. In particular, the amount of time that the A button is held down controls how high Mario jumps. We took advantage of this to enable Mario to execute both short and long jumps depending on the obstacles that were ahead of him. These button-press commands feed into an ordinary parallel-to-serial shift register, which was explained in the hardware description above.
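A timed, active-low button press can be modeled with a simple countdown. This Python sketch is illustrative only: the class name `ButtonLine` and the two hold durations are hypothetical placeholders, not the values tuned for the project.

```python
SHORT_JUMP_FRAMES = 8   # hypothetical hold time for a small jump
LONG_JUMP_FRAMES = 30   # hypothetical hold time for a large jump


class ButtonLine:
    """Drives one active-low button line for a fixed number of frames."""

    def __init__(self):
        self.frames_left = 0

    def press(self, frames: int) -> None:
        # Start (or restart) a press that lasts the given number of frames.
        self.frames_left = frames

    def tick(self) -> int:
        """Advance one frame and return the line level (0 = pressed, 1 = released)."""
        if self.frames_left > 0:
            self.frames_left -= 1
            return 0
        return 1
```

Calling `press(LONG_JUMP_FRAMES)` on the A button line holds it low for longer than `press(SHORT_JUMP_FRAMES)`, which is what produces the higher jump.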
We were able to implement a simple AI to play World 1-1 of Super Mario Bros. for the NES. The AI is able to play in real time by plugging in a custom controller-to-GPIO breakout into one of the controller ports of the NES. The AI can successfully complete the level most of the time.
Given some randomness due to the video problems described above and a little bit of non-deterministic behavior from the game (some enemies don’t always appear in the same positions), the exact success rate is difficult to measure. However, the AI performs admirably, and can generally complete the first level successfully, even when the game does not behave exactly as we might predict. Failures generally occur when the scrolling happens to be timed in such a way that an obstacle (like a Goomba) is occluded from view at an inopportune time. It may be possible to deal with something like this by periodically pausing to ensure all enemies are seen, but this would likely cause other complications.
Since the NES does not have a standard NTSC signal, the VGA controller produces a constantly scrolling image. The FPGA is still able to play the game by only looking at the horizontal position of the obstacles and enemies detected and by calculating some of the vertical information by searching for the location of the empty black bar of space that appears between frames. This also means that the FPGA only knows the image information that it is capable of displaying. When enemies or obstacles are not being displayed on the VGA, they are not visible to the AI, making the game challenging to play.
Despite the setbacks posed by the NTSC decoder, we were pleased that we were able to implement an AI that is capable of playing World 1-1 of Super Mario Bros. While we were unable to correct the scrolling of the VGA display, the AI met our expectations: it is not only able to complete World 1-1 of Super Mario Bros., but is able to do so without a constant reference for the ground.
There are several changes we would make if we were to redo this project. We would use an emulator or an SNES, since both of these would produce a standard NTSC signal. This would have made the AI development much easier, as we spent a great deal of time trying to correct the video signal. We would also consider directly reading the ROM of the game instead of trying to use computer vision; this would have made it much simpler to detect where the enemies were located, to know the score, and so on. With more time, we would also consider implementing a neural net or other learning algorithm instead of hardcoding specific actions, allowing the FPGA to learn how to play subsequent levels in the game.
For this project we referenced the Terasic DE2_TV example project and the Real-Time Cartoonifier, both of which are publicly accessible projects. For the NES controller breakout we referenced the NES controller schematic. The Super Mario Bros. game is the property of Nintendo, but we are not aware of any rules that restrict playing it.
All of our code can be found here, hosted on GitHub.
Tasks were broken down amongst team members as follows:
The Nintendo Ninja Project is also Open Source.