After having emulated and studied the first part of ‘the core’, we now study in detail the second part of what is found in every computer system emulator. This happens immediately after executing every instruction: First, we must check if any interrupt has occurred. Second, we must take into account the duration of the instruction just executed, and update the machine’s status accordingly. After studying this ubiquitous part of the emulator, we will have a firm understanding of the basics of emulating (via interpretation) a computer system. Of course, for an emulator to be usable this isn’t enough; many more things, some of them not directly related to emulation (for example the user interface), must be done in order to have a complete and functional emulator.
One fundamental aspect of every computer system is timing. Things need to be done at specific moments during the machine’s functioning: the display needs to draw a new frame at a rate of, commonly, about 60 Hz; the sound system must produce samples at specific sampling rates (e.g. 44100 Hz); timers must trigger interrupts when their respective counters expire; etc. On the other hand, devices may require attention from the CPU after finishing some operation. For example, the display generates an interrupt every time a Vertical Blank is about to start (that is, every time a new frame is drawn); the disk controller may request an interrupt as soon as a disk block is read, probably as a result of a page fault exception; software may set a particular timer to interrupt at specific intervals; etc. Additionally, these interrupts must be processed in a well-defined way. For example, interrupts can have priorities, with higher priority interrupts being serviced first when several interrupts are requested simultaneously. Also, the hardware can automatically disable further interrupts while servicing one, and it may also provide other mechanisms to help the software handle interrupts in general.
Most of these timing-related issues are implemented in each device independently of the other devices, although there is also much interaction between the different devices and their timings. For example, the display device refreshes the screen at a rate of 60 Hz, quite independently of the rest of the system, but another device can depend on this display timing; the CPU may be able to access Video RAM only shortly after a frame update, and for a short period of time, before the display begins to draw the new frame. Therefore, there must be some mechanism so the CPU has knowledge of the display’s timing. This kind of synchronization is possible thanks to the interrupt system, and it is widely used by software because it provides just the flexibility software needs to drive the machine effectively and efficiently. As we see, interrupts and timing are two sides of the same coin.
IMPLEMENTING INTERRUPTS AND TIMING IN SOFTWARE
In implementing the above functionality, we have a great advantage: we are doing so in software, so we have no physical (electrical, mechanical, etc.) constraints and can focus solely on the logic of the process. Of course, we have taken on the tedium of doing the implementation in Assembly Language, but this decision will nevertheless help us clarify our understanding of the machine’s inner workings.
IMPLEMENTING INTERRUPTS IN REALBOY
Immediately after executing an instruction, where we left off in Emulating The Core, Part 1: The Fetch-And-Execute Cycle, we need to process interrupts.
Interrupts are undoubtedly a fundamental aspect of the computer system we are emulating (the Game Boy). First, let’s remember where we left off:
Instruction is not instruction extender (opcode 0xCB)
Recall that we have successfully emulated the first instruction (LD SP, $0xFFFE) and our ‘PC’ now points to the next instruction to execute. What follows is what needs to be done before restarting the Instruction Cycle; that is, what needs to be done right after executing every instruction and just before executing the next instruction. Let’s see.
As noted in the code’s comments (“/* the reason for doing the following is very subtle, but it must be done */”), the following lines deal with a subtlety of the emulator. It is needed because of some exotic pieces of code found in certain games while debugging. In essence, lines 166 to 173 read the value of the PC and set %r13 accordingly. The code is more complicated than it would otherwise need to be, in order to handle a special case. It goes something like this: Recall that each instruction advances the PC by the number of bytes that make up the instruction. However, also recall that the PC actually holds a memory address; it is treated as a memory pointer. Now, suppose that the PC points to memory address 0x3FFF. This corresponds to the last byte of the permanently-mapped ROM bank. Suppose that the instruction at this address is 1 byte long. After executing the instruction at 0x3FFF, the PC advances 1 byte and points to the next instruction; that is, it now points to address 0x4000. But remember that the address range 0x4000 to 0x7FFF is used for alternately mapping the rest of the ROM banks of the game. So it is not sufficient to simply advance the host pointer: unless the bank mapped at that moment at 0x4000 is the one immediately following the permanently mapped bank, the two addresses are logically contiguous but not physically contiguous. In most cases it suffices to advance the pointer, because the next instruction is physically contiguous, but I have witnessed rare cases like the one above (although I don’t recall the game that exhibits this behaviour), so we can no longer assume that the common case is the only case. For this reason, after executing every instruction we advance the value stored in the PC register as usual, but the pointer itself, which we implement through our %r13 register, is always remapped in case the new memory address is not physically contiguous.
Briefly again: Remember that %r13 is where we actually implement the ‘PC’; it is the pointer to the memory from which the next instruction will be fetched. But it is not sufficient to advance %r13 by a certain number of bytes so that it points to the next instruction to execute, because a particular instruction may involve a bank switch, in which case the address space is no longer contiguous. Let us not worry about this issue for now and continue execution at line 175:
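To make the special case concrete, here is a minimal sketch of the address translation involved. It is written in Python rather than assembly for brevity, and it is not RealBoy’s code: the names rom_offset and current_bank are hypothetical, and we assume the ROM image is a flat array of 16 KiB banks.

```python
ROM_BANK_SIZE = 0x4000  # each ROM bank is 16 KiB


def rom_offset(pc, current_bank):
    """Translate a Game Boy ROM address into an offset inside the ROM image.

    `current_bank` (hypothetical name) is the bank currently mapped at
    0x4000-0x7FFF. RealBoy itself keeps a host pointer in %r13 instead.
    """
    if pc < ROM_BANK_SIZE:
        # 0x0000-0x3FFF: the permanently mapped bank, offset equals address.
        return pc
    if pc < 2 * ROM_BANK_SIZE:
        # 0x4000-0x7FFF: whatever bank is switched in right now.
        return current_bank * ROM_BANK_SIZE + (pc - ROM_BANK_SIZE)
    raise ValueError("address is outside cartridge ROM space")


# A 1-byte instruction at 0x3FFF, with bank 5 switched in:
before = rom_offset(0x3FFF, 5)
after = rom_offset(0x3FFF + 1, 5)
print(hex(before), hex(after))
```

With bank 1 switched in, the two offsets are adjacent and simply incrementing the host pointer would work; with any other bank they are far apart, which is why the pointer has to be recomputed from the PC value.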
Begin interrupts processing code
We begin our interrupt-processing logic with a special-case code. The Game Boy’s CPU presents an instruction called halt. Effectively, it suspends execution and the CPU goes to a ‘sleep’ state; the CPU remains halted until an interrupt occurs. The opcode for this instruction is 0x76, so we test if the opcode for the just-executed instruction is equal to 0x76 (118 decimal). If the opcode is not equal to 0x76 (the instruction is not ‘halt’), then the jump is not performed, and execution continues at line 177.
Effectively, our current instruction is not ‘halt’
Next, we test if interrupts are enabled. This is determined by the Interrupt Master Enable Flag (IME). This register is not mapped into the Game Boy’s Address Space, so it is not accessed through memory; instead, the Game Boy’s CPU offers two special instructions that interact with it. One of these instructions (EI) sets the IME flag, effectively enabling interrupts, while the other instruction (DI) clears it, disabling interrupts. Also, hardware clears the IME flag as a result of servicing an interrupt, and sets it as a result of returning from an interrupt. In RealBoy we use a special static variable to represent the IME flag. This variable is ime_flag, so we test whether it is set at line 177. If interrupts are disabled, a jump is performed all the way to line 228, effectively skipping the whole interrupt processing logic. Because interrupts are enabled at this point, the skip is not made, and execution continues at line 180:
Actually begin interrupt processing
Finally, with the special case handled and interrupts enabled, we begin to check for interrupts, from higher to lower priority.
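The two checks that precede the actual scan can be summarized in a few lines. This is only a sketch in Python, not RealBoy’s assembly; the function name and the returned labels are made up for illustration, and what the ‘halt’ path does is not covered here.

```python
HALT_OPCODE = 0x76  # 118 decimal


def interrupt_precheck(last_opcode, ime_flag):
    """Decide what to do right after an instruction has been executed.

    Returns one of three illustrative labels:
      "halt" - the instruction was 'halt' (special case, handled elsewhere)
      "skip" - interrupts are disabled, skip the interrupt logic entirely
      "scan" - check the IF register for pending requests
    """
    if last_opcode == HALT_OPCODE:
        return "halt"
    if not ime_flag:
        return "skip"
    return "scan"


# Our first instruction, LD SP, 0xFFFE, has opcode 0x31 and interrupts are on:
print(interrupt_precheck(0x31, True))
```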
Recall that the interrupt system presents a couple of memory-mapped registers. First, the Interrupt Request Register (IF) specifies the interrupts that have been requested by the various devices. This register’s layout is as follows:
FF0F – IF – Interrupt Flag (R/W)
Bit 0: V-Blank Interrupt Request (INT 40h) (1=Request)
Bit 1: LCD STAT Interrupt Request (INT 48h) (1=Request)
Bit 2: Timer Interrupt Request (INT 50h) (1=Request)
Bit 3: Serial Interrupt Request (INT 58h) (1=Request)
Bit 4: Joypad Interrupt Request (INT 60h) (1=Request)
It presents a bit for each possible source of interruption. For example, bit 0 is set every time the display finishes drawing a frame and starts what is known as the Vertical Blank Period (VBLANK). This is the time it takes for the display to start drawing the next frame at the upper-left corner of the screen (row 0, column 0). This source of interruption is actually used by all games to update the Video RAM (VRAM), because during the VBLANK the CPU is free to access all of VRAM for a long period of time, whereas it may only access a restricted portion of VRAM and for a very short time while the display is writing a new frame.
Now, because priorities are ordered from least significant to most significant bit, we start checking if we have a VBLANK interrupt. We assign 1 to register %rcx. This register will be used as a bit mask; it starts with the value of 1 (binary 00000001), and, if there is no VBLANK interrupt request, it is shifted one unit to the left (binary 00000010); if no LCD STAT interrupt is requested, it is shifted again (binary 00000100), etc. This way we test each bit of the IF register for an interrupt request.
We use %r10 as a pointer to ints_offs. Let’s see what this memory location is:
FILE: gboy_x86_64.S
LINE NUMBER: 7269
ints_offs: .quad 0x40, 0x48, 0x50, 0x58, 0x60
The memory address pointed to by %r10, then, is just an array of 8-byte values. Each value is the Game Boy address where the corresponding interrupt handler resides. For example, if we had a VBLANK interrupt and were going to process it, the CPU would disable interrupts, push the current address onto the stack and transfer execution control to address 0x0040. Similarly, the interrupt handler for the LCD STAT interrupt is at address 0x0048; the one for the internal timer is at 0x0050; the one for the serial link cable is at 0x0058 and the one for the joypad is at 0x0060. As you can see, the interrupt handlers are at fixed addresses, and all of them reside in the permanently-mapped ROM bank, that is, the address range from 0x0000 to 0x3FFF, so there is no risk of a bank switch replacing the interrupt handlers.
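The sequence ‘disable interrupts, push the current address, jump to the handler’ can be sketched as follows. This is an illustrative Python model, not RealBoy’s implementation: the Machine class and service_interrupt are hypothetical names, and memory is modelled as a flat 64 KiB byte array. The sketch also clears the serviced bit in IF, which is what the hardware does when it dispatches an interrupt.

```python
INT_VECTORS = [0x40, 0x48, 0x50, 0x58, 0x60]  # same values as ints_offs
IF_ADDR = 0xFF0F                              # Interrupt Request Register


class Machine:
    """Tiny stand-in for the emulated machine state (illustrative only)."""

    def __init__(self):
        self.mem = bytearray(0x10000)  # flat 64 KiB address space
        self.pc = 0
        self.sp = 0xFFFE
        self.ime = True


def service_interrupt(m, bit):
    """Dispatch the interrupt whose request bit in IF is `bit` (0..4)."""
    m.ime = False                              # no nested interrupts
    m.mem[IF_ADDR] &= ~(1 << bit) & 0xFF       # acknowledge the request
    m.sp = (m.sp - 1) & 0xFFFF
    m.mem[m.sp] = m.pc >> 8                    # push high byte of PC
    m.sp = (m.sp - 1) & 0xFFFF
    m.mem[m.sp] = m.pc & 0xFF                  # push low byte of PC
    m.pc = INT_VECTORS[bit]                    # jump to the fixed handler
```

Calling service_interrupt(m, 0) on a machine whose PC is 0x1234 leaves the return address 0x1234 on the stack and the PC at 0x0040, the VBLANK handler.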
Now we take the jump at line 182 and continue execution at line 189.
Control values at registers %rcx and %r10
Checking for an interrupt is done at line 189. Recall that in the Game Boy Address Space, address 0xFF0F is mapped to the IF register (the Interrupt Request Register). Also, recall that register %rcx is 1, that the variable addr_sp represents the start of the Game Boy’s Address Space, and note that the macro IR_REG expands to the value 0xFF0F. With this in mind, we can see that the instruction at line 189, testq %rcx, IR_REG+addr_sp, is actually testing bit 0 of the IF register; that is, it is testing whether or not a VBLANK interrupt has been requested. Because we have just set up our virtual machine, everything has only just been reset to default values; we have not yet received any interrupts, so all bits of the IF register are indeed clear (have a value of 0). There is therefore no interrupt request at bit 0 of the IF register, so we continue checking the rest of the bits; the jump at line 189 is taken, and execution continues at line 184.
Prepare to check next interrupt bit
Recall that %rcx had a value of 1, and it was used to test bit 0 of the IF register at address 0xFF0F. Also, register %r10 points to the memory location where the interrupt handlers’ addresses are stored. Actually, %r10 points to the first element of the array, and the first element is indeed the value 0x0040. Had an interrupt been requested at bit 0, we would have used %r10 to extract that value and transfer control to that address, where the interrupt handler corresponding to the VBLANK interrupt resides.
First, at line 184 we verify that we haven’t finished processing all interrupts. Remember from the IF register layout that there are only 5 possible interrupt sources, so only the first 5 bits of the register are used to indicate an interrupt request. The last check is for bit 4, which translates to a value of 0x10 for %rcx (binary 00010000). When %rcx is indeed equal to 0x10, a jump is performed all the way to line 228, where the interrupt-processing function terminates. We haven’t finished processing interrupts, so the jump is not performed. We now need to test bit 1 of the IF register, so we shift %rcx to the left by one unit; now %rcx is 2 (binary 00000010). Also, we advance the pointer held in %r10 to point to the next element (address 0x0048). We are ready to check for the LCD STAT interrupt.
New values for the control registers. Note that %rcx is printed in binary.
The logic now repeats:
- We check for an interrupt request at bit 1 in the IF register.
- No request has been made for this bit, so a jump is performed back to line 181.
- We verify that we haven’t checked all interrupt bits.
- We shift %rcx left by one unit so we can use it to check the next bit. In this case, %rcx would be 4 (binary 00000100).
- We advance the pointer %r10 to point to the address for the corresponding interrupt handler, that is, the next element of the ints_offs array. In this case, the new interrupt handler would be at address 0x0050.
After having checked all bits, and because we have no interrupts requested yet, a jump is finally made at line 185 after checking the last bit of the IF register.
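The scan we have just walked through boils down to a short loop. The following Python sketch mirrors the walk-through (test the bit, check whether it was the last one, shift the mask and advance the index); it is not the assembly itself. The function name is hypothetical, and like the walk-through it looks only at the IF register.

```python
INT_VECTORS = [0x40, 0x48, 0x50, 0x58, 0x60]  # same values as ints_offs


def highest_priority_request(if_reg):
    """Return the index (0..4) of the highest-priority requested interrupt.

    Bit 0 has the highest priority. Returns None when no request is pending.
    `mask` plays the role of %rcx and `index` the role of the %r10 pointer.
    """
    mask = 0x01
    index = 0
    while True:
        if if_reg & mask:        # request pending at this bit?
            return index
        if mask == 0x10:         # bit 4 was the last one to check
            return None
        mask <<= 1               # move on to the next bit...
        index += 1               # ...and the next handler address


pending = highest_priority_request(0b00010100)   # timer and joypad requested
if pending is not None:
    print(hex(INT_VECTORS[pending]))             # handler of the winner
```

In the example, both the timer (bit 2) and the joypad (bit 4) have requested an interrupt; the timer wins because its bit is less significant.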
The last interrupt bit has been checked
We had no interrupts, so we jump past the interrupt processing routine
We have seen how priority-based interrupts may be implemented in software. Unfortunately, we didn’t actually have an interrupt to process, so we could not also see how an interrupt request is handled. We will cover this part in another post. For now it suffices to know that, while executing any instruction, an interrupt may have been requested, so we need to check every possible source of interruption.
We have studied the way RealBoy does interrupt processing; every possible source of interruption must be checked for requested interrupts. In general, interrupt processing must be done after executing every instruction, and indeed RealBoy works this way; this is due to the unpredictable nature of interrupts. For example, it is not possible to predict when the user is going to input data. In practice, however, it is possible to implement some short-cuts when processing interrupts for the Game Boy. This is possible for two reasons:
- Practically no games make use of the joypad interrupt facility.
- The rest of the interrupt sources are predictable.
As we see, because practically all games disable interrupts from the joypad, it is possible to predict when the next interrupt is going to occur, because the remaining sources of interruption have precise timings and are therefore completely predictable. Many Game Boy emulators exploit this property to predict the time of the next interrupt, improving performance because the emulator skips interrupt processing when it knows that no interrupts are requested. At present RealBoy doesn’t take advantage of this fact, and it processes interrupts after executing every instruction.
IMPLEMENTING TIMING IN REALBOY
Having studied interrupt processing in detail, let’s now proceed with how RealBoy handles timing issues, and how it attempts to maintain the whole system in synchrony.
Let’s recall what we have accomplished up to now:
- We have fetched, decoded and executed one instruction (see Emulating The Core, Part 1: The Fetch-And-Execute Cycle).
- We have processed interrupts; we have checked for interrupt requests that might have occurred during the execution of the last instruction.
As we can infer from the above, executing an instruction implies that some time has passed. Because the Game Boy’s devices have very precise timings, we have to make sure to replicate these timings in some form. When emulating, timing issues are always among the most confusing things to understand, and this reputation is more than justified. We are used to thinking of time in terms of seconds, milliseconds, microseconds, etc. In this respect, it is still true that the hardware has very specific timing. For example, while the LCD display is scanning the VRAM, it continuously changes mode. Mode 0 lasts 48.2 µs (microseconds), mode 2 lasts 19 µs and mode 3 lasts 41 µs. It would be possible to implement this exact timing in software, but in practice such precision is both unnecessary and difficult to implement. What is in fact necessary is to keep a relative timing. For this purpose, we ignore the actual timings (measured in microseconds, milliseconds, etc.) and instead update things in terms of the instructions’ duration in CPU clock cycles. It turns out that each instruction has a predefined duration, and this time can be translated into CPU cycles. For example, the last instruction executed, which was ld SP, 0xFFFE, has a duration of 12 cycles. Because the Game Boy’s CPU is clocked at 4194304 Hz (that is, 4194304 cycles per second), these 12 cycles correspond to 12/4194304 seconds (approximately 2.8 µs). Nevertheless, we ignore this time duration and instead treat ‘the cycle’ as the basic unit for timing; indeed, the CPU cycle is going to be our timing unit. Note that we don’t care how much time it actually took to execute the last instruction; what matters is that it took 12 CPU cycles. Measuring time in cycles will enable us to keep our system up to date and synchronized because, knowing the devices’ real timings, we can deduce their timings in terms of these cycles.
For example, we discussed earlier that the LCD display changes modes while scanning the VRAM. For instance, mode 0 lasts 48.2 µs. This translates to about 201 CPU cycles. Here is how the figure is deduced: a duration in seconds multiplied by the clock frequency gives a number of cycles, and a number of cycles divided by the clock frequency gives a duration. Remember that our last instruction took 12 cycles, which corresponds to 12/4194304 seconds, or about 2.8 µs. In the same way, mode 0 of the LCD lasts 48.2 µs, which corresponds to about 201 cycles (201/4194304 seconds is roughly 48 µs). This way we can deduce all the LCD mode timings in terms of cycles.
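The conversion between real time and cycles is plain arithmetic, sketched here in Python. The helper names are made up for illustration. Note that the results are approximate: the exact value for mode 0 depends on rounding, and it lands within a cycle or two of the figure quoted above.

```python
CPU_HZ = 4194304  # Game Boy CPU clock: cycles per second


def cycles_to_us(cycles):
    """Duration in microseconds of a given number of CPU cycles."""
    return cycles * 1_000_000 / CPU_HZ


def us_to_cycles(microseconds):
    """Number of CPU cycles (possibly fractional) in a duration."""
    return microseconds * CPU_HZ / 1_000_000


print(round(cycles_to_us(12), 2))          # duration of ld SP, 0xFFFE
for mode, us in ((0, 48.2), (2, 19), (3, 41)):
    print(mode, round(us_to_cycles(us)))   # LCD mode durations in cycles
```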
Let’s continue execution to take a better look at what we are saying. We proceed at line 232, so we execute the instruction at that line:
Extract opcode for next instruction
As we can see, immediately after processing interrupts, if no interrupts were requested (as in our case), we extract the opcode for the next instruction to execute from the PC (memory pointed to by %r13); %r12 holds this value in its least significant byte (value 0xAF).
Now, note the label lcd_refrshing. This indicates that we are going to update our LCD display status. Remember that the instruction executed/emulated had a duration of 12 cycles, so the rest of the hardware must be updated accordingly. Unfortunately at this point of execution the LCD display is disabled; it must be enabled explicitly by software every time the Game Boy is powered on. This means that we are going to skip the whole lcd_refrshing routine, which comprises lines 226 to 416; we’ll leave the study of this routine for another post. The LCD display is enabled/disabled through the LCD Control Register at address 0xFF40; this register has the following layout:
FF40 – LCDC – LCD Control (R/W)
Bit 7 - LCD Display Enable (0=Off, 1=On)
Bit 6 - Window Tile Map Display Select (0=9800-9BFF, 1=9C00-9FFF)
Bit 5 - Window Display Enable (0=Off, 1=On)
Bit 4 - BG & Window Tile Data Select (0=8800-97FF, 1=8000-8FFF)
Bit 3 - BG Tile Map Display Select (0=9800-9BFF, 1=9C00-9FFF)
Bit 2 - OBJ (Sprite) Size (0=8x8, 1=8x16)
Bit 1 - OBJ (Sprite) Display Enable (0=Off, 1=On)
Bit 0 - BG Display (for CGB see below) (0=Off, 1=On)
As you can see, the most significant bit (bit 7) is used for turning on and off the display; writing 1 to it will turn the display on, and writing 0 will turn it off. Because the bootstrap ROM hasn’t explicitly written this register, the LCD display is currently turned off (default value), so this bit has a value of 0.
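Testing the enable bit amounts to masking bit 7 of the register. A minimal Python sketch, assuming memory is modelled as a flat byte array (the names below are illustrative, not RealBoy’s):

```python
LCDC_ADDR = 0xFF40      # LCD Control Register
LCDC_ENABLE = 0x80      # bit 7: LCD Display Enable


def lcd_enabled(mem):
    """True when bit 7 of LCDC is set, i.e. the display is turned on."""
    return bool(mem[LCDC_ADDR] & LCDC_ENABLE)


mem = bytearray(0x10000)        # everything reset to 0: display is off
print(lcd_enabled(mem))         # the LCD update can be skipped
mem[LCDC_ADDR] |= LCDC_ENABLE   # what software does to turn the display on
print(lcd_enabled(mem))
```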
Now, the instruction at line 232 checks this bit. This is done every time because, if the display is turned off, we don’t have to update its status, so we skip this routine. Effectively, the jump at line 233 is not taken, and line 234, as noted earlier, takes us out of the lcd_refrshing routine, all the way to line 423.
We skip the whole lcd_refrshing routine; execution continues at line 418.
We now enter the timer_divider_update routine. This routine uses the instruction’s cycle duration to update the timers’ status. A timer in our context is a special register that gets incremented at a defined rate. Generally, the purpose of this kind of hardware timer is to ‘signal the software’ that some time has elapsed, or that a specific event has just completed; there is a lot of flexibility in how hardware timers can be used. This kind of ‘hardware-software communication’ is implemented in the form of an interrupt. The timer specific to the Game Boy is composed of three registers, as follows:
FF05 – TIMA – Timer Counter (R/W): This is the register that actually gets incremented at a precise rate. When it overflows, the value at register 0xFF06 gets copied here.
FF06 – TMA – Timer Modulo (R/W): Holds the value that gets loaded to TIMA when it overflows.
FF07 – TAC – Timer Control (R/W) with the following layout:
Bit 2 - Timer Stop (0=Stop, 1=Start)
Bits 1-0 - Input Clock Select
  00: 4096 Hz (~4194 Hz SGB)
  01: 262144 Hz (~268400 Hz SGB)
  10: 65536 Hz (~67110 Hz SGB)
  11: 16384 Hz (~16780 Hz SGB)
The timer’s logic is as follows: If the timer is started (bit 2 of the TAC), the Timer Counter (TIMA) gets incremented at a rate specified by the Timer Control (TAC). When the Timer Counter gets to 0xFF and is incremented, it effectively overflows, and when this happens, it gets loaded with the value held at the Timer Modulo (TMA). If the timer is stopped, nothing is done. Unfortunately, as with the LCD display, the timer is stopped by default upon power on; it must be explicitly started every time the Game Boy is turned on. We will however cover the case when the timer is started in another post.
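The timer logic described above can be sketched in a few lines of Python. This is a simplified model under our own assumptions, not RealBoy’s timer_divider_update routine: the Timer class and its fields are hypothetical, and the number of CPU cycles per tick is derived by dividing the 4194304 Hz CPU clock by each input clock in the TAC table.

```python
# CPU cycles per TIMA increment for each Input Clock Select value:
# 4194304 / 4096, / 262144, / 65536 and / 16384 respectively.
CYCLES_PER_TICK = {0b00: 1024, 0b01: 16, 0b10: 64, 0b11: 256}


class Timer:
    """Illustrative model of the TIMA/TMA/TAC registers."""

    def __init__(self):
        self.tima = 0      # FF05: Timer Counter
        self.tma = 0       # FF06: Timer Modulo
        self.tac = 0       # FF07: Timer Control (stopped by default)
        self.counter = 0   # cycles accumulated since the last tick

    def step(self, cycles):
        """Account for `cycles` CPU cycles; return how many overflows occurred."""
        if not self.tac & 0x04:          # bit 2 clear: timer is stopped
            return 0
        period = CYCLES_PER_TICK[self.tac & 0x03]
        self.counter += cycles
        overflows = 0
        while self.counter >= period:
            self.counter -= period
            if self.tima == 0xFF:        # incrementing 0xFF overflows...
                self.tima = self.tma     # ...so TIMA is reloaded from TMA
                overflows += 1
            else:
                self.tima += 1
        return overflows
```

A caller would typically request the timer interrupt (bit 2 of the IF register) whenever step reports an overflow.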
Let’s go to the code: lines 423 and 424 are important only in Game Boy Color mode; we can ignore them because our example is for the Game Boy.
The instruction at line 425 checks if the timer is started; we keep this information in the tac_on variable. Because the timer is stopped, the jump at line 426 is performed all the way to line 452.
Timer is stopped (disabled).
Execution continues at line 447.
Finally we get to a point where we can show how all this timing in terms of CPU cycles works. Although, as we just saw, the Game Boy’s internal timer device has to be started explicitly, the Game Boy presents yet another timer that is always on. This timer does not include an interrupt mechanism, and it is accessed through the Divider Register (DIV) mapped to address 0xFF04. Writing any value to this address resets the register to 0. What matters to us right now is that this DIV register is incremented 16384 times per second (16384 Hz), so now we can actually explain how the CPU cycles help us keep accurate timing.
Now, we know that the executed/emulated instruction lasted 12 cycles, and we also know that we have to increment the DIV register 16384 times per second. Recall that the CPU frequency is 4194304 Hz, which means that every second the CPU produces 4194304 cycles. We want to know the amount of cycles required for the DIV register to be incremented; we get that 4194304/16384=256 CPU cycles are required before incrementing the DIV register.
The logic for the DIV timer is simple; it comprises lines 452 to 455:
%r14 holds the number of cycles for the last instruction; div_ctrl is the Divider Control variable.
Recall that register %r14 keeps the value that corresponds to the cycle duration of the last instruction. Also remember that 256 cycles are required before incrementing the DIV register. For this purpose, we keep the variable div_ctrl, which holds the amount of cycles that have passed. Because we have just executed our first instruction, div_ctrl is 0 before adding the first instruction’s cycles.
Add last instruction’s cycles to div_ctrl.
As we see, line 452 adds the number of cycles of the instruction just executed to the current number of cycles stored in div_ctrl. That is, div_ctrl is effectively a control variable that holds the number of cycles that have passed. However, the instruction at line 452 is somewhat special and deserves some comment. Note that the instruction is addb %r14b, div_ctrl. Note the ‘b’ at the end of add, and at the end of %r14. This ‘b’ stands for byte, so the operation does 8-bit byte addition rather than 64-bit addition. This means that the instruction will add the value in the least significant byte of %r14 (in this case the value 12) to the least significant byte of the div_ctrl variable. Remember that we require 256 cycles before incrementing DIV and that a byte can hold an unsigned number up to 255. Because the instruction performs a byte addition, when the least significant byte of div_ctrl would exceed 0xFF (255), the carry is not propagated to the following byte; instead, the carry sets the carry flag of the host CPU, and the least significant byte of div_ctrl wraps around and continues counting from 0. Now, we can take advantage of this feature in the following instruction at line 453. Indeed, the instruction at line 453 is a “jump if the carry flag is not set” instruction. In fact, the instruction performs a jump all the way back to exec_next if the carry flag is not set. But the previous instruction, addb %r14b, div_ctrl, cleared or set the carry flag depending on its result. Because the least significant byte of div_ctrl was 0, and we added 12 to it, no carry was produced, the carry flag was cleared, and the jump is performed.
But if we had, for example, a value of 250 in the least significant byte of div_ctrl, and we added 12 to it, a carry would have been produced, the carry flag would have been set and execution would have continued at line 454, which increments the DIV register, and the new value for the least significant byte of div_ctrl would be (250+12) % 256 = 6. This way we know when the DIV register needs to be advanced.
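The byte-addition trick is easy to model outside of assembly. In this Python sketch (the function name is hypothetical) the carry flag is simulated by checking whether the 8-bit sum exceeded 0xFF; on the host CPU, addb sets the flag for us and the conditional jump tests it.

```python
def add_cycles(div_ctrl, div, cycles):
    """Add an instruction's cycles to the 8-bit div_ctrl counter.

    Returns the new (div_ctrl, div) pair. DIV advances only when the
    8-bit addition carries, that is, once every 256 accumulated cycles.
    """
    total = div_ctrl + cycles
    carry = total > 0xFF            # what the host carry flag would hold
    div_ctrl = total & 0xFF         # the byte wraps around, carry is dropped
    if carry:
        div = (div + 1) & 0xFF      # DIV is itself an 8-bit register
    return div_ctrl, div


print(add_cycles(0, 0, 12))     # (12, 0): no carry, DIV unchanged
print(add_cycles(250, 0, 12))   # (6, 1): carry, DIV incremented
```

The appeal of this design is that no explicit comparison against 256 is needed: the width of the counter is the period.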
Execution continues at exec_next, effectively where we started studying ‘the core’.
Now, ‘the core loop’ beginning at exec_next is executed again, and again, and again… ad nauseam.
We have now studied the actual implementation of interrupt processing and timings, two of the ubiquitous issues concerning every computing system. This is a continuation of our first post on the inner workings of RealBoy, that is, Emulating The Core, Part 1: The Fetch-And-Execute Cycle. In the next post about the core, Emulating The Core, Part 3: Interrupt Processing (A Real-World Example), we will take a look at how interrupts are actually serviced by studying a real-world example; we will show you how this is done for servicing an interrupt in the popular game Pokemon Red.
The ‘core’ is fundamentally a loop that is performed on each instruction executed from the beginning of the emulation, until the user decides to quit the program.