About a month before the deadline of the Neoflash 2010 Summer Retro Coding Compo I was pondering what I should do for an entry. I considered a few possible projects, including some porting prospects, some of which I may take up sometime in the future. But in the end I decided to try to write an NSF player for the PC-Engine (aka TurboGrafx).
Now, for those unfamiliar with NSFs, an NSF (NES Sound Format file) contains music data and code, and is the standard format used for ripping soundtracks from NES/Famicom games. It's also used for newly created music by musicians who work with NES music creation tools such as FamiTracker, NerdTracker II or ppmck.
The code in question is machine code for the NES's CPU (a variant of the 6502, combined with the sound hardware and some other features into the 2A03/2A07). An NSF player will therefore typically emulate all or most of the 2A03 in order to play back the music. This is quite unlike logged formats such as VGM or YM, where the player only needs to emulate the sound chip and very little else.
The idea of writing an NSF player for the PCE came from the fact that the PCE's CPU (part of the HuC6280 circuit) also belongs to the 65xxx family. In fact, its instruction set architecture is a superset of the 2A03's, making it binary compatible with code compiled for the 2A03 - assuming that certain conditions are met, which I will discuss below.
The same idea has already been used in the past for a SNES NSF player developed by Memblers, as the 65C816 core used in the SNES has a 65C02 compatibility mode. The HuC6280 is actually faster than the SNES's 65C816 executing in 8-bit mode, but the SNES has an advantage over the PCE in the amount of RAM available (I'm only talking about advantages germane to writing an NSF player - I'm not comparing the two consoles in general terms).
Four's a Crowd
As I've mentioned, there are some conditions that need to be met in order for the code within an NSF to execute properly on the PCE, most of which are related to memory mapping.
The key memory areas within NES memory space that an NSF's playback code will try to access (besides "ROM") are WRAM at $0000-$07FF and the NES APU registers at $4000-$4017. Some might also use SRAM at $6000-$7FFF.
Still others may try to access the registers of extra sound channels included on more advanced mappers, such as MMC5 or VRC7. Expansion sound isn't supported by my player though - for a variety of reasons.
The PCE's 64 kB logical address space is split into eight pages. Each 8 kB page can be set to map against a given page of ROM, the PCE's 8 kB of work RAM (WRAM), the memory-mapped I/O registers, or the 2 kB of battery-backed RAM (BRAM) if it's available. There are eight paging registers named MPR0 - MPR7, and some special instructions (TAM/TMA) that are used for this purpose.
Page 1 ($2000-$3FFF) is pretty much required to be mapped against WRAM, since zeropage (ZP) instructions on the HuC6280 implicitly access the $2000-$20FF logical region rather than $0000-$00FF. The interrupt vectors are expected to be located at $FFF6-$FFFF, so whatever is mapped into page 7 needs to contain the correct vectors in the final 10 bytes.
Page 0 also needs to be mapped to WRAM so that the NSF's code can access NES WRAM beyond the ZP area. Likewise, page 2 ($4000-$5FFF) is mapped to WRAM in order to pick up the values the NSF's code writes to the NES APU registers. Page 3 ($6000-$7FFF) is also mapped to WRAM so that NSFs which rely on SRAM can play.
Here the problem of the PCE's relatively small amount of RAM becomes apparent. With four pages mapped to the same 8 kB of physical RAM there's a fair risk of overlap - especially between the ZP area and the NES APU register area. Using this memory setup, if the NSF's code wrote to ZP address $0C and also to the NES APU register $400C it would be writing to the same physical address in PCE WRAM. This was a major concern when I started writing the player, so I used the memory viewer in nezplug++ to see how much ZP RAM NSFs typically use. To my satisfaction I found that typical ZP usage was fairly low and that, more importantly, few of the NSFs I checked used any of the ZP RAM in the $00-$17 range.
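The collision is easy to see with a small model of the mapping. The sketch below is Python written purely for illustration (the player itself is HuC6280 code); the names wram_offset, zp_logical and clashes are made up for this example, and it assumes pages 0-3 all point at the one 8 kB block of WRAM as described above.

```python
PAGE_SIZE = 0x2000        # 8 kB per HuC6280 logical page
WRAM_PAGES = {0, 1, 2, 3}  # assumption: the four pages mapped to WRAM above
ZP_BASE = 0x2000          # zero-page instructions resolve to $2000-$20FF


def wram_offset(logical_addr):
    """Return the offset into physical WRAM, or None if the page isn't WRAM."""
    page = logical_addr // PAGE_SIZE
    if page not in WRAM_PAGES:
        return None
    return logical_addr % PAGE_SIZE


def zp_logical(zp_addr):
    """Logical address that a zero-page operand ends up at on the HuC6280."""
    return ZP_BASE + (zp_addr & 0xFF)


def clashes(zp_addr, apu_reg):
    """True if a ZP variable and an APU register share a physical byte."""
    return wram_offset(zp_logical(zp_addr)) == wram_offset(apu_reg)


print(clashes(0x0C, 0x400C))  # True: both land on WRAM offset $000C
print(clashes(0x18, 0x400C))  # False: ZP $18 is outside the register range
```

Every ZP address from $00 to $17 lines up with exactly one of the registers at $4000-$4017, which is why that range is the one to watch.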
Some NSFs do use ZP RAM at $00-$17 though, which will cause them to either play incorrectly or not at all. Some examples that show this problem are Darkman, Goonies II, Gradius, Ninja Gaiden 3 and Rygar. One way to get around some of these address clashes is to map page 2 ($4000-$5FFF) to the 2 kB BRAM instead of WRAM, assuming that BRAM is available. However, BRAM was not used in the version of the player entered in the competition. Another possible solution would be to patch the NSFs in question so that any access to the $00-$17 area would be changed to some other free area in ZP RAM. But locating all the instructions to patch would require a fair amount of work, so it's not a very good option.
Breaking the Bank
NSFs also have the ability to use bankswitching, with eight 4 kB logical banks mapped into the $8000-$FFFF area in NES memory space. The NSF header contains the initial bank setup, which the NSF's code can alter at runtime by writing to a set of banking registers at $5FF8-$5FFF in NES memory space.
Runtime bankswitching is not supported by my player for a couple of reasons. First, the 4 kB bank size differs from the 8 kB page size on the PCE, so there's no way of independently swapping a single bank unless the NSF has been modified in such a way that all the banks are duplicated to form all possible 8 kB bank pairs. Second, I'd have no good way of detecting writes to the bankswitching registers as soon as they occur. If the NSF's code tried to jump to a bank it thought it had just switched in - without giving control back to the player first - it would end up jumping to the wrong place, with a high probability that it would start executing garbage.
The rombuilder application has two purposes, besides bundling all the NSFs with the player: One is to separate the NSF headers from the actual code/data, as they will be stored at different places in the final player ROM. The other is to reorder the physical banks according to the initial bank setting specified in the NSF header. This order is then assumed not to change at runtime (if it does, the player will either crash or output garbage).
The lack of bankswitching support also limits the maximum size for NSFs that my player can handle to 32 kB, i.e. the eight banks specified in the NSF header. If an NSF has fewer than eight banks it will be padded to 32 kB by the rombuilder during the reordering phase.
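The reordering and padding step can be sketched roughly as follows. This is not the actual rombuilder code, just a minimal Python illustration: build_fixed_image is a made-up name, the banks are assumed to have already been split out of the NSF as 4 kB chunks, and zero-filling short or missing banks is my assumption about what "padding" means here.

```python
BANK_SIZE = 0x1000  # 4 kB NSF bank
NUM_SLOTS = 8       # eight slots cover $8000-$FFFF


def build_fixed_image(banks, initial_setup):
    """Lay out the banks in the order given by the initial bank setup.

    banks:         list of bytes objects, one per physical 4 kB bank.
    initial_setup: eight bank numbers, one per slot from $8000 upwards.
    Short banks and bank numbers the NSF doesn't contain are zero-filled
    (an assumption made for this sketch).
    """
    if len(initial_setup) != NUM_SLOTS:
        raise ValueError("expected eight initial bank numbers")
    image = bytearray()
    for bank_no in initial_setup:
        if bank_no < len(banks):
            image += banks[bank_no][:BANK_SIZE].ljust(BANK_SIZE, b"\x00")
        else:
            image += bytes(BANK_SIZE)
    return bytes(image)


# Three dummy banks, each filled with its own bank number.
banks = [bytes([n]) * BANK_SIZE for n in range(3)]
image = build_fixed_image(banks, [2, 0, 1, 0, 0, 0, 0, 0])
print(len(image))  # 32768
print(image[0x0000], image[0x1000], image[0x2000])  # 2 0 1
```

The result is always a flat 32 kB image, so the player can treat every NSF the same way once the order has been frozen.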
Apples and Oranges
The 2A03's sound capabilities include five channels: two square wave channels, one triangle wave channel, one white noise channel, and one delta modulation channel (DMC) capable of playing low-quality samples. I skipped the DMC in my player since I couldn't think of any way of emulating it efficiently on the HuC6280.
Each channel has a length counter which, when enabled, will count down at 120 Hz (for NTSC) until it reaches zero, at which point it will effectively mute the channel.
The square wave channels and the noise channel have envelope generators that can generate a constant volume, a linearly decreasing volume, or a looping linearly decreasing volume (giving an envelope in the shape of a low-frequency sawtooth wave).
The two square wave channels are capable of generating square waves with four different duty cycles (12.5%, 25%, 50% and 75%). An 11-bit counter determines the period of the waveforms. Both channels also have their own sweep unit running at 120 Hz (for NTSC), which can slide the period up or down by a portion of the current period.
The triangle wave channel lacks a real envelope generator. Unless special tricks are used to adjust its volume, it's either fully on or off. Unlike the other channels it has a second counter - called the linear counter - counting down at a higher frequency, which can be used in combination with the length counter.
The noise channel is made up of a 15-bit shift register which is tapped at one of 16 different frequencies, ranging from a few hundred Hz to a few hundred thousand Hz.
The programmable sound generator (PSG) integrated into the HuC6280 provides six channels, each being capable of playing a looped waveform consisting of thirty-two 5-bit samples. The fifth and sixth channels can also be set to output white noise rather than the data from the channel's sample memory, though the frequency range of the noise generator differs a bit from that on the NES.
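Since both chips derive pitch from a clock divider, a 2A03 period value can be turned into a PSG period with a bit of arithmetic. The Python sketch below shows the idea; it is not taken from the player. It assumes the NTSC clocks (about 1.789773 MHz for the 2A03, about 3.579545 MHz for the PSG), a PSG waveform holding exactly one cycle in its 32 samples, and a 12-bit PSG period, and the function names are made up.

```python
NES_CPU_HZ = 1_789_773  # NTSC 2A03 clock (assumed constant)
PSG_HZ = 3_579_545      # HuC6280 PSG clock (assumed constant)
WAVE_LEN = 32           # samples per PSG waveform


def nes_square_hz(timer):
    """Square wave pitch for an 11-bit 2A03 timer value."""
    return NES_CPU_HZ / (16 * (timer + 1))


def nes_triangle_hz(timer):
    """Triangle pitch; its 32-step sequence makes it an octave lower."""
    return NES_CPU_HZ / (32 * (timer + 1))


def psg_period_for(freq_hz):
    """Nearest PSG period for a pitch, clamped to the assumed 12-bit range."""
    period = round(PSG_HZ / (WAVE_LEN * freq_hz))
    return max(1, min(0xFFF, period))


print(psg_period_for(nes_square_hz(253)))    # 254
print(psg_period_for(nes_triangle_hz(126)))  # 254
```

Because the PSG clock is almost exactly twice the 2A03 clock, the conversion collapses to roughly timer + 1 for the square channels and 2 * (timer + 1) for the triangle, which is cheap enough to do on the fly.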
Each channel has its own volume control with 32 levels, compared to the 2A03's 16 levels. There's also one balance register per channel, plus a global balance register. These can be used to control the overall volume and panning. Another difference is that the HuC6280 - like the SN76489 used in several earlier consoles such as the Sega Master System - uses logarithmic volume control, while the 2A03 uses linear volume.
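One way to bridge the linear/logarithmic gap is a 16-entry lookup table. The Python sketch below builds such a table; it is an illustration rather than the table used in the player, and it assumes each of the PSG's 32 volume steps corresponds to about 1.5 dB of attenuation (STEP_DB is that assumption, and linear_to_psg_volume is a made-up name).

```python
import math

STEP_DB = 1.5  # assumed attenuation per PSG volume step


def linear_to_psg_volume(nes_vol):
    """Map a linear 2A03 volume (0-15) to a logarithmic PSG volume (0-31)."""
    if not 0 <= nes_vol <= 15:
        raise ValueError("2A03 volume must be in 0..15")
    if nes_vol == 0:
        return 0  # silence
    # How far below full scale this linear level is, in decibels.
    attenuation_db = -20.0 * math.log10(nes_vol / 15)
    steps = round(attenuation_db / STEP_DB)
    return max(0, 31 - steps)


VOLUME_TABLE = [linear_to_psg_volume(v) for v in range(16)]
print(VOLUME_TABLE[15])  # 31
print(VOLUME_TABLE[1])   # 15
```

Note how the quietest non-zero 2A03 level already sits around the middle of the PSG's range: the lower half of the logarithmic scale is barely needed.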
Five different samples are included in the player: four for the different square wave duty cycles, and one for the triangle wave. Replacing the sample for a given channel on the HuC6280 requires temporarily disabling the channel, loading the 32 bytes of sample data, then re-enabling the channel. This caused me some concern about whether there would be noticeable distortion from repeatedly turning the channel on and off when playing an NSF that frequently changes the duty cycle for one or both of the square waves.
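To give an idea of what such a set of samples could look like, here is a Python sketch that generates five 32-entry tables of 5-bit values. The exact sample data in the player may well differ; the function names, the choice of 31/0 for the square levels, and the doubling of the 2A03's 4-bit triangle steps are all assumptions made for the example.

```python
WAVE_LEN = 32    # samples per PSG waveform
MAX_SAMPLE = 31  # largest 5-bit value


def square_wave(duty):
    """One cycle of a square wave that is high for the given fraction."""
    high = int(WAVE_LEN * duty)
    return [MAX_SAMPLE] * high + [0] * (WAVE_LEN - high)


def triangle_wave():
    """The 2A03's 32-step triangle (15..0, 0..15) scaled from 4 to 5 bits."""
    steps = list(range(15, -1, -1)) + list(range(16))
    return [s * 2 for s in steps]


WAVEFORMS = {
    "square 12.5%": square_wave(0.125),
    "square 25%": square_wave(0.25),
    "square 50%": square_wave(0.5),
    "square 75%": square_wave(0.75),
    "triangle": triangle_wave(),
}

for name, samples in WAVEFORMS.items():
    print(name, samples)
```

With 32 samples per cycle, all four duty cycles fall on whole sample boundaries (4, 8, 16 and 24 high samples), so nothing is lost in the square waves.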
Let's Give It a Shot
Given the uncertainties, I decided to start out small to determine the feasibility of the project. I found a suitable NSF (Duck Tales) which had all its banks in sequential order, since I hadn't yet written the rombuilder at that point.
The first experiment was simply to call the NSF's INIT routine and see if it successfully returned back to the player, or if it would crash because of a conflict in the memory mapping or some other unforeseen problem.
That test went well, so I proceeded to emulate the square wave channels, excluding the envelope generators, length counters and sweep units. I was encouraged when I heard it play something that resembled the familiar Duck Tales theme. It was out of tune and, as I recall, one of the channels was just playing noise because of some typos in my code. Still, it was a success which told me that the project should be doable.
The Real Deal
The first prototype only took a day or two to put together, but the vast majority of the work remained. A considerable amount of time went into writing and tweaking the code for emulating all the sound features of the 2A03 (minus the DMC). The first version had a timer interrupt trigger on the HuC6280 at 60 Hz (or 50 for PAL), at which point it would call the NSF's PLAY routine, then check whether any of the NES APU registers ($4000-$4017) had been modified and use the register values to do the necessary updates on the HuC6280 PSG. When I began adding support for more 2A03 features, like the envelope generators and sweep units, it became clear that I needed to trigger interrupts at a higher rate - 240 Hz to be precise (or 200 for PAL). Every time a timer interrupt occurs (which I will refer to as a step) the linear counter and the envelope generators are updated. The sweep units and length counters are updated every second step, and everything else (duty cycles, periods, etc.) is updated every fourth step - after calling the NSF's PLAY routine.
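The step schedule can be written down as a small Python sketch. It only models which units get serviced on which step; the exact phase (which of the four steps calls PLAY) and the order of work within a step are assumptions, and actions_for_step is a made-up name.

```python
from collections import Counter

STEPS_PER_SECOND = 240  # NTSC; the PAL build would use 200


def actions_for_step(step):
    """List the work done on a given timer interrupt (step)."""
    actions = ["envelopes", "linear counter"]           # every step
    if step % 2 == 1:
        actions += ["sweep units", "length counters"]   # every second step
    if step % 4 == 3:
        # Every fourth step: run the NSF's PLAY routine, then pick up
        # whatever it wrote (duty cycles, periods, etc.).
        actions += ["PLAY routine", "duty cycles and periods"]
    return actions


counts = Counter()
for step in range(STEPS_PER_SECOND):
    counts.update(actions_for_step(step))

print(counts["envelopes"], counts["sweep units"], counts["PLAY routine"])
# 240 120 60
```

One second of steps therefore gives 240 envelope updates, 120 sweep and length counter updates, and 60 calls to PLAY, matching the rates the NSF expects on NTSC.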
Before calling the NSF's PLAY routine, the (PCE) player code, which normally resides in page 7 ($E000-$FFFF), is swapped out and the last 8 kB of the NSF is swapped in. The interrupt handlers therefore needed to be placed in RAM, as there were no available pages to map against anything other than RAM or the NSF while the NSF is executing. An additional bit of page-juggling takes place in the interrupt handlers since they need to access the PCE I/O registers. So the I/O registers are temporarily paged in at $6000-$7FFF, and then back out before the handler returns. As I mentioned before, the interrupt vectors are expected to be found at $FFF6-$FFFF. To ensure that this is the case even while the NSF is executing I made the rombuilder overwrite the last 10 bytes of the NSF with the PCE interrupt vectors, which it reads from the player binary.
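The vector patching amounts to copying ten bytes from one image to another. A minimal Python sketch, assuming both the padded NSF image and the player binary end at $FFFF so that their last ten bytes are the $FFF6-$FFFF region (patch_vectors is a made-up name, not the rombuilder's):

```python
VECTOR_BYTES = 10  # $FFF6-$FFFF


def patch_vectors(nsf_image, player_binary):
    """Return a copy of nsf_image ending with the player's vector table."""
    if len(nsf_image) < VECTOR_BYTES or len(player_binary) < VECTOR_BYTES:
        raise ValueError("both images must hold at least the vector table")
    return nsf_image[:-VECTOR_BYTES] + player_binary[-VECTOR_BYTES:]


# Dummy data: a 32 kB NSF image and an 8 kB player with recognisable vectors.
nsf_image = bytes(0x8000)
player_binary = bytes(0x2000 - VECTOR_BYTES) + bytes(range(1, 11))
patched = patch_vectors(nsf_image, player_binary)
print(len(patched))         # 32768
print(list(patched[-10:]))  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
```

The obvious cost is that any NSF with real code or data in its last ten bytes gets it clobbered.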
Since writes to some of the NES APU registers have side effects, I needed some way of knowing when they had been written to by the NSF's PLAY routine. Just checking if the value had changed since the last iteration wasn't enough since that wouldn't detect multiple writes with the same value. I ended up writing a bogus value to the registers before calling the NSF's PLAY routine - just something that seemed like an unlikely value for a music driver to use - and checking afterwards if the register's value was my bogus value or something else. It's not a bullet-proof solution, but it worked well enough.
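In Python terms the trick looks something like the sketch below. The sentinel value $A5 and all the names are placeholders (I haven't said which bogus value the player uses), and fake_play merely stands in for an NSF's PLAY routine.

```python
APU_BASE = 0x4000
NUM_REGS = 0x18   # $4000-$4017
SENTINEL = 0xA5   # placeholder for the "unlikely" bogus value


def written_registers(regs, play):
    """Pre-fill the register area, run PLAY, and report what changed."""
    for i in range(NUM_REGS):
        regs[i] = SENTINEL
    play(regs)
    return {APU_BASE + i: regs[i]
            for i in range(NUM_REGS) if regs[i] != SENTINEL}


def fake_play(regs):
    """Stand-in for an NSF's PLAY routine."""
    regs[0x00] = 0x3F      # detected
    regs[0x03] = 0x08      # detected
    regs[0x02] = SENTINEL  # writes the bogus value itself: goes unnoticed


regs = bytearray(NUM_REGS)
hits = written_registers(regs, fake_play)
print(sorted(hex(addr) for addr in hits))  # ['0x4000', '0x4003']
```

The write to $4002 in the example shows the weak spot: a driver that happens to write the sentinel value is indistinguishable from one that didn't write at all.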
And with that I give you the final result:
I guess Hudson Does What Nintendo...es. Or something like that.