NeoGeo Pocket Color Sound Programming

As I began looking into the area of sound programming for the NGPC I found that a) No single document covered all the topics I was interested in; b) None of the documents went into sufficient detail on the sound-related topics; and c) Most of the available NGPC emulators did not emulate the system accurately enough to test and debug sound-related code.
This led to a series of experiments that I conducted on an actual NGPC console, in which I tried to fill in some of the blanks and get a more detailed understanding of the sound capabilities of the NGPC.
Much of the information in this document is based upon the results of those experiments, and should therefore be accurate to a sufficiently high degree for most practical purposes. I cannot guarantee, however, that this document is completely free of factual errors.

  d<n>:		Used to denote individual bits. E.g. d7 for bit 7.
  MMIO:		Memory-mapped I/O
  r8:		8-bit general purpose CPU register
  r16:		16-bit general purpose CPU register
  r32:		32-bit general purpose CPU register
  T900:		TLCS-900/H
  uDMA:		microDMA

 Target audience

This document assumes some basic knowledge of the NGPC system and of low-level programming, as well as familiarity with programmable sound generators in general (and the SN76489 in particular).
As such, it might prove useful to someone wanting to write their own NGPC sound driver, an aspiring NGPC emulator author, or someone just curious about the NGPC hardware.


The sound chip found in the NGPC (aka the T6W28) is very similar to the SN76489 found e.g. in the Sega Master System, and knowledge about how the SN76489 works can be directly carried over to the NGPC in most cases. A few differences exist, which will be described throughout this document.
Other, more general topics which are relevant to sound programming, such as interrupt processing and uDMA, are also covered.

 Sound chip access

 Tone channels

 Noise channel

The linear feedback shift register (LFSR) used in the NGP(C) is 15 bits wide. I'm not 100% sure about which bits are being tapped to generate the pseudo-random sequence, as I haven't found a pattern that perfectly predicts the recorded output from a real NGPC. The closest match I've found was when bits 0 and 2 were tapped.

   -> [0|0|0|0|0|0|0|0|0|0|0|0|0|0|1] -> output
  |                            |   |
  |                            |   |
  |                        <--/    |
   \------------------ XOR <------/
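
The shift register in the diagram above can be modelled in C roughly as follows. This is only a sketch of the closest match described above, not a verified description of the hardware: it assumes taps at bits 0 and 2, output taken from bit 0, and the XOR result fed back in at bit 14. The function name lfsr15_step is made up for this example, and the seed is left to the caller.

```c
#include <stdint.h>

/* Advance a 15-bit LFSR by one step and return the bit shifted out.
 * Assumed taps: bits 0 and 2 (closest match only, not confirmed). */
unsigned lfsr15_step(uint16_t *state)
{
    unsigned out = *state & 1u;                         /* bit 0 is the output */
    unsigned feedback = (*state ^ (*state >> 2)) & 1u;  /* bit 0 XOR bit 2 */

    /* Shift towards bit 0 and insert the XOR result at bit 14. */
    *state = (uint16_t)(((*state >> 1) | (feedback << 14)) & 0x7FFFu);
    return out;
}
```

Starting from the state shown in the diagram (only bit 0 set), the first call returns 1 and leaves the register at 0x4000.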


0x00A2 - DACL - DAC Left Output Register
0x00A3 - DACR - DAC Right Output Register

 Shared RAM

The Z80 has 4 kB of RAM at $0000-$0FFF. This memory is also accessible from the T900 side at $7000-$7FFF.
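
When both CPUs need to refer to the same location (e.g. a command buffer), the address mapping above can be expressed as a small helper. This is a minimal sketch; the macro and function names are hypothetical and not taken from any SDK.

```c
#include <stdint.h>

#define Z80_RAM_SIZE  0x1000u  /* 4 kB at $0000-$0FFF on the Z80 side */
#define T900_Z80_RAM  0x7000u  /* start of the same memory on the T900 side */

/* Convert a Z80-side shared RAM address to the T900-side address.
 * Returns 0 if the address lies outside the shared RAM. */
uint32_t z80_to_t900_addr(uint16_t z80_addr)
{
    if (z80_addr >= Z80_RAM_SIZE)
        return 0;
    return T900_Z80_RAM + z80_addr;
}
```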

The T900 appears to be prioritized over the Z80 on the data bus when both processors want to access the shared RAM area at the same time. This is worth keeping in mind, since you can end up with severely degraded performance of your Z80 code if the T900 is accessing shared RAM very frequently. The T900 code is also likely to take a small performance hit in such a situation, as there will be at least some instances when the Z80 has been granted bus access and the T900 has to wait. Other parts of RAM (below $7000) can be accessed by the T900 while the Z80 is running without any such performance impact.

 TLCS-900/H <-> Z80 communication


0x8000 - ICR - K1GE Interrupt Control Register
  Bit	Expl.
  0-5	Unknown
  6	Enable HBlank interrupts
  7	Enable VBlank interrupts

HBlank interrupts are requested before the active display period of a scanline begins, meaning that no HBlank interrupt request will be generated at the end of scanline 151.
HBlank interrupts need to be enabled when using them as a clock source for Timer0.
It's generally a bad idea to disable VBlank interrupts, since e.g. the watchdog timer is typically cleared inside the VBlank interrupt handler.


0x0020 - TRUN - Timer Operation Control Register
  Bit	Expl.
  0	Timer0 on/off (0=off, 1=on)
  1	Timer1 on/off
  2	Timer2 on/off
  3	Timer3 on/off
  4-6	Unknown*
  7	Prescaler on/off**

* None of these bits appear to have any effect on normal interval timing.
** Disabling the prescaler will effectively stop the timer count-up process. The exception to this is when HINT is used as the source for Timer0, as it doesn't rely on the prescaler counter.

0x0024 - TMOD01 - Timer 01 Mode Register
  Bit	Expl.
  0-1	Timer0 clock source (0=HINT, 1=T1, 2=T4, 3=T16)
  2-3	Timer1 clock source (0=Timer0, 1=T1, 2=T16, 3=T256)
  4-5	PWM cycle generation
  6-7	Operation mode (0=two 8-bit timers, 1=one 16-bit timer, 2=PPG 8-bit, 3=PWM 8-bit)

0x0028 - TMOD23 - Timer 23 Mode Register
  Bit	Expl.
  0-1	Timer2 clock source (0=Prohibited*, 1=T1, 2=T4, 3=T16)
  2-3	Timer3 clock source (0=Timer2, 1=T1, 2=T16, 3=T256)
  4-5	PWM cycle generation
  6-7	Operation mode (0=two 8-bit timers, 1=one 16-bit timer, 2=PPG 8-bit, 3=PWM 8-bit)

The available prescalers correspond to the following frequencies, irrespective of the currently set CPU speed:
  T1	  48 kHz
  T4	  12 kHz
  T16	   3 kHz
  T256	187.5 Hz

* There's a document from SNK which mentions in passing that it's possible to set a 384 kHz clock source for Timer2 (which would mean the input clock for the prescaler, aka T0). The 8-bit timer reference manual however only lists T1, T4 and T16 as valid clocks for Timer2, and I haven't found any way of getting Timer2 to run at 384 kHz. It's possible that using T0 was intended to be allowed at first and that this decision was later changed. Or perhaps it was a feature specific to the K1 Sound Simulation card (a piece of development hardware that added NGP(C) sound capabilities to a PC without the need for software emulation).
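
The prescaler frequencies listed above can be used to estimate the rate at which an 8-bit interval timer expires. The sketch below assumes that the rate is simply the clock frequency divided by the value written to the timer's TREG register, which is what the 16000 Hz example at the end of this document implies (T1 with TREG0 = 3). The names are hypothetical, and only TREG values 1-255 are handled.

```c
/* Prescaler output frequencies in Hz, as listed above. */
#define CLK_T1    48000.0
#define CLK_T4    12000.0
#define CLK_T16    3000.0
#define CLK_T256    187.5

/* Estimated expiration rate of an 8-bit interval timer: clock / TREG.
 * Only TREG values 1..255 are handled; returns 0.0 otherwise. */
double timer_rate_hz(double clock_hz, unsigned treg)
{
    if (treg < 1 || treg > 255)
        return 0.0;
    return clock_hz / (double)treg;
}
```

For example, timer_rate_hz(CLK_T1, 3) gives 16000.0.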


Sometimes you want to transfer a small amount of data at regular intervals, e.g. to do raster effects or to play sampled audio. Rather than resorting to timed loops you can make use of the four uDMA channels provided by the NGPC's main CPU package.

Each uDMA channel is controlled through four CPU control registers, plus a set of MMIO registers.

CPU Register DMAS0 - DMA0 Source Address
CPU Register DMAS1 - DMA1 Source Address
CPU Register DMAS2 - DMA2 Source Address
CPU Register DMAS3 - DMA3 Source Address
ldc DMAS0,r32 *
ldc r32,DMAS0 *
  Bit	Expl.
  0-23	Source address

CPU Register DMAD0 - DMA0 Destination Address
CPU Register DMAD1 - DMA1 Destination Address
CPU Register DMAD2 - DMA2 Destination Address
CPU Register DMAD3 - DMA3 Destination Address
ldc DMAD0,r32 *
ldc r32,DMAD0 *
  Bit	Expl.
  0-23	Destination address

CPU Register DMAC0 - DMA0 Transfer Count
CPU Register DMAC1 - DMA1 Transfer Count
CPU Register DMAC2 - DMA2 Transfer Count
CPU Register DMAC3 - DMA3 Transfer Count
ldc DMAC0,r16
ldc r16,DMAC0
  Bit	Expl.
  0-15	Transfer count (A setting of 0 means 65536)
Note that the transfer count isn't in bytes, but in number of transfers. So if each transfer is set to transfer 4 bytes, you'll transfer 4*DMACn bytes in total (assuming that you don't disable the interrupt source triggering the uDMA or otherwise prevent it from running).

CPU Register DMAM0 - DMA0 Transfer Mode
CPU Register DMAM1 - DMA1 Transfer Mode
CPU Register DMAM2 - DMA2 Transfer Mode
CPU Register DMAM3 - DMA3 Transfer Mode
ldc DMAM0,r8
ldc r8,DMAM0
  Bit	Expl.
  0-1	Transfer size (0=one byte, 1=two bytes, 2=four bytes, 3=prohibited)
  2-4	Source/destination address control
  5-7	Should Be Zero
The available address control modes are **:
  0	Increase destination address after each transfer
  1	Decrease destination address after each transfer
  2	Increase source address after each transfer
  3	Decrease source address after each transfer
  4	Fixed source/destination address
  5	Counter mode. No data is transferred, but DMASn is increased by 1 after each transfer
Common for each of these modes is that DMACn is decreased by 1 after each transfer, and if it reaches zero, an interrupt request will be placed.

* The DMAS/DMAD registers are 32-bit, but only the lower 24 bits are used.
** Using an address control mode other than the ones listed will cause the system to shut down.
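
The DMAMn bit layout and the meaning of the transfer count can be summarized in C as follows. This is a sketch with hypothetical names. It only covers the size codes and address control modes listed above, and the byte total does not apply to counter mode, where no data is moved.

```c
#include <stdint.h>

/* Transfer size codes (DMAMn bits 0-1). */
enum { UDMA_SIZE_1 = 0, UDMA_SIZE_2 = 1, UDMA_SIZE_4 = 2 };

/* Address control modes (DMAMn bits 2-4). */
enum {
    UDMA_DST_INC = 0, UDMA_DST_DEC = 1,
    UDMA_SRC_INC = 2, UDMA_SRC_DEC = 3,
    UDMA_FIXED   = 4, UDMA_COUNTER = 5
};

/* Build a DMAMn value; bits 5-7 stay zero. */
uint8_t udma_mode(unsigned size_code, unsigned addr_mode)
{
    return (uint8_t)(((addr_mode & 7u) << 2) | (size_code & 3u));
}

/* Total bytes moved by a full run. A DMACn value of 0 means 65536
 * transfers. Valid for size codes 0..2 only. */
uint32_t udma_total_bytes(uint16_t dmac, unsigned size_code)
{
    uint32_t transfers = dmac ? dmac : 65536u;
    return transfers * (1u << size_code);
}
```

For example, udma_mode(UDMA_SIZE_2, UDMA_SRC_INC) yields 9: two bytes per transfer with the source address increased after each one.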

0x007C - DMA0V - DMA0 Start Vector
0x007D - DMA1V - DMA1 Start Vector
0x007E - DMA2V - DMA2 Start Vector
0x007F - DMA3V - DMA3 Start Vector
  Bit	Expl.
  0-4	Interrupt source that will trigger uDMA transfers on this channel
  5-7	Should Be Zero

Available interrupt source vectors:
  0x0A	RTC Alarm
  0x0B	VBlank
  0x0C	Z80-triggered interrupt
  0x10	Timer0
  0x11	Timer1
  0x12	Timer2
  0x13	Timer3

Transfer Rate
Each transfer takes at least 8 states when the transfer size is one or two bytes, and at least 12 states when the transfer size is four bytes (1 state = 325.5 ns).
Toshiba's documentation states that when either the source or destination is accessed over an 8-bit bus (e.g. cartridge ROM), each transfer will take an additional 2 states per such address (i.e. up to 4 extra states in total). The same penalties apply for use of unaligned addresses on a 16-bit bus.
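
The minimum cost of a transfer, as described above, can be estimated with a small helper. This is a sketch with hypothetical names. The slow_accesses parameter is the number of addresses (source and/or destination) that incur the 2-state penalty, and the result is a lower bound only.

```c
/* Minimum number of states for one uDMA transfer.
 * slow_accesses: how many of the two addresses (source, destination)
 * incur the 2-state penalty (0, 1 or 2). */
unsigned udma_min_states(unsigned bytes_per_transfer, unsigned slow_accesses)
{
    unsigned base = (bytes_per_transfer == 4) ? 12u : 8u;
    if (slow_accesses > 2)
        slow_accesses = 2;
    return base + 2u * slow_accesses;
}

/* Convert states to nanoseconds (1 state = 325.5 ns). */
double states_to_ns(unsigned states)
{
    return states * 325.5;
}
```

A two-byte transfer with no penalties takes at least 8 states, i.e. 2604 ns. At 16000 transfers per second that adds up to roughly 4% of the available CPU time.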

A uDMA transfer will be carried out with the highest possible maskable priority (6), regardless of the priority of the interrupt source that triggered the transfer.
If multiple uDMA channels are triggered at the same time, priority will be given to the channel with the lowest channel number (i.e. DMA0 has the highest priority, DMA3 the lowest).
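
For an emulator, the channel priority rule above amounts to picking the lowest-numbered triggered channel. A minimal sketch, assuming a hypothetical bitmask where bit n is set when DMAn has been triggered:

```c
/* Pick the channel to service from a bitmask of triggered channels
 * (bit n set = DMAn triggered). Returns -1 if none is triggered. */
int udma_next_channel(unsigned pending)
{
    for (int ch = 0; ch < 4; ch++) {
        if (pending & (1u << ch))
            return ch;
    }
    return -1;
}
```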


Code to set up uDMA0/Timer0 to write to the left and right audio DACs at 16000 Hz.

  ; stop all timers
  ldb  (TRUN),0

  ; set up timer0 to generate IRQs at 16000 Hz
  ldb  (TMOD01),1
  ldb  (TREG0),3
  ; flip-flop / double-buffering isn't used here
  ldb  (TFFCR),0
  ldb  (TRDC),0
  ; set the timer0 interrupt level to 0 (disable timer0 interrupts, but
  ; still allow it to trigger uDMA transfers), and the uDMA0 transfer
  ; completion interrupt level to 5.
  ldb  (INTET01),0
  ldb  (INTETC01),5

  ldl  xwa,sampleData
  ldc  DMAS0,xwa
  ldl  xwa,DACL
  ldc  DMAD0,xwa
  ; set each transfer to 2 bytes (i.e. both DACL and DACR are written to),
  ; and increase the source address after each transfer.
  ldb  w,9
  ldc  DMAM0,w
  ; perform 65536 transfers (one per timer0 expiration)
  ldw  wa,0
  ldc  DMAC0,wa
  ; set this uDMA channel to be triggered by timer0 IRQs
  ldb  (DMA0V),0x10

  ; start timer 0
  ldb  (TRUN),0x81

 About this document

This document was written 2012-2013 by Mic. Layout code was borrowed from GBATEK (2001-2007 Martin Korth).

Send your questions/comments to micol972 <curly a> gmail <dot> com