Cx4 notes by ikari_01
https://sd2snes.de

Version: 0.2
 - add clarification on memory mapping
 - point out that the CPU is halted until caching is complete
 - cart bus only claimed on actual bus operations
 - add register $7f48
 - correct pin mapping (74 and 75 were swapped)
Version: 0.1 (initial)
=======================================

These notes add some information about previously unknown/undocumented
aspects of the Capcom Cx4 custom chip. It is NOT a complete documentation of
the Cx4 but adds bits of information missing in existing documentation.
They were compiled while working with the Cx4 and are a bit chaotic, please
ask if anything is unclear.


Hardware:
=========
Pin 74: global memory output enable
        Cx4 still exposes its MMIO and internal RAM etc. to the bus if this is
        high, but no ROM or RAM connected to it. Probably for use with cart
        ROM/RAM connected "alongside" the Cx4 but independent of it.
Pin 75: Map select (0=LoROM; 1=HiROM)


Memory map
==========
LoROM mapping is widely known. Cart RAM is mapped at 70-7f:0000-7fff.

HiROM mapping is a bit botched at least on the MMX2 PCB.
SNES A15 becomes A20 to the ROM, all other address lines are shifted down by
one to close the gap. So the mapping S-CPU -> Cart ROM goes as follows:
  C0:0000 => 0x000000
  C0:8000 => 0x100000
  C1:0000 => 0x008000
  C1:8000 => 0x108000
  C2:0000 => 0x010000
  ...
  DF:7FFF => 0x0FFFFF
  DF:FFFF => 0x1FFFFF
ROM content must be rearranged to match, or rewired.
As a plus, in HiROM mode it is possible to use 32MBits of cart ROM in two
16Mbit chips (by leaving $7f52 at $01), from E0:0000 onward the second ROM
will be selected.
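
The address shuffle above can be written down as a small helper. This is only
a sketch: the function name is made up for illustration, and the chip select
assumes $7f52 = $01 so that SNES A21 picks the second ROM (which is what
"from E0:0000 onward" amounts to).

```python
def cx4_hirom_to_rom(snes_addr):
    """Map a 24-bit S-CPU address in C0-FF:0000-FFFF to (chip, rom_offset).

    Hypothetical helper, not taken from any existing tool.
    chip 0 = first ROM, chip 1 = second ROM (assumes $7f52 = $01).
    """
    if not 0xC00000 <= snes_addr <= 0xFFFFFF:
        raise ValueError("only banks C0-FF are handled by this sketch")
    low = snes_addr & 0x007FFF           # A0-A14 pass straight through
    mid = (snes_addr >> 1) & 0x0F8000    # A16-A20 move down to ROM A15-A19
    top = (snes_addr & 0x008000) << 5    # SNES A15 becomes ROM A20
    chip = (snes_addr >> 21) & 1         # SNES A21 selects the second ROM
    return chip, top | mid | low


# The examples from the list above:
for addr in (0xC00000, 0xC08000, 0xC10000, 0xC18000, 0xDFFFFF, 0xE00000):
    chip, offset = cx4_hirom_to_rom(addr)
    print("%02X:%04X => ROM%d 0x%06X" % (addr >> 16, addr & 0xFFFF, chip + 1, offset))
```

The same function run in reverse gives the order in which a linear ROM image
would have to be rearranged before programming.
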

In HiROM mode ROM is mapped as follows (assuming $7f52 = $01)
  00-3F:8000-FFFF  ROM1 0x100000-0x1FFFFF, ROM2 0x100000-0x1FFFFF
  40-7D:0000-FFFF  NOTHING (open bus with a bit of noise)
  80-BF:8000-FFFF  ROM1 0x100000-0x1FFFFF, ROM2 0x100000-0x1FFFFF
  C0-FF:0000-7FFF  ROM1 0x000000-0x0FFFFF, ROM2 0x000000-0x0FFFFF
  C0-FF:8000-FFFF  ROM1 0x100000-0x1FFFFF, ROM2 0x100000-0x1FFFFF

Cart RAM mapping:
  LoROM: 70-77:0000-7FFF
  HiROM: 30-3F:6000-7FFF, B0-BF:6000-7FFF

MMIO mapping:
  LoROM: 00-3F:6000-7FFF, 80-BF:6000-7FFF
  HiROM: 00-2F:6000-7FFF, 80-AF:6000-7FFF (to make room for Cart RAM)


DMA
===
DMA source and destination CAN reference the same bus but not the same chip,
e.g. cart ROM <-> cart RAM is allowed but cart RAM <-> cart RAM isn't!
Same-bus DMA takes WS1+WS2 extra waitstates per cycle.
DMA from/to internal RAM only takes WS1 or WS2 extra waitstates depending on
the referenced mapping area.
Neither DMA source nor destination may point to unmapped areas (-> lockup).
ROM is disallowed as a DMA destination (-> lockup).


CPU misc. (caching)
===================
Cx4 has two program cache pages. They are the only way it can execute code,
the CPU CANNOT run directly from cart ROM/RAM.
The pages have tags to indicate what program page (from ROM) they currently
contain. The CPU will use these to determine whether a jump across page
boundaries requires re-buffering of one of the cache pages.
If execution reaches the end of cache page 0 and there is no STOP instruction,
page 1 will be buffered (according to contents of P register?) and execution
continues. On end of page 1, execution halts (implied STOP instruction) and
CPU goes idle.
During caching of a program page the CPU is halted until all bytes are copied.
For more details on caching see $7f4c.

For cartridge ROM/RAM access, there are two different configurable waitstate
counts (called WS1+WS2 here; see $7f50 below).
WS1 applies to cart ROM, WS2 applies to cart RAM.
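
The DMA rules and the WS1/WS2 split described above can be condensed into a
small check. This is a rough sketch only: the constants and the function name
are invented for illustration, and the internal <-> internal case is rejected
simply because these notes do not cover it.

```python
# Hypothetical labels for what a DMA address can point at.
ROM, CART_RAM, INTERNAL, UNMAPPED = "rom", "cart_ram", "internal", "unmapped"


def dma_extra_waitstates(src, dst, ws1, ws2):
    """Return the extra waitstates per DMA cycle, or raise ValueError.

    ws1 = ROM waitstates ($7f50 bits 6-4), ws2 = cart RAM waitstates
    ($7f50 bits 2-0). Illustrative model of the rules in the notes.
    """
    if UNMAPPED in (src, dst):
        raise ValueError("unmapped source/destination (-> lockup)")
    if dst == ROM:
        raise ValueError("ROM as DMA destination (-> lockup)")
    if src == dst == CART_RAM:
        raise ValueError("same chip as source and destination is not allowed")
    cart = {ROM: ws1, CART_RAM: ws2}
    if src in cart and dst in cart:
        return ws1 + ws2          # same-bus DMA, e.g. cart ROM -> cart RAM
    if src in cart:
        return cart[src]          # cart bus -> internal RAM
    if dst in cart:
        return cart[dst]          # internal RAM -> cart RAM
    raise ValueError("internal <-> internal is not covered by these notes")


print(dma_extra_waitstates(ROM, CART_RAM, ws1=3, ws2=2))   # same-bus transfer
print(dma_extra_waitstates(ROM, INTERNAL, ws1=3, ws2=2))   # ROM -> internal RAM
```
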


Registers
=========
(for sake of completeness in conjunction with $7f47, $7f40-$7f46 are listed
here.)
$7f40: DMA source low byte
$7f41: DMA source high byte
$7f42: DMA source bank
$7f43: DMA length low byte
$7f44: DMA length high byte
$7f45: DMA destination low byte
$7f46: DMA destination high byte
$7f47: DMA destination bank (!)
       ALSO: Trigger GPDMA (BUS<->internal map)

$7f48: R/W
       Trigger program page caching
       0: Page select (0/1)
       This preloads a cache page with bus data (cart ROM/RAM) pre-set by the
       offset select ($7f49-$7f4b) and program page select ($7f4d-$7f4e)
       registers. The appropriate number of waitstates for the designated
       memory type applies.

$7f4c: R/W
       1: cache page 1 lock (1=locked)
       0: cache page 0 lock (1=locked)
       The cache page lock flags are used to prevent the CPU from buffering to
       the corresponding cache page at runtime (e.g. when the pgm_page
       register is prepared and a JMP/CALL P instruction is executed).
       The cache pages can still be filled by writing to $7f48.
       This is more or less a tuning mechanism for the developer who can
       decide to keep certain code cached at all times.
       Several constellations must be considered when code is executed in one
       of the cache pages:

       no pages locked:
       ================
       If the other page already contains the program page required, the CPU
       will just jump there. Otherwise the other page will be loaded with the
       program page contents from ROM prior to jumping and its tag will be
       updated.

       either of the pages locked:
       ===========================
       The other page cannot be used for buffering so unless it already
       contains the desired program page the same page is used, overwriting
       the code that is currently executed. If a RET occurs, the previous
       program page is swapped back in. This requires reading 512 bytes from
       cartridge ROM every time so it can get very slow.

       both pages locked:
       ==================
       ONLY the program pages that have been pre-cached by writing $7f4d/e
       and $7f48 are available to the CPU.
       If a different program page is requested prior to execution either by
       $7f4d/e -> $7f4f, or by a JMP/CALL P instruction at runtime, execution
       will stop immediately.

$7f50: 7:   -
       6-4: WS1 (ROM read waitstates) (0-7)
       3:   -
       2-0: WS2 (cart RAM read/write waitstates) (0-7)

$7f51: 0: IRQ ack / inhibit (write 1 to ACK and disable further IRQ,
                             write 0 to enable IRQ)

$7f52: ROM configuration select
       LoROM:
         0: 2x 8Mbit (A21 switches between ROM /CE1 and /CE2)
         1: 1x 16Mbit (maybe A22 switches but 40-7f/c0-ff are inactive)
       HiROM:
         0: 2x 8Mbit (A20 switches)
         1: 2x 16Mbit (A21 switches)

$7f53: READ (mirrors: $7f54-$7f57, $7f59, $7f5b-$7f5f)
         7:   CPU is accessing ROM bus (SNES cut off)
         6:   CPU is running
         5-2: -
         1:   IRQ pending
         0:   Cx4 suspended (see $7f55-$7f5d)
       WRITE (no mirrors)
         Any write access returns the Cx4 to idle state immediately - useful
         to recover from infinite loops ;)

$7f55: Any write access indefinitely suspends the Cx4 (registers can be read
       and written but the CPU shows no reaction: no buffering occurs, no code
       is run and $7f53 is not updated). Cx4 status bit 0 is set.
$7f56: Any write: Suspend Cx4 for  32 cycles ( 1.6µs @20MHz)
$7f57: Any write: Suspend Cx4 for  64 cycles ( 3.2µs @20MHz)
$7f58: Any write: Suspend Cx4 for  96 cycles ( 4.8µs @20MHz)
$7f59: Any write: Suspend Cx4 for 128 cycles ( 6.4µs @20MHz)
$7f5a: Any write: Suspend Cx4 for 160 cycles ( 8.0µs @20MHz)
$7f5b: Any write: Suspend Cx4 for 192 cycles ( 9.6µs @20MHz)
$7f5c: Any write: Suspend Cx4 for 224 cycles (11.2µs @20MHz)
       These registers can be used to obtain guaranteed access to ROM/RAM from
       the SNES side while the Cx4 is running. CPU and/or ongoing DMA
       transfers are suspended.
$7f5d: Any write clears the Cx4 suspend flag and the chip becomes responsive
       again (presumably resumes execution).
$7f5e: Any write clears the IRQ pending flag WITHOUT touching the actual cart
       IRQ signal (remains low)

If IRQ is enabled, /IRQ goes high->low when the Cx4 CPU stops, and stays low.
Software must ACK by writing 1 to $7f51 bit 0 -> /IRQ will go high again.
Software must then write 0 again to re-enable IRQ triggering for the next
execution.


CPU registers
=============
$20: PC (PC of current instruction + 1)
$28: ??? (always seems to return $2e)
$2e: cart ROM bus port (triggers cart ROM reads), to be used with $61 opcode!
     Waitstates = $7f50 bits 6-4.
$2f: cart RAM bus port (triggers cart RAM reads/writes), to be used with
     $61 / $e1 opcode! Waitstates = $7f50 bits 2-0.
$70-$7f are mirrors of $60-$6f ("R0-R15").
internal register address appears to be 7-bit, e.g. $e0-$ff are the same as
$60-$7f.


Instruction cycles
==================
1 cycle = 50ns (@20MHz) duh
The vast majority of instructions execute in a single cycle.
Exceptions/noteworthy details are:
- jmp/call takes 1 cycle if branch not taken, 3 cycles if taken
  (regardless of p flag, crossing page boundaries comes at no extra cost)
- ret takes 3 cycles
- skip takes 1 cycle for itself, but it makes the skipped instruction count
  for 1 cycle (injected NOP or equivalent)
- internal data rom/ram access takes 1 cycle only.
- cart ROM access from Cx4 code:
  * Cartridge ROM is accessed by reading from register $2e to a special
    internal register (fullsnes: ext_dta). (Opcode: $612e)
  * The read itself is 0-waitstate and executes in a single cycle. However
    the result will not be valid before the appropriate number of waitstates
    is reached and the data is actually pulled in from the ROM.
  * The CPU may execute other code in the meantime.
  * To stall the CPU until the ROM read operation is complete, a wait
    instruction can be issued ($1c00).
  * The external bus address does not auto-increment; to do so, a special
    instruction can be issued ($4000). There may be a decrement instruction
    as well (as of yet unknown). It is useful to do this before the wait
    instruction to save a cycle.
  * The number of waitstates can be configured by setting $7f50 bits 6-4.
- cart RAM access from Cx4 code:
  * Cartridge RAM is accessed by reading from register $2f to a special
    internal register, or by writing to register $2f from the same.
    (Opcode: $612f (read) / $e12f (write))
  * Access handling appears to be the same as for cart ROM, and the same for
    reading and writing: issue read/write; alter bus address; wait for
    complete (or do something else in the meantime)
  * The number of waitstates can be configured by setting $7f50 bits 2-0.

Cartridge bus is only claimed by the Cx4 when DMA or caching occurs, or a bus
access is carried out by Cx4 code. At all other times the SNES address+data
buses are forwarded to the ROM/RAM even if the Cx4 CPU is running.


Flags
=====
As of yet untouched. higan seems to do a decent job at them already.


scribble / internal notes from testing
======================================
This is probably useless but eh.

Cx4 code base: 01:8000
PC always 00
1 cycle = 50ns
255 NOP + 1 STOP in 12.8us -> 50ns/inst -> 1 cycle/inst
255 JMP + 1 STOP in 38.4us -> 150ns/inst -> 3 cycles/inst
255 MOV -> A/R0 + 1 STOP in 12.8 us -> 50ns/inst -> 1 cycle/inst
255 RDWR ROM/RAM + 1 STOP in 12.8 us -> 50ns/inst -> 1 cycle/inst
255 RDBUS (612E) -> CRASH
127 RDBUS+WAIT + 1 STOP in 32us -> 250ns/pair -> 5 cycles total (WS = 4)
127 RDBUS+WAIT + 1 STOP in 19.2us -> 150ns/pair -> 3 cycles total (WS = 2)
127 RDBUS+WAIT + 1 STOP in 12.8us -> 100ns/pair -> 2 cycles total (WS = 1)
127 RDBUS+WAIT + 1 STOP in 12.8us -> 100ns/pair -> 2 cycles total (WS = 0!)
127 RDBUS+INC + 1 STOP in 12.8us + CRASH -> 100ns/pair -> 2 cycles total (WS = 4!!!!!)
85 RDBUS+INC+WAIT + 1 STOP in 21.4us -> 250ns/triple -> 5 cycles total (WS = 4)
85 WRBUS?!+INC+WAIT + 1 STOP in 21.4us -> 250ns/triple -> 5 cycles total (WS = 4)

Page file                             cycles
============================================
00   cx4_00_nop.bin                   1
01   cx4_08_jmp.bin                   3
02   cx4_64_mova.bin                  1
03   cx4_e0_movr0.bin                 1
04   cx4_70_rdrom.bin                 1
05   cx4_68_rdram_r0.bin              1
06   cx4_6c_rdram_imm.bin             1
07   cx4_e8_wrram_r0.bin              1
08   cx4_ec_wrram_imm.bin             1
09   cx4_40_rdbus_wait.bin            ?!
0a   cx4_40_rdbus_nowait.bin          1
0b   cx4_24_skip_unknown.bin          1
0c   cx4_25_skip_nc.bin               1
0d   cx4_25_skip_c.bin                1
0e   cx4_28_call.bin                  3
0f   cx4_81_add_shl1.bin              1
10   cx4_61_bustest.bin               - (1)
11   cx4_61_bustest_1c.bin            1+WS (1 cycle 612e; ->WS cycles 1c00)
12   cx4_61_bustest_40.bin            2+crash (1 cycle 612e; 1 cycle 4000)
13   cx4_61_bustest_401c.bin          2+(WS-1) (1 cycle 612e; 1 cycle 4000; ->WS cycles 1c00)
14   cx4_e0_bustest_401c.bin          1
15   cx4_e1_bustest_401c.bin          1
16   cx4_e12f_bustest_401c.bin        2+WS2-1 !!!!
17   cx4_612f_bustest_401c.bin        2+WS2-1 !!!!
18   cx4_e02f_bustest_401c.bin        3 (doesn't work!)
19   cx4_0a_jmp_p1.bin                3 (P test not applicable - see 1d+1e)
1a   cx4_0a_jmp_p2.bin                3 (P test not applicable - see 1d+1e)
1b   cx4_0c_jz.bin                    1 (not taken), 3 (taken)
1c   cx4_10_jc.bin                    1 (not taken), 3 (taken)
1d   cx4_3c_ret.bin                   3
1e   cx4_2a_call_p.bin                1 (not taken), 3 (taken)
1f   cx4_25_skiptest_nc.bin           1 (not taken), 2 (taken)
20   cx4_25_skiptest_c.bin            1 (not taken), 2 (taken)
21   cx4_25_skiptest_nc_alljmp.bin
22   cx4_25_skiptest_c_alljmp.bin
23   cx4_xx_infloop.bin               -
24   cx4_xx_busloop.bin               (for 0x80 flag testing)
25   cx4_xx_dumpflags.bin             (hopefully)
26   cx4_f0_xchg.bin                  1
27   cx4_xx_opdump.bin                -
28   cx4_xx_flagloop.bin              -
29   cx4_xx_opdump2.bin               -
2a   cx4_xx_opdump3.bin               -

opdump:
PC  regs
---------------
00  04,05,06,07,09,0a,0b,0d,0e,0f,10,11,12,14,15,16
28  17,18,19,1a,1b,1d,1e,1f,20,21,22,23,24,25,26,27
50  28,29,2a,2b,2c,2d,30,31,32,33,34,35,36,37,38,39
78  3a,3b,3c,3d,3e,3f,40,41,42,43,44,45,46,47,48,49
a0  4a,4b,4c,4d,4e,4f,70,71,72,73,74,75,76,77,78,79
c8  7a,7b,7c,7d,7e,7f,80,81,82,83,84,85,86,87,88,89
f0  8a,8b,8c,8d,8e,8f,90

opdump2:
PC  regs
---------------
00  91,92,93,94,95,96,97,98,99,9a,9b,9c,9d,9e,9f,a0
28  a1,a2,a3,a4,a5,a6,a7,a8,a9,aa,ab,ac,ad,ae,af,b0
50  b1,b2,b3,b4,b5,b6,b7,b8,b9,ba,bb,bc,bd,be,bf,c0
78  c1,c2,c3,c4,c5,c6,c7,c8,c9,ca,cb,cc,cd,ce,cf,d0
a0  d1,d2,d3,d4,d5,d6,d7,d8,d9,da,db,dc,dd,de,df,e0
c8  e1,e2,e3,e4,e5,e6,e7,e8,e9,ea,eb,ec,ed,ee,ef,f0
f0  f1,f2,f3,f4,f5,f6,f7

opdump3:
PC  regs
---------------
00  f8,f9,fa,fb,fc,fd,fe,ff
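

For cross-checking the timing figures from the "Instruction cycles" section
and the scribble measurements, a tiny cycle calculator is enough. This is a
sketch with made-up names; the only inputs are the 50ns cycle and the
"1+WS, but never below 2" behaviour read off the RDBUS+WAIT measurements.

```python
CYCLE_NS = 50  # 1 cycle = 50ns at 20MHz, as stated in the notes


def cycles_to_us(cycles):
    """Convert Cx4 cycles to microseconds."""
    return cycles * CYCLE_NS / 1000


def rdbus_wait_cycles(ws):
    """Cycles for one RDBUS ($612e) + WAIT ($1c00) pair.

    Modelled as 1 + WS with a floor of 2, because the measurements show
    2 cycles for both WS = 1 and WS = 0. Illustrative only.
    """
    return max(2, 1 + ws)


# 255 NOP + 1 STOP, one cycle each
print(cycles_to_us(256))
# 127 RDBUS+WAIT pairs + 1 STOP, counted as 128 pairs like the notes do
for ws in (4, 2, 1, 0):
    print(ws, rdbus_wait_cycles(ws), cycles_to_us(128 * rdbus_wait_cycles(ws)))
```
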