Even before CPS1.5, I already had evidence of intersymbol interference in the memory. When I adjusted the SDRAM clock phase, I could see that the valid range was much wider for 32MB devices than for 128MB ones. It was also much wider if I held the data bus low when not accessing it. Both observations point to some kind of interference.
Then two cores turned out to be prone to ROM test failures after loading: Contra and Double Dragon 1. You may have seen the ROM BAD message on them. Yet all the cores used the same memory controller at the same speed, and all of them worked fine on other platforms.
I had designed everything at 48MHz to avoid dealing with high frequencies. But CPS1 needed more speed, so at some point I moved to 96MHz, only to find that it did not work for too many people. Holding the bus low, as described earlier, and finding the right clock shift for several modules led to a solution that worked for most people. Still, a given memory module could produce a ROM BAD error in Ghouls'n Ghosts, and of course you would not know it unless you booted that game.
The CPS1.5 case:
CPS1.5 raises the total memory traffic to about 12MB/s (compared to 10MB/s for CPS1), so I decided it was time to improve the SDRAM controller performance. I designed a new controller using bank interleaving. I then found that the lack of dedicated DQ mask lines was going to hurt performance too. Compared with the old SDRAM modules (or MiST), the newer modules, which short the DQM lines to address lines, trade performance for a reduced pin count. This has an electrical implication too. Still, everything worked well until I released it. Then 20% of users were not happy. Neither was I.
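A rough cycle-count model shows why bank interleaving pays off. This is a sketch, not the controller described here: the timing values (2 cycles each for tRCD, CAS latency, burst length and tRP) are assumed placeholders, and the model ignores refresh, tRRD and tRAS. With a single bank, each random read must activate a row, read and precharge before the next one can start; with several banks those phases overlap, and throughput becomes limited by the shared command and data buses instead.

```python
from dataclasses import dataclass


@dataclass
class SdramTiming:
    # All values in clock cycles; assumed placeholders, not datasheet numbers.
    t_rcd: int = 2        # ACTIVE to READ delay
    cas_latency: int = 2  # READ to first data
    burst_len: int = 2    # data words per access
    t_rp: int = 2         # precharge time


def cycles_per_access(t: SdramTiming, banks_in_use: int) -> float:
    """Steady-state cycles per random read in an idealized model."""
    # Conservative time one bank stays busy: activate, read, precharge.
    bank_busy = t.t_rcd + t.cas_latency + t.burst_len + t.t_rp
    # Each access needs two command slots (ACTIVE + READ with auto-precharge)
    # on the shared command bus and burst_len cycles on the shared data bus.
    command_bus = 2
    data_bus = t.burst_len
    # With N banks, the per-bank busy time overlaps across N accesses.
    return max(command_bus, data_bus, bank_busy / banks_in_use)


timing = SdramTiming()
for banks in (1, 2, 4):
    print(f"{banks} bank(s): {cycles_per_access(timing, banks):.1f} cycles per access")
```

Under these assumptions, spreading accesses over four banks cuts the cost of a random read from 8 cycles to 2, which is the kind of headroom a busier core can use.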
The games would consistently hang at the same point. However, that point was different for each user. This consistency for a given user setup pointed to an intersymbol interference problem.
Note that the core worked with 32MB modules and also on the MiST and SiDi platforms. Plus, I had extensive simulations, which included all SDRAM timing constraints, proving that the design was sound. So I suspected an electrical problem. I put my analogue IC engineer hat back on, which I normally leave at work, and started examining the modules.
There are four electrical problems with the modules.
1. VDD ripple is completely out of spec
2. Too much ringing in signals, particularly on FPGA output pins
3. Signal coupling
4. Clock bandwidth
VDD swings between 2.6V and 4.0V on all modules, including the 32MB ones, and values as extreme as 4.2V or 2.4V are not rare. Note that the SDRAM spec requires VDD to stay between 3.0V and 3.6V.
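Checking those readings against the window quoted above shows how far out they are: the observed swing is about three times the whole allowed range. A minimal sketch; the readings come from the measurements above and `check_vdd` is just an illustrative helper name.

```python
# Supply window quoted from the SDRAM spec.
VDD_MIN = 3.0
VDD_MAX = 3.6


def check_vdd(volts: float) -> str:
    """Classify one VDD reading against the 3.0V to 3.6V window."""
    if volts < VDD_MIN:
        return f"{volts:.1f}V: {VDD_MIN - volts:.1f}V below spec"
    if volts > VDD_MAX:
        return f"{volts:.1f}V: {volts - VDD_MAX:.1f}V above spec"
    return f"{volts:.1f}V: within spec"


# Typical and extreme values observed on the modules.
for reading in (2.4, 2.6, 4.0, 4.2):
    print(check_vdd(reading))

# Peak-to-peak comparison: allowed window versus observed swing.
print(f"allowed: {VDD_MAX - VDD_MIN:.1f}V p-p, observed: {4.2 - 2.4:.1f}V p-p")
```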
128MB modules make VDD ripple worse because they draw four times the power. Higher clock frequencies and multibank access also increase the switching current drawn through VDD, so ripple is higher in those cases too.
FPGA outputs that go to the module show large ringing that can last for 5ns, half the period of a 100MHz clock.
Ringing gets worse with 128MB modules because the load capacitance seen by the FPGA is four times larger. The FPGA output pads must then drive more current at once to charge the net, and that extra current flowing through the line and connector inductance creates the ugly ringing.
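A lumped, lossless LC model illustrates that scaling. The ringing frequency is f = 1/(2π√(LC)) and the characteristic impedance is Z0 = √(L/C). The 10nH and 10pF values below are hypothetical placeholders, not measurements; only the ratio between the two loads matters here.

```python
import math


def lc_ringing(l_henry: float, c_farad: float) -> tuple[float, float]:
    """Lossless series LC: return (ringing frequency in Hz, Z0 in ohms)."""
    f_ring = 1.0 / (2.0 * math.pi * math.sqrt(l_henry * c_farad))
    z0 = math.sqrt(l_henry / c_farad)  # lower Z0: more current per volt of swing
    return f_ring, z0


# Hypothetical values: 10nH of line + connector inductance, 10pF of load for
# a small module and four times that for a 128MB module.
L_NET = 10e-9
for label, c_load in (("1x load", 10e-12), ("4x load", 40e-12)):
    f, z0 = lc_ringing(L_NET, c_load)
    print(f"{label}: rings at about {f / 1e6:.0f} MHz, Z0 = {z0:.1f} ohm")
```

Quadrupling C halves both the ringing frequency and Z0, so in this idealization the driver has to source twice the current for the same voltage step, which is consistent with the explanation above.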
This effect can be reduced somewhat by setting the slew rate of the FPGA outputs to the minimum. Signal transition times go from 3ns to 4ns when the slew assignment is set to zero in Quartus, and the ringing amplitude halves.
Note that lines A12 and A11 are especially heavily loaded on modern modules because of a design change: at some point, Sorg decided to short A12 and A11 to DQMH and DQML to save two pins. For most cores this does not affect performance, but it does increase the capacitance on those FPGA pins, so A12 and A11 ring more than the other lines.
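To make the pin sharing concrete, here is a hypothetical sketch of how a controller could compose the address bus for a WRITE on such a module. It assumes A12 drives DQMH and A11 drives DQML (the order in which they are listed above), a 9-bit column address on A0-A8 and A10 as the auto-precharge flag, as on common x16 SDR SDRAM parts. Because the byte mask now travels on the address bus, any command that places a row address on A11/A12, such as ACTIVE, drives the DQM inputs as well.

```python
# Hypothetical wiring assumed: A12 -> DQMH, A11 -> DQML.
A10_AUTO_PRECHARGE = 1 << 10
A11_DQML = 1 << 11
A12_DQMH = 1 << 12
COLUMN_MASK = 0x1FF  # assumed 9-bit column address on A0-A8


def write_address(column: int, write_low: bool, write_high: bool,
                  auto_precharge: bool = False) -> int:
    """Address-bus value for a WRITE; a high DQM bit masks (skips) that byte."""
    addr = column & COLUMN_MASK
    if auto_precharge:
        addr |= A10_AUTO_PRECHARGE
    if not write_low:
        addr |= A11_DQML  # mask the low byte (DQ0-DQ7)
    if not write_high:
        addr |= A12_DQMH  # mask the high byte (DQ8-DQ15)
    return addr


def dqm_during_active(row: int) -> tuple[bool, bool]:
    """(DQML, DQMH) levels implied by a row address placed on the bus at ACTIVE."""
    return bool(row & A11_DQML), bool(row & A12_DQMH)


# Write only the high byte at column 0x05: the low byte is masked through A11.
print(hex(write_address(0x05, write_low=False, write_high=True)))
# A row address with bit 12 set also raises DQMH while ACTIVE is on the bus.
print(dqm_during_active(0x1000))
```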
As stated above, when certain combinations of values appear on the bus in a given sequence, there is a chance of accessing the wrong data.
To give the signals more time to settle and avoid interference, I tried changing the bus signals at different steps of the SDRAM access cycle. For instance, the FPGA can set the DQ signals at the RAS step, so that the DQ bus is ready ahead of the CAS step. However, the signals that change at RAS and at CAS are almost the same (the A bus), only with different values. Setting the DQ bus at RAS produced a consistent error in the Contra ROM load. Only one module (a 32MB one) accepted it; all the others failed. When I moved the DQ setting to the CAS step, four modules worked correctly.
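One way to look for the risky sequences mentioned above is to count how many lines switch at once between consecutive bus words, since simultaneous switching both loads VDD and increases coupling between traces. This is a hypothetical analysis helper, not part of the core; the 13-bit width matches an A bus of A0-A12, and the example row and column values are made up.

```python
def toggles(prev: int, curr: int, width: int) -> int:
    """Number of bus lines that change state between two consecutive words."""
    mask = (1 << width) - 1
    return bin((prev ^ curr) & mask).count("1")


def worst_transitions(words: list[int], width: int, threshold: int):
    """Yield (index, toggle count) for transitions at or above the threshold."""
    for i in range(1, len(words)):
        n = toggles(words[i - 1], words[i], width)
        if n >= threshold:
            yield i, n


# Made-up A-bus sequence alternating row addresses (RAS) and columns (CAS).
a_bus = [0x0000, 0x1FFF, 0x0155, 0x1EAA]
for index, count in worst_transitions(a_bus, width=13, threshold=10):
    print(f"transition {index}: {count} of 13 lines toggle")
```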
Finally, with the slowest slew rate on all output signals, all modules worked regardless of whether DQ was set at RAS or CAS. The following table summarizes the results (NG = failed). Modules 2 and 3 are 32MB; the rest are 128MB.
Module | DQ at RAS, fast slew | DQ at RAS, slow slew | DQ at CAS, fast slew | DQ at CAS, slow slew
-------|------|------|------|------
1 | NG | OK | OK | OK
2 | NG | OK | NG | OK
3 | OK | OK | NG | OK
4 | NG | OK | OK | OK
7 | NG | OK | NG | OK
8 | NG | OK | OK | OK
9 | NG | OK | OK | OK
Fails | 6 | 0 | 3 | 0
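The Fails row can be cross-checked by encoding the table as data. A small sketch; the dictionary layout and column labels are just one way to represent the results.

```python
# Results from the table: module -> (RAS fast, RAS slow, CAS fast, CAS slow).
RESULTS = {
    1: ("NG", "OK", "OK", "OK"),
    2: ("NG", "OK", "NG", "OK"),
    3: ("OK", "OK", "NG", "OK"),
    4: ("NG", "OK", "OK", "OK"),
    7: ("NG", "OK", "NG", "OK"),
    8: ("NG", "OK", "OK", "OK"),
    9: ("NG", "OK", "OK", "OK"),
}
COLUMNS = ("DQ at RAS, fast", "DQ at RAS, slow", "DQ at CAS, fast", "DQ at CAS, slow")


def failures(column: int) -> list[int]:
    """Modules that failed (NG) under the given test condition."""
    return [module for module, row in RESULTS.items() if row[column] == "NG"]


for col, name in enumerate(COLUMNS):
    failed = failures(col)
    print(f"{name}: {len(failed)} fails {failed}")
```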
Clock measurements at 48MHz show overshoot to 3.8V and undershoot to -1V. Setting the slew rate to the minimum improves the clock shape (3.78V maximum and -0.66V minimum).
At 96MHz, the situation is worse. With the fastest slew rate, the clock is just a sinewave, and reducing the slew rate only makes the sinewave amplitude smaller. My oscilloscope has a 100MHz bandwidth, so it might be filtering out the higher harmonics. But just by looking at the edges of the 48MHz clock (3ns each, with peaking at both ends), it makes sense that the waveform at 96MHz would look like a sinewave. Note that the sinewave is not bounded by the 0V to VDD range: it goes all the way down to -0.8V.
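The arithmetic behind that expectation is simple: each half period of the clock contains one edge, so the time left for a flat level is the half period minus the transition time. A sketch using the 3ns (fast slew) and 4ns (slowest slew) transition times measured earlier; it ignores ringing. The 0.35/bandwidth rise-time figure is a common rule of thumb for oscilloscopes, not a measurement.

```python
def flat_time_ns(f_clk_hz: float, edge_ns: float) -> float:
    """Time per half period left after one transition (negative: never settles)."""
    half_period_ns = 0.5e9 / f_clk_hz
    return half_period_ns - edge_ns


for f_mhz in (48, 96):
    for edge in (3.0, 4.0):
        flat = flat_time_ns(f_mhz * 1e6, edge)
        print(f"{f_mhz} MHz, {edge:.0f}ns edges: {flat:.1f}ns flat per half period")

# Rule-of-thumb rise time of a 100MHz scope: about 0.35 / bandwidth.
scope_rise_ns = 0.35 / 100e6 * 1e9
print(f"scope rise time ~{scope_rise_ns:.1f}ns")
```

At 96MHz only about 1 to 2ns of each 5.2ns half period is flat, and the ringing of up to 5ns measured earlier is nearly as long as the whole half period. The scope's own rise time of roughly 3.5ns is also comparable to the 3ns edges, so the bandwidth caveat is a fair one.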
I think operation at 96MHz or above should be discouraged, even if it works in some cases or for some people. The data shows that it is not reliable, and it will only lead to user complaints.
Operation at 48MHz may not be completely reliable either, although it does improve when the slew rate is set to the minimum.
For module builders:
-Decoupling for modules should be fixed to comply with the spec
-Independent lines for DQMH and DQML should be provided to avoid excessive load on A12 and A11
-Series resistors on the FPGA outputs may help reduce ringing and ensure reliable operation
-Trace coupling should be minimised
-Trace length should be matched
-Avoid using more than 2 SDRAM chips per module
-64MB modules made of a single chip are more reliable than 128MB modules
-Integrate the memory modules with the I/O board to share more supply pins
-Consider a dual SDRAM solution using both connectors but a single board, again to share more supply
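For the series-resistor suggestion, a lumped series RLC model gives a first estimate of how much resistance damps the ringing: the damping ratio is ζ = (R/2)·√(C/L), so critical damping needs R = 2·√(L/C). The inductance and capacitance below are hypothetical placeholders, and R is the total series resistance, including the FPGA driver's own output resistance; real values would have to come from measurement.

```python
import math


def damping_ratio(r_ohm: float, l_henry: float, c_farad: float) -> float:
    """Damping ratio of a series RLC: zeta = (R / 2) * sqrt(C / L)."""
    return 0.5 * r_ohm * math.sqrt(c_farad / l_henry)


def critical_resistance(l_henry: float, c_farad: float) -> float:
    """Total series resistance that gives zeta = 1 (no overshoot)."""
    return 2.0 * math.sqrt(l_henry / c_farad)


L_NET = 10e-9  # hypothetical line + connector inductance
for label, c_load in (("1x load", 10e-12), ("4x load", 40e-12)):
    r_crit = critical_resistance(L_NET, c_load)
    print(f"{label}: critical damping at R = {r_crit:.1f} ohm total")
    print(f"  zeta with 22 ohm total: {damping_ratio(22.0, L_NET, c_load):.2f}")
```

In this model the heavier 128MB load needs less resistance to reach critical damping, but any added resistance also slows the edges (time constant R·C), which eats into timing margin at higher clock rates.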
I am attaching several measurements.