SDRAM Reliability

Posted: Fri Dec 25, 2020 10:44 am
by jotego
As you know, the CPS1.5 does not work for 1 out of 5 users of 128MB modules. This has been the final fight with the SDRAM on MiSTer. I am honestly tired of dealing with it. Note that I have not experienced these problems with the old MiST, which doesn't have an ideal design either but at least everything is on the same board.

Some history:

Before CPS1.5, I already had evidence of intersymbol interference going on in the memory. When I adjusted the SDRAM clock phase, I could see that the valid range was much wider for 32MB devices than for 128MB ones. But also, it was much wider if I held the data bus down when not accessing it. That could be explained as some interference.

Then two cores were prone to ROM test failures after loading: Contra and Double Dragon 1. You have seen the ROM BAD message on those. But they all had the same memory controller, at the same speed. And they all worked fine on other platforms.

I had designed everything at 48MHz to avoid dealing with high frequencies. But for CPS1 I needed more speed, so I moved to 96MHz at some point. Only to find that it didn't work for too many people. Holding the bus down, as described earlier, and looking for the right clock shift for several modules brought a solution that worked for most people. Still, you could have a memory that produced a ROM BAD error in Ghouls'n Ghosts. You wouldn't know it unless you booted that game, of course.

The CPS1.5 case:

CPS1.5 increases memory access to some 12MB/s in total (compared to 10MB/s of CPS1). I decided it was time to increase the SDRAM controller performance. I designed a new controller using bank interleaving. I found then that dropping the DQ mask lines was going to hit performance too. Compared with the old SDRAM modules (or MiST), the newer modules that have DQM and A lines shorted had made a compromise losing performance for a reduced pin count. This has an electrical implication too. Still, everything worked well until I released it. Then, 20% of users were not happy. Neither was I.

The games would consistently hang at the same point in time. However, that point was different for each user. The consistency for a given user setup pointed to an intersymbol interference problem.

Note that the core worked for 32MB modules and also on MiST and SiDi platforms. Plus, I had extensive simulations -that include all SDRAM timing constraints- proving that the design was sound. So I thought this would be an electrical problem. I put back my analogue IC engineer hat, which I normally leave at work, and started looking at the module.

The findings:

There are four electrical problems with the modules.

1. VDD ripple is completely out of spec
2. Too much ringing in signals, particularly on FPGA output pins
3. Signal coupling
4. Clock bandwidth

VDD ripple

VDD is between 2.6 and 4.0V in all modules, including the 32MB ones. It is not rare to see values as extreme as 4.2 or 2.4V. Note that the SDRAM spec states that VDD should be between 3.0V and 3.6V.
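
To put numbers on how far outside the window these readings fall, here is a minimal sketch. The limits and the measured values are the ones quoted above; the function name is only illustrative.

```python
# Supply limits quoted above for the SDRAM, in volts.
VDD_MIN = 3.0
VDD_MAX = 3.6

def vdd_excursion(v_low, v_high, v_min=VDD_MIN, v_max=VDD_MAX):
    """Return (undershoot, overshoot) in volts relative to the spec window."""
    undershoot = max(0.0, v_min - v_low)
    overshoot = max(0.0, v_high - v_max)
    return round(undershoot, 3), round(overshoot, 3)

typical = vdd_excursion(2.6, 4.0)  # range seen on all modules
extreme = vdd_excursion(2.4, 4.2)  # worst values observed
print(typical, extreme)
```

In both cases the excursion is symmetric around the window: about 0.4V on each side for the typical range and about 0.6V for the extreme readings.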

VDD ripple is made worse by 128MB modules because they draw 4x the power. A higher clock frequency or multibank access also increases the amount of current switching through VDD, so VDD ripple is higher in those cases.

Ringing

FPGA outputs that go to the module show large ringing that can last for 5ns, half the period of a 100MHz clock.
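
The arithmetic behind that comparison, as a small sketch. The 5ns figure is the one measured above; the helper names are made up for illustration.

```python
def clock_period_ns(freq_mhz):
    """Clock period in nanoseconds for a frequency given in MHz."""
    return 1000.0 / freq_mhz

def ringing_fraction(ringing_ns, freq_mhz):
    """Fraction of one clock period during which the line is still ringing."""
    return ringing_ns / clock_period_ns(freq_mhz)

for f in (48, 96, 100):
    print(f, round(clock_period_ns(f), 2), round(ringing_fraction(5.0, f), 2))
```

At 48MHz the ringing occupies roughly a quarter of the period, while at 96MHz it takes close to half of it, which leaves much less time for the signals to settle before they are sampled.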

Ringing gets worse for 128MB modules because the load capacitance to the FPGA is 4x larger. The FPGA output pads will produce more current to charge the net simultaneously, and the extra current across the line and connector inductance creates the ugly ringing.

This effect can be slightly improved by setting the slew rate of the FPGA outputs to the minimum. Signal transition times go from 3ns to 4ns when setting the slew assignment to zero in Quartus. Ringing amplitude halves in that case.
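
As a rough first-order picture of why load and edge speed matter, the average current needed to charge a capacitive load is I = C * dV / dt. The sketch below only uses that relation. The per-chip capacitance and the 3.3V swing are placeholder assumptions, not measurements; only the ratios are meaningful.

```python
# Hypothetical values, chosen only to illustrate the ratios.
C_CHIP_PF = 5.0   # assumed input capacitance of one SDRAM chip, in pF
SWING_V = 3.3     # assumed logic swing, in volts

def charge_current_ma(c_pf, dv_v, dt_ns):
    """Average current in mA to swing c_pf picofarads by dv_v volts in dt_ns ns."""
    return c_pf * dv_v / dt_ns  # pF * V / ns works out to mA

one_load_fast = charge_current_ma(C_CHIP_PF, SWING_V, 3.0)
four_loads_fast = charge_current_ma(4 * C_CHIP_PF, SWING_V, 3.0)
four_loads_slow = charge_current_ma(4 * C_CHIP_PF, SWING_V, 4.0)
print(one_load_fast, four_loads_fast, four_loads_slow)
```

With a 4x load the pad has to source 4x the current for the same edge, and stretching the edge from 3ns to 4ns cuts that current by a quarter. This simple model does not predict the ringing amplitude itself, which also depends on the line and connector inductance.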

Note that lines A12 and A11 are especially loaded in modern modules because of a design change. At some point, Sorg decided to short A12 and A11 to DQMH and DQML to save two pins. For most cores, this does not impact performance, but it does increase the capacitance on the FPGA pins. Thus lines A12 and A11 have more ringing than the rest.

Signal Coupling

As stated above, when some combinations of values occur in the bus in a given sequence, there is a chance to access wrong data.

To give signals more time to settle and avoid interference, I tried changing the bus signals at different SDRAM access cycle steps. For instance, the FPGA can set DQ signals at the RAS step, and leave the DQ bus ready ahead of the CAS step. However, the signals that change at RAS and CAS are almost the same (the A bus) but with different values. I found a consistent error in the Contra ROM load when setting the DQ bus at RAS. Only one module accepted it (a 32MB one). All the others failed. When moving the DQ setting to the CAS step, a total of four modules worked ok.

Finally, when setting the slew to slowest for all output signals, all modules worked regardless of whether DQ was set at RAS or CAS. The following table summarizes this. Modules 2 and 3 are 32MB; the rest of them are 128MB.

Module |  DQ at RAS  |  DQ at CAS
       | fast | slow | fast | slow
-------|------|------|------|------
1      |  NG  |  OK  |  OK  |  OK
2      |  NG  |  OK  |  NG  |  OK
3      |  OK  |  OK  |  NG  |  OK
4      |  NG  |  OK  |  OK  |  OK
7      |  NG  |  OK  |  NG  |  OK
8      |  NG  |  OK  |  OK  |  OK
9      |  NG  |  OK  |  OK  |  OK
-------|------|------|------|------
Fails  |  6   |  0   |  3   |  0
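
The tally in the last row can be reproduced from the table rows. A minimal sketch; the data structure and names are only an illustration of the table above.

```python
# Results transcribed from the table: module -> (RAS fast, RAS slow, CAS fast, CAS slow)
RESULTS = {
    1: ("NG", "OK", "OK", "OK"),
    2: ("NG", "OK", "NG", "OK"),
    3: ("OK", "OK", "NG", "OK"),
    4: ("NG", "OK", "OK", "OK"),
    7: ("NG", "OK", "NG", "OK"),
    8: ("NG", "OK", "OK", "OK"),
    9: ("NG", "OK", "OK", "OK"),
}
COLUMNS = ("RAS fast", "RAS slow", "CAS fast", "CAS slow")

def count_fails(results):
    """Count the NG entries in each column of the results table."""
    return {
        name: sum(1 for row in results.values() if row[i] == "NG")
        for i, name in enumerate(COLUMNS)
    }

fails = count_fails(RESULTS)
print(fails)
```

The counts match the Fails row: the slow slew setting is the only one with no failures in either DQ timing.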

Clock Bandwidth

Clock measurements at 48MHz show overvoltage (3.8V) and undervoltage (-1V). Setting the slew to a minimum improves the clock shape (3.78V max and -0.66V).

At 96MHz, the situation is worse. The clock -using the fastest slew- is just a sinewave. Reducing the slew only makes the sinewave amplitude smaller. My oscilloscope has a 100MHz bandwidth so it might be filtering the higher harmonics but, just by looking at the edges of the 48MHz clock (3ns each with peaking at both ends), it does make sense that at 96MHz the waveform will look like a sinewave. Note that the sinewave is not bounded by VDD-0V but does go all the way to -0.8V.
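
The edge argument can be checked with a short calculation. This sketch uses the 3ns edges measured above and the common rule of thumb that an edge with rise time t has a bandwidth of roughly 0.35 / t, which is a single-pole approximation and not a measurement.

```python
def transition_fraction(edge_ns, freq_mhz):
    """Fraction of each clock period spent in the rising plus falling edge."""
    period_ns = 1000.0 / freq_mhz
    return 2.0 * edge_ns / period_ns

def edge_bandwidth_mhz(edge_ns):
    """Rule-of-thumb bandwidth of an edge: 0.35 / rise time, in MHz."""
    return 0.35 / edge_ns * 1000.0

EDGE_NS = 3.0  # edge time observed on the 48MHz clock
for f in (48, 96):
    print(f, round(transition_fraction(EDGE_NS, f), 3), 3 * f)
print(round(edge_bandwidth_mhz(EDGE_NS), 1))
```

At 48MHz the two edges take about 29% of the period; at 96MHz they take about 58%, so there is hardly any flat portion left. The rule of thumb puts the edge bandwidth near 117MHz, well below the 288MHz third harmonic of a 96MHz clock, so a near-sinusoidal shape is plausible regardless of the oscilloscope.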

Suggestions

For developers:

I think operation at 96MHz or more should be discouraged even if it works in some cases or for some people. I think the data shows that it is not reliable and is just going to bring user grievance.

Operation at 48MHz may not be completely reliable either. It does show improvement when setting the slew rate to the minimum.

For module builders:

-Decoupling for modules should be fixed to comply with the spec
-Independent lines for DQMH and DQML should be provided to avoid excessive load to A12 and A11
-Using series resistors with FPGA outputs may help with ringing and guarantee reliable operation
-Trace coupling should be minimised
-Trace length should be matched
-Avoid using more than 2 SDRAM chips per module
-64MB modules made of a single chip are more reliable than 128MB modules
-Integrate the memory modules with the I/O board to share more supply pins
-Consider a dual SDRAM solution using both connectors but a single board, again to share more supply

I am attaching several measurements.

Re: SDRAM Reliability

Posted: Fri Dec 25, 2020 11:57 am
by maxxsun
Thank you for all your hard work.

Re: SDRAM Reliability

Posted: Fri Dec 25, 2020 2:04 pm
by EvilRyu
Thanks for the hard work Jose. I hope your findings plus the ones from all the other MiSTer devs continue to bring the project to new heights.

Re: SDRAM Reliability

Posted: Fri Dec 25, 2020 2:10 pm
by Milspex
Good job!!

Wonder which seller is gonna offer these up to spec boards first..

Re: SDRAM Reliability

Posted: Fri Dec 25, 2020 3:49 pm
by FatSlob71
All the 1.5 games work with the 2.5 Board i have but what is the Problem with SSlammasters after Completing Level 1 illegal Instruction error ?
Thanks for all the Work!

Re: SDRAM Reliability

Posted: Fri Dec 25, 2020 4:31 pm
by lamarax
Feliz Navidad Jotego, and thank you for all the work you're putting into this project, be it theoretical, analytical or just pure creative.

I still have to ask though; what happened with the Street Fighter (Capcom 68000) core, and why this doesn't boot for anyone after having been updated to run @ 48Mhz, a measure taken to supposedly bypass the objective problems that you're describing re: existing 128MB modules?

Thank you again for your immense contribution to our enjoyment!

Re: SDRAM Reliability

Posted: Fri Dec 25, 2020 5:10 pm
by hyp36rmax
Thank you for looking into this Jotego. Mister can only get better from here.

Re: SDRAM Reliability

Posted: Fri Dec 25, 2020 5:22 pm
by jotego
lamarax wrote: Fri Dec 25, 2020 4:31 pm I still have to ask though; what happened with the Street Fighter (Capcom 68000) core, and why this doesn't boot for anyone after having been updated to run @ 48Mhz, a measure taken to supposedly bypass the objective problems that you're describing re: existing 128MB modules?
I was in a rush and uploaded the wrong file. Apologies about that.

Re: SDRAM Reliability

Posted: Fri Dec 25, 2020 5:24 pm
by jotego
FatSlob71 wrote: Fri Dec 25, 2020 3:49 pm All the 1.5 games work with the 2.5 Board i have but what is the Problem with SSlammasters after Completing Level 1 illegal Instruction error ?
I think the Slam Masters hang after completing one level is a problem with the core logic. That one is not related to the SDRAM so I treated it as a low priority. I have been focused only on the SDRAM since the CPS1.5 public release.

Re: SDRAM Reliability

Posted: Fri Dec 25, 2020 5:33 pm
by FatSlob71
What a Job for a 1 man band! I just got into System 100 Modular Synthesis it's like i tell myself what have i done! Keeps my mind busy though.

Re: SDRAM Reliability

Posted: Fri Dec 25, 2020 7:13 pm
by PikWik
your research will help future developers and SDRAM builders

thank you, jotego!

Re: SDRAM Reliability

Posted: Fri Dec 25, 2020 8:27 pm
by PikWik
if the plan is to wait for a new SDRAM design, i would be OK with jotego releasing a "beta" CPS2 core for people that have working 128 SDRAM modules. versus waiting 3 months for a new RAM design to be created/accepted, people to start making and selling the new SDRAM, and then the CPS2 core be released.

but, i understand if this is too much to ask, and im fine with waiting.
CPS2, and beyond, is a big deal, and this new RAM design will only help future cores (PS1, CPS3) be developed :)

Re: SDRAM Reliability

Posted: Fri Dec 25, 2020 8:38 pm
by aberu
jotego wrote: Fri Dec 25, 2020 10:44 am Integrate the memory modules with the I/O board to share more supply pins
This was what I was thinking of... A new Digital I/O board that occupies both headers and has 128MB SDRAM embedded on it, with twice the I/O available (assuming they are identically spec'd in terms of the FPGA's access to them) would increase the capabilities of the SDRAM and now obviously get rid of the issue you were noticing. The Analog I/O would have to be recognized as limited in this regard. Direct video has come a long way, but it's a tough sell since there are many fans of the Analog I/O board already; the digital I/O board seems like it's less frequently purchased/produced.

Re: SDRAM Reliability

Posted: Fri Dec 25, 2020 8:49 pm
by zerohimself
I think the analog IO should not be a dead end..

I feel that if we put our minds to it, with a little bit of effort we can make a design that overcomes (or at least can deal with and minimize) some of the issues, besides what is inherently outside of our control with the MiSTer.

Re: SDRAM Reliability

Posted: Sat Dec 26, 2020 1:30 am
by CMR
Awesome explanation, could cheap ram modules or insufficient power supplies also be causing problems?

Re: SDRAM Reliability

Posted: Sat Dec 26, 2020 3:21 am
by nullobject
Thank you for turning your analog engineering skills at this issue, Jotego.

I agree that this is something the community needs to figure out.

I have no doubt that if we want to see the more modern arcade cores come to MiSTer, then we will need to be able to access the full bandwidth (~100MHz) of the SDRAM module.

Sure you can do other tricks, like offload some ROMs or the framebuffer to DDR3, but this unnecessarily complicates core design for the sake of a "workaround".

It's a shame that there may be hundreds (thousands?) of SDRAM modules out there which may suffer from the electrical issues you identified. But I think ultimately the MiSTer community will appreciate iterating towards a more correct solution.

Do you think that an optimal design can even be achieved with the dual-row GPIO connectors on the DE10-nano?

It seems that other FPGA boards (ULX3S, DE0-nano, etc.) have SDRAM that "just works" at high frequencies (>=100MHz). But in these designs the SDRAM chips are physically located very close to the FPGA, with balanced traces, etc.

Anyhow, I'm glad you have started this conversation. I'm interested to see where this takes us.

Thanks again :D

Re: SDRAM Reliability

Posted: Sat Dec 26, 2020 9:35 pm
by lamarax
jotego wrote: Fri Dec 25, 2020 5:22 pm I was in a rush and uploaded the wrong file. Apologies about that.
In light of this announcement: https://twitter.com/topapate/status/1342897230433447937, and given that you haven't uploaded the correct file yet, I'm starting to feel a bit uneasy about the whole SDRAM situation in relation to your cores. I must note that I'm one of the few (?) lucky ones who doesn't encounter any problems with things "as is". I understand of course that the issue will inevitably rear its ugly head down the road as things move forward. I'm using a 4A psu with minimal peripheral load; maybe that's something worth taking into consideration (re: VDD ripple)?

But hey, it's your work and it's Christmas time, so yeah :)

Re: SDRAM Reliability

Posted: Sat Dec 26, 2020 10:13 pm
by PikWik
im also one of the "lucky" ones without issues using the cores developed/released by jotego.

i too was wondering about a recommended PSU to use with MiSTer.
like, specific brands and ratings of wall adapters that are suggested for use with a MiSTer.
obviously the standard PSU will be sufficient for most people, but with a USB hub, "XXXX brand at 4a 5v 20w" is suggested.

ive also noticed that some people fixed their address issues when using the CPS cores after correcting their mame/arcade folders and gathering the correct MRAs/ROMs and placing them in the correct structure/directory.
perhaps someone can assist with a standardized folder structure for jotego's cores to make sure people have the expected files in the right place.

im very grateful for everything created so far, and im sure this is a small step which will only help MiSTer development in the long run!

moving forward, the SDRAM will either get a new design, people will do the 10uf cap mod, or jotego will figure out some magic to make the existing 128 RAM modules work 8-)

Re: SDRAM Reliability

Posted: Sat Dec 26, 2020 10:58 pm
by Alkadian
First of all thank you so much Jotego for your very much appreciated hard work!

Well, I am also one of the lucky ones with no ram issues at all apart from Slammasters, which won't boot. In fact I have got two mister setups with two 128mb ram modules. So I consider myself double lucky :mrgreen:

Anyways regarding the PSU query I have got mine which is a Mean Well unit rated @ 5v, 4A, 20W and again no issues so far...touch wood :D

Re: SDRAM Reliability

Posted: Sat Dec 26, 2020 11:42 pm
by narf
does memtest tool detect these errors?
It seems rather than try to engineer around the issue, users should be encouraged to replace out of spec SDRAM with proper ones.
Also a list of suppliers that sell SDRAM that is in-spec.
I'm sure this is wasting a lot of your time and other developers working around this issue.

Re: SDRAM Reliability

Posted: Sun Dec 27, 2020 1:47 am
by jca
With 32M Slammasters works up to core jtcps15_20201218.rbf but not jtcps15_20201219.rbf.
I am confused with:
jtcps15_20201218.rbf CPS 1.5: added support for SF Zero CPS1.5 version
jtcps15_20201219.rbf Adds support for Megaman CPS1.5
Where are the mras?

Re: SDRAM Reliability

Posted: Sun Dec 27, 2020 2:20 pm
by goofyseeker3
could you partition memory usage, that some part will be on any sdr sdram and slower stuff is on the ddr3 sdram, shared or dedicated

Re: SDRAM Reliability

Posted: Sun Dec 27, 2020 5:49 pm
by Sorgelig
While it's good to have everything working perfectly, and while theoretic requirements are true, we live in a real world with imperfections.
Power supply listed in datasheet is continuous power supply. It's not fully true when it comes to spikes due to switching circuits. So momentary value below or above nominal rating in datasheet is more or less fine. It's not a medical, aerospace or nuclear controlling device. Also, with existing CPS1.5 core we have an additional test for memory, so sellers will test their modules on this core before ship. Still there can be a very small amount of modules passed on seller's test platform which will fail on user, but it will be very small fraction. Reputable seller will replace it.
MiSTer is developing platform pushing to the edges some usage as new cores appear.
I don't see a reason for drama developed here. Also i'm still thinking it's possible to re-design the core and make it work on modules currently failing.

P.S.: Don't forget the current design made by hobbyist(me) based on practical tests as by datasheets the conditions should not allow SDRAM to work at all. I've released it for free, sellers don't pay me for this (besides couple sellers who decided to donate me time after time at their free will). None of sellers who earn a lot from sales bothered to help with improvements. Everyone, especially core developers for who it's just a hobby, need to distinguish hobbyists from commerce. As hobbyists we should be more flexible and self demanded to fix the problems or find workarounds using what we have instead of pointing to other who should fix it for you. Even commercial devices have a lot of HW bugs and problems fixed or workarounded in firmware or driver. So i suggest to understand what you are demanding from free hobby project...

Re: SDRAM Reliability

Posted: Sun Dec 27, 2020 6:00 pm
by Sorgelig
Aside from SDRAM, MiSTer has a very fast DDR3 which has long latency on random reads (writes have no latency) but it's still possible to use it instead of SDRAM. For example GBA core is written to use DDR3 and only couple games have minor artifacts. It's just different technique to write the core, but it's still possible.
Besides the complex methods (used in GBA for example) there are many cases when data is known to be read sequentially, so DDR latency can be eliminated by prefetch. Audio samples or some graphics are such cases.
Another aspect, core written for multiple FPGA boards has to use some common functionality and basically is limited by lowest FPGA board. So cores universally written for many FPGA boards won't use additional features and won't use available resources in effective ways.
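
The prefetch idea for sequential reads mentioned above can be illustrated with a toy model. This is only a sketch: the latency, the consumer rate and the prefetch depth are invented numbers, and the model assumes the memory can keep several requests in flight. It is not how any MiSTer core or the DDR3 interface is actually written.

```python
def stall_cycles(n_words, latency, consume_period, prefetch_depth):
    """Total cycles a consumer waits for data when reading n_words in order.

    prefetch_depth == 0: each word is requested only when it is needed.
    prefetch_depth > 0: that many words are requested up front, and one more
    is requested every time a word is consumed.
    """
    request_time = [0] * n_words  # first prefetch_depth words requested at t=0
    t = 0
    stalls = 0
    for i in range(n_words):
        if prefetch_depth == 0:
            request_time[i] = t  # requested at the moment it is needed
        ready = request_time[i] + latency
        if ready > t:
            stalls += ready - t  # consumer waits for the data to arrive
            t = ready
        ahead = i + prefetch_depth
        if prefetch_depth > 0 and ahead < n_words:
            request_time[ahead] = t  # fetch ahead while word i is in use
        t += consume_period
    return stalls

no_prefetch = stall_cycles(100, latency=20, consume_period=8, prefetch_depth=0)
with_prefetch = stall_cycles(100, latency=20, consume_period=8, prefetch_depth=4)
print(no_prefetch, with_prefetch)
```

With these made-up numbers, every read pays the full latency without prefetch, while with a depth of 4 only the very first word has to wait. Random accesses would not benefit in the same way, since the next address is not known ahead of time.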

Re: SDRAM Reliability

Posted: Sun Dec 27, 2020 7:06 pm
by BasketSnake
JT, please make it open source so people can sleep at night. Thank you :)

Re: SDRAM Reliability

Posted: Sun Dec 27, 2020 9:36 pm
by lamarax
As long as JT is willing, or capable, to put his observations into silicone, I'm betting people would flock to buy the new and improved modules.

Re: SDRAM Reliability

Posted: Sun Dec 27, 2020 11:37 pm
by silentheaven83
lamarax wrote: Sun Dec 27, 2020 9:36 pm

As long as JT is willing, or capable, to put his observations into silicone, I'm betting people would flock to buy the new and improved modules.

Frankly I wouldn’t, I bought in September a 128 MB board from https://misterfpga.co.uk that fails in SFZero CPS 1.5 and I bought it to be future proof, and it wasn’t really cheap as I live in Italy.

I hope Jotego can do what Sorg says, to find a workaround and use everything we already have.


Re: SDRAM Reliability

Posted: Sun Dec 27, 2020 11:47 pm
by silentheaven83
Sorg thank you for your input.
Sorgelig wrote: Sun Dec 27, 2020 5:49 pm Also, with existing CPS1.5 core we have an additional test for memory, so sellers will test their modules on this core before ship. Still there can be a very small amount of modules passed on seller's test platform which will fail on user, but it will be very small fraction. Reputable seller will replace it.
The problem would still remain for the people like me who already bought 128 MB v2.5 boards that fail this test. I don’t know if sellers would accept the returning boards, fix them and ship again to the customers.

If you sell v2.5 boards that can fail the test and then you start selling the same version but only if it doesn’t fail the same test I think it’s not fair for first buyers. I’m talking about reputable sellers of course.

Re: SDRAM Reliability

Posted: Mon Dec 28, 2020 1:11 am
by PikWik
im hoping that moving forward, SDRAM sellers run an additional more extensive memory test and also disclose the specific capacitors (ex. kemet) they use for their modules.
if sorg or someone else can create a more thorough memory test for sellers to use, this will also create peace of mind for first time buyers.

and i actually do think sellers will be willing to fix memory modules that give errors with current cores.
(if bought within a certain amount of time, seller approved repair, or small fee to pay for shipping)
this kind of service will only help give the seller a better name and respect from the MiSTer community.

Re: SDRAM Reliability

Posted: Mon Dec 28, 2020 1:19 am
by Lodovik
Thanks Jotego for exposing the limits of the current design and coming with suggestions for improving on it.