SDRAM Reliability

jotego
Core Developer
Posts: 57
Joined: Sun May 24, 2020 7:07 pm
Has thanked: 19 times
Been thanked: 193 times

As you know, the CPS1.5 core does not work for 1 out of 5 users of 128MB modules. This has been the final fight with the SDRAM on MiSTer, and I am honestly tired of dealing with it. Note that I have not experienced these problems with the old MiST, which doesn't have an ideal design either, but at least there everything is on the same board.

Some history:

Before CPS1.5, I already had evidence of intersymbol interference going on in the memory. When I adjusted the SDRAM clock phase, I could see that the valid range was much wider for 32MB devices than for 128MB ones. But also, it was much wider if I held the data bus down when not accessing it. That could be explained as some interference.

Then two cores were prone to ROM test failures after loading: Contra and Double Dragon 1. You have seen the ROM BAD message on those. But they all had the same memory controller, running at the same speed, and they all worked fine on other platforms.

I had designed everything at 48MHz to avoid dealing with high frequencies. But for CPS1 I needed more speed, so I moved to 96MHz at some point, only to find that it failed for too many people. Holding the bus down, as described earlier, and looking for the right clock shift across several modules brought a solution that worked for most people. Still, you could have a memory that produced a ROM BAD error in Ghouls'n Ghosts. You wouldn't know it unless you booted that game, of course.

The CPS1.5 case:

CPS1.5 increases memory access to some 12MB/s in total (compared to 10MB/s for CPS1). I decided it was time to increase the SDRAM controller performance, so I designed a new controller using bank interleaving. I then found that losing the DQ mask (DQM) lines was going to hurt performance too. Compared with the old SDRAM modules (or MiST), the newer modules, which have the DQM and A lines shorted, trade performance for a reduced pin count. This has an electrical implication too. Still, everything worked well until I released it. Then, 20% of users were not happy. Neither was I.

The games would consistently hang at the same point for a given user, but that point was different for each user. The consistency for a given setup pointed to an intersymbol interference problem.

Note that the core worked for 32MB modules and also on the MiST and SiDi platforms. Plus, I had extensive simulations (which include all SDRAM timing constraints) proving that the design was sound. So I thought this would be an electrical problem. I put back on my analogue IC engineer hat, which I normally leave at work, and started looking at the module.

The findings:

There are four electrical problems with the modules.

1. VDD ripple is completely out of spec
2. Too much ringing in signals, particularly on FPGA output pins
3. Signal coupling
4. Clock bandwidth

VDD ripple

VDD swings between 2.6V and 4.0V in all modules, including the 32MB ones. It is not rare to see values as extreme as 4.2V or 2.4V. Notice that the SDRAM spec states that VDD should stay between 3.0V and 3.6V.

VDD ripple is worse on 128MB modules because power consumption is 4x. A higher clock frequency or multibank access also increases the amount of current switching through VDD, so VDD ripple is higher in those cases.
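
As a rough illustration of how far these readings sit from the spec, here is a minimal Python sketch that compares the measured ranges quoted above against the 3.0V to 3.6V window. The function and constant names are made up for this example; only the voltages come from the measurements.

```python
# Supply window quoted in the post: VDD should stay between 3.0 V and 3.6 V.
VDD_MIN_V = 3.0
VDD_MAX_V = 3.6


def vdd_excursion(measured_min_v, measured_max_v):
    """Return (undershoot, overshoot) in volts outside the spec window."""
    undershoot = max(0.0, VDD_MIN_V - measured_min_v)
    overshoot = max(0.0, measured_max_v - VDD_MAX_V)
    # Round to 10 mV so float noise does not clutter the result.
    return round(undershoot, 2), round(overshoot, 2)


# Ranges reported in the post (typical and extreme readings).
typical = vdd_excursion(2.6, 4.0)
extreme = vdd_excursion(2.4, 4.2)

print("typical (undershoot, overshoot):", typical)
print("extreme (undershoot, overshoot):", extreme)
```

Both readings leave the window on both sides: by 0.4V in the typical case and by 0.6V in the extreme one.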

Ringing

FPGA outputs that go to the module show large ringing that can last for 5ns, half the period of a 100MHz clock.
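
To put the 5ns figure in context, this small sketch (the names are illustrative, not taken from any tool) computes what share of one clock period the ringing occupies at the frequencies discussed in this post.

```python
RINGING_NS = 5.0  # ringing duration reported in the post


def ringing_fraction(clock_mhz, ringing_ns=RINGING_NS):
    """Fraction of one clock period during which the signal is still ringing."""
    period_ns = 1000.0 / clock_mhz
    return ringing_ns / period_ns


for mhz in (48, 96, 100):
    period_ns = 1000.0 / mhz
    share = ringing_fraction(mhz)
    print(f"{mhz} MHz: period {period_ns:.2f} ns, ringing covers {share:.0%}")
```

At 48MHz the ringing takes about a quarter of the period; at 96MHz it takes almost half, which leaves very little settled time for sampling.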

Ringing gets worse for 128MB modules because the load capacitance to the FPGA is 4x larger. The FPGA output pads will produce more current to charge the net simultaneously, and the extra current across the line and connector inductance creates the ugly ringing.

This effect can be slightly improved by setting the slew rate of the FPGA outputs to the minimum. Signal transition times go from 3ns to 4ns when the slew assignment is set to zero in Quartus, and the ringing amplitude halves.
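
For reference, a slew setting like this normally ends up as SLEW_RATE instance assignments in the project's .qsf file. The sketch below only generates those lines as text. The pin names are assumptions based on common SDRAM port naming and must be matched to your own top level; the value 0 is the slowest setting mentioned above.

```python
# Hypothetical net names: check them against the top-level ports of your project.
SDRAM_OUTPUTS = [
    "SDRAM_CLK", "SDRAM_A[*]", "SDRAM_BA[*]", "SDRAM_DQ[*]",
    "SDRAM_nCS", "SDRAM_nRAS", "SDRAM_nCAS", "SDRAM_nWE",
]


def slew_assignments(pins, slew=0):
    """Build one SLEW_RATE assignment line per pin (0 is the slowest setting)."""
    return [
        f"set_instance_assignment -name SLEW_RATE {slew} -to {pin}"
        for pin in pins
    ]


qsf_lines = slew_assignments(SDRAM_OUTPUTS)
print("\n".join(qsf_lines))
```

The printed lines can be pasted into the .qsf file or entered through the Assignment Editor. Whether a given pin accepts the setting depends on the device and I/O standard, so check the fitter report after compiling.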

Note that lines A12 and A11 are especially loaded in modern modules because of a design change. At some point, Sorg decided to short A12 and A11 to DQMH and DQML to save two pins. For most cores, this does not impact performance, but it does increase the capacitance on the FPGA pins. Thus lines A12 and A11 have more ringing than the rest.

Signal Coupling

As stated above, when some combinations of values occur on the bus in a given sequence, there is a chance of accessing wrong data.

To give signals more time to settle and avoid interference, I tried changing the bus signals at different steps of the SDRAM access cycle. For instance, the FPGA can set the DQ signals at the RAS step and leave the DQ bus ready ahead of the CAS step. However, the signals that change at RAS and CAS are almost the same (the A bus) but with different values. I found a consistent error in the Contra ROM load when setting the DQ bus at RAS. Only one module accepted it (a 32MB one). All the others failed. When moving the DQ setting to the CAS step, a total of four modules worked OK.

Finally, when setting the slew to slowest for all output signals, all modules worked regardless of whether DQ was set at RAS or CAS. The following table summarizes this. Modules 2 and 3 are 32MB; the rest of them are 128MB.

Module |  DQ at RAS  |  DQ at CAS
       | fast | slow | fast | slow
-------|------|------|------|------
1      |  NG  |  OK  |  OK  |  OK
2      |  NG  |  OK  |  NG  |  OK
3      |  OK  |  OK  |  NG  |  OK
4      |  NG  |  OK  |  OK  |  OK
7      |  NG  |  OK  |  NG  |  OK
8      |  NG  |  OK  |  OK  |  OK
9      |  NG  |  OK  |  OK  |  OK
-------|------|------|------|------
Fails  |  6   |  0   |  3   |  0
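
The totals in the last row can be re-derived from the individual results. The sketch below transcribes the table into a dictionary and also splits the counts for the 128MB modules only; the data structure and names are just one way of doing it, chosen for this example.

```python
# Results transcribed from the table, in column order:
# (DQ at RAS / fast, DQ at RAS / slow, DQ at CAS / fast, DQ at CAS / slow)
RESULTS = {
    1: ("NG", "OK", "OK", "OK"),
    2: ("NG", "OK", "NG", "OK"),
    3: ("OK", "OK", "NG", "OK"),
    4: ("NG", "OK", "OK", "OK"),
    7: ("NG", "OK", "NG", "OK"),
    8: ("NG", "OK", "OK", "OK"),
    9: ("NG", "OK", "OK", "OK"),
}
COLUMNS = ("RAS/fast", "RAS/slow", "CAS/fast", "CAS/slow")
MODULES_32MB = {2, 3}  # per the text, the rest are 128MB


def count_fails(results):
    """Count the NG entries in each column."""
    return {
        name: sum(1 for row in results.values() if row[i] == "NG")
        for i, name in enumerate(COLUMNS)
    }


fails = count_fails(RESULTS)
fails_128 = count_fails(
    {m: row for m, row in RESULTS.items() if m not in MODULES_32MB}
)

print("all modules:", fails)
print("128MB only: ", fails_128)
```

With only seven modules the sample is small, so the split by size is indicative at best. The slow slew columns are the only ones with no failures in either group.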

Clock Bandwidth

Clock measurements at 48MHz show overvoltage (3.8V) and undervoltage (-1V). Setting the slew to a minimum improves the clock shape (3.78V max and -0.66V).

At 96MHz, the situation is worse. The clock (using the fastest slew) is just a sinewave. Reducing the slew only makes the sinewave amplitude smaller. My oscilloscope has a 100MHz bandwidth, so it might be filtering the higher harmonics but, just by looking at the edges of the 48MHz clock (3ns each, with peaking at both ends), it does make sense that at 96MHz the waveform will look like a sinewave. Note that the sinewave is not bounded between 0V and VDD but goes all the way down to -0.8V.
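
The caveat about the oscilloscope can be made a bit more concrete. The sketch below lists the first odd harmonics of an ideal square-wave clock and applies the common rule of thumb that a scope's own rise time is roughly 0.35 divided by its bandwidth. Both are simplifications: a real scope attenuates gradually above its bandwidth rather than cutting off, and the function names are invented for this example.

```python
SCOPE_BANDWIDTH_MHZ = 100.0  # oscilloscope bandwidth stated in the post


def odd_harmonics_mhz(clock_mhz, count=3):
    """First few odd harmonics of an ideal square wave (1st, 3rd, 5th, ...)."""
    return [clock_mhz * n for n in range(1, 2 * count, 2)]


def scope_rise_time_ns(bandwidth_mhz):
    """Rule-of-thumb 10-90% rise time of the scope itself: about 0.35 / bandwidth."""
    return 0.35 / bandwidth_mhz * 1000.0


for clock in (48.0, 96.0):
    harmonics = odd_harmonics_mhz(clock)
    inside = [f for f in harmonics if f <= SCOPE_BANDWIDTH_MHZ]
    print(f"{clock:.0f} MHz clock: harmonics {harmonics} MHz, "
          f"inside scope bandwidth: {inside}")

print(f"scope rise time: about {scope_rise_time_ns(SCOPE_BANDWIDTH_MHZ):.1f} ns")
```

Under these assumptions the third harmonic of the 96MHz clock (288MHz) is far above the scope bandwidth, and the scope's own rise time is about 3.5ns, close to the 3ns edges seen at 48MHz. So the real edges may be faster than the captures show.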

Suggestions

For developers:

I think operation at 96MHz or more should be discouraged even if it works in some cases or for some people. I think the data shows that it is not reliable and is just going to cause user grievances.

Operation at 48MHz may not be completely reliable either. It does show improvement when setting the slew rate to the minimum.

For module builders:

-Decoupling for modules should be fixed to comply with the spec
-Independent lines for DQMH and DQML should be provided to avoid excessive load to A12 and A11
-Using series resistors with FPGA outputs may help with ringing and guarantee reliable operation
-Trace coupling should be minimised
-Trace length should be matched
-Avoid using more than 2 SDRAM chips per module
-64MB modules made of a single chip are more reliable than 128MB modules
-Integrate the memory modules with the I/O board to share more supply pins
-Consider a dual SDRAM solution using both connectors but a single board, again to share more supply

I am attaching several measurements.
Attachments
SDRAM waveforms.zip (1.88 MiB)
Open IP for many chips in my github account
RBF files for my MiSTer cores in jtbin
Support new IP and core development here
maxxsun
Posts: 2
Joined: Mon May 25, 2020 8:32 am
Has thanked: 1 time
Been thanked: 7 times

Thank you for all your hard work.
EvilRyu
Posts: 29
Joined: Sun May 24, 2020 9:18 pm
Has thanked: 8 times
Been thanked: 3 times

Thanks for the hard work Jose. I hope your findings plus the ones from all the other MiSTer devs continue to bring the project to new heights.
Milspex
Posts: 141
Joined: Wed Jun 10, 2020 6:46 pm
Has thanked: 34 times
Been thanked: 29 times

Good job!!

Wonder which seller is gonna offer these up to spec boards first..
FatSlob71
Posts: 87
Joined: Tue Oct 13, 2020 10:11 am
Has thanked: 18 times
Been thanked: 10 times

All the 1.5 games work with the 2.5 board I have, but what is the problem with Slam Masters? After completing level 1 I get an illegal instruction error.
Thanks for all the Work!
lamarax
Posts: 456
Joined: Wed Nov 11, 2020 6:28 pm
Has thanked: 32 times
Been thanked: 188 times

Feliz Navidad Jotego, and thank you for all the work you're putting into this project, be it theoretical, analytical or just pure creative.

I still have to ask though; what happened with the Street Fighter (Capcom 68000) core, and why this doesn't boot for anyone after having been updated to run @ 48Mhz, a measure taken to supposedly bypass the objective problems that you're describing re: existing 128MB modules?

Thank you again for your immense contribution to our enjoyment!
hyp36rmax
Posts: 23
Joined: Sat Jun 20, 2020 7:36 pm
Has thanked: 25 times
Been thanked: 12 times

Thank you for looking into this Jotego. Mister can only get better from here.
jotego
Core Developer
Posts: 57
Joined: Sun May 24, 2020 7:07 pm
Has thanked: 19 times
Been thanked: 193 times

lamarax wrote: Fri Dec 25, 2020 4:31 pm I still have to ask though; what happened with the Street Fighter (Capcom 68000) core, and why this doesn't boot for anyone after having been updated to run @ 48Mhz, a measure taken to supposedly bypass the objective problems that you're describing re: existing 128MB modules?
I was in a rush and uploaded the wrong file. Apologies about that.
jotego
Core Developer
Posts: 57
Joined: Sun May 24, 2020 7:07 pm
Has thanked: 19 times
Been thanked: 193 times

FatSlob71 wrote: Fri Dec 25, 2020 3:49 pm All the 1.5 games work with the 2.5 board I have, but what is the problem with Slam Masters? After completing level 1 I get an illegal instruction error.
I think the Slam Masters hang after completing one level is a problem with the core logic. It is not related to the SDRAM, so I treated it as a low priority. I have been focused only on the SDRAM since the CPS1.5 public release.
FatSlob71
Posts: 87
Joined: Tue Oct 13, 2020 10:11 am
Has thanked: 18 times
Been thanked: 10 times

What a Job for a 1 man band! I just got into System 100 Modular Synthesis it's like i tell myself what have i done! Keeps my mind busy though.
PikWik
Posts: 201
Joined: Sat May 30, 2020 7:00 pm
Has thanked: 148 times
Been thanked: 59 times

your research will help future developers and SDRAM builders

thank you, jotego!
PikWik
Posts: 201
Joined: Sat May 30, 2020 7:00 pm
Has thanked: 148 times
Been thanked: 59 times

if the plan is to wait for a new SDRAM design, i would be OK with jotego releasing a "beta" CPS2 core for people that have working 128 SDRAM modules. versus waiting 3 months for a new RAM design to be created/accepted, people to start making and selling the new SDRAM, and then the CPS2 core be released.

but, i understand if this is too much to ask, and im fine with waiting.
CPS2, and beyond, is a big deal, and this new RAM design will only help future cores (PS1, CPS3) be developed :)
aberu
Core Developer
Posts: 970
Joined: Tue Jun 09, 2020 8:34 pm
Location: Longmont, CO
Has thanked: 203 times
Been thanked: 313 times

jotego wrote: Fri Dec 25, 2020 10:44 am Integrate the memory modules with the I/O board to share more supply pins
This was what I was thinking of... A new Digital I/O board that occupies both headers and has 128MB SDRAM embedded on it, with twice the I/O available (assuming they are identically specced in terms of the FPGA's access to them), would increase the capabilities of the SDRAM and now obviously get rid of the issue you were noticing. The Analog I/O would have to be recognized as limited in this regard. Direct video has come a long way, but it's a tough sell since there are many fans of the Analog I/O board already; the digital I/O board seems like it's less frequently purchased/produced.
birdybro~
zerohimself
Posts: 3
Joined: Sun May 24, 2020 8:03 pm

I think the analog IO should not be a dead end..

I feel that if we put our minds to it, with a little bit of effort we can make a design that overcomes (or at least can deal with and minimize) some of the issues, besides what is inherently outside of our control with the MiSTer.
CMR
Posts: 58
Joined: Sun Dec 20, 2020 12:29 am
Has thanked: 16 times
Been thanked: 1 time

Awesome explanation, could cheap ram modules or insufficient power supplies also be causing problems?
nullobject
Core Developer
Posts: 16
Joined: Mon May 25, 2020 12:31 am
Has thanked: 5 times
Been thanked: 13 times

Thank you for turning your analog engineering skills at this issue, Jotego.

I agree that this is something the community needs to figure out.

I have no doubt that if we want to see the more modern arcade cores come to MiSTer, then we will need to be able to access the full bandwidth (~100MHz) of the SDRAM module.

Sure you can do other tricks, like offload some ROMs or the framebuffer to DDR3, but this unnecessarily complicates core design for the sake of a "workaround".

It's a shame that there may be hundreds (thousands?) of SDRAM modules out there which may suffer from the electrical issues you identified. But I think ultimately the MiSTer community will appreciate iterating towards a more correct solution.

Do you think that an optimal design can even be achieved with the dual-row GPIO connectors on the DE10-nano?

It seems that other FPGA boards (ULX3S, DE0-nano, etc.) have SDRAM that "just works" at high frequencies (>=100MHz). But in these designs the SDRAM chips are physically located very close to the FPGA, with balanced traces, etc.

Anyhow, I'm glad you have started this conversation. I'm interested to see where this takes us.

Thanks again :D
MiSTer core developer: Rygar, Gemini Wing, Silkworm, CAVE
Support me on Patreon
lamarax
Posts: 456
Joined: Wed Nov 11, 2020 6:28 pm
Has thanked: 32 times
Been thanked: 188 times

jotego wrote: Fri Dec 25, 2020 5:22 pm I was in a rush and uploaded the wrong file. Apologies about that.
In light of this announcement: https://twitter.com/topapate/status/1342897230433447937, and given that you haven't uploaded the correct file yet, I'm starting to feel a bit uneasy about the whole SDRAM situation in relation to your cores. I must note that I'm one of the few (?) lucky ones who doesn't encounter any problems with things "as is". I understand of course that the issue will inevitably rear its ugly head down the road as things move forward. I'm using a 4A psu with minimal peripheral load; maybe that's something worth taking into consideration (re: VDD ripple)?

But hey, it's your work and it's Christmas time, so yeah :)
PikWik
Posts: 201
Joined: Sat May 30, 2020 7:00 pm
Has thanked: 148 times
Been thanked: 59 times

im also one of the "lucky" ones without issues using the cores developed/released by jotego.

i too was wondering about a recommended PSU to use with MiSTer.
like, specific brands and ratings of wall adapters that are suggested for use with a MiSTer.
obviously the standard PSU will be sufficient for most people, but with a USB hub, "XXXX brand at 4a 5v 20w" is suggested.

ive also noticed that some people fixed their address issues when using the CPS cores after correcting their mame/arcade folders and gathering the correct MRAs/ROMs and placing them in the correct structure/directory.
perhaps someone can assist with a standardized folder structure for jotego's cores to make sure people have the expected files in the right place.

im very grateful for everything created so far, and im sure this is a small step which will only help MiSTer development in the long run!

moving forward, the SDRAM will either get a new design, people will do the 10uf cap mod, or jotego will figure out some magic to make the existing 128 RAM modules work 8-)
Alkadian
Posts: 593
Joined: Thu May 28, 2020 9:55 am
Has thanked: 201 times
Been thanked: 66 times

First of all thank you so much Jotego for your very much appreciated hard work!

Well, I am also one of the lucky ones with no RAM issues at all apart from Slammasters, which won't boot. In fact I have got two MiSTer setups with two 128MB RAM modules. So I consider myself double lucky :mrgreen:

Anyways regarding the PSU query I have got mine which is a Mean Well unit rated @ 5v, 4A, 20W and again no issues so far...touch wood :D
narf
Posts: 1
Joined: Mon May 25, 2020 3:57 am

does memtest tool detect these errors?
It seems rather than try to engineer around the issue, users should be encouraged to replace out of spec SDRAM with proper ones.
Also a list of suppliers that sell SDRAM that is in-spec.
I'm sure this is wasting a lot of your time and other developers working around this issue.
jca
Posts: 1167
Joined: Wed May 27, 2020 1:59 pm
Has thanked: 80 times
Been thanked: 231 times

With 32M Slammasters works up to core jtcps15_20201218.rbf but not jtcps15_20201219.rbf.
I am confused with:
jtcps15_20201218.rbf CPS 1.5: added support for SF Zero CPS1.5 version
jtcps15_20201219.rbf Adds support for Megaman CPS1.5
Where are the mras?
goofyseeker3
Posts: 25
Joined: Sat Dec 26, 2020 8:01 pm

could you partition memory usage, that some part will be on any sdr sdram and slower stuff is on the ddr3 sdram, shared or dedicated
Sorgelig
Site Admin
Posts: 787
Joined: Thu May 21, 2020 9:49 pm
Has thanked: 1 time
Been thanked: 183 times

While it's good to have everything working perfectly, and while the theoretical requirements are true, we live in a real world with imperfections.
The power supply range listed in the datasheet is for the continuous supply. It doesn't fully apply to spikes caused by switching circuits, so a momentary value below or above the nominal rating in the datasheet is more or less fine. It's not a medical, aerospace or nuclear control device. Also, with the existing CPS1.5 core we have an additional test for memory, so sellers will test their modules on this core before shipping. There can still be a very small number of modules that pass on the seller's test platform and fail for the user, but it will be a very small fraction. A reputable seller will replace them.
MiSTer is a developing platform, and usage gets pushed to the edge as new cores appear.
I don't see a reason for the drama developing here. Also, I still think it's possible to redesign the core and make it work on the modules that currently fail.

P.S.: Don't forget that the current design was made by a hobbyist (me) based on practical tests, since by the datasheets the conditions should not allow SDRAM to work at all. I've released it for free, and sellers don't pay me for this (besides a couple of sellers who decided to donate from time to time of their own free will). None of the sellers who earn a lot from sales bothered to help with improvements. Everyone, especially core developers for whom it's just a hobby, needs to distinguish hobbyists from commerce. As hobbyists we should be more flexible and demanding of ourselves, fixing the problems or finding workarounds using what we have instead of pointing at others who should fix it for you. Even commercial devices have a lot of HW bugs and problems that are fixed or worked around in firmware or drivers. So I suggest understanding what you are demanding from a free hobby project...
Sorgelig
Site Admin
Posts: 787
Joined: Thu May 21, 2020 9:49 pm
Has thanked: 1 time
Been thanked: 183 times

Aside from SDRAM, MiSTer has a very fast DDR3, which has long latency on random reads (writes have no latency), but it's still possible to use it instead of SDRAM. For example, the GBA core is written to use DDR3 and only a couple of games have minor artifacts. It's just a different technique for writing the core, but it's still possible.
Besides the complex methods (used in GBA, for example), there are many cases where data is known to be read sequentially, so DDR latency can be eliminated by prefetching. Audio samples or some graphics are such cases.
Another aspect: a core written for multiple FPGA boards has to use some common functionality and is basically limited by the lowest FPGA board. So cores written universally for many FPGA boards won't use additional features and won't use the available resources effectively.
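
The prefetch point above can be illustrated with a toy cycle count. This is only a sketch under simple assumptions: every read request pays a fixed latency, a burst then delivers one word per cycle, and bursts are not overlapped. The latency, burst length and word count are hypothetical numbers picked for illustration, not DDR3 figures from the DE10-nano.

```python
import math


def cycles_without_prefetch(n_words, latency):
    """Each word is requested on its own and waits for the full latency."""
    return n_words * (latency + 1)


def cycles_with_prefetch(n_words, latency, burst):
    """Each burst pays the latency once, then streams one word per cycle."""
    bursts = math.ceil(n_words / burst)
    return bursts * latency + n_words


# Hypothetical numbers, chosen only for illustration.
N_WORDS, LATENCY, BURST = 256, 20, 64

single = cycles_without_prefetch(N_WORDS, LATENCY)
prefetched = cycles_with_prefetch(N_WORDS, LATENCY, BURST)
print(f"one request per word: {single} cycles")
print(f"prefetch in bursts:   {prefetched} cycles")
```

In this model the latency is paid once per burst instead of once per word, which is why sequential data such as audio samples suits DDR3 well. A controller that requests the next burst while the current one is still being consumed could hide the remaining latency too.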
BasketSnake
Posts: 5
Joined: Sun May 24, 2020 7:29 pm
Has thanked: 1 time

JT, please make it open source so people can sleep at night. Thank you :)
lamarax
Posts: 456
Joined: Wed Nov 11, 2020 6:28 pm
Has thanked: 32 times
Been thanked: 188 times

As long as JT is willing, or able, to put his observations into silicon, I'm betting people would flock to buy the new and improved modules.
silentheaven83
Posts: 40
Joined: Fri Sep 25, 2020 12:33 pm
Been thanked: 1 time

lamarax wrote: Sun Dec 27, 2020 9:36 pm As long as JT is willing, or capable, to put his observations into silicone, I'm betting people would flock to buy the new and improved modules.
Frankly I wouldn’t, I bought in September a 128 MB board from misterfpga.co.uk that fails in SFZero CPS 1.5 and I bought it to be future proof, and it wasn’t really cheap as I live in Italy.

I hope Jotego can do what Sorg says, to find a workaround and use everything we already have.
silentheaven83
Posts: 40
Joined: Fri Sep 25, 2020 12:33 pm
Been thanked: 1 time

Sorg thank you for your input.
Sorgelig wrote: Sun Dec 27, 2020 5:49 pm Also, with existing CPS1.5 core we have an additional test for memory, so sellers will test their modules on this core before ship. Still there can be a very small amount of modules passed on seller's test platform which will fail on user, but it will be very small fraction. Reputable seller will replace it.
The problem would still remain for the people like me who already bought 128 MB v2.5 boards that fail this test. I don’t know if sellers would accept the returning boards, fix them and ship again to the customers.

If you sell v2.5 boards that can fail the test and then start selling the same version only if it passes the same test, I think that's not fair to the first buyers. I'm talking about reputable sellers of course.
PikWik
Posts: 201
Joined: Sat May 30, 2020 7:00 pm
Has thanked: 148 times
Been thanked: 59 times

im hoping that moving forward, SDRAM sellers run an additional more extensive memory test and also disclose the specific capacitors (ex. kemet) they use for their modules.
if sorg or someone else can create a more thorough memory test for sellers to use, this will also create peace of mind for first time buyers.

and i actually do think sellers will be willing to fix memory modules that give errors with current cores.
(if bought within a certain amount of time, seller approved repair, or small fee to pay for shipping)
this kind of service will only help give the seller a better name and respect from the MiSTer community.
Lodovik
Posts: 10
Joined: Mon May 25, 2020 5:02 am
Has thanked: 8 times
Been thanked: 1 time

Thanks Jotego for exposing the limits of the current design and coming with suggestions for improving on it.