Discussion and Development of a 'MiSTer 2.0' Modular Hardware Platform

Reed_Solomon · Unread post by **Reed_Solomon** » Mon Oct 02, 2023 5:53 am

mxml wrote: ↑Sat Jul 01, 2023 7:53 pm
Well, Sipeed is apparently targeting a MiSTer replacement with the Gowin Arora V 138K + Andes RISC-V cores.

Their dev tooling is kind of meh, and Altera LEs aren't directly comparable to Gowin LEs, but it does look quite interesting for half the price and 138K LEs.

It would need a carrier with HDMI TX (12.5 Gbps SerDes, so 4K120 works, lol) and low-latency RAM, unless there's a direct-to-logic bus rather than AXI.

But that's relatively easy to do: maybe HyperRAM/RLDRAM and a TI TDP1204 on a USB-C PD carrier with breakouts for controller GPIO.

Promising and much cheaper than Kria. Supposed to release in August.

[Image: Sipeed Tang Mega 138K Pro system-on-module]

https://www.cnx-software.com/2023/10/01 ... isc-v-soc/

Looks pretty interesting. Supposed to be released next month. With the 3 BTB connectors, you could potentially design a board where you just plug in the FPGA + RISC-V SoC, either directly or via ribbon cables, and perhaps even design a portable device. $120 for the system-on-module is pretty cheap.

[Image: Sipeed Tang Mega 138K Pro Dock]

Here it is in its own PCIe dock, which is overkill and adds $195 to the price.

dcubed · Unread post by **dcubed** » Fri Oct 06, 2023 7:51 am

arachnivore wrote: ↑Tue Aug 15, 2023 3:26 am
I just noticed that the Efinix Ti 180 FPGA is finally available on Digikey for $55 in small quantities (80 units).

I think there's a place for a System-on-Module (SoM) ecosystem like the Raspberry Pi 4 CM. It may be possible to match the capability of the DE-10 with a cheaper board that's also portable. I think there's a lot of interest in a cheaper and/or portable system, especially one that's open-source. It doesn't have to be about making something more powerful. As others have pointed out, that's a dead-end road for many reasons.

I imagine a hand-held with a Raspberry Pi Zero 2 W, a SoM socket, and a dock connector. If you're just after light emulation, no expensive SoM is needed, but if you want the full MiSTer experience, you plug in a module with an FPGA with its own SRAM. If you want to play a multi-player game or use an arcade box, plug it into a dock.

I know a lot of this isn't the most original plan in the world, but it seems like it might be worth the effort.

Edit: I know Efinix offers a cheaper Trion (non-Ti) family of FPGAs that goes up to 120K LEs for as little as $27 on Digikey; however, I have my doubts whether a design that takes up around 80% of a 110K LE Cyclone V could fit in a 120K LE Trion, because the feature set and device design are so different.

The feature overlap between the Cyclone V and the Titanium family is much greater, so it might be possible to make do with the 120K LE version of the Titanium, but that chip is only marginally cheaper ($49 vs $55) than the 180K LE version, so I figure better safe than sorry.

That's the FPGA that MARS is using, isn't it?

Guess we'll see what potential it has pretty soon then.

Armakuni · Unread post by **Armakuni** » Sun Oct 08, 2023 5:36 pm

dcubed wrote: ↑Fri Oct 06, 2023 7:51 am
That's the FPGA that MARS is using, isn't it?

Guess we'll see what potential it has pretty soon then.

I wouldn't get too carried away just yet. It still doesn't have enough resources to handle the Nintendo DS, which was confirmed by Robert.

cb88 · Unread post by **cb88** » Wed Mar 13, 2024 3:02 pm

Armakuni wrote: ↑Sun Oct 08, 2023 5:36 pm
I wouldn't get too carried away just yet. It still doesn't have enough resources to handle Nintendo DS which was confirmed by Robert.

It's worth pointing out also that after a certain point the need for cycle-accurate emulation diminishes, because everyone stopped coding close to the hardware... Dreamcast and up probably don't need cycle-accurate CPUs or GPUs at all.

There are a few well-documented systems that could be run on a much, much faster and larger FPGA, though. E.g. the SPARC core would shine on an FPGA 2-3 times as large, or more if you also want to implement GPUs such as the SX (simple, fast, good), the Leo (very complicated) or the AG10E (an amalgamation of something like 5 different GPUs on a sandwiched board). It really needs more execution width too, in addition to faster and more logic.

Probably nobody would complain about faster 68k and Amiga cores either (something along the lines of the Vampire accelerators).

Pretty much the only option at this point is Xilinx UltraScale+ or some newcomer, because Intel has priced Agilex out of the market (perhaps they have Agilex 3 slated, but it's a no-show so far, and none of the other Agilex FPGAs are even stocked anyway because they are too expensive).

MiSTeX on a Kintex (300K+ LUTs and faster logic) + Orange Pi Zero 2W looks very interesting, honestly.
https://www.youtube.com/watch?v=hXLaA0ITzy8

Armakuni · Unread post by **Armakuni** » Sun Mar 17, 2024 10:26 am

cb88 wrote: ↑Wed Mar 13, 2024 3:02 pm
It's worth pointing out also that after a certain point the need for cycle-accurate emulation diminishes, because everyone stopped coding close to the hardware... Dreamcast and up probably don't need cycle-accurate CPUs or GPUs at all.

MiSTeX on a Kintex (300K+ LUTs and faster logic) + Orange Pi Zero 2W looks very interesting, honestly.
https://www.youtube.com/watch?v=hXLaA0ITzy8

Once systems started relying on GPUs, they could not be designed to be cycle-accurate due to frame-to-frame times, so software is really the better choice, and we already have a lot of mature emulators for those systems.

Sorg posted a comment in the FB group years ago about why FPGA retro gaming basically stops around Y2K, and it's due to multiple factors, including how fast CPUs progressed in frequency and complexity. Some of the new FPGAs seem very impressive but are also very expensive, especially from AMD and Intel.

The Amiga core's CPU is held back by the TG68K open-source CPU module rather than by the DE10.

MiSTeX is a weird project: it converts the framework, but the cores still need porting to the target FPGA, and with support for multiple FPGAs they will need multiple ports, so it might turn into a mess.

barsmonster · Unread post by **barsmonster** » Tue Apr 30, 2024 8:16 am

1) It is very attractive to have the GoWin 138K module installed on a single "motherboard" that has everything: multiple USB ports directly without hubs, fast memory for today's level of emulation, a Bluetooth/WiFi module, an NVMe slot for proper storage, analog video output, and still Ethernet. This would make the product much easier to assemble than now, and allow solid slim cases. In hardware versions where some of these features are unnecessary, the components can be left unpopulated.
Update: A slim metal case also allows transferring FPGA heat to the case via silicone pads, potentially without an active cooler.

2) Migrating to a new FPGA platform is surely a very steep ask. It might be more attractive to do so on an open-source synthesis flow (Yosys / Project Apicula - https://github.com/YosysHQ/apicula ). This would make it a "last and forever" port, as future hardware changes will likely use the same open-source synthesis toolset (i.e. a much smaller delta when porting), provided there is some level of hardware abstraction. Unfortunately, while many GoWin parts are already supported by Apicula, the 138K is not. Among the large FPGAs supported by Yosys today are Xilinx ones, including large parts (300K LE+) available in China. While they could be used for development in the short term, that might not be a sustainable path.

3) Efinix FPGAs are also attractive, as they offer many sizes in the same package at lower cost. It might be possible to have a low-end MiSTer LE with enough hardware to comfortably support 8-16-bit consoles (which is enough for many users), and full capabilities in a MiSTer PRO. This might make MiSTer accessible to a wider range of users, as a fully decked-out MiSTer is now quite expensive. But if the FPGA is on a module, package size is less of a concern.

Still I think open source synthesis flow has largest potential in the long term.

Sorgelig · Unread post by **Sorgelig** » Tue Apr 30, 2024 8:27 am

It's also important what technology process is used for the FPGA. GoWin, AFAIK, uses a 60nm process, which is way too slow. Also, the GoWin internal PLL is rubbish, so it will be hard or impossible to port cores with their native clock rates.
I've made one project on the GoWin Tang 9K board. It's a very low-end and slow FPGA. If the 138K version is a simple scale-up in LEs, then it's not even worth trying.

barsmonster · Unread post by **barsmonster** » Tue Apr 30, 2024 8:38 am

Sorgelig wrote: ↑Tue Apr 30, 2024 8:27 am
If the 138K version is a simple scale-up in LEs, then it's not even worth trying.

GoWin 138K is 22nm: https://www.gowinsemi.com/en/about/deta ... t_news/76/

barsmonster · Unread post by **barsmonster** » Tue Apr 30, 2024 7:09 pm

barsmonster wrote: ↑Tue Apr 30, 2024 8:38 am
GoWin 138K is 22nm: https://www.gowinsemi.com/en/about/deta ... t_news/76/

Some high-level comparison:
GoWin GW5AST-138 (version with embedded RISC-V SoC) vs Altera 5CSEBA6U23I7:

Size:
Block memory: 6120 vs 5570 Kb
Distributed RAM: 1080 vs 621 Kb
Slightly fewer registers: 138K vs 166K
DSP blocks: 298 vs 224
PLLs: 12 (integer) vs 6+3 (fractional)

Performance:
Technology: 22nm vs 28nm
Core voltage: 0.9V/1.0V vs 1.1V (together with 22nm, we can expect lower power consumption)
Transceivers: 8 (12.5 Gbps) + hard PCIe 2.0 x8 vs 0
SoC clock: 800 MHz RISC-V AE350 vs 800 MHz ARM Cortex-A9
Block RAM frequency: 380 MHz vs 275 MHz
PLL VCO: 800-2000 MHz vs 600-1600 MHz

So the GW5AST-138 is slightly larger and slightly faster almost across the board.
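To put rough numbers on "slightly larger and slightly faster", this Python sketch simply divides the Gowin figure by the Altera figure for each row of the comparison above. The values are copied from the post (units as posted, since only the ratios matter), so the result is only as good as those datasheet figures.

```python
# Gowin GW5AST-138 vs Altera 5CSEBA6U23I7, figures as listed in the post.
# Each entry is (gowin, altera); units cancel out in the ratio.
COMPARISON = {
    "block_ram": (6120, 5570),
    "distributed_ram": (1080, 621),
    "registers": (138, 166),
    "dsp_blocks": (298, 224),
    "block_ram_fmax": (380, 275),
    "pll_vco_max": (2000, 1600),
}


def ratios(table):
    """Return the Gowin/Altera ratio for each row, rounded to 2 decimals."""
    return {name: round(gowin / altera, 2) for name, (gowin, altera) in table.items()}


if __name__ == "__main__":
    for name, r in ratios(COMPARISON).items():
        print(f"{name:16s} {r:5.2f}x")
```

The ratios range from about 0.83x (registers) to about 1.74x (distributed RAM), with block RAM capacity at about 1.10x.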

Main limitations are:
1) The integer PLL will limit output frequency accuracy. I'm not sure it's critical, as the VCO frequency is quite high and/or we have so many PLLs that we can cascade them for more flexibility. If needed, we can gather statistics on the frequencies used in cores and select 1-2 extra quartz oscillators (which are quite cheap) on the main board to make sure the required frequencies can be synthesized with high accuracy. If an integer PLL can get the frequency right, it will likely have better phase noise performance than a fractional one.
2) A likely quirky SoC (even though performance is likely the same). As it's early days for the GoWin SoC / RISC-V, I expect pain and suffering in making it do what we need it to do. A few ideas on how to address this will follow. The open-source flow is unlikely to support the SoC in a reasonable timeframe.
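Whether a given core clock is reachable with an integer-only PLL is a quick search over divider settings. The Python sketch below assumes a generic PLL structure (f_vco = f_ref * m / n, f_out = f_vco / c) and the 800-2000 MHz VCO window from the comparison; the divider limits `max_m`, `max_n`, `max_c` are hypothetical placeholders, not Gowin datasheet values.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PllSetting:
    m: int             # feedback multiplier
    n: int             # reference (input) divider
    c: int             # output divider
    f_out_hz: float    # resulting output frequency
    error_ppm: float   # (f_out - target) / target, in parts per million


def best_integer_pll(f_ref_hz: float, f_target_hz: float,
                     vco_min_hz: float = 800e6, vco_max_hz: float = 2000e6,
                     max_m: int = 128, max_n: int = 64,
                     max_c: int = 128) -> Optional[PllSetting]:
    """Brute-force an integer-only PLL: f_vco = f_ref * m / n, f_out = f_vco / c.

    Divider limits are hypothetical placeholders, not vendor datasheet values.
    """
    best: Optional[PllSetting] = None
    for n in range(1, max_n + 1):
        for m in range(1, max_m + 1):
            f_vco = f_ref_hz * m / n
            if not vco_min_hz <= f_vco <= vco_max_hz:
                continue
            # The best output divider is one of the two integers bracketing f_vco / f_target.
            ideal = f_vco / f_target_hz
            for c in {math.floor(ideal), math.ceil(ideal)}:
                if not 1 <= c <= max_c:
                    continue
                f_out = f_vco / c
                err = (f_out - f_target_hz) / f_target_hz * 1e6
                if best is None or abs(err) < abs(best.error_ppm):
                    best = PllSetting(m, n, c, f_out, err)
    return best


if __name__ == "__main__":
    for target in (74.25e6, 85.909088e6):
        print(target, best_integer_pll(27e6, target))
```

For a single output from a 27 MHz reference, 74.25 MHz is reachable exactly (e.g. a 891 MHz VCO divided by 12) and 85.909088 MHz lands within a fraction of a ppm (e.g. 945 MHz divided by 11). What this single-output sketch does not model is several related clocks sharing one VCO, or a clock that has to be nudged at runtime.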

barsmonster · Unread post by **barsmonster** » Tue Apr 30, 2024 11:03 pm

On SoC: Any embedded hard core will always be a significant long-term vendor lock-in. It won't be easy to migrate, for example to the embedded RISC-V: it's quite fresh, and we would have to endure all the pain and suffering (at least without extensive GoWin technical support). As open-source synthesis support of any hard processor is unlikely, the following can be considered:

1) A separate compute module, for example a CM-style module. CM modules from Raspberry Pi are quite expensive, but alternative suppliers (for example Orange Pi) can be quite a bit cheaper. The level of community support varies. The CM can output overlay video (for example via DSI into the FPGA), forward controller input via SPI, and forward internet and other data via serial port(s). This opens the possibility of upgrading the system at a later stage by installing a maximum-performance CM4 or future CM5 module for cores that do significant work on the SoC side (for example emulation of more modern consoles: PS2, and in the future PS3). With a CM4/5, an NVMe SSD will be straightforward to support. It is important to have good thermal contact to the chassis and/or an integrated cooling solution for both the CM and FPGA modules, as the CM4 already runs quite hot (and the CM5 will be even hotter).

2) A separate low-end Linux module, like the Milk-V Duo. No direct video overlay; instead, the text UI is sent to the FPGA via SPI or similar.

3) A soft-core, Linux-capable SoC baked into all cores, for example based on LiteX/VexRiscv. It will eat a small part of the FPGA and will run at ~15-25% of the speed of today's hard SoC (~150-200 MHz). But it will be completely under our control and portable to any FPGA architecture in the future. As MiSTer does not do much computation on the SoC side, it can work well enough.

4) Dual FPGA modules: during normal operation, one is used for the soft-core SoC and the second for emulation. For maximum capability, the largest cores can utilize both FPGAs. It can be made so that there is a minimal configuration with one FPGA module, and a second FPGA module can be installed for large-core support. The minimal configuration can have a smaller FPGA chip, for example 60K LE (to support smaller 8-16-bit cores at lower cost), with the second FPGA on a full-size (138K) module. So the 60K chip can be soldered on the board, and the 138K module can be installed later as an upgrade for large cores.
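Options 1 and 2 both rely on the SBC forwarding small messages (e.g. controller input) to the FPGA over SPI. A minimal framing sketch in Python, assuming a made-up layout of sync byte, type, length, payload and XOR checksum; `SYNC`, `MSG_JOYSTICK` and the joystick payload format are all hypothetical and are not the MiSTer or MiSTeX protocol.

```python
import struct

SYNC = 0xA5            # hypothetical start-of-frame marker
MSG_JOYSTICK = 0x01    # hypothetical message type for controller state


def xor_checksum(data: bytes) -> int:
    """XOR of all bytes; any single corrupted byte changes the result."""
    c = 0
    for b in data:
        c ^= b
    return c


def encode_frame(msg_type: int, payload: bytes) -> bytes:
    """Frame = SYNC | type | length | payload | checksum(type, length, payload)."""
    if len(payload) > 255:
        raise ValueError("payload too long for 1-byte length field")
    body = bytes([msg_type, len(payload)]) + payload
    return bytes([SYNC]) + body + bytes([xor_checksum(body)])


def decode_frame(frame: bytes) -> tuple[int, bytes]:
    """Validate a frame and return (msg_type, payload); raise ValueError if bad."""
    if len(frame) < 4 or frame[0] != SYNC:
        raise ValueError("missing sync byte")
    msg_type, length = frame[1], frame[2]
    if len(frame) != 4 + length:
        raise ValueError("length mismatch")
    body = frame[1:3 + length]
    if xor_checksum(body) != frame[3 + length]:
        raise ValueError("checksum mismatch")
    return msg_type, frame[3:3 + length]


def joystick_payload(port: int, buttons: int, x: int, y: int) -> bytes:
    """Hypothetical layout: port (u8), 32-bit button mask (LE), signed 8-bit X/Y."""
    return struct.pack("<BIbb", port, buttons, x, y)
```

On the FPGA side the same frame would be parsed by a small state machine. An XOR checksum is the simplest thing that catches a single corrupted byte; a CRC would be a sturdier choice on a noisy link.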

Common for all cases:
DRAM: As we don't need much DRAM by today's standards, does it make sense to remove DDR3 and have more channels of HyperRAM? Such a low-latency FPGA module can be produced if we have a large order volume. A new module might make sense anyway if we go with the GW5AT chip rather than the GW5AST (i.e. without the hard SoC, if it is cheaper in high volume).
High-speed IO: The necessity of high-speed interfaces needs to be seriously considered. Non-transceiver chips are cheaper anyway, and transceivers are hard to support, especially in open source. I like NVMe drives, but they are not easy to get working when connected directly to an FPGA. Non-transceiver GoWin parts can go as high as 1.5 Gbps on all pins, so maybe even SATA1 to an M.2 SSD can work eventually, which could be a reliable and fast-enough data storage solution.
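For scale: SATA1 runs at a 1.5 Gbit/s line rate with 8b/10b encoding, so its usable bandwidth before protocol overhead is

$$
1.5\ \text{Gbit/s} \times \frac{8}{10} = 1.2\ \text{Gbit/s} = 150\ \text{MB/s}
$$

Real throughput is lower once framing and command overhead are included, but it is an upper bound worth keeping in mind when comparing against NVMe.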

With no hard IP lock-in and an open-source synthesis flow, it will be quite easy for amateurs to port MiSTer 2.0 to new architectures in the following decades, which might unleash some creative products.

Sorgelig · Unread post by **Sorgelig** » Thu May 02, 2024 10:17 am

It's still unclear what they mean by "22nm SRAM process". Why not simply write "22nm process"? Sounds fishy to me.
The PLL's role in cores is very important. Some cores may use as many as 6 clocks from a single PLL. Most cores use a fractional divisor. HDMI absolutely needs a fractional PLL, or the pixel clocks won't be good. vsync_adjust won't work with an integer PLL, etc., etc. In other words, a fractional PLL is a must!
Besides the integer-only mode of the PLL, in their small FPGAs it was hard to use more than 1 output from a PLL. So this is another big issue (unless it's fixed in newer FPGAs). The only good thing about their PLL was the ability to feed any signal as a clock source, unlike Intel FPGAs.
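As an illustration of why vsync_adjust leans on a finely adjustable pixel clock: to make the HDMI output refresh at a core's native rate instead of 60 Hz, the pixel clock has to scale with that rate. A minimal Python sketch, assuming 720p CEA-861 timing (1650 x 750 total pixels per frame) and an NTSC NES refresh of about 60.0988 Hz; it is not MiSTer's actual vsync_adjust logic.

```python
def required_pixel_clock_hz(h_total: int, v_total: int, refresh_hz: float) -> float:
    """Pixel clock that makes a mode with this total timing refresh at refresh_hz."""
    return h_total * v_total * refresh_hz


def seconds_per_slipped_frame(output_hz: float, source_hz: float) -> float:
    """Interval between dropped or repeated frames when output and source rates differ."""
    delta = abs(output_hz - source_hz)
    return float("inf") if delta == 0 else 1.0 / delta


if __name__ == "__main__":
    # 1280x720@60 uses 1650 x 750 total pixels per frame (CEA-861 timing).
    h_total, v_total = 1650, 750
    nes_hz = 60.0988  # approximate NTSC NES refresh rate
    print(required_pixel_clock_hz(h_total, v_total, 60.0))    # standard 720p60 clock
    print(required_pixel_clock_hz(h_total, v_total, nes_hz))  # clock to match the NES
    print(seconds_per_slipped_frame(60.0, nes_hz))            # drift at a fixed 60 Hz
```

With these numbers the required clock is about 74.372 MHz instead of 74.25 MHz, and leaving the output at exactly 60 Hz drops or repeats a frame roughly every 10 seconds. The target moves with every core's refresh rate, which is where a finely adjustable PLL makes life easy.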

MiSTer uses almost all aspects of the integrated SoC-FPGA chip. It uses shared memory between the SoC and FPGA. DDR3 is used as a frame buffer for HDMI.
HyperRAM/PSRAM cannot replace SDR SDRAM due to latency and uncontrolled refresh cycles.
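To make the latency point concrete, here is a rough Python model of time-to-first-data. Assumptions: a HyperBus-style command/address phase of 3 clocks, an initial latency that doubles when the device's internal refresh collides with the access (variable-latency mode), and an SDR SDRAM read costing tRCD + CAS latency on a closed row or just CAS latency on an open one. All parameter values in the example are illustrative, not taken from a specific part, and the model ignores command and turnaround details.

```python
def hyperram_first_data_ns(clock_mhz: float, initial_latency_clocks: int,
                           refresh_collision: bool, ca_clocks: int = 3) -> float:
    """Approximate time to first read data on a HyperBus-style device.

    ca_clocks: command/address phase (48 bits over an 8-bit DDR bus = 3 clocks).
    A pending internal refresh doubles the initial latency; the controller
    cannot schedule around it because the device decides when to refresh.
    """
    latency = initial_latency_clocks * (2 if refresh_collision else 1)
    return (ca_clocks + latency) * 1000.0 / clock_mhz


def sdram_first_data_ns(clock_mhz: float, t_rcd_clocks: int, cas_latency: int,
                        row_open: bool) -> float:
    """SDR SDRAM read: tRCD only if the row must be activated first, then CAS latency."""
    clocks = (0 if row_open else t_rcd_clocks) + cas_latency
    return clocks * 1000.0 / clock_mhz


if __name__ == "__main__":
    print(hyperram_first_data_ns(100.0, 6, refresh_collision=False))
    print(hyperram_first_data_ns(100.0, 6, refresh_collision=True))
    print(sdram_first_data_ns(100.0, 2, 2, row_open=True))
```

At an example 100 MHz with 6 latency clocks, the HyperRAM read takes 90 ns normally and 150 ns on a refresh collision, and the controller cannot predict which it gets. With SDR SDRAM the controller issues refresh commands itself, so it can keep access timing deterministic, which matters for cores that expect fixed memory timing.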

I didn't mention all aspects. Most likely everything can be redone or solved, but it will require a very different framework, and even some parts of the cores will have to be rewritten.

barsmonster · Unread post by **barsmonster** » Tue May 07, 2024 4:10 am

Sorgelig wrote: ↑Thu May 02, 2024 10:17 am
It's still unclear what they mean by "22nm SRAM process". Why not simply write "22nm process"? Sounds fishy to me.

They write it to differentiate it from the EEPROM process, which is typically used in low-power but slower devices. For example, the smaller GoWin LittleBee GW1NR-9 uses a 55nm EEPROM process.

I've checked a few popular cores, and indeed the requested frequencies are very fractional. I guess the only thing that could have allowed the use of simpler FPGAs is 1-2 external frequency synthesizers with multiple outputs. But that surely makes the system much more complicated and harder to maintain.

NESTang on a GoWin FPGA uses a PLL at 27 MHz * 55 / 4 = 371.25 MHz to finally get a 74.25 MHz pixel clock (720p).
For comparison, the MiSTer NES core uses 85.909088 MHz.
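The NESTang arithmetic works out as follows. Reading 371.25 MHz as five times the pixel clock matches the usual 10:1 DDR TMDS serializer clocking, but treat that step as an assumption. 74.25 MHz itself is the standard 720p60 pixel clock, since CEA-861 720p60 timing has 1650 x 750 total pixels per frame. For reference, the MiSTer NES figure is four times the 21.477272 MHz NTSC NES master clock.

$$
\begin{aligned}
f_\text{PLL} &= 27\ \text{MHz} \times \tfrac{55}{4} = 371.25\ \text{MHz},\\
f_\text{pix} &= \tfrac{371.25\ \text{MHz}}{5} = 74.25\ \text{MHz} = 1650 \times 750 \times 60\ \text{Hz},\\
f_\text{NES,MiSTer} &= 4 \times 21.477272\ \text{MHz} = 85.909088\ \text{MHz}.
\end{aligned}
$$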

Meanwhile, I realized that what I was proposing ("external single-board computer + FPGA") is implemented in MiSTeX. It uses SPI + 4 GPIO lines to interface between the SBC and FPGA. The main issue, as far as I can see right now, is slow FPGA reconfiguration, but it's still very early days. MiSTeX developers are very sceptical about soft-core SoC performance, though.
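To first order, reconfiguration time over an SBC-to-FPGA link is the bitstream size divided by the link bandwidth. A back-of-envelope Python sketch, assuming the bitstream is shifted over an SPI-style serial link; the 4 MiB bitstream, 25 MHz clock and lane counts in the example are placeholders, not measured Gowin or MiSTeX figures.

```python
def config_time_s(bitstream_bytes: int, clock_hz: float, data_lines: int = 1,
                  overhead: float = 0.0) -> float:
    """Lower-bound time to shift a bitstream over an SPI-style link.

    data_lines: 1 for plain SPI, 4 for a quad-width transfer.
    overhead: extra fraction for commands, gaps and software (0.1 = +10%).
    """
    bits = bitstream_bytes * 8
    return bits / (clock_hz * data_lines) * (1.0 + overhead)


if __name__ == "__main__":
    size = 4 * 1024 * 1024  # hypothetical 4 MiB bitstream
    for lines in (1, 4):
        print(lines, round(config_time_s(size, 25e6, lines), 3), "s")
```

With these placeholder numbers that is about 1.34 s on one data line and about 0.34 s on four, before any overhead, which suggests the link width and clock are the first things to look at.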

MiSTer FPGA Forum
