Basic architecture questions

Discussion of developmental aspects of the MiSTer Project.
ethern0t
Posts: 40
Joined: Sat Jun 06, 2020 3:42 pm

I'm having a hard time finding any documentation on how the MiSTer system is architected. Maybe it's buried over in the dead Atari forums?

Anyway, I'm happy to contribute to any sort of documentation myself - in particular, a quick start guide for developers (for example, I'm not the only one who didn't notice the "you need to use version 17 of Quartus" at the bottom of the docs for the template core).

1. How is a VGA signal generated?

My assumption is that each core generates appropriate timings and generates Vsync, Hsync, and pixel data directly.

2. How is an HDMI signal generated?

Generating a 720p or better signal seems like it would take up a huge amount of bandwidth. Perhaps it's really just a dumb frame buffer in shared FPGA/ARM memory? I keep hearing that the HDMI logic doubles the compile time of most cores, which is why a lot of people doing core development disable it entirely.

3. How do you disable HDMI to speed up compilation?

The NES core references a DEBUG_NOHDMI symbol but there doesn't seem to be a consistent way to disable HDMI across cores. It looks like some preprocessor defines are being set in sys/*.tcl files.

4. How do you launch a particular core from the Quartus Programmer?

If the core doesn't need access to any ROM, like the Vectrex one or the SDRam test, it seems to work fine. Anything else seems to do nothing for me; it's not clear whether the Linux side needs to be running or not, and specifically how you make that happen. There's mention of a boot.rom, but I can't figure out the appropriate format.
Grabulosaure
Core Developer
Posts: 78
Joined: Sun May 24, 2020 7:41 pm
Location: Mesozoic
Has thanked: 3 times
Been thanked: 92 times

@ethern0t
1) Each core generates its own video signal.
Preferably it is as close as possible to that of the original computer/game console; this is what is normally output on the analog VGA output.
- Some arcade cores had a vertical screen, so there is an optional screen rotation block that stores a frame in FPGA RAM blocks to rotate lines/columns.
- Handheld consoles such as the GameBoy had non-standard modes.

2) There is a scaler.
It uses a framebuffer in DDR3 memory, with a 128-bit wide data bus and 256-byte burst transfers, giving more than 1000MB/s. The original images from the cores are stored in DDR3, not the up-scaled images, which are generated on the fly, so bandwidth requirements aren't that high.
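
As a rough sanity check on the bandwidth point, the sketch below estimates the DDR3 traffic for writing each original image once and reading it back once. The 256x240 source size, 4 bytes per pixel, and 60 frames per second are illustrative assumptions, not values taken from the scaler itself, and the real read pattern may differ.

```python
def framebuffer_traffic_mb_s(width, height, bytes_per_pixel, fps):
    """Approximate traffic in MB/s for one write plus one read of each frame."""
    frame_bytes = width * height * bytes_per_pixel
    return 2 * frame_bytes * fps / 1_000_000

# Assumed example source image: 256x240 pixels, 4 bytes per pixel, 60 frames/s.
low_res = framebuffer_traffic_mb_s(256, 240, 4, 60)
# For comparison only: what storing an already up-scaled 1280x720 image would cost.
upscaled = framebuffer_traffic_mb_s(1280, 720, 4, 60)

print(f"original image: {low_res:.1f} MB/s")
print(f"up-scaled 720p: {upscaled:.1f} MB/s")
```

Under these assumptions the original image needs roughly 29 MB/s against roughly 442 MB/s for a stored 720p image, both well under the 1000MB/s quoted above.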

3) Except for trivial cores which don't do much, disabling the scaler won't reduce compilation time that much.

4) Download the .BIT file after the Linux side has booted normally (showing the menu...). ROM files are stored in the respective directory of each core.
ethern0t

@Grabulosaure, regarding HDMI - does the core specify the video resolution and bit depth, with the ARM/Linux side allocating appropriate memory and giving the FPGA access to it (coming out of the gigabyte of shared RAM, IIRC)?

Do most cores just completely "fill" this memory every frame from scratch?
Grabulosaure

@ethern0t
The scaler auto-detects image sizes (counting horizontal pixels with the DE signal, the number of active lines, interlaced mode, etc.).
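
A minimal software model of that kind of detection, assuming DE is sampled once per pixel clock: count the DE-active pixels on each line and count the lines that contain any. The function name and the test frame are hypothetical, and the real scaler is HDL that also handles interlaced modes, which this sketch ignores.

```python
def detect_active_size(de_lines):
    """Infer (width, height) from per-line DE samples.

    de_lines: iterable of scanlines, each a sequence of 0/1 DE values,
    one per pixel clock, blanking included.
    """
    width = 0
    height = 0
    for line in de_lines:
        active = sum(1 for de in line if de)
        if active:
            height += 1                  # this line carries visible pixels
            width = max(width, active)   # widest run of active pixels seen
    return width, height

# Hypothetical frame: 262 lines of 341 pixel clocks with a 256x240 active area.
frame = []
for y in range(262):
    visible = 16 <= y < 256
    frame.append([1 if visible and 40 <= x < 296 else 0 for x in range(341)])

print(detect_active_size(frame))
```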
There is 1GB on the DE10-Nano.
512MB is used by Linux: 0000_0000 ... 1FFF_FFFF
3*8MB is used by the scaler, for triple-buffering: 2000_0000 ... 207F_FFFF for the first buffer, 2080_0000 for the second.
Remaining memory is used for some audio buffers and cores.
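
Taking the 8MB buffer size and the 2000_0000 base address at face value, the address ranges can be worked out as below. Note that 8MB is 0x80_0000 bytes. Laying the three buffers out back to back is an assumption made for this sketch, and the constant and function names are made up for illustration.

```python
MB = 1024 * 1024
SCALER_BASE = 0x2000_0000   # start of the scaler area, as stated in the post
BUFFER_SIZE = 8 * MB        # 8MB per buffer, as stated in the post
NUM_BUFFERS = 3             # triple buffering

def fmt(addr):
    """Format an address in the XXXX_XXXX style used in this thread."""
    return f"{addr >> 16:04X}_{addr & 0xFFFF:04X}"

def buffer_ranges(base=SCALER_BASE, size=BUFFER_SIZE, count=NUM_BUFFERS):
    """Return (first, last) byte addresses of each buffer, assumed contiguous."""
    return [(base + i * size, base + (i + 1) * size - 1) for i in range(count)]

print("Linux   :", fmt(0), "...", fmt(512 * MB - 1))
for i, (first, last) in enumerate(buffer_ranges()):
    print(f"buffer {i}:", fmt(first), "...", fmt(last))
```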

When using the framebuffer mode (e.g. for the Linux console or ScummVM), the base address for the scaler is moved to the Linux-owned area and the framebuffer driver does its magic.

Each pixel generated by the cores is written to DDR3 by the scaler, then read back to generate an image on the HDMI output.
Most cores (particularly consoles) don't have a real framebuffer where images are rendered; images are generated on the fly by overlaying background, objects, and other effects.
ethern0t

@Grabulosaure, that all makes sense, thank you!

I tried booting the MiSTer, waiting for the OSD to show up, and then ran my core (Arcade_Bagman) from the Programmer, but it still just produced a black screen, so I'm still doing something wrong. Is there something I have to edit in the core itself (which I understand may vary between cores) in order to boot it directly?

My goal is to create a "learning core" that can focus on the parts I care most about (cpu, tile and sprite hardware) and not have to worry so much about video signal generation and input device handling.

-Dave
ethern0t

Regarding synthesis time... I have a recent MacBook Pro (8x2 cores, an i9 I think, but mobile rather than desktop, obviously).

I tried the Template project, and on the first compile the timings (mm:ss) were:

A&S: 1:32, Fitter: 3:30, total time was 5:20.

I tried setting DEBUG_NOHDMI in mycore.qsf, and the timings changed to

A&S: 0:21, Fitter: 1:53, total time was 2:28

To double-check there weren't other factors, I turned DEBUG_NOHDMI back off again and got:

A&S: 0:57, Fitter: 3:24, total time was 4:36

So it seems like Analysis & Synthesis only rebuilds what is necessary, but the fitter is kind of like a linker in software and pretty much has to run from scratch every time.

Coming from a large-scale C++ software development background, these iteration times are something I'll have to adjust to. Maybe it means I do a lot of work in Verilator first.

It also seems like disabling HDMI support does substantially improve iteration time though?
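
To put the three Template core runs side by side, here is a small sketch that converts the quoted mm:ss figures to seconds and compares each run with the second HDMI-enabled build. The labels are mine, and choosing that second build as the baseline is an arbitrary decision made for this comparison.

```python
def to_seconds(mmss):
    """Convert an 'm:ss' string such as '5:20' to seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)

# Template core timings quoted above: (A&S, Fitter, total).
runs = {
    "HDMI on, first build": ("1:32", "3:30", "5:20"),
    "DEBUG_NOHDMI": ("0:21", "1:53", "2:28"),
    "HDMI on, rebuilt": ("0:57", "3:24", "4:36"),
}

# Use the rebuilt HDMI run as the baseline for comparison.
base_fit = to_seconds(runs["HDMI on, rebuilt"][1])
base_total = to_seconds(runs["HDMI on, rebuilt"][2])

for name, (_synth, fitter, total) in runs.items():
    fit_s, total_s = to_seconds(fitter), to_seconds(total)
    print(f"{name:22s} fitter {fit_s / base_fit:.0%}, total {total_s / base_total:.0%}")
```

With these figures the DEBUG_NOHDMI build comes out at a little over half the fitter time and total time of the rebuilt HDMI run.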

-Dave
ethern0t

Hmm... just noticed your qualifier of "except for cores that don't do much..."

I ran clean.bat in the NES core and then built it from scratch (I couldn't figure out whether it supported disabling HDMI; there certainly wasn't a DEBUG_NOHDMI symbol I could find).

A&S: 3:06, Fitter: 9:54, total time was 13:19

Without a way to disable HDMI in that core I can't really compare but it might save a minute or two if the numbers scale linearly from the template project.
ethern0t

This article looks useful (ways to improve compilation time)

https://www.intel.com/content/dam/www/p ... i52022.pdf

-Dave
moscow
Posts: 4
Joined: Sun May 24, 2020 7:23 pm

ethern0t wrote: Wed Jun 10, 2020 12:19 pm This article looks useful (ways to improve compilation time)

https://www.intel.com/content/dam/www/p ... i52022.pdf

-Dave

Are you using Quartus Lite, or Pro/Standard? The first one seems not to have incremental building.
Did you (or maybe somebody else) find any specific tweaks that improve the speed of compilation on the Lite version?
-chris-
ethern0t

Just the Lite version, and leaving something like that out of the free version doesn't surprise me. I'll play with some of the other settings though and see if they help. The fitter seems to be the real pig.
ethern0t

I changed Settings / Compiler Settings / Optimization mode from Performance (High Effort) to Balanced (Normal flow).

It didn't really help A&S at all (took 3:19) or the fitter for that matter (took 9:02). Total time was 12:53.

I changed Settings / Compiler Settings / Advanced Analysis & Synthesis Settings / Synthesis Effort to Fast

and Advanced Settings (Fitter) / Router Timing Optimization Level to Minimum, Physical Synthesis Effort Level to Fast, Fitter Effort to Fast Fit.

This was immediately after the previous run, and it seems like A&S does avoid redoing some work if modules didn't change.

A&S took 2:39 and fitting took 7:35, total time was 10:48, a slight improvement.
nullobject
Core Developer
Posts: 16
Joined: Mon May 25, 2020 12:31 am
Has thanked: 5 times
Been thanked: 13 times

ethern0t wrote: Tue Jun 09, 2020 4:31 pm Coming from a large-scale C++ software development background, these iteration times are something I'll have to adjust to. Maybe it means I do a lot of work in Verilator first.

It also seems like disabling HDMI support does substantially improve iteration time though?

Yes, unfortunately long compile times are the norm with MiSTer cores. I have heard jotego say that he gets much faster build times when compiling for MiST vs. MiSTer.

I can also attest to this. When I began building the Rygar core, I developed it without the MiSTer framework until I had most things completed. The MiSTer framework adds quite a chunk to the compilation time, even with HDMI disabled.

My advice is to focus on a single component at a time in standalone projects (e.g. sound, video, tile rendering, etc.). When you have it working, then you can integrate it back into your main core.

This is a good way to keep compilation times shorter. It also reduces the complexity of whatever it is you are implementing/debugging.
MiSTer core developer: Rygar, Gemini Wing, Silkworm, CAVE
Support me on Patreon
ethern0t

Interesting. I can confirm that even the Template core still takes a few minutes to iterate on.

Maybe some sort of "minimal MiSTer" configuration define would be useful? Essentially it turns off everything except VGA and... something... PS/2 keyboard support has got to be pretty simple. I don't know what other parts are expensive to build.

It would still keep the same 'emu' interface, but hopefully would be much faster to iterate on?

Then once you have things working, turn the define off and get the full integration and device support.

-Dave
Sorgelig
Site Admin
Posts: 877
Joined: Thu May 21, 2020 9:49 pm
Has thanked: 2 times
Been thanked: 211 times

nullobject wrote: Fri Jun 12, 2020 7:32 am
ethern0t wrote: Tue Jun 09, 2020 4:31 pm Coming from a large-scale C++ software development background, these iteration times are something I'll have to adjust to. Maybe it means I do a lot of work in Verilator first.

It also seems like disabling HDMI support does substantially improve iteration time though?
Yes, unfortunately long compile times are the norm with MiSTer cores. I have heard jotego say that he gets much faster build times when compiling for MiST vs. MiSTer.

I can also attest to this. When I began building the Rygar core, I developed it without the MiSTer framework until I had most things completed. The MiSTer framework adds quite a chunk to the compilation time, even with HDMI disabled.

My advice is to focus on a single component at a time in standalone projects (e.g. sound, video, tile rendering, etc.). When you have it working, then you can integrate it back into your main core.

This is a good way to keep compilation times shorter. It also reduces the complexity of whatever it is you are implementing/debugging.

Completely misleading conclusion. Compilation time depends on the FPGA fabric and the Quartus version.
1) Using Quartus 13 gives faster compilation than Quartus 17, but support for Cyclone V in Q13 was preliminary and it sometimes produces a misbehaving core. So Q13 is good for debug compilation, but releases should be compiled in Quartus 17.
2) Compiling for Cyclone III is about 4-5 times faster than for Cyclone V. This is why compilation for MiST is much faster than for MiSTer.

It has nothing to do with the MiSTer framework, besides about 5-10% longer compilation for a full-featured core (not the Template or Menu cores, which are basically empty cores).
nullobject

Sorgelig wrote: Sat Jun 13, 2020 1:35 pm Completely misleading conclusion. Compilation time depends on FPGA fabric and Quartus version.
1) Using Quartus 13 gives faster compilation than Quartus 17, but support for Cyclone V in Q13 was preliminary and it sometimes produces a misbehaving core. So Q13 is good for debug compilation, but releases should be compiled in Quartus 17.
2) Compiling for Cyclone III is about 4-5 times faster than for Cyclone V. This is why compilation for MiST is much faster than for MiSTer.

It has nothing to do with the MiSTer framework, besides about 5-10% longer compilation for a full-featured core (not the Template or Menu cores, which are basically empty cores).

Fair enough. Maybe I misunderstood what jotego was telling me. Thanks for correcting me :D