Lets actually try Hybrid Emulation

Neocaron

I like this Frankenstein approach; it could lead to otherwise impossible cores on the MiSTer. If this pans out it could have great implications. How many CPU cores can be used for emulation on the ARM chip, and at what speed?
foft

It’s a dual-core ARM Cortex-A9 at 800MHz. Clearly some of that needs to be used for handling the USB stack and SD card access.
I was thinking of dedicating one of the CPUs to the 68k emulation. I guess theoretically we could do smarter stuff like running JIT compilation on one and execution on the other. I will just be doing something basic to get this running ‘pretty well’, then leave further optimisation for others.
zakk4223

You should probably stick to a single dedicated core. If you look at a running MiSTer, one of the cores is completely pinned at 100% usage; that's Main_MiSTer's polling loop. Any CPU time you steal from that loop can increase input processing delay, etc.

Honestly though, it's probably going to be the job of Main_MiSTer to schedule the 'companion' emulator properly when it starts it up. So maybe not really a concern for you, although I'd manually pin your CPU emulator to the unused core during testing.
Neocaron

foft wrote: Fri Apr 09, 2021 9:01 pm It’s dual arm A9’s at 800MHz. Clearly some of that needs to be used for handling the usb stack and sd card access.
I was thinking of dedicating one of the cpus to the 68k emulation. I guess theoretically could do smarter stuff like run jit compilation on one and execution on the other. I will just be doing something basic to get this running ‘pretty well’ then leave further optimisation for others.
Very interesting. If anyone knows, how would one CPU core compare to a 733MHz P3 in raw power?
Does the ARM chip have any overclocking capability? What is the lowest possible latency between the ARM and the FPGA?

Sorry for all the questions, but it's sooo interesting and exciting! :mrgreen:
Good luck and have fun!
foft

Re MiSTer polling: sure, we could make it interrupt-based if more CPU is needed. Anyway, yes, I plan to lock it to the 2nd CPU at real-time priority for now.
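
For anyone wanting to try the pinning during testing, here is a minimal sketch of locking a process to one core and asking for real-time priority on Linux. The helper names pin_to_cpu and make_realtime are illustrative and are not taken from Main_MiSTer or from the code discussed in this thread; sched_setaffinity and sched_setscheduler are the standard Linux calls.

```c
#define _GNU_SOURCE            /* needed for CPU_SET and sched_setaffinity */
#include <sched.h>
#include <string.h>

/* Pin the calling thread to one CPU. Returns 0 on success, -1 on error. */
int pin_to_cpu(int cpu)
{
    cpu_set_t set;

    if (cpu < 0 || cpu >= CPU_SETSIZE)
        return -1;             /* reject indices the mask cannot hold */

    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return sched_setaffinity(0, sizeof(set), &set);   /* pid 0 = caller */
}

/* Ask for SCHED_FIFO at the given priority (usually needs root). */
int make_realtime(int priority)
{
    struct sched_param sp;

    memset(&sp, 0, sizeof(sp));
    sp.sched_priority = priority;
    return sched_setscheduler(0, SCHED_FIFO, &sp);
}
```

Calling pin_to_cpu(1) and then make_realtime() with a priority of your choosing early in main() would be one way to do it; both calls return -1 if the kernel refuses.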

Re Pentium III: Unlikely! In raw performance I see some papers saying that the Cortex-A9 is comparable to an Atom, which does look close to a P3. Still, I’d expect it to beat the ao486 CPU and potentially leave space in that core for other items.
foft

Re latency question: it's probably a question of how long a request waits to be handled by the slower-clocked FPGA side.
foft

So continuing the dev... the actual component is defined in:
quartus/libraries/vhdl/wysiwyg/cyclonev_components.vhd

component cyclonev_hps_interface_clocks_resets
        generic (
                h2f_user0_clk_freq      :       natural := 100;
                h2f_user1_clk_freq      :       natural := 100;
                h2f_user2_clk_freq      :       natural := 100;
                lpm_type        :       string := "cyclonev_hps_interface_clocks_resets"        );
        port(
                f2h_cold_rst_req_n      :       in std_logic := '0';
                f2h_dbg_rst_req_n       :       in std_logic := '0';
                f2h_pending_rst_ack     :       in std_logic := '0';
                f2h_periph_ref_clk      :       in std_logic := '0';
                f2h_sdram_ref_clk       :       in std_logic := '0';
                f2h_warm_rst_req_n      :       in std_logic := '0';
                h2f_cold_rst_n  :       out std_logic;
                h2f_pending_rst_req_n   :       out std_logic;
                h2f_rst_n       :       out std_logic;
                h2f_user0_clk   :       out std_logic;
                h2f_user1_clk   :       out std_logic;
                h2f_user2_clk   :       out std_logic;
                ptp_ref_clk     :       in std_logic := '0'
        );
end component;
It looks like the generics are not set in either case.

So, I need to instantiate in sysmem and just plumb h2f_rst_n back out to the qsys I think.

signal                 dir  type       sysmem              hps fpga bridge  merged
f2h_cold_rst_req_n     in   std_logic  f2h_cold_rst_req_n  1                f2h_cold_rst_req_n
f2h_dbg_rst_req_n      in   std_logic  1                   1                1
f2h_pending_rst_ack    in   std_logic  1                   1                1
f2h_periph_ref_clk     in   std_logic
f2h_sdram_ref_clk      in   std_logic
f2h_warm_rst_req_n     in   std_logic  f2h_warm_rst_req_n  1                f2h_warm_rst_req_n
h2f_cold_rst_n         out  std_logic
h2f_pending_rst_req_n  out  std_logic
h2f_rst_n              out  std_logic  h2f_rst_n           h2f_rst_n        h2f_rst_n
h2f_user0_clk          out  std_logic  h2f_user0_clk                        h2f_user0_clk
h2f_user1_clk          out  std_logic
h2f_user2_clk          out  std_logic
ptp_ref_clk            in   std_logic
			
DONE: fingers crossed that it now works!
foft

So this is reading the first 100 words...

1114:4ef9:00f8:00d2:0000:ffff:0028:0044:0028:000a:ffff:ffff:0041:4d49:4741:2052:4f4d:204f:7065:7261:7469:6e67:2053:7973:7465:6d20:616e:6420:4c69:6272:6172:6965:7300:436f:7079:7269:6768:7420:a920:3139:3835:2d31:3939:3320:0043:6f6d:6d6f:646f:7265:2d41:6d69:6761:2c20:496e:632e:2000:416c:6c20:5269:6768:7473:2052:6573:6572:7665:642e:0033:2e31:2052:4f4d:2000:6578:6563:2e6c:6962:7261:7279:0065:7865:6320:3430:2e31:3020:2831:352e:372e:3933:290d:0a00:4e71:4e71:4afc:00f8:00b6:00f8:3706:0228:0969:00f8:008e:

Which seems to match the kickstart rom:
00000000 11 14 4e f9 00 f8 00 d2 00 00 ff ff 00 28 00 44 |..N..........(.D|
00000010 00 28 00 0a ff ff ff ff 00 41 4d 49 47 41 20 52 |.(.......AMIGA R|
00000020 4f 4d 20 4f 70 65 72 61 74 69 6e 67 20 53 79 73 |OM Operating Sys|
00000030 74 65 6d 20 61 6e 64 20 4c 69 62 72 61 72 69 65 |tem and Librarie|
00000040 73 00 43 6f 70 79 72 69 67 68 74 20 a9 20 31 39 |s.Copyright . 19|
00000050 38 35 2d 31 39 39 33 20 00 43 6f 6d 6d 6f 64 6f |85-1993 .Commodo|
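
A comparison like this can also be automated. The sketch below turns ROM bytes into the same colon-separated 16-bit words as the dump read over the bridge, assuming the bytes pair up big-endian as they do on the 68k. The names be_word and format_words are illustrative only.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Combine two consecutive ROM bytes into one big-endian 16-bit word. */
uint16_t be_word(const uint8_t *bytes, size_t word_index)
{
    return (uint16_t)((bytes[2 * word_index] << 8) | bytes[2 * word_index + 1]);
}

/* Write `count` words as "xxxx:" groups into out.
   Returns the number of characters written, or -1 if out is too small. */
int format_words(const uint8_t *bytes, size_t count, char *out, size_t out_size)
{
    size_t used = 0;

    if (out_size == 0)
        return -1;
    out[0] = '\0';

    for (size_t i = 0; i < count; i++) {
        if (out_size - used < 6)          /* "xxxx:" plus the terminator */
            return -1;
        used += (size_t)snprintf(out + used, out_size - used, "%04x:",
                                 (unsigned)be_word(bytes, i));
    }
    return (int)used;
}
```

Feeding it the first eight bytes of the hexdump above (11 14 4e f9 00 f8 00 d2) gives the string 1114:4ef9:00f8:00d2: which is how the bridge dump starts.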

so then I did this on the arm:
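for (;;)
{
    /* 0xdff180 is COLOR00, the background colour register.
       virtual_base16 is indexed in 16-bit words, hence i/2. */
    int i = 0xdff180;
    virtual_base16[i/2] = rand();
}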

and I get this:
Attachment: IMG_8644.JPG (3.33 MiB)
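
For readers wondering where a pointer like virtual_base16 could come from, here is a rough sketch of mapping a bridge window through /dev/mem. It is built on assumptions: that the Amiga address space starts at the Cyclone V HPS-to-FPGA bridge base (0xC0000000), and that 16MB is enough to cover the 68000's 24-bit address range. The actual base and span used in this project are not stated in the thread, and map_bridge, word_index and write_reg16 are made-up names.

```c
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

#define BRIDGE_BASE 0xC0000000u   /* ASSUMPTION: HPS-to-FPGA bridge base */
#define BRIDGE_SPAN 0x01000000u   /* ASSUMPTION: 16MB, 24-bit 68k space  */

/* Convert a 68k byte address into an index into an array of 16-bit words. */
size_t word_index(uint32_t amiga_addr)
{
    return (size_t)(amiga_addr >> 1);
}

/* Map the bridge window via /dev/mem; returns NULL on failure (needs root). */
volatile uint16_t *map_bridge(void)
{
    int fd = open("/dev/mem", O_RDWR | O_SYNC);
    if (fd < 0)
        return NULL;

    void *p = mmap(NULL, BRIDGE_SPAN, PROT_READ | PROT_WRITE, MAP_SHARED,
                   fd, (off_t)BRIDGE_BASE);
    close(fd);                    /* the mapping stays valid after close */
    return p == MAP_FAILED ? NULL : (volatile uint16_t *)p;
}

/* Write one 16-bit value to a register such as COLOR00 at 0xdff180. */
void write_reg16(volatile uint16_t *base, uint32_t amiga_addr, uint16_t value)
{
    base[word_index(amiga_addr)] = value;
}
```

With a mapping like that, the loop above boils down to calling write_reg16(base, 0xdff180, rand()) forever, which keeps changing the background colour.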
foft

So, it works... Now I need a 68K emulator to plug in, and also to plumb interrupts. I could add another Avalon slave for polling interrupts, though it'd be better with real interrupts. I'll try the slave method for now, then improve it after.

Doing some timing and seem to get ~2MB/s. So better fix that!
Blitzwing

foft wrote: Sat Apr 10, 2021 9:34 am So, it works... Now I need a 68K emulator to plug in, and also to plumb interrupts. I could add another Avalon slave for polling interrupts, though it'd be better with real interrupts. I'll try the slave method for now, then improve it after.

Doing some timing and seem to get ~2MB/s. So better fix that!
Congrats, great to see something come to life.
foft

I'm only supposed to achieve 3.5MB/s on A500 and 7MB/s on A1200 right?

It seems that there is delay as follows:
i) 70% waiting for chip ram.
ii) 30% waiting for avalon to start next transfer.

I think tackling (ii) by making use of waitrequestAllowance will get it up to 3.5MB/s.

Does anyone have to hand actual MB/s for AGA and OCS chip reads and writes on the Minimig?
Sorgelig

Qsys is a heavy and cluttered system.
I suggest generating the code in Qsys and then just taking the *fpga_interfaces module, which will include all the bridges you've configured in Qsys.
You don't even need to add it to the framework. You can instantiate the bridges right in your code. Just make sure you don't use modules that are already in use, such as DDR.
foft

It seems to have a fair bit of interconnect logic in there between the axi mm bridge and the slaves (hps_fpga_bridge_mm_interconnect_0). Perhaps removing that will cut some latency. I'm checking the signals on the logic analyzer to see where they appear on the master side.

Intel do seem to want everyone to use qsys, given the gui does not allow instantiating these lower level entities and there isn't proper documentation that I can find.
foft

Seems worth clocking the Avalon slave a little higher.

I had been using the 28MHz system clock. If I make the slave response immediate (i.e. never set waitrequest) with a tight loop on the HPS side doing 16-bit reads I get an access every 10 cycles. i.e. 2.8MHz, so 5.6MB/s.
With the clock at 4x (114MHz), I get an access every 16 cycles so 7.1MHz or 14.2MB/s.

Of course to get more throughput I could use 32-bit/64-bit or even 128-bit (giving 112MB/s max). However this is chipram we're talking about, so even on A1200 7MB/s will cut it.

Also of course I'm still talking about an immediate chipram response, which is far from true.
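
The figures above are just clock rate divided by cycles per access, times bytes per access. A small sketch of that arithmetic follows; the function name is made up, and the clocks are the rounded values quoted in the post.

```c
#include <stdint.h>

/* Bytes per second when one access of `bytes_per_access` bytes
   completes every `cycles_per_access` clock cycles. */
uint64_t throughput_bytes_per_sec(uint64_t clock_hz,
                                  uint64_t cycles_per_access,
                                  uint64_t bytes_per_access)
{
    return clock_hz * bytes_per_access / cycles_per_access;
}
```

With 28MHz, 10 cycles and 16-bit accesses this gives 5.6MB/s. With 114MHz and 16 cycles it gives 14.25MB/s, which the post rounds to 14.2MB/s. A 128-bit access every 16 cycles at 112MHz (4 x 28MHz) gives the 112MB/s maximum mentioned.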
robinsonb5

On MiST and TC64 the SDRAM controller and CPU logic run at 114MHz, which is necessary since on those platforms everything has to run from a single SDRAM. MiSTer uses a simpler SDRAM controller and a lower clock, since it only has to deal with Chip RAM.

Making it 32 bits wide makes sense, since AGA machines had that bus width. If it turns out that you can't match AGA chip RAM bus speeds it won't be completely disastrous, because (a) for games it makes more sense to use the FPGA-side CPUs, and (b) for the kinds of things where the speed of the hybrid emulation makes more sense, chances are you'll be using RTG. (Also, until recently MiST Minimig's chip RAM speed was somewhat lower than a real AGA machine's, and in practice most games were still fine.)
Sorgelig

I think it's better to drop the idea of having both FPGA and HPS CPUs in a single core. It will require a lot of interconnect and will work non-optimally. We can still have 2 cores: the original Minimig and a hybrid one. They may share the same internal name and use the same folders and files, so there won't be much difference between changing the CPU inside the core and loading another core.
Sorgelig

Actually, the ao486 core is more suitable for a first hybrid attempt. It requires less integration work, as there is no special ChipRAM there and no timing-accurate bus. It has also grown from an Avalon system bus: it was using real Avalon when it was based on a Qsys design.
foft

For either ao486 or Minimig it's the same setup that we need to get right, i.e. understand and connect up the HPS-FPGA bridge and get it working efficiently, then figure out the interrupt plumbing to the ARM: e.g. a polled Avalon slave, or using the fpga2hps interrupts and writing a kernel driver to handle them.

How much throughput do we need on ao486? In some ways minimig is not demanding because the cpu-chip ram interface is so slow on the A500 and A1200 anyway! We might need to push it further on ao486?

Last night I did some tests on the bridge. With a 16-bit bridge I checked 8-bit accesses, misaligned 16-bit accesses and 32-bit accesses (aligned, and misaligned by 1 byte and by 2 bytes). All of these 'just work', which is convenient.

As I said earlier, I tried clocking the AXI bridge higher and got much better performance in terms of (nop) transactions per second. I used the x4 clock since it was there, though it could be clocked higher. Of course this means we need to add clock domain crossing. Since the clocks are aligned and come from the same PLL, this is just a case of putting a register in to make meeting timing workable (just about, anyway...). I found that Qsys has a handy component to do the cross-domain sync for me, so I thought I'd save the work by including the Avalon-MM clock crossing bridge (which works using a FIFO). However it didn't work and I don't know why, so it's back to plan A of just adding this myself. As a reminder, it took 10 cycles per transaction at 28MHz and 16 cycles at 4x. 16 cycles at 4x is equivalent to 4 cycles at 28MHz, so that's 10 vs 4. Adding a 2-cycle delay will get us to 10 vs 6 (worst case).
edit: small note: I put the AXI bridge itself on SignalTap to see when the AXI levels are triggered, to see if stripping that layer is worthwhile. I see maybe 2-3 (AXI clock) cycles here. AXI looks much more complex though, so I think it's not worth it.

I also made a start on cross-compiling the CPU code so I can try running some software soon.

First things first though, I have to fix the springs on my dishwasher door!
chanunnaki

Forget about the dishwasher, this is far more urgent. :P
lordoftime79

This is amazing progress! Really well done!
foft

Status:
i) Musashi cross compiled/wired up in the most basic way!
ii) Interrupts exposed, for now polled every 100 cycles!
iii) Bus speed up completed
edit iv) Dishwasher door fixed!

So, fingers crossed... here we go!
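
As a rough idea of what polled interrupts can look like on the ARM side, here is a sketch that samples an interrupt-level register between short bursts of emulation. Everything except the 100-cycle interval is an assumption: the register layout (level in the low 3 bits) is hypothetical, and cpu_execute / cpu_set_irq are local stand-ins so the example compiles on its own. If the emulator core is Musashi, the corresponding calls would be m68k_execute() and m68k_set_irq().

```c
#include <stdint.h>

#define POLL_INTERVAL_CYCLES 100   /* from the status list: every 100 cycles */

/* Stand-ins for the emulator core (hypothetical). They only record
   what they were asked to do so the loop can be exercised in isolation. */
static unsigned current_irq_level;
static long cycles_run;

static int cpu_execute(int cycles)
{
    cycles_run += cycles;
    return cycles;
}

static void cpu_set_irq(unsigned level)
{
    current_irq_level = level;
}

/* Run `slices` bursts, sampling the interrupt register before each one. */
void run_polled(volatile const uint16_t *irq_reg, int slices)
{
    for (int i = 0; i < slices; i++) {
        cpu_set_irq(*irq_reg & 7u);        /* 68k interrupt levels are 0..7 */
        cpu_execute(POLL_INTERVAL_CYCLES);
    }
}
```

The trade-off is the usual one for polling: a shorter interval lowers interrupt latency but spends more time on bridge reads.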
Neocaron

Good luck :D
foft

bool works = false;
while(!works)
{
  fix();
  works = test();
}
Getting closer... Think I just fixed one last nasty bug and, fingers crossed, it'll work this time!
robinsonb5

foft wrote: Mon Apr 12, 2021 5:52 pm Getting closer... Think I just fixed one last nasty bug and, fingers crossed, it'll work this time!
"Hopefully this will now not work less than it didn't work before!"
foft

DiagROM is now running...

I was just running into endian issues, which I fixed fairly quickly. The HPS-FPGA bridge is little-endian, so I need to do the conversion on both sides. I missed one of the places on the hardware side (there are two writedata signals for some reason), which confused me quite a lot until I found it.
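
For illustration, this is what the byte swap looks like if it is done in software on a little-endian ARM. It is only a sketch of the idea: the post says at least part of the conversion lives on the hardware side, and the helper names here are made up.

```c
#include <stdint.h>

/* Swap the two bytes of a 16-bit value. */
uint16_t swap16(uint16_t v)
{
    return (uint16_t)((v << 8) | (v >> 8));
}

/* Read a big-endian 68k word at byte_addr through a little-endian mapping. */
uint16_t read_be16(const volatile uint16_t *mem, uint32_t byte_addr)
{
    return swap16(mem[byte_addr >> 1]);
}

/* Store a 68k word so that its bytes land in big-endian order. */
void write_be16(volatile uint16_t *mem, uint32_t byte_addr, uint16_t value)
{
    mem[byte_addr >> 1] = swap16(value);
}
```

Swapping in only one direction, or in only one of two write paths, gives exactly the sort of half-working behaviour that is confusing to track down.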
zakk4223

foft wrote: Mon Apr 12, 2021 5:52 pm

bool works = false;
while(!works)
{
  fix();
  works = test();
}
Getting closer... Think I just fixed one last nasty bug and, fingers crossed, it'll work this time!
I can't help but notice your test function is undefined...
foft

zakk4223 wrote: Mon Apr 12, 2021 6:27 pm I can't help but notice your test function is undefined...
Ah, that's the problem. I thought it was the lack of a fix function all along!

I've got something weird still with the chipmem accesses. Sometimes it responds much more slowly and sometimes not at all.
foft

I added fast ram. Oddly I'm getting spurious memory failures when using it. This is straight from the arm, so no fpga side involved (unless the fpga-hps bridge is writing here?).

I'm mmapping 0x20000000-0x20000000+384MB. That is the correct area reserved in the DDR for fast ram right?

edit: changed to 0x10000000 and it seems happier - be good to know the 'correct' address though.
zakk4223

#define MISTER_SCALER_BASEADDR     0x20000000
That might explain things ;)
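
A quick way to sanity-check a candidate fast RAM window against that define is a plain range-overlap test. This is only a sketch: the size of the scaler's buffer is not given in the thread, so the checks below use a placeholder length of one byte at the scaler base, and regions_overlap is an illustrative name.

```c
#include <stdbool.h>
#include <stdint.h>

#define MISTER_SCALER_BASEADDR 0x20000000u   /* from the define quoted above */
#define MB (1024u * 1024u)

/* True when [a, a+a_len) and [b, b+b_len) share at least one byte. */
bool regions_overlap(uint64_t a, uint64_t a_len, uint64_t b, uint64_t b_len)
{
    return a < b + b_len && b < a + a_len;
}
```

By this check a 384MB window at 0x20000000 obviously collides with the scaler base. Note that a 384MB window starting at 0x10000000 ends at 0x28000000, so it also reaches past 0x20000000; only the first 256MB stay below it. Whether that matters depends on how much of the window is actually used.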
foft

Haha, that'd do it! Do you know where the fast ram should go?

I can get to the boot screen and start loading some stuff now, though it's rather crashy! It's probably this chip RAM timing issue that I need to dig into.