Lets actually try Hybrid Emulation

foft
Posts: 334
Joined: Thu Dec 03, 2020 11:05 am
Has thanked: 29 times
Been thanked: 120 times

Hybrid has been talked about for years:
https://www.atari-forum.com/viewtopic.php?f=117&t=32674

I've decided to give it a try! I know VHDL since I made the Atari800 core, but I have a bit to learn about the Minimig and the 68k world.

Will post here when I have something running.

Steps...
i) DONE: Check out and build a working MiSTer core, as is
ii) IN PROGRESS: Familiarise myself with the Minimig core a bit
iii) Check out and build github.com/aranym/aranym/tree/master/src/uae_cpu, as is
iv) Remove the 68k from Minimig and add an Avalon slave interface: either qsys or cyclonev_hps_interface_hps2fpga. I know how to do it with qsys, so I might liberally discard half the MiSTer stuff temporarily to use that for now...
v) Try accessing Minimig from the ARM side: mmap and write to the background colour.
vi) Try uae_cpu against the Avalon slave, with DiagROM
vii) Try booting real Kickstart
viii) Try booting Workbench!
ix) Remove the 2nd CPU from the scheduler in u-boot, force the 68k JIT onto this CPU.
x) Discuss with Alexey how to merge and reactivate anything I had to hack out in (iv)

Wish me luck, I'm up to step (i) ;-)
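
A minimal sketch of what step (v) could look like from the ARM side, written in Python for brevity. It assumes (none of this is confirmed in the thread) that the Avalon slave exposes the 68k address space 1:1 from the start of the mapped window and that 16-bit words are stored big-endian. The helper names rgb4, write_word and set_background are hypothetical. COLOR00 at 0xDFF180 is the standard Amiga background colour register.

```python
import struct

# Assumptions, not taken from the thread: the Avalon slave maps the 68k
# address space 1:1 from offset 0 of `window`, and words are big-endian
# as on a real 68000 bus.
CUSTOM_BASE = 0xDFF000  # Amiga custom chip register base
COLOR00 = 0x180         # background colour register offset


def rgb4(r, g, b):
    """Pack three 4-bit components into the 12-bit 0RGB colour word."""
    for c in (r, g, b):
        if not 0 <= c <= 0xF:
            raise ValueError("colour components must be 0..15")
    return (r << 8) | (g << 4) | b


def write_word(window, amiga_addr, value):
    """Store a 16-bit big-endian word at a 68k address inside `window`."""
    struct.pack_into(">H", window, amiga_addr, value & 0xFFFF)


def set_background(window, r, g, b):
    """Write an RGB value to COLOR00 through the mapped window."""
    write_word(window, CUSTOM_BASE + COLOR00, rgb4(r, g, b))
```

On real hardware `window` would be an mmap of /dev/mem covering the bridge; a plain bytearray of 16 MB works for dry runs. struct.pack_into is not guaranteed to issue a single 16-bit bus access, so a C accessor through a volatile 16-bit pointer may turn out to be necessary.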
limi
Top Contributor
Posts: 619
Joined: Sun May 24, 2020 6:53 pm
Has thanked: 135 times
Been thanked: 418 times

Good luck! Excited to see what can come out of this.
Higgy
Posts: 83
Joined: Mon May 25, 2020 9:37 am
Has thanked: 4 times
Been thanked: 27 times

Sounds fun, good luck with it.

So, summarising, I think you are keeping the custom Amiga chips in the FPGA and using the ARM for the CPU.
You might already know this, but there are 2-3 ongoing projects for actual Amiga hardware that use an ARM as a super CPU: PiStorm, Buffee and ? (This one is planned for A500 & A1200).
Those devs might be able to help if you hit any issues along the way.
robinsonb5
Posts: 129
Joined: Fri Jun 19, 2020 8:54 pm
Has thanked: 13 times
Been thanked: 57 times

Sounds awesome!

Two comments:

I don't know how far he got, but I know Chaos was exploring this idea, so might be a good idea to compare notes and maybe avoid duplicating effort.

Also, the current Minimig core already has two different CPU implementations (TG68k for speed, fx68k for cycle-accuracy), and provision for switching between them - so it should be possible to add the hybrid as a third option without stripping out the existing ones completely. (Though of course that might be useful just to reduce build times.)
foft

Thanks. Yeah I found the switching of CPU logic, that helped to see which signals are needed. We should definitely keep the two existing options, though I might strip it out for testing.

BTW, for anyone interested in trying out HPS-FPGA communication, this is worth a read.
https://zhehaomao.com/blog/fpga/2013/12 ... kit-3.html
I implemented this some time ago when I was playing with the sockit, replacing an internal ZPU with a process on Linux.
http://www.64kib.com/atarixlfpga_svn/tr ... xl/sockit/

So far I've just instantiated a qsys hps and wired it up. I think it may conflict with some of the existing manual cyclonev_... instances, I've not used those before. Hoping quartus will let me know.
foft

Sure enough quartus did let me know... I was hoping that it'd work since I'd disabled all these in my qsys. So it seems that I need to replumb the existing ones to qsys, or directly instantiate HPS_INTERFACE_HPS2FPGA. I'm hesitant to directly instantiate it since there is a 3000+ page manual on this stuff and the qsys method is really simple.

Error (14566): The Fitter cannot place 7 periphery component(s) due to conflicts with existing constraints (1 HPS_INTERFACE_HPS2FPGA(s), 1 HPS_INTERFACE_CLOCKS_RESETS(s), 1 HPS_INTERFACE_FPGA2HPS(s), 1 HPS_INTERFACE_TPIU_TRACE(s), 1 HPS_INTERFACE_DBG_APB(s), 1 HPS_INTERFACE_BOOT_FROM_FPGA(s), 1 HPS_INTERFACE_FPGA2SDRAM(s)). Fix the errors described in the submessages, and then rerun the Fitter. The Intel FPGA Knowledge Database may also contain articles with information on how to resolve this periphery placement failure. Review the errors and then visit the Knowledge Database at https://www.altera.com/support/support- ... earch.html and search for this specific error message number.
lordoftime79
Posts: 97
Joined: Sun Feb 14, 2021 6:29 pm
Has thanked: 1 time
Been thanked: 2 times

This is something I am seriously looking forward to! Well done on taking up the challenge!
foft

Looks like I can just make it in qsys, then comment out the ones already used in sysmem.sv. I think sysmem.sv has its roots in qsys generated hdl at some point in the past?
Grabulosaure
Core Developer
Posts: 78
Joined: Sun May 24, 2020 7:41 pm
Location: Mesozoic
Has thanked: 3 times
Been thanked: 92 times

How does it work in Amiga land?

So the Chip RAM is only accessible to the CPU and doesn't need any cache coherency with the rest of the system (except for RTG video)?

I suppose other architectures with 68K CPUs are likely to use coherent RAM for DMA.
foft

I'm not an expert, but I think the DMA controller on Agnus feeds data into Paula (audio), Denise (video) and itself (blitter/copper) from chip RAM. I think this accesses chip RAM every other cycle.

The 68k cache of course needs to be set appropriately to actually write/read from chip ram/hardware registers as needed. This is controlled by CACR I think.
foft

Builds with qsys now, with some parts commented out here and in sysmem.sv. Just doing a little remaining avalon slave -> minimig plumbing, then I can try some chip RAM + register reads/writes from the ARM :-)
arglmauf
Posts: 8
Joined: Fri Mar 05, 2021 11:20 am
Has thanked: 3 times
Been thanked: 1 time

Good luck champ.
foft

So I wired up an hps avalon slave to the cpu_wrapper, temporarily replacing the fx68k (though I'll extend that mux later). It synthesizes ok without error. In theory this is enough to try accessing chip ram and registers from the arm, though I expect I will have a little bit to fix first!

I also need to add the interrupts, which I guess I can just add to f2h_irq.
foft

Hit a small roadblock. /dev/fpga0 no longer exists in Linux kernel 4, so I need to learn the new way to reconfigure the FPGA!

Looking like the simple one liner dd (+ cat to disable/enable bridges) has been replaced by an incredibly complicated setup called a 'device tree overlay', which seems to involve writing some json-like patch to a configfs mount point. Or something like that...

---
Things found so far:
ls /proc/device-tree/ #device tree exists in filesystem form in proc
ls /proc/device-tree/soc/*fpga* #Some fpga devices in the device tree: bridge and fpga manager
ls /sys/firmware/devicetree/base/soc/*fpga* #mirror of proc?
mount -t configfs none /sys/kernel/config/ #configfs allows making runtime changes to the device tree, usually mounted here
ls -l /sys/kernel/config/device-tree/overlays/ #empty except this folder
https://rocketboards.org/foswiki/Docume ... eGenerator #Has documentation on device trees in general
https://forum.rocketboards.org/t/load-f ... 0nano/1836 # Useful snippets!
installed dtc compiler does not seem to understand /plugin/ in the .dts file
---
Voodoo for overlay:
Put this in program.dts

/dts-v1/;
/ {
	fragment@0 {
		target-path = "/soc/base-fpga-region";
		#address-cells = <1>;
		#size-cells = <1>;
		__overlay__ {
			#address-cells = <1>;
			#size-cells = <1>;
			firmware-name = "core.rbf";
		};
	};
};
Then run this to flash:

cp _Computer/Minimig_20210308.rbf /lib/firmware/core.rbf
mount -t configfs none /sys/kernel/config/
mkdir /sys/kernel/config/device-tree/overlays/programfpga
dtc --in-format dts --out-format dtb program.dts > /sys/kernel/config/device-tree/overlays/programfpga/dtbo
If all is well, core.rbf will show up here:
cat /proc/device-tree/soc/base-fpga-region/firmware-name
...
Problem: doesn't flash rbf!
---
https://elixir.bootlin.com/linux/latest ... region.txt #More light reading
---
Unclear why it doesn't work, is there a kernel patch?
https://github.com/MiSTer-devel/Linux-Kernel_MiSTer
Nope, looks normal. Should see fpga_mgr_firmware_load in fpga-mgr.c which logs 'writing to'.
---
Plan C: fpga-mgr device tree overlays do not work, no idea why. So use direct hardware access.
https://github.com/nhasbun/de10nano_fpga_linux_config
This tool runs, logs stuff that looks promising, then linux crashes!
It works if I kill the Mister process first :-)
Then I start mister afterwards and I see the core and mister communicating. However the core does not work beyond the overlay menu, so clearly missing one step still.
Final step: start /media/fat/Mister from the / working directory. Now we're cooking with gas!
---


In summary
--------------
Cross compiler working
Can flash an rbf core locally on the MiSTer and have it work (yes, I know I can flash a sof via JTAG, but I wasn't sure how that would work with the hps2fpga bridges...)
Trying to map memory next and talk to chip RAM
mahen
Posts: 185
Joined: Sun May 24, 2020 8:25 pm
Has thanked: 19 times
Been thanked: 6 times

Good luck! Much appreciated effort!!
Sorgelig
Site Admin
Posts: 877
Joined: Thu May 21, 2020 9:49 pm
Has thanked: 2 times
Been thanked: 211 times

All FPGA interfaces are accessible through an mmap of the register address space. This is how Main works, so the kernel is not required for this. You may want to explore fpga_io.cpp from Main.
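
As a rough illustration of the mmap approach (this is not how fpga_io.cpp is written, just a generic sketch in Python), mapping a physical address through /dev/mem mostly comes down to page-aligning the offset. The function names aligned_window and map_physical are hypothetical, and running map_physical on real hardware is assumed to need root.

```python
import mmap
import os


def aligned_window(phys_addr, length, page_size=mmap.PAGESIZE):
    """Return (page-aligned base, offset within the mapping, mapping length)."""
    base = phys_addr & ~(page_size - 1)
    delta = phys_addr - base
    return base, delta, delta + length


def map_physical(phys_addr, length, path="/dev/mem"):
    """Map `length` bytes starting at a physical address.

    Returns the mmap object and the offset of `phys_addr` inside it.
    """
    base, delta, span = aligned_window(phys_addr, length)
    fd = os.open(path, os.O_RDWR | os.O_SYNC)
    try:
        mem = mmap.mmap(fd, span, mmap.MAP_SHARED,
                        mmap.PROT_READ | mmap.PROT_WRITE, offset=base)
    finally:
        os.close(fd)  # the mapping keeps its own reference to the file
    return mem, delta
```

A caller would index the returned mapping at `delta` plus the register offset; which physical base address to pass in depends on the bridge being used.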

You will probably need to enable the bridges in u-boot as well. Some HPS registers must be configured at a very early boot stage.

About Amiga memory: you will only need ChipRAM on the FPGA side. FastRAM is accessible only by the CPU, so it is better kept on the HPS side, inside the emulated CPU. You will need to mmap the same region of FastRAM as on the FPGA.
A cache controller on the FPGA won't be required, as the HPS will use its own ARM cache controller.
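
One way to picture that split is a small bus object that serves FastRAM from ordinary HPS memory and forwards everything else to the FPGA. This is only a sketch under assumptions: the class name HybridBus is hypothetical, FastRAM is assumed to sit in the usual Zorro II range (8 MB from 0x200000), and accesses are byte-wide for simplicity.

```python
class HybridBus:
    """Byte-wide bus: FastRAM served locally, everything else via the FPGA."""

    def __init__(self, fpga_window, fast_base=0x200000, fast_size=0x800000):
        self.fpga = fpga_window            # mmap of the bridge, or a bytearray for dry runs
        self.fast_base = fast_base         # assumed Zorro II FastRAM base
        self.fast = bytearray(fast_size)   # FastRAM lives in plain HPS memory

    def _fast_offset(self, addr):
        """Return the offset into FastRAM, or None if addr is outside it."""
        off = addr - self.fast_base
        return off if 0 <= off < len(self.fast) else None

    def read8(self, addr):
        off = self._fast_offset(addr)
        return self.fast[off] if off is not None else self.fpga[addr]

    def write8(self, addr, value):
        off = self._fast_offset(addr)
        if off is not None:
            self.fast[off] = value & 0xFF
        else:
            self.fpga[addr] = value & 0xFF
```

With this layout, chip RAM and custom register accesses reach the FPGA, while FastRAM traffic never leaves the ARM and so benefits from its caches.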
foft

Looks like the bridges are enabled already:
/root# cat /sys/class/fpga_bridge/br0/name
lwhps2fpga
/root# cat /sys/class/fpga_bridge/br0/state
enabled
/root# cat /sys/class/fpga_bridge/br1/name
hps2fpga
/root# cat /sys/class/fpga_bridge/br1/state
enabled
/root# dmesg | grep bridge
[ 0.240318] altera_hps2fpga_bridge ff400000.fpga_bridge: fpga bridge [lwhps2fpga] registered
[ 0.240631] altera_hps2fpga_bridge ff500000.fpga_bridge: fpga bridge [hps2fpga] registered

I thought this meant they were at 0xff400000 and 0xff500000. However, mmapping this area and then doing a read does not set any of the chipselect, read or write signals on the avalon slave. On the SOCkit it was just a case of mmapping 0xc0000000; I wonder if I need that as an offset... Hmmm. (edit: guess not, that'd be out of range!)

Let me check out that file you mention for clues (fpga_io.cpp).
foft

Ah, perhaps that programming command I'm using is disabling the bridges... - or MiSTer even
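
One way to test that suspicion is to read the bridge state from sysfs before and after programming and compare. A small sketch, assuming the /sys/class/fpga_bridge layout from the earlier post (one directory per bridge, each with name and state files); read_bridges is a hypothetical helper name.

```python
import os


def read_bridges(root="/sys/class/fpga_bridge"):
    """Return {bridge name: state} for every bridge directory under `root`."""
    result = {}
    for entry in sorted(os.listdir(root)):
        base = os.path.join(root, entry)
        try:
            with open(os.path.join(base, "name")) as f:
                name = f.read().strip()
            with open(os.path.join(base, "state")) as f:
                state = f.read().strip()
        except OSError:
            continue  # not a bridge directory, or an attribute is missing
        result[name] = state
    return result
```

Calling read_bridges() once before and once after the programming tool runs shows whether any bridge changed state; it says nothing about registers that sysfs does not expose.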
foft

Reading the technical manual, 0xff500000 is the address of the bridge module itself, which holds a bunch of config registers, rather than where the axi slave is mapped... Better read some more!
foft

So looks like it is still at 0xC0000000, so not sure why it doesn't work...

"The FPGA Slaves Region: The Cortex-A9 MPU subsystem supports the variable-sized FPGA slaves region to communicate with FPGA-based peripherals. This region can start as low as 0xC0000000, depending on the L2 cache filter settings. The top of the FPGA slaves region is located at 0xFBFFFFFF. As a result, the size of the FPGA slaves region can range from 0 to 0x3C000000 bytes."
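
The numbers in that quote can be checked with a few lines of arithmetic, which also shows why 0xff400000 and 0xff500000 cannot be slave addresses: they lie above the top of the region. The constant and function names below are just for illustration.

```python
FPGA_SLAVES_MIN_START = 0xC0000000  # lowest possible start, per the manual quote
FPGA_SLAVES_TOP = 0xFBFFFFFF        # last byte of the region, per the manual quote


def region_size(start=FPGA_SLAVES_MIN_START, top=FPGA_SLAVES_TOP):
    """Size in bytes of the FPGA slaves region for a given start address."""
    if start > top + 1:
        raise ValueError("start lies beyond the top of the region")
    return top - start + 1


def in_fpga_slaves(addr, start=FPGA_SLAVES_MIN_START, top=FPGA_SLAVES_TOP):
    """True if a physical address falls inside the FPGA slaves region."""
    return start <= addr <= top
```

For example, region_size() gives 0x3C000000, matching the maximum size in the quote, while in_fpga_slaves(0xFF500000) is False.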
foft

Wondering if it's some of the stuff I commented out and left in sysmem, e.g. cyclonev_hps_interface_clocks_resets is slightly different. Trying that...
foft

Yes, that seemed to work!
foft

Though it also broke the sysmem stuff... So I need to have some combination.
foft

So my version which makes hps bridge work:
cyclonev_hps_interface_clocks_resets clocks_resets(
  .f2h_pending_rst_ack({
    1'b1 // 0:0
  })
  ,.f2h_warm_rst_req_n({
    1'b1 // 0:0
  })
  ,.f2h_dbg_rst_req_n({
    1'b1 // 0:0
  })
  ,.h2f_rst_n({
    h2f_rst_n[0:0] // 0:0
  })
  ,.f2h_cold_rst_req_n({
    1'b1 // 0:0
  })
);

And the sysmem version, which makes sysmem work but breaks the hps bridge:
//cyclonev_hps_interface_clocks_resets clocks_resets(
// .f2h_warm_rst_req_n({
// f2h_warm_rst_req_n[0:0] // 0:0
// })
//,.f2h_pending_rst_ack({
// 1'b1 // 0:0
// })
//,.f2h_dbg_rst_req_n({
// 1'b1 // 0:0
// })
//,.h2f_rst_n({
// h2f_rst_n[0:0] // 0:0
// })
//,.f2h_cold_rst_req_n({
// f2h_cold_rst_req_n[0:0] // 0:0
// })
//,.h2f_user0_clk({
// h2f_user0_clk[0:0] // 0:0
// })
//);
foft

OK, this is a little fiddly. There are signals in both sysmem and the qsys ip that need to connect to cyclonev_hps_interface_clocks_resets. I guess I need to plumb them into each other. Or add them all to my qsys and then wire up sysmem.sv to that.

Wishing that Verilog or VHDL had a 'goto' of sorts!
Neocaron
Posts: 341
Joined: Sun Sep 27, 2020 10:16 am
Has thanked: 187 times
Been thanked: 66 times

I like this Frankenstein approach; it could lead to cores that would otherwise be impossible on the MiSTer. If this pans out it could have great implications. How many CPU cores can be used for emulation on the ARM chip, and at what speed?
foft

It’s dual arm A9’s at 800MHz. Clearly some of that needs to be used for handling the usb stack and sd card access.
I was thinking of dedicating one of the cpus to the 68k emulation. I guess theoretically could do smarter stuff like run jit compilation on one and execution on the other. I will just be doing something basic to get this running ‘pretty well’ then leave further optimisation for others.
zakk4223
Posts: 270
Joined: Sun May 24, 2020 10:55 pm
Been thanked: 107 times

You should probably do a single dedicated core. If you look on a running mister, one of the cores is completely pinned at 100% usage, that's Main_Mister's polling loop. Any CPU you steal from that loop can increase input processing delay etc.

Honestly though it's probably going to be the job of Main_Mister to schedule the 'companion' emulator properly when it starts it up. So maybe not really a concern for you; although I'd manually pin your cpu emulator to the unused core during testing
Neocaron

foft wrote: Fri Apr 09, 2021 9:01 pm It’s dual arm A9’s at 800MHz. Clearly some of that needs to be used for handling the usb stack and sd card access.
I was thinking of dedicating one of the cpus to the 68k emulation. I guess theoretically could do smarter stuff like run jit compilation on one and execution on the other. I will just be doing something basic to get this running ‘pretty well’ then leave further optimisation for others.
Very interesting. Does anyone know how one CPU core would compare to a 733MHz Pentium III in raw power?
Does the ARM chip have any overclocking capability? What is the lowest possible latency between the ARM and the FPGA?

Sorry for all the questions, but it's sooo interesting and exciting! :mrgreen:
Good luck and have fun!
foft

Re MiSTer polling: sure, we could make it interrupt-based if more CPU is needed. Anyway, yes, I plan to lock it to the 2nd CPU at real-time priority for now.
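
For testing, pinning and the real-time request can both be done from user space on Linux. A minimal sketch in Python, assuming the emulator should land on CPU 1 and that SCHED_FIFO priority 50 is acceptable (both values are arbitrary choices, not from the thread); pick_cpu and pin_realtime are hypothetical names.

```python
import os


def pick_cpu(available, preferred=1):
    """Choose `preferred` if it is available, else the highest-numbered CPU."""
    if not available:
        raise ValueError("no CPUs available")
    return preferred if preferred in available else max(available)


def pin_realtime(cpu=1, priority=50):
    """Pin the calling process to one CPU and request SCHED_FIFO (Linux only).

    Returns the chosen CPU and whether real-time scheduling was granted.
    """
    target = pick_cpu(os.sched_getaffinity(0), cpu)
    os.sched_setaffinity(0, {target})
    try:
        os.sched_setscheduler(0, os.SCHED_FIFO, os.sched_param(priority))
        realtime = True
    except PermissionError:
        realtime = False  # typically needs root or CAP_SYS_NICE
    return target, realtime
```

This only restricts where the emulator runs; keeping other processes off that CPU would still need something like the u-boot/scheduler change from step (ix).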

Re Pentium III: Unlikely! In raw performance I see some papers saying that the Cortex-A9 is comparable to an Atom, which does look close to a P3. Still, I’d expect it to beat the ao486 CPU and potentially leave space in that core for other items.