Lets actually try Hybrid Emulation

foft
Posts: 334
Joined: Thu Dec 03, 2020 11:05 am
Has thanked: 29 times
Been thanked: 120 times

Re: Lets actually try Hybrid Emulation

Unread post by foft »

I don't see the issue in Musashi, so I guess it's a software problem.

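I tried qemu user mode to dump 8-, 16- and 32-bit reads from the Kickstart ROM. I see that qemu does some automatic endian voodoo: the m68k binary gets big-endian values at every width, while the native ARM binary gets the raw little-endian ones.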

/media/fat# ./qemu_system_test/go_user ./endian_m68k
ADDR:40801000 c0000000
11:14:4e:f9:00:f8:00:d2:00:00:ff:ff:00:28:00:44:00:28:00:0a:ff:ff:ff:ff:00:41:4d:49:47:41:20:52:
1114:4ef9:00f8:00d2:0000:ffff:0028:0044:0028:000a:ffff:ffff:0041:4d49:4741:2052:
11144ef9:00f800d2:0000ffff:00280044:0028000a:ffffffff:00414d49:47412052:
/media/fat#
/media/fat#
/media/fat# ./endian_arm
ADDR:b5e67000 c0000000
11:14:4e:f9:00:f8:00:d2:00:00:ff:ff:00:28:00:44:00:28:00:0a:ff:ff:ff:ff:00:41:4d:49:47:41:20:52:
1411:f94e:f800:d200:0000:ffff:2800:4400:2800:0a00:ffff:ffff:4100:494d:4147:5220:
f94e1411:d200f800:ffff0000:44002800:0a002800:ffffffff:494d4100:52204147:

I presume it's the same for the 'machine', but will check. Being able to run a cross-compiled 68k ELF binary inside the machine will be useful for debugging anyway.
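
A minimal Python sketch of what the two dumps above show, using only the first eight ROM bytes from the listing. The same bytes are decoded as big-endian (what the m68k guest sees) and as little-endian (what a native ARM read returns). The helper name `dump` is made up for illustration.

```python
import struct

# First eight bytes of the Kickstart ROM, copied from the byte dump above.
ROM = bytes.fromhex("11144ef900f800d2")

def dump(data, width, byteorder):
    """Decode data as unsigned ints of `width` bytes, formatted like the listing."""
    code = {1: "B", 2: "H", 4: "I"}[width]
    prefix = ">" if byteorder == "big" else "<"
    count = len(data) // width
    values = struct.unpack(f"{prefix}{count}{code}", data[:count * width])
    return ":".join(f"{v:0{width * 2}x}" for v in values)

for width in (1, 2, 4):
    print("m68k", dump(ROM, width, "big"))
    print("arm ", dump(ROM, width, "little"))
```

Byte reads are identical either way; only the 16- and 32-bit reads differ, which matches the listings.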
foft
Unread post by foft »

One other idea... a qemu device can easily be set up as MMIO to emulate hardware. I could do that for the hardware regs, which would allow easy logging of all hardware reg reads/writes.
foft
Unread post by foft »

Two other useful properties of wrapping the hardware regs as qemu 'hardware':
i) Qemu will do the appropriate things with cache settings.
ii) I can directly update the interrupt flag after a write to the interrupt regs.
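
To make the idea concrete, here is a rough Python model of a register wrapper that logs every write and recomputes the pending interrupt level whenever an interrupt register changes. This is not qemu code and not the patch from this thread. The class and method names are invented, and only the standard Amiga INTENA/INTREQ addresses, the SET/CLR convention and the usual level-per-bit mapping are assumed.

```python
INTENA = 0xDFF09A   # interrupt enable (write)
INTREQ = 0xDFF09C   # interrupt request (write)
SETCLR = 0x8000     # bit 15: 1 = set the given bits, 0 = clear them
INTEN = 0x4000      # bit 14 of INTENA: master enable

# Interrupt level for each of the 14 source bits (bit 0 .. bit 13).
LEVEL_OF_BIT = [1, 1, 1, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 6]

class LoggedRegs:
    """Hypothetical MMIO wrapper: logs writes and tracks the pending level."""

    def __init__(self):
        self.intena = 0
        self.intreq = 0
        self.ipl = 0
        self.log = []

    @staticmethod
    def _setclr(current, value):
        bits = value & 0x7FFF
        return (current | bits) if value & SETCLR else (current & ~bits)

    def write16(self, addr, value):
        self.log.append(("W", addr, value))
        if addr == INTENA:
            self.intena = self._setclr(self.intena, value)
        elif addr == INTREQ:
            self.intreq = self._setclr(self.intreq, value)
        else:
            return  # other registers are only logged in this sketch
        # Update the interrupt flag immediately after the write.
        self.ipl = self._pending_level()

    def _pending_level(self):
        if not self.intena & INTEN:
            return 0
        active = self.intena & self.intreq & 0x3FFF
        levels = [LEVEL_OF_BIT[b] for b in range(14) if (active >> b) & 1]
        return max(levels, default=0)

regs = LoggedRegs()
regs.write16(INTENA, SETCLR | INTEN | 0x0020)  # enable VERTB (bit 5)
regs.write16(INTREQ, SETCLR | 0x0020)          # request VERTB
print("pending level:", regs.ipl)
```

In a real qemu device the same update would presumably live in the MMIO write callback, so the CPU sees the new interrupt state without waiting for a separate poll.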
lordoftime79
Posts: 97
Joined: Sun Feb 14, 2021 6:29 pm
Has thanked: 1 time
Been thanked: 2 times

Unread post by lordoftime79 »

I think our folder structures must not be the same - I do have an Amiga folder with the HDFs in, but it's under fat/games... It works though!!! And it works well! How come the memory can't be set higher than the 8MB setting?
foft
Unread post by foft »

Ah, well, the 8MB limit was because I hadn't figured out the autoconfig settings and the DDR memory mapping.

I found in the code that DDR was mapped at 0x20000000. That was right, but I didn't spot the offsets, so plain 0x20000000 clashes with the scaler...

Anyway cpu_wrapper has the offsets:
//          Main  DDx   RTG   8M    128M  256M
//          ----  ---   ---   --    ----  ----
//          SDR   DDR   RTG   Z2    Z3_0  Z3_1
// 28       0     0     0     1     0     1
// 27       0     0     0     1     1     X
// 26       0     1     1     0     X     X
// 25-23    0     111   110   0     X     X
// supported configs: SDR + (Z2, Z3_1, Z3_0+Z3_1)

i.e. at 0x28000000-0x2FFFFFFF there is 128MB and at 0x30000000-0x3FFFFFFF there is 256MB, which ties in nicely with the scaler and /proc/iomem:
00000000-1fefffff : System RAM
00008000-00ffffff : Kernel code
01100000-012bcfbf : Kernel data
ff702000-ff703fff : ethernet@ff702000
ff704000-ff704fff : dwmmc0@ff704000
ff706000-ff706fff : fpgamgr@ff706000
ff708000-ff708fff : gpio@ff708000
ff709000-ff709fff : gpio@ff709000
ff70a000-ff70afff : gpio@ff70a000
ffb40000-ffb4fffe : usb@ffb40000
ffb90000-ffb90003 : fpgamgr@ff706000
ffc02000-ffc0201f : serial
ffc03000-ffc0301f : serial
ffc04000-ffc04fff : i2c@ffc04000
ffc06000-ffc06fff : i2c@ffc06000
ffd02000-ffd02fff : watchdog@ffd02000
ffd05000-ffd05fff : rstmgr@ffd05000
ffe01000-ffe01fff : pdma@ffe01000
ffe01000-ffe01fff : pdma@ffe01000
fff00000-fff00fff : spi@fff00000
fff01000-fff01fff : spi@fff01000
ffff0000-ffffffff : ffff0000.sram

So I'd popped the 8MB fastmem at 0x10000000 after finding the scaler clash, which is right in the middle of Linux RAM (System RAM runs up to 0x1fefffff)! According to the table above, it should be at 0x38000000...
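
As a sanity check on those addresses, here is a small Python sketch that turns the fixed address bits from the cpu_wrapper table into ranges. It assumes the bit numbers are offsets inside the DDR window at 0x20000000, which is how the addresses quoted in this post work out. The dictionary layout and function name are just for illustration.

```python
DDR_WINDOW = 0x20000000  # base of the DDR mapping mentioned above

# Address bits fixed to 1 for each fast RAM region, read off the table,
# plus the region size in MB.
REGIONS = {
    "Z2":   ((28, 27), 8),
    "Z3_0": ((27,), 128),
    "Z3_1": ((28,), 256),
}

LINUX_RAM_END = 0x1FEFFFFF  # "System RAM" upper bound from /proc/iomem

def region_range(name):
    """Return (first, last) address of a region in the ARM-side address space."""
    bits, size_mb = REGIONS[name]
    start = DDR_WINDOW + sum(1 << b for b in bits)
    return start, start + size_mb * 1024 * 1024 - 1

for name in REGIONS:
    start, end = region_range(name)
    print(f"{name:5s} 0x{start:08X}-0x{end:08X}")

print("0x10000000 inside Linux RAM:", 0x10000000 <= LINUX_RAM_END)
```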

Going to try fixing that mapping, irqs and hardware stuff now.
bbond007
Top Contributor
Posts: 521
Joined: Tue May 26, 2020 5:06 am
Has thanked: 86 times
Been thanked: 204 times

Unread post by bbond007 »

lordoftime79 wrote: Sun Apr 18, 2021 9:34 am I think our folder structures must not be the same - I do have an Amiga folder with the HDFs in, but it's under fat/games... It works though!!! And it works well! How come the memory can't be set higher than the 8MB setting?

maybe try "#ln -s /media/fat/Games/Amiga/ /media/fat/Amiga" from console or SSH and see if it works then

Let me know if it works and I can try to fix the patch to MiSTer...

also, if the first thing does not work, "#chmod 755 /media/fat/Games/Amiga/68000.sh" might not be a bad idea...

EDIT:
Oops, I messed up the ln command.
Anyway, that is not the problem...
lordoftime79
Unread post by lordoftime79 »

bbond007 wrote: Sun Apr 18, 2021 4:06 pm
lordoftime79 wrote: Sun Apr 18, 2021 9:34 am I think our folder structures must not be the same - I do have an Amiga folder with the HDFs in, but it's under fat/games... It works though!!! And it works well! How come the memory can't be set higher than the 8MB setting?

maybe try "#ln -s /media/fat/Amiga" from console or SSH and see if it works then

Let me know if it works and I can try to fix the patch to MiSTer...

also, if the first thing does not work, "#chmod 755 /media/fat/Games/Amiga/68000.sh" might not be a bad idea...
Hi there, so I tried the first thing but I'm still getting a black screen and no activity on the LEDs. I've gone back to fat/games/Amiga and it works fine :)
bbond007
Unread post by bbond007 »

lordoftime79 wrote: Sun Apr 18, 2021 4:58 pm Hi there, so I tried the first thing but I'm still getting a black screen and no activity on the LEDs. I've gone back to fat/games/Amiga and it works fine :)
Sorry,

try and edit 68000.sh
change
CPU_DIR="/media/fat/Amiga"
to
CPU_DIR="/media/fat/games/Amiga"

This should reflect the path to musashi_68020_mister.

If it's in /media/fat, then put that...

I tested, and it does work fine from /media/fat/Games/Amiga if that change is made.
Attachments
68000.zip
(434 Bytes) Downloaded 142 times
foft
Unread post by foft »

This is frustrating, not making much headway here...

Here is the latest qemu:
i) Updated to use 6.0.0 rc3
ii) irq issue fixed (I think) using an io device to capture io writes and updating irq flag
iii) Mapped 384MB fastram
http://www.64kib.com/qemu_system_testv2.tar.xz

Unfortunately, the only thing that runs is still DiagROM!

Not sure what the problems are now. I guess I should go back to tracing.
Attachments
qemuv2.patch.gz
(3.55 KiB) Downloaded 119 times
robinsonb5
Posts: 129
Joined: Fri Jun 19, 2020 8:54 pm
Has thanked: 13 times
Been thanked: 57 times

Unread post by robinsonb5 »

I notice the CIA test fails, complaining that all four timers are too slow (though the TODs are OK) - not sure how it measures that, but the numbers are approximately half what they should be. Each test is taking about 2 seconds of real time, and reporting a little over 1,000,000ms (I guess it means µs!) - whereas the regular core reports approximately 2,000,000.

For the hell of it, I cobbled together a tiny diagnostic ROM of my own, to make sure byte writes into chip RAM are OK, and they do seem to be.
foft
Unread post by foft »

I guess focusing on tracing hardware reads/writes/timings in that CIA test would be a good idea. The DiagROM source code is included, so I can see exactly what it's doing.

I wonder how long the JIT pauses are. In a JIT emulator the hardware usually stops while it compiles; here it keeps on ticking away, so we might 'miss' many scanlines. I wonder if that is having an impact. Anyway, it's best if I focus on something understood that I can dig into; it seems unlikely that a 50-scanline 'gap' would prevent the Kickstart logo from appearing. I wish I had a summary of what Kickstart actually DOES before showing this logo. Perhaps I can find a description somewhere online.
foft
Unread post by foft »

I should probably use this too, to debug in 68k space...
https://qemu-project.gitlab.io/qemu/system/gdb.html
foft
Unread post by foft »

Wow, check out the raster test! I'd have thought it'd just JIT-compile that loop once and then ... work. However, it's all over the place.
lordoftime79
Unread post by lordoftime79 »

You are making excellent progress. I would recommend a little-and-often approach so you don't get burnt out. I have gone at a couple of retro computer projects a little too hard and a little too fast since the pandemic, and it has caused me to become frustrated and annoyed. Needless to say, I haven't touched those projects since Xmas :( What you are doing here is nothing short of amazing.
robinsonb5
Unread post by robinsonb5 »

foft wrote: Mon Apr 19, 2021 8:03 am Wow, check out the raster test! I'd have thought it'd just JIT-compile that loop once and then ... work. However, it's all over the place.
Wow! :shock:

If it can't reliably busy-wait for raster positions, that would explain the CIA test failing too, since it appears to count frames using VHPOSR (despite the message saying LEV3 IRQ is required, which would imply frame counting in a VBlank interrupt). I've just verified that VBlank interrupts are being triggered at the correct rate, BTW, and I can't see evidence of any being missed. It might be good to test priorities though: does a level 3 interrupt correctly preempt a level 2 interrupt?
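
To illustrate why an unreliable busy-wait would undercount frames, here is a toy Python simulation. It is not taken from DiagROM: the equality-style wait, the 313-line frame and the polling intervals are all assumptions chosen for illustration. The only figure borrowed from this thread is the 64us scanline.

```python
LINE_US = 64            # scanline duration quoted in this thread
LINES_PER_FRAME = 313   # assumed PAL-like frame length
TOTAL_US = 10 * LINES_PER_FRAME * LINE_US  # simulate ten frames

def raster_line(t_us):
    """Line the beam is on at time t_us (free-running, never paused)."""
    return (t_us // LINE_US) % LINES_PER_FRAME

def count_frames(poll_times_us, target_line=0):
    """Count frames the way a loop waiting for line == target_line would."""
    frames = 0
    previous = None
    for t in poll_times_us:
        line = raster_line(t)
        if line == target_line and previous != target_line:
            frames += 1
        previous = line
    return frames

# A CPU that polls every 10us sees every pass through the target line.
fast = count_frames(range(0, TOTAL_US, 10))
# A CPU that only gets to look every 150us can step right over it.
slow = count_frames(range(0, TOTAL_US, 150))
print("frames seen with 10us polling: ", fast)
print("frames seen with 150us polling:", slow)
```

Any gap between polls longer than one scanline lets the target line slip past unseen, so code that counts frames this way would measure fewer frames than really elapsed.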
foft
Unread post by foft »

It looks much better if I stop the mister process!
WiteWulf
Posts: 42
Joined: Tue Feb 09, 2021 3:09 pm
Has thanked: 13 times
Been thanked: 10 times

Unread post by WiteWulf »

The MiSTer process typically uses ~100% of one CPU core on the HPS side. That can vary depending on what it's doing, but input polling and disk IO are very processor intensive. This is in part due to Sorg never intending anything else to be running, and it has caused problems in the network throughput testing that I did for loading large disc images over the network.
Bas
Top Contributor
Posts: 558
Joined: Fri Jan 22, 2021 4:36 pm
Has thanked: 73 times
Been thanked: 261 times

Unread post by Bas »

What if you renice the Amiga CPU to a very high level of priority over other processes?
foft
Unread post by foft »

Even setting it to realtime and killing mister isn't enough.
chrt -f -p 99 $qemu_pid

I've asked the 68k qemu maintainer if he knows of any tricks to cut the latency. This will most likely come at a cost of absolute performance of course.

I wonder if we can patch out the raster busy loops from the OS.
Bas
Unread post by Bas »

I'm no expert by any means but JIT combined with an FPGA seems to me like a recipe for subtle timing issues unless you can force the JIT into unfailing lockstep with the rest of the system. The way it looks to me now is that CPU and chipset have a different timing source, or at least they experience the passing of time differently from one another (for lack of a better expression).
robinsonb5
Unread post by robinsonb5 »

foft wrote: Mon Apr 19, 2021 3:54 pm I wonder if we can patch out the raster busy loops from the OS.
I wouldn't have thought so - the raster loop I'm aware of is all about detecting power supply ticks and genlocks (I bumped into that on MiST recently - under OS3.1.4 the system clock was running too fast... So now I know more about how the Amiga figures out its power supply frequency than I ever wanted to!)

But even if you can patch it out, there's enough Amiga software that relies upon the near real-time nature of the system that it suddenly becoming non-realtime is going to break many things.

As you say, fixing it is likely to incur a performance penalty - ideally it would be possible to pay the penalty only when running code that's on the other side of the HPS bridge - and keep the speed for stuff in Fast RAM. Is there any scope for making qemu do the JIT translation in much smaller chunks?
foft
Unread post by foft »

These are timings, in microseconds, of the four stages of each loop iteration:
checking interrupts + printf (this logging!), finding the block to execute, executing the block, aligning clocks (no-op)

So when it comes out and finds a new block, ~150us is typical.
One scanline takes 64us, so this is >2 scanlines.

1486,8,39,109
322,8,1179,5
380,8,38,4
372,7,948,5
601,8,41,3
372,8,1244,5
377,9,41,4
417,7,29071,5
379,7,40,3
381,9,233554,6
6515,7,167,13
158,8,38,4
160,10,216,5
161,8,40,3
138,8,52,4
134,7,786,6
145,8,38,4
133,7,728,5
141,8,38,4
136,7,800,5
146,7,39,3
141,8,767,5
147,7,58,4
136,7,5419,5
157,7,39,3
152,8,769,11
140,7,39,3
155,16,195844,6
6357,8,191,5
148,7,39,4
143,25,217,5
150,8,41,3
147,7,53,4
136,7,759,20
139,7,49,4
134,8,785,5
140,8,47,4
134,15,5590,5
143,7,48,4
126,7,751,5
139,7,47,4
135,35,780,5
813,17,40,3
155,7,664,4
160,7,38,4
zakk4223
Posts: 270
Joined: Sun May 24, 2020 10:55 pm
Been thanked: 107 times

Unread post by zakk4223 »

foft wrote: Mon Apr 19, 2021 11:42 am It looks much better if I stop the mister process!
Main_Mister pins itself to core #1

Try using taskset to start the cpu emulator on core 0
foft
Unread post by foft »

I removed the printf from the loop and it's much better than the earlier figures:
7,4,744,6
8,4,38,4
7,5,768,5
8,5,46,4
8,5,5499,6
7,5,38,4
7,5,780,6
7,4,38,4
6,5,673,5
7,5,51,3
7,5,777,5
7,5,65,5
6,5,85330,5
7,5,6065,5
7,5,39,3
7,9,196,5
7,5,40,3
6,6,38,4
7,4,4448,5
7,5,38,4
7,5,5592,6
7,5,38,13
8,5,798,5
26,5,39,3
6,5,769,5
7,5,38,3
7,5,643,5
7,4,47,4
6,5,762,6
7,4,38,4
7,5,236494,7
7,6,169,6
7,5,52,4
7,8,216,5
8,4,40,4
7,5,50,4
6,5,788,6
16,5,38,4
7,4,649,5
7,4,37,4
7,4,757,5
8,4,47,4
7,5,5441,6
8,4,39,4
7,4,695,5
7,4,54,4
14,5,767,5
7,5,38,4

So normally finding a block takes ~20us, which is a more reasonable 1/3 of a scanline.
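
For a rough feel of the difference, this Python sketch takes a handful of rows copied from the two dumps (with and without the printf) and expresses the time spent outside translated code in scanlines. The choice of rows and the definition of "overhead" as every column except the execute column are my assumptions, so treat the numbers as indicative only.

```python
from statistics import median

SCANLINE_US = 64  # scanline duration quoted earlier in the thread

# Columns: check interrupts, find block, execute block, align clocks (us).
WITH_PRINTF = [(380, 8, 38, 4), (158, 8, 38, 4), (161, 8, 40, 3),
               (145, 8, 38, 4), (141, 8, 38, 4)]
WITHOUT_PRINTF = [(8, 4, 38, 4), (7, 5, 38, 4), (7, 5, 39, 3),
                  (7, 5, 40, 3), (7, 4, 38, 4)]

def overhead_us(row):
    """Time spent outside translated code: all columns except 'execute'."""
    check, find, _execute, align = row
    return check + find + align

def median_overhead_scanlines(rows):
    return median(overhead_us(r) for r in rows) / SCANLINE_US

print("with printf:   ", round(median_overhead_scanlines(WITH_PRINTF), 2), "scanlines")
print("without printf:", round(median_overhead_scanlines(WITHOUT_PRINTF), 2), "scanlines")
```

The median hides the outliers in the execute column (some blocks run for tens or hundreds of milliseconds), which matter just as much for anything polling the raster position.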
foft
Unread post by foft »

Regarding jit translation chunk size:
diff ../qemu_clean/qemu-6.0.0-rc3/include/tcg/tcg.h include/tcg/tcg.h
279c279
< #define TCG_MAX_INSNS 512
---
> #define TCG_MAX_INSNS 128
867c867
< return tcg_ctx->nb_ops >= 4000;
---
> return tcg_ctx->nb_ops >= 1000;

You can limit the number of m68k opcodes per translated block by modifying TCG_MAX_INSNS.
You can limit the number of ARM opcodes by playing with the value in tcg_op_buf_full().

Many thanks to the qemu 68k maintainer, Laurent, for this info.

Oh, and for the fetch/exec loop see accel/tcg/cpu-exec.c, round about line 800.
bbond007
Unread post by bbond007 »

zakk4223 wrote: Mon Apr 19, 2021 6:22 pm
foft wrote: Mon Apr 19, 2021 11:42 am It looks much better if I stop the mister process!
Main_Mister pins itself to core #1

Try using taskset to start the cpu emulator on core 0
I believe you want 'taskset 1'

Intuitively, 1 is CPU #0

That seems to work for me in the 68000.sh

I ran into that problem initially when starting the CPU from the menu. CPU was about 1/2 speed but disk IO was much slower :)
foft
Unread post by foft »

I wonder if we can get singlestep mode fast enough for the busy loops, then dynamically switch to it when the raster position is read.
zakk4223
Unread post by zakk4223 »

bbond007 wrote: Mon Apr 19, 2021 7:15 pm
zakk4223 wrote: Mon Apr 19, 2021 6:22 pm
foft wrote: Mon Apr 19, 2021 11:42 am It looks much better if I stop the mister process!
Main_Mister pins itself to core #1

Try using taskset to start the cpu emulator on core 0
I believe you want 'taskset 1'

Intuitively, 1 is CPU #0

That seems to work for me in the 68000.sh

I ran into that problem initially when starting the CPU from the menu. CPU was about 1/2 speed but disk IO was much slower :)
No, I don't believe that's the case. Taskset shows a cpu list of '0,1' for many tasks, and only '1' for Main_Mister. CPU core numbering starts from zero.
Main Mister sets the cpu affinity via CPU_SET(1, &set), and that means cpu 1 (it's just a bitmask with the 'cpu' value shifted over, so it starts at zero)

The comments in main.cpp say it is pinned to cpu #1 because cpu #0 is the hardware interrupt handler. I suspect you have a tough choice here: either compete with Main_Mister for CPU, or compete with interrupt handlers which are probably going to cause timing issues. You can see interrupt counts via /proc/interrupts. CPU0 gets quite a bit of them.
bbond007
Unread post by bbond007 »

zakk4223 wrote: Mon Apr 19, 2021 7:51 pm
No, I don't believe that's the case. Taskset shows a cpu list of '0,1' for many tasks, and only '1' for Main_Mister. CPU core numbering starts from zero.
Main Mister sets the cpu affinity via CPU_SET(1, &set), and that means cpu 1 (it's just a bitmask with the 'cpu' value shifted over, so it starts at zero)

The comments in main.cpp say it is pinned to cpu #1 because cpu #0 is the hardware interrupt handler. I suspect you have a tough choice here: either compete with Main_Mister for CPU, or compete with interrupt handlers which are probably going to cause timing issues. You can see interrupt counts via /proc/interrupts. CPU0 gets quite a bit of them.
Processor 0 is bit 0 and processor 1 is bit 1:

'taskset 1' would be CPU 0 (0001)
'taskset 2' would be CPU 1 (0010)
'taskset 3' would be CPU 0 & 1 (0011)
'taskset 4' would be CPU 2 (0100)
etc...

https://www.man7.org/linux/man-pages/ma ... set.1.html
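
A tiny Python sketch of the mask-versus-list distinction: bit N of the mask selects CPU N, so numbering starts at zero on both sides. The two helper names are invented for illustration.

```python
def mask_to_cpus(mask):
    """CPU numbers selected by a taskset-style affinity bitmask."""
    return [cpu for cpu in range(mask.bit_length()) if (mask >> cpu) & 1]

def cpus_to_mask(cpus):
    """Affinity bitmask for a list of CPU numbers."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return mask

for mask in (1, 2, 3, 4):
    print(f"mask {mask} ({mask:04b}) -> CPUs {mask_to_cpus(mask)}")

print("CPU 1 only -> mask", cpus_to_mask([1]))
```

So pinning to CPU 1 with CPU_SET(1, &set) corresponds to a taskset mask of 2, and a mask of 1 selects CPU 0. Python's own os.sched_setaffinity takes a collection of CPU numbers rather than a mask, like the list form.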
zakk4223
Unread post by zakk4223 »

Oh right, single integer arguments to taskset are actually a mask, not a list. But the output is always a list. linux_ui.txt

I was running some tests to see if I could figure out the timing impact of running on the cpu handler core, but I suspect the test needs to be more complicated than I'm willing to write right now (probably need to do some i/o in the thread so you also have to be woken up on some interrupts). Running on the same core as Main_Mister is most certainly bad though.

edit: the cpu emulator needing to effectively be real-time complicates this, and isn't something I can test easily right now. It may be that you can do X operations in Y seconds consistently, but that's an average and there is a lot of room for jitter there. Especially if your requirements are hitting scanline timings. I guess you can sort of test this by killing Main_Mister and pinning your cpu emulator to one core and seeing if running on one or the other has any noticeable differences.