Lets actually try Hybrid Emulation

robinsonb5
Posts: 129
Joined: Fri Jun 19, 2020 8:54 pm
Has thanked: 13 times
Been thanked: 57 times

Re: Lets actually try Hybrid Emulation

Unread post by robinsonb5 »

foft wrote: Fri May 14, 2021 9:14 pm But 0xe0 isn't rom is it? ... checks memory map, hmmm perhaps I should put that first 512k here too.
Not usually, but on systems with 1 Meg ROMs it can be. (The CD32's extended ROM lives there, for instance.)

Anyhow, if you're copying the ROM to Fast then it's not that.
Is the dhrystone_m68k prebuilt, or are you compiling it? If the latter, which compiler?
foft
Posts: 334
Joined: Thu Dec 03, 2020 11:05 am
Has thanked: 29 times
Been thanked: 120 times

Re: Lets actually try Hybrid Emulation

Unread post by foft »

The compiler is different.

Latest m68k gcc for Debian and this one for Amiga:
https://github.com/AmigaPorts/m68k-amigaos-gcc
I’ll have a look at the code they produce.
foft
Posts: 334
Joined: Thu Dec 03, 2020 11:05 am
Has thanked: 29 times
Been thanked: 120 times

Re: Lets actually try Hybrid Emulation

Unread post by foft »

Although I know it's not just the code being executed, since I tested the same binary with Musashi and qemu. Musashi won massively!
bbond007
Top Contributor
Posts: 519
Joined: Tue May 26, 2020 5:06 am
Has thanked: 85 times
Been thanked: 198 times

Re: Lets actually try Hybrid Emulation

Unread post by bbond007 »

foft wrote: Sat May 15, 2021 9:37 am The compiler is different.

Latest m68k gcc for Debian and this one for Amiga:
https://github.com/AmigaPorts/m68k-amigaos-gcc
I’ll have a look at the code they produce.
Would it be worthwhile trying to compile with SAS/C?
foft
Posts: 334
Joined: Thu Dec 03, 2020 11:05 am
Has thanked: 29 times
Been thanked: 120 times

Re: Lets actually try Hybrid Emulation

Unread post by foft »

Here is the updated qemu that changes the irq handling, making it more stable:
http://www.64kib.com/qemu_system_testv9.tar.xz

As a reminder, here is the kernel module that you need to set up on boot:
http://www.64kib.com/minimig_irq_core.tar.gz

Hopefully v10 will be stable and faster...
foft
Posts: 334
Joined: Thu Dec 03, 2020 11:05 am
Has thanked: 29 times
Been thanked: 120 times

Re: Lets actually try Hybrid Emulation

Unread post by foft »

So I run this simple code:
00000000 <_start>:
   0:   4e56 fffc           linkw %fp,#-4
   4:   42ae fffc           clrl %fp@(-4)
   8:   6004                bras e <_start+0xe>
   a:   52ae fffc           addql #1,%fp@(-4)
   e:   0cae 00ff ffff      cmpil #16777215,%fp@(-4)
  14:   fffc
  16:   66f2                bnes a <_start+0xa>
  18:   4e71                nop
  1a:   4e71                nop
  1c:   4e5e                unlk %fp
  1e:   4e75                rts

In C:
for (int i=0;i!=0xffffff;++i)
{
    if ((i&0xffff)==0)
    {
    }
}
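For anyone wanting to reproduce the loop on a host compiler, here is a minimal sketch. The helper name `run_test_loop` and the `stores` counter are hypothetical, and the `volatile` qualifier is an assumption added here so that an optimizing compiler keeps the counter in a stack slot, which is what the `addql #1,%fp@(-4)` in the disassembly does on every iteration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: runs the same count-up loop as the test program.
 * 'volatile' forces a load and a store to the counter's stack slot on
 * every iteration, mirroring the addql/cmpil on %fp@(-4) above. */
uint32_t run_test_loop(uint32_t limit)
{
    volatile uint32_t i = 0;
    uint32_t stores = 0;   /* number of loop iterations, one write each */

    assert(limit <= 0xffffffu);
    for (i = 0; i != limit; ++i) {
        ++stores;
    }
    return stores;
}
```

Calling `run_test_loop(0xffffff)` performs 16777215 iterations, each of which writes to the stack, so anything that slows down stack writes is multiplied by that count.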

On entry the regs are this:

D0 = ffffffff A0 = 8006afa8 F0 = 7fff ffffffffffffffff ( nan)
D1 = 8006afa8 A1 = 00000000 F1 = 7fff ffffffffffffffff ( nan)
D2 = 8006afc9 A2 = 00000000 F2 = 7fff ffffffffffffffff ( nan)
D3 = 00000000 A3 = 00000000 F3 = 7fff ffffffffffffffff ( nan)
D4 = 00000000 A4 = 00000000 F4 = 7fff ffffffffffffffff ( nan)
D5 = 00000000 A5 = 00000000 F5 = 7fff ffffffffffffffff ( nan)
D6 = 00000000 A6 = 40800c5c F6 = 7fff ffffffffffffffff ( nan)
D7 = 00000000 A7 = 40800c44 F7 = 7fff ffffffffffffffff ( nan)

& the code is placed at 0x401ae964 (i.e. malloc-ed ram, no fpga involved)
Note that A7, the stack pointer, is also in fast ram (as is A6, the frame pointer the loop uses).

...
I have a binary that loads the same code to a malloc'ed array then executes it:
User mode: 0.6 seconds
my 'Mister' machine (from newcli): 5 mins, 15 seconds
official 'Virtual 68k' machine: 1.3 seconds

The only reasons I can think of for that are:
i) The qemu machine has some throttling settings and I need to tell it to go flat out?
ii) It accesses the chip memory or io area all the time, despite the code not telling it to. MMU tables, or something like that?
robinsonb5
Posts: 129
Joined: Fri Jun 19, 2020 8:54 pm
Has thanked: 13 times
Been thanked: 57 times

Re: Lets actually try Hybrid Emulation

Unread post by robinsonb5 »

What happens if you surround the test program with a Disable() / Enable() pair?
foft
Posts: 334
Joined: Thu Dec 03, 2020 11:05 am
Has thanked: 29 times
Been thanked: 120 times

Re: Lets actually try Hybrid Emulation

Unread post by foft »

robinsonb5 wrote: Sun May 16, 2021 5:47 pm What happens if you surround the test program with a Disable() / Enable() pair?
That didn't seem to change it.

Though, something interesting. I ran it several times and it went at full speed sometimes! (To be clear that was with the unchanged build where I didn't add Disable/Enable)
foft
Posts: 334
Joined: Thu Dec 03, 2020 11:05 am
Has thanked: 29 times
Been thanked: 120 times

Re: Lets actually try Hybrid Emulation

Unread post by foft »

I thought I should at least try building qemu with 'enable-profiler' to see if it tells me anything!
foft
Posts: 334
Joined: Thu Dec 03, 2020 11:05 am
Has thanked: 29 times
Been thanked: 120 times

Re: Lets actually try Hybrid Emulation

Unread post by foft »

It normally seems to say something like this:
(qemu) info profile
async time 814716981 (0.815)
qemu time 754223562 (0.754)

When I run the test app I see this:
(qemu) info profile
async time 34714891552 (34.715)
qemu time 0 (0.000)
foft
Posts: 334
Joined: Thu Dec 03, 2020 11:05 am
Has thanked: 29 times
Been thanked: 120 times

Re: Lets actually try Hybrid Emulation

Unread post by foft »

After some red herrings with icount etc... It seems to be a single chain of translation blocks. Which is what I'd expect. So I guess one of them accesses the hardware area, otherwise I really don't understand.

Time to signaltap...
foft
Posts: 334
Joined: Thu Dec 03, 2020 11:05 am
Has thanked: 29 times
Been thanked: 120 times

Re: Lets actually try Hybrid Emulation

Unread post by foft »

& I see no accesses to the hardware/chipram!

Checking the irq area too, just in case.
foft
Posts: 334
Joined: Thu Dec 03, 2020 11:05 am
Has thanked: 29 times
Been thanked: 120 times

Re: Lets actually try Hybrid Emulation

Unread post by foft »

I do see IRQ avalon bus accesses, which is unexpected. After Disable they should be off - and indeed I do not see the ipl lines changing in signaltap.
foft
Posts: 334
Joined: Thu Dec 03, 2020 11:05 am
Has thanked: 29 times
Been thanked: 120 times

Re: Lets actually try Hybrid Emulation

Unread post by foft »

Not sure why. I'm trying to register them in the rtl; perhaps it's glitches?
foft
Posts: 334
Joined: Thu Dec 03, 2020 11:05 am
Has thanked: 29 times
Been thanked: 120 times

Re: Lets actually try Hybrid Emulation

Unread post by foft »

So, found out a few more things...
i) The irq implementation was still incorrect
ii) The slow code is running completely locally in fast ram, with no irqs and no hps-fpga bridge access.

The problem with the irqs was an off-by-one error and not understanding edge triggered irqs properly.

I thought 'edge triggered' meant that I'd get an irq on any edge, so I had just wired up the irq lines directly, expecting an irq whenever they changed. I've now changed this to an xor of the old and new irq flags, or'ed together to give a single irq on any change.
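The xor-and-or scheme can be modelled in software like this. It is only a sketch of the logic, not the actual RTL: the struct and function names are hypothetical, and the 3-bit mask assumes the three 68k IPL lines are the signals being watched.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the change detector: remembers the previously
 * sampled state of the irq lines. */
typedef struct {
    uint8_t prev;   /* last sampled line state, low 3 bits used */
} irq_edge_state;

/* Samples the lines once. Returns 1 if any line differs from the
 * previous sample (xor per line, then or-reduce), otherwise 0. */
int irq_any_change(irq_edge_state *s, uint8_t lines)
{
    uint8_t changed;

    assert(s != NULL);
    lines &= 0x7u;                           /* assume 3 IPL lines */
    changed = (uint8_t)(s->prev ^ lines);    /* 1 bit per changed line */
    s->prev = lines;
    return changed != 0;                     /* or of all changed bits */
}
```

Both rising and falling transitions produce a pulse here, which is the "irq on any change" behaviour described above, while a steady level produces none.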

The off-by-one error was a mistake in the .dts file. I'm actually pretty shocked it worked at all and passed all the diagrom tests like this. Anyway, it's fixed now.

For the 'slow loop' code I now know it's all in one tb (translation block) chain. I have the 68k code and the arm code logged. When it's running nothing further is logged, since it's all in the (previously logged) dynamically compiled arm code. While it was running I had signal tap up to check for irqs and any hps avalon slave access - no access, no irqs (since I call Disable/Enable now). So, next step is ... trying to run this block of arm machine code to figure out why it doesn't work.
foft
Posts: 334
Joined: Thu Dec 03, 2020 11:05 am
Has thanked: 29 times
Been thanked: 120 times

Re: Lets actually try Hybrid Emulation

Unread post by foft »

Is anyone here an arm assembly whiz?

http://www.64kib.com/qemu_slow_stuck_fragment.log

It logs IN: for the 68k code to translate, OUT: for the arm version, then it logs which one it's running. It also logs 68k addresses on entry.
foft
Posts: 334
Joined: Thu Dec 03, 2020 11:05 am
Has thanked: 29 times
Been thanked: 120 times

Re: Lets actually try Hybrid Emulation

Unread post by foft »

So in theory qemu is executing these jit instructions...

However when I debug it with gdb, I don't seem to get code at these addresses. In case there is an offset (it maps it as both read/execute and read/write at two addresses) I thought I'd do this a few times on all the executing threads:
display/i $pc
stepi

Unfortunately I can't find the code it's running! + the stack doesn't show properly (and yes I did try rebuilding qemu with -g and -O0). Ufff!
foft
Posts: 334
Joined: Thu Dec 03, 2020 11:05 am
Has thanked: 29 times
Been thanked: 120 times

Re: Lets actually try Hybrid Emulation

Unread post by foft »

So, still no performance fix...

However at least the irqs are better! Core/kernel module for the irq fix:
http://www.64kib.com/minimig_irq_corev2.tar.gz

(no qemu change needed, so still v9 - http://www.64kib.com/qemu_system_testv9.tar.xz)
Caldor
Top Contributor
Posts: 930
Joined: Sat Jul 25, 2020 11:20 am
Has thanked: 112 times
Been thanked: 111 times

Re: Lets actually try Hybrid Emulation

Unread post by Caldor »

Good to hear there is still progress :)
LamerDeluxe
Top Contributor
Posts: 1160
Joined: Sun May 24, 2020 10:25 pm
Has thanked: 798 times
Been thanked: 257 times

Re: Lets actually try Hybrid Emulation

Unread post by LamerDeluxe »

Feels like it is getting really close to a performance breakthrough now. Really interesting topic to follow, thanks for the frequent updates!
foft
Posts: 334
Joined: Thu Dec 03, 2020 11:05 am
Has thanked: 29 times
Been thanked: 120 times

Re: Lets actually try Hybrid Emulation

Unread post by foft »

The performance issue is related to qemu write handling somehow.

On every write it calls 'notdirty_write' (see accel/tcg/cputlb.c) which does a glib tree lookup and a bunch of other messing around.

Which I can see if I set this trace event...
trace-event memory_notdirty* on

memory_notdirty_write_access 0x40025768 ram_addr 0x245768 size 4
memory_notdirty_write_access 0x40025768 ram_addr 0x245768 size 4
memory_notdirty_write_access 0x40025768 ram_addr 0x245768 size 4
memory_notdirty_write_access 0x40025768 ram_addr 0x245768 size 4
memory_notdirty_write_access 0x40025768 ram_addr 0x245768 size 4
memory_notdirty_write_access 0x40025768 ram_addr 0x245768 size 4

Now I just need to work out why this happens and ... how to stop it!

Note that this 'mmu' stuff is used even without the 68040 mmu; it's how qemu handles the memory in system mode, I think.
foft
Posts: 334
Joined: Thu Dec 03, 2020 11:05 am
Has thanked: 29 times
Been thanked: 120 times

Re: Lets actually try Hybrid Emulation

Unread post by foft »

It seems to take this path rather a lot... All the time?

/* Handle anything that isn't just a straight memory access. */
if (unlikely(tlb_addr & ~TARGET_PAGE_MASK)) {
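A rough sketch of what that test is doing, as far as I understand it: the TLB entry holds a page-aligned address, so the bits below the page size are free to carry flags, and any flag being set diverts the access to the slow path. Everything named `SKETCH_*` or `sketch_*` below is hypothetical, and the page size and flag position are assumptions, not qemu's real values.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values for illustration only. */
#define SKETCH_PAGE_BITS     12u
#define SKETCH_PAGE_MASK     (~((UINT32_C(1) << SKETCH_PAGE_BITS) - 1u))
#define SKETCH_FLAG_NOTDIRTY (UINT32_C(1) << 0)

/* Builds a TLB-style entry: page-aligned address in the high bits,
 * flags packed into the otherwise unused low bits. */
uint32_t sketch_tlb_entry(uint32_t vaddr, uint32_t flags)
{
    assert((flags & SKETCH_PAGE_MASK) == 0);   /* flags must fit below the page bits */
    return (vaddr & SKETCH_PAGE_MASK) | flags;
}

/* Same shape as the quoted check: any low bit set means "not a plain
 * memory access", so the slow handler runs instead of a direct store. */
int sketch_needs_slow_path(uint32_t tlb_addr)
{
    return (tlb_addr & ~SKETCH_PAGE_MASK) != 0;
}
```

With a flag such as the notdirty one set on a page, every store to that page takes the slow route, which would match the repeated memory_notdirty_write_access trace lines for the same address.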
foft
Posts: 334
Joined: Thu Dec 03, 2020 11:05 am
Has thanked: 29 times
Been thanked: 120 times

Re: Lets actually try Hybrid Emulation

Unread post by foft »

So... this seems to happen if the stack shares an mmu page with code. I refer here to the qemu mmu, not the emulated 68k mmu.

When starting my tests from newcli this is the case and presumably other times.

Are there any amiga programs to force the stack location? They might be worth a try.
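Whether two addresses land on the same page is a quick shift-and-compare. This is just a sketch: `same_page` is a hypothetical helper, and the page size is passed in because the size qemu uses for this target is an assumption here (12 bits, i.e. 4 KiB, in the example below).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 if both addresses fall inside the same
 * page of size (1 << page_bits), 0 otherwise. */
int same_page(uint32_t a, uint32_t b, unsigned page_bits)
{
    assert(page_bits < 32);
    return (a >> page_bits) == (b >> page_bits);
}
```

For example, with 4 KiB pages `same_page(0x401ae964, 0x40800c44, 12)` is 0 for the code and stack addresses from the register dump earlier in the thread, while the traced write address 0x40025768 would share a page with any code located between 0x40025000 and 0x40025fff.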
robinsonb5
Posts: 129
Joined: Fri Jun 19, 2020 8:54 pm
Has thanked: 13 times
Been thanked: 57 times

Re: Lets actually try Hybrid Emulation

Unread post by robinsonb5 »

foft wrote: Tue May 18, 2021 6:18 pm So... this seems to happen if the stack shares an mmu page with code. I refer here to the qemu mmu, not the emulated 68k mmu.

When starting my tests from newcli this is the case and presumably other times.

Are there any amiga programs to force the stack location? They might be worth a try.
Globally, not that I'm aware of - but here's an example of how, as a programmer, you can change the stack of your own task:
http://blackfiveservices.co.uk/amiga-c/ ... cktest.lha

If you change the MEMF_ANY to MEMF_REVERSE the new stack will be allocated from the opposite end of the free memory pool.
foft
Posts: 334
Joined: Thu Dec 03, 2020 11:05 am
Has thanked: 29 times
Been thanked: 120 times

Re: Lets actually try Hybrid Emulation

Unread post by foft »

Perhaps we can patch this, to add an extra 8k to each stack then grow downwards from there. So we never share code and stack.
http://aminet.net/package/util/boot/StackAttack2
robinsonb5
Posts: 129
Joined: Fri Jun 19, 2020 8:54 pm
Has thanked: 13 times
Been thanked: 57 times

Re: Lets actually try Hybrid Emulation

Unread post by robinsonb5 »

foft wrote: Tue May 18, 2021 6:45 pm Perhaps we can patch this, to add an extra 8k to each stack then grow downwards from there. So we never share code and stack.
http://aminet.net/package/util/boot/StackAttack2
Even unmodified it makes the stack significantly larger if there's loads of available memory - so it might already help. If you have more than 128 meg free the stack will be 128k - so that alone should be enough to make sure there's no page clash, yes?
foft
Posts: 334
Joined: Thu Dec 03, 2020 11:05 am
Has thanked: 29 times
Been thanked: 120 times

Re: Lets actually try Hybrid Emulation

Unread post by foft »

Since it grows down, won't any code right after the stack allocation clash, however large the stack is?

Still I might install it then put an 8k variable on the stack at the start of the dhrystones program.
robinsonb5
Posts: 129
Joined: Fri Jun 19, 2020 8:54 pm
Has thanked: 13 times
Been thanked: 57 times

Re: Lets actually try Hybrid Emulation

Unread post by robinsonb5 »

Or just allocate a new stack in a completely different part of RAM?
foft
Posts: 334
Joined: Thu Dec 03, 2020 11:05 am
Has thanked: 29 times
Been thanked: 120 times

Re: Lets actually try Hybrid Emulation

Unread post by foft »

I just installed StackAttack2 'as-is'. With my simple loop it now worked every time (out of about 5-6 tries). I tried (real) dhrystones and get about 55000. This still isn't quite the 400000, so I wonder if something else is going on there; anyway it's much, much better than I got before.

Anyway, this seems worth improving. Though of course the issue remains for other programs with data near code, so if it's possible to cut the overhead in qemu that'd be good. I guess the same tlb entry is often hit, so caching the last one might save a tree lookup, for instance.
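The "cache the last hit" idea looks roughly like the sketch below. It is not qemu code: a binary search over a sorted array stands in for the glib tree, all type and function names are hypothetical, and the `slow_lookups` counter exists only to show how often the expensive path runs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t start;   /* first address in the range */
    uint32_t end;     /* one past the last address */
} range_t;

typedef struct {
    const range_t *ranges;       /* sorted by start, non-overlapping */
    size_t count;
    const range_t *last;         /* most recent hit, or NULL */
    unsigned long slow_lookups;  /* how many full searches were needed */
} range_index_t;

/* Full search: stand-in for the tree lookup. */
static const range_t *slow_find(range_index_t *ix, uint32_t addr)
{
    size_t lo = 0, hi = ix->count;

    ix->slow_lookups++;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        const range_t *r = &ix->ranges[mid];
        if (addr < r->start)
            hi = mid;
        else if (addr >= r->end)
            lo = mid + 1;
        else
            return r;
    }
    return NULL;
}

/* Checks the one-entry cache first and only searches on a miss. */
const range_t *cached_find(range_index_t *ix, uint32_t addr)
{
    const range_t *r;

    assert(ix != NULL);
    if (ix->last && addr >= ix->last->start && addr < ix->last->end)
        return ix->last;
    r = slow_find(ix, addr);
    if (r)
        ix->last = r;
    return r;
}
```

For a loop that writes to the same stack slot millions of times, only the first write would pay for the full search; whether qemu's locking around that lookup allows such a shortcut is a separate question.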
foft
Posts: 334
Joined: Thu Dec 03, 2020 11:05 am
Has thanked: 29 times
Been thanked: 120 times

Re: Lets actually try Hybrid Emulation

Unread post by foft »

What is the memory map for rtg? I’d like to get that working properly with this.