Lets actually try Hybrid Emulation

foft
Posts: 334
Joined: Thu Dec 03, 2020 11:05 am
Has thanked: 29 times
Been thanked: 120 times

Re: Lets actually try Hybrid Emulation

Is anyone here an ARM assembly whiz?

http://www.64kib.com/qemu_slow_stuck_fragment.log

It logs IN: for the 68k code to translate and OUT: for the ARM version, then it logs which one it's running. It also logs 68k addresses on entry.
foft

So in theory qemu is executing these jit instructions...

However, when I debug it with gdb, I don't seem to get code at these addresses. In case there is an offset (it maps it as both read/execute and read/write at two addresses), I thought I'd do this a few times on all the executing threads:
display/i $pc
stepi

Unfortunately I can't find the code it's running! Also, the stack doesn't show properly (and yes, I did try rebuilding qemu with -g and -O0). Ufff!
foft

So, still no performance fix...

However at least the irqs are better! Core/kernel module for the irq fix:
http://www.64kib.com/minimig_irq_corev2.tar.gz

(no qemu change needed, so still v9 - http://www.64kib.com/qemu_system_testv9.tar.xz)
Caldor
Top Contributor
Posts: 930
Joined: Sat Jul 25, 2020 11:20 am
Has thanked: 112 times
Been thanked: 111 times

Good to hear there is still progress :)
LamerDeluxe
Top Contributor
Posts: 1160
Joined: Sun May 24, 2020 10:25 pm
Has thanked: 798 times
Been thanked: 257 times

Feels like it is getting really close to a performance breakthrough now. Really interesting topic to follow, thanks for the frequent updates!
foft

The performance issue is related to qemu write handling somehow.

On every write it calls 'notdirty_write' (see accel/tcg/cputlb.c), which does a glib tree lookup and a bunch of other messing around.

Which I can see if I set this trace event...
trace-event memory_notdirty* on

memory_notdirty_write_access 0x40025768 ram_addr 0x245768 size 4
memory_notdirty_write_access 0x40025768 ram_addr 0x245768 size 4
memory_notdirty_write_access 0x40025768 ram_addr 0x245768 size 4
memory_notdirty_write_access 0x40025768 ram_addr 0x245768 size 4
memory_notdirty_write_access 0x40025768 ram_addr 0x245768 size 4
memory_notdirty_write_access 0x40025768 ram_addr 0x245768 size 4

Now I just need to work out why this happens and ... how to stop it!

Note that this 'mmu' stuff is used even without the 68040 mmu; it's how qemu handles the memory in system mode, I think.
foft

It seems to take this path rather a lot... All the time?

/* Handle anything that isn't just a straight memory access. */
if (unlikely(tlb_addr & ~TARGET_PAGE_MASK)) {
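
As a rough illustration of why that branch can fire on every store: the TLB entry holds a page-aligned address, so the low bits are free to carry flags, and any flag set there sends the access down the slow path. The sketch below is a toy model and not qemu's code; the page size, the flag value and all the toy_ names are assumptions made up for the example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy values, assumed for illustration only (not qemu's real constants). */
#define TOY_PAGE_BITS     12u
#define TOY_PAGE_MASK     (~(((uint32_t)1 << TOY_PAGE_BITS) - 1u))
#define TOY_FLAG_NOTDIRTY ((uint32_t)1 << 4) /* hypothetical flag bit */

/* A TLB address is page aligned, so any low bit that is set is a flag. */
bool toy_needs_slow_path(uint32_t tlb_addr)
{
    return (tlb_addr & ~TOY_PAGE_MASK) != 0;
}

/* Tag a page as "not dirty", e.g. because translated code lives in it. */
uint32_t toy_mark_notdirty(uint32_t tlb_addr)
{
    return (tlb_addr & TOY_PAGE_MASK) | TOY_FLAG_NOTDIRTY;
}
```

With these toy values, an untagged entry for page 0x40025000 takes the fast path, while the same entry after toy_mark_notdirty() takes the slow path on every access until the flag is cleared.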
foft

So... this seems to happen if the stack shares an mmu page with code. I refer here to the qemu mmu, not the emulated 68k mmu.

When starting my tests from newcli this is the case and presumably other times.

Are there any amiga programs to force the stack location? They might be worth a try.
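
To make the condition concrete, here is a minimal sketch of a check for whether a code range and a stack range touch a common page. The 4 KiB page size and both function names are assumptions for the example; the page size qemu actually uses for this target may differ.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_BITS 12u /* assumed 4 KiB pages */

/* Page number that contains addr. */
uint32_t sketch_page_of(uint32_t addr)
{
    return addr >> SKETCH_PAGE_BITS;
}

/*
 * True if [a, a + a_len) and [b, b + b_len) touch at least one common page.
 * Both lengths must be non-zero.
 */
bool sketch_ranges_share_page(uint32_t a, uint32_t a_len,
                              uint32_t b, uint32_t b_len)
{
    uint32_t a_first = sketch_page_of(a);
    uint32_t a_last  = sketch_page_of(a + a_len - 1u);
    uint32_t b_first = sketch_page_of(b);
    uint32_t b_last  = sketch_page_of(b + b_len - 1u);

    return a_first <= b_last && b_first <= a_last;
}
```

Note that the ranges do not have to overlap to clash: a stack that ends a few bytes before the code starts still shares its last page with it.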
robinsonb5
Posts: 129
Joined: Fri Jun 19, 2020 8:54 pm
Has thanked: 13 times
Been thanked: 57 times

foft wrote: Tue May 18, 2021 6:18 pm
So... this seems to happen if the stack shares an mmu page with code. I refer here to the qemu mmu, not the emulated 68k mmu.

When starting my tests from newcli this is the case and presumably other times.

Are there any amiga programs to force the stack location? They might be worth a try.

Globally, not that I'm aware of - but here's an example of how, as a programmer, you can change the stack of your own task:
http://blackfiveservices.co.uk/amiga-c/ ... cktest.lha

If you change the MEMF_ANY to MEMF_REVERSE the new stack will be allocated from the opposite end of the free memory pool.
foft

Perhaps we can patch this, to add an extra 8k to each stack then grow downwards from there. So we never share code and stack.
http://aminet.net/package/util/boot/StackAttack2
robinsonb5

foft wrote: Tue May 18, 2021 6:45 pm
Perhaps we can patch this, to add an extra 8k to each stack then grow downwards from there. So we never share code and stack.
http://aminet.net/package/util/boot/StackAttack2

Even unmodified it makes the stack significantly larger if there's loads of available memory - so it might already help. If you have more than 128 meg free the stack will be 128k - so that alone should be enough to make sure there's no page clash, yes?
foft

Since it grows down, any code right after the stack allocation will clash, however large it is?

Still, I might install it and then put an 8k variable on the stack at the start of the dhrystones program.
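
A minimal sketch of that padding idea, assuming the benchmark can be wrapped in a single call. The names run_below_stack_pad and sample_benchmark are made up for the example, and whether a given compiler really keeps the pad on the stack is an assumption; the volatile accesses are there to make that more likely.

```c
#define STACK_PAD_BYTES 8192 /* the 8k mentioned above */

typedef int (*benchmark_fn)(void);

/* Hypothetical stand-in for the real benchmark entry point. */
int sample_benchmark(void)
{
    return 42;
}

/* Run fn with an 8k pad between the top of the stack and fn's own frames. */
int run_below_stack_pad(benchmark_fn fn)
{
    volatile char pad[STACK_PAD_BYTES];
    int result;

    /* Touch both ends so the compiler has to reserve the whole array. */
    pad[0] = 0;
    pad[STACK_PAD_BYTES - 1] = 0;

    result = fn();

    /*
     * Reading the pad after the call keeps this frame alive while fn runs
     * (it rules out a tail call). pad[0] is 0, so the result is unchanged.
     */
    return result + pad[0];
}
```

Because the stack grows down, everything fn pushes ends up at least 8k below the top of the stack, so it only helps if the code sits directly above the stack.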
robinsonb5

Or just allocate a new stack in a completely different part of RAM?
foft

I just installed StackAttack2 'as-is'. With my simple loop it now worked every time (of about 5-6 tries). I tried (real) dhrystones and get about 55000. This still isn't quite the 400000, so I wonder if something else is going on there; anyway, it's much much better than I got before.

Anyway, this seems worth improving. Though of course the issue remains for other programs with data near code, so if it's possible to cut the overhead in qemu that'd be good. I guess the same tlb is often hit, so caching the last one might save a tree lookup, for instance.
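
The "cache the last hit" idea could look roughly like the sketch below. It is not qemu code: slow_page_lookup is a made-up stand-in for the expensive tree lookup, the 4 KiB page size is assumed, and a real version would also have to drop the cached entry whenever the underlying tree changes.

```c
#include <stdint.h>

#define LOOKUP_PAGE_BITS 12u /* assumed 4 KiB pages */

unsigned slow_lookups; /* counts how often the slow path runs */

/* Hypothetical stand-in for the expensive tree lookup. */
uint32_t slow_page_lookup(uint32_t page)
{
    slow_lookups++;
    return page ^ 0x5a5a5u; /* arbitrary fake result */
}

/* One-entry cache that remembers the most recently looked-up page. */
struct last_hit {
    int      valid;
    uint32_t page;
    uint32_t value;
};

uint32_t cached_page_lookup(struct last_hit *c, uint32_t addr)
{
    uint32_t page = addr >> LOOKUP_PAGE_BITS;

    if (!c->valid || c->page != page) {
        /* Miss: do the slow lookup once and remember the result. */
        c->value = slow_page_lookup(page);
        c->page  = page;
        c->valid = 1;
    }
    return c->value;
}
```

For a run of writes to the same address, like the repeated trace lines earlier in the thread, only the first call reaches the slow lookup.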
foft

What is the memory map for rtg? I’d like to get that working properly with this.
foft

foft wrote: Tue May 18, 2021 9:05 pm
What is the memory map for rtg? I’d like to get that working properly with this.

Is it just in Z3 fast ram? So if I fix the caching and start using the 'correct' DDR3 area again rather than malloc'ed ram, will RTG work again?
robinsonb5

foft wrote: Wed May 19, 2021 7:21 am
Is it just in Z3 fast ram? So if I fix the caching and start using the 'correct' DDR3 area again rather than malloc'ed ram will RTG work again?

Without actually checking, I think so, yes.
(MiSTer's RTG was based loosely on the rather hackish and area-constrained RTG solution I put together for TC64 and MiST. On those platforms the RTG region is certainly just an allocated chunk of Fast RAM - I believe, though I haven't checked, that it's the same on MiSTer.)
foft

Right, so to get that working again I need to work out how to enable the caching.

In uboot we pass these kernel options:
mem=511M memmap=513M$511M
i.e. use 511M and reserve 513M.
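
For reference, the memmap=<size>$<start> form marks <size> bytes starting at <start> as reserved, so the arithmetic behind those two options can be sketched as below. The struct and function names are made up for the example.

```c
#include <stdint.h>

#define MIB ((uint64_t)1024 * 1024)

/* Physical address range; end is exclusive. */
struct phys_range {
    uint64_t start;
    uint64_t end;
};

/* Range covered by memmap=<size_mib>M$<start_mib>M. */
struct phys_range reserved_range(uint64_t size_mib, uint64_t start_mib)
{
    struct phys_range r;

    r.start = start_mib * MIB;
    r.end   = r.start + size_mib * MIB;
    return r;
}
```

With the values above, reserved_range(513, 511) covers 0x1FF00000 up to 0x40000000, i.e. everything from 511M to the 1024M mark.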

When I started I was mmapping the 384MB of fast ram from the 513MB region. Even though I didn't use O_SYNC on the open, it seemed uncached.

Perhaps we can pass mem=1024M and tell the kernel to reserve this some other way.

Alternatively I guess I need to find the kernel api to set cache properties on a page. Then I can set up caching for the chip ram region too.

Also, I guess qemu does something sensible to flush the pages when the host cpu cache settings are changed, though I have no idea.

Some light background reading: https://elinux.org/Tims_Notes_on_ARM_memory_allocation
foft

Remembered there was a solution for this on gp2x.

mmuhack kernel module here (by squidge, modified by notaz)
https://notaz.gp2x.de/dev.php

Perhaps this can be adapted.
Grabulosaure
Core Developer
Posts: 78
Joined: Sun May 24, 2020 7:41 pm
Location: Mesozoic
Has thanked: 3 times
Been thanked: 92 times

RTG on MiSTer is just some DDRAM area directly fetched by the scaler.
The framebuffer address is the rtg_base[31:0] signal = 0x2700 0000

But, it is mapped in the 68K memory map @ 0x200 0000, in an unused memory area outside ZIII space.
(RTG doesn't need to allocate any fastram memory)

For this hybrid version, mapping RTG into ZIII memory could be simpler though.
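
A small sketch of the address translation this implies, using the two base addresses given above. The 16 MB window size is an assumption (it is the amount mapped in a later test in this thread), and the function name is made up.

```c
#include <stdbool.h>
#include <stdint.h>

#define RTG_AMIGA_BASE  0x02000000u /* where the 68K sees the framebuffer */
#define RTG_DDR_BASE    0x27000000u /* rtg_base in DDRAM */
#define RTG_WINDOW_SIZE (16u * 1024u * 1024u) /* assumed 16 MB window */

/*
 * Translate a 68K address inside the RTG window to its DDRAM address.
 * Returns false (and leaves *ddr_addr alone) if the address is outside.
 */
bool rtg_amiga_to_ddr(uint32_t amiga_addr, uint32_t *ddr_addr)
{
    if (amiga_addr < RTG_AMIGA_BASE ||
        amiga_addr - RTG_AMIGA_BASE >= RTG_WINDOW_SIZE) {
        return false;
    }
    *ddr_addr = RTG_DDR_BASE + (amiga_addr - RTG_AMIGA_BASE);
    return true;
}
```

For example, 68K address 0x02001234 would land at DDRAM address 0x27001234.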
foft

The control regs are in that region too?
robinsonb5

foft wrote: Wed May 19, 2021 1:14 pm
The control regs are in that region too?

No, they're at 0xb80100 (Following the not-yet-implemented Akiko CD control regs. MiST and TC64 have the ChunkyToPlanar reg but not the rest of Akiko as yet.)
ByteMavericks
Posts: 53
Joined: Tue Oct 27, 2020 4:52 pm
Has thanked: 69 times
Been thanked: 11 times

Robinsonb5, I should pick this up separately, but is it possible to port the blitter (etc) from fpgaarcade for performance?
robinsonb5

ByteMavericks wrote: Wed May 19, 2021 8:37 pm
Robinsonb5, I should pick this up separately, but is it possible to port the blitter (etc) from fpgaarcade for performance?

Probably - though the current implementation doesn't resemble the fpgaarcade one at all. I just used their driver as a skeleton for mine, and then mine was adapted for MiSTer.
It might also make more sense to subcontract blitter duty to the ARM, since it has more direct access to the DDR than the FPGA does?
foft

So I took a look at the mmuhack. It was armv6 based and the Cortex-A9 is ARMv7.

I found this, which dumps the armv7 page tables. (I had to add it to the kernel module since it needs supervisor mode)
https://github.com/yifanlu/ARMv7_MMU_Dumper

Anyway, I ended up using the kernel api. It's easy to add mmap to fops (for debugfs you need to use debugfs_create_file_unsafe or it ignores it), then in there you can use something this simple:
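/* mmap handler: map the requested physical range into user space as cached memory. */
static int minimig_mmap_cached(struct file *filp, struct vm_area_struct *vma)
{
    /* pgprot_cached was not defined by the kernel headers here, so it has to be supplied separately. */
    vma->vm_page_prot = pgprot_cached(vma->vm_page_prot);
    printk("mmap of %lx into %lx, with cached\n",
           vma->vm_pgoff << PAGE_SHIFT, vma->vm_start);

    /* vm_pgoff (the mmap offset in pages) is used directly as the physical page frame number. */
    return io_remap_pfn_range(vma,
                              vma->vm_start,
                              vma->vm_pgoff,
                              vma->vm_end - vma->vm_start,
                              vma->vm_page_prot);
}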


Similarly for pgprot_writecombine, pgprot_noncached, pgprot_dmacoherent. I don't really know what they all map to on the armv7 hardware. Also, for some reason pgprot_cached was not defined...

Anyway it works, I can map the DDR ram reserved for Z3 fastram and it runs at the same speed as using malloc.
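
On the user-space side, a handler like this takes the mmap file offset as the physical address, so the caller could look roughly like the sketch below. The function name is made up, and the debugfs path is whatever the module registers (not given here), so it is passed in as a parameter.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Map len bytes of physical memory starting at phys through the file at
 * path. phys must be page aligned. Returns NULL on failure.
 */
void *map_phys_window(const char *path, off_t phys, size_t len)
{
    void *p;
    int fd = open(path, O_RDWR);

    if (fd < 0) {
        return NULL;
    }

    /* The file offset ends up in vm_pgoff, which the handler uses as the pfn. */
    p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, phys);

    /* The mapping stays valid after the descriptor is closed. */
    close(fd);

    return p == MAP_FAILED ? NULL : p;
}
```

A call such as map_phys_window(path, 0x27000000, 16 * 1024 * 1024) would then request the RTG framebuffer area, assuming the module accepts that range.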

I also tried a couple more things:
i) cached chipram mapping: Led to a corrupted screen. I wonder if qemu does anything to host caching with CACR etc.
ii) mapped 16MB from 0x27000000 (phys) to 0x2000000 (amiga mem space) for rtg. When I select rtg modes I lose monitor sync. However, I double checked my rtg setup with the original core and in that case I just get a black screen (but keep sync). So I have a setup problem AND another problem I think. I had used the adf from here (viewtopic.php?p=12186#p12186).

I'm also wondering why disk access speed is 50% of that with TG68. I'm postponing looking at that until I figure out how to do proper chipram caching.
foft

qemu seems to have some cache flushing logic in the core. However, I don't see any cache action taken on CACR changes or CINV/CPUSH. I guess the latter aren't used much on the Amiga anyway since they're 68040+ only.
ByteMavericks

I’ve automated downloading and installing the extensions for rtg (and networking): https://github.com/ByteMavericks/MinimigMiSTer
robinsonb5

Remember that the CPU isn't the only thing that writes to chip RAM - you're going to need some kind of bus snooping if you want to cache chip RAM.
foft

robinsonb5 wrote: Fri May 21, 2021 8:23 pm
Remember that the CPU isn't the only thing that writes to chip RAM - you're going to need some kind of bus snooping if you want to cache chip RAM.

Wouldn’t this be the case on the original hardware? So the cpu is unaware that chip ram has changed and the software has to handle it?

I guess there is a cache inhibit signal for hardware regs.
robinsonb5

foft wrote: Fri May 21, 2021 9:12 pm
Wouldn’t this be the case on the original hardware? So the cpu is unaware that chip ram has changed and the software has to handle it?

I guess there is a cache inhibit signal for hardware regs.

There is - and I believe the system uses a combination of function code and address to disallow the data cache (if present - 68030+ only, of course) for chip RAM and the hardware regs.
I'm not sure of the exact mechanism; I do know that bus snooping was necessary in order to enable full caching on Chip RAM for the turbo mode on MiST's Minimig.
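
To illustrate the problem being described, here is a toy model of a CPU cache in front of chip RAM with a second bus master writing behind its back. It is a sketch only: the tiny direct-mapped cache, the sizes and all names are assumptions, and it says nothing about how MiST's turbo mode actually implements snooping.

```c
#include <stdbool.h>
#include <stdint.h>

#define CHIP_WORDS  16u
#define CACHE_LINES 4u

struct toy_system {
    uint16_t chip_ram[CHIP_WORDS];
    bool     valid[CACHE_LINES];
    uint32_t tag[CACHE_LINES];
    uint16_t data[CACHE_LINES];
    bool     snoop; /* invalidate cached lines on writes by other masters */
};

/* CPU read through a direct-mapped cache, one word per line (addr < CHIP_WORDS). */
uint16_t cpu_read(struct toy_system *s, uint32_t addr)
{
    uint32_t line = addr % CACHE_LINES;

    if (!s->valid[line] || s->tag[line] != addr) {
        /* Miss: fill the line from chip RAM. */
        s->valid[line] = true;
        s->tag[line]   = addr;
        s->data[line]  = s->chip_ram[addr];
    }
    return s->data[line];
}

/* Write by another bus master (e.g. the blitter); it bypasses the CPU cache. */
void dma_write(struct toy_system *s, uint32_t addr, uint16_t value)
{
    uint32_t line = addr % CACHE_LINES;

    s->chip_ram[addr] = value;
    if (s->snoop && s->valid[line] && s->tag[line] == addr) {
        /* Snooping: drop the stale line so the next CPU read refetches it. */
        s->valid[line] = false;
    }
}
```

With snoop off, a cpu_read after a dma_write to an already cached word still returns the old value; with snoop on, the line is invalidated and the read sees the new data.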