ZXR 0.1c (updated)

shoggoth
Nature
Posts: 953
Joined: Tue Aug 01, 2006 9:21 am
Location: Halmstad, Sweden

Re: ZXR 0.1c (updated)

Postby shoggoth » Mon Dec 10, 2018 11:15 am

Eero Tamminen wrote:Ok. Looking at the (attached) profile assembly, "xx_im0" (which takes more cycles than "cmd_xx") is actually the latter part of the "cmd_xx" subroutine, so looking at the whole indeed makes sense. :-)

Btw. Have you looked at the callgraphs yet?


Yeah! Finally. I've been a bit too busy with other stuff. I'm not surprised how it looks though :) A CPU emulator is bound to generate an extremely yucky callgraph, but then again there are some nice clues in there. I'll definitely start using these tools myself now, it's pretty awesome.

Eero Tamminen wrote:Do e.g. calls to "cmd_xx" look sensible, or does it get called more through some route than expected?


cmd_xx() handles a range of Z80 opcodes and basically carries on to different handlers using a jump table, so it's expected to be called very often. I may be able to shave off a couple of cycles there (but not much). To be sure, one probably needs to test several speccy binaries when profiling. People wrote everything in assembly language, and all coders have their darlings.
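
For readers unfamiliar with the jump-table idea, here is a minimal Python sketch of it. It is not ZXR's code (the real core is 68k assembly); the `Cpu` class, the handler names and the four-opcode subset are made up for illustration. Only the opcode values (NOP 0x00, INC A 0x3C, LD A,n 0x3E, HALT 0x76) are real Z80 encodings.

```python
# Hypothetical sketch of jump-table opcode dispatch; not ZXR's actual code.

class Cpu:
    def __init__(self, memory):
        self.mem = memory      # bytearray holding the emulated Z80 memory
        self.pc = 0            # program counter
        self.a = 0             # accumulator
        self.halted = False

def op_nop(cpu):
    # NOP (0x00): nothing to do
    pass

def op_inc_a(cpu):
    # INC A (0x3C); flags are ignored in this sketch
    cpu.a = (cpu.a + 1) & 0xFF

def op_ld_a_n(cpu):
    # LD A,n (0x3E): load the immediate byte that follows the opcode
    cpu.a = cpu.mem[cpu.pc]
    cpu.pc = (cpu.pc + 1) & 0xFFFF

def op_halt(cpu):
    # HALT (0x76): stop the run loop
    cpu.halted = True

def op_unimplemented(cpu):
    raise NotImplementedError("opcode not handled in this sketch")

# One table entry per opcode byte; unknown opcodes fall through to a stub.
JUMP_TABLE = [op_unimplemented] * 256
JUMP_TABLE[0x00] = op_nop
JUMP_TABLE[0x3C] = op_inc_a
JUMP_TABLE[0x3E] = op_ld_a_n
JUMP_TABLE[0x76] = op_halt

def run(cpu):
    # Fetch the opcode, advance PC, then jump through the table.
    while not cpu.halted:
        opcode = cpu.mem[cpu.pc]
        cpu.pc = (cpu.pc + 1) & 0xFFFF
        JUMP_TABLE[opcode](cpu)
```

Every emulated instruction passes through the fetch-and-dispatch step in `run()`, which is why a dispatcher like this is expected to dominate call counts in a profile.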

Eero Tamminen wrote:Also, "loop8" causes half of the instruction cache misses in the whole profile. Would it be possible to split that large loop into smaller loops that would fit within i-cache?


Quite possibly! This is the screen conversion code. I have no good way of detecting Z80 writes to screen memory without slowing down the core, hence I need to batch-convert as much as I can - or a minimum amount - depending on CPU load. I can probably make it fit into the I-cache, and I should probably disable the D-cache while doing this (because it's using a huge lookup table to convert ZX graphics -> ST bitplane format).

If anyone has ideas for an efficient way to convert speccy screen memory to Atari ditto, I'm all ears.
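
As a starting point for that discussion, here is a rough Python sketch of what the conversion has to do. It is not ZXR's code; the function names are hypothetical. It assumes the standard ZX layout (6144-byte bitmap followed by 768 attribute bytes), an ST-style 4-bitplane target where each 16-pixel group is four interleaved 16-bit words, and an ST palette set up so that colour index = ZX colour (0..7) plus 8 when BRIGHT is set. FLASH is ignored.

```python
# Hypothetical sketch, not ZXR's code: ZX Spectrum screen -> ST-style bitplanes.

BITMAP_SIZE = 6144   # 256x192 pixels, 1 bit per pixel
ATTR_SIZE = 768      # 32x24 colour cells, 1 byte per cell

def zx_row_offset(y):
    """Offset of scanline y (0..191) inside the ZX bitmap area."""
    return ((y & 0xC0) << 5) | ((y & 0x07) << 8) | ((y & 0x38) << 2)

def cell_to_planes(bitmap, attr):
    """Convert 8 pixels plus their attribute into 4 plane bytes (plane 0 first)."""
    bright = (attr >> 6) & 1
    ink = (attr & 0x07) | (bright << 3)
    paper = ((attr >> 3) & 0x07) | (bright << 3)
    inverse = ~bitmap & 0xFF
    planes = []
    for p in range(4):
        value = 0
        if (ink >> p) & 1:
            value |= bitmap     # set pixels get this bit of the ink colour
        if (paper >> p) & 1:
            value |= inverse    # clear pixels get this bit of the paper colour
        planes.append(value)
    return tuple(planes)

def convert_scanline(screen, y):
    """Return 64 words for scanline y: 16 groups of 4 interleaved plane words."""
    row = zx_row_offset(y)
    attr_row = BITMAP_SIZE + (y >> 3) * 32
    words = []
    for x in range(0, 32, 2):
        # Two adjacent 8-pixel cells form one 16-pixel bitplane group.
        left = cell_to_planes(screen[row + x], screen[attr_row + x])
        right = cell_to_planes(screen[row + x + 1], screen[attr_row + x + 1])
        for p in range(4):
            words.append((left[p] << 8) | right[p])
    return words
```

Since `cell_to_planes()` depends only on the (attribute, bitmap) pair, its results could be precomputed into a table, which is presumably similar in spirit to the large lookup table mentioned above; the trade-off between table size and cache behaviour is exactly the open question.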

Eero Tamminen wrote:Note that the profiler can be used to validate optimizations too. Just set breakpoints before and after the part you want to profile. Once the code hits the latter breakpoint, the debugger shows how many instructions and cycles were spent between the debugger invocations.

And if things actually took longer than before the optimizations, you have the profile & annotated disassembly ("profile save profile.txt") where you can check in detail what went wrong... Often I just check the instruction count stats in the disassembly to see the typical flow within the code.


I really need to learn this. I've written a PC/~286 emulator too, and optimisations are badly needed (obviously).
Ain't no space like PeP-space.

Postby shoggoth » Mon Dec 10, 2018 11:18 am

Eero Tamminen wrote:Tested a few 48k TAP files from World of Spectrum with Hatari TT/EmuTOS combo.

While others worked fine, Cybernoid came up just with a black screen: http://www.worldofspectrum.org/infoseek ... id=0001196


The CPU core is optimised for speed rather than compatibility. Some fast loaders and copy protections fail. Also, my TAP implementation only covers the ROM loader (it's trapped and faked). Once it encounters a non-ROM loader, it should switch to reading/decoding "audio". I don't do that atm.

So... if the TAP fails - try a snapshot. If that fails - try another one. If that fails, switch to another game :)

Postby shoggoth » Mon Dec 10, 2018 11:21 am

sashapont wrote:Can you port it for FireBee?


With a little work it could be made to run on the Firebee, but it won't be native ColdFire code (hence it will be suboptimal).

It would probably make more sense to use another emulator in such case. Or, perhaps I should add an additional CPU core written in C... That could actually make sense. I'll have a think about it, but I can't promise anything since I'm on a tight time budget...

Another problem on the Firebee is that graphics etc is inherently broken in FireTOS. It's simply not compatible with anything besides another Firebee with FireTOS. When I use Falcon XBIOS Video calls, which are supposed to be there, I get a black screen and the machine hangs. I guess the situation on EmuTOS is a lot better though.

vido
Atari Super Hero
Posts: 635
Joined: Mon Jan 31, 2011 7:39 pm

Postby vido » Mon Dec 10, 2018 11:42 am

shoggoth wrote:
sashapont wrote:Can you port it for FireBee?


With a little work it could be made to run on the Firebee, but it won't be native ColdFire code (hence it will be suboptimal).

It would probably make more sense to use another emulator in such case. Or, perhaps I should add an additional CPU core written in C... That could actually make sense. I'll have a think about it, but I can't promise anything since I'm on a tight time budget...

Another problem on the Firebee is that graphics etc is inherently broken in FireTOS. It's simply not compatible with anything besides another Firebee with FireTOS. When I use Falcon XBIOS Video calls, which are supposed to be there, I get a black screen and the machine hangs. I guess the situation on EmuTOS is a lot better though.

I am another one who would also like to see a FireBee version.
But I think the most reasonable approach would be to run the emulator in a GEM window, like the SDL ports.
Another CPU core written in C makes sense. But the suboptimal code could also be tested. If it is usable on the Falcon, then I believe it should be fast enough on the FireBee too. But it still depends on how it is written, i.e. how many commands have to be emulated.

Postby shoggoth » Mon Dec 10, 2018 12:20 pm

vido wrote:I am another one who would also like to see a FireBee version.
But I think the most reasonable approach would be to run the emulator in a GEM window, like the SDL ports.
Another CPU core written in C makes sense. But the suboptimal code could also be tested. If it is usable on the Falcon, then I believe it should be fast enough on the FireBee too. But it still depends on how it is written, i.e. how many commands have to be emulated.


The initial goal for me was to make it run "good enough" on Falcons/TT, since there are better emulators which could be used on high end machines (I ported x128 many years ago; runs in a window or fullscreen, but likely fails on the Coldfire).

Since there seems to be some interest, I might add an additional CPU core which could run natively on the Coldfire. That probably means 128k support too (but only for high end machines).

Postby vido » Mon Dec 10, 2018 1:47 pm

shoggoth wrote:The initial goal for me was to make it run "good enough" on Falcons/TT, since there are better emulators which could be used on high end machines (I ported x128 many years ago; runs in a window or fullscreen, but likely fails on the Coldfire).

Since there seems to be some interest, I might add an additional CPU core which could run natively on the Coldfire. That probably means 128k support too (but only for high end machines).

I was searching for your old version of x128 to try it but didn't find it. :(
Sure there is interest! Having it run on the FireBee would be really great! :)

penguin
Obsessive compulsive Atari behavior
Posts: 145
Joined: Tue Dec 24, 2013 10:43 am

Postby penguin » Mon Dec 10, 2018 10:10 pm

vido wrote:
shoggoth wrote:The initial goal for me was to make it run "good enough" on Falcons/TT, since there are better emulators which could be used on high end machines (I ported x128 many years ago; runs in a window or fullscreen, but likely fails on the Coldfire).

Since there seems to be some interest, I might add an additional CPU core which could run natively on the Coldfire. That probably means 128k support too (but only for high end machines).

I was searching for your old version of x128 to try it but didn't find it. :(
Sure there is interest! Having it run on the FireBee would be really great! :)


I thought I had it, but I only recovered Frodo and v2600. Not sure where the other ported emulators (x128, SMS Plus) are...
AtariUpToDate - Atari ST/TT/Falcon software database and version tracker: http://www.atariuptodate.de
st-computer magazine - http://st-computer.atariuptodate.de/

Eero Tamminen
Atari God
Posts: 1864
Joined: Sun Jul 31, 2011 1:11 pm

Postby Eero Tamminen » Tue Dec 11, 2018 11:34 pm

shoggoth wrote:
Eero Tamminen wrote:Btw. Have you looked at the callgraphs yet?


Yeah! Finally. I've been a bit too busy with other stuff. I'm not surprised how it looks though :) A CPU emulator is bound to generate an extremely yucky callgraph, but then again there are some nice clues in there. I'll definitely start using these tools myself now, it's pretty awesome.


Just remember that all costs in callgraphs are based on symbols and what is executed between successive symbols. If you're missing symbols for functions or other subroutines, the profiler output can be misleading.

If you label loops in assembly, it's better to filter those out before giving the symbol address list to Hatari, unless you're specifically interested in the cost of the code following the loop label. Call sites visited in a tight loop can slow down profiling a lot, and huge symbol address visit counts make the percentages for other call sites seem less significant than they are.

(It's not possible for the profiler to detect from the instruction flow where functions start; that's why the Hatari profiler uses symbol addresses as call sites to track. That way the user also has full control over it.)
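
Filtering loop labels out of a symbol list can be scripted. The sketch below is only an illustration: it assumes an `nm`-style text list with one "address type name" entry per line, and it assumes loop labels can be recognised by name ("loop", "loop8", ".loop"). Both the pattern and the function name are hypothetical, so adapt them to your own labelling convention.

```python
import re

# Assumed naming convention for loop labels; adjust to your own sources.
LOOP_LABEL = re.compile(r"^\.?loop\d*$")

def filter_symbols(lines, pattern=LOOP_LABEL):
    """Drop 'address type name' entries whose name looks like a loop label."""
    kept = []
    for line in lines:
        parts = line.split()
        # Lines that are not three-field symbol entries pass through untouched.
        if len(parts) == 3 and pattern.match(parts[2]):
            continue
        kept.append(line)
    return kept
```

The filtered lines can then be written to a new file and that file given to Hatari in place of the full list.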

shoggoth wrote:
Eero Tamminen wrote:And if things actually took longer than before the optimizations, you have the profile & annotated disassembly ("profile save profile.txt") where you can check in detail what went wrong... Often I just check the instruction count stats in the disassembly to see the typical flow within the code.


I really need to learn this. I've written a PC/~286 emulator too, and optimisations are badly needed (obviously).


IMHO the overall assembly level code flow, e.g. how much each of the branches is used, or which parts of the code get run, is *much* easier to see from the profiler assembly output (instruction count annotations) than from a CPU instruction trace. For starters, it's in the same (memory/address) order as one's own code, not in execution order. :-)

(Memory addresses are disassembled when the profiler data is saved -> a CPU trace can be useful if self-modifying code is used and those code modifications happen while the code is being profiled.)

Postby Eero Tamminen » Tue Dec 11, 2018 11:43 pm

A few other notes about profiling:

* If code does tricks where it modifies the subroutine return address directly on the stack, to get RTS/RTE to return somewhere other than to the caller, that confuses call hierarchy tracking. E.g. EmuTOS uses this (and the Hatari profiler special-cases that), but I don't know how often it's used elsewhere.

* If you're working with a large code base where you want to find what is causing e.g. a call to disk read, just set a breakpoint there and enable profiling. When the breakpoint is hit, the profiler gives you a backtrace of the symbol call-chain leading to the breakpoint (added initially for tracking down causes of BadMood game play freezes :-))

