Falcon Doom

All about games on the Falcon, TT & clones

Eero Tamminen
Atari God
Posts: 1481
Joined: Sun Jul 31, 2011 1:11 pm

Re: Falcon Doom

Postby Eero Tamminen » Sun Jan 20, 2013 3:12 pm

Output looks like this:

Code:

> profile addresses
$01f77a :             pea       $1fe18(pc)                 0.00% (1, 12)
$01f77e :             move.w    #$26,-(sp)                 0.00% (1, 16)
$01f782 :             trap      #$e                        0.00% (1, 12)
[...]
$01fe18 :             move.w    $ffff8266.w,d0             0.00% (1, 8)
$01fe1c :             btst      #4,d0                      0.00% (1, 12)
$01fe20 :             beq       $1fe2c                     0.00% (1, 10)
[...]
$01fe2c :             moveq     #$f,d4                     0.00% (1, 12)
$01fe2e :             moveq     #$3f,d3                    0.00% (1, 4)
$01fe30 :             move.w    #$25,-(sp)                 0.00% (64, 760)
$01fe34 :             trap      #$e                        0.00% (64, 768)
$01fe36 :             addq.l    #2,sp                      0.00% (64, 1280)

Values in parentheses are counts and cycles.
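For anyone who wants to post-process a listing like this, here is a minimal sketch of a parser. It assumes every profiled line has exactly the shape shown above (address, disassembly, percentage, then count and cycles in parentheses). The function name `parse_profile` and the derived `cycles_per_hit` field are made up for illustration and are not part of Hatari.

```python
import re

# One profiled line: "$addr : disassembly   pct% (count, cycles)".
# The layout is assumed from the sample output in this post.
LINE_RE = re.compile(
    r"^\$(?P<addr>[0-9a-fA-F]+)\s*:\s*(?P<insn>.*?)\s+"
    r"(?P<pct>\d+\.\d+)%\s+\((?P<count>\d+),\s*(?P<cycles>\d+)\)\s*$"
)

def parse_profile(text):
    """Return one dict per profiled instruction line; other lines are skipped."""
    rows = []
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # e.g. "[...]" or the "> profile addresses" prompt
        count = int(m.group("count"))
        cycles = int(m.group("cycles"))
        rows.append({
            "addr": int(m.group("addr"), 16),
            "insn": " ".join(m.group("insn").split()),  # collapse column padding
            "count": count,
            "cycles": cycles,
            # Hypothetical derived value: average cycles per execution.
            "cycles_per_hit": cycles / count if count else 0.0,
        })
    return rows

sample = """\
> profile addresses
$01fe30 :             move.w    #$25,-(sp)                 0.00% (64, 760)
$01fe34 :             trap      #$e                        0.00% (64, 768)
[...]
$01fe36 :             addq.l    #2,sp                      0.00% (64, 1280)
"""

rows = parse_profile(sample)
for row in rows:
    print(f"${row['addr']:06x} {row['insn']:<20} {row['cycles_per_hit']:.2f} cycles/hit")
```

Sorting `rows` by `cycles` is then enough to list the most expensive addresses first.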

calimero
Atari God
Posts: 1941
Joined: Thu Sep 15, 2005 10:01 am
Location: Stara Pazova, Serbia

Re: Falcon Doom

Postby calimero » Sun Jan 20, 2013 3:52 pm

dml wrote:Small read/write interleaving optimizations around the CPU<->DSP handshaking for floor spans + some extra pixel unrolling + inserting a lighting cache between textures and pixel drawing is showing a 17% speed increase over BM 3.07 using the default 320x168 / truecolour display.

Wow! 17% is quite good! And pixel drawing is the most time-consuming part...

dml wrote:c2p/256c isn't looking promising so far. The best methods are just too slow and it seems the highly optimized versions are targeted at plain 68000 or 68060 (and are quite different). There is no magic c2p for 68030 and there may never be one fast enough for this. For now that route is a dead-end.

25% of CPU time for C2P... :( I was convinced it was less, but I was wrong!

So basically Amiga AGA demos had a 25% CPU penalty while doing 3D texture-mapped graphics. So the Falcon should be better for 3D/texturing, but I also see great demos on AGA (though it is hard to compare, since the low-end spec for Amiga 1200 3D demos is usually an 030/50MHz... 2.4x faster than the Falcon). Off-topic: what is the most impressive 3D AGA demo for a stock Amiga 1200 (with fast RAM)?
using Atari since 1986. http://wet.atari.org http://milan.kovac.cc/atari/software/ ・ Atari Falcon030/CT63/SV ・ Atari STe ・ Atari Mega4/MegaFile30/SM124 ・ Amiga 1200/PPC ・ Amiga 500 ・ C64 ・ ZX Spectrum ・ RPi ・ MagiC! ・ MiNT 1.18 ・ OS X

dml
Fuji Shaped Bastard
Posts: 3429
Joined: Sat Jun 30, 2012 9:33 am

Re: Falcon Doom

Postby dml » Sun Jan 20, 2013 5:09 pm

Eero Tamminen wrote:Output looks like this:


Brilliant! I'll be using this next week for sure.

dml
Fuji Shaped Bastard
Posts: 3429
Joined: Sat Jun 30, 2012 9:33 am

Re: Falcon Doom

Postby dml » Sun Jan 20, 2013 5:14 pm

calimero wrote:Wow! 17% is quite good! And pixel drawing is the most time-consuming part...


...yes although there are some other heavy costs associated with floor/ceiling - and not specifically pixel drawing - which I'm trying to do something with.

calimero wrote:25% of CPU time for C2P... :( I was convinced it was less, but I was wrong!
So basically Amiga AGA demos had a 25% CPU penalty while doing 3D texture-mapped graphics. So the Falcon should be better for 3D/texturing, but I also see great demos on AGA (though it is hard to compare, since the low-end spec for Amiga 1200 3D demos is usually an 030/50MHz... 2.4x faster than the Falcon). Off-topic: what is the most impressive 3D AGA demo for a stock Amiga 1200 (with fast RAM)?


It's not quite that straightforward - c2p benefits from fastram and Amiga blitter can do half of the work while the CPU does the rest... and so on. And it depends on the image size and the number of colours/planes being converted.

For 030 alone, with just slow 16bit STRam and converting all 8 bitplanes for most of the screen - not ideal. A faster Falcon-specific version could be brewed but I can't imagine one fast enough that it would beat TC rendering as it is.

EvilFranky
Atari Super Hero
Posts: 831
Joined: Thu Sep 11, 2003 10:49 pm
Location: UK

Re: Falcon Doom

Postby EvilFranky » Sun Jan 20, 2013 5:43 pm

Do you need any testers for your new version? To run on a bog standard Falcon?

dml
Fuji Shaped Bastard
Posts: 3429
Joined: Sat Jun 30, 2012 9:33 am

Re: Falcon Doom

Postby dml » Sun Jan 20, 2013 6:07 pm

EvilFranky wrote:Do you need any testers for your new version? To run on a bog standard Falcon?


It won't be possible to compare the current build directly against BM3.07 because profiling is always-on and the FPS display also shows the profiler info, and all of that costs something. But some of the profiler overhead is shown as a %age in the display, so the FPS can be worked back a bit to get reasonably close.
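As a rough illustration of "working back" the FPS, the sketch below assumes the displayed overhead is a fraction of total frame time and that removing it would shorten each frame by exactly that fraction. Both are assumptions rather than a description of how BM's display computes it, and the numbers are placeholders, not measurements.

```python
def fps_without_profiler(measured_fps, overhead_fraction):
    """Estimate FPS with the profiler cost removed.

    overhead_fraction is the profiler share of frame time (0.05 means 5%).
    Assumes the overhead is purely additive to the frame time.
    """
    if not 0.0 <= overhead_fraction < 1.0:
        raise ValueError("overhead_fraction must be in [0, 1)")
    return measured_fps / (1.0 - overhead_fraction)

# Placeholder inputs: 6.0 fps on screen, 5% shown as profiler overhead.
estimate = fps_without_profiler(6.0, 0.05)
print(f"estimated fps without profiler: {estimate:.2f}")
```

Only part of the overhead is shown on screen, so the result is a lower bound on the unprofiled speed rather than an exact figure.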

You can test the attached build to see if it runs at all, and let me know what the FPS counter says. A PAL/50hz RGB setup is preferred for comparison but I can compare against VGA as well so long as I know the Hz (helps work out videl bus load). Don't use any screen expanders like Videlity/Blowup in your TC mode as BM picks that up and you'll end up with a different res and confusing results.

Hit the '5' key to get profiling and fps - but don't hit any other keys or move the player (don't switch to fullscreen either, just leave it as-is). It will *definitely* crash if you try to change the pixel size with the '3' key. It will also look very blue and messed up - that's expected.

Note: it might not run at all with real DSP/CPU timings so if you get stuck on a black screen on startup which persists for more than 10 seconds I'll rewind by one small step and retry.

[edit]

Oh yes - and the FPS timer isn't exactly 100% trusted since I had to fiddle the TimerC event counter to a prime number for profiling purposes and the FPS counter uses that. It compensates but I didn't give testing much attention - with any luck it will be right. I'll verify it properly later.
Last edited by dml on Sun Jan 20, 2013 6:56 pm, edited 1 time in total.

EvilFranky
Atari Super Hero
Posts: 831
Joined: Thu Sep 11, 2003 10:49 pm
Location: UK

Re: Falcon Doom

Postby EvilFranky » Sun Jan 20, 2013 6:11 pm

OK Doug, I'll get the bird out and hook it up to my plasma for RGB 8-)


dml
Fuji Shaped Bastard
Posts: 3429
Joined: Sat Jun 30, 2012 9:33 am

Re: Falcon Doom

Postby dml » Sun Jan 20, 2013 8:34 pm

Eero Tamminen wrote:Output looks like this:
> profile addresses
[...]
Values in parentheses are counts and cycles.


I have one more, perhaps more difficult 'request', but I believe it would be very worthwhile ;)

that is... to count cache misses on instruction addresses (and ideally, on data addresses too but i-cache is more interesting most of the time). I realize this means the cache behaviour should be quite close to a real 68030 but it doesn't rely on timing accuracy so it seems like it might be practical without much margin for error.

Being able to view cache misses on the code would be fantastic for larger areas of code. We are normally blind to this information and it's difficult to test non-loop code offline except to measure the size - and that doesn't work well if the code has complex program flow (it's always doable but it gets impractically difficult on bigger programs).

So this would be a 'magical' feature for me I think.

(no pressure! ;-) )

f030
Atari User
Posts: 41
Joined: Wed Dec 07, 2011 1:46 pm

Re: Falcon Doom

Postby f030 » Sun Jan 20, 2013 8:52 pm

does not work for me, says: Could not find & open WAD file

dml
Fuji Shaped Bastard
Posts: 3429
Joined: Sat Jun 30, 2012 9:33 am

Re: Falcon Doom

Postby dml » Sun Jan 20, 2013 9:05 pm

f030 wrote:does not work for me, says: Could not find & open WAD file


my bad. the test build is probably looking for c:\n\doom1.wad

I'll need to prepare another one and repost

dml
Fuji Shaped Bastard
Posts: 3429
Joined: Sat Jun 30, 2012 9:33 am

Re: Falcon Doom

Postby dml » Sun Jan 20, 2013 9:12 pm

...here are 3 to try with different things 'backed off' just in case. they should read the WAD you point them at now...

BMOPT1bcd.zip

f030
Atari User
Posts: 41
Joined: Wed Dec 07, 2011 1:46 pm

Re: Falcon Doom

Postby f030 » Sun Jan 20, 2013 9:25 pm

dml wrote:my bad. the test build is probably looking for c:\n\doom1.wad

yes
rgb = 6.0156 fps

dml
Fuji Shaped Bastard
Posts: 3429
Joined: Sat Jun 30, 2012 9:33 am

Re: Falcon Doom

Postby dml » Sun Jan 20, 2013 9:31 pm

f030 wrote:yes
rgb = 6.0156 fps


Interesting. I get 4.7fps in Hatari in RGB mode.

The 2nd zip has 3 more -

- the first (b) just has the WAD path fix so it will also be 6.01fps
- the second (c) turns off a sketchy optimization that isn't very important but could have stopped it working (apparently it didn't)
- the third (d) sets up a nearly 100% d-cache hit scenario on the textures. if it's faster than (b) it means mipmaps are worth implementing for speed. otherwise they won't be much help except to reduce swimming pixels a bit in the distance

Eero Tamminen
Atari God
Posts: 1481
Joined: Sun Jul 31, 2011 1:11 pm

Re: Falcon Doom

Postby Eero Tamminen » Mon Jan 21, 2013 12:16 am

dml wrote:Being able to view cache misses on the code would be fantastic for larger areas of code.


On Linux both Valgrind and Oprofile provide this kind of information, so I think the Hatari debugger should provide it too, so as not to have too large a feature gap. :-)

I can add it to the debugger only if the CPU core supports it. However, the old UAE CPU core doesn't have any cache emulation, and I'm not sure whether WinUAE has something. I've asked about that on the Hatari mailing list and will look a bit into the code and get back within a few days.

dml
Fuji Shaped Bastard
Posts: 3429
Joined: Sat Jun 30, 2012 9:33 am

Re: Falcon Doom

Postby dml » Mon Jan 21, 2013 9:01 am

Eero Tamminen wrote:I can add it to the debugger only if the CPU core supports it. However, the old UAE CPU core doesn't have any cache emulation, and I'm not sure whether WinUAE has something. I've asked about that on the Hatari mailing list and will look a bit into the code and get back within a few days.


I think Laurent said the WinUAE core does have some kind of cache emulation but it may not be complete or accurate. I think it's a bit of an unknown.

However one of the nice things about cache analysis is that it can be done at exactly the same point as the address hit profiling/counting. So I think any core that supports address counting can be modified to estimate cache activity in a roughly accurate way - even if there is no support for cache control flags or timing differences or any of the other cache stuff since it would be a modified gathering exercise on the same addresses (e.g. the cache could be assumed to be 'always on' at a configuration level, ignoring CACR, and would still be quite useful!).
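To make the "modified gathering exercise" idea concrete, here is a minimal sketch of an always-on instruction cache model fed with the same addresses the profiler already counts. It uses the 68030's documented i-cache geometry (256 bytes, direct-mapped, 16 lines of four longwords, one valid bit per longword). Everything else is a simplifying assumption: CACR and burst fills are ignored, the function code is left out of the tag, and only the first word of each instruction is looked up. The class name `ICacheModel` is hypothetical and has nothing to do with the Hatari or WinUAE sources.

```python
LINE_COUNT = 16       # 68030 i-cache: 16 lines...
LONGS_PER_LINE = 4    # ...of four longwords each (256 bytes total)

class ICacheModel:
    """Always-on, direct-mapped i-cache estimate keyed on instruction address."""

    def __init__(self):
        self.tags = [None] * LINE_COUNT
        self.valid = [[False] * LONGS_PER_LINE for _ in range(LINE_COUNT)]
        self.hits = {}    # address -> hit count
        self.misses = {}  # address -> miss count

    def fetch(self, addr):
        """Record one instruction fetch; return True on a hit."""
        index = (addr >> 4) & 0xF      # address bits 7..4 select the line
        long_sel = (addr >> 2) & 0x3   # address bits 3..2 select the longword
        tag = addr >> 8                # remaining upper bits (FC ignored here)
        if self.tags[index] == tag and self.valid[index][long_sel]:
            self.hits[addr] = self.hits.get(addr, 0) + 1
            return True
        if self.tags[index] != tag:
            # A different tag replaces the line and invalidates all its longwords.
            self.tags[index] = tag
            self.valid[index] = [False] * LONGS_PER_LINE
        self.valid[index][long_sel] = True  # single-longword fill, no burst
        self.misses[addr] = self.misses.get(addr, 0) + 1
        return False

# Replay a small loop 64 times, like the one in the profile listing.
cache = ICacheModel()
for _ in range(64):
    for pc in (0x01FE30, 0x01FE34, 0x01FE36):
        cache.fetch(pc)

print("hits:", sum(cache.hits.values()), "misses:", sum(cache.misses.values()))
```

A loop that fits in one cache line settles to all hits after the first pass, while two routines exactly 256 bytes apart evict each other on every call. That second pattern is the kind of thing that stays invisible without per-address miss counts.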

I haven't looked at the UAE projects for a very long time and I seem to remember it being quite dense and hard to follow (not sure if it was UAE but I had looked at some of the Amiga 68k emulation and one of them was heavily macro-ized, another used code generation). So it's possible that even a straightforward sounding 'add on' could be a real headache. I guess you or Laurent will know best what can work and what will be too difficult or impractical.

If it does look manageable, I will definitely be able to make use of it.

(I'm going to try building Hatari under Cygwin next so I can use the Hatari profiler and the Suite56 tools on the same system...)

dml
Fuji Shaped Bastard
Posts: 3429
Joined: Sat Jun 30, 2012 9:33 am

Re: Falcon Doom

Postby dml » Mon Jan 21, 2013 10:18 am

re: Hatari and BadMood - I'm impressed with Hatari's DSP emulation. It's described as 'experimental' and has some caveats in the docs, but it seems amazingly solid for something that's really specific to the Falcon.

I printed out the DSP sourcefile for BadMood without estimating the size - and ended up with a 38 page 'book'. Hatari runs that code just fine without any faults.

BM's DSP engine is highly synchronous and is launched only once, having to stay up for the duration of the program without a single glitch, or it will lock up. The module has around 10 sub-modules which get used in semi-random order and the execution pattern is very sensitive to what is being viewed at that moment (i.e. it's not a static, repetitive bit of code which just goes round and round forever as most DSP modules are). The DSP and CPU are talking frequently but briefly and much of the performance gain comes from keeping both sides busy with occasional exchanges.

So I find it pretty amazing this works at all on an emulator. :-o

Eero Tamminen
Atari God
Posts: 1481
Joined: Sun Jul 31, 2011 1:11 pm

Re: Falcon Doom

Postby Eero Tamminen » Mon Jan 21, 2013 10:50 am

dml wrote:re: Hatari and BadMood - I'm impressed with Hatari's DSP emulation.


The original hard work was done by Patrice Mandin, for Aranym. However, his DSP emulation was threaded and no real Falcon program worked with it, just some trivial clean DSP code. Thomas ported the Aranym DSP emulation code to Hatari. Laurent has done a huge amount of work on top of that, but I think the main breakthrough was when he ditched the threading to run DSP code in lockstep with the CPU. :-) I've helped by providing debugger facilities for the DSP side.

dml wrote:It's described as 'experimental' and has some caveats in the docs, but it seems amazingly solid for something that's really specific to the Falcon.


Where is it still described as experimental? At least the latest Hatari manual doesn't state that...

The DSP is a "standard" one; only how it's interfaced to the Falcon is Atari-specific. But it's true that no other emulator has properly working emulation of it. :-)

dml
Fuji Shaped Bastard
Posts: 3429
Joined: Sat Jun 30, 2012 9:33 am

Re: Falcon Doom

Postby dml » Mon Jan 21, 2013 12:21 pm

...has anyone managed to get timings for build (b) versus builds (c) & (d) from this post, on a 'real' Falcon? Even VGA timings will be ok for relative comparisons.

viewtopic.php?f=26&t=6857&p=224238#p224209

Build (d) exercises the 68030 data cache using 1-pixel textures and shows no speed change from (b) under Hatari. This has me wondering if Hatari is a 68020 core with 030 timings - no data cache. This would perhaps explain the speed differences between BM on Hatari and on a real Falcon? I think there were more Amigas with 020 chips than 030 for a long time so it could be the case.

My standard Falcon is now working and booting from CFLASH but my PC flashcard unit died recently and the USB floppy drive hates anything formatted by Atari and vice versa - so I still can't move files easily. Will have a new one by Wednesday and will be able to do my own tests from then.

kristjanga
Captain Atari
Posts: 400
Joined: Sat Jul 25, 2009 3:35 pm

Re: Falcon Doom

Postby kristjanga » Mon Jan 21, 2013 1:30 pm

working on getting a pc with a floppy to do tests of this!
keep up the great job :cheers:

dml
Fuji Shaped Bastard
Posts: 3429
Joined: Sat Jun 30, 2012 9:33 am

Re: Falcon Doom

Postby dml » Mon Jan 21, 2013 1:42 pm

kristjanga wrote:working on getting a pc with a floppy to do tests of this!


I seem to have very bad luck with hardware and any sort of file or data transferring. Doesn't matter what it is. It just never works for me :) I could carry a hard disk from one end of the room to the other and it would be 'formatted clean' when my feet stopped...

dml
Fuji Shaped Bastard
Posts: 3429
Joined: Sat Jun 30, 2012 9:33 am

Re: Falcon Doom

Postby dml » Mon Jan 21, 2013 1:49 pm

...more results from last night's experiments...

The entire BSP-walk (that's the whole 'Doom' engine 3D scene traversal part - everything except final texture lookup and drawing) takes about 15% of the total time, including the time needed to generate all the span tables for the walls and floors.

Assuming for now that Hatari and Falcon timings are near enough equivalent by ratio...

At 6fps that works out at 166ms * 0.15 = 25ms. In other words, if you take out the pixel-centric part and let the Doom rendering engine run freely, it wants to run at about 40FPS on a stock Falcon!
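The arithmetic can be checked with a few lines. This is only a restatement of the estimate above, taking the 6 fps and 15% figures as given; the helper name `stage_fps` is made up for the example.

```python
def stage_fps(total_fps, share):
    """Return (frame_ms, stage_ms, fps if only that stage ran)."""
    frame_ms = 1000.0 / total_fps   # time for one whole frame
    stage_ms = frame_ms * share     # portion spent in the stage
    return frame_ms, stage_ms, 1000.0 / stage_ms

frame_ms, bsp_ms, bsp_fps = stage_fps(6.0, 0.15)
print(f"{frame_ms:.1f} ms/frame, {bsp_ms:.1f} ms in the BSP walk, "
      f"{bsp_fps:.0f} fps if nothing else ran")
```

At 6 fps a frame takes about 166.7 ms, 15% of that is about 25 ms, and 1000 / 25 gives the 40 fps figure.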

So while there may be bags of optimizations which can be done in there (that's the majority of the code!), it's hardly worth the trouble IMO. It's already more than fast enough as it is.

Textured drawing is an obvious bottleneck which barely needs a mention, but there is a significant gap between the time taken for drawing & texturing, and the time taken by the drawing pass in total. I know that because tests with flat-shaded pixels yield about 10fps, and with no pixels drawn I'm seeing only 14fps, not 40! So something between scene traversal and actual drawing is costing a lot.
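Converting those frame rates to milliseconds makes the gap easier to see. The sketch below assumes the costs are simply additive, which CPU/DSP overlap may well violate, so treat the split as a rough budget only. The stage labels are invented for the example; the fps figures are the ones quoted in this post.

```python
def frame_ms(fps):
    """Frame time in milliseconds for a given frame rate."""
    return 1000.0 / fps

textured_ms = frame_ms(6.0)    # full textured rendering
flat_ms = frame_ms(10.0)       # flat-shaded pixels
no_pixels_ms = frame_ms(14.0)  # no pixels drawn at all
bsp_ms = frame_ms(40.0)        # BSP walk alone, from the 15% estimate

budget = {
    "texture lookup": textured_ms - flat_ms,
    "pixel writes": flat_ms - no_pixels_ms,
    "unexplained (between traversal and drawing)": no_pixels_ms - bsp_ms,
    "BSP walk": bsp_ms,
}

for label, ms in budget.items():
    print(f"{label:<45} {ms:6.1f} ms  ({100.0 * ms / textured_ms:4.1f}%)")
```

Under these assumptions roughly 46 ms per frame, more than a quarter of the total, is spent on something other than scene traversal or drawing pixels.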

Turning off drawing isn't a trustworthy test because 100% overlap between CPU and DSP can end up being only a small overlap and the CPU ends up waiting constantly for an otherwise 'free' bit of DSP work. But I can check for that, and the DSP is so fast it's unlikely to become the bottleneck in that bit of code.


That leaves the CPU<->DSP exchanges which feed the drawing. This remains in place for all the tests I did - it's currently the only way to flush the scene out of the DSP. While I haven't confirmed it yet by timing or omitting it specifically, it may be the second biggest cost in the system.

If that turns out to be the case and the number of DSP exchanges can be reduced/packed together for each span drawn it could make quite a big difference to the speed of the whole thing. More than any other optimization could.

dma
Atari Super Hero
Posts: 809
Joined: Wed Nov 20, 2002 11:22 pm
Location: France

Re: Falcon Doom

Postby dma » Mon Jan 21, 2013 2:15 pm

Really cool to read about your detailed progress, favorite thread here at the moment. ;)

Eero Tamminen
Atari God
Posts: 1481
Joined: Sun Jul 31, 2011 1:11 pm

Re: Falcon Doom

Postby Eero Tamminen » Mon Jan 21, 2013 2:35 pm

dml wrote:My standard Falcon is now working and booting from CFLASH but my PC flashcard unit died recently and the USB floppy drive hates anything formatted by Atari and vice versa - so I still can't move files easily. Will have a new one by Wednesday and will be able to do my own tests from then.


You can format the floppies on PC, Atari should understand them fine. Recent EmuTOS should even accept VFAT format without corrupting it, but to be on the safe side, it would be better to use plain FAT formatting. And you can mount the floppy directory as harddisk partition inside Hatari. You can do this either by symlinking the floppy mount point to something you already give to Hatari as GEMDOS HD directory, or you can mount it as separate partition:

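Code:

  mkdir hd                 # directory that holds one entry per partition
  cd hd/
  ln -s ~/dsp-stuff C      # shows up as partition C
  ln -s /mnt/floppy D      # floppy mount point shows up as partition D
  hatari .                 # give the current directory to Hatari as GEMDOS HD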


Hatari will then mount "dsp-stuff" directory as partition C, and "floppy" directory as partition D.


dml wrote:That leaves the CPU<->DSP exchanges which feed the drawing. This remains in place for all the tests I did - it's currently the only way to flush the scene out of the DSP. While I haven't confirmed it yet by timing or omitting it specifically, it may be the second biggest cost in the system.


Do you need something extra on the debugger side to investigate this?

PS. I found in WinUAE the places where it supposedly checks for cache hits/misses. After I've tested hooking that info, and if it seems to work (doesn't report all instructions either to hit or miss :)), I can attach a minimal/ugly patch to get the profiling info.

dml
Fuji Shaped Bastard
Posts: 3429
Joined: Sat Jun 30, 2012 9:33 am

Re: Falcon Doom

Postby dml » Mon Jan 21, 2013 2:56 pm

Eero Tamminen wrote:You can format the floppies on PC, Atari should understand them fine.


This seems to be a problem with the drives, as opposed to the format. Even using the 'Atari friendly' formatting procedure there are read errors on one side or the other, most of the time.

Eero Tamminen wrote:Do you need something extra on the debugger side to investigate this?


I'm not sure - I'll think about that one. It may just be something I have to study in different ways and by contriving more tests. It shouldn't be too difficult I think.

Eero Tamminen wrote:PS. I found in WinUAE the places where it supposedly checks for cache hits/misses. After I've tested hooking that info, and if it seems to work (doesn't report all instructions either to hit or miss :)), I can attach a minimal/ugly patch to get the profiling info.


That sounds promising! Does WinUAE support a data cache as well as instruction cache? Or is it mainly an '020 emulation core?

Eero Tamminen
Atari God
Posts: 1481
Joined: Sun Jul 31, 2011 1:11 pm

Re: Falcon Doom

Postby Eero Tamminen » Mon Jan 21, 2013 3:14 pm

dml wrote:
Eero Tamminen wrote:You can format the floppies on PC, Atari should understand them fine.


This seems to be a problem with the drives, as opposed to the format. Even using the 'Atari friendly' formatting procedure there are read errors on one side or the other, most of the time.


Ah, you meant the drive head alignment or something similar. That's indeed annoying.

Using zmodem over serial could work and might even be faster than using floppies.


dml wrote:That sounds promising! Does WinUAE support a data cache as well as instruction cache? Or is it mainly an '020 emulation core?


If you look at the end of the src/cpu/newcpu.c file, there are functions for both instruction and data cache, for 020, 030 and 040. And complaints about the trickiness of implementing the data cache for 030...

Note that these are enabled only when you've enabled cycle exact mode (which is the default in the latest Hatari WinUAE configuration).

Looking at the sources, I'm not completely sure this happens when you have the MMU enabled, but I would assume the code *eventually* falls through to the prefetch stuff that does the icache check. :-)

