SuperVidel performance boost for byte/word/long access


mikro
Atari God
Posts: 1288
Joined: Sat Sep 10, 2005 11:11 am
Location: Brisbane, Queensland, Australia

Postby mikro » Thu Jan 30, 2014 7:35 pm

Hi guys,

just FYI, with some hacking of the PMMU I managed to get much better performance from the SuperVidel. Read this first: http://dhs.nu/bbs-ct60/index.php?request=13274

My original numbers (running at 66 MHz):

Fastram -> ST-ram (c2p): 37 FPS
Fastram -> SV-ram (c2p): 43 FPS
Fastram -> SV-ram (move16): 55 FPS
SV-ram direct render: 41 FPS

This clearly indicates that the move16 approach is the fastest. Not anymore! After setting SV-RAM area as "cache-inhibited, imprecise", I'm getting:

Fastram -> ST-ram (c2p): 37 FPS (no change to ST-RAM)
Fastram -> SV-ram (c2p): 50 FPS (much faster as it's long access)
Fastram -> SV-ram (move16): 55 FPS (no change since move16 ignores cache)
SV-ram direct render: 62 FPS (woooohooooooo, byte copy unchained, thanks to the FSB buffer)

So what? So trash your copy code: the only thing needed is to write directly to SV-RAM, which is how it was supposed to be.
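To put the two sets of numbers side by side, here is a small sketch (Python, used purely for illustration) that converts the FPS figures quoted above into per-frame times. The helper names `frame_time_ms` and `saved_ms` are made up for this example; the FPS values are the ones from the post.

```python
# FPS figures from the post: (before, after) marking SV-RAM as
# "cache-inhibited, imprecise". Helper names are hypothetical.
RESULTS = {
    "Fastram -> ST-ram (c2p)": (37, 37),
    "Fastram -> SV-ram (c2p)": (43, 50),
    "Fastram -> SV-ram (move16)": (55, 55),
    "SV-ram direct render": (41, 62),
}


def frame_time_ms(fps):
    """Time spent on one frame, in milliseconds."""
    return 1000.0 / fps


def saved_ms(before_fps, after_fps):
    """Milliseconds saved per frame when going from before_fps to after_fps."""
    return frame_time_ms(before_fps) - frame_time_ms(after_fps)


if __name__ == "__main__":
    for name, (before, after) in RESULTS.items():
        print(f"{name}: {frame_time_ms(before):.2f} ms -> "
              f"{frame_time_ms(after):.2f} ms "
              f"(saves {saved_ms(before, after):.2f} ms per frame)")
```

For the direct render case this works out to roughly 24.4 ms per frame before and 16.1 ms after.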

This has also been verified with http://sparemint.org/cgi-bin/cvsweb/fre ... s/memspeed; the performance boost is the same.

P.S. I've sent the "patch" to Nature, don't worry. Ask Peter when he is going to release it ;)

shoggoth
Nature
Posts: 853
Joined: Tue Aug 01, 2006 9:21 am
Location: Halmstad, Sweden

Postby shoggoth » Thu Jan 30, 2014 8:22 pm

Thx darling
Ain't no space like PeP-space.

Postby shoggoth » Fri Jan 31, 2014 7:57 am

Will this require additional cache management in the VDI? (caches vs hw acceleration)

Postby mikro » Fri Jan 31, 2014 10:48 am

It doesn't require anything else. It just "limits" the amount of usable memory to 1 GB; beyond that it may crash on a PMMU error because there are no PMMU entries for addresses between 0x40000000 and 0x7FFFFFFF (these are no longer translated as copy back; from now on they are illegal). So the correct solution would of course be to map the SV-RAM area (0xA0000000 - 0xA7FFFFFF) in the PMMU tree, but it's too much work for too little (zero) effect. It's software, can be changed later ;)
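The post does not show the patch itself. One plausible reading is that a transparent translation register (TTR) which used to cover 0x40000000 - 0x7FFFFFFF as copy back was repointed at the SV-RAM window; that reading is an assumption, not something confirmed in this thread. Under that assumption, the sketch below builds such a register value and checks which addresses it would match. The field positions (base in bits 31-24, mask in bits 23-16, enable in bit 15, S-field in bits 14-13, cache mode in bits 6-5) are as I recall them from the 68060 User's Manual and should be verified there; all function and constant names are hypothetical.

```python
# Sketch only: models a 68060 transparent translation register (TTR) value.
# Bit positions are assumptions to be checked against the 68060 manual.

CM_WRITETHROUGH = 0b00       # cachable, writethrough
CM_COPYBACK = 0b01           # cachable, copyback
CM_INHIBIT_PRECISE = 0b10    # cache-inhibited, precise
CM_INHIBIT_IMPRECISE = 0b11  # cache-inhibited, imprecise


def ttr_value(base, mask, cache_mode, enable=True, ignore_fc2=True):
    """Pack the fields into a 32-bit TTR value."""
    value = (base & 0xFF) << 24          # logical address base (A31-A24)
    value |= (mask & 0xFF) << 16         # 1 bits in the mask are "don't care"
    if enable:
        value |= 1 << 15                 # E bit
    if ignore_fc2:
        value |= 1 << 14                 # S-field 1x: match user and supervisor
    value |= (cache_mode & 0b11) << 5    # CM field
    return value


def ttr_matches(ttr, address):
    """True if the top address byte matches the base outside the mask bits."""
    base = (ttr >> 24) & 0xFF
    mask = (ttr >> 16) & 0xFF
    top = (address >> 24) & 0xFF
    return ((top ^ base) & ~mask & 0xFF) == 0


# SV-RAM window from the post: 0xA0000000 - 0xA7FFFFFF.
SV_RAM_TTR = ttr_value(0xA0, 0x07, CM_INHIBIT_IMPRECISE)

if __name__ == "__main__":
    print(hex(SV_RAM_TTR))
    for addr in (0xA0000000, 0xA7FFFFFF, 0xA8000000, 0x40000000):
        print(hex(addr), ttr_matches(SV_RAM_TTR, addr))
```

A base of 0xA0 with a mask of 0x07 covers exactly the 128 MB window quoted in the post, which is why a single register would be enough in this reading.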

dml
Fuji Shaped Bastard
Posts: 3472
Joined: Sat Jun 30, 2012 9:33 am

Postby dml » Fri Jan 31, 2014 12:26 pm

Very cool Miro!

Some time back I defined a cookie to provide something like this for display memory on the Afterburner040 - the cookie provides a bunch of function pointers which let you mark areas of memory noncacheable / nonserialized etc. The main reason was for marking display memory as 'write, buffer and forget' for the BadMood v3.07 viewer :-)

However, I'm not recommending anyone does the same here (!). It's just nice to learn that the 060+SV behaves 'as it should' when the memory is configured properly, and that all the move16 timing results were not indicative of weird and special problems with SV ram...

BTW in this scenario, providing there is 'enough work' between each pixel write - the write itself can end up being 100% free. There's no way for move16 to beat that ;-)

[EDIT]

Rather pointless trivia now, but here's some code from the 'XMMU' cookie reader and display page marker from BM307:

Code:

*=======================================================*
*   68040 extensions: updated 12/06/97      *
*=======================================================*

page_size      =   8192

*-------------------------------------------------------*
initialise_pmmu:
*-------------------------------------------------------*
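   move.l      #-1,bss_handle          ; -1 = no buffer allocated yet
   move.l      #-1,display_handle
   ifd      use_xmmu
   move.l      #'XMMU',d0              ; look for the XMMU cookie
   bsr      cookie_search
   tst.l      d0
   bmi.s      .npmu                   ; negative result: skip the PMMU setup
   move.l      4(a0),a0                ; cookie value, presumably the function pointer table
   move.l      pmmu_read_pds(a0),pmmu_read_rout    ; keep local copies of the entry points
   move.l      pmmu_write_pds(a0),pmmu_write_rout
   move.l      pmmu_chg_2_cb(a0),pmmu_cbc_rout
   move.l      pmmu_chg_2_ns(a0),pmmu_noc_rout
   bsr      mark_bss
   bsr      mark_display
   endc
.npmu:   rts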

*-------------------------------------------------------*
mark_bss:
*-------------------------------------------------------*
*   Allocate space for BSS pages         *
*-------------------------------------------------------*
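   lea      bss_start,a0
   lea      bss_all_end,a1
   moveq      #13,d2                  ; log2(page_size), 8192 = 1<<13
   move.l      a0,d0
   move.l      a1,d1
   add.l      #page_size-1,d1         ; round the end address up to a page boundary
   lsr.l      d2,d0                   ; d0 = first page index
   lsr.l      d2,d1                   ; d1 = end page index (exclusive)
   sub.l      d0,d1                   ; d1 = number of pages covered
   lsl.l      #2,d1                   ; 4 bytes per page
   move.l      d1,d0                   ; d0 = buffer size in bytes
   movem.l      a0-a1,-(sp)
   moveq      #VRAM_preferred,d1      ; requested memory type
   bsr      allocate_chunk
   tst.l      d0
   ble      err_super_xmmu          ; allocation failed
   movem.l      (sp)+,a0-a1
   tst.l      d0
   ble.s      .nbss
   pushall
   move.l      d0,a0                   ; clear the new buffer
   move.l      d1,d0                   ; note: unlike mark_display, d1 is not saved across allocate_chunk here
   jsr      turbo_memclr
   popall
   move.l      d0,bss_handle
   move.l      d0,a2                   ; a2 = buffer handed to the XMMU routines
   jsr      ([pmmu_read_rout.l])    ; pmmu_read_pds entry (presumably reads the page descriptors)
   jsr      ([pmmu_cbc_rout.l])     ; pmmu_chg_2_cb entry (presumably switches the pages to copyback)
.nbss:   rts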

*-------------------------------------------------------*
mark_display:
*-------------------------------------------------------*
*   Allocate space for display pages      *
*-------------------------------------------------------*
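   move.l      display_start,a0
   move.l      display_size,d0
   lea      (a0,d0.l),a1            ; a1 = end of display memory
   moveq      #13,d2                  ; log2(page_size), 8192 = 1<<13
   move.l      a0,d0
   move.l      a1,d1
   add.l      #page_size-1,d1         ; round the end address up to a page boundary
   lsr.l      d2,d0                   ; d0 = first page index
   lsr.l      d2,d1                   ; d1 = end page index (exclusive)
   sub.l      d0,d1                   ; d1 = number of pages covered
   lsl.l      #2,d1                   ; 4 bytes per page
   move.l      d1,d0                   ; d0 = buffer size in bytes
   movem.l      a0-a1/d1,-(sp)          ; d1 (size) is preserved for the clear below
   moveq      #VRAM_preferred,d1      ; requested memory type
   bsr      allocate_chunk
   tst.l      d0
   ble      err_super_xmmu          ; allocation failed
   movem.l      (sp)+,a0-a1/d1
   tst.l      d0
   ble.s      .ndsp
   pushall
   move.l      d0,a0                   ; clear the new buffer
   move.l      d1,d0                   ; size in bytes
   jsr      turbo_memclr
   popall
   move.l      d0,display_handle
   move.l      d0,a2                   ; a2 = buffer handed to the XMMU routines
   jsr      ([pmmu_read_rout.l])    ; pmmu_read_pds entry (presumably reads the page descriptors)
   jsr      ([pmmu_noc_rout.l])     ; pmmu_chg_2_ns entry (presumably marks the display pages non-serialized)
.ndsp:   rts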
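The size calculation that both `mark_bss` and `mark_display` perform before calling `allocate_chunk` can be restated in a few lines. This is a sketch of the arithmetic only (Python, for illustration); the function name is invented, and reading the 4 bytes per page as "one longword descriptor per page" is my interpretation of the `lsl.l #2` step, not something the post states.

```python
PAGE_SHIFT = 13              # moveq #13,d2
PAGE_SIZE = 1 << PAGE_SHIFT  # page_size = 8192
ENTRY_BYTES = 4              # lsl.l #2,d1 (assumed: one longword per page)


def descriptor_buffer_bytes(start, end):
    """Bytes requested for the pages touched by the range start..end (end exclusive)."""
    first_page = start >> PAGE_SHIFT
    end_page = (end + PAGE_SIZE - 1) >> PAGE_SHIFT  # round the end up to a page
    return (end_page - first_page) * ENTRY_BYTES


if __name__ == "__main__":
    # Example: a 320x240 8-bit screen starting at a page-aligned address.
    print(descriptor_buffer_bytes(0, 320 * 240))
```

Note that a range which is not page-aligned at either end still counts every page it touches, so the buffer never comes out too small.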

Postby shoggoth » Fri Jan 31, 2014 12:49 pm

mikro wrote: It doesn't require anything else. It just "limits" the amount of usable memory to 1 GB; beyond that it may crash on a PMMU error because there are no PMMU entries for addresses between 0x40000000 and 0x7FFFFFFF (these are no longer translated as copy back; from now on they are illegal). So the correct solution would of course be to map the SV-RAM area (0xA0000000 - 0xA7FFFFFF) in the PMMU tree, but it's too much work for too little (zero) effect. It's software, can be changed later ;)


Awesome. I'll make this optional for now in case it clashes with some other hardware (CTPCI?).

instream
Nature
Posts: 165
Joined: Mon Aug 03, 2009 9:08 am
Location: Göteborg, Sweden

Postby instream » Fri Jan 31, 2014 3:04 pm

shoggoth wrote:
mikro wrote: It doesn't require anything else. It just "limits" the amount of usable memory to 1 GB; beyond that it may crash on a PMMU error because there are no PMMU entries for addresses between 0x40000000 and 0x7FFFFFFF (these are no longer translated as copy back; from now on they are illegal). So the correct solution would of course be to map the SV-RAM area (0xA0000000 - 0xA7FFFFFF) in the PMMU tree, but it's too much work for too little (zero) effect. It's software, can be changed later ;)


Awesome. I'll make this optional for now in case it clashes with some other hardware (CTPCI?).

I will test Mikro's patch as soon as possible to see whether actual line bursts of 4 longs are written, or whether 4 separate longword writes are still performed, as happened when we attempted this patch a year ago. But this is awesome news and we should add it to the driver as soon as possible! :)

calimero
Fuji Shaped Bastard
Posts: 2049
Joined: Thu Sep 15, 2005 10:01 am
Location: STara Pazova, Serbia

Postby calimero » Fri Jan 31, 2014 4:55 pm

so Quake now will be even faster? :D
using Atari since 1986. http://wet.atari.org http://milan.kovac.cc/atari/software/ ・ Atari Falcon030/CT63/SV ・ Atari STe ・ Atari Mega4/MegaFile30/SM124 ・ Amiga 1200/PPC ・ Amiga 500 ・ C64 ・ ZX Spectrum ・ RPi ・ MagiC! ・ MiNT 1.18 ・ OS X

Postby mikro » Fri Jan 31, 2014 7:46 pm

calimero wrote:so Quake now will be even faster? :D

Don't get your hopes too high, buddy. Surprisingly, the bottleneck in Quake isn't the c2p process but Quake itself. The absolute difference between SV-featured rendering and not rendering at all is 0.5 FPS (!). So going from C2P to this "new" method you get about 2-3 FPS at most. Not so cool, is it.

Postby calimero » Fri Jan 31, 2014 7:58 pm

well 2-3fps is not so bad. if you have 15fps, two more fps is almost 15% speed up ;)

Postby shoggoth » Fri Jan 31, 2014 8:04 pm

mikro wrote:
calimero wrote:so Quake now will be even faster? :D

Don't get your hopes too high, buddy. Surprisingly, the bottleneck in Quake isn't the c2p process but Quake itself. The absolute difference between SV-featured rendering and not rendering at all is 0.5 FPS (!). So going from C2P to this "new" method you get about 2-3 FPS at most. Not so cool, is it.


What about higher screen rez? :)

Postby mikro » Fri Jan 31, 2014 8:15 pm

dml wrote:
shoggoth wrote:higher screen rez? :)


This man speaks wisdom :)

Hmm, that's not a bad idea, I absolutely haven't thought of that. It will definitely slow down things a little but maybe not that much, esp. on faster (>66 MHz) Falcons. I'm going to give it a try, shouldn't be too complicated.

Postby dml » Fri Jan 31, 2014 8:18 pm

IIRC there's a magic constant in Quake's surface renderer which would let you configure the perspective divide to 16 pixels vs 8 pixels, something that might help with higher res...

(i don't doubt the 060 divide speed - but this also changes the ratio of outer to inner loop work, which may affect things)
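To illustrate the trade-off being described: a span renderer of this kind computes the exact perspective-correct texture coordinate only every N pixels and interpolates linearly in between, so doubling N roughly halves the number of divides per scanline. The sketch below is a generic illustration of that technique in Python, not Quake's actual code; the function name, parameters and sample values are all invented for the example.

```python
def texture_u_along_span(width, step, uz0, uz1, iz0, iz1):
    """Return (u per pixel, number of perspective divides) for one span.

    u/z and 1/z are linear in screen space, so they are interpolated
    linearly; the exact u = (u/z) / (1/z) is only evaluated every
    `step` pixels and at the span ends.
    """
    divides = 0

    def exact_u(x):
        nonlocal divides
        t = x / width
        divides += 1                      # one perspective divide
        return (uz0 + (uz1 - uz0) * t) / (iz0 + (iz1 - iz0) * t)

    us = []
    x = 0
    u_left = exact_u(0)
    while x < width:
        run = min(step, width - x)        # last run may be shorter
        u_right = exact_u(x + run)
        for i in range(run):
            # Plain linear interpolation inside the run. A real renderer
            # would use a fixed-point step here; this division is only
            # for the sketch and is not counted as a perspective divide.
            us.append(u_left + (u_right - u_left) * i / run)
        u_left = u_right
        x += run
    return us, divides


if __name__ == "__main__":
    for step in (8, 16):
        _, n = texture_u_along_span(320, step, 0.0, 16.0, 1.0, 0.25)
        print(f"step {step}: {n} divides for a 320-pixel span")
```

Longer runs mean fewer divides but more visible texture distortion between the exact points, and, as noted above, they also change how much work lands in the inner loop versus the outer one.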

Postby instream » Fri Jan 31, 2014 8:54 pm

Now we have tested Mikro's program with our oscilloscope, with and without the PMMU changes. Unfortunately byte/word/long writes are not grouped into move16 lines as we hoped but are still written as bytes/words/longs. The speed gain comes from the five idle cycles between writes having disappeared. This matches our earlier tests. So byte writes to SV gain the most, from 7.5 MB/s to 12.8 MB/s on my Falcon (95 MHz). Not bad. :) Words went from 14.2 MB/s to 22.5 MB/s, and longs from 25 MB/s to 36 MB/s. All these numbers are with the 060 cache on.
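The relative gains implied by these measurements can be worked out directly. A minimal sketch in Python, using only the MB/s figures quoted above (the names are made up for the example):

```python
# (before, after) write bandwidth in MB/s, as quoted in the post.
MEASUREMENTS = {
    "byte": (7.5, 12.8),
    "word": (14.2, 22.5),
    "long": (25.0, 36.0),
}


def gain_percent(before, after):
    """Relative improvement of `after` over `before`, in percent."""
    return (after - before) / before * 100.0


if __name__ == "__main__":
    for size, (before, after) in MEASUREMENTS.items():
        print(f"{size}: {before} -> {after} MB/s "
              f"(+{gain_percent(before, after):.1f}%)")
```

The smaller the access size, the larger the relative gain, which fits the explanation that a fixed number of idle cycles per write has been removed.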

Postby mikro » Fri Jan 31, 2014 9:18 pm

instream wrote: Now we have tested Mikro's program with our oscilloscope, with and without the PMMU changes. Unfortunately byte/word/long writes are not grouped into move16 lines as we hoped but are still written as bytes/words/longs. The speed gain comes from the five idle cycles between writes having disappeared.

But how come the move16 tests report the same bandwidth all the time, with or without the change to the PMMU? If I understood our discussion right, you claim that those move16 bursts do happen without my change.

EDIT: Ah, now I get it. You had hoped that bytes and words would be grouped into a long access at least. Anyway, I don't care why it's faster, it's faster and that's important :D

Postby shoggoth » Fri Jan 31, 2014 9:33 pm

Lots of VDI operations require byte or word accesses, so this will improve the performance for the VDI quite a bit.

Postby mikro » Fri Jan 31, 2014 10:13 pm

mikro wrote:It will definitely slow down things a little but maybe not that much

AHEM. Although it was amazing to see everything in hires, the FPS in 640x480 dropped to maybe 5 or 6; hard to say because my favorite demo2 froze when jumping into the water :) Therefore: not playable.

Eero Tamminen
Atari God
Posts: 1547
Joined: Sun Jul 31, 2011 1:11 pm

Postby Eero Tamminen » Fri Jan 31, 2014 11:36 pm

mikro wrote: Surprisingly, the bottleneck in Quake isn't the c2p process but Quake itself. The absolute difference between SV-featured rendering and not rendering at all is 0.5 FPS (!).


Not so surprising considering Doom's rendering vs. thinking cost discussed in BadMood thread. Maybe time to profile Quake with Hatari?

Note: Douglas' Quake1 port has some scripting for that. Problem is that Hatari doesn't yet support fast-RAM and Quake barely works in 14MB (crashes pretty soon after demo starts when it runs out of memory). Using TOS calls directly instead of linking MiNTlib stuff for that could help a bit. I can attach that Hatari scripting here if there's interest.

Postby shoggoth » Sat Feb 01, 2014 9:46 am

mikro wrote:
mikro wrote:It will definitely slow down things a little but maybe not that much

AHEM. Although it was amazing to see everything in hires, the FPS in 640x480 dropped to maybe 5 or 6; hard to say because my favorite demo2 froze when jumping into the water :) Therefore: not playable.


That's 4x the original resolution. Perhaps something in between would be better, like 400x300 or something? (On the SV, set 800x300 but enable double line and vertflag. Note however that since this is an SV-specific resolution, you'll have to use the SV physbase pointers instead of the VIDEL one in your triple buffering code.)
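A quick way to compare candidate resolutions is by pixel count relative to the original mode. The sketch below assumes the original mode is 320x240, which is what "640x480 is 4x" implies but is not stated outright; the function name is made up.

```python
BASE_MODE = (320, 240)  # assumed original resolution


def pixel_ratio(width, height, base=BASE_MODE):
    """How many times more pixels a mode has than the base mode."""
    return (width * height) / (base[0] * base[1])


if __name__ == "__main__":
    for mode in ((320, 240), (400, 300), (640, 480)):
        print(f"{mode[0]}x{mode[1]}: {pixel_ratio(*mode):.4g}x the pixels")
```

By this measure 400x300 renders about 1.56 times as many pixels as 320x240, well short of the 4x of 640x480.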

Postby instream » Sat Feb 01, 2014 4:04 pm

mikro wrote:
instream wrote: Now we have tested Mikro's program with our oscilloscope, with and without the PMMU changes. Unfortunately byte/word/long writes are not grouped into move16 lines as we hoped but are still written as bytes/words/longs. The speed gain comes from the five idle cycles between writes having disappeared.

But how come the move16 tests report the same bandwidth all the time, with or without the change to the PMMU? If I understood our discussion right, you claim that those move16 bursts do happen without my change.

EDIT: Ah, now I get it. You had hoped that bytes and words would be grouped into a long access at least. Anyway, I don't care why it's faster, it's faster and that's important :D

Yes, we hoped that the store buffer would act as some kind of cache line, and that when it is filled completely, whether by writing bytes, words or longs, a single move16 line would be written on the bus. With that algorithm the byte test would show the same figure as the move16 test instead of the 12.8 MB/s above. I think that was 42 MB/s on my Falcon.
But as we read the 060 manual yesterday, it became apparent that the move16 bursts are used only for copy-back writes and the move16 instruction. :(

But 70% increase on byte writes is still something to celebrate. :D Pep, get to work! :lol:

Postby shoggoth » Fri Feb 07, 2014 7:12 pm

I made some VDI benchmark tests with existing drivers + patched PMMU, and it's looking really good.

In 16bpp modes, I get 110-190% performance depending on which graphics primitive we're dealing with :) 190% is for graphical text, which means text editors etc. will get a really nice performance improvement! This stuff will be included in the next rev of the SV_XBIOS.PRG.

Postby instream » Fri Feb 07, 2014 10:40 pm

Yay! :cheers:

Postby shoggoth » Tue Feb 18, 2014 9:00 pm

As mentioned in some other thread, this feature has been implemented and will be included in the next driver revision. It needs some testing first though. Good news: ~30% VDI performance improvement, and some drawing primitives (notably graphical text rendering) are 90-160% faster (depending on colour depth).

Another thing that will improve performance a bit is the ability to switch off an unused display output. This saves a significant amount of bandwidth, which matters a lot in higher resolutions & colour depths.

Postby mikro » Sun Jul 16, 2017 1:03 pm

Just making sure: has this really been implemented? If so, what is the optional parameter to turn it on and off?


Social Media

     

Return to “SuperVidel”

Who is online

Users browsing this forum: No registered users and 1 guest