Possible to improve Falcon blitter sprites with NFSR?

Zamuel_a
Atari God
Posts: 1223
Joined: Wed Dec 19, 2007 8:36 pm
Location: Sweden


Postby Zamuel_a » Thu Feb 26, 2015 11:18 pm

I implemented the blitter sprite technique Anima came up with, which was discussed here some time ago, and it works. To speed it up further I would like to use the NFSR function, but I can't get it to work. I'm not sure whether I'm doing something wrong (most likely :) ) or whether Hatari isn't emulating this correctly. I haven't tried it on a real machine yet, so I can't tell for sure.

Here is the code for a normal, working 32x20 pixel sprite:

I'm even using A7 in the sprite loop so that the loop only touches registers, which I guess is best for the cache.

Code:

   move.w   #3,XCOUNT(a4)      ;2 + 1 words each bitplane (line)
   move.w   #16,SRCXINC(a4)      ;16 bytes between words (8bpl)
   move.w   #-32+2,SRCYINC(a4)   ;go to next bitplane (line)
   move.w   #16,DSTXINC(a4)      ;16 bytes between words (8bpl)
   move.w   #-32+2,DSTYINC(a4)   ;go to next bitplane (line)
   move.b   #2,HOP(a4)   
   move.b   #3,BLITTER+OP      ;SOURCE DIRECT for sprite data

;d2 = sprite skew value
;a0 = destination address
;a1 = source (sprite data) address
;a3 = pointer to the mask data (one longword per sprite line)

   lea   BLITTER+LINENUM,a4   ;control register (written to start the blit)
   lea   BLITTER+DSTADDR,a5   ;destination address
   lea   BLITTER+YCOUNT,a6   ;lines

   ;;;;or.b   #%01000000,d2      ;NFSR      

   move.b   #%11000000,d0      ;start blit (HOG mode)
   move.b   d2,BLITTER+SKEW      ;skew value

   move.l   a0,d3

   lea   BLITTER+SRCADDR,a0   ;source address
   lea   BLITTER+ENDMSK1,a2   ;endmask 1

   moveq   #8,d1         ;num lines = bitplanes
   move.l   #32,d7         ;offset to next line, source

   move.l   (a3)+,d5      ;get left and right mask

   move.w   sr,-(sp)      ;save status register
   move.w   #$2700,sr      ;turn off interrupts
   move.l   a7,old_a7      ;save stack pointer

   move.l   #320,a7         ;offset to next line dest

   move.w   #20-1,d4
sprite_loop
   move.w   d5,d6
   swap   d6
   clr.w   d6
   lsr.l   d2,d5
   lsr.l   d2,d6
   move.l   d5,(a2)         ;set left and middle mask
   move.w   d6,4(a2)      ;set right mask
   move.l   a1,(a0)         ;source address
   move.l   d3,(a5)         ;dest address
   move.w   d1,(a6)         ;lines
   move.b   d0,(a4)         ;start blit (HOG mode)
   move.l   (a3)+,d5      ;get mask value during blit
   add.l   a7,d3         ;next dest line
   add.l   d7,a1         ;next source line
   dbra   d4,sprite_loop

   move.l   old_a7(pc),a7      ;restore stack pointer
   move.w   (sp)+,sr      ;turn on interrupts


If I enable the NFSR bit, I must also change this line:

Code:

   move.w   #-32+2,SRCYINC(a4)   ;go to next bitplane (line)


This moves the address back to the start of the next plane, which is 32-2 bytes back on a 32 pixel wide sprite. If NFSR skips the last word, I would have guessed that I only need to remove 2 from this value, but the output is just a mess, so there must be something else.
I have used the NFSR bit on the STE without any problems, and the only thing I have to do there is jump over 2 bytes in the SRCYINC register.
ST / STFM / STE / Mega STE / Falcon / TT030 / Portfolio / 2600 / 7800 / Jaguar / 600xl / 130xe

Cyprian
Atari God
Posts: 1464
Joined: Fri Oct 04, 2002 11:23 am
Location: Warsaw, Poland

Postby Cyprian » Fri Feb 27, 2015 1:44 pm

In my sprite test routine, when I set NFSR I also have to increase "Destination Y Increment" by 6.
Jaugar / TT030 / Mega STe / 800 XL / 1040 STe / Falcon030 / 65 XE / 520 STm / SM124 / SC1435
SDrive / PAK68/3 / CosmosEx / SatanDisk / UltraSatan / USB Floppy Drive Emulator / Eiffel / SIO2PC / Crazy Dots / PAM Net
Hatari / Aranym / Steem / Saint
http://260ste.appspot.com/

Anima
Atari Super Hero
Posts: 661
Joined: Fri Mar 06, 2009 9:43 am

Postby Anima » Fri Feb 27, 2015 3:20 pm

Setting NFSR depends on the skew parameter: if skew is zero you have to clear NFSR, otherwise set it. In other words, your 32 pixel wide sprite in memory uses two words per line and plane. However, when the sprite is drawn on the screen it may cover three words per line and plane, depending on the X position (and therefore the skew parameter). While the blitter now has to write three words per plane, there are still only two words in your original data. To avoid reading the third ("final") word you need to set NFSR ("No Final Source Read"). Note that you also have to increase the X-Count by one and lower the Y-Increment by 16 (as an example for 8 planes).
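
As a rough illustration of the rule above, here is a small C model (not Falcon code) of how the skew register value, X count and destination Y increment could be derived from the X position for a 32 pixel wide sprite on an 8 plane screen. The names BlitParams and blit_params_32px are made up for this sketch. The numbers match the routine in this post, and the general formula for the Y increment is an assumption extrapolated from them.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper types, not taken from the thread. */
typedef struct {
    uint8_t skew_reg;   /* value for the skew register (NFSR is bit 6) */
    int16_t x_count;    /* words written per plane */
    int16_t dst_y_inc;  /* destination Y increment in bytes */
} BlitParams;

enum { PLANES = 8, DST_X_INC = PLANES * 2, NFSR_BIT = 0x40 };

static BlitParams blit_params_32px(int x)
{
    BlitParams p;
    int skew = x & 15;

    assert(x >= 0);

    p.x_count = 2;                /* aligned sprite: two words per plane */
    p.skew_reg = (uint8_t)skew;
    if (skew != 0) {
        p.skew_reg |= NFSR_BIT;   /* source still has only two words */
        p.x_count += 1;           /* but three destination words are touched */
    }

    /* Assumed formula: step back over the words written for this plane,
       then forward by one word to reach the next plane. */
    p.dst_y_inc = (int16_t)(-(p.x_count - 1) * DST_X_INC + 2);
    return p;
}
```

For an aligned position such as x = 16 this gives X count 2 and a Y increment of -16+2; for any other position it gives X count 3, the NFSR bit set and -32+2.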

Here's my latest version of the fast blit function:

Code:

blit_fast2:
    lea     0x8a28.w,a0
    move.l  work_screen,a1
    lea     sprite_mask,a2
    lea     0x8a2c.w,a3
    lea     0x8a32.w,a4
    lea     0x8a38.w,a5
    lea     0x8a3c.w,a6

    moveq   #2,d3 | X count.
    moveq   #8,d6 | Y count.
    moveq   #-16+2,d4 | Destination Y Increment.
    move.b  #0xc0,d5

    moveq   #0xf,d2
    and     d0,d2
    jeq     1f

    or.b    #0b01000000,d2 | FXSR, NFSR, Skew.
    addq    #1,d3 | X count.
    sub     #16,d4 | Destination Y Increment.
1:
    move.b  d2,0x8a3d.w | FXSR, NFSR, Skew.
    move    d3,0x8a36.w | X Count.
    move    d4,0x8a30.w | Destination Y Increment.
    move.l  #sprite_reordered,0x8a24.w | Source Address.

    and     #0xfff0,d0
    add     d0,a1
    ext.l   d1
    lsl.l   #8,d1
    add.l   d1,d1
    add.l   d1,a1

    moveq   #0xf,d0
    and     d2,d0
    move.l  (a2)+,d1

    clr     d3

    move.l  #512,d4

    moveq   #32-1,d7
1:
    cmp.l   d3,d1
    jeq     3f

    move.l  d1,d3

    move    d1,d2
    swap    d2
    clr     d2
    lsr.l   d0,d1
    lsr.l   d0,d2

    tst     d0
    jne     2f

    move    d1,d2
2:
    move.l  d1,(a0) | Endmask 1 + 2.
    move    d2,(a3) | Endmask 3.
3:
    move.l  a1,(a4) | Destination Address.
    move    d6,(a5) | Y Count.
    move.b  d5,(a6) | Busy, HOG, Smudge, Line Number.
    move.l  (a2)+,d1

    add.l   d4,a1

    dbf     d7,1b

    rts
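
The endmask handling in the loop above can be modelled in C roughly like this. It is only a sketch of the shift logic, assuming the mask longword holds the left word in its upper half and the right word in its lower half. EndMasks and split_endmasks are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint16_t endmask1;  /* first destination word */
    uint16_t endmask2;  /* middle destination word(s) */
    uint16_t endmask3;  /* last destination word */
} EndMasks;

/* mask: left mask word in bits 31..16, right mask word in bits 15..0 */
static EndMasks split_endmasks(uint32_t mask, unsigned skew)
{
    EndMasks m;
    uint32_t shifted, spill;

    assert(skew < 16);

    shifted = mask >> skew;                   /* lsr.l on the full mask */
    spill = ((mask & 0xFFFFu) << 16) >> skew; /* bits pushed out of the right word */

    m.endmask1 = (uint16_t)(shifted >> 16);
    m.endmask2 = (uint16_t)shifted;
    /* With skew 0 only two words are written, so the last word
       simply reuses the right-hand mask. */
    m.endmask3 = (skew == 0) ? m.endmask2 : (uint16_t)spill;
    return m;
}
```

With a solid mask of 0xFFFFFFFF and skew 4, for example, the three endmasks come out as 0x0FFF, 0xFFFF and 0xF000.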


Frank B
Atari Super Hero
Posts: 959
Joined: Wed Jan 04, 2006 1:28 am
Location: Boston

Postby Frank B » Fri Feb 27, 2015 4:10 pm

I used it with skew set. I will see if I can dig out the sources. It saved me one word fetched per line on the source data.
The source data for my sprite was two words wide for 32 pixels.

Postby Zamuel_a » Fri Feb 27, 2015 8:21 pm

Anima wrote:Setting NFSR depends on the skew parameter: if skew is zero you have to clear NFSR, otherwise set it. In other words, your 32 pixel wide sprite in memory uses two words per line and plane. However, when the sprite is drawn on the screen it may cover three words per line and plane, depending on the X position (and therefore the skew parameter). While the blitter now has to write three words per plane, there are still only two words in your original data. To avoid reading the third ("final") word you need to set NFSR ("No Final Source Read"). Note that you also have to increase the X-Count by one and lower the Y-Increment by 16 (as an example for 8 planes).


I haven't made a special case for skew = 0, so I always set X-Count to 3 and the other registers to the matching values. There seems to be no point in treating something that happens 1/16 of the time specially. This works without problems, but when I add the NFSR bit and try to change the number of bytes in the source Y increment, I can't get anything useful on the screen. It's just a mess, whatever I try. It seems like Hatari doesn't support this or something.

I can't see your source Y increment anywhere in the code? That is the one that should be changed when NFSR is set, at least when I do it on the STE.

Postby Anima » Fri Feb 27, 2015 10:18 pm

Zamuel_a wrote:I can't see your source Y increment anywhere in the code? That is the one that should be changed when NFSR is set, at least when I do it on the STE.

Please note that when you use NFSR you have to change "destination Y increment" and "X count" as well.

In my example the sprite data has been reordered. Each consecutive line of sprite data lies right behind the previous so that "source Y increment" has a fixed value for the whole drawing process and even the source address only needs to be set once per sprite.

Edit: "source Y increment" removed.
Last edited by Anima on Sat Feb 28, 2015 11:42 am, edited 1 time in total.

Postby Zamuel_a » Sat Feb 28, 2015 8:43 am

Anima wrote:Please note that when you use NFSR you have to change "source Y increment", "destination Y increment" and "X count".


But I have these values set to what you have in the code when I'm NOT using NFSR.

For a 32 pixel wide sprite (stored as normal interleaved bitplane data in 8 planes) with the NFSR bit NOT set, I have:
XCOUNT = 3
SRCXINC = 16
SRCYINC = -32+2
DSTXINC = 16
DSTYINC = -32+2

This works fine. But if I enable the NFSR bit, I would have guessed that I need to change SRCYINC, but whatever I change it to, it doesn't work. On the STE I just had to add 2 to this value since there is an extra word to jump over, but that doesn't work here.

I guess your code works since you have rearranged the source data, which seems very smart, so the source address doesn't need to be taken care of in the loop.
So you store the data with complete lines for each bitplane instead of interleaving it like normal? So for a 32 pixel wide sprite you have 32 bits in a row for plane 1, 32 bits for plane 2 and so on, so that SRCXINC = 2 and SRCYINC = 0? This is how I stored the data for STE sprites, and when I used NFSR I had to change SRCYINC from 0 to 2.
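
A minimal C sketch of such a reordering, for example as an offline conversion step. It assumes the usual interleaved layout as input (for each 16 pixel column, one word per plane) and writes, for each line, all words of plane 0, then all words of plane 1 and so on. reorder_sprite is a made-up name, and the actual sprite_reordered format in Anima's routine may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { SPRITE_PLANES = 8 };

/* src: interleaved, per line: words_per_line groups of SPRITE_PLANES words.
   dst: per line and plane: words_per_line consecutive words.
   Both buffers hold lines * words_per_line * SPRITE_PLANES words. */
static void reorder_sprite(const uint16_t *src, uint16_t *dst,
                           size_t lines, size_t words_per_line)
{
    size_t line, plane, w;

    assert(src != NULL && dst != NULL && src != dst);

    for (line = 0; line < lines; line++)
        for (plane = 0; plane < SPRITE_PLANES; plane++)
            for (w = 0; w < words_per_line; w++)
                *dst++ = src[(line * words_per_line + w) * SPRITE_PLANES + plane];
}
```

For a 32 pixel wide sprite words_per_line is 2, so each plane of a line ends up as 4 consecutive bytes.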

EDIT:
I tried to rearrange the data so that the lines for each bitplane follow each other, set SRCXINC to 2 and SRCYINC to 0, and it worked! So that saved some CPU time. I also tried enabling NFSR and changing SRCYINC to 2 instead, and that worked too! So now everything seems to work and is faster :D
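
One way to see why these values fit together: assuming the blitter adds SRCXINC after every source word it reads in a line except the last one, where it adds SRCYINC instead, the needed SRCYINC follows from how many source words are actually read. The C helper below (hypothetical name, pure arithmetic) reproduces the values mentioned in this thread. Under the same assumption the interleaved layout with NFSR would want -16+2, but that is an untested guess.

```c
#include <assert.h>

/* Hypothetical helper: value for SRCYINC so that after `reads` source words
   the source address ends up `next_line` bytes past the start of the current
   line. Assumes the last read of a line adds SRCYINC and all earlier reads
   add SRCXINC. */
static int src_y_inc(int src_x_inc, int reads, int next_line)
{
    assert(reads >= 1);
    return next_line - (reads - 1) * src_x_inc;
}
```

Reordered data (SRCXINC 2, next plane line 4 bytes on): 3 reads give 0, 2 reads (NFSR) give 2. Interleaved data (SRCXINC 16, next plane 2 bytes on): 3 reads give -32+2.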

Why do you take extra care of the case when the sprite has skew = 0? It seems like a lot of extra time spent on something that happens just 1/16 of the time.

Postby Anima » Sat Feb 28, 2015 12:28 pm

Zamuel_a wrote:This works fine. But if I enable the NFSR bit, I would have guessed that I need to change SRCYINC, but whatever I change it to, it doesn't work. On the STE I just had to add 2 to this value since there is an extra word to jump over, but that doesn't work here.

Well, I was wrong in saying that "source Y increment" needs to be changed when NFSR is used. The source bitmap always has the same width.

Zamuel_a wrote:Why do you take extra care of the case when the sprite has skew = 0? It seems like a lot of extra time spent on something that happens just 1/16 of the time.

There's no real reason to do it and you can simply ignore that. However, it's not much extra CPU time per sprite, and it can speed things up when the sprite width is less than 32 pixels.

Postby Zamuel_a » Sun Mar 08, 2015 6:04 pm

Have you found a way to use this technique for sprites wider than 32 pixels? Since there are only 3 endmasks and they are all used for a 32 pixel sprite, I guess it's not so easy to use this for anything bigger, except in situations where the sprite only needs a mask for the first and last 16 pixels, of course.

Postby Anima » Mon Mar 09, 2015 9:29 am

Zamuel_a wrote:Have you found a way to use this technique for sprites wider than 32 pixels? Since there are only 3 endmasks and they are all used for a 32 pixel sprite, I guess it's not so easy to use this for anything bigger, except in situations where the sprite only needs a mask for the first and last 16 pixels, of course.

Unfortunately this solution only works for sprites up to 32 pixels wide. However, it would work for wider sprites when the Endmask2 bit pattern can be repeated within the sprite "body", as with circle-, rectangle- and polygon-shaped sprites.

dml
Fuji Shaped Bastard
Posts: 3472
Joined: Sat Jun 30, 2012 9:33 am

Postby dml » Mon Mar 09, 2015 11:15 am

Zamuel_a wrote:Why do you take extra care of the case when the sprite has skew = 0? It seems like a lot of extra time spent on something that happens just 1/16 of the time.


It's only 1/16th of the time in the best case. In the worst case it can be 100% of the time; it depends on what you are using the routine for and its behaviour on screen.

It's better to design routines to be aware of these things, and 'optimize it out' as a last step when you're sure you don't need it for a specific situation.

Note: Quite often I'll write such code in a meta form, as an include or macro, and use settings to determine how the routine gets expanded (with or without certain features). So you can expand the same routine 5 different ways for 5 different specific cases, but keep only one source routine which deals with all the cases in one place. In this case you could expand a version where scroll=0 is ignored, and in another case retain it. And in another case assume scroll=0 always (e.g. for masked icons which don't move), and so on.

This sort of expansion doesn't *always* allow for the most optimized code but it does improve development speed and reliability. However the difference is usually very small if anything - and if it matters you can always go back and make a hand-expanded version by editing the meta version. This is usually easy and does no harm, because it isn't interfering with the other cases.
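
As a toy illustration of that idea, the C preprocessor can stand in for an assembler macro: one source body, expanded once with the skew = 0 check and once without. The macro and function names are invented for this sketch and only the X count decision from this thread is modelled, not a full blit routine.

```c
#include <assert.h>

/* One "meta" routine; HANDLE_ZERO_SKEW selects the variant at expansion time. */
#define DEFINE_X_COUNT_FN(name, HANDLE_ZERO_SKEW)                      \
    static int name(int x)                                             \
    {                                                                  \
        int skew = x & 15;                                             \
        assert(x >= 0);                                                \
        if ((HANDLE_ZERO_SKEW) && skew == 0)                           \
            return 2; /* aligned: two words per plane are enough */    \
        return 3;     /* otherwise always blit three words */          \
    }

/* Two expansions of the same source routine. */
DEFINE_X_COUNT_FN(x_count_with_check, 1)
DEFINE_X_COUNT_FN(x_count_always_three, 0)
```

Both variants agree for unaligned positions; they only differ when x is a multiple of 16, where the checked version returns 2.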

