Beats of Rage coding technic (the hardcoded sprites)

All 680x0 related coding posts in this section please.

Moderators: exxos, simonsunnyboy, Mug UK, Zorro 2, Moderator Team

User avatar
LaurentS
Captain Atari
Captain Atari
Posts: 254
Joined: Mon Jan 05, 2009 5:41 pm

Beats of Rage coding technic (the hardcoded sprites)

Postby LaurentS » Sun Jan 13, 2013 9:25 pm

Hi all,

As some people asked, I'll try to explain how I did the coding for beats of rage.

All the init sequence is from DHS's system routs (I've adapted their code for my own purpose).

I run the game in 320x240 True color (this works for both VGA and RGB screens).

I also use a virtual screen around the physical screen (128 pixels left, 128 pixels right, 20 pixels top 20 pixels bottom).
This takes more place in memory (for the 2 screens), but allows no clipping during the game (that's important for the hardcoded sprites).
I also activate the instruction cache of the 68030.


The main game code:
-------------------

- I don't clear the screen (I refresh all the background every time)
- All the background is filled with dedicated movem.l functions
- all the movem.l loops must feet in the instruction cache

Example :

; Draw a picture of 100 pixels
; bg_height contains the height of the picture
; bg_offset contains the size of the picture
draw_picture_100: movea.l screen_adr,a5
add.l bg_y_pos,a5

.loop: movem.l (a6)+,d0-d7/a0-a4
movem.l d0-d7/a0-a4,(a5)
movem.l (a6)+,d0-d7/a0-a4
movem.l d0-d7/a0-a4,52(a5)
movem.l (a6)+,d0-d7/a0-a4
movem.l d0-d7/a0-a4,52*2(a5)
movem.l (a6)+,d0-d7/a0-a2
movem.l d0-d7/a0-a2,52*3(a5)

lea.l _SCREEN_WIDTH(a5),a5
add.l bg_offset,a6
subq.w #1,bg_height
bpl.s .loop
rts

I've got the same function for draw_picture_120, draw_picture_128 and draw_picture_200
All the display is in long and not in word (it's faster for the 68030 to use 1 long instead of 2 words)



The engine :
------------

- in the engine, eveything is optimized to run only the needed code when necessary.
By example: - the recomputing of the score digits to display is done only when the score changes and not every frame
- I change the move animation of a player only when its animation is finished, ...

- I avoid as much tests as possible



The sprites technic :
---------------------

I've spent 5 months on this part. ;)

The main idea is to hardcode the sprites for 2 reasons :
- gain the maximum of speed
- gain the maximum of memory

An ennemy is composed of nearly 80 different sprites (sometimes more) that can be up to 140*110 in size.
All sprite has a left and a right picture according to it's direction (~160 images).
This fills quickly the 4 megs of the Falcon.

So, looking at the 68030 cycles table, the fastest instruction to display 2 pixels is : move.l d0,(a6)+
where d0-a5 contains the value of the 2 pixels and (a6) is the screen position

I use one move.l as it is faster than 2 move.w
Of course, if I need to display only one pixel, I use a move.w

For the transparence, I use a add.w immdiate,a6 which takes into account the transparency to the "end of the line" +the virtual screen size to the next line + the position of the 1st pixel to display after the transparency area of the next line.

The problem here is that my sprites are using more than 15 colors.

So, I've written a C program under linux that does the following :

- keep in memory the value of min_cycles
- count all the different colors (in long mode, ie #$f79cf79c is considered as one color and #$f79cbdd4 as another one).
- sort them from the most frequent to the least frequent

The most important part is here : I compute the mathematic variance of all the possible groups of the same color to get the groups of colors that are used closer to each other :
- loop var from 600 to 8000
loop for each color
loop from max occurence of this color to 2
compute the variance of this group
if the computed variance is lower than var, keep it in a table

When the table is filled with variances, I dispatch the packets of similar colors in the registers, I compute the number of cycles (move.w and move.l) used.
(no need to compute the add cycles as they're constant in all the iterations)

Then I compare this value with the value of min_cycles and if lower, I keep the current variance value used for this computing.
Then, I loop everything for a new variance.

At the end of the algo, I recompute the whole sprite with the lower variance (ie lower number of cycles, which is also the lower number of octets by the same time), and I transform it to assembly (I finish with a RTS).

I just have to compile the code for this sprite.

In the game, I just have to give the correct address (working screen + sprite position) to a6 and call BSR (addr_sprite) which is really easy to use.


An example to clarify a little my explanation :
----------

Let's say the sprite looks like this :

abc
abc de ff fg
abbb ee fghi
abbbb ee aeeebd

a, b, c ... are .w colors (ie 1 pixel in the falcon true color mode)
a can be any value (example $114d)
b can be any value (example $774f)
...

the first pixel to display is a, then b, then c, then there's only transparency to the next line.
On the second line, the first pixel to display is again a, then b, then c, then, there are three transparent pixels, then display d, then e, then again two transperent pixels, ...
At the end of the third line, I display pixels f,g,h,i, then there's some transparency, then the end of this line, then again some transparency at the beginning of the fourth line, then I display abbbb ...

The algo :
First, I parse the image to compute the adds (for transparency), the word and long "colors" and I create a table with the long colors

color 1 is : ab
color 2 is : de (ab at beginning of the line is already the second occurence of the color 1)
color 3 is : ff
color 4 is : fg
color 5 is : bb (ab at beginning of the line is already the third occurence of the color 1)
color 6 is : ee
...

The final code will be something like :
move.l color_1,(a6)+
move.w "c",(a6)+
add.w #offset_to_next_line - current_position + transparent area offset in the next line,a6
move.l color_1,(a6)+
move.w "c",(a6)+
addq.w #6,a6 ;(3 transparent pixels)
move.l color_2,(a6)+
addq.w #4,a6
move.l color_3,(a6)+
...
...
rts


At this stade, the sprite is hardcoded and the transparency is taken into account with the add instruction.
You can notice there's no condition code here (neither for clipping nor transparency).
Only MOVE for a pixel, ADD for transparency or jump to next line.
The code does exactly what he's supposed to do without unuseful instruction.


Of course, if you need left and right sprites, you have to hardcode them both.
(There's no way to display the left sprite from the hardcoded right sprite).

The code of the sprite currently looks like:

(real example)
move.l #$f58e72f2,(a6)+
move.l #$72f2526c,(a6)+
move.l #$294672f2,(a6)+
move.l #$526c41a6,(a6)+
addq.w #$0006,a6
move.l #$526841a6,d4
move.l #$41a60001,d0
add.w #$0454,a6
move.l #$41a641a6,(a6)+
...


The goal is now to fill the maximum of colors into the registers D0-D7 and A0-A5

I first took the 15 more frequently used colors and moved them into the registers.
This allow to cover 35-40% of the pixels.
But I thought I could do much better :)


Imagine that the color "aa" is used like this in a sprite:

aaaaaaaaaaaababaccbaddeeffgghhiiiiiiijjkkllmmnnooppqqrrssttuuvvwwxxyyzzaa

Why keeping the color "aa" in a register after the beginning (it only serves at the last position on this line)
After the first using of the color "aa" in a register, we could use the same register for the "ba" and then the "ii" color.

Using as much as possible the registers optimise a lot the speed of the sprite's displaying and the size it takes in memory.


So, that's why I compute the mathematic Variance of the color in the whole sprite.

In the previous example, "aa" will have a very big variance if I compute it on the whole line, but a very low variance if I compute it only on the first part of the line.
So, I loop on all the image, searching the lower variance for a color (there may be more than one if the color is used at the beginning and then later many times).

Then, I fill the registers by filling the holes :
move.l color_a,d0
move.l d0,(a6)+
move.l d0,(a6)+
move.l d0,(a6)+
move.l d0,(a6)+
move.l color_i,d0
move.l d0,(a6)+
move.l d0,(a6)+
move.l d0,(a6)+
..
move.l color_a,(a6)+

You can notice that I use the immediate move for the last "aa" color, as the last value of the d0 register is "ii"
I use the 15 registers to fill as much as possible the holes inbetween two groups of colors.

For the move.w pixels (odd number of pixels), I compare the word part of the 15 registers to verify if I can use a move.w reg,(a6)+ instead of the immediate value.

By limiting the move immediate to the minimum, I maximize the use of the registers.

This technic allows to save a lot of cycles (and space) for each sprite.


If everything is not clear, just ask.

Don't forget that it's not only the sprites that are optimized, but the whole code :
- the loops are in the instruction cache as much as possible
- I call some code only when something has changed that need to refresh something else.
- I avoid tests in the graphic loops (only movem or hardcoded sprites here)


Best regards
Have fun

Thadoss

PS : an example of the content of a sprite (taken from a sprite of the game)

[...]
move.l d3,(a6)+
add.w #$044c,a6
move.w d4,(a6)+
move.l #$00015268,d3
move.l d3,(a6)+
move.l d4,(a6)+
move.l d0,(a6)+
move.l d1,(a6)+
move.l #$f58e526c,(a6)+
move.l d2,(a6)+
move.l d2,(a6)+
move.l #$2946526c,(a6)+
move.l #$526cf58e,(a6)+
move.l d5,(a6)+
move.l #$736c736c,(a6)+
move.l a3,(a6)+
move.l d4,(a6)+
add.w #$0448,a6
move.w d4,(a6)+
move.l #$5268736c,d6
move.l d6,(a6)+
[...]
Last edited by LaurentS on Tue Jan 22, 2013 12:32 pm, edited 1 time in total.

User avatar
simonsunnyboy
Moderator
Moderator
Posts: 4774
Joined: Wed Oct 23, 2002 4:36 pm
Location: Friedrichshafen, Germany
Contact:

Re: Beats of Rage coding technic (the hardcoded sprites)

Postby simonsunnyboy » Sun Jan 13, 2013 10:12 pm

Amazing, this is already a wellthoughtout implementation of socalled "compiled sprites".
I was thinking myself to implement something similar but now I realize, any implementation of mine would be less efficient. My code probably would have looked more like move.w s Thanks for the tip that move.l are faster here. It means one practically deals with chunky 32bit pixels and actually always referring to 2 16bit TC pixels?

Atm I'm stuck at my next crappy ST(e) game anyway but I still keep Falcon TC in mind ;)

Thank you for sharing!
Simon Sunnyboy/Paradize - http://paradize.atari.org/ - STOT: http://www.npoi.de/stot/

Stay cool, stay Atari!

1x2600jr, 1x1040STFm, 1x1040STE 4MB+TOS2.06+SatanDisk, 1xF030 14MB+FPU+NetUS-Bee

Jabber: simonsunnyboy@atari-jabber.org

User avatar
LaurentS
Captain Atari
Captain Atari
Posts: 254
Joined: Mon Jan 05, 2009 5:41 pm

Re: Beats of Rage coding technic (the hardcoded sprites)

Postby LaurentS » Sun Jan 13, 2013 11:05 pm

simonsunnyboy wrote: Thanks for the tip that move.l are faster here. It means one practically deals with chunky 32bit pixels and actually always referring to 2 16bit TC pixels?
Thank you for sharing!


You're welcome.

According to the motorola doc :

{0, 1, 5,0,0,1, 8,0,1,1}, // MOVE.W Dn,(An)+
{0, 1, 9,0,0,1, 12,0,1,1}, // MOVE.L Dn,(An)+

Nb: the cycles are recomputed to take into account the 4 cycles access memory time to the STRam.
As these instruction don't fit in the cache, it takes 8 cycles in word and 12 cycles in long.

It seems logic, as you read the opcode for the move every other time (in long) compared to everytime (in word)

Memory read/write is something like :
(word): move_opcode word_write
(long): move_opcode word_write word_write

regards
Laurent / Thadoss

User avatar
simonsunnyboy
Moderator
Moderator
Posts: 4774
Joined: Wed Oct 23, 2002 4:36 pm
Location: Friedrichshafen, Germany
Contact:

Re: Beats of Rage coding technic (the hardcoded sprites)

Postby simonsunnyboy » Mon Jan 14, 2013 9:43 am

Another thought, is the limitation to , (a6)+ and adda more speedy than using addressing with displacement?

I had the idea to do my sprites similar to this and repeating for all colors (mainly for smaller sprites up to 32x32 pixels):

Code: Select all

move.w #$f000,d0  ; load pixel data
move.w d0,2(a0)
move.w d0,162(a0)


etc...
Simon Sunnyboy/Paradize - http://paradize.atari.org/ - STOT: http://www.npoi.de/stot/

Stay cool, stay Atari!

1x2600jr, 1x1040STFm, 1x1040STE 4MB+TOS2.06+SatanDisk, 1xF030 14MB+FPU+NetUS-Bee

Jabber: simonsunnyboy@atari-jabber.org

User avatar
Zorro 2
Administrator
Administrator
Posts: 2190
Joined: Tue May 21, 2002 12:44 pm
Location: Saint Cloud (France)
Contact:

Re: Beats of Rage coding technic (the hardcoded sprites)

Postby Zorro 2 » Mon Jan 14, 2013 9:45 am

Some tricks are explain here.

Thanks Laurent to put them on the board.

Beats of Rage is a cool game, many thanks to code game yet on Falcon :D
Member of NoExtra Team

User avatar
dml
Fuji Shaped Bastard
Fuji Shaped Bastard
Posts: 3468
Joined: Sat Jun 30, 2012 9:33 am

Re: Beats of Rage coding technic (the hardcoded sprites)

Postby dml » Mon Jan 14, 2013 10:02 am

Laurent that is a very nice system. So the aim is to maximise use of registers per group, to minimise the number of immediate values encoded, but to also trade against the frequency of skips (holes) in each group? So each group drawn has the most compact encoding possible and the cost of initiating each group is small. :)

My machines were disassembled for a few days already I but I'll be able to try Beats of Rage very soon!

User avatar
LaurentS
Captain Atari
Captain Atari
Posts: 254
Joined: Mon Jan 05, 2009 5:41 pm

Re: Beats of Rage coding technic (the hardcoded sprites)

Postby LaurentS » Mon Jan 14, 2013 10:40 am

> Laurent that is a very nice system. So the aim is to maximise use of registers per group, to minimise the number of immediate values encoded, but to also trade against the frequency of skips (holes) in each group? So each group drawn has the most compact encoding possible and the cost of initiating each group is small. :)

That's exactly the point.
As I compute for each iteration the number of cycles it will generate, the algo repeats (in brute strenght) all the possible values from var=600 to var=8000 (before 600, the groups are too small and after 8000, the var is too high). At the end, I know which var gives me the lower cycles.
I can gain more than 3000 cycles only by changing the var value by +1 or -1.

I'm pretty sure it can be optimized more, but I would have spent more months on this algo instead of in the coding of the game :)


SimonSunnyboy :
> move.w d0,2(a0)
> move.w d0,162(a0)

I think the approach to sprites encoding depends of the sprite. I mean, my algo won't work if all the colors are different and if I can't find repeating patterns.
I think addressing with displacement is slower than the move immediate (because of the extra read of the word for the displacement), but it should be verified in the Motorola doc.
Of course, if you repeat the same value many times with a register, it may be faster, to be verified too in the motorola cycles tables.

For that, I use the table I wrote for Hatari_falcon (in the cpu directory), which integrates the 4 cycles per read/write bus access correction.

The best would probably to code your sprite the same way you suggest, but with long paterns instead of word paterns (to be verified in the cycles tables too).

The main goal here is to avoid dbf, cmp, tst and memory reading/writing (like move.w (a0)+,(a6)+ ) which takes too much time.
It also means no X clipping.

It's possible to add y clipping by adding a cmp after each line of displayed pixels directly in the code of the sprite (this slows a little the display, but again, less than a loop).

Regards
Laurent / Thadoss
Last edited by LaurentS on Tue Jan 22, 2013 12:42 pm, edited 1 time in total.

User avatar
Anima
Atari Super Hero
Atari Super Hero
Posts: 627
Joined: Fri Mar 06, 2009 9:43 am
Contact:

Re: Beats of Rage coding technic (the hardcoded sprites)

Postby Anima » Mon Jan 14, 2013 10:54 am

Thanks for the explanation. I think this shows how much potential a Falcon has even when using the TC graphics mode in combination with the unfortunate 16 bit bus design. ;)

Cheers
Sascha

User avatar
Anima
Atari Super Hero
Atari Super Hero
Posts: 627
Joined: Fri Mar 06, 2009 9:43 am
Contact:

Re: Beats of Rage coding technic (the hardcoded sprites)

Postby Anima » Mon Jan 14, 2013 12:30 pm

An advantage having only 15 colours for the sprites is that the engine needs only registers to draw and you can select a different palette for each sprite as well. Galaga 88 heavily uses this feature. For example the asteroids in the scrolling stage will get brighter the more you hit them:

http://www.youtube.com/watch?v=kOH_uMRS ... age#t=373s

Cheers
Sascha

User avatar
LaurentS
Captain Atari
Captain Atari
Posts: 254
Joined: Mon Jan 05, 2009 5:41 pm

Re: Beats of Rage coding technic (the hardcoded sprites)

Postby LaurentS » Mon Jan 14, 2013 2:09 pm

> An advantage having only 15 colours for the sprites is that the engine needs only registers to draw and you can select a different palette for each sprite as well

That's true. I also use this for the very little sprites (like the money, food or 1_UP sprites).

The problem is that when you use the colors 2 by 2 (in long instead od word), the number of colors increase a lot, and it's harder to keep only 15 colors.
For example, if you have 3 word colors (a,b,c) in the sprite, it can generate up to 9 colors in long (aa, ab, ac, ba, bb, bc, ca, cb, cc).

My "use long pixels instead of word" is good when the sprites are very big (like in BOR).

Optimizing the filling of the registers reduce a lot the use of immediate addressing, and using the long format can let you save up to 1/3 of the cycles and memory (compare to the word format).
The cpu makes 2 memory access for reading the MOVE.W opcode and writing to (a6)+ while it does 1 opcode read (MOVE.L) and 2 write access to (a6)+ memory in long.

On a big sprite, it allows to put one or two more sprites on the screen (that's why I can display up to 5 ennemies + the player + the score, items, ...) in Beats of rage.

That's also why I can do a transparent parallax scrolling (with the wirenettings for example) in the background display.

Regards

Laurent

User avatar
simonsunnyboy
Moderator
Moderator
Posts: 4774
Joined: Wed Oct 23, 2002 4:36 pm
Location: Friedrichshafen, Germany
Contact:

Re: Beats of Rage coding technic (the hardcoded sprites)

Postby simonsunnyboy » Mon Jan 14, 2013 2:55 pm

The sideeffect ist: for smaller sprites (typical up to 32x32pixels or so), using move.w wouldn't be such a big penalty and might be used aswell?

(Sorry, i suck a cyclecounting :( )
Simon Sunnyboy/Paradize - http://paradize.atari.org/ - STOT: http://www.npoi.de/stot/

Stay cool, stay Atari!

1x2600jr, 1x1040STFm, 1x1040STE 4MB+TOS2.06+SatanDisk, 1xF030 14MB+FPU+NetUS-Bee

Jabber: simonsunnyboy@atari-jabber.org

User avatar
LaurentS
Captain Atari
Captain Atari
Posts: 254
Joined: Mon Jan 05, 2009 5:41 pm

Re: Beats of Rage coding technic (the hardcoded sprites)

Postby LaurentS » Mon Jan 14, 2013 3:08 pm

Exactly: using hardcoded .w for little sprites and .l for big sprites seems to be a good approach.

For a small sprite (16*16 or 32*32), the gain/loose is not important compared to a 120*70 sprite.

The main idea is always to keep the pixels in registers.

Laurent

User avatar
dml
Fuji Shaped Bastard
Fuji Shaped Bastard
Posts: 3468
Joined: Sat Jun 30, 2012 9:33 am

Re: Beats of Rage coding technic (the hardcoded sprites)

Postby dml » Mon Jan 14, 2013 3:15 pm

simonsunnyboy wrote:The sideeffect ist: for smaller sprites (typical up to 32x32pixels or so), using move.w wouldn't be such a big penalty and might be used aswell?

(Sorry, i suck a cyclecounting :( )


Without any cyclecounting, it's a reasonably good estimate to measure the total amount of data you have removed from the process - i.e. the amount of reading you hope to remove by compiling the sprites - because instructions themselves need to be fetched.

So 'move (a0)+,(a1)+' involves 3 memory 'events' - a source, a dest and an instruction fetch. Two of those are reading events and are viable for optimisation. If you change that to move xx,yy(a0) then you have removed the source, but you've just moved it into the instruction fetch, so the total remains the same. the cost changes a bit depending on the instructions themselves (back to cycle counting!) but without reducing the quantity of data being read you can't expect to see significant gains.

In an ideal world you would do both (eliminate the source without also increasing the instruction stream size per pixel) - while still supporting instruction cacheing as well (effectively removing some of the remaining instruction fetching, for cases of repeat use - so there is actually more writing than reading happening in total!).

That would be a huge win, but it's ridiculously hard to find workable scenarios for it. e.g. finding common patterns within the same large sprite, or drawing lots of very small identical sprites (game-dependent), or drawing many medium sized identical sprites in pieces, by drawing the repeated pieces in batches (but the bigger the sprite, the fewer you'll be drawing anyway, right?).

In general though, just moving bandwidth from a sprite source into the instruction stream isn't a win except at the margins (cycle counts for specific instructions). You need to get the resulting total bandwidth (source+dest+ifetch) lower than it would have been for the uncompiled one - which is exactly what Laurent and Anima have done in their projects, with some additional compression as well! i.e. longword writes (ifetch compression) and skips (write compression).

User avatar
simonsunnyboy
Moderator
Moderator
Posts: 4774
Joined: Wed Oct 23, 2002 4:36 pm
Location: Friedrichshafen, Germany
Contact:

Re: Beats of Rage coding technic (the hardcoded sprites)

Postby simonsunnyboy » Fri Nov 01, 2013 2:11 pm

Today I finished my first own version of a sprite compiler. It can convert a GODPAINT file to a compiled sprite in assembly source.
However atm my generator algorithm is pretty dumb, it does not combine mask skips nor scanline skips, it does not handle empty scanlines and pixel data is done with move.w for each pixel.

The first result is working..and I will probably publish my code generator in some way. It will probbly run on all platforms with gcc and maybe TOS native with AHCC.

I have included the sample files. The color $FF00FF or $F81F in Falcon truecolor is taken as the mask color....
You do not have the required permissions to view the files attached to this post.
Simon Sunnyboy/Paradize - http://paradize.atari.org/ - STOT: http://www.npoi.de/stot/

Stay cool, stay Atari!

1x2600jr, 1x1040STFm, 1x1040STE 4MB+TOS2.06+SatanDisk, 1xF030 14MB+FPU+NetUS-Bee

Jabber: simonsunnyboy@atari-jabber.org

User avatar
Orion_
Captain Atari
Captain Atari
Posts: 333
Joined: Sat Jan 10, 2004 12:20 pm
Location: France
Contact:

Re: Beats of Rage coding technic (the hardcoded sprites)

Postby Orion_ » Fri Dec 06, 2013 11:37 am

I made my own sprite code generator too, it generate a compiled file ready to use as 68k code (no need to assemble), and it takes .png file as input.
the problem is, I only did move.w and d0-d7 optimisation at the moment (but it use lea/addq)
LaurentS > will you release source code of your converter some day ?


Social Media

     

Return to “680x0”

Who is online

Users browsing this forum: No registered users and 1 guest