Optimize idea/help needed

Post by FedePede04 » Mon Sep 17, 2012 7:44 pm

Hi,

I sat and played with my drawing program and made this mockup, and I thought I would try to bring a little life to it.

[Attachment: Tracker mockup .PNG]


The problem is that it takes a really long time to print the score.
As you can see, it takes nearly all the raster time, and I have Steem running at 24 MHz, so it takes almost 3 raster screens to print the graphics.

The way I print the score is that I build each score line as a string and then print the string; I have written my own print-string routine.

I have narrowed the problem down to the gfx output loop.

Code:


String_loop_wolc:
   move.b   (a1)+,d2            ; next char of the string, 0 = end
   beq.s    string_end_wolc

   and.w    #$00ff,d2
   sub.w    #32,d2              ; clip art starts at ASCII 32 (space)

   lsl.w    #4,d2               ; 16 bytes per char: find X coord in clip art
   lea      (a3,d2.w),a5        ; a5 = char data in clip art
   move.w   (a2,d1.w),d4        ; screen offset for this column
   lea      (a0,d4.w),a4        ; a4 = destination on screen

; print char to screen
; HERE IS THE TIME KILLER.
   move.b   (a5)+,(a4)
   move.b   (a5)+,2(a4)
   move.b   (a5)+,160(a4)
   move.b   (a5)+,162(a4)
   move.b   (a5)+,320(a4)
   move.b   (a5)+,322(a4)
   move.b   (a5)+,480(a4)
   move.b   (a5)+,482(a4)
   move.b   (a5)+,640(a4)
   move.b   (a5)+,642(a4)
   move.b   (a5)+,800(a4)
   move.b   (a5)+,802(a4)

   addq.w   #2,d1               ; next entry in the offset table
   bra.s    String_loop_wolc


I don't remember if this is the time I should expect, or am I way off the mark?
I hope some of you have some advice or a good idea.
Atari will rule the world, long after man has disappeared

sometime my English is a little weird, Google translate is my best friend :)

Post by dml » Mon Sep 17, 2012 8:21 pm

The first thing that springs to mind, is this - cache anything that takes time to compose (such as your output string graphic), and draw from the cache. Use a unique cache line per string, or per score source row (not per displayed line - since that will defeat scrolling).

This will let you blit / move your cached graphic to the planes you want using word or longword ops and fewer other operations that are needed by a string print.

If you have to modify one of the lines, dirty that cache line and refill it.

Any other optimisation is just going to be fiddly and probably not as fast - unless several of your strings change constantly, other than scrolling up or down. I imagine that the strings only change when they are introduced during scrolling, or when edited? So it seems reasonable to me.

(OTOH I may have misunderstood your program, in which case ignore me :)


note: it's easier to do this sort of thing if you have some way for the cache/draw stuff to detect when your string changes, without checking the whole string. some sort of dirty flag or better still, a revision counter works best. this way you can just keep working with strings and have the drawing stuff worry about what needs rebuilt or just redrawn.
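A minimal sketch of the revision-counter idea, written in Python only to show the bookkeeping (the real thing would be 68000 code). `ScoreRow`, `LineCache` and `render_row` are made-up names, and the "rendering" is a stand-in that just returns bytes instead of bitplane data.

```python
from dataclasses import dataclass, field


@dataclass
class ScoreRow:
    """One source row of the score. Every real edit bumps the revision."""
    text: str = ""
    revision: int = 0

    def set_text(self, text: str) -> None:
        if text != self.text:
            self.text = text
            self.revision += 1


def render_row(text: str) -> bytes:
    """Stand-in for the slow string-to-graphics routine (hypothetical)."""
    return text.encode("ascii")


@dataclass
class LineCache:
    """Caches the rendered graphic per source row, keyed by row index."""
    lines: dict = field(default_factory=dict)   # row index -> (revision, gfx)
    renders: int = 0                            # how often the slow path ran

    def get(self, index: int, row: ScoreRow) -> bytes:
        cached = self.lines.get(index)
        if cached is None or cached[0] != row.revision:
            # Cache line is missing or stale: rebuild it once.
            self.renders += 1
            cached = (row.revision, render_row(row.text))
            self.lines[index] = cached
        return cached[1]


rows = [ScoreRow() for _ in range(64)]
cache = LineCache()
rows[0].set_text("C-3 01 F06")

# Drawing the same row on many frames renders it only once.
for _ in range(50):
    gfx = cache.get(0, rows[0])

rows[0].set_text("D-3 01 F06")      # an edit bumps the revision
gfx = cache.get(0, rows[0])         # so this call re-renders
```

The point of comparing revisions rather than strings is that the draw side never has to look at the string contents to decide whether a cache line is stale.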

------

and if you don't like the cache idea, or if your font/text style benefits from lots of dead space between characters, you could go the self-modifying-code / JIT compile approach, where you generate & cache code that makes characters/strings, and run the code from the cache. you can also generate 'undraw' code which wipes that string.

but that's really really fiddly to get right, i don't recommend it unless it makes/breaks your program - I wouldn't bother myself.

Post by distantminds » Mon Sep 17, 2012 8:28 pm

Hi!

That's a lot of individual characters you are parsing. It's not too surprising it takes a lot of CPU time.

Some ideas for optimising...

Use single plane for chars, for alternating highlights use a mask in the second plane.

Only draw one line per update, and Scroll the rest up/down... Much much cheaper than printing all chars

Is your pattern step size uniform? If so, then don't tackle the step data as a string, but hardcode the line, block by block. You don't want to waste time checking for end of string each char.

Precalc your char data, all chars 00 to FF for example.. In 16pix word blocks.. Precalc all note representations.. Precalc just 0 to F in 8pix byte blocks..

Then, just code the line, block by block. Unroll every step. It's not a lot of code or precalc data..
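The precalc step could be done offline. Here is a rough Python sketch that builds the table for all bytes 00 to FF as 16-pixel word blocks and prints one entry as a `dc.w` line. It assumes 8-pixel-wide glyphs with 6 rows, and the glyphs in `HEX_FONT` are dummy data, so real font bytes would have to be substituted. All names are invented for the example.

```python
ROWS = 6  # assumed glyph height in scanlines

# Dummy 8-pixel-wide glyphs for the digits 0..F: one byte per row.
# Placeholder data only; real font bytes would go here.
HEX_FONT = [[(digit * 17 + row) & 0xFF for row in range(ROWS)]
            for digit in range(16)]


def build_byte_blocks(font):
    """For each value 00..FF, build ROWS words holding both hex digits.

    The high nibble's glyph lands in the upper byte (left 8 pixels),
    the low nibble's glyph in the lower byte (right 8 pixels).
    """
    table = []
    for value in range(256):
        hi, lo = value >> 4, value & 0x0F
        table.append([(font[hi][row] << 8) | font[lo][row]
                      for row in range(ROWS)])
    return table


def as_dc_w(block):
    """Format one block as a dc.w line for pasting into assembler source."""
    return "    dc.w    " + ",".join(f"${word:04X}" for word in block)


BLOCKS = build_byte_blocks(HEX_FONT)
print(as_dc_w(BLOCKS[0xA5]))
```

With a table like this, drawing a two-digit hex value is one table lookup plus one word write per scanline, with no per-character string handling.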

Looking at your screens, what size is your font?! Things will be easier if you stick to an 8 pix font, but the concept of precalcing all possible blocks still applies :)

You should be able to draw a line and scroll the rest as necessary within a handful or two of scan lines I think..
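As a sketch of the "draw one line, scroll the rest" bookkeeping, here is a small Python model of the screen as a flat byte buffer with 160 bytes per scanline. The text-row height of 6 scanlines, the function name and the dummy row contents are assumptions made for the example.

```python
LINE_BYTES = 160                     # bytes per scanline on the ST screen
ROW_LINES = 6                        # assumed height of one text row
ROW_BYTES = LINE_BYTES * ROW_LINES


def scroll_up_and_append(screen, top, rows, new_row_gfx):
    """Scroll a block of text rows up by one row and draw only the new one.

    top  : first scanline of the score area
    rows : number of text rows in the score area
    """
    if len(new_row_gfx) != ROW_BYTES:
        raise ValueError("new row must be exactly one text row of data")
    start = top * LINE_BYTES
    end = start + rows * ROW_BYTES
    # Move rows 1..rows-1 up by one row (one big block copy).
    screen[start:end - ROW_BYTES] = screen[start + ROW_BYTES:end]
    # Only the bottom row is actually redrawn.
    screen[end - ROW_BYTES:end] = new_row_gfx


screen = bytearray(32000)
for row in range(4):                 # fill 4 rows with the values 1..4
    base = 10 * LINE_BYTES + row * ROW_BYTES
    screen[base:base + ROW_BYTES] = bytes([row + 1]) * ROW_BYTES

scroll_up_and_append(screen, top=10, rows=4,
                     new_row_gfx=bytes([9]) * ROW_BYTES)
```

Only one row goes through the character printing path per update; everything else is a plain block copy.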

Good luck mate :)

Post by dml » Mon Sep 17, 2012 8:44 pm

distantminds wrote:
Hi!
Some ideas for optimising...


yep that's probably even better esp. if the scroll rate is limited, the score doesn't hop much, row order remains fixed and the text doesn't get randomly overdrawn with other things.

if you do have any of those things happening you might want to consider the line cache - more effort to do but the display won't be volatile and you still only pay for what changes. otherwise, probably best to scroll it all up and down, refilling top or bottom rows as suggested.

Post by FedePede04 » Mon Sep 17, 2012 9:20 pm

Thanks for the advice.

You have given me some good ideas to work with.
In my old players that I did around 1986-1988, I did the scrolling thing and only inserted the top/bottom line. Most of the time it worked like a charm, but sometimes it went wrong, so I had hoped to find a solution where I always draw the entire screen, also because the old players were not the most optimized code.

I think what I will start with is to set up 3 blank lines, and after that only fill in the chars that have to be changed,
and see how much time that takes. If the score is completely filled it will probably take a little more time, but otherwise it should be faster.

Btw, the font is 8x6.

Thanks again.
Peter

Post by danorf » Tue Feb 12, 2013 6:08 pm

Hi,

I don't know if this is still of any interest 4 months later, but I think:

Code:

   move.b   (a5)+,(a4)      ; 12(2/1)
   move.b   (a5)+,2(a4)      ; 16(3/1)
   move.b   (a5)+,160(a4)      ; 16(3/1)
   move.b   (a5)+,162(a4)      ; 16(3/1)
   move.b   (a5)+,320(a4)      ; 16(3/1)
   move.b   (a5)+,322(a4)      ; 16(3/1)
   move.b   (a5)+,480(a4)      ; 16(3/1)
   move.b   (a5)+,482(a4)      ; 16(3/1)
   move.b   (a5)+,640(a4)      ; 16(3/1)
   move.b   (a5)+,642(a4)      ; 16(3/1)
   move.b   (a5)+,800(a4)      ; 16(3/1)
   move.b   (a5)+,802(a4)      ; 16(3/1)
;            = 188 cycles (35/12)

should be replaced by:

Code:

   move.w   (a5)+,d4      ;  8(2/0)
   movep.w  d4,(a4)      ; 16(2/2)
   move.w   (a5)+,d4      ;  8(2/0)
   movep.w  d4,160(a4)      ; 16(2/2)
   move.w   (a5)+,d4      ;  8(2/0)
   movep.w  d4,320(a4)      ; 16(2/2)
   move.w   (a5)+,d4      ;  8(2/0)
   movep.w  d4,480(a4)      ; 16(2/2)
   move.w   (a5)+,d4      ;  8(2/0)
   movep.w  d4,640(a4)      ; 16(2/2)
   move.w   (a5)+,d4      ;  8(2/0)
   movep.w  d4,800(a4)      ; 16(2/2)
;                               = 144 cycles (24/12)

This needs to be tested on even and odd addresses, but I think it should work.
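The byte placement can be checked without an ST by modelling both routines on a plain byte buffer. The Python sketch below does only that: it says nothing about timing, and the function names and the dummy glyph are invented for the example. One thing the model cannot show: on a 68000 the `move.w (a5)+,d4` needs a5 to be even (a word read from an odd address raises an address error), while `movep` only performs byte accesses, so an odd a4 is fine.

```python
def draw_bytes(screen, a4, glyph):
    """Original routine: 12 byte moves to (a4), 2(a4), 160(a4), 162(a4), ..."""
    src = 0
    for line in range(6):
        for plane_offset in (0, 2):
            screen[a4 + line * 160 + plane_offset] = glyph[src]
            src += 1


def movep_w(screen, address, word):
    """movep.w Dn,d(An): high byte to address, low byte to address + 2."""
    screen[address] = (word >> 8) & 0xFF
    screen[address + 2] = word & 0xFF


def draw_movep(screen, a4, glyph):
    """Proposed routine: 6 x (move.w (a5)+,d4 / movep.w d4,d(a4))."""
    for line in range(6):
        # Big-endian word load, as move.w (a5)+,d4 would do it.
        word = (glyph[2 * line] << 8) | glyph[2 * line + 1]
        movep_w(screen, a4 + line * 160, word)


glyph = bytes(range(1, 13))          # 12 dummy glyph bytes
results = []
for a4 in (1000, 1001):              # one even, one odd destination
    old, new = bytearray(32000), bytearray(32000)
    draw_bytes(old, a4, glyph)
    draw_movep(new, a4, glyph)
    results.append(old == new)
print(results)
```

Both destinations give identical buffers, so in this model the replacement writes the same bytes to the same places as the original twelve moves.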

More:
on a plain ST:

Code:

   Lea      (a3,d2.w),a5

must be avoided because it wastes cycles through misaligned bus read cycles.
It should cost 12(2/0) cycles, but it is more likely 16(2/0), or in the best case (with pairing) 14(2/0).
This code:

Code:

   lea.l   (a3),a5      ;  4(1/0)
   adda.w   d2,a5      ;  8(1/0)

should do the same thing and always costs 12(2/0) cycles.
In addition:

Code:

lsl.w   #4,d2

and

Code:

move.w   (a2,d1.w),d4

should pair for a gain of 4 cycles.
So replacing:

Code:

   lsl.w   #4,d2
   lea.l   (a3,d2.w),a5
   move.w   (a2,d1.w),d4
   lea.l   (a0,d4.w),a4

by:

Code:

   lsl.w   #4,d2
   move.w   (a2,d1.w),d4
   lea.l   (a3),a5
   adda.w   d2,a5
   lea.l   (a0),a4
   adda.w   d4,a4

should lead to a gain of 8 cycles.

Even more:
we can merge the two branches into one,
as long as we never have to print an empty string:

Code:

String_loop_wolc:
   move.b   (a1)+,d2
   beq.s   string_end_wolc
   
[...]
   
   addq.w   #2,d1
   bra.S   String_loop_wolc

could become:

Code:

   move.b   (a1)+,d2
String_loop_wolc:

[...]

   move.b   (a1)+,d2
   bne.s  String_loop_wolc
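To see why the "never an empty string" condition matters, here is a small Python model of the two loop shapes. It only tracks which character codes would be drawn; the function names and the test strings are invented for the example.

```python
def print_top_test(memory, start):
    """Original loop: test for the terminator before drawing."""
    drawn, pos = [], start
    while True:
        char = memory[pos]           # move.b (a1)+,d2
        pos += 1
        if char == 0:                # beq.s string_end_wolc
            return drawn
        drawn.append(char)           # [...] draw the character


def print_bottom_test(memory, start):
    """Merged-branch loop: draw first, test at the bottom."""
    drawn, pos = [], start
    char = memory[pos]               # move.b (a1)+,d2 (before the loop)
    pos += 1
    while True:
        drawn.append(char)           # [...] draw the character
        char = memory[pos]           # move.b (a1)+,d2
        pos += 1
        if char == 0:                # bne.s falls through on zero
            return drawn


normal = b"AB\0"
empty_then_more = b"\0CD\0"          # an empty string followed by other data

print(print_top_test(normal, 0), print_bottom_test(normal, 0))
print(print_top_test(empty_then_more, 0),
      print_bottom_test(empty_then_more, 0))
```

For a non-empty string both shapes draw the same characters. For an empty string the bottom-tested loop draws the terminator as if it were a character and then keeps reading whatever follows it in memory, so the caller has to filter empty strings out beforehand.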

Even even more:
I think you don't have to

Code:

subi.w   #32,d2

inside the loop. You could apply this offset once, outside the loop, to the address stored in a3. The remaining:

Code:

   move.b   (a1)+,d2
   andi.w   #$00ff,d2

could be rewritten as:

Code:

   moveq   #0,d2
   move.b   (a1)+,d2

to gain 4 more cycles
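Moving the subtraction onto a3 works because 32 characters of 16 bytes each is 512 = $200 bytes. A quick Python check of that claim, modelling the 16-bit arithmetic and the sign extension of the word index (the helper names and the base address are made up):

```python
def to_s16(value):
    """Interpret the low 16 bits as a signed word, as the 68000 does
    when it sign-extends a .w index or an adda.w source."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def glyph_address_original(a3, char):
    d2 = char & 0x00FF               # and.w #$00ff,d2
    d2 = (d2 - 32) & 0xFFFF          # sub.w #32,d2
    d2 = (d2 << 4) & 0xFFFF          # lsl.w #4,d2
    return a3 + to_s16(d2)           # lea (a3,d2.w),a5


def glyph_address_biased(a3, char):
    a3 = a3 - 0x200                  # lea.l -$200(a3),a3 (done once)
    d2 = char                        # moveq #0,d2 / move.b (a1)+,d2
    d2 = (d2 << 4) & 0xFFFF          # lsl.w #4,d2
    return a3 + to_s16(d2)           # lea.l (a3),a5 / adda.w d2,a5


FONT_BASE = 0x40000                  # arbitrary example address
same = all(glyph_address_original(FONT_BASE, c)
           == glyph_address_biased(FONT_BASE, c)
           for c in range(1, 256))
print(same)
```

Every character code from 1 to 255 gives the same glyph address in both versions, including codes below 32, where the original index goes negative.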

So, in the end, your code will become:

Code:

   [start of your code, up to the end of the clip art loading section]

   lea.l   -$200(a3),a3      ; bias clip art base by -32*16, replaces the sub #32

   [rest of your code, up to the print routine]

   moveq   #0,d2
   move.b   (a1)+,d2         ; first char (the string must not be empty)

String_loop_wolc:

; find X coord in clip art
   lsl.w   #4,d2
   move.w   (a2,d1.w),d4
   lea.l   (a3),a5
   adda.w   d2,a5
   lea.l   (a0),a4
   adda.w   d4,a4

; print char to screen
   move.w   (a5)+,d4
   movep.w   d4,(a4)
   move.w   (a5)+,d4
   movep.w   d4,160(a4)
   move.w   (a5)+,d4
   movep.w   d4,320(a4)
   move.w   (a5)+,d4
   movep.w   d4,480(a4)
   move.w   (a5)+,d4
   movep.w   d4,640(a4)
   move.w   (a5)+,d4
   movep.w   d4,800(a4)

   addq.w   #2,d1

   moveq   #0,d2
   move.b   (a1)+,d2         ; next char, 0 = end of string
   bne.s   String_loop_wolc

It should run in 20+224n-4 cycles instead of 296n+4 cycles for the original code (n = number of chars = number of iterations). That's ~24% better.
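Plugging a few values of n into the two formulas above shows how the saving behaves. This is plain arithmetic on the figures quoted in this post, not a measurement, and the function names are made up:

```python
def cycles_original(n):
    """Cycle estimate quoted above for the original loop."""
    return 296 * n + 4


def cycles_optimised(n):
    """Cycle estimate quoted above for the rewritten loop."""
    return 20 + 224 * n - 4


def saving_percent(n):
    old, new = cycles_original(n), cycles_optimised(n)
    return 100 * (old - new) / old


for n in (1, 10, 40):
    print(n, cycles_original(n), cycles_optimised(n),
          f"{saving_percent(n):.1f}%")
```

For a single character the fixed setup cost keeps the saving at 20%; it climbs towards (296 - 224) / 296, about 24.3%, as the string gets longer.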

Other ideas:
1) perhaps:

Code:

   addq.w   #2,d1
   bne.s  String_loop_wolc

could be replaced with a dbra (this needs the number of chars to be known before the loop).
2) with more information on the data pointed to by a2 and a3, I think we can save a few more cycles.

All of these tricks need to be tested, but I hope (and think) I haven't said too many stupid things for my first post :angel: .

