[ASM] Fast Muls and Divs Help

GFA, ASM, STOS, ...

Moderators: simonsunnyboy, Mug UK, Zorro 2, Moderator Team

Ratatouil
Atarian
Atarian
Posts: 6
Joined: Mon Apr 16, 2018 8:46 pm

[ASM] Fast Muls and Divs Help

Postby Ratatouil » Mon Apr 16, 2018 9:28 pm

Hi there,
My name's David, I live in France and I'm very new registered in this place.
I used to own a Commodore 64 in the middle 80s and after at the start of 90s an Atari ST. So long long time ago (gosh I'm old).
I also used to exclusively play games on my good old Atari and now I feel like a late challenge to finally make something constructive.
I decided to start with the hardest : 68000 ASM :mrgreen:
After a lot of weeks reading several docs I'm stuck with a problem : Fast muls and Divs.
I need your help :mrgreen:

According to my doc a Muls is around 70 cycles to execute. Quite a lot. So I find (not me the author I read it somewhere) a solution using square tables.
So if I understood well the theory I wrote this portion of code making a signed a*b

Code: Select all

;d0=a
;d1=b
   lea sqrtab(pc),a0
   move.w d0,d2               ; 4
   add.w d1,d0               ; 4   a+b
   sub.w d1,d2               ; 4   a-b
   add.w d0,d0               ; 4
   add.w d2,d2               ; 4
   move.w (a0,d0.w),d0         ; 14    (a+b)^2
   sub.w (a0,d2.w),d0            ; 14   (a+b)^2-(a-b)^2
   asr.w #2,d0               ; 10   /4


The problem is I can only muls two .BYTE to have a result in .W because the sqr table would be huge (?).
So for example if I want to muls two 12,4 numbers (12 bits integer, 4 bits decimal) I need to make two times this portion of code (+adding +swapping).
If I count well, it's a total of 58+58 cycles square tab method > 70 cycles of MULS instruction.
I have probably missed something as this square method meant faster. My math level is not very high so I request help :D

Good evening,

PS: Sorry, my english is weak.

Moulinaie
Captain Atari
Captain Atari
Posts: 201
Joined: Wed Feb 01, 2012 9:34 pm

Re: [ASM] Fast Muls and Divs Help

Postby Moulinaie » Tue Apr 17, 2018 6:14 am

Hi,

More than two times!
Your first number is two bytes long : let's say AB
The second is two bytes long, let's say CD.
Remember when you multiply:
AB
CD
----
DxB
DxA and shift 8 bits
CxB and shift 8 bits
CxA and shift 16 bits
----
total sum.

Guillaume.

wietze
Captain Atari
Captain Atari
Posts: 215
Joined: Fri Mar 01, 2013 10:52 pm

Re: [ASM] Fast Muls and Divs Help

Postby wietze » Tue Apr 17, 2018 6:30 am

Hello, and welxome!

Good to see the number of asm programming enthusiasts growing ;).

I am not familiar with your approach as a common used technique for muls, but I will give it some thought tonite after work. So far I have used two ways to speed up divs (they are ranging up to 144 cycles iirc, whereas muls uusually end up being around 60 max iirc); which are precalculated tables or the use of logarithmic tables.

Precalculated tables are basically tables with indeces that are encoded a and b values. To use them, you create the index from the 2 given values and get the result from the table. This solution for me is beneficial for divs only, and the speed increase over muls is marginal.

Logtables are useful for both divs and muls and rely on the fact that ln(a)+ln(b)=ln(a+b). I think the benefit here is you minimize the amount of operations you need to do to get the result (3 lookups and an add, of which some local optimization of the lookups may be skipped). However the downside is that you trade in precision for speed. In that sense your approach may be the best of both worlds and gain a little speed without the loss of precision.

Ill see ifni can get back to your solution when i have digested it.

wietze
Captain Atari
Captain Atari
Posts: 215
Joined: Fri Mar 01, 2013 10:52 pm

Re: [ASM] Fast Muls and Divs Help

Postby wietze » Tue Apr 17, 2018 6:54 am

Ok. Just the math you used, with my interpretation:

- Original Code:

Code: Select all

   lea sqrtab(pc),a0         
   move.w d0,d2              ; 4
   add.w d1,d0               ; 4   a+b
   sub.w d1,d2               ; 4   a-b
   add.w d0,d0               ; 4
   add.w d2,d2               ; 4
   move.w (a0,d0.w),d0       ; 16    (a+b)^2
   sub.w (a0,d2.w),d0        ; 16   (a+b)^2-(a-b)^2
   asr.w #2,d0               ; 8

Total: 60 cycles

Obvious optimizations are:
- premultiply your a,b values by 2, removing the add.w's to get proper indices
- preshift your table results to the right by 2, to remove the shifts of the results:

- Suggestion 1, obvious optimizations:

Code: Select all

   lea   sqrtab(pc),a0   
   move.w   d0,d2            ;4
   add.w   d1,d0            ;4
   sub.w   d1,d2            ;4
   move.w   (a0,d0.w),d0      ;16
   sub.w   (a0,d2.w),d0      ;16

Total: 44 cycles

If you want to go the extra mile, waste memory for the solution; assuming that you do multiple multiplications (e.g. setup cost of table can be negated; I assume, since you didnt take the cycles for loading the table into account):
- align the square root tab to a 64kb boundary (e.g. start of table is at $xxxx0000), that way we dont need to use indexed addressing mode to load data from register/table:

Code: Select all

   move.l   sqrttabpointer,d0   ; setup d0, aligned to word boundary

   move.l   d0,d2         ;4    d2 aligned to word boundary
   add.w   d1,d0         ;4
   sub.w   d1,d2         ;4
   move.l   d0,a0         ;4   use d0 value as table address
   move.w   (a0),d0         ;8
   move.l   d2,a0         ;4   use d2 value as table address
   sub.w   (a0),d0         ;8

Total: 36 cycles.

The last solution should be viable for both muls and divs; and is actually quite fast, and precise. I think I should use this approach sometime :)

Moulinaie
Captain Atari
Captain Atari
Posts: 201
Joined: Wed Feb 01, 2012 9:34 pm

Re: [ASM] Fast Muls and Divs Help

Postby Moulinaie » Tue Apr 17, 2018 7:04 am

Ratatouil wrote:

Code: Select all

;d0=a
;d1=b
   lea sqrtab(pc),a0
   move.w d0,d2               ; 4
   add.w d1,d0               ; 4   a+b
   sub.w d1,d2               ; 4   a-b
   add.w d0,d0               ; 4
   add.w d2,d2               ; 4
   move.w (a0,d0.w),d0         ; 14    (a+b)^2
   sub.w (a0,d2.w),d0            ; 14   (a+b)^2-(a-b)^2
   asr.w #2,d0               ; 10   /4




When you compute D2 = a-b, you must be sure that a>b, else a-b<0.
And then, (a0,d2.w) points to an address BEFORE a0.

Or, you must have two symetric square tables.
A0+1 = 1²
A0+2 = 2²
..
and A0-1 = 1²
A0-2 = 2²
...

Guillaume.

wietze
Captain Atari
Captain Atari
Posts: 215
Joined: Fri Mar 01, 2013 10:52 pm

Re: [ASM] Fast Muls and Divs Help

Postby wietze » Tue Apr 17, 2018 7:30 am

Sure. That would just mean you generate your aligned table differently.

Ratatouil
Atarian
Atarian
Posts: 6
Joined: Mon Apr 16, 2018 8:46 pm

Re: [ASM] Fast Muls and Divs Help

Postby Ratatouil » Tue Apr 17, 2018 10:06 am

Wow ! You guys are fast and smart.
Need to think and study your solutions.
This AB*CD problem (4 adds) is a pain but seems inevitable. May be using a 11,4 bits format would reduce a single table to 64Kb (-32Kb to 32Kb) so no need to make 4 adds.
Anyway, many thanks guys for your help, much appreciated.

Ratatouil
Atarian
Atarian
Posts: 6
Joined: Mon Apr 16, 2018 8:46 pm

Re: [ASM] Fast Muls and Divs Help

Postby Ratatouil » Tue Apr 17, 2018 10:08 am

huh ?!? My answer post has disappeared ?

User avatar
simonsunnyboy
Moderator
Moderator
Posts: 5005
Joined: Wed Oct 23, 2002 4:36 pm
Location: Friedrichshafen, Germany
Contact:

Re: [ASM] Fast Muls and Divs Help

Postby simonsunnyboy » Tue Apr 17, 2018 3:49 pm

*Moderator note* Yes it has as you need at least 3 manually approved posts before they appear automatically.
Simon Sunnyboy/Paradize - http://paradize.atari.org/

Stay cool, stay Atari!

1x2600jr, 1x1040STFm, 1x1040STE 4MB+TOS2.06+SatanDisk, 1xF030 14MB+FPU+NetUS-Bee

Ratatouil
Atarian
Atarian
Posts: 6
Joined: Mon Apr 16, 2018 8:46 pm

Re: [ASM] Fast Muls and Divs Help

Postby Ratatouil » Tue Apr 17, 2018 5:21 pm

wietze wrote:The last solution should be viable for both muls and divs; and is actually quite fast, and precise. I think I should use this approach sometime :)


Not sure to understand. To switch to divs mode we should need an additionnal ASR.W ?

wietze
Captain Atari
Captain Atari
Posts: 215
Joined: Fri Mar 01, 2013 10:52 pm

Re: [ASM] Fast Muls and Divs Help

Postby wietze » Wed Apr 18, 2018 4:59 am

Ratatouil wrote:
wietze wrote:The last solution should be viable for both muls and divs; and is actually quite fast, and precise. I think I should use this approach sometime :)


Not sure to understand. To switch to divs mode we should need an additionnal ASR.W ?


I didnt look at the math, i just assumed the proposed stuff would work for muls and divs. If it only works for muls, then i should have said only muls ;).


Social Media

     

Return to “Coding”

Who is online

Users browsing this forum: No registered users and 3 guests