fastest way to copy a memory block in 68060 assembly

rmd
Atari freak
Posts: 59
Joined: Fri Dec 15, 2017 11:30 am
Location: Berlin, Zermany

fastest way to copy a memory block in 68060 assembly

Postby rmd » Sat Nov 02, 2019 9:26 am

Hello,
I'm looking for a code snippet of the fastest way to copy a 64KB block in 68060 assembly. (when source and destination don't overlap)
thanks!

swapd0
Obsessive compulsive Atari behavior
Posts: 102
Joined: Thu Dec 13, 2007 8:56 pm

Re: fastest way to copy a memory block in 68060 assembly

Postby swapd0 » Sat Nov 02, 2019 12:10 pm

something like this.

rept 64*1024/(13*4)   ; 1260 blocks of 52 bytes = 65520 bytes, the last 16 bytes still need a separate copy
movem.l (a0),d0-d7/a2-a6
movem.l d0-d7/a2-a6,(a1)
lea 13*4(a0),a0
lea 13*4(a1),a1
endr

rmd
Atari freak
Posts: 59
Joined: Fri Dec 15, 2017 11:30 am
Location: Berlin, Zermany

Re: fastest way to copy a memory block in 68060 assembly

Postby rmd » Sat Nov 02, 2019 8:38 pm

swapd0 wrote:something like this.

rept 64*1024/(13*4)
movem.l (a0),d0-d7/a2-a6
movem.l d0-d7/a2-a6,(a1)
lea 13*4(a0),a0
lea 13*4(a1),a1
endr

:cheers:
found a version there too https://chromium.googlesource.com/nativ ... k/memcpy.S

JeanMars
Captain Atari
Posts: 180
Joined: Fri Apr 09, 2010 5:15 pm
Location: France

Re: fastest way to copy a memory block in 68060 assembly

Postby JeanMars » Sat Nov 02, 2019 8:43 pm

Hi,

How does the rept (movem.l ...) based routine compare, in terms of clock cycles, to the rept (move.l (a0)+,(a1)+ ...) one?
Estimating CPU cycles is a bit too far back in my memory, but just out of curiosity, if someone has already done the maths I'd appreciate it.

Thanks,
Jean

JeanMars
Captain Atari
Posts: 180
Joined: Fri Apr 09, 2010 5:15 pm
Location: France

Re: fastest way to copy a memory block in 68060 assembly

Postby JeanMars » Sat Nov 02, 2019 8:59 pm

OK, found some 68k clock cycles here:
http://oldwww.nvg.ntnu.no/amiga/MC680x0 ... mmove.HTML
http://oldwww.nvg.ntnu.no/amiga/MC680x0 ... mpetc.HTML

It's a long time ago for me, but I would say:
move.l: 20n
movem.l: 12+8n+8+8n=20+16n
n being the number of long words moved.
So for n=13 (the number of registers in the movem.l method): 228 cycles
For the move.l method: 260 cycles

But it's been a while for me, so I'm not sure about the maths here.

chlu600
Retro freak
Posts: 15
Joined: Wed Mar 04, 2015 8:32 am

Re: fastest way to copy a memory block in 68060 assembly

Postby chlu600 » Sat Nov 02, 2019 10:26 pm

I’ve never done anything with the 68060 CPU. But why not use the instruction cache?
A small loop will easily fit into the cache, and then it's just reads and writes.

Perhaps I’ve missed something?

JeanMars
Captain Atari
Posts: 180
Joined: Fri Apr 09, 2010 5:15 pm
Location: France

Re: fastest way to copy a memory block in 68060 assembly

Postby JeanMars » Sat Nov 02, 2019 11:13 pm

Yep, sure. Considering the amount of memory to move, you need a loop counter anyway, or else the code size would be too much.
I don't know how big the 68060 instruction cache is (BTW why 68060?) but the idea is to get as close as possible to this cache size and loop over n rept of movem.l

thomas3
Obsessive compulsive Atari behavior
Posts: 130
Joined: Tue Apr 11, 2017 8:57 pm
Location: the people's republic of south yorkshire, uk.

Re: fastest way to copy a memory block in 68060 assembly

Postby thomas3 » Sun Nov 03, 2019 1:01 am

swapd0 wrote:something like this.

rept 64*1024/(13*4)
movem.l (a0),d0-d7/a2-a6
movem.l d0-d7/a2-a6,(a1)
lea 13*4(a0),a0
lea 13*4(a1),a1
endr


You can lose all the a0 leas and almost all the a1 leas by using (a0)+ and then addressing to d(a1).
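
As a rough illustration of that suggestion (a sketch only, showing three blocks with swapd0's register list): movem.l can post-increment when loading, but when storing it only accepts pre-decrement or plain/displacement addressing, which is why the destination side uses d(a1). The 16-bit displacement stops at 32767, so a lea on a1 is still needed now and then, hence "almost all".

```asm
      movem.l   (a0)+,d0-d7/a2-a6     ;load 13 longs, a0 advances by 52
      movem.l   d0-d7/a2-a6,(a1)      ;store at a1+0
      movem.l   (a0)+,d0-d7/a2-a6
      movem.l   d0-d7/a2-a6,52(a1)    ;store at a1+52, no lea needed
      movem.l   (a0)+,d0-d7/a2-a6
      movem.l   d0-d7/a2-a6,104(a1)   ;store at a1+104
      lea   156(a1),a1                ;one lea once the run of blocks is done
```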

rmd
Atari freak
Posts: 59
Joined: Fri Dec 15, 2017 11:30 am
Location: Berlin, Zermany

Re: fastest way to copy a memory block in 68060 assembly

Postby rmd » Sun Nov 03, 2019 6:31 am

JeanMars wrote:(BTW why 68060?)
Because Silly Venture :wink:

JeanMars
Captain Atari
Posts: 180
Joined: Fri Apr 09, 2010 5:15 pm
Location: France

Re: fastest way to copy a memory block in 68060 assembly

Postby JeanMars » Sun Nov 03, 2019 9:33 am

Hi,

You can lose all the a0 leas and almost all the a1 leas by using (a0)+ and then addressing to d(a1).

? Don't get it.
BTW in my cycle calculation I forgot to include these lea, so it's 228+8+8=244 (movem.l) vs 260 (move.l)
Well, not that much better, assuming I'm correct on the cycles, which is pretty risky considering how long it's been since I did this kind of thing :-)

Also, here it's 68000, not 68060, and I don't know if the addresses are even aligned.
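
Written out, using only the 68000 figures given in this thread (20 cycles per move.l, 12+8n and 8+8n for the two movem.l, 8 per lea), the cost per block of n long words is as follows. This is only the arithmetic above restated; it says nothing about 68060 timings.

```latex
\begin{aligned}
C_{\text{move.l}}(n)  &= 20n \\
C_{\text{movem.l}}(n) &= (12 + 8n) + (8 + 8n) + 2 \cdot 8 = 36 + 16n \\
n = 13:\quad C_{\text{move.l}} &= 260, \qquad C_{\text{movem.l}} = 244
\end{aligned}
```

That is 16 cycles saved per 52 bytes copied, or about 6%.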

rmd
Atari freak
Posts: 59
Joined: Fri Dec 15, 2017 11:30 am
Location: Berlin, Zermany

Re: fastest way to copy a memory block in 68060 assembly

Postby rmd » Sun Nov 03, 2019 9:58 am

JeanMars wrote:Also, here it's 68000, not 68060, and I don't know if the addresses are even aligned.

Yes, the addresses will be aligned.

OL
Atari Super Hero
Posts: 519
Joined: Fri Apr 01, 2005 6:59 am

Re: fastest way to copy a memory block in 68060 assembly

Postby OL » Sun Nov 03, 2019 10:13 am

Hello

You should be able to go faster: in theory the 68060 can execute two instructions at the same time, but I think the issue is linked to memory access with move.l. On the CT60 the limit looks to be about 50 MB/sec at 100 MHz; on the V4 it is more than 100 MB/sec.

Olivier

JeanMars wrote:Hi,

You can lose all the a0 leas and almost all the a1 leas by using (a0)+ and then addressing to d(a1).

? Don't get it.
BTW in my cycle calculation I forgot to include these lea, so it's 228+8+8=244 (movem.l) vs 260 (move.l)
Well, not that much better, assuming I'm correct on the cycles, which is pretty risky considering how long it's been since I did this kind of thing :-)

Also, here it's 68000, not 68060, and I don't know if the addresses are even aligned.
OL

evil
Captain Atari
Posts: 185
Joined: Sun Nov 12, 2006 8:03 pm

Re: fastest way to copy a memory block in 68060 assembly

Postby evil » Sun Nov 03, 2019 10:26 am

rmd wrote:Hello,
I'm looking for a code snippet of the fastest way to copy a 64KB block in 68060 assembly. (when source and destination don't overlap)
thanks!


Hello rmd,

in my experience move16 is the fastest on the 060. It requires the source and destination buffers to be aligned to 16 bytes, but apart from that it is very straightforward.

So I did a little test just now to see how it stacks up against movem.l and movem.l with scrambled source data.
Each test copies 8k of data per loop. It isn't completely optimal for movem.l, so that can do a little better than shown here.
Also, unlike the 68000, we have a cache and don't want to rept everything, as it won't fit.
Hopefully I didn't make some fatal mistake :) Just lower the loop count to #8-1 for a 64k copy.

First try, movem.l

;      movem.l linear copy about 488kbyte / 60Hz VBL, 66 MHz CPU
;      d1-a4 is 12 registers (d1-d7/a0-a4), so each movem.l moves 48 bytes
copy_movem:   
      move.l   buf1addr,a5
      move.l   buf2addr,a6
      
      moveq   #61-1,d0
.loop:
      movem.l   (a5)+,d1-a4      ;48 bytes
      movem.l   d1-a4,(a6)

.q:      set   48
      rept   169
      movem.l   (a5)+,d1-a4      ;169*48 = 8112 bytes
      movem.l   d1-a4,.q(a6)
.q:      set   .q+48
      endr

      movem.l   (a5)+,d1-a0      ;8 registers = 32 bytes, 8k per loop in total
      movem.l   d1-a0,.q(a6)

      lea   8192(a6),a6   
      
      dbra   d0,.loop


Second try, movem.l with the source data scrambled backwards, so we can use (an)+ for source and -(an) for destination.

;      movem.l scrambled copy about 496kbyte / 60Hz VBL, 66 MHz CPU
;      d1-a4 is 12 registers (d1-d7/a0-a4), so each movem.l moves 48 bytes
copy_movem_scrambled:
      move.l   buf1addr,a5
      move.l   buf2addr,a6
      add.l   #1024*496,a6      ;start at the end of the destination
      
      moveq   #62-1,d0
.loop:
      rept   170
      movem.l   (a5)+,d1-a4      ;170*48 = 8160 bytes
      movem.l   d1-a4,-(a6)
      endr
      
      movem.l   (a5)+,d1-a0      ;8 registers = 32 bytes, 8k per loop in total
      movem.l   d1-a0,-(a6)

      dbra   d0,.loop


And finally, the nice and clean move16

;      move16 about 520kbyte/ 60Hz VBL, 66 MHz CPU
copy_16:   
      move.l   buf1addr,a0
      move.l   buf2addr,a1

      moveq   #65-1,d0
.loop:
      rept   512
      move16   (a0)+,(a1)+      ;8k per loop
      endr
      
      dbra   d0,.loop


It's worth noting that the test program didn't shut off all of the OS, so Timer C and the VBL from the OS were still running. Being more naughty and switching them off will gain a little for each method.
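
For the 64k case asked about at the top of the thread, the move16 test could be adapted along these lines. This is only a sketch that applies the loop count change mentioned above: the label copy_16_64k and the closing rts are additions for illustration, and buf1addr/buf2addr are assumed to hold 16-byte aligned addresses. Each move16 (ax)+,(ay)+ is a 4-byte instruction, so the rept 512 body is 2k of code and fits in the 68060's 8k instruction cache.

```asm
;      sketch: 64k copy with move16, uses d0/a0/a1
copy_16_64k:
      move.l   buf1addr,a0      ;source, assumed 16-byte aligned
      move.l   buf2addr,a1      ;destination, assumed 16-byte aligned

      moveq   #8-1,d0           ;8 loops * 8k = 64k
.loop:
      rept   512
      move16   (a0)+,(a1)+      ;16 bytes per instruction, 8k per loop
      endr

      dbra   d0,.loop
      rts
```

If the buffers come from an allocator that does not guarantee alignment, one common approach is to allocate 15 bytes more than needed and round the address up to the next multiple of 16.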

leonard
Moderator
Posts: 658
Joined: Thu May 23, 2002 10:48 pm

Re: fastest way to copy a memory block in 68060 assembly

Postby leonard » Sun Nov 03, 2019 12:01 pm

I don't know anything about the 68060 (except that movep doesn't exist anymore :)). I didn't even know the "move16" instruction existed. Sounds very nice!
Leonard/OXYGENE.

rmd
Atari freak
Posts: 59
Joined: Fri Dec 15, 2017 11:30 am
Location: Berlin, Zermany

Re: fastest way to copy a memory block in 68060 assembly

Postby rmd » Sun Nov 03, 2019 5:23 pm

evil wrote:
rmd wrote:Hello,

;      move16 about 520kbyte/ 60Hz VBL, 66 MHz CPU
copy_16:   
      move.l   buf1addr,a0
      move.l   buf2addr,a1

      moveq   #65-1,d0
.loop:
      rept   512
      move16   (a0)+,(a1)+      ;8k per loop
      endr
      
      dbra   d0,.loop


It's worth noting that the test program didn't shut off all of the OS, so Timer C and the VBL from the OS were still running. Being more naughty and switching them off will gain a little for each method.


amazing, so if I want to inline that, what are the clobbered regs, : "d0", "a0", "a1" ?

evil
Captain Atari
Posts: 185
Joined: Sun Nov 12, 2006 8:03 pm

Re: fastest way to copy a memory block in 68060 assembly

Postby evil » Sun Nov 03, 2019 5:46 pm

rmd wrote:amazing, so if I want to inline that, what are the clobbered regs, : "d0", "a0", "a1" ?

In this case yes, but you can easily change that around.

rmd
Atari freak
Posts: 59
Joined: Fri Dec 15, 2017 11:30 am
Location: Berlin, Zermany

Re: fastest way to copy a memory block in 68060 assembly

Postby rmd » Sun Nov 03, 2019 6:17 pm

evil wrote:
In this case yes, but you can easily change that around.
thanks!

thomas3
Obsessive compulsive Atari behavior
Posts: 130
Joined: Tue Apr 11, 2017 8:57 pm
Location: the people's republic of south yorkshire, uk.

Re: fastest way to copy a memory block in 68060 assembly

Postby thomas3 » Sun Nov 03, 2019 6:34 pm

evil wrote:Second try, movem.l with the source data scrambled backwards, so we can use (an)+ for source and -(an) for destination.


The most obvious gfx optimisation I never thought of, part #4822 of an eternally ongoing series.......

tommo
Atari User
Posts: 31
Joined: Mon Jan 29, 2018 6:00 pm

Re: fastest way to copy a memory block in 68060 assembly

Postby tommo » Mon Nov 04, 2019 7:36 pm

Do you want to copy a screen from fast-RAM to ST-RAM on a CT60 in GFA?

The fast-RAM of the CT60 is about 10x faster than ST-RAM 8)
With a 66 MHz CPU and LONG alignment I can read or write about 50 MB/s in fast-RAM
using "movem.l", with the TOS interrupts still running as normal.

The GFA 3.6 TT interpreter does not like the 060's multiple caches being on!
I have not tried it after compilation.

If you are interested in the inner workings, look for a "68060 um pdf" (the 68060 user's manual);
the instruction timing tables are in there too.

