Subroutine and data alignment (040/060)

lp
Fuji Shaped Bastard
Posts: 2442
Joined: Wed Nov 12, 2003 11:09 pm
Location: GFA Headquarters

Postby lp » Fri Jun 18, 2010 5:38 pm

I have taken the time to disassemble and reassemble the Nova Mach64 (PCI) driver and solved the cache problems. The driver never was cache friendly. So my Hades060 now boots at full speed. :thumbs:

My question, now that I have it in assembler form: would it benefit me to align the subroutines and data structures at all? If so, what boundary is best for 040/060 machines?

Related question: assuming alignment is worth doing, does plain TOS ensure the binary is even loaded at the correct boundary? Pretty sure MiNT does this, but this driver loads well before MiNT.

Current driver release is here:
http://www.bright.net/~gfabasic/html/do ... htm#clones
Video of my Hades060 booting, for anyone who has never seen one or is perhaps bored:
http://www.youtube.com/user/gfabasic

Postby lp » Mon Jun 21, 2010 3:16 am

I ran some tests in plain TOS. It appears the best it does is an alignment of 4; that is, the basepage is placed at an address evenly divisible by 4. I guess it would not do any good to align to 16 for the 020 and up if the program runs from the auto folder. Oh well.
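A quick way to run this kind of test is to look at the low bits of the address. The following is a minimal sketch in C; the function name `alignment_of` and the sample addresses in the usage note are made up for illustration and are not measured values from the thread.

```c
#include <assert.h>
#include <stdint.h>

/* Return the largest power of two, up to `cap`, that evenly divides addr.
   `cap` must itself be a power of two (hypothetical helper). */
static unsigned alignment_of(uint32_t addr, unsigned cap)
{
    unsigned a = 1;

    assert(cap != 0 && (cap & (cap - 1)) == 0);

    /* Keep doubling while the next larger boundary still divides addr. */
    while (a < cap && (addr & (2u * a - 1u)) == 0)
        a *= 2;
    return a;
}
```

For example, `alignment_of(0x0001A2B4, 16)` reports 4 (divisible by 4 but not by 8), while `alignment_of(0x0001A2C0, 16)` reports 16.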

Postby lp » Sun Jun 27, 2010 3:38 pm

What sparked my interest about alignment was the Nova VDI driver. I noticed the author had gone to the trouble to align certain parts of the code. He used a 16-byte boundary. However, TOS never loaded it on a 16-byte boundary, so the whole scheme didn't really do much of anything but waste bytes. It's very likely he didn't know TOS only manages a 4-byte boundary.

I found this article, which explains alignment and the possible benefits:
http://www.mactech.com/articles/mactech ... index.html

I've written a loader/stub that goes in the \auto folder. It loads the Nova VDI driver at an aligned address, fixes it up itself, and then starts it. Gembench then shows small improvements in some areas, and the overall average is better. It's not dramatic, but it appears to help a bit.
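The loader source is not shown in the thread, so the following is only a sketch of the "fix it up" step, written in C for readability. It assumes the usual TOS relocation stream: a big-endian longword giving the offset of the first longword to patch (0 means none), followed by bytes where 0 ends the list, 1 means "advance 254 bytes without patching", and any other value advances by that many bytes and patches there. All function names are hypothetical. A real loader would also have to parse the program header, set up a basepage, clear the BSS and flush the caches before jumping to the code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Big-endian accessors so the sketch also runs on a little-endian host. */
static uint32_t get_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

static void put_be32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}

/* Round an address up to the next multiple of 16. */
static uint32_t align16(uint32_t addr)
{
    return (addr + 15u) & ~15u;
}

/* Add `base` to every longword named by the relocation stream.
   Returns the number of longwords patched, or -1 if the stream is malformed. */
static long relocate(uint8_t *image, size_t image_len,
                     const uint8_t *reloc, size_t reloc_len, uint32_t base)
{
    uint32_t off;
    size_t i = 4;
    long count = 0;

    if (reloc_len < 4)
        return -1;
    off = get_be32(reloc);
    if (off == 0)
        return 0;                       /* nothing to relocate */

    for (;;) {
        if (image_len < 4 || off > image_len - 4)
            return -1;                  /* fix-up points outside the image */
        put_be32(image + off, get_be32(image + off) + base);
        count++;

        /* Consume step bytes until the next fix-up or the terminator. */
        for (;;) {
            uint8_t step;
            if (i >= reloc_len)
                return -1;              /* missing terminator */
            step = reloc[i++];
            if (step == 0)
                return count;
            if (step == 1) {
                off += 254;             /* long skip, no patch here */
                continue;
            }
            off += step;
            break;
        }
    }
}
```

The idea is that the loader reserves somewhat more memory than the driver needs, passes the start of that block through `align16()`, copies the text and data segments there, and calls `relocate()` with the aligned address as `base`.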

My original proof of concept was written in GFA and it worked, but I've since redone the loader in assembler.

wongck
Ultimate Atarian
Posts: 12774
Joined: Sat May 03, 2008 2:09 pm
Location: Far East

Postby wongck » Mon Jun 28, 2010 11:43 am

lp wrote:What sparked my interest about alignment was the Nova VDI driver. I noticed the author had gone to the trouble to align certain parts of the code. He used a 16-byte boundary. However, TOS never loaded it on a 16-byte boundary so the whole scheme didn't really do much of anything but waste bytes.


Could it be that a 16-bit CPU aligns at this 16-bit boundary?
My Stuff: FB/Falcon CT63 CTPCI ATI RTL8139 USB 512MB 30GB HDD CF HxC_SD/ TT030 68882 4+32MB 520MB Nova/ 520STFM 4MB Tos206 SCSI
Shared SCSI Bus:ScsiLink ethernet, 9GB HDD,SD-reader @ http://phsw.atari.org
My Atari stuff for sale - click here for list

Postby lp » Mon Jun 28, 2010 3:01 pm

wongck wrote:Could it be that a 16-bit CPU aligns at this 16-bit boundary?


No, it's a 16-byte boundary, not a 16-bit boundary. Like I tell Charles all the time, read the article. I realize it's aimed at the Mac, but the CPU-related stuff is relevant. :wink:

Postby wongck » Tue Jun 29, 2010 9:21 am

Will your enhancement run on my TT with the Nova card?

Postby wongck » Tue Jun 29, 2010 9:23 am

lp wrote:
wongck wrote:Could it be that a 16-bit CPU aligns at this 16-bit boundary?


No, it's a 16-byte boundary, not a 16-bit boundary. Like I tell Charles all the time, read the article. I realize it's aimed at the Mac, but the CPU-related stuff is relevant. :wink:


Ha ha.... :lol: I mean both have the number 16 in common.. nvm.

Postby lp » Wed Jun 30, 2010 4:31 pm

wongck wrote:Will your enhancement run on my TT with the Nova card?


It depends on how your driver was built. Every Nova driver is different for each hardware type, but in theory it should work on a TT030. I can upload a binary I guess. Might even release the source code to the loader if I get it cleaned up to my satisfaction :) .

Postby wongck » Thu Jul 01, 2010 11:57 am

Saw your video.
I don't think my Nova can do such colours in resolutions higher than 1024x768. :roll:

Nyh
Atari God
Posts: 1496
Joined: Tue Oct 12, 2004 2:25 pm
Location: Netherlands

Postby Nyh » Thu Jul 01, 2010 1:31 pm

lp wrote:My question, now that I have it in assembler form: would it benefit me to align the subroutines and data structures at all? If so, what boundary is best for 040/060 machines?

For a 040 (and also a 030) I think it is best to align on a 16-byte boundary.

In C it is easy to align data structures: allocate a little extra with malloc(), round the address up just as we do for the 256-byte screen boundary on the ST, and store the result in a pointer to the data structure. The same is true in assembly.
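A minimal sketch of that malloc() trick follows. The names `aligned_malloc` and `aligned_free` are made up for this example and are not part of any library mentioned in the thread. The sketch over-allocates, rounds the address up to the requested boundary, and remembers the original pointer just below the aligned block so it can be freed later.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Allocate `size` bytes whose address is a multiple of `align`
   (a power of two, e.g. 16 for a cache line or 256 for an ST screen). */
static void *aligned_malloc(size_t size, size_t align)
{
    unsigned char *raw;
    uintptr_t addr;

    assert(align != 0 && (align & (align - 1)) == 0);

    /* Room for the payload, worst-case padding, and the saved raw pointer. */
    raw = malloc(size + (align - 1) + sizeof raw);
    if (raw == NULL)
        return NULL;

    /* Skip the slot for the saved pointer, then round up to the boundary. */
    addr = ((uintptr_t)raw + sizeof raw + (align - 1)) & ~(uintptr_t)(align - 1);

    /* Store the pointer malloc() returned directly below the aligned block. */
    memcpy((unsigned char *)addr - sizeof raw, &raw, sizeof raw);
    return (void *)addr;
}

/* Release a block obtained from aligned_malloc(). */
static void aligned_free(void *p)
{
    unsigned char *raw;

    if (p == NULL)
        return;
    memcpy(&raw, (unsigned char *)p - sizeof raw, sizeof raw);
    free(raw);
}
```

Calling `aligned_malloc(32000, 256)` would give a screen-sized buffer on a 256-byte boundary, and `aligned_malloc(n, 16)` a buffer that starts on a 16-byte boundary.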

Doing the same for code is a bit harder. The best way is to use your own loader and make sure the code starts on a 16-byte boundary. I think the easiest way would be to write it in assembly and place the tight loops at the start so you can count your bytes. The hard way would be writing a very smart relocator.

The important question is: do you know which data to align? Do you know where the tight loops are and what data is read and written most frequently in those loops? Sometimes I am very surprised by the outcome of the profiler on a 68000.

From experience I know I can win orders of magnitude by writing smarter code. A factor of two by going from C to assembly is about the best you can get. For the 040 with its 4 KB caches, your data will either fit in the cache or not. The 16-byte boundary in the cache is of little consequence, I think. The only exceptions might be interrupt routines.

Hans Wessels

mikro
Hardware Guru
Posts: 2034
Joined: Sat Sep 10, 2005 11:11 am
Location: Kosice, Slovakia

Postby mikro » Thu Jul 01, 2010 3:17 pm

Nyh wrote:Doing the same for code is a bit harder. The best way is to use your own loader and make sure the code starts on a 16-byte boundary. I think the easiest way would be to write it in assembly and place the tight loops at the start so you can count your bytes. The hard way would be writing a very smart relocator.

Ever heard of align and cnop directives? :)

Postby Nyh » Thu Jul 01, 2010 3:33 pm

mikro wrote:Ever heard of align and cnop directives? :)

Align is a very good idea. Forgot about that... But yes, it makes it very easy to put code and data on a 16-byte boundary relative to the program start.
cnop is new to me. It is not in the Pure C assembler, but I see it is in Devpac.
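The arithmetic behind such directives can be sketched in C. The exact directive syntax differs between assemblers, so the helpers below (`pad_to` and `run_address`, both hypothetical names) only model the effect: padding an offset up to a boundary relative to the program start, and then seeing where that offset ends up once the program is loaded.

```c
#include <assert.h>
#include <stdint.h>

/* Number of padding bytes needed so that `offset` (counted from the start
   of the program) becomes a multiple of `align`, a power of two. */
static uint32_t pad_to(uint32_t offset, uint32_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (0u - offset) & (align - 1u);
}

/* Absolute address of a label once the program is loaded at `base`. */
static uint32_t run_address(uint32_t base, uint32_t offset)
{
    return base + offset;
}
```

With made-up numbers: a loop at offset 0x1E6 needs `pad_to(0x1E6, 16)` = 10 bytes of padding to reach offset 0x1F0. Loaded at a base of 0x1A2C0 it runs at 0x1A4B0, which is 16-byte aligned; loaded at 0x1A2B4 (only 4-byte aligned) it runs at 0x1A4A4, which is not. The alignment inside the file only carries over when the load address itself is a multiple of 16.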

Hans Wessels

shoggoth
Nature
Posts: 973
Joined: Tue Aug 01, 2006 9:21 am
Location: Halmstad, Sweden

Postby shoggoth » Wed Jul 14, 2010 9:22 pm

mikro wrote:
Nyh wrote:Doing the same for code is a bit harder. The best way is to use your own loader and make sure the code starts on a 16-byte boundary. I think the easiest way would be to write it in assembly and place the tight loops at the start so you can count your bytes. The hard way would be writing a very smart relocator.

Ever heard of align and cnop directives? :)


Makes no difference if you're on TOS, because for those directives to help, the program itself needs to be loaded at a multiple of 16 bytes. MiNT does this, TOS does not.

Damn Atari for not taking care of this! (well they couldn't really know that we would still be using this stuff in 2010 could they? :-)
Ain't no space like PeP-space.

earx
Captain Atari
Posts: 353
Joined: Wed Aug 27, 2003 7:09 am

Postby earx » Wed Sep 22, 2010 7:42 am

I think neither CNOP nor ALIGN worked for me. Nice directives, but every time I loaded Mon, it didn't show any alignment at all :/ So the only way is to relocate chunks of code to aligned positions. This introduces problems with the instruction cache. So you now have to use either a load-align-pexec scheme or try the supervisor way: supvis-cacheoff/flush-relocate-user. The latter is pretty tricky: meddling with the cache is a world in itself. But IIRC decent cache control is now available on the CT60 via a special system call?

Postby lp » Fri Sep 24, 2010 4:13 pm

Yes, as an xbios() call if you look in the CT60 documentation. MiNT also provides cache control options via system calls, but of course they are only usable after MiNT loads.

Postby mikro » Wed Sep 29, 2010 12:58 pm

shoggoth wrote:Makes no difference if you're on TOS, because for those directives to help, the program itself needs to be loaded at a multiple of 16 bytes. MiNT does this, TOS does not.

Good point.

