m68k-atari.mint-gcc: stack frame structure

simonsunnyboy
Moderator
Posts: 5101
Joined: Wed Oct 23, 2002 4:36 pm
Location: Friedrichshafen, Germany
Contact:

Postby simonsunnyboy » Sat Apr 19, 2014 3:08 pm

I'm trying to work out how gcc builds its stack frames. To interface with existing assembly code, I need to copy data from the stack into registers.

For starters I compiled the following short function:

Code: Select all

uint16_t myfunc(uint16_t ab,  uint16_t * dest, uint16_t ba)
{
   *dest = (ab | ba);
   
   return 42;
}


This gives the following disassembled code (never mind the addresses):

Code: Select all

00000010 <_myfunc>:
  10:   4e56 0000         linkw %fp,#0
  14:   206e 000c         moveal %fp@(12),%a0         ; my dest pointer
  18:   302e 0012         movew %fp@(18),%d0          ; ab?
  1c:   806e 000a         orw %fp@(10),%d0            ; ba?
  20:   3080              movew %d0,%a0@
  22:   702a              moveq #42,%d0
  24:   4e5e              unlk %fp
  26:   4e75              rts


As far as I know, fp stands for register a6 and serves as the reference to the current stack frame, which is set up by the link instruction.

Can somebody explain in simple words why the offsets do not start at 8 (4 bytes saved frame pointer, 4 bytes return address on the stack), and why the offsets inside the stack frame do not match my data types?

Is there a general rule for deducing my stack frame layout from my function parameter list?
Simon Sunnyboy/Paradize - http://paradize.atari.org/

Stay cool, stay Atari!

1x2600jr, 1x1040STFm, 1x1040STE 4MB+TOS2.06+SatanDisk, 1xF030 14MB+FPU+NetUS-Bee

dml
Fuji Shaped Bastard
Posts: 3474
Joined: Sat Jun 30, 2012 9:33 am

Postby dml » Sat Apr 19, 2014 4:01 pm

Hi, perhaps this will help. I use this pattern to bind asm funcs to GCC.

Be careful how you declare the function on the C side. If you declare a return type, the value must be returned in d0. If you declare a void return, then you should preserve d0.

IIRC not all registers need to be preserved otherwise - something like a2-a6/d2-d7, with the compiler assuming the others are trashed - but I believe d0 is a special case for void return, and it's best not to take shortcuts at the start ;)

The movem size can be changed - just adjust the size of the '.save' part of the stackframe definition. a6 is saved by the link op.

Code: Select all

int my_asm_func(int arg1, int arg2);


Code: Select all

   XDEF      _my_asm_func

*-------------------------------------------------------*
_my_asm_func:
*-------------------------------------------------------*
         rsreset
*-------------------------------------------------------*
.savea6:      rs.l   1
.save:      rs.l   13 ;(d1-d7/a0-a5)
.return:      rs.l   1
*-------------------------------------------------------*
.arg1:         rs.l   1
.arg2:         rs.l   1
*-------------------------------------------------------*
.frame_:      rs.b   0
*-------------------------------------------------------*
   movem.l      d1-d7/a0-a5,-(sp)
   link      a6,#-.frame_
*-------------------------------------------------------*
   move.l      .arg1(a6),d0
   move.l      .arg2(a6),d1
;   do stuff here....
*-------------------------------------------------------*
;   return in d0...
   moveq      #0,d0
   unlk      a6
   movem.l      (sp)+,d1-d7/a0-a5
   rts


[EDIT]

Note this is devpac/vasm syntax - GCC accepts something close but you'll need to check the equivalent of RSRESET/RS.? for defining structure offsets.


simonsunnyboy wrote:Can somebody explain in simple words why the offsets do not start at 8 (4 bytes saved frame pointer, 4 bytes return address on the stack), and why the offsets inside the stack frame do not match my data types?
Is there a general rule for deducing my stack frame layout from my function parameter list?


The rs.? structure I posted above shows what fields are sitting on the stack.

%fp is just a GAS alias for 'frame pointer' which is generally a6, if a frame pointer is generated at all - it doesn't have to be, especially if you tell it not to (-fomit-frame-pointer), in which case it will likely be a7/sp.

The only other thing you need to know is that all fields are longword aligned, regardless of their natural size. If you pass a char, it will occupy 4 bytes. This doesn't mean the caller/callee accesses all 4 bytes when using it, but the fields will be spaced at 4-byte intervals.
Last edited by dml on Sun Apr 20, 2014 9:47 am, edited 1 time in total.
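
As a rough cross-check of this rule, the offsets from the first post can be recomputed by hand. The sketch below is only an illustration in host-side C: param_offset and its arguments are hypothetical names, it handles plain scalar parameters only, and it assumes a frame set up with link and the default 32-bit int (int_size 4).

```c
/*
 * Sketch: byte offset from %fp (a6) of the value of parameter `index`,
 * after `link %fp,#0`. Assumes scalar parameters only (no structs, no
 * 8-byte types) and the big-endian m68k layout.
 *
 * sizes[]  : natural size in bytes of each parameter (1, 2 or 4)
 * int_size : 4 for the default 32-bit int; 2 would model a 16-bit int build
 */
int param_offset(const int *sizes, int index, int int_size)
{
    int offset = 8;  /* 4 bytes saved a6 + 4 bytes return address */
    int slot;
    int i;

    for (i = 0; i < index; i++) {
        /* every earlier parameter fills one slot of at least int_size */
        slot = sizes[i] < int_size ? int_size : sizes[i];
        offset += slot;
    }

    slot = sizes[index] < int_size ? int_size : sizes[index];
    /* big-endian: a narrow value sits at the high-address end of its slot */
    return offset + (slot - sizes[index]);
}
```

For myfunc, the sizes {2, 4, 2} give 10, 12 and 18: ab sits in the low word of the slot at 8, dest at 12, and ba in the low word of the slot at 16, which is what the disassembly reads.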

mfro
Atari Super Hero
Posts: 802
Joined: Thu Aug 02, 2012 10:33 am
Location: SW Germany

Postby mfro » Sat Apr 19, 2014 5:09 pm

Yes, gcc pushes parameters onto the stack with at least int size, regardless of their true size.

Stack format depends on int size, however: if you compile with -mshort, int size is 16 bits and the stack format likewise. If you compile with 32-bit ints, the stack format is 32 bits.

Parameters are pushed in reverse order (right to left), so the first parameter ends up at the lowest address.

There is one exception to the int size rule: if you pass structs by value, fields are pushed with their natural size (shorts = 16 bit).

You can use this feature to trick the compiler into combining -mshort code with "normal" code, i.e. to pass 16-bit ints (-mshort stack format) from 32-bit code if you pack a suitable struct.
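
To illustrate why the struct trick works, here is a small host-side check that 16-bit fields keep their natural size and spacing inside a struct. It is only a sketch: struct poke_args and the helper functions are hypothetical names, and it shows the struct layout only, not how a particular gcc build pushes the struct.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical argument block: two 16-bit values packed back to back. */
struct poke_args {
    int16_t value;
    int16_t index;
};

/* Total size of the block: two 16-bit fields, no int-sized padding. */
size_t poke_args_size(void)
{
    return sizeof(struct poke_args);
}

/* Offset of the second field: directly after the first 16-bit field. */
size_t poke_args_index_offset(void)
{
    return offsetof(struct poke_args, index);
}
```

Passed by value, such a struct carries both fields in 4 bytes, whereas two separate 16-bit parameters would take 8 bytes with 32-bit ints.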

Postby simonsunnyboy » Sun Apr 20, 2014 9:18 am

mfro: So with short ints it uses 16 bits for char, short and int (the (u)int8_t and (u)int16_t types), and for longs ((u)int32_t) it passes 32 bits, the same as for addresses?


dml: I have never heard of or seen those rs opcodes from Devpac. Do they allocate stack references? As for d0, I consider myself warned ;)

Postby dml » Sun Apr 20, 2014 9:36 am

simonsunnyboy wrote:dml: I have never heard of or seen those rs opcodes from Devpac. Do they allocate stack references? As for d0, I consider myself warned ;)


The RS.? directive is for making structures/offsets. It's like DS.? but it counts from zero following the RSRESET, instead of counting from the last program address. I used it here to describe a stack frame, but you'd normally use it to define the asm equivalent of C structs.

Otherwise you're using literal magic numbers all over your asm code, which makes it hard to read and maintain.
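
For readers more at home in C, the RS frame from the earlier post can be mirrored with a struct and offsetof. This is just a host-side sketch to show which numbers the RS counters produce: struct asm_frame and the helper functions are hypothetical names, and each rs.l entry is modelled as one uint32_t.

```c
#include <stddef.h>
#include <stdint.h>

/* Mirror of the RS frame: one uint32_t per rs.l entry. */
struct asm_frame {
    uint32_t saved_a6;   /* .savea6: old a6 pushed by link           */
    uint32_t save[13];   /* .save:   d1-d7/a0-a5 pushed by movem.l   */
    uint32_t ret;        /* .return: return address                  */
    uint32_t arg1;       /* .arg1                                    */
    uint32_t arg2;       /* .arg2                                    */
};

/* Offsets relative to a6, as used in `move.l .arg1(a6),d0`. */
size_t frame_arg1_offset(void) { return offsetof(struct asm_frame, arg1); }
size_t frame_arg2_offset(void) { return offsetof(struct asm_frame, arg2); }

/* Value of the final RS counter (.frame_). */
size_t frame_size(void) { return sizeof(struct asm_frame); }
```

The symbolic names stand for 60(a6) and 64(a6) here; changing the number of saved registers moves both offsets without touching the code that uses them.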

Postby simonsunnyboy » Sun Apr 20, 2014 10:07 am

Got it - in the past I used EQUs for this in my m68k code.
But what directive is used for gcc in its place?

Postby simonsunnyboy » Sun Apr 20, 2014 10:57 am

At least I have now got a simple function to work, without symbolic access to my stack frame:

The prototype is void short_poke(uint16_t value, uint16_t * dest);

The assembly function is:

Code: Select all

.text
.globl _short_poke

_short_poke:
   link a6,#0
   move.l a0,-(sp)
   
   move.l 10(a6),a0       | fetch pointer (offsets match the -mshort layout)
   move.w 8(a6),(a0)      | store the 16-bit value through the pointer
   
   move.l (sp)+,a0
   unlk a6
   rts
.end


Now if I can get the symbolic offsets to work as expected, the rule seems natural: the stack frame offset increases with each parameter passed, and the increment depends on the data type used.

Is there some documentation on which registers are safe to use and which need to be saved on the stack? With PureC/AHCC, d0, d1, d2 and a0, a1 could be used freely, and the others had to be preserved. What must be preserved with gcc, and what may be trashed? Apart from d0, which seems to hold the function return value.

Postby mfro » Sun Apr 20, 2014 11:28 am

You might know that already, but just in case: there are situations where assembler makes sense and there are others where it doesn't. Your example is a perfect one for the latter.

This simple four-liner does exactly the same thing, just more efficiently:

Code: Select all

static inline void short_poke(short value, short *dst)
{
    *dst = value;
}


Compiled with "gcc -O2 -fomit-frame-pointer -mshort", it avoids the (unnecessary) link and unlk instructions and compiles to this:

Code: Select all

00000000 <_short_poke>:
   0:   206f 0006         moveal %sp@(6),%a0
   4:   30af 0004         movew %sp@(4),%a0@
   8:   4e75              rts

It doesn't save any registers (since a0-a1 and d0-d1 are scratch for gcc anyway) and does the job as efficiently as possible.

Even better, it can (and will) be inlined if you allow gcc to do so. That is something you can't do with separately assembled object files.

Postby simonsunnyboy » Sun Apr 20, 2014 11:36 am

This is just an example to learn the interfacing :roll: I have a large library of working routines that need integration with gcc.

Postby mfro » Sun Apr 20, 2014 12:01 pm

simonsunnyboy wrote:But what directive is used for gcc in its place?


".struct" is basically the same thing in gas.

Postby simonsunnyboy » Sun Apr 20, 2014 4:36 pm

mfro wrote:
simonsunnyboy wrote:But what directive is used for gcc in its place?


".struct" is basically the same thing in gas.



I tried something like this (http://linux.web.cern.ch/linux/scientif ... truct.html) this morning. It works now, but it is tricky: .struct switches to the absolute section, so the section must be changed back afterwards, otherwise the .o file contains no usable code.

Postby mfro » Mon Apr 21, 2014 8:47 am

simonsunnyboy wrote:This is just an example to learn the interfacing :roll: I have a large library of working routines that need integration with gcc.


I see and that's perfectly reasonable. I probably wasn't specific enough about what I wanted to tell you.

Since (at least if I remember right) at the very beginning of the discussion you were the one who complained about gcc's inefficient habit of passing parameters on the stack, I wanted to point you to ways of letting gcc optimize that overhead away for fast, efficient code.

An assembler-coded library (at least if it provides relatively small routines where the overhead of parameter passing is significant) is inappropriate for that, IMHO. Gcc can't optimize library calls, since that would have to happen at link time (not possible, at least not in the version we have available for m68k). Later gcc versions support LTO (link time optimization), which will make this easier.

One of gcc's biggest strengths (probably _the_ biggest) over traditional Atari compilers is its ability to inline. With library calls, you give up this strength.
Consider the following code (idea taken from your example):

Code: Select all

void short_poke(short value, short *dst)
{
    *dst = value;
}

void call_poke_repeatedly(short *buffer)
{
    short *a = buffer;
    short *b = buffer + 1;
    short *c = buffer + 2;
    short *d = buffer + 3;

    short_poke(0xffff, a);
    short_poke(0xff00, b);
    short_poke(0x00ff, c);
    short_poke(0xff00, d);
}


If you write short_poke() in a separate compilation unit (no matter whether you've written it in assembly or in C, as a separate object file or as part of a library), call_poke_repeatedly() will compile into something like this:

Code: Select all

000008ae <_call_poke_repeatedly>:
 8ae:   4fef fff0       lea %sp@(-16),%sp
 8b2:   2f6f 0014 000c  movel %sp@(20),%sp@(12)
 8b8:   202f 0014       movel %sp@(20),%d0
 8bc:   5480            addql #2,%d0
 8be:   2f40 0008       movel %d0,%sp@(8)
 8c2:   202f 0014       movel %sp@(20),%d0
 8c6:   5880            addql #4,%d0
 8c8:   2f40 0004       movel %d0,%sp@(4)
 8cc:   202f 0014       movel %sp@(20),%d0
 8d0:   5c80            addql #6,%d0
 8d2:   2e80            movel %d0,%sp@
 8d4:   2f2f 000c       movel %sp@(12),%sp@-
 8d8:   4878 ffff       pea ffffffff <___FUNCTION__.1784+0xfffff067>
 8dc:   4eb9 0000 0000  jsr _short_poke
 8e2:   508f            addql #8,%sp
 8e4:   2f2f 0008       movel %sp@(8),%sp@-
 8e8:   4878 ff00       pea ffffff00 <___FUNCTION__.1784+0xffffef68>
 8ec:   4eb9 0000 0000  jsr _short_poke
 8f2:   508f            addql #8,%sp
 8f4:   2f2f 0004       movel %sp@(4),%sp@-
 8f8:   4878 00ff       pea ff <.LBB3+0x5>
 8fc:   4eb9 0000 0000  jsr _short_poke
 902:   508f            addql #8,%sp
 904:   2f17            movel %sp@,%sp@-
 906:   4878 ff00       pea ffffff00 <___FUNCTION__.1784+0xffffef68>
 90a:   4eb9 0000 0000  jsr _short_poke
 910:   508f            addql #8,%sp
 912:   4fef 0010       lea %sp@(16),%sp
 916:   4e75            rts


which is (obviously) pretty inefficient.

If you show gcc the source of short_poke() (no matter if it's pure C or inline assembly, nor if it's in the same .c/.S file or in a separate header defined as inline function) when compiling call_poke_repeatedly(), things change dramatically, however:

Code: Select all

0000026c <_call_poke_repeatedly>:
 26c:   206f 0004       moveal %sp@(4),%a0
 270:   30bc ffff       movew #-1,%a0@
 274:   317c ff00 0002  movew #-256,%a0@(2)
 27a:   317c 00ff 0004  movew #255,%a0@(4)
 280:   317c ff00 0006  movew #-256,%a0@(6)
 286:   4e75            rts


which is about as efficient as you can get, since gcc can inline the function into the caller, effectively eliminating the function call overhead completely. People tend to think inlining causes code bloat, which is not necessarily the case, as this example shows. Gcc usually does a pretty good job in practice of judging what to inline and what not - provided it has the chance to do so.

Bottom line is: if you want fast, efficient code, make sure you always show "the whole thing" to gcc - at least at times when speed matters - to enable it to do proper inlining.
Last edited by mfro on Mon Apr 21, 2014 9:38 am, edited 1 time in total.
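
A portable variant of the example can be tried on any host compiler. The sketch below assumes uint16_t in place of short so that the constants fit without sign conversion; apart from that it keeps the same two functions in a single translation unit, which is the arrangement that lets the compiler see short_poke() when compiling the caller.

```c
#include <stdint.h>

/* Defined in the same translation unit, so the compiler may inline it. */
static inline void short_poke(uint16_t value, uint16_t *dst)
{
    *dst = value;
}

/* Writes four fixed 16-bit patterns to buffer[0..3]. */
void call_poke_repeatedly(uint16_t *buffer)
{
    short_poke(0xffff, buffer);
    short_poke(0xff00, buffer + 1);
    short_poke(0x00ff, buffer + 2);
    short_poke(0xff00, buffer + 3);
}
```

In a larger project the static inline definition would typically live in a header that every caller includes, so that each compilation unit gets the same chance to inline it.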

Postby dml » Mon Apr 21, 2014 9:07 am

mfro is right of course.

The potential gain from ASM based libraries (or any ASM called from C) is not just based on the speed of the ASM relative to C so much as the amount you can amortize via the ASM. That includes the cost of calling it. If the ASM is small, almost the entire cost will be from calling it and that defeats the point.

So the more ASM you have executing per call, the more you stand to amortize. Making a pile of short duration calls to ASM will always be slower than letting the compiler just make all the code for you (if done with care, and allowed to inline), no matter what the ASM does.
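
The amortization argument can be put into numbers with a tiny helper. This is only a sketch: overhead_percent is a hypothetical name, and any cycle counts fed into it are made-up inputs for illustration, not measurements of real routines.

```c
/*
 * Share of the total time that goes into call overhead, in percent
 * (rounded down). Both cycle counts are hypothetical inputs.
 */
unsigned overhead_percent(unsigned call_cycles, unsigned work_cycles)
{
    unsigned total = call_cycles + work_cycles;

    /* guard against division by zero when both inputs are 0 */
    return total ? (100u * call_cycles) / total : 0u;
}
```

With an assumed 40 cycles of call overhead, a routine doing 10 cycles of work spends 80% of its time on the call, while one doing 3960 cycles of work spends about 1%.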

Inline ASM syntax is a bit better in this respect, since GCC does provide a decent way (difficult to learn and master, but decent) to inline bits of ASM into C while amortizing the overhead of switching between them. But it's unwieldy for anything bigger than small fragments.

Fortunately, big fragments can amortize the cost of calls to an external lib. So by making good choices about what exists as C (complex, fragile logic), what might best be done with inline ASM (simple, frequently used ops which the compiler can't usually beat - e.g. fixedpoint arithmetic), and what can be written as an ASM lib (heavy lifting, long duration work), you can avoid performance problems with nearly all cases.

The other tricks to optimize calls across the boundary will help in the corner cases that don't quite fit any of the above.


It's worth practicing with a bit of each to see what works best.

I'm guessing your ASM lib will end up containing a mix of heavy lifting code, and hardware-level or time-critical code. There will likely also be a mix of stuff which isn't best done as ASM, but is still in there for the sake of completeness and having the lib make sense as a unit. That's usually how things work out.

Postby simonsunnyboy » Mon Apr 21, 2014 6:18 pm

It's mainly existing interrupt routines. I will not rewrite a Protracker replay I don't understand in C just so that gcc can do a pretty good optimizing job.
I wouldn't ask so many gcc questions if I wasn't convinced by now that it is a toolchain worth picking up ;)

Another point is that I simply do not want to rewrite a lot of helpers yet another time when I haven't integrated them into something workable for ages. My last complete Atari project is from 2009, and I have been moving to C since then... and spare time is at a premium. So I want to reuse the work I have already done. Some parts can be rewritten, yes, but the IKBD interrupt routines and the Wizzcat replayer are not suited for this.

Postby dml » Thu Apr 24, 2014 6:29 pm

BTW there are numerous ways to call ASM from C using GCC's features. Here's another variant - using inline ASM block performing a JSR to the ASM code, with C variable inputs and register clobber constraints (lifted from one of my projects as an example, albeit maybe not a clear one :) ).

Nothing gets passed on the stack, but in registers - in this case loaded from the existing stackframe/memory ("m", but other constraints are possible: "g" for any general register, memory or immediate operand, "d" for data registers, "a" for address registers - but being too specific means more chance of causing register starvation, so it is best to leave the compiler room or to assume the data is probably living on the stack).

The callee is responsible for preserving any registers not mentioned in the block. It's still not as optimal as pure ASM because the initial moves might be redundant in some cases if the values are already in registers, but chances are registers are busy and it's a small price anyway if you don't want to redo all the caller code in ASM...

Code: Select all

            __asm__ __volatile__ (
               "                           \
               move.l      %0,a0;               \
               move.l      %1,d6;               \
               move.l      %2,d7;               \
               jsr         _BM_A_RMux1x2;         \
               "
                 :
               : "m"(ch0),
                 "m"(buffer),
                 "m"(microblocksize)
               : "a0",
                 "d6", "d7",
                 "cc"
            );


Once again, this can still be beaten by inline C functions if the compiler manages to produce good enough code. In my case, this function does intensive work in small bursts and the compiled code isn't as good as the ASM, so the arrangement paid off.

More optimal is doing the whole routine as inline ASM, but it is not a comfortable syntax, needing quotes and line continuation '\' symbols etc. everywhere. Handy, effective - but not nice to work with. I use that as a last resort or for very small fragments, but mostly try to keep assembly code in separate sourcefiles, especially if there is a lot of related code.

