This is just an example to learn the interfacing.
I have a large library of working routines that need integration with gcc.
I see, and that's perfectly reasonable. I probably wasn't specific enough about what I wanted to tell you.
If I remember right, at the very beginning of this discussion you were the one who complained about gcc's inefficient habit of passing parameters on the stack. I wanted to give you directions for letting gcc optimize that overhead away and produce fast, efficient code.
An assembler-coded library is, IMHO, the wrong tool for that, at least if it provides relatively small routines where the overhead of parameter passing is significant. Gcc can't optimize across library calls, since that would have to happen at link time, which is not possible (at least not in the version we have available for m68k). Later gcc versions support LTO (link-time optimization), which will make this easier.
One of gcc's greatest strengths (probably _the_ one) over traditional Atari compilers is its ability to inline. With library calls, you give up this strength.
Consider the following code (idea taken from your example):
Code:
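void short_poke(short value, short *dst) { *dst = value; }

void call_poke_repeatedly(short *buffer)
{
    short *a = buffer;
    short *b = buffer + 1;
    short *c = buffer + 2;
    short *d = buffer + 3;
    short_poke(-1, a);    /* the four calls are reconstructed from the listings below */
    short_poke(-256, b);
    short_poke(255, c);
    short_poke(-256, d);
}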
If you put short_poke() in a separate compilation unit (no matter whether it's written in assembly or in C, built as a separate object file or included in a library), call_poke_repeatedly() will compile into something like this:
Code:
8ae: 4fef fff0 lea %sp@(-16),%sp
8b2: 2f6f 0014 000c movel %sp@(20),%sp@(12)
8b8: 202f 0014 movel %sp@(20),%d0
8bc: 5480 addql #2,%d0
8be: 2f40 0008 movel %d0,%sp@(8)
8c2: 202f 0014 movel %sp@(20),%d0
8c6: 5880 addql #4,%d0
8c8: 2f40 0004 movel %d0,%sp@(4)
8cc: 202f 0014 movel %sp@(20),%d0
8d0: 5c80 addql #6,%d0
8d2: 2e80 movel %d0,%sp@
8d4: 2f2f 000c movel %sp@(12),%sp@-
8d8: 4878 ffff pea ffffffff <___FUNCTION__.1784+0xfffff067>
8dc: 4eb9 0000 0000 jsr _short_poke
8e2: 508f addql #8,%sp
8e4: 2f2f 0008 movel %sp@(8),%sp@-
8e8: 4878 ff00 pea ffffff00 <___FUNCTION__.1784+0xffffef68>
8ec: 4eb9 0000 0000 jsr _short_poke
8f2: 508f addql #8,%sp
8f4: 2f2f 0004 movel %sp@(4),%sp@-
8f8: 4878 00ff pea ff <.LBB3+0x5>
8fc: 4eb9 0000 0000 jsr _short_poke
902: 508f addql #8,%sp
904: 2f17 movel %sp@,%sp@-
906: 4878 ff00 pea ffffff00 <___FUNCTION__.1784+0xffffef68>
90a: 4eb9 0000 0000 jsr _short_poke
910: 508f addql #8,%sp
912: 4fef 0010 lea %sp@(16),%sp
916: 4e75 rts
which is (obviously) pretty inefficient: each of the four stores costs two pushes, a jsr, and a stack cleanup.
If you show gcc the source of short_poke() when compiling call_poke_repeatedly() (no matter whether it's pure C or inline assembly, or whether it sits in the same .c/.S file or in a separate header as an inline function), things change dramatically, however:
Code:
26c: 206f 0004 moveal %sp@(4),%a0
270: 30bc ffff movew #-1,%a0@
274: 317c ff00 0002 movew #-256,%a0@(2)
27a: 317c 00ff 0004 movew #255,%a0@(4)
280: 317c ff00 0006 movew #-256,%a0@(6)
286: 4e75 rts
which is about as efficient as you can get: gcc inlines the function at each call site, eliminating the function call overhead completely. People tend to think inlining causes code bloat, which is not necessarily the case, as this example shows (28 bytes here versus 106 bytes for the version with calls). Gcc usually does a pretty good job in practice of judging what to inline and what not, provided it gets the chance to do so.
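One way to "show gcc the source" is a small header with a static inline function. The following is only a sketch: the file names poke.h and caller.c are made up for illustration, and both parts are shown in one piece so the snippet compiles on its own.

```c
/* poke.h (hypothetical header name) */
#ifndef POKE_H
#define POKE_H

/* static inline: the body is visible in every file that includes this
   header, so gcc is free to expand it at the call site when optimizing. */
static inline void short_poke(short value, short *dst)
{
    *dst = value;
}

#endif /* POKE_H */

/* caller.c (hypothetical): would normally start with #include "poke.h" */
void call_poke_repeatedly(short *buffer)
{
    short_poke(-1,   buffer);
    short_poke(-256, buffer + 1);
    short_poke(255,  buffer + 2);
    short_poke(-256, buffer + 3);
}
```

Keep in mind that gcc only inlines ordinary inline functions when optimization is enabled (for example -O2); without optimization you will still see real calls.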
Bottom line is: if you want fast, efficient code, make sure you always show "the whole thing" to gcc - at least at times when speed matters - to enable it to do proper inlining.
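For routines that already exist as hand-written assembly, the same idea could look roughly like the sketch below, using gcc's extended inline assembly inside a static inline wrapper. The name short_poke_asm and the choice of constraints are my assumptions for illustration, not something taken from your library; the plain C branch is only there so the snippet also builds on non-m68k hosts.

```c
/* Hypothetical wrapper around a one-instruction assembly routine. */
static inline void short_poke_asm(short value, short *dst)
{
#if defined(__m68k__)
    /* "=m": *dst is written in memory; "d": value is in a data register.
       gcc picks the registers itself, so nothing goes via the stack. */
    __asm__ ("move.w %1,%0" : "=m" (*dst) : "d" (value));
#else
    /* Fallback so the sketch compiles on other targets. */
    *dst = value;
#endif
}
```

With the "d" constraint gcc has to load the value into a data register first, so the result may be slightly less tight than the pure C version, where a constant can go straight into the movew instruction. That is a trade-off of this particular constraint choice, not of inline assembly as such.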