Cycle counts and phasing: tables and tester

Postby Dio » Wed Mar 25, 2009 9:36 pm

I've been working on an automated tester for checking emulators' 68000 timing very precisely. I've finally solved a few of the odder issues and here's the first set of results. There's two tables here:
- the first is the cycle count for each 68000 instruction when the previous and next instructions don't do anything odd with phasing, and all memory operations are to main RAM (the shifter domain, subject to waitstates).
- The second describes the phasing effects for each instruction. An 'L' indicates that the instruction finishes on the odd phase ('leaves' odd phase); an 'A' indicates the instruction pairs with a preceding instruction if it starts on the odd phase ('accepts' odd phase). The simple rule is that if an A instruction follows an L instruction in your instruction stream you save 4 cycles.
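
The simple rule can be sketched in a few lines of Python. This is only an illustration: `TIMINGS` and `stream_cycles` are made-up names, the three entries are copied from the tables in this post (exg d0,d1; move.w with a -(A) source; nop), and it assumes an instruction's L flag does not change when the instruction itself pairs.

```python
# name -> (cycles from the first table, 'A' flag, 'L' flag from the second table)
TIMINGS = {
    "exg d0,d1": (8, False, True),         # pairing entry "-L"
    "move.w -(a0),d0": (12, True, False),  # pairing entry "A-"
    "nop": (4, False, False),              # pairing entry "--"
}

def stream_cycles(stream):
    """Sum cycle counts, saving 4 whenever an A instruction follows an L one."""
    total = 0
    odd_phase = False  # True when the previous instruction left the odd phase
    for name in stream:
        cycles, accepts, leaves = TIMINGS[name]
        if odd_phase and accepts:
            total -= 4  # the pair saves 4 cycles
        total += cycles
        odd_phase = leaves  # assumption: L applies whether or not we paired
    return total

print(stream_cycles(["exg d0,d1", "move.w -(a0),d0"]))  # L then A: pairs
print(stream_cycles(["move.w -(a0),d0", "exg d0,d1"]))  # A then L: no pairing
```

With these entries the first stream comes to 16 cycles and the reversed one to 20, which is the asymmetry the L and A flags describe.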

Here's the tables of gold results:

Name                : Dn   An  (A)  (A)+ -(A) $(A) I(A)  .W   .L  $(P) I(P)  #
add.w *,d0             4    4    8    8   12   12   16   12   16   12   16    8
add.w d0,*                      12   12   16   16   20   16   20
adda.w *,a1            8    8   12   12   16   16   20   16   20   16   20   12
add.w #1,*             8        16   16   20   20   24   20   24
addq.w #1,*            4    8   12   12   16   16   20   16   20
addx.w *,*             4                  20
and.w *,d0             4         8    8   12   12   16   12   16   12   16    8
and.w d0,*                      12   12   16   16   20   16   20
and.w #1,*             8        16   16   20   20   24   20   24
asl.w #7,d0           20
asl.w d0,d0           16
asr.w #7,d0           20
asr.w d0,d0           16
clr.w *                4        12   12   16   16   20   16   20
cmp.w *,d0             4    4    8    8   12   12   16   12   16   12   16    8
cmpa.w *,a1            8    8   12   12   16   16   20   16   20   16   20   12
cmp.w #1,*             8        12   12   16   16   20   16   20
eor.w d0,*             4        12   12   16   16   20   16   20
eor.w #1,*             8        16   16   20   20   24   20   24
lsl.w #7,d0           20
lsl.w d0,d0           16
lsr.w #7,d0           20
lsr.w d0,d0           16
move.w *,d0            4    4    8    8   12   12   16   12   16   12   16    8
move.w *,a1            4    4    8    8   12   12   16   12   16   12   16    8
move.w *,(a1)          8    8   12   12   16   16   20   16   20   16   20   12
move.w *,(a1)+         8    8   12   12   16   16   20   16   20   16   20   12
move.w *,-(a1)         8    8   12   12   16   16   20   16   20   16   20   12
move.w *,24(a1)       12   12   16   16   20   20   24   20   24   20   24   16
move.w *,20(a1,d0.w)  16   16   20   20   24   24   28   24   28   24   28   20
move.w *,$200.w       12   12   16   16   20   20   24   20   24   20   24   16
move.w *,scratchpad   16   16   20   20   24   24   28   24   28   24   28   20
movem.w *,d0-d3                 28   28        32   36   32   36
movem.w d0-d3,*                 24        24   28   32   28   32
movep.w d0,4(a1)      16
movep.w 4(a0),d0      16
neg.w *                4        12   12   16   16   20   16   20
negx.w *               4        12   12   16   16   20   16   20
not.w *                4        12   12   16   16   20   16   20
or.w *,d0              4         8    8   12   12   16   12   16   12   16    8
or.w d0,*                       12   12   16   16   20   16   20
or.w #1,*              8        16   16   20   20   24   20   24
rol.w #7,d0           20
rol.w d0,d0           16
ror.w #7,d0           20
ror.w d0,d0           16
roxl.w #7,d0          20
roxl.w d0,d0          16
roxr.w #7,d0          20
roxr.w d0,d0          16
sub.w *,d0             4    4    8    8   12   12   16   12   16   12   16    8
sub.w d0,*                      12   12   16   16   20   16   20
suba.w *,a1            8    8   12   12   16   16   20   16   20   16   20   12
sub.w #1,*             8        16   16   20   20   24   20   24
subq.w #1,*            4    8   12   12   16   16   20   16   20
subx.w *,*             4                  20
tst.w *                4         8    8   12   12   16   12   16
add.l *,d0             8    8   16   16   20   20   24   20   24   20   24   16
add.l d0,*                      20   20   24   24   28   24   28
adda.l *,a1            8    8   16   16   20   20   24   20   24   20   24   16
add.l #1,*            16        28   28   32   32   36   32   36
addq.l #1,*            8    8   20   20   24   24   28   24   28
addx.l *,*             8                  32
and.l *,d0             8        16   16   20   20   24   20   24   20   24   16
and.l d0,*                      20   20   24   24   28   24   28
and.l #1,*            16        28   28   32   32   36   32   36
asl.l #7,d0           24
asl.l d0,d0           16
asr.l #7,d0           24
asr.l d0,d0           16
clr.l *                8        20   20   24   24   28   24   28
cmp.l *,d0             8    8   16   16   20   20   24   20   24   20   24   16
cmpa.l *,a1            8    8   16   16   20   20   24   20   24   20   24   16
cmp.l #1,*            16        20   20   24   24   28   24   28
eor.l d0,*             8        20   20   24   24   28   24   28
eor.l #1,*            16        28   28   32   32   36   32   36
lsl.l #7,d0           24
lsl.l d0,d0           16
lsr.l #7,d0           24
lsr.l d0,d0           16
move.l *,d0            4    4   12   12   16   16   20   16   20   16   20   12
move.l *,a1            4    4   12   12   16   16   20   16   20   16   20   12
move.l *,(a1)         12   12   20   20   24   24   28   24   28   24   28   20
move.l *,(a1)+        12   12   20   20   24   24   28   24   28   24   28   20
move.l *,-(a1)        12   12   20   20   24   24   28   24   28   24   28   20
move.l *,24(a1)       16   16   24   24   28   28   32   28   32   28   32   24
move.l *,20(a1,d0.l)  20   20   28   28   32   32   36   32   36   32   36   28
move.l *,$200.l       20   20   28   28   32   32   36   32   36   32   36   28
move.l *,scratchpad   20   20   28   28   32   32   36   32   36   32   36   28
movem.l *,d0-d3                 44   44        48   52   48   52
movem.l d0-d3,*                 40        40   44   48   44   48
movep.l d0,4(a1)      24
movep.l 4(a0),d0      24
neg.l *                8        20   20   24   24   28   24   28
negx.l *               8        20   20   24   24   28   24   28
not.l *                8        20   20   24   24   28   24   28
or.l *,d0              8        16   16   20   20   24   20   24   20   24   16
or.l d0,*                       20   20   24   24   28   24   28
or.l #1,*             16        28   28   32   32   36   32   36
rol.l #7,d0           24
rol.l d0,d0           16
ror.l #7,d0           24
ror.l d0,d0           16
roxl.l #7,d0          24
roxl.l d0,d0          16
roxr.l #7,d0          24
roxr.l d0,d0          16
sub.l *,d0             8    8   16   16   20   20   24   20   24   20   24   16
sub.l d0,*                      20   20   24   24   28   24   28
suba.l *,a1            8    8   16   16   20   20   24   20   24   20   24   16
sub.l #1,*            16        28   28   32   32   36   32   36
subq.l #1,*            8    8   20   20   24   24   28   24   28
subx.l *,*             8                  32
tst.l *                4        12   12   16   16   20   16   20
asl.w #1,*             8        12   12   16   16   20   16   20
asr.w #1,*             8        12   12   16   16   20   16   20
lsl.w #1,*             8        12   12   16   16   20   16   20
lsr.w #1,*             8        12   12   16   16   20   16   20
rol.w #1,*             8        12   12   16   16   20   16   20
ror.w #1,*             8        12   12   16   16   20   16   20
roxl.w #1,*            8        12   12   16   16   20   16   20
roxr.w #1,*            8        12   12   16   16   20   16   20
bchg #1,*             12        16   16   20   20   24   20   24
bchg d0,*              8        12   12   16   16   20   16   20
bset #1,*             12        16   16   20   20   24   20   24
bset d0,*              8        12   12   16   16   20   16   20
bclr #1,*             12        16   16   20   20   24   20   24
bclr d0,*              8        12   12   16   16   20   16   20
btst #1,*             12        12   12   16   16   20   16   20
btst d0,*              8         8    8   12   12   16   12   16
pea *                           12             16   24   16   20   16   24
lea *,a1                         4              8   16    8   12    8   16
link a0,#4            16
unlk a0               12
mulu *,d0             40        64   64   48   68   76   72   72   48   52   44
muls *,d0             44        52   52   48   68   64   60   60   48   52   48
abcd *,*               8                  20
nbcd *                 8        12   12   16   16   20   16   20
sbcd *,*               8                  20
st *                   8        12   12   16   16   20   16   20
tas *                  4        16   16   20   20   24   20   24
moveq #0,d0            4
exg d0,d1              8
ext.w d0               4
ext.l d0               4
swap d0                4
nop                    4
move.w sr,*            8        12   12   16   16   20   16   20
move.w *,ccr          12        16   16   20   20   24   20   24   20   24   16
move.w #$2200,sr      16
andi #$2700,sr        20
ori #$2200,sr         20
eori #$0100,sr        20

unlk a0               12
bra.l/.s              12   12
bcc.l/.s taken        12   12
bcs.l/.s not          12    8
bsr.l/.s              20   20
dbcc cc true,nz,z     12   12   16
jmp                              8             12   16   12   12   12
jsr                             16             20   24   20   20   20
rts/rte               16   20
trap/illegal          36   36

INSTRUCTION PAIRING:
add.w *,d0            --   --   --   --   A-   --   A-   --   --   --   A-   --
add.w d0,*                      --   --   A-   --   A-   --   --
adda.w *,a1           --   --   --   --   A-   --   A-   --   --   --   A-   --
add.w #1,*            --        --   --   --   --   --   --   --
addq.w #1,*           --   --   --   --   A-   --   A-   --   --
addx.w *,*            --                  A-
and.w *,d0            --        --   --   A-   --   A-   --   --   --   A-   --
and.w d0,*                      --   --   A-   --   A-   --   --
and.w #1,*            --        --   --   --   --   --   --   --
asl.w #7,d0           --
asl.w d0,d0           -L
asr.w #7,d0           --
asr.w d0,d0           -L
clr.w *               --        --   --   A-   --   A-   --   --
cmp.w *,d0            --   --   --   --   A-   --   A-   --   --   --   A-   --
cmpa.w *,a1           -L   -L   -L   -L   AL   -L   AL   -L   -L   -L   AL   -L
cmp.w #1,*            --        --   --   --   --   --   --   --
eor.w d0,*            --        --   --   A-   --   A-   --   --
eor.w #1,*            --        --   --   --   --   --   --   --
lsl.w #7,d0           --
lsl.w d0,d0           -L
lsr.w #7,d0           --
lsr.w d0,d0           -L
move.w *,d0           --   --   --   --   A-   --   A-   --   --   --   A-   --
move.w *,a1           --   --   --   --   A-   --   A-   --   --   --   A-   --
move.w *,(a1)         --   --   --   --   A-   --   A-   --   --   --   A-   --
move.w *,(a1)+        --   --   --   --   A-   --   A-   --   --   --   A-   --
move.w *,-(a1)        --   --   --   --   A-   --   A-   --   --   --   A-   --
move.w *,24(a1)       --   --   --   --   A-   --   A-   --   --   --   A-   --
move.w *,20(a1,d0.w)  A-   A-   --   --   A-   --   A-   --   --   --   A-   --
move.w *,$200.w       --   --   --   --   A-   --   A-   --   --   --   A-   --
move.w *,scratchpad   --   --   --   --   A-   --   A-   --   --   --   A-   --
movem.w *,d0-d3                 --   --        --   --   --   --
movem.w d0-d3,*                 --        --   --   --   --   --
movep.w d0,4(a1)      --
movep.w 4(a0),d0      --
neg.w *               --        --   --   A-   --   A-   --   --
negx.w *              --        --   --   A-   --   A-   --   --
not.w *               --        --   --   A-   --   A-   --   --
or.w *,d0             --        --   --   A-   --   A-   --   --   --   A-   --
or.w d0,*                       --   --   A-   --   A-   --   --
or.w #1,*             --        --   --   --   --   --   --   --
rol.w #7,d0           --
rol.w d0,d0           -L
ror.w #7,d0           --
ror.w d0,d0           -L
roxl.w #7,d0          --
roxl.w d0,d0          -L
roxr.w #7,d0          --
roxr.w d0,d0          -L
sub.w *,d0            --   --   --   --   A-   --   A-   --   --   --   A-   --
sub.w d0,*                      --   --   A-   --   A-   --   --
suba.w *,a1           --   --   --   --   A-   --   A-   --   --   --   A-   --
sub.w #1,*            --        --   --   --   --   --   --   --
subq.w #1,*           --   --   --   --   A-   --   A-   --   --
subx.w *,*            --                  A-
tst.w *               --        --   --   A-   --   A-   --   --
add.l *,d0            --   --   -L   -L   AL   -L   AL   -L   -L   -L   AL   --
add.l d0,*                      --   --   A-   --   A-   --   --
adda.l *,a1           --   --   -L   -L   AL   -L   AL   -L   -L   -L   AL   --
add.l #1,*            --        --   --   --   --   --   --   --
addq.l #1,*           --   --   --   --   A-   --   A-   --   --
addx.l *,*            --                  A-
and.l *,d0            --        -L   -L   AL   -L   AL   -L   -L   -L   AL   --
and.l d0,*                      --   --   A-   --   A-   --   --
and.l #1,*            --        --   --   --   --   --   --   --
asl.l #7,d0           -L
asl.l d0,d0           --
asr.l #7,d0           -L
asr.l d0,d0           --
clr.l *               -L        --   --   A-   --   A-   --   --
cmp.l *,d0            -L   -L   -L   -L   AL   -L   AL   -L   -L   -L   AL   -L
cmpa.l *,a1           -L   -L   -L   -L   AL   -L   AL   -L   -L   -L   AL   -L
cmp.l #1,*            -L        --   --   --   --   --   --   --
eor.l d0,*            --        --   --   A-   --   A-   --   --
eor.l #1,*            --        --   --   --   --   --   --   --
lsl.l #7,d0           -L
lsl.l d0,d0           --
lsr.l #7,d0           -L
lsr.l d0,d0           --
move.l *,d0           --   --   --   --   A-   --   A-   --   --   --   A-   --
move.l *,a1           --   --   --   --   A-   --   A-   --   --   --   A-   --
move.l *,(a1)         --   --   --   --   A-   --   A-   --   --   --   A-   --
move.l *,(a1)+        --   --   --   --   A-   --   A-   --   --   --   A-   --
move.l *,-(a1)        --   --   --   --   A-   --   A-   --   --   --   A-   --
move.l *,24(a1)       --   --   --   --   A-   --   A-   --   --   --   A-   --
move.l *,20(a1,d0.l)  A-   A-   --   --   A-   --   A-   --   --   --   A-   --
move.l *,$200.l       --   --   --   --   A-   --   A-   --   --   --   A-   --
move.l *,scratchpad   --   --   --   --   A-   --   A-   --   --   --   A-   --
movem.l *,d0-d3                 --   --        --   --   --   --
movem.l d0-d3,*                 --        --   --   --   --   --
movep.l d0,4(a1)      --
movep.l 4(a0),d0      --
neg.l *               -L        --   --   A-   --   A-   --   --
negx.l *              -L        --   --   A-   --   A-   --   --
not.l *               -L        --   --   A-   --   A-   --   --
or.l *,d0             --        -L   -L   AL   -L   AL   -L   -L   -L   AL   --
or.l d0,*                       --   --   A-   --   A-   --   --
or.l #1,*             --        --   --   --   --   --   --   --
rol.l #7,d0           -L
rol.l d0,d0           --
ror.l #7,d0           -L
ror.l d0,d0           --
roxl.l #7,d0          -L
roxl.l d0,d0          --
roxr.l #7,d0          -L
roxr.l d0,d0          --
sub.l *,d0            --   --   -L   -L   AL   -L   AL   -L   -L   -L   AL   --
sub.l d0,*                      --   --   A-   --   A-   --   --
suba.l *,a1           --   --   -L   -L   AL   -L   AL   -L   -L   -L   AL   --
sub.l #1,*            --        --   --   --   --   --   --   --
subq.l #1,*           --   --   --   --   A-   --   A-   --   --
subx.l *,*            --                  A-
tst.l *               --        --   --   A-   --   A-   --   --
asl.w #1,*            --        --   --   A-   --   A-   --   --
asr.w #1,*            --        --   --   A-   --   A-   --   --
lsl.w #1,*            --        --   --   A-   --   A-   --   --
lsr.w #1,*            --        --   --   A-   --   A-   --   --
rol.w #1,*            --        --   --   A-   --   A-   --   --
ror.w #1,*            --        --   --   A-   --   A-   --   --
roxl.w #1,*           --        --   --   A-   --   A-   --   --
roxr.w #1,*           --        --   --   A-   --   A-   --   --
bchg #1,*             -L        --   --   --   --   --   --   --
bchg d0,*             -L        --   --   A-   --   A-   --   --
bset #1,*             -L        --   --   --   --   --   --   --
bset d0,*             -L        --   --   A-   --   A-   --   --
bclr #1,*             --        --   --   --   --   --   --   --
bclr d0,*             --        --   --   A-   --   A-   --   --
btst #1,*             -L        --   --   --   --   --   --   --
btst d0,*             -L        --   --   A-   --   A-   --   --
pea *                           --             --   A-   --   --   --   A-
lea *,a1                        --             --   A-   --   --   --   A-
link a0,#4            --
unlk a0               --
mulu *,d0             --        -L   -L   AL   --   A-   --   -L   -L   AL   --
muls *,d0             -L        -L   -L   AL   -L   AL   -L   -L   -L   AL   -L
abcd *,*              -L                  A-
nbcd *                -L        --   --   A-   --   A-   --   --
sbcd *,*              -L                  A-
st *                  -L        --   --   A-   --   A-   --   --
tas *                 --        --   --   A-   --   A-   --   --
moveq #0,d0           --
exg d0,d1             -L
ext.w d0              --
ext.l d0              --
swap d0               --
nop                   --
move.w sr,*           -L        --   --   A-   --   A-   --   --
move.w *,ccr          --        --   --   A-   --   A-   --   --   --   A-   --
move.w #$2200,sr      --
andi #$2700,sr        --
ori #$2200,sr         --
eori #$0100,sr        --


These tables were generated on an STE. The first table has been cross-checked on an STFM and found to be the same, despite slightly different hardware. (The $FF8209 register seems to be in the CPU domain on STFM. On the STE it appears to be in the shifter domain, so subject to phase waitstates, and in addition if you read it on the odd phase you see random screwy results: I'm not sure if these are extra waitstates or a bad value, perhaps due to the register being read and written simultaneously). I haven't checked the second table on an FM yet.

STeem is very accurate on the base cycle counts, with only a very few small differences (negx and tas to mem). It is less accurate on pairing. It gets most of the major cases right but there are plenty of instructions with small differences.

The tester automatically marks any instructions that don't match the gold results from a real machine. I'm expanding it to check timing and exact stack contents of bus and address errors, after which I'll be releasing it for people to check their emulators with.
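
As a rough idea of what "marking mismatches" amounts to, here is a small Python sketch. It is not the tester's code: `GOLD`, `mismatches` and the `emulator` numbers are hypothetical, and only the three gold values are taken from the first table above.

```python
# (table row, addressing-mode column) -> cycles measured on a real machine
GOLD = {
    ("add.w *,d0", "Dn"): 4,
    ("negx.w *", "(A)"): 12,
    ("tas *", "(A)"): 16,
}

def mismatches(gold, measured):
    """Return the entries where the measured count differs from the gold one."""
    return sorted(key for key, cycles in gold.items()
                  if measured.get(key) != cycles)

# Hypothetical emulator results, invented for the example.
emulator = {
    ("add.w *,d0", "Dn"): 4,
    ("negx.w *", "(A)"): 12,
    ("tas *", "(A)"): 20,
}

for row, column in mismatches(GOLD, emulator):
    print(f"{row} [{column}]: expected {GOLD[(row, column)]}, "
          f"got {emulator[(row, column)]}")
```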

If anyone spots anything grossly odd with these, please let me know - there could still be bugs here I might need to track down.
Last edited by Dio on Thu Mar 26, 2009 9:02 am, edited 2 times in total.

Postby npomarede » Wed Mar 25, 2009 10:24 pm

Dio wrote:I've been working on an automated tester for checking emulators' 68000 timing very precisely. I've finally solved a few of the odder issues and here's the first set of results. There's two tables here:
- the first is the cycle count for each 68000 instruction when the previous and next instructions don't do anything odd with phasing.
- The second describes the phasing effects for each instruction. An 'L' indicates that the instruction finishes on the odd phase ('leaves' odd phase); an 'A' indicates the instruction pairs with a following instruction if it starts on the odd phase ('accepts' odd phase). The simple rule is that if your instruction stream follows an L instruction with an A instruction you save 4 cycles.


Nice table, interesting work.
But I'm not sure about the "English" way you explain the L and A behaviour. From what I read in your text, it seems to be the opposite of the example you give.
If I understand your table, you mean that LA is pairing (saves 4 cycles) but AL is not pairing? (For example exg+move could pair, but not move+exg.)

In that case, you should replace :
an 'A' indicates the instruction pairs with a following instruction

with
an 'A' indicates the instruction pairs with a preceding instruction


Also, from my own experience with developing Hatari, a few important ones are missing :
exg + dbcc
cmp + bcc
mul + div (but not div + mul)
... (see Hatari's source if you like :) )

The ones involving "branch" are harder to count using ff8209, but it's feasible with a little more code.

Don't hesitate to run your test code against Hatari, I would be interested to know where some cycles counting could be wrong.

Also, I think the official Motorola tables are important and should be used too. The ST often rounds cycles up to the next multiple of 4.
So, another necessary condition for pairing to work is that both instructions should take 4n+2 cycles in the Motorola doc.
If one of the instructions takes 4n+4 cycles, there won't be pairing.
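
A small Python sketch of that rounding and of the 4n+2 condition. The function names are made up for the example, the 6 and 10 are the Motorola figures for exg and move.w -(An),Dn, and the check is only the necessary condition described above, not a complete pairing test.

```python
def st_cycles(motorola_cycles):
    """Round a documented cycle count up to the next multiple of 4."""
    return (motorola_cycles + 3) // 4 * 4

def may_pair(first_motorola, second_motorola):
    """Necessary (not sufficient) condition: both counts are of the form 4n+2."""
    return first_motorola % 4 == 2 and second_motorola % 4 == 2

EXG = 6        # exg d0,d1 in the Motorola tables
MOVE_PREDEC = 10  # move.w -(a0),d0 in the Motorola tables
NOP = 4

print(st_cycles(EXG) + st_cycles(MOVE_PREDEC))  # each rounded on its own
print(EXG + MOVE_PREDEC)                        # total when the two pair
print(may_pair(EXG, MOVE_PREDEC), may_pair(EXG, NOP))
```

Rounded separately the two instructions cost 8 + 12 = 20 cycles, while the unrounded sum is 16, so a successful pairing accounts for the 4 cycles saved.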


Nicolas

Postby Dio » Wed Mar 25, 2009 10:41 pm

There's a few that are different in the Motorola docs - the bit test instructions in particular don't appear to match (12 for set and chg and 14 for clr looks like 10 and 12 to me). With my experience of technical docs, I think I trust the tester :D .

I do plan to test the branch cases, as they are all 10-cycle in the docs, but I need to hand-write tests for those: I've done it for the timing, but not for pairing yet. I also need a larger variety of mul and div operands.

I'm hoping to get an executable anyone can run out in the next few days. In the long run I'll be opening the source so people can do what they want with it.

Good point on confusing myself with the description of A. I've edited the notes accordingly.

Postby npomarede » Wed Mar 25, 2009 11:03 pm

Dio wrote:There's a few that are different in the motorola docs - the bit test instructions in particular don't appear to match (12 for set and chg and 14 for clr looks like 10 and 12 to me). With my experience of technical docs, I think I trust the tester :D .

I do plan to test the branch cases, as they are all 10-cycle in the docs, but I need to hand-write tests for those: I've done it for the timing, but not for pairing yet. I also need a larger variety of mul and div operands.

I'm hoping to get an executable anyone can run out in the next few days. In the long run I'll be opening the source so people can do what they want with it.

Nice, that will be interesting to test.

Good point on confusing myself with the description of A. I've edited the notes accordingly.

I think the sentence
...your instruction stream follows an L instruction with an A instruction...

is also misleading. Perhaps this sounds better
...in your instruction stream, if an A instruction follows an L instruction...



Nicolas

Postby Dio » Thu Mar 26, 2009 9:03 am

Makes sense :) .

Postby ppera » Thu Mar 26, 2009 1:57 pm

Very useful work. I'm waiting for source :D

Postby Dio » Thu Mar 26, 2009 4:18 pm

Cheers. The methodology's fairly simple: wait to the start of the displayed screen, and install the routine to be tested as the HBL handler. The HBL completes just about when the MMU starts running for the line, so a couple of nops, read FF8209, the routine under test, another nop (to sort out the phase issues), read FF8209 again and repeat until we run out of HBLs for the frame. At the end, postprocess them subtracting the nop and the read-to-read time.
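
The post-processing step might look something like this in Python. This is a sketch, not the tester's code: `measured_cycles` and `CYCLES_PER_BYTE` are made-up names, the overhead value in the example is invented, and the conversion assumes the video address counter advances by 2 bytes every 4 CPU cycles while the line is being displayed.

```python
CYCLES_PER_BYTE = 2  # assumption: one 2-byte word fetched every 4 CPU cycles

def measured_cycles(first_read, second_read, overhead_cycles):
    """Turn two reads of the counter's low byte into a cycle count.

    overhead_cycles is the nop plus the read-to-read time, measured
    separately by timing an empty routine.
    """
    delta_bytes = (second_read - first_read) & 0xFF  # low byte can wrap
    return delta_bytes * CYCLES_PER_BYTE - overhead_cycles

# Hypothetical values: 16 cycles of overhead, counter moved on by 14 bytes.
print(measured_cycles(0x10, 0x1E, 16))
print(measured_cycles(0xFC, 0x0A, 16))  # same distance across a wrap
```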

The best thing is how easy I'm finding it to leverage it for other uses. The bus and address error stacking is something I've been meaning to do for ages, but the realisation I can time it as well is the icing on the cake. With that I can really probe the microcode and find out exactly what reads and writes are going out on what cycles.

I've got an instruction exerciser as well which I need to clean up and get out there as well. I'm astonished nobody's done this sort of thing before (or if they have, they've sat on it). But various types of CPU verifiers seem to be relatively thin on the ground on the 68k, despite it being probably the single most emulated CPU - I had much more luck with the Z80, and there's a 6502 one out there as well I know.

Postby npomarede » Thu Mar 26, 2009 11:07 pm

Dio wrote:I've got an instruction exerciser as well which I need to clean up and get out there as well. I'm astonished nobody's done this sort of thing before (or if they have, they've sat on it).


Regarding the methodology, many people used it some years ago; it's just that, as you say, no one released their results.
In my case, I wrote such an automatic cycle counter for the 68000 using ff8209 approximately 20 years ago (using profimat as an assembler, which was a real pain compared to devpac later :) ). I also saw some code from Nick/TCB doing the same posted somewhere on the forum, and a table from equinox with all the 68000 instructions as measured on a real ST.

As for there not being many "68000 validators", I think it may be because there are already some very good open source 68000 emulators (UAE/WinUAE and Mame for example) that are really accurate, so I guess no one feels like writing their own 68000 emulator, hence the lack of need to test all opcodes (the fact that the UAE cpu core runs nearly all code from the Amiga or Atari is a kind of validation in its own way).

Speaking about WinUAE, its latest cpu core tries to support the correct read/write cycles in each instruction. With such a core, it would then be theoretically possible to write an "ST" 68000 that would do "automatic" instruction pairing.

But programs like yours are still useful, especially to report less accurate behaviour for some special cases (like exception processing, stacking, ...)


Nicolas

Postby Dio » Fri Mar 27, 2009 9:53 pm

I was referring more to automated emulator testers rather than real back-in-the-day stuff. The Z80 thing I was referring to was a program called 'zexall' which would automatically exercise every instruction and check that everything including undocumented flags was on the button. Once I ran that and fixed all the issues, Emu's Spectrum emulation went from a bit ropy to just about perfect. There's a similar thing for the 6502 hosted on C64 as well. But nothing for the 68k that I could find, despite the plethora of 68k emulator cores.

Mawashi is pretty good, but no prefetch makes it of very limited use for the ST. For a long time Emu could run with that or Starscream, but I had to replace with my own to get decent compatibility. Not particularly confident about its timing, either.

What do you mean by automatic instruction pairing? Just timing the reads, writes and prefetches at the subinstruction level? That's how I do it on Emu and it works fine, and as you say the pairing 'just works' assuming I have the phases right (which is what this tester is meant to check :) ). Don't you have to do that to get things like Spectrum 512 to work? I can't think what an easier alternative might be - are people building big tables for this stuff?

Postby npomarede » Fri Mar 27, 2009 10:12 pm

Dio wrote:What do you mean by automatic instruction pairing? Just timing the reads, writes and prefetches at the subinstruction level? That's how I do it on Emu and it works fine. Don't you have to do that to get things like Spectrum 512 to work? I can't think what an easier alternative might be - are people building big tables for this stuff?


Yes, Hatari is running without complete subinstruction emulation of the read/write cycles. Although this would be better for accuracy, it turns out that in fact you just need correct behaviour for the 'move' instruction (and a few other ones) and nearly all fullscreen/spectrum effects will work.
At least this is how it's done in Hatari and so far it works with all the cases I encountered.

As for pairing, I'm also using a handmade table to specify the possible opcode families that could pair. This is not an exhaustive list, just the list required to make all demos/games work, and it's very easy to add new cases.
So for Hatari, we're using tables, and in fact they're not that big, 50 entries or so is enough.
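
To give an idea of the table approach, here is a minimal Python sketch. It is not Hatari's actual table or code: `PAIRING_FAMILIES` and `families_pair` are hypothetical names, and the three entries are just the pairs mentioned earlier in this thread.

```python
# (previous opcode family, current opcode family) combinations that pair.
# Order matters: mul followed by div pairs, div followed by mul does not.
PAIRING_FAMILIES = {
    ("exg", "dbcc"),
    ("cmp", "bcc"),
    ("mul", "div"),
}

def families_pair(previous_family, current_family):
    """Return True if this ordered pair of opcode families is in the table."""
    return (previous_family, current_family) in PAIRING_FAMILIES

print(families_pair("mul", "div"))
print(families_pair("div", "mul"))
```

Adding a new case is then a one-line change to the table, at the price of only covering the combinations someone has listed by hand.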

Adding read/write at the opcode level could be nice, as it would make a lot of things automatic, but it would also come at the cost of a little slowdown in emulation. Also Hatari is aiming at 68030+ emulation (falcon) too, so we need to have a cpu core that fits all cases and stays maintainable and not too 68000-specific.

Nicolas

Postby Dio » Fri Mar 27, 2009 10:59 pm

How accurate's your timing for the 030? I remember back when I was coding on the Falcon Motorola pretty much flat refused to provide any timing information for the thing. If I ever get as far as Falcon support for Emu I'd probably use a completely different core for the 030, the prefetching, caching, bursting and MMU are maybe worth a different approach.

That's a thought: does Starball (game I coded) work in Falcon mode on it?

Postby Frank B » Sat Mar 28, 2009 9:41 am

Dio wrote:How accurate's your timing for the 030? I remember back when I was coding on the Falcon Motorola pretty much flat refused to provide any timing information for the thing. If I ever get as far as Falcon support for Emu I'd probably use a completely different core for the 030, the prefetching, caching, bursting and MMU are maybe worth a different approach.

That's a thought: does Starball (game I coded) work in Falcon mode on it?


The timing for the 68030 is documented fully in the Motorola 68030 user's manual and not the programmer's reference manual. Not exactly obvious! The manual is available on the freescale site. Hope this helps.

Frank

Dio
Captain Atari
Posts: 451
Joined: Thu Feb 28, 2008 3:51 pm

Re: Cycle counts and phasing: tables and tester

Postby Dio » Sat Mar 28, 2009 2:25 pm

Yeah, after I wrote that last night I dug it out and had a read. I'm sure that wasn't published in the older days, though. But the reason given was that it was too complicated to conveniently describe, and after having read the UM they've got a point. (More to the point may be that the per-instruction cycle counts don't look to compare favourably with the 386...)

Dio
Captain Atari
Posts: 451
Joined: Thu Feb 28, 2008 3:51 pm

Re: Cycle counts and phasing: tables and tester

Postby Dio » Mon Mar 30, 2009 7:30 pm

The BAE tester is starting to produce useful results at last, for address operands at least. Here's the results from some address errors on an STE:

Code: Select all

Name                : Dn   An  (A)  (A)+ -(A) $(A) I(A)  .W   .L  $(P) I(P)  #
move.w *,d0            -    -   56   56   60   60   64    -    -    -    -    -
                               1/2  1/2 -1/4  3/2  7/2                         
                               RI5  RI5  RI5  RI5  RI5                         
                              3010 3018 3020 3028 3030                         
movem.w *,d0-d3                 60   60        64   68    -    -    -    -
                               1/6  1/6       3/8  7/4                   
                               RI5  RI5       RI5  RI5                         
                              4C90 4C98      4CA8 4CB0                   
movem.w d0-d3,*                 60        60   64   68    -    -
                               1/6      -1/6  3/8  7/8         
                               WI5       WI5  WI5  WI5                         
                              4890      48A0 48A8 48B0         

For each instruction, the rows are, in order: cycles, fault and PC offsets, type code and IR.

Note how the stacked PC reveals different behaviour on predecrement. It also seems to imply that the movem does all the prefetches before reading, which would explain why I had to hack the bus error address to get Blood Money to work, as my movem currently issues the final prefetch after all the reads. It also looks like on the move the source read might happen before any prefetches caused by extension words (or perhaps the address error (AE) is tested before the move actually starts - the bus error results should reveal that).
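
As a cross-check on the IR row, the low six bits of each stacked opcode are the standard 68000 effective-address field (three bits of mode, three bits of register), so decoding them should reproduce the column headings, with $(A) corresponding to d16(An) and I(A) to d8(An,Xn). A minimal sketch, with the table and function names chosen only for illustration:

```python
# Standard 68000 effective-address modes for mode values 0-6.
EA_MODES = ["Dn", "An", "(An)", "(An)+", "-(An)", "d16(An)", "d8(An,Xn)"]
# For mode 7 the register field selects the addressing mode instead.
EA_MODE7 = ["abs.W", "abs.L", "d16(PC)", "d8(PC,Xn)", "#imm"]


def decode_low_ea(opcode):
    """Return (mode name, register) for the EA field in bits 5-0 of opcode.
    The register is None for the mode-7 forms, which do not use one."""
    mode = (opcode >> 3) & 7
    reg = opcode & 7
    if mode < 7:
        return EA_MODES[mode], reg
    if reg < len(EA_MODE7):
        return EA_MODE7[reg], None
    return "invalid", None


# The five IR values listed for move.w *,d0 in the table above.
for ir in (0x3010, 0x3018, 0x3020, 0x3028, 0x3030):
    print(f"{ir:04X} -> {decode_low_ea(ir)}")
```

The same function applies to the movem rows, since movem also keeps its effective address in the low six bits of the opcode word.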

Dio
Captain Atari
Posts: 451
Joined: Thu Feb 28, 2008 3:51 pm

Re: Cycle counts and phasing: tables and tester

Postby Dio » Fri Apr 03, 2009 11:11 am

Got full results for the bus and address errors now. The full set of tables is 150k, presumably too big to bang in here? There's all sorts of fascinating details in there:

- Bus errors look to be generated by a watchdog timer in glue - my estimation is that if nothing responds with DTACK within 64 cycles it generates BERR (the timings for bus errors (BEs) look to be exactly 68 cycles longer than for the address errors, so probably 64 for the watchdog plus 4 cycles for the bus transaction). It manages to screw up my timing in a few cases by making the routine take longer than a full HBL, but fortunately most of them just slip under the bar. A further experiment to see if stuff that's only a BE in user mode has the same timing would be nice, but it's a lot more difficult.

- The exact PC stacked on bus or address error is complex. It does appear to be consistent between the two, and the address stacked is sometimes revealing of the prefetch pattern and sometimes just plain confusing :) . It looks like some types of extension word cause the PC to update (abs W and abs L) but some don't (displacement and indexed). Still trying to work out the exact algorithm.

- The stacked IR reflects the following instruction for move.w x,-(a1) only. Presumably the prefetch has already completed and transferred to IR - but why it doesn't happen for the long move is intriguing.

- Predecrement accesses appear to go lower address then higher address for reads, but higher address then lower address for writes.

- Both Emu and Steem are a horrible mess accuracy-wise :) .
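
The watchdog estimate in the first point can be written down as a one-line model. This is only a sketch of that estimate (64 cycles of timeout plus one 4-cycle bus transaction added to the address error timing), not a measured result; the constant and function names are made up for illustration.

```python
WATCHDOG_CYCLES = 64  # estimated timeout before BERR is generated (assumption from the post)
BUS_CYCLE = 4         # one 68000 bus transaction


def estimated_bus_error_cycles(address_error_cycles):
    """Predict the bus error timing from the address error timing of the
    same instruction, using the 64 + 4 cycle estimate."""
    return address_error_cycles + WATCHDOG_CYCLES + BUS_CYCLE


# move.w (a0),d0 took 56 cycles on an address error in the earlier table,
# so the estimate predicts 56 + 68 cycles for the bus error case.
print(estimated_bus_error_cycles(56))
```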

I should have some preliminary source and object code by the end of the weekend if my son and the footy allow.

Nyh
Atari God
Posts: 1496
Joined: Tue Oct 12, 2004 2:25 pm
Location: Netherlands

Re: Cycle counts and phasing: tables and tester

Postby Nyh » Mon Apr 06, 2009 12:40 pm

Dio wrote:- The exact PC stacked on bus or address error is complex. It does appear to be consistent between the two, and the address stacked is sometimes revealing of the prefetch pattern and sometimes just plain confusing :) . It looks like some types of extension word cause the PC to update (abs W and abs L) but some don't (displacement and indexed). Still trying to work out the exact algorithm.

- The stacked IR reflects the following instruction for move.w x,-(a1) only. Presumably the prefetch has already completed and transferred to IR - but why it doesn't happen for the long move is intriguing.

- Predecrement accesses appear to go lower address then higher address for reads, but higher address then lower address for writes.

Very interesting work. Thank you for sharing your insights with us!

Hans Wessels

Dio
Captain Atari
Posts: 451
Joined: Thu Feb 28, 2008 3:51 pm

Re: Cycle counts and phasing: tables and tester

Postby Dio » Mon Apr 06, 2009 5:45 pm

Glad I could be of help. (Although sadly not with any code yet because I didn't manage to get anything done at the weekend).

fpgaarcade
Obsessive compulsive Atari behavior
Posts: 104
Joined: Thu Sep 20, 2007 10:06 pm
Location: Sweden

Re: Cycle counts and phasing: tables and tester

Postby fpgaarcade » Thu Apr 23, 2009 9:10 am

Hi,
I'm working on an FPGA platform (www.fpgaarcade.com). The aim is to move to a softcore 68K inside the FPGA that we can extend to be a 68020. The advantage of this is that we can run the soft core much faster than the original chips.

I have an external 68000 which I run in lock step with the soft core to test the cycle timing. This technique has worked well with the 6502 and z80 cores before.

Do any of you have a test suite which goes through these combinations of instructions, address modes and pairings? I don't need any of the measurement tricks as I can stop the system at the exact cycle where there is a timing difference.

Thanks,
Mike.

mikej@fpgaarcade.com

Dio
Captain Atari
Posts: 451
Joined: Thu Feb 28, 2008 3:51 pm

Re: Cycle counts and phasing: tables and tester

Postby Dio » Sun Apr 26, 2009 9:32 am

Hi there Mike - I've seen the R&D you're doing and it's very interesting, and I've been hoping to chat with you for a while. I'd very much like to see your test results as to the exact timing of the instruction and data fetch cycles, and I'm keen to help out a bit if I can provide something useful - a verified accurate Verilog / VHDL definition for the ST would give us a firm answer to every possible emulation question.

I don't think I have a specific test that does exactly what you want. What I have right now:
- a basic instruction verifier, that runs through all the different instructions with a wide range of operands and performs a checksum to see that the results and SRs are correct
- a separate verifier producing a complete result table for the BCD ops
- and then this timing verifier I'm working on now, which does run through nearly every possible instruction and address mode combination that can hit memory.

I imagine the latter has most of what you want - I think it would be reasonably easy to take the test tables I have and convert them to what you're looking for. The tables look like this:

Code: Select all

dc.w t0_e - t0_s / 2
t0_s: add.w d0,d0
t0_e:
dc.w t1_e - t1_s / 2
t1_s: add.w a0,d0
t1_e:
dc.w t2_e - t2_s / 2
t2_s: add.w (a0),d0
t2_e:

and so on. So you have the number of words in the instruction, followed by the instruction itself. It would be trivial to modify it so it just spins through all the instructions without the preamble (although you would eventually have to worry about the location of the scratchpad, so I still think the test harness is a decent idea. It could be a vastly simpler one than I have right now though).
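
For anyone wanting to build similar tables, the layout above is easy to generate. The sketch below is not the C++ generator mentioned in this post; it is a small Python stand-in, with a made-up function name, that emits the same three lines per instruction (length word, start label plus instruction, end label).

```python
def emit_test_table(instructions):
    """Return assembler source lines for a test table in the layout shown
    above: a dc.w holding the instruction length, then the instruction
    itself bracketed by start and end labels."""
    lines = []
    for index, instruction in enumerate(instructions):
        start = f"t{index}_s"
        end = f"t{index}_e"
        lines.append(f"dc.w {end} - {start} / 2")
        lines.append(f"{start}: {instruction}")
        lines.append(f"{end}:")
    return lines


# Regenerate the three entries quoted above.
for line in emit_test_table(["add.w d0,d0", "add.w a0,d0", "add.w (a0),d0"]):
    print(line)
```

Dropping the `dc.w` line from the loop gives the preamble-free variant, a plain run of instructions.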

I do plan to release both the table generator (a simple if messy C++ file) and the harness, but I'm struggling to find time to work on them at the moment between work, home and Left 4 Dead ;) . If you want them right now let me know - I'm very eager to get hold of any additional data you can provide on the signal timings (if I have to resuscitate enough of my limited Verilog or VHDL to parse it out of the source myself that's not a problem).

fpgaarcade
Obsessive compulsive Atari behavior
Posts: 104
Joined: Thu Sep 20, 2007 10:06 pm
Location: Sweden

Re: Cycle counts and phasing: tables and tester

Postby fpgaarcade » Sun Apr 26, 2009 11:46 am

Thanks Dio,

I am just finishing the layout of the 68K daughter board so I should be able to start the soft core verification in a few weeks.
One of the problems is the real 68K has quite a complex instruction pre-fetch behaviour which currently is not done by the soft cores.
Your timing verifier sounds like what I need, ping me when you have something I can steal for my tests!
One of the nice things with the FPGA platform, even when running with an external real CPU, is that you have something called ChipScope, which is like a logic analyser that runs in the FPGA. You can stop and grab the exact timing of each instruction, which is useful.

Best,
Mike.
mikej@freeuk.com

Dio
Captain Atari
Posts: 451
Joined: Thu Feb 28, 2008 3:51 pm

Re: Cycle counts and phasing: tables and tester

Postby Dio » Sun Apr 26, 2009 7:04 pm

The prefetch doesn't seem to be particularly complex in and of itself: each instruction is coded to ensure the two-word queue is always full at instruction end. The complexity only arises because it's manually coded into the microcode for each instruction and so not as predictable as one might think. Mostly, these do appear to be 'logical' changes - for example, the CPU needs a couple of cycles to decrement the address for predecrement, so it shifts the prefetch ahead of the read so as to do them in parallel. Some of this is also revealed by the bus / address error testing (although it's never that simple in the 68k, some of the rules look a bit non-obvious, although I haven't really tried to boil down the raw underlying mechanism).
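
The "queue is always full at instruction end" invariant on its own is simple enough to model in a few lines. The sketch below is a toy under stated assumptions: the class and method names are invented, memory is a plain dict of 16-bit words keyed by even address, and the interesting part (exactly when each refill happens inside an instruction) is deliberately not modelled.

```python
from collections import deque


class PrefetchQueue:
    """Toy two-word prefetch queue. Only the 'full at instruction end'
    invariant is modelled, not the per-instruction refill timing."""

    def __init__(self, memory, pc):
        self.memory = memory      # maps even byte address -> 16-bit word
        self.next_fetch = pc      # address of the next word to be fetched
        self.words = deque()      # the queued words, oldest first
        self.fetch_log = []       # every address fetched, in order
        self.refill()

    def refill(self):
        # Fetch until both slots hold a word.
        while len(self.words) < 2:
            self.fetch_log.append(self.next_fetch)
            self.words.append(self.memory[self.next_fetch])
            self.next_fetch += 2

    def consume(self):
        # Take the oldest word (an opcode or an extension word).
        return self.words.popleft()

    def end_instruction(self):
        # Restore the invariant before the next instruction starts.
        self.refill()
        assert len(self.words) == 2
```

In this model a one-word instruction consumes one word and causes exactly one refill fetch, two words past its own address.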

What I'd love to probe for each instruction is the start cycle and address of each of the memory accesses relative to the start of the instruction. Then we know for certain the exact relative orders of the operand reads, writes and prefetches.

I've been trying to find the 68000 microcode for a while. Supposedly it's in the (now expired) patent, but it doesn't seem to be available online.

Blinking at my schedule for the next few weeks I don't think I'll have anything ready for public release for a month or so, so let me know if you want the best available before then.

Cheers
Dave

ijor
Hardware Guru
Posts: 3863
Joined: Sat May 29, 2004 7:52 pm

Re: Cycle counts and phasing: tables and tester

Postby ijor » Sun Apr 26, 2009 9:26 pm

Dio wrote:I've been working on an automated tester for checking emulators' 68000 timing very precisely.


Very nice !

Dio wrote:There's a few that are different in the Motorola docs - the bit test instructions in particular don't appear to match (12 for set and chg and 14 for clr looks like 10 and 12 to me). With my experience of technical docs, I think I trust the tester


Not trusting the docs is a wise attitude. But the Motorola docs, depending on the version, are almost 100% accurate. And in the specific case of the bit instructions, you can't blame Motorola when you are misreading the doc (or using a dodgy copy).

Dio wrote:What do you mean by automatic instruction pairing? ... I can't think what an easier alternative might be - are people building big tables for this stuff?


Ah! I now (and only now) understand your comment about my latest article.

You were not aware that most emulators use tables for this. And I was not aware that you were attempting something more accurate. Yeah, if you are doing bus-cycle-accurate emulation, then everything makes sense. But when you are doing instruction-level emulation, just with some table-based correction (as most emulators do), then you can never be 100% accurate.

Dio wrote:- Bus errors look to be generated by a watchdog timer in glue -


Yes, this is well known; we have talked about it several times. It is likely by design. The idea is that third-party upgrades could use any (or most) of the unused address space at will.

Dio wrote:- The exact PC stacked on bus or address error is complex...


It is. There is no direct relation between this and anything else. That's one of the reasons I said that bus-error-accurate emulation is one of the most difficult emulator tasks. There are, of course, some patterns, and conceivably you could use the patterns to reduce the size of the huge table that would otherwise be required. But there is no simple direct logic.

fpgaarcade wrote:One of the problems is the real 68K has quite a complex instruction pre-fetch behaviour which currently is not done by the soft cores.

Dio wrote:The prefetch doesn't seem to be particularly complex in and of itself...


I agree. The complexity and importance of the prefetch behavior (as long as we talk strictly about prefetch) is overrated. The reason I wrote the initial article about the prefetch was that it is (or was) one of the least understood 68K features. Some of the top ST and 68K experts had a very wrong idea about how it works. Many people thought the 68K prefetch is smart and dynamic, as in a modern micro, but it is not.

Dio wrote:I've been trying to find the 68000 microcode for a while. Supposedly it's in the (now expired) patent, but it doesn't seem to be available online


The patents are available online at the usual patent web sites. Part of my research was based on those patents. But there is a long way to go from the patents, and unfortunately they won't provide the ultimate authoritative answers you expect.

For starters, the microcode in the patents is quite unreadable. It is a very bad scan of a bad printout. Second, it seems to have many mistakes. It is not a computer printout where you could trust that what was printed went exactly to the silicon. Also, several pages are missing, e.g. about half of the pages with the microcode for the DIV instructions.

Lastly, the patents don't cover the final product as we know it. They cover a beta/preliminary design that was never sold, and possibly never actually produced. There are significant differences from the final product, and certainly many in the microcode. E.g., the DBcc instruction is missing, and instead it has a short form of DBRA (unconditional). It seems that DBcc was added at a later stage for better support for high-level languages. But personally, I miss a short version of DBRA :(

However, I do recommend checking the patents. Even if they won't provide all the answers you expect, they are very interesting material.

Dio
Captain Atari
Posts: 451
Joined: Thu Feb 28, 2008 3:51 pm

Re: Cycle counts and phasing: tables and tester

Postby Dio » Sun Apr 26, 2009 9:41 pm

ijor wrote:
- The exact PC stacked on bus or address error is complex...
It is. There is no direct relation between this and anything else. That's one of the reasons I said that bus-error-accurate emulation is one of the most difficult emulator tasks. There are, of course, some patterns, and conceivably you could use the patterns to reduce the size of the huge table that would otherwise be required. But there is no simple direct logic.

From what I'm seeing, I don't think it needs a table, I think it will turn out to be predictable. Just a bit illogical. The rough rules I have right now are:
- the PC stacked is that of the next fetch. So for something like add (a0),d0 then it's PC+2. That seems to be because the prefetch hasn't been issued yet.
- -(an) sources stack PC+4. The implication is that the prefetch has already occurred (and there's no easy test because read ordering can't be probed from software). However, the cycle count of the bus / address error might indicate it hasn't. I still haven't worked that one through.
- $(an) and $(an,dn) stack PC+2, despite having at least in theory fetched the extension word or displacement. That's not absolutely required - the word's already in the other prefetch register after all - but again the cycle count seems to indicate the prefetch hasn't happened yet. This also needs more work.
- abs.W and .L stack the expected PC+4 and PC+6 and the cycle also indicates the prefetches definitely occur before the read
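
Written as a lookup, the rough rules above come out as follows. This is just a restatement of those provisional rules for source operands, not a verified model: the dictionary and function names are invented, and it assumes "PC" means the address of the instruction's opcode word.

```python
# Provisional stacked-PC offsets per source addressing mode, taken from
# the rough rules listed above (source operands only).
STACKED_PC_OFFSET = {
    "(An)": 2,
    "-(An)": 4,
    "d16(An)": 2,
    "d8(An,Xn)": 2,
    "abs.W": 4,
    "abs.L": 6,
}


def stacked_pc(instruction_pc, source_mode):
    """Return the PC expected on the exception frame, or None if the
    rules above say nothing about this addressing mode."""
    offset = STACKED_PC_OFFSET.get(source_mode)
    if offset is None:
        return None
    return instruction_pc + offset


print(hex(stacked_pc(0x1000, "-(An)")))
```

The offsets for (An), -(An), d16(An) and d8(An,Xn) agree with the PC column of the move.w address error table posted earlier (2, 4, 2 and 2).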

Haven't worked through the destinations at all. I also haven't run the table that tests read OK write bad for second sources but I'm not sure if that makes a lot of difference.

I'm optimistic that when I work out the underlying logic it will all 'just work' automatically. It's fairly close already.

It does seem to get a bit more chaotic in a few places - I suspect those are where the addressing used by the instruction isn't quite standard, but I think that mostly shows in the code implementing the instructions anyway.

Thanks for the hints on the patent, I will have a rummage at some point.
Last edited by Dio on Sun Apr 26, 2009 9:51 pm, edited 1 time in total.

fpgaarcade
Obsessive compulsive Atari behavior
Posts: 104
Joined: Thu Sep 20, 2007 10:06 pm
Location: Sweden

Re: Cycle counts and phasing: tables and tester

Postby fpgaarcade » Sun Apr 26, 2009 9:49 pm

Interesting stuff.

The patents I found are
4296469 Execution unit for data processor using segmented bus structure
4307445 Microprogrammed control apparatus having a two level control store for data processor
4312034 ALU and condition code control unit for data processor
4325121 Two level control store
4338661 Conditional branch unit
4342078 Instruction register sequence decoder << most interesting one
4348722 Bus error recognition
4349873 Microprocessor interrupt processing
4409671 Data processor having single clock pin

/Mike

ijor
Hardware Guru
Posts: 3863
Joined: Sat May 29, 2004 7:52 pm

Re: Cycle counts and phasing: tables and tester

Postby ijor » Mon Apr 27, 2009 12:45 am

Dio wrote:From what I'm seeing, I don't think it needs a table, I think it will turn out to be predictable. Just a bit illogical. The rough rules I have right now are: - the PC stacked is that of the next fetch.


I didn't say it is not predictable (as if it would be non-deterministic). Of course it is predictable. Just that there is no simple direct logic.

No, there is no direct relation with the next fetch address. There is only a partial, indirect relation. That's why I'm saying there are some patterns.

Dio wrote:So for something like add (a0),d0 then it's PC+2. That seems to be because the prefetch hasn't been issued yet.
- -(an) sources stack PC+4. The implication is that the prefetch has already occurred (and there's no easy test because read ordering can't be probed from software). However, the cycle count of the bus / address error might indicate it hasn't. I still haven't worked that one through.


Yep, these are good examples that prove there is no simple direct relation. In both cases the next prefetch would be at the same address, yet the PC written to the exception frame is different.

I don't think there is anything to "work" here. Just to accept that, unfortunately, there is no simple rule. It doesn't mean you can't find useful patterns, or a combination of rules with exceptions that would be helpful. Just that there is no simple direct logic.

Btw, the next address to be fetched is PC+4, not PC+2.

Honestly, I don't think using tables is that bad. On a modern PC (and on most other platforms that an ST emulator would run on), you have huge amounts of disk space, RAM and CPU cache to spare. You have to decode the opcode anyway. You must have opcode-based tables one way or the other, so making the table bigger is not that bad. You just must be careful to design the tables so that they cache well.

You'll of course take a hit on performance, but you'll take it anyway if you want cycle-accurate emulation. Not a big problem for emulating an ST/STe. Falcon and 030 emulation requires much more CPU power, but personally, I wouldn't care too much about cycle accuracy when emulating a Falcon. And then you could have a cycle-accurate 68000 engine, plus a not (so) cycle-accurate 030 one.

