The worst hack in Steem


npomarede
Atari God
Posts: 1094
Joined: Sat Dec 01, 2007 7:38 pm
Location: France

Re: The worst hack in Steem

Postby npomarede » Mon Apr 09, 2012 4:24 pm

Steven Seagal wrote:I figure it's much easier to emulate in Steem thanks to its clever video system: you just need to force rendering when a MOVE.W ..., (d16, An) writes to the SDP zone, just as if it were a shifter event.
I think the left border of the scroller wasn't "1 VBL" compliant, but it's hard to tell on YouTube; the framerate is different anyway. Check it if you fix it in Hatari.

Yes, it's quite easy to implement; but as I said, the problem is not the implementation, it's that it will slow down emulation a lot, because on every write to RAM you need to update the video address and do some comparisons to see whether the RAM location and the video address overlap.

Steven Seagal wrote:About the VBL timing, one of my glorious hacks is to reduce the timing by 4 cycles (52 instead of 56). It helps some programs and breaks nothing, maybe because on an STF the delay before execution is 64, while in Steem it's hardwired as 68. Just a thought.

I'm afraid you didn't fix anything this way. The time needed between the interrupt and the execution of the code in the interrupt handler is 56 cycles. I measured it very precisely with some video-synced code (and I guess Steem's authors and other people did too; this value doesn't come from approximation). Fixing problems by changing a value that is known to be correct is not a proper fix. It just means that somewhere else in the code there might be a 4-cycle error, but it's not in the interrupt timing.
Apart from that, it's true that on STE the VBL starts 4 cycles later than on STF. This has to be taken into account depending on the STF/STE mode.

Beware of thoughts such as "It helps some programs and breaks nothing"; in my experience this often turns out to be wrong, and one day you find that even a very simple program is now broken. The only accurate way to validate an emulation method, IMHO, is to write a very simple asm program that tests only this point very precisely on real hardware, disabling all other components that could interfere, and check that you get the exact same result in emulation. At some point, using a set of demos/games to decide whether the emulation is good or not is not accurate enough; many times you will find that your set of test programs was too restrictive.

- another solution is to use a bus driven approach (instead of cpu driven, as done in WinUAE most accurate mode) : give 4 cycles to the CPU, 4 cycles to the MMU/shifter and draw 4 pixels (you need to share bus cycle with blitter, disk dma, sound dma, ... too). This is the most generic way to run the emulation, the closest to how the hardware really works, but it will be sloooowww, it really requires a lot of cpu.


Surely you mean 2 cycles for the CPU, 2 for the shifter?

No, it's 4 cycles on ST. The shifter gets data from the MMU every 4 cycles, not 2 (note that on ST bus accesses are rounded to 4 cycles, whereas on Amiga, for example, this is not the case. This is not due to the 68000 but to the rest of the hardware).
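A minimal sketch of what a bus-driven main loop could look like, assuming one 4-cycle slot is the unit of work. The struct, its fields and `run_line` are hypothetical names for illustration only; neither Hatari nor Steem is claimed to be organised this way.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical counters standing in for real CPU, MMU/Shifter and renderer state.
struct BusDrivenMachine {
    std::uint64_t cpu_cycles = 0;       // cycles handed to the 68000
    std::uint64_t shifter_fetches = 0;  // MMU-to-Shifter transfers
    std::uint64_t pixels_drawn = 0;     // pixels output so far

    // Advance the whole machine by one 4-cycle bus slot.
    void step_slot() {
        cpu_cycles += 4;       // the CPU gets its share of the slot
        shifter_fetches += 1;  // the Shifter gets data from the MMU
        pixels_drawn += 4;     // 4 pixels are drawn
        // Blitter, disk DMA and sound DMA would be arbitrated here too.
    }
};

// Run one scanline of `line_cycles` CPU cycles (512 for a 50 Hz line).
inline BusDrivenMachine run_line(int line_cycles) {
    assert(line_cycles % 4 == 0);
    BusDrivenMachine m;
    for (int c = 0; c < line_cycles; c += 4) m.step_slot();
    return m;
}
```

The cost is visible in the loop itself: every component is visited 128 times per 512-cycle line, whether or not it has anything to do.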

Nicolas

Cyprian
Atari God
Posts: 1313
Joined: Fri Oct 04, 2002 11:23 am
Location: Warsaw, Poland

Re: The worst hack in Steem

Postby Cyprian » Mon Apr 09, 2012 8:56 pm

npomarede wrote:No, it's 4 cycles on ST. The shifter gets data from the MMU every 4 cycles, not 2 (note that on ST bus accesses are rounded to 4 cycles, whereas on Amiga, for example, this is not the case. This is not due to the 68000 but to the rest of the hardware)

On the Amiga 500, as on the ST, 68000 accesses to chip RAM are rounded to 4 cycles, and to 8 on the A1200.
Jaguar / TT030 / Mega STe / 800 XL / 1040 STe / Falcon030 / 65 XE / 520 STm / SM124 / SC1435
SDrive / PAK68/3 / CosmosEx / SatanDisk / UltraSatan / USB Floppy Drive Emulator / Eiffel / SIO2PC / Crazy Dots / PAM Net
Hatari / Aranym / Steem / Saint
http://260ste.appspot.com/

npomarede
Atari God
Posts: 1094
Joined: Sat Dec 01, 2007 7:38 pm
Location: France

Re: The worst hack in Steem

Postby npomarede » Mon Apr 09, 2012 9:07 pm

Cyprian_K wrote:
npomarede wrote:No, it's 4 cycles on ST. The shifter gets data from the MMU every 4 cycles, not 2 (note that on ST bus accesses are rounded to 4 cycles, whereas on Amiga, for example, this is not the case. This is not due to the 68000 but to the rest of the hardware)

On the Amiga 500, as on the ST, 68000 accesses to chip RAM are rounded to 4 cycles, and to 8 on the A1200.

Yes, this is also why the Amiga had some memory extensions known as "fast RAM", where accesses were not rounded to 4 (starting at $200000 if I recall correctly). There were also memory extensions with "slow" RAM that behaves like chip RAM, except that you could not access it through DMA.
Note that on ST it's also possible to get non-rounded accesses when the 68000 code runs from the cartridge.

Cyprian
Atari God
Posts: 1313
Joined: Fri Oct 04, 2002 11:23 am
Location: Warsaw, Poland

Re: The worst hack in Steem

Postby Cyprian » Tue Apr 10, 2012 9:02 am

npomarede wrote:Yes, this is also why the Amiga had some memory extensions known as "fast RAM", where accesses were not rounded to 4 (starting at $200000 if I recall correctly). There were also memory extensions with "slow" RAM that behaves like chip RAM, except that you could not access it through DMA.
Note that on ST it's also possible to get non-rounded accesses when the 68000 code runs from the cartridge.

Yes, but accesses to ST ALT-RAM and to the Blitter's shadow registers aren't rounded to 4 either,
and I read on Hatari's mailing list :) that accesses to the GLUE registers aren't rounded to 4 either.

npomarede
Atari God
Posts: 1094
Joined: Sat Dec 01, 2007 7:38 pm
Location: France

Re: The worst hack in Steem

Postby npomarede » Tue Apr 10, 2012 9:09 am

Cyprian_K wrote:and I read on Hatari's mailing list :) that accesses to the GLUE registers aren't rounded to 4 either.

Yes, this was discussed by Paulo Simoes and Ijor and used to produce the fastest STF hardscroll (3 lines + 1 sync line)

Steven Seagal
Atari God
Posts: 1665
Joined: Sun Dec 04, 2005 9:12 am
Location: Undisclosed

Re: The worst hack in Steem

Postby Steven Seagal » Tue Apr 10, 2012 8:26 pm

npomarede wrote:Yes, it's quite easy to implement; but as I said, the problem is not the implementation, it's that it will slow down emulation a lot, because on every write to RAM you need to update the video address and do some comparisons to see whether the RAM location and the video address overlap.


No, in fact: as was evident in the screenshot above, I already patched Steem for this and there's no noticeable performance impact. The comparisons are trivial, and apparently rendering blocks of 8 (?) pixels is still fine; I see no CPU spike.
It's really quick and easy in Steem. I know it's different in Hatari because it renders line by line (probably WinSTon's heritage):

Code:

case BITS_876_101: // (d16, An)
      INSTRUCTION_TIME(12-4-4);
      abus=areg[PARAM_N]+(signed short)m68k_fetchW();
#if defined(SS_VID_3615GEN4)
      // Writing in video RAM just after it's been fetched by the shifter
      if(abus>=shifter_draw_pointer && abus<=shifter_draw_pointer+32)
      { 
        Shifter.DST(LINECYCLES); // force rendering; fixes 36.15 Gen 4 demo
#if defined(SS_DEBUG) && defined(SS_VARIOUS)
        SetProgram(GEN4_3615);
#endif
      }
#endif
      pc+=2;
      break;



npomarede wrote:I'm afraid you didn't fix anything this way. The time needed between the interrupt and the execution of the code in the interrupt handler is 56 cycles. I measured it very precisely with some video-synced code (and I guess Steem's authors and other people did too; this value doesn't come from approximation). Fixing problems by changing a value that is known to be correct is not a proper fix. It just means that somewhere else in the code there might be a 4-cycle error, but it's not in the interrupt timing.
Apart from that, it's true that on STE the VBL starts 4 cycles later than on STF. This has to be taken into account depending on the STF/STE mode.

Beware of thoughts such as "It helps some programs and breaks nothing"; in my experience this often turns out to be wrong, and one day you find that even a very simple program is now broken. The only accurate way to validate an emulation method, IMHO, is to write a very simple asm program that tests only this point very precisely on real hardware, disabling all other components that could interfere, and check that you get the exact same result in emulation. At some point, using a set of demos/games to decide whether the emulation is good or not is not accurate enough; many times you will find that your set of test programs was too restrictive.


I'm sure you're right, it's a quick hack, nothing more, in part to make up for the 68 vs. 64 delay. The hack only applies to the STF option. To date, my attempts at really fixing those HBL/VBL cycles break everything, as I've explained.


npomarede wrote:No, it's 4 cycles on ST. The shifter gets data from the MMU every 4 cycles, not 2 (note that on ST bus accesses are rounded to 4 cycles, whereas on Amiga, for example, this is not the case. This is not due to the 68000 but to the rest of the hardware).

Nicolas


I shouldn't discuss this with you or anybody who knows far more about the ST, but I was under the impression that every 4 cycles, 2 were given to the CPU and then 2 to the shifter, but of course only WRT bus access. You're not saying the "MMU" gives 4 cycles to the CPU and then 4 to the shifter; rather, both chips get their cycles at the same time, only they must share the bus on a 2/2 basis, hence the rounding to 4.
In the CIA we learned that ST ruled
Steem SSE: http://ataristeven.exxoshost.co.uk/Steem.htm

Steven Seagal
Atari God
Posts: 1665
Joined: Sun Dec 04, 2005 9:12 am
Location: Undisclosed

Re: The worst hack in Steem

Postby Steven Seagal » Sat Apr 14, 2012 7:34 am

...
Anyway, I looked further at these hacks...

1) There was a hack involving the HBL & VBL starting times in the frame; it's unimportant (fixed).
2) The most confusing part of Steem to me was the macro:
#define CYCLES_FROM_HBL_TO_LEFT_BORDER_OPEN 84
This value is used in "Read SDP" and in rendering ("draw_scanline_to").
The expected value is 56 (display start at 50hz).
The hack gives the false idea that the HBL somehow starts at 484, which seems to make sense for the physical hblank but just doesn't fit with the rest.
There was a hack because the ST shifter has 2 very strange characteristics we must take as facts:
- Palette changes are delayed by some cycles before taking effect.
- The video counter is updated with a similar delay too.
So I made this explicit in the code and gave the value 56 to the macro:

Code:

#define SHIFTER_PALETTE_LATENCY (28-1)
#define SHIFTER_READ_SDP_LATENCY 28 // taking 'bytes ahead' into account

Now it all makes sense (if we accept the shifter behaviour).

mc6809e
Captain Atari
Posts: 159
Joined: Sun Jan 29, 2012 10:22 pm

Re: The worst hack in Steem

Postby mc6809e » Sat Apr 14, 2012 10:13 am

Cyprian_K wrote:
npomarede wrote:No, it's 4 cycles on ST. The shifter gets data from the MMU every 4 cycles, not 2 (note that on ST bus accesses are rounded to 4 cycles, whereas on Amiga, for example, this is not the case. This is not due to the 68000 but to the rest of the hardware)

On the Amiga 500, as on the ST, 68000 accesses to chip RAM are rounded to 4 cycles, and to 8 on the A1200.


I've seen this claim before, but it isn't always true. The two machines, the ST and Amiga, are different in how ram is accessed.

On the ST, only even ram memory cycles (a memory cycle lasts 2 cpu cycles) are available to the CPU. This means that a memory address presented by the CPU to the MMU for reading or writing during an odd memory cycle will have the read or write performed during one of these available even memory cycles. This normally takes a total of 4 cpu cycles. If the CPU presents an address for reading or writing during an even memory cycle, however, the MMU will delay performance of the read or write by 2 CPU cycles. This can happen, for example, when executing an instruction like CLR.L on a register and the following instruction prefetch is initiated on an odd memory cycle. Instead of taking 6 cycles, the CLR.L instruction will often appear to take 8.
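As a rough illustration of the rounding described above, here is a minimal sketch that assumes CPU bus slots begin on multiples of 4 CPU cycles, counted from an arbitrary origin. The function names are made up for this example, and the model ignores everything except the wait for the next slot.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: a CPU bus access can only start when the cycle count is a
// multiple of 4. Returns how many cycles the CPU waits at `cycle`.
inline std::uint32_t wait_states_for_bus(std::uint32_t cycle) {
    assert(cycle % 2 == 0);  // 68000 timings are always an even number of cycles
    return (4 - (cycle & 3)) & 3;
}

// Apparent length of an instruction that starts on a slot boundary, takes
// `nominal` cycles, and is followed by a prefetch that needs a bus slot.
inline std::uint32_t effective_cycles(std::uint32_t nominal) {
    return nominal + wait_states_for_bus(nominal);
}
```

Under that assumption `effective_cycles(6)` gives 8, which is the CLR.L case mentioned above, while instructions whose nominal length is already a multiple of 4 are unaffected.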

On the Amiga 500, all memory cycles are potentially available to the CPU/blitter so the rounding rule breaks down. Usually the situation is just like the ST, though, with bit plane DMA taking the odd cycles just like the shifter. Even when odd cycles are available, during the overscan area for instance, the CPU often doesn't use them but sometimes will. A CLR.L instruction executed during this period, for example, will not be blocked and will take 6 cycles instead of 8, using an odd cycle. Rotates and shifts may also take 6 or 10 or 14 or more cycles. It's also possible for the CPU to use odd memory cycles if the number of bit planes is reduced to 3 or 2 or 1. But the most common situation where the CPU uses odd cycles is while the blitter is being used to clear video memory. The blitter uses every other memory access cycle to clear so there are times when there are odd cycles left for the CPU if no other DMA is occurring. Memory clears can be sped up considerably by using the CPU to write to memory on odd memory cycles while the blitter writes on even cycles (or vice versa).

Steven Seagal
Atari God
Posts: 1665
Joined: Sun Dec 04, 2005 9:12 am
Location: Undisclosed

Re: The worst hack in Steem

Postby Steven Seagal » Sun Apr 15, 2012 7:56 am

Steven Seagal wrote:Now it all makes sense (if we accept the shifter behaviour).


Except it's exactly the opposite of what I said yesterday, lulz. Who knows tomorrow?

My understanding of those interesting shifter facts now is this:

There's a delay in displaying fetched pixels but not in changing palette.
Now the value of 84 makes sense: when we're at 'LineCycle' 52 for example, and we change the palette, it has effect on pixel 52-28=24 of the screen. This means that when we're on pixel 52 in our reckoning ("for the shifter"), the screen's electron beam is only at pixel 24. When the electron beam is at 52, the shifter has already fetched and decided the kind of line it must display, but it still must draw the pixels. We're at LineCycle 52+28=80, etc.
When we mean to remove the right border, we use the value '376'. When the shifter gets the frequency change at this 'LineCycle', it changes its policy, but with delay. The electron beam isn't at the right border yet, it's 28 cycles behind.
The video counter apparently gets the screen value, and not the fetching position, which would explain why we need 'hacks' for this function too.
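The arithmetic in this explanation can be written down as a small sketch. The 28 and 56 values come from the post; the constant and function names are hypothetical, and reading 84 as 56+28 is an interpretation of the text, not something taken from the Steem source.

```cpp
#include <cassert>

// Values taken from the post; the names are made up for this sketch.
constexpr int kDisplayLatency = 28;    // cycles between fetch and display
constexpr int kDisplayStart50Hz = 56;  // display start on a 50hz line

// Position of the electron beam when the fetch logic is at `line_cycle`.
inline int beam_position(int line_cycle) {
    assert(line_cycle >= 0);
    return line_cycle - kDisplayLatency;
}

// Cycle at which the first fetched pixel reaches the screen.
constexpr int kLeftBorderOpen = kDisplayStart50Hz + kDisplayLatency;
static_assert(kLeftBorderOpen == 84, "same value as CYCLES_FROM_HBL_TO_LEFT_BORDER_OPEN");
```

With these definitions a palette change at LineCycle 52 lands on pixel 24, and a frequency switch at LineCycle 376 happens while the beam is still at 348.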
So, sorry for my accusations Hayward bros, it isn't an ugly hack I found in Steem after all. But there was still something to fix (Cool STE, 3615 Gen 4...), and to understand.

Steven Seagal
Atari God
Posts: 1665
Joined: Sun Dec 04, 2005 9:12 am
Location: Undisclosed

Re: The worst hack in Steem

Postby Steven Seagal » Mon Apr 23, 2012 7:46 pm

Continuing this monologue, I posted the Steem event plan for each frame above, with, for a 50hz frame, 313 HBL spaced every 512 cycles. Cool... but what about lines ending at cycle 508 because they're at 60hz? This wasn't really handled, and for each such line we get 4 cycles off in the frame.
It generally has no consequence, but in this case, SNYD/TCB, it has (you need good eyes to see it!):


[Image: Steem original plan]

[Image: Steem fixed plan]

[Image: Hatari]

Hopefully you can see that the logo is further to the left in the first screen. By the way, this is a screen with hacks. The display is totally incorrect without them (right border not removed).
This is nasty because it creates confusing situations that tempt you to use hacks to get through, in this instance in the 'read SDP' function. The current "fix" is heavy, with a loop that shifts future events if a line ends at 508.

EDIT:
Back to this: I found a lighter fix. As I understand it, there's a double hack for 60hz lines in Steem: 1) the HBL is fixed at 512 in the frame, and 2) 'read SDP' returns the same value for a 60hz line as for a 50hz line. I think we have a -4 and a +4 cycles compensating each other here.
But I just found (in an older thread in this forum) that a 60hz line will run 508 cycles only if the switch happened before DE.
A line that starts at 50hz and is changed to 60hz after cycle 56 will still run 512 cycles. That's the case for the line just before the SDP is read in TCB, and that's what messed up Steem! For those lines, you must adjust by 4 cycles/2 bytes.
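The rule can be sketched as a single function. This is only an illustration of the statement above: the names are hypothetical, and treating a switch at exactly cycle 56 as "after DE" is an assumption, since the post does not settle that boundary.

```cpp
#include <cassert>

constexpr int kDisplayEnableCycle = 56;  // DE on a 50hz line, per the post
constexpr int kNever = -1;               // hypothetical marker: no switch to 60hz

// Length in CPU cycles of a line that starts at 50hz and is switched to
// 60hz at `switch_cycle` (or never, if `switch_cycle` is kNever).
inline int line_length_cycles(int switch_cycle) {
    assert(switch_cycle >= kNever);
    const bool is_60hz_at_de =
        switch_cycle != kNever && switch_cycle < kDisplayEnableCycle;
    return is_60hz_at_de ? 508 : 512;
}
```

Summing this over the lines of a frame, instead of assuming 512 everywhere, is what keeps the HBL events from drifting by 4 cycles per short line.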

Dizzy-WEWRF
Atari maniac
Posts: 99
Joined: Fri Mar 23, 2012 7:22 pm

Re: The worst hack in Steem

Postby Dizzy-WEWRF » Thu Jun 21, 2012 3:38 pm

Hi :)

Is there any timeframe for when version SSE 3.4 is expected to be released? :D

Or perhaps a compiled beta in between, for impatient ones like me? ;)

Awaiting it. Great work!
Thanks, Dizzy
Everything will become well :)

Steven Seagal
Atari God
Posts: 1665
Joined: Sun Dec 04, 2005 9:12 am
Location: Undisclosed

Re: The worst hack in Steem

Postby Steven Seagal » Thu May 26, 2016 7:32 am

To come back to the first post: an overflowing 32bit integer as the CPU cycle counter.

It makes emulation of the E-Clock (1/10 CPU clock) difficult.
Each time we check the E-Clock (to add wait states), we do:
(cpu_cycles % 10), % being "modulo".

It works as long as the counter is positive, but each time it switches sign, we get wrong values.
For example, positive to negative, the last digit goes the wrong way:

...0
...2
...4
...6
-...8
-...6
-...4
-...2
...
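The sequence above can be reproduced with a few lines of C++. This is a stand-alone sketch, not Steem code: the function names are hypothetical, and the wrapping counter is modelled as an unsigned value reinterpreted as signed so that the overflow itself is well defined.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>

// Reinterpret a wrapping 32-bit cycle count as the signed value the
// emulator would see (modular conversion; guaranteed since C++20 and the
// behaviour of mainstream compilers before that).
inline std::int32_t as_signed(std::uint32_t cycles) {
    return static_cast<std::int32_t>(cycles);
}

// Naive E-clock phase: last decimal digit of the signed counter.
inline int naive_phase(std::uint32_t cycles) {
    return static_cast<int>(std::abs(as_signed(cycles) % 10));
}

// Reference phase, from a count that never wraps.
inline int true_phase(std::uint64_t cycles) {
    return static_cast<int>(cycles % 10);
}
```

At 2147483646 both functions return 6, and at 2147483648 both return 8, but at 2147483650 the reference gives 0 while the naive version gives 6: the last digit has started counting backwards.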

In previous versions of Steem SSE, there was a correction for when it goes negative, a hack in fact.

Code:

  BYTE cycles=abs(act%10);
  if(act<0 && cycles!=8)
    cycles=6-cycles;


The problem is that when we go back to positive, which is hard to detect, the sequence breaks again.

-...0 -> 6
-...8 -> 8
-...6 -> 0
-...4 -> 2
-...2 -> 4
0 > 6
...2
...4
...6
...8

Example: demo Closure will play fine once in Steem SSE, but on replay it will lose sync.

Because of this, we should add a cycle counter, just for e-clock, that would be kept positive (each frame modulo 160 works, for instance). But this counter must be updated each time the main counter is, which is several times per instruction. It adds much overhead to emulation (whether it's inlined or not).

And the question, of course, is there an obvious way to do this without overhead?

npomarede
Atari God
Posts: 1094
Joined: Sat Dec 01, 2007 7:38 pm
Location: France

Re: The worst hack in Steem

Postby npomarede » Thu May 26, 2016 9:01 am

Hi,
just use a 64 bit counter for the cycle count? With recent CPUs it should have no noticeable impact, and certainly less impact than checking whether the 32 bit counter wraps after each instruction. With a 64 bit counter, it would take years of continuous running before the emulator wraps the counter (in Hatari, CPU cycles have been kept in a 64 bit counter for this reason for a long time, with no visible performance impact).

Steven Seagal
Atari God
Posts: 1665
Joined: Sun Dec 04, 2005 9:12 am
Location: Undisclosed

Re: The worst hack in Steem

Postby Steven Seagal » Sat May 28, 2016 8:18 am

In previous versions, the check for negative timing was when E-clock was computed (on HBL, VBL, ACIA), not after each instruction.
As it is today (3.8.2 beta) we need two 32bit cycle counters instead of one: one that will overflow, one that is limited each frame.
There may be little difference between maintaining a 64bit counter and maintaining 2 32bit counters on a 32bit system but I don't know.
If I use 64bit, all computing using cycles is affected, not just cycle addition.
A 64bit counter would be for the future (?) x64 build. Technically, it could still overflow (sooner at super speeds).

npomarede
Atari God
Posts: 1094
Joined: Sat Dec 01, 2007 7:38 pm
Location: France

Re: The worst hack in Steem

Postby npomarede » Sat May 28, 2016 8:54 am

Hi,
note that a "32 bit system" with a modern CPU can still handle 64 bit variables internally. The main difference between the 32 and 64 bit modes of the CPU is how much memory you can address directly. For example, on my Linux PC I run in 32 bit mode, and there was no speed penalty when using 64 bit counters instead of 32 bit counters.

As for a 64 bit counter wrapping, it would take a very long time of non-stop running: when simulating an 8 MHz Atari, it takes 268 sec to reach 2^31 cycles, which gives 2^32*268 / 3600 = 319736454 hours before wrapping at 64 bit, i.e. ~36500 years. That is far beyond any realistic session, so I doubt any counter will wrap, even when speeding up the Atari's CPU.
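The estimate is easy to re-check. A small sketch, assuming a nominal 8 MHz clock and a signed counter (so the limit is 2^(bits-1) cycles); the function names are made up for this example.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint64_t kCpuHz = 8000000;  // nominal 8 MHz, as in the post

// Whole seconds of emulated time before a signed counter of `bits` bits
// reaches its maximum value.
inline std::uint64_t seconds_until_signed_wrap(unsigned bits) {
    assert(bits >= 2 && bits <= 64);
    const std::uint64_t max_cycles = (std::uint64_t{1} << (bits - 1)) - 1;
    return max_cycles / kCpuHz;
}

// The same duration in whole years of 365 days.
inline std::uint64_t years_until_signed_wrap(unsigned bits) {
    return seconds_until_signed_wrap(bits) / 3600 / 24 / 365;
}
```

For 32 bits this gives 268 seconds; for 64 bits it gives a bit over 36500 years, in line with the figure above (the small difference comes from rounding 268.4 down to 268 in the hand calculation).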

If you really want to stay at 32 bit and wrap yourself, you can do this :
At every VBL, check whether your CPU counter (and all derived counters) is > 1.000.000.000, for example; if so, subtract 500.000.000 from all the cycle counters / event counters. This way delta values remain correct and your counter will stay between 500M and 1000M.
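A minimal sketch of that rebasing step, assuming all absolute times live in one hypothetical `Timing` struct (a real emulator would have to visit every variable that stores an absolute cycle count).

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical container for every absolute cycle value in the emulator.
struct Timing {
    std::int32_t cpu_cycles = 0;
    std::vector<std::int32_t> event_times;  // absolute times of pending events
};

constexpr std::int32_t kRebaseThreshold = 1000000000;
constexpr std::int32_t kRebaseAmount = 500000000;

// Call once per VBL. Returns true if the counters were rebased.
inline bool rebase_if_needed(Timing& t) {
    assert(t.cpu_cycles >= 0);
    if (t.cpu_cycles <= kRebaseThreshold) return false;
    t.cpu_cycles -= kRebaseAmount;
    for (std::int32_t& e : t.event_times) e -= kRebaseAmount;  // keep deltas intact
    return true;
}
```

Because 500.000.000 is a multiple of 10, a `cpu_cycles % 10` E-clock computation gives the same result before and after the rebase.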

Nicolas

ijor
Hardware Guru
Posts: 2904
Joined: Sat May 29, 2004 7:52 pm

Re: The worst hack in Steem

Postby ijor » Sat May 28, 2016 12:17 pm

Steven Seagal wrote:To come back to the first post, an overflowing 32bit integer as CPU cycle counter.

It makes emulation of the E-Clock (1/10 CPU clock) difficult.
Each time we check the E-Clock (to add wait states), we do:
(cpu_cycles % 10), % being "modulo".

It works as long as the counter is positive, but each time it switches sign, we get wrong values.
For example, positive to negative, the last digit goes the wrong way:


I don't understand. Why do you use a signed counter, why not unsigned? Or at least, cast it to unsigned for that operation.

Steven Seagal
Atari God
Posts: 1665
Joined: Sun Dec 04, 2005 9:12 am
Location: Undisclosed

Re: The worst hack in Steem

Postby Steven Seagal » Sat May 28, 2016 4:17 pm

ijor wrote:I don't understand. Why do you use a signed counter, why not unsigned?


It was so in the original Steem.
A signed counter keeps timing comparisons (t2-t1) valid around an overflow.

ijor wrote:Or at least, cast it to unsigned for that operation.


When it overflows, the unsigned integer goes from $..E (14) to 0 but you want a 6 as last digit. Same problem.
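Both points can be checked with a short sketch (hypothetical function names, not Steem code): the difference between two timestamps survives the wrap, but the last decimal digit does not, because 2^32 is not a multiple of 10.

```cpp
#include <cassert>
#include <cstdint>

// Wrap-safe distance from t1 to t2: subtract modulo 2^32, then read the
// result as signed. Valid while the two times are less than 2^31 apart.
inline std::int32_t cycles_between(std::uint32_t t1, std::uint32_t t2) {
    return static_cast<std::int32_t>(t2 - t1);
}

// Last decimal digit of an unsigned 32-bit counter.
inline unsigned unsigned_phase(std::uint32_t cycles) {
    return cycles % 10u;
}

inline void demo_wrap() {
    const std::uint32_t before = 0xFFFFFFFEu;   // last decimal digit is 4
    const std::uint32_t after = before + 2u;    // wraps to 0
    assert(cycles_between(before, after) == 2); // the delta is still right
    assert(unsigned_phase(before) == 4);
    assert(unsigned_phase(after) == 0);         // a continuous count would give 6
}
```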

Steven Seagal
Atari God
Posts: 1665
Joined: Sun Dec 04, 2005 9:12 am
Location: Undisclosed

Re: The worst hack in Steem

Postby Steven Seagal » Sat May 28, 2016 4:31 pm

npomarede wrote:The main difference between 32 and 64 bit mode of the cpu is mainly how much memory you can address directly, but for example on my linux pc, I run in 32 bit mode and there was no speed penalty when using 64 bit counters instead of 32 bit counters.


If all Steem's cycle-related variables, like the table that holds GLUE/Shifter event timings, go 64bit, we could have more cache misses. My strategy (in the 32bit build) is to have as small a memory footprint as possible.

npomarede
Atari God
Posts: 1094
Joined: Sat Dec 01, 2007 7:38 pm
Location: France
Re: The worst hack in Steem

Postby npomarede » Sat May 28, 2016 5:55 pm

Steven Seagal wrote:
npomarede wrote:The main difference between 32 and 64 bit mode of the cpu is mainly how much memory you can address directly, but for example on my linux pc, I run in 32 bit mode and there was no speed penalty when using 64 bit counters instead of 32 bit counters.


If all Steem's cycle-related variables, like the table that holds GLUE/Shifter event timings, go 64bit, we could have more cache misses. My strategy (in the 32bit build) is to have as small a memory footprint as possible.

I really think that should be measured performance-wise; given the complexity of the emulation path, I don't think it would have a noticeable impact. Plus, there's no reason why a 32 bit counter would be in cache and a 64 bit counter would not. On the contrary, data caches hold lines of consecutive bytes, so if 4 bytes are in cache, it's very likely the next 4 are in cache too.

I think it's easier to let the AMD/Intel CPU do all the work and use 64 bit counters (that's what modern CPUs are made for; we should not keep the same reasoning/limitations as 20 years ago, when caches were only 1 or 2 KB) instead of adding the complexity of handling wrapping for all the 32 bit counters, possibly forgetting some cases that will trigger an error sooner or later.

But that's your call to decide, I already made my choice in Hatari :)

Nicolas

ijor
Hardware Guru
Posts: 2904
Joined: Sat May 29, 2004 7:52 pm

Re: The worst hack in Steem

Postby ijor » Sun May 29, 2016 2:39 am

Steven Seagal wrote:If all Steem's cycle-related variables, like the table that holds GLUE/Shifter event timings, go 64bit, we could have more cache misses. My strategy (in the 32bit build) is to have as small a memory footprint as possible.


You can have mixed cycle counters. Make the main counter 64 bits. Keep the other counters that you want at 32 bits. When you compare with those, use only the lower 32 bits of the main counter (actually, as you do now). But use the full 64 bits for the E clock ...

If that is not enough for you, then you can maintain a multi byte counter that is updated only when needed. Say, a 128 bits counter, but the upper bits are updated once per frame, or when you need to for some reason.
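A sketch of the mixed-width idea, using a hypothetical `MixedClock` type: event times stay 32 bits wide and are compared against the low half of the main counter, while the E-clock phase is taken from the full 64 bits.

```cpp
#include <cassert>
#include <cstdint>

struct MixedClock {
    std::uint64_t cycles = 0;  // main counter, full 64 bits

    void add(std::uint32_t n) {
        assert(n < 0x80000000u);  // single steps are small in practice
        cycles += n;
    }

    // Low 32 bits, for comparison with 32-bit event times.
    std::uint32_t low32() const { return static_cast<std::uint32_t>(cycles); }

    // True once a 32-bit event time has been reached. Wrap-safe as long as
    // the event is less than 2^31 cycles away from the current time.
    bool reached(std::uint32_t event_time) const {
        return static_cast<std::int32_t>(low32() - event_time) >= 0;
    }

    // E-clock phase from the full counter, which does not wrap in practice.
    unsigned eclock_phase() const { return static_cast<unsigned>(cycles % 10); }
};
```

Only the main counter grows to 8 bytes here; tables of event timings keep their 32-bit footprint.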

Steven Seagal
Atari God
Posts: 1665
Joined: Sun Dec 04, 2005 9:12 am
Location: Undisclosed

Re: The worst hack in Steem

Postby Steven Seagal » Fri Jun 03, 2016 8:12 am

Thx for the input, but I finally found a way without overhead and without 64bit counters. It just costs two 32bit variables that don't need to be updated each time the main cycle counter is modified.

Each frame and each time the E-clock is read, a new function, RefreshCyclesForEClock(), is called.
It adds the CPU cycles elapsed since the last call to a separate counter, and then does a modulo by 16x10 on it.
The CPU counter may be negative, but the difference (cycles1-cycles0) will always be positive.

Code:

void TM68000::RefreshCyclesForEClock() {
  int cycles1=ACT; // current cycles
  int ncycles=cycles1-cycles0; // elapsed CPU cycles since last refresh
  ASSERT(ncycles>=0);
  cycles_for_eclock+=ncycles; // update counter for E-clock
  ASSERT(cycles_for_eclock>=0);
  cycles_for_eclock%=(10*16); // remove high bits
  cycles0=cycles1; // record current CPU cycles
}


(cycles0 and cycles_for_eclock are the new 32bit member variables)
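For experimenting outside Steem, the same idea can be written as a stand-alone sketch. The struct name and `phase()` helper are hypothetical, `act` stands in for the ACT value, and the subtraction is done in unsigned arithmetic so that the wrap of the main counter is well defined in C++ (an adaptation, not what the member function above does).

```cpp
#include <cassert>
#include <cstdint>

struct EClockCycles {
    std::int32_t cycles0 = 0;            // main counter value at last refresh
    std::int32_t cycles_for_eclock = 0;  // always kept in [0, 160)

    // `act` is the current value of the wrapping signed 32-bit main counter.
    void refresh(std::int32_t act) {
        // Elapsed cycles modulo 2^32; correct across a sign change as long
        // as refreshes are less than 2^31 cycles apart.
        const std::uint32_t elapsed =
            static_cast<std::uint32_t>(act) - static_cast<std::uint32_t>(cycles0);
        assert(elapsed < 0x80000000u);
        const std::uint32_t sum =
            static_cast<std::uint32_t>(cycles_for_eclock) + elapsed % 160u;
        cycles_for_eclock = static_cast<std::int32_t>(sum % 160u);
        cycles0 = act;
    }

    // E-clock phase: 160 is a multiple of 10, so the last digit is preserved.
    int phase() const { return cycles_for_eclock % 10; }
};
```

Calling `refresh` with values on either side of the sign change (for example 2147483646, then INT32_MIN) keeps the phase sequence 6, 8, 0 intact.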

