
Code:
x	set	0
	rept	10
	movem.l	d0-d7/a2-a6,x(a0)	* 13 registers = 4*13 bytes per block
x	set	x+(4*13)
	endr
Code:
bydo_sheild_tab rept 15
Code:
rept 228
movem.l (a7)+,d0-d7/a0-a3 *108T (12+12*8) preload 16+8 colors
* complete P0 and first half of P1
movem.l d1-d7,(a5) * 64T * P0 col 2-15
move.w d0,(a4) * 8T * P0 col1 , 15 colors of P0 written
swap d0 * 4T
* 184T so far
*boe
move.w d0,(a6) * 8T update color 0 of P0 at exact line start
movem.l (a7)+,d0-d3 * 12+4*8 = 44T - second half of P1
movem.l a0-a3,(a6) * 40 T first half of P1 to shifter at px 56
move.l usp,a3 * 4T px
movem.l d0-d3,(a3) * 40T second half P1 at px 100
movem.l (a7)+,d0-d7 * 76T , complete P2 pref. at px 140
* movem.l d0-d7,(a6) * 72T all 16 colors P2 at px 216
* Instead, do it in 2 steps
movem.l d0-d3,(a6) * 8+8x4=40
movem.l d4-d7,(a3) * 40T
* 20T states free here for :
* addq.l #1,d0 * 8T
asr.l #6,d0 * 20T
*bos
* move.w $40.w,(a6) * 16T need 312T from boe ( move.w a5,(a6) ) - here relative +8T wr.
* nop
moveq #0,d0
move.w d0,(a6)
nop
* Total 184+312+16=512 - superisha !
endr
AtariZoll wrote: It works well and smoothly at 8 MHz, so there is no need to push it further.
AtariZoll wrote: Btw, unrolling is more efficient even with cache, especially when looping over short code: there is no dbf or other jump back to the loop start, which just eats CPU time while doing nothing "useful".
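To make the loop-overhead point concrete, here is a minimal sketch in Devpac-style 68000 assembly. The routine names clear_looped and clear_unrolled are hypothetical and not from the game source. The cycle counts are nominal 68000 figures; on the ST bus, instruction times are typically rounded up to a multiple of 4 cycles, so the real gap is, if anything, a little larger.

```asm
* Hypothetical illustration, not taken from the game source.
* Both routines clear 16 longwords starting at (a0).

clear_looped:
	moveq	#0,d0
	moveq	#16-1,d1		* dbf runs count+1 times
.clr	move.l	d0,(a0)+		* 12 cycles of useful work
	dbf	d1,.clr			* 10 cycles each time the branch is taken
	rts

clear_unrolled:
	moveq	#0,d0
	rept	16
	move.l	d0,(a0)+		* 12 cycles, no loop overhead at all
	endr
	rts
```

With these nominal timings the looped body costs 16*12 + 15*10 + 14 = 356 cycles against 16*12 = 192 cycles unrolled, at the price of 64 bytes of code instead of 4 for the loop body.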
AtariZoll wrote: Anyway, more interesting is why you had problems at 16 MHz on the Mega STE. Maybe it is cache sensitive. I need to check it again on the MSTE.
I already modded this game: I replaced the ICE-packed files with a better packer that depacks much faster. I really don't get why ICE is used so much (it is seen even in some commercial releases). http://atari-forum.com/viewtopic.php?f= ... e&start=50
Maybe we could do a new release for floppy users: I will provide you with the repacked files and the depacker source, so you can add it to the source file and assemble it.
I really have too many things to work on currently, and you have already fixed the source.
Code:
dc.l .data+x+(2*4)*60)
Code:
dc.l .data+x+(2*4)*60
fenarinarsa wrote: After some testing I found that the game hangs when running at more than 8 MHz because it uses a peculiar way of doing vsync:
- first, Timer B is used to change the hardscroll parameters in the lower part of the screen to display the status bar (status_tb)
- then a new Timer B opens the lower border (lower_tb1)
- then, around 28 lines later, a new Timer B in event mode (.ltb2) decrements vbl_count, which is used by a function called "sync" that the game uses to synchronize the start of a new screen buffer.
(There is a "vsync" function, but it's not used in-game, only when the system VBL functions are active, for the stage loader I guess.)
Of course, at 16 MHz and above the lower border is not opened, so vbl_count is not decremented and the game hangs in the sync function.
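The hang described above follows the usual pattern of a wait loop polling a counter that only an interrupt changes. The sketch below is a hypothetical reconstruction, not the game's actual code: only the names sync and vbl_count come from the post, the handler name tick is made up, and the MFP is assumed to be in software end-of-interrupt mode.

```asm
* Hypothetical sketch, not the game's actual routines.

sync:
	move.w	#1,vbl_count		* arm the counter
.wait	tst.w	vbl_count		* has the interrupt decremented it yet?
	bne.s	.wait			* spins forever if that interrupt never fires
	rts

* Interrupt side (the last Timer B handler in the description above).
tick:
	subq.w	#1,vbl_count		* releases the wait loop in sync
	bclr	#0,$fffffa0f.w		* clear Timer B in-service bit (MFP ISRA)
	rte

vbl_count:
	dc.w	0
```

If the chain of Timer B handlers breaks before tick is reached, nothing ever changes vbl_count and sync never returns, which matches the reported hang. Decrementing the counter from a handler that does not depend on the border timing avoids that dependency.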
Cyprian wrote:...
In the case of Hatari/Steem, 16 MHz means: 16 MHz CPU and 4 MHz memory bus.
In the case of the Mega STE, 16 MHz means: 16 MHz CPU and 2 MHz memory bus (exactly the same as a standard ST).
What does this mean for opening a border? Under an emulator, a NOP takes 4 low-res pixels on an 8 MHz CPU and 2 low-res pixels on a 16 MHz CPU.
On the Mega STE, the bottom border should open properly because the memory bus speed is the same: a NOP takes 4 low-res pixels on either a 16 MHz or an 8 MHz CPU.
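The pixel figures quoted above can be checked with a short calculation, assuming a nominal 8 MHz low-res pixel clock and a 4-cycle NOP that is not slowed down by the memory bus (both are assumptions of this sketch, not measurements):

```latex
\[
\text{pixels per NOP} = 4\ \text{cycles} \times \frac{f_{\text{pixel}}}{f_{\text{CPU}}}
\]
\[
f_{\text{CPU}} = 8\ \text{MHz}: \quad 4 \times \frac{8}{8} = 4\ \text{pixels}
\qquad
f_{\text{CPU}} = 16\ \text{MHz}: \quad 4 \times \frac{8}{16} = 2\ \text{pixels}
\]
```

The second case only holds when the instruction fetch itself runs at full CPU speed. If every fetch has to wait for ST RAM, as described for the Mega STE without cache, the NOP still takes the time of 4 pixels.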
AtariZoll wrote: I would not agree with that. There are two kinds of memory in the Mega STE: regular ST RAM and cache RAM. If the CPU addresses a location that is already in the cache, it is accessed 2x faster than when it is not in the cache.
fenarinarsa wrote: You're right, 16 MHz on the Mega STE actually makes no sense unless you enable the 16k associative cache (you get only a ~3% performance boost, I think), because the CPU has to wait for the bus. With the cache enabled it's a different story. It does not give a 100% boost unless you optimize your code to fit in the 16k cache, which is quite hard since it caches everything, instructions and data, but you get at least a good performance gain. My tests showed that it can be as good as +80%.
Greenious wrote: It all depends on circumstances. Typically it is ST RAM access that holds the CPU back; ROM access and many hardware registers are faster. Most instructions on the 68k are also typically not "slow enough" to really benefit from a faster clock, since they access ST RAM frequently enough for that to hold them back (i.e. they don't spend much time inside the CPU "contemplating" the data, with mul/div instructions the great exception). So yes, a 3-5% speed boost on a game/demo is probably about as good as it gets.
But you get a much better speed boost in TOS than in games/demos when running at 16 MHz without cache, since much of it runs from ROM, and it is noticeable. Perhaps ~30% on average in desktop applications.
I do agree though: 16 MHz without cache is not worth the effort imho, too much work for very little gain. TT/Fast RAM changes the equation, but it only works with software that is Fast RAM aware, which basically no game is.
AtariZoll wrote: ROM access on the ST is not faster, and neither are the HW registers. Actually some are slower because of extra wait states.
And at 8 MHz it can't be faster, since RAM access is already at full speed.
Did you perform an actual ROM speed test on a Mega STE? I think that even if ROM works at the same speed as ST RAM, with cache it will be faster by up to some 80%, depending on the code executed. And there is really no point in talking about using 16 MHz on the MSTE without cache: nobody uses it. Maybe some really rare software that crashes with the cache on can benefit a little.
Greenious wrote:
Well, obviously you'll need to generate a faster DTACK and use faster EPROMs than those Atari originally supplied, but no, the ST RAM and the rest of the bus work somewhat independently, so you can indeed access ROM and things like TT/Fast/alt-RAM independently and magnitudes faster than 8 MHz. And I did not claim all hardware registers were faster; indeed some, as you point out, are slower. (Even the ICD AdSpeed back in the day included an option to use "fast ROM access", dependent on you fitting fast enough EPROMs of course.)
But swing by the hardware forum and read up on Exxos' 16 MHz booster; I'm sure you'll learn a thing or two.
fenarinarsa wrote: Here's my fixed version (source code only):
- fixed syntax errors in the original source code
- removed the (useless?) lower border opening for the sync function
Now the game works at any CPU frequency. Here's a capture on a Mega STE running at 8 and 16 MHz:
https://www.youtube.com/watch?v=lwUUaoFkS9g