Quiz of the week: Single cycle accuracy code.

Postby ijor » Fri Oct 14, 2016 2:15 am

Hope, after some warm up, I'm bringing something interesting this time :)

It is commonly said that the ST, and 68000 code in general, has a granularity of two cycles. According to that concept, it is possible to perform reads and writes with an accuracy and granularity of two cycles, but no finer.

This is mainly because, according to all references, no 68000 instruction takes an odd number of cycles; it is always an even number. The reason for this is the microcode's timing. A microblock (a microcode instruction) normally takes two or four cycles. No microblock ever takes just a single cycle.

The ST is further constrained by the famous round-up-to-four rule. Any access to main RAM, or to the Shifter, is aligned to a four-cycle boundary. But this affects only RAM and Shifter accesses. For instance, code running from ROM is not constrained to the four-cycle granularity. And even code running from RAM can, with clever pairing tricks, perform external bus cycles with a two-cycle granularity.
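
As a rough illustration of the rounding rule, here is a minimal Python sketch. The names `align4` and `bus_start`, the target labels and the sample cycle values are made up for the example; they are not taken from any emulator or from the hardware documentation.

```python
def align4(cycle):
    """Round a cycle count up to the next multiple of four."""
    return (cycle + 3) & ~3


def bus_start(cycle, target):
    """Cycle at which a bus access requested at `cycle` actually starts.

    Assumption for this sketch: only 'ram' and 'shifter' accesses are
    aligned by the MMU; any other target (e.g. 'rom') starts right away.
    """
    if target in ("ram", "shifter"):
        return align4(cycle)
    return cycle


for requested in (8, 10):
    print(requested, bus_start(requested, "ram"), bus_start(requested, "rom"))
```

A request at cycle 10 is delayed to cycle 12 for RAM, but not for ROM, which is why ROM code keeps the two-cycle granularity.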

But code can never execute with single-cycle granularity, whether it runs from ROM or from RAM … or can it?

So what if we want to perform a write at an exact cycle? Is that possible? What if, say, we want to change the video frequency from 50 Hz to 60 Hz for an odd number of cycles? That would require writing to GLUE once at cycle n, and then again at cycle n+i, where i is an odd number.

Is that possible? Is it possible only for code running from ROM? Or is it also possible for code running from RAM? If it is possible, how, and why?

Note that GLUE is just an example. It could be something else, as long as it is not the Shifter (or RAM). I selected GLUE precisely because it could be interesting. But the question is generic: is it possible to perform reads and writes with full cycle accuracy?

Quiz of the week, or maybe of the month … :)

Postby Steven Seagal » Fri Oct 14, 2016 9:49 am

Other quizzes were of the day.
When R/W to PSG ($FF8800), there seems to be a systematic wait time of 1 cycle.
I win, I win!

Postby troed » Fri Oct 14, 2016 10:10 am

ijor wrote:Hope, after some warm up, I'm bringing something interesting this time :)


So I bought a Playstation VR yesterday, and it's absolutely awesome.

Yet I want to read what you write even more than I want to get back to my house and play!!!! :D

/Troed


Postby npomarede » Fri Oct 14, 2016 3:42 pm

Is your question a rhetorical question, or should we understand that the answer is already "yes"? :)
Maybe using "TAS" at some point? I don't remember its microcode, but I wonder whether it performs some bus decision at an odd cycle.
TAS is famous for making the Amiga crash (due to a conflict with the hardware accessing the bus at the same time as the CPU), so maybe on the Atari it has a "positive" effect? :)

Nicolas

Postby alien » Sun Oct 16, 2016 6:07 pm

For some reason, when you asked this question, it brought to mind that I always wondered whether there might be a way to change the video fetch address mid-screen. I tried to find a means in software, but never succeeded. That's something you could check when you decap/reverse-engineer the MMU.

If there are instructions that take 2n+1 cycles on a 68000, there may be a way to exploit them, even if RAM doesn't allow fetches at odd cycles, but ROM does. If the ROM contains some sequence "odd-cycle causing instruction(s), ... move to memory-address, ... return", one can call it. Even if there are additional instructions, for example before the return, it doesn't matter, as long as one can mitigate any bad consequences they might have.

I agree with Steven that if there is a way to block bus access for a cycle, then the 68000 would most likely have to stall that long. So there might be a way to stall it without stalling the other components, producing the equivalent of a 2n+1 cycle instruction as far as the other components are concerned.
Alien / ST-Connexion

Postby ijor » Sun Oct 16, 2016 10:49 pm

npomarede wrote:Is your question a rhetorical question, or should we understand that the answer is already "yes" ? :)


The answer is, of course, yes. :) As with the previous quizzes, I know the answer beforehand. And yes, I wouldn't ask if I knew the answer was no, haha.

Maybe using "TAS" at some point? I don't remember its microcode, but I wonder whether it performs some bus decision at an odd cycle.


I admit I don't remember performing many tests with the TAS instruction. But I don't think there is anything "odd" about TAS. I mean, it is a very special instruction. But there is nothing odd in the sense of an odd (not divisible by 2) number of cycles.

Steven Seagal wrote:Other quizzes were of the day. When R/W to PSG ($FF8800), there seems to be a systematic wait time of 1 cycle. I win, I win!


Wow, didn't expect it so fast. Yeah ... you win, congratulations! If you knew it, then why is it not implemented in Steem? :)

I'll post a detailed answer later.

Postby ijor » Sun Oct 16, 2016 10:59 pm

alien wrote:For some reason, when you asked this question, it brought to mind that I always wondered whether there might be a way to change the video fetch address mid-screen. I tried to find a means in software, but never succeeded. That's something you could check when you decap/reverse-engineer the MMU.


Do you mean writing to that register, as you can on the STE?

I don't think there is any way. The Video Pointer in the MMU is just a counter. The only other function besides incrementing is that a Vsync edge causes the higher bits to be reloaded from Vbase and the lower bits to be cleared.

So the only possible way seems to be to provoke a Vsync edge ...

Postby alien » Mon Oct 17, 2016 1:27 am

ijor wrote:I don't think there is any way. The Video Pointer in the MMU is just a counter. The only other function besides incrementing is that a Vsync edge causes the higher bits to be reloaded from Vbase and the lower bits to be cleared.

So the only possible way seems to be to provoke a Vsync edge ...


Interesting. We'd need a very very short Vsync pulse so that the monitor/TV doesn't actually Vsync. The advantage would be that things like the Spreadpoint demo, or Shadow of the Beast parallax would become trivial. 8).
Alien / ST-Connexion

Postby npomarede » Mon Oct 17, 2016 9:16 am

ijor wrote:
Steven Seagal wrote:Other quizzes were of the day. When R/W to PSG ($FF8800), there seems to be a systematic wait time of 1 cycle. I win, I win!


Wow, didn't expect it so fast. Yeah ... you win, congratulations! If you knew it, then why is it not implemented in Steem? :)

I'll post a detailed answer later.

Hi,

Funny that it works this way, because many years ago, when trying to add the correct number of wait states in Hatari, after checking several combinations of move.w, move.l, movem and so on, I found that the model that best fitted all cases was to add a 1-cycle wait state per access.

    M68000_WaitState(1);    /* [NP] FIXME not 100% accurate, but gives good results */

As I thought that an odd number of wait states was not possible, I marked this as "FIXME" while waiting for a better model / solution. Seems I can leave it this way in the end :)
Also, from my tests, move works with the "1 cycle" rule, but clr does not.

For example:
movep.l d6,0(a5) will take 24+1+1+1+1 = 28 cycles
clr.b (a1) will take 20 cycles, and not 12+1+1=14 -> 16 cycles (this is because clr does a read before the write)
So the rule can be a little more "complex" when not using a simple move.b
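
The arithmetic of the two examples above can be written down as a small Python sketch. It only encodes the naive rule as described here (nominal cycles, plus 1 per PSG access, total rounded up to four); the function name `naive_psg_cycles` is invented for the illustration and this is not Hatari's actual code.

```python
def naive_psg_cycles(base_cycles, psg_accesses):
    """Naive estimate: nominal time + 1 wait state per PSG access,
    with the total rounded up to a multiple of four."""
    total = base_cycles + psg_accesses
    return (total + 3) // 4 * 4


print(naive_psg_cycles(24, 4))  # movep.l d6,0(a5): 28, as measured
print(naive_psg_cycles(12, 2))  # clr.b (a1): 16, although 20 was measured
```

The mismatch for clr.b shows that adding the wait states to the instruction total is not enough; where each bus cycle falls inside the instruction matters too.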

Nicolas

Postby ijor » Mon Oct 17, 2016 9:51 am

Essentially, the main point is that the 68000 doesn't constrain the number of wait states to be even.

Instructions always take an even number of cycles, but this is only the minimum. Every microcode block that completes a bus cycle can be extended by any number of cycles, for as long as the system generates wait states. So if the system inserts, say, 3 wait states, a bus cycle that would normally take 4 cycles takes 7. After that bus cycle, the CPU continues executing from odd cycles, at least until more wait states change the alignment once again.

In the case of the ST, GLUE inserts just a single wait state when accessing the PSG audio chip. As a result, bus cycles accessing the PSG take 5 clock cycles, and hence the next bus cycle starts at an odd cycle. Of course, if the next bus cycle happens to access main RAM (or the Shifter), such as when performing a prefetch, then the MMU aligns the CPU back to a four-cycle boundary.

But any bus cycle(s) performed in between, after a PSG access and before any RAM access (including prefetch), will be executed at an odd cycle!
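
A minimal Python sketch of that point, assuming a bus cycle is simply 4 clocks plus however many wait states the system inserts. The constant and function names are invented for the illustration.

```python
BASE_BUS_CYCLE = 4  # minimum length of a 68000 bus cycle, in clock cycles


def bus_cycle_end(start, wait_states):
    """Clock cycle at which a bus cycle starting at `start` completes."""
    return start + BASE_BUS_CYCLE + wait_states


end_three_ws = bus_cycle_end(0, 3)  # the 3-wait-state example from the text
end_psg = bus_cycle_end(0, 1)       # PSG access: a single wait state

print(end_three_ws, end_three_ws % 2)
print(end_psg, end_psg % 2)
```

In both cases the bus cycle ends on an odd clock, so whatever the CPU does next starts on an odd clock as well.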

Postby ijor » Mon Oct 17, 2016 10:06 am

npomarede wrote:Also from my tests, move will work with the "1 cycle" rule, but not clr.

clr.b (a1) will take 20 cycles, and not 12+1+1=14 -> 16 cycles (this is because clr does a read before the write)
So the rule can be a little more "complex" when not using a simple move.b


The rule is perfect. You are applying it incorrectly. CLR has the prefetch in the middle, between the operand read and the operand write. So it is actually 12+1+3+1=17 -> 20 (if not paired):

    Read operand:    4 + 1 (wait state)
    Prefetch:        3 (MMU align) + 4
    Write operand:   4 + 1 (wait state)
    Next bus cycle:  3 (MMU align) ...

The last line applies only if the instruction is NOT PAIRED with the next one! Otherwise there is just 1 cycle of alignment instead of 3.
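
The breakdown above can be replayed with a tiny bus-cycle walker in Python. This is only a sketch of the model described in this thread: RAM accesses wait for the next multiple of four and take 4 cycles, PSG accesses start immediately and take 4 + 1. The function name `run` and the target labels are invented, and pairing is not modelled.

```python
def run(bus_cycles, start=0):
    """Return the start cycle of each bus cycle and the final end cycle."""
    t = start
    starts = []
    for target in bus_cycles:
        if target == "ram":
            t = (t + 3) // 4 * 4  # MMU alignment to a four-cycle boundary
            length = 4
        elif target == "psg":
            length = 5            # 4 cycles + 1 wait state from GLUE
        else:
            raise ValueError(target)
        starts.append(t)
        t += length
    return starts, t


# clr.b (a1) with a1 pointing at the PSG: read operand, prefetch,
# write operand, then the first RAM access of the next instruction.
starts, end = run(["psg", "ram", "psg", "ram"])
print(starts)                 # [0, 8, 12, 20]
print(starts[-1] - starts[0])  # 20
```

The next instruction cannot start its RAM access before cycle 20, which matches the measured 20 cycles for the unpaired case.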

Postby npomarede » Mon Oct 17, 2016 10:09 am

You're right, clr does a prefetch between the read and the write, which causes a rounding to 4 cycles.

Regarding these rounding cases: when I wanted to compare a real STF with Hatari for code running from ROM (no bus rounding), and to check whether accesses to some I/O registers were rounded to 4 cycles or not (for the Shifter, for example), I needed to find some code in TOS ROM to jump to that would let me do a move, then return to my program in RAM and measure the whole execution time. It turned out that it's quite hard to find simple pieces of code in ROM that just do something like:

    move.w Dx,(ax)
    rts

I found some places that did 4 writes, or several reads, but if we wanted to find a common piece of code in all TOSes that could be called from a program running in RAM to perform 1 write to FF8201 and 1 write to another I/O register, for example, then I'm not sure it would be that easy :(
For example, in TOS 1.00:

    $00fc230c : 10d9        move.b (a1)+,(a0)+
    $00fc230e : 51c8 fffc   dbra d0,$fc230c
    $00fc2312 : 4e75        rts

But you need to read from A1, which is likely to point to RAM if you want to supply your own value, so this will add a bus wait.

Nicolas

Postby troed » Mon Oct 17, 2016 10:35 am

ijor wrote:But any bus cycle(s) performed in between, after a PSG access and before any RAM access or prefetch, will be executed at an odd cycle!


Thanks as always for your research! Have you already seen a possibility to use this for something new? (My first thought would be to revisit my lower border tests in mono, for example)

/Troed

Postby ijor » Mon Oct 17, 2016 11:22 am

npomarede wrote:Regarding these rounding cases, when I wanted to compare real STF with Hatari regarding code running from ROM (no bus rounding) and whether some IO regs were rounded to 4 cycles or not (for shifter for example), I needed to find some code in TOS ROM to jump to ...


Nice idea! LOL. There was a time that I wanted to build a diag cartridge with flashable ROM. That would do it for your purposes.

I found some places that did 4 writes, or several reads, but if we wanted to find a common piece of code in all TOSes that could be called from a program running in RAM to perform 1 write to FF8201 and 1 write to another IO regs for example, then I'm not sure it would be that easy


Yep, certainly not easy. But we might not need it; the following code can be run from RAM:

    lea PsgAddr+1,A0
    lea GlueAddr,A1

    move.b D0,-(A0)          ; This mode makes the write the last bus cycle, AFTER the prefetch
    move.b D1,(A1)           ; This mode writes BEFORE the prefetch


The above sequence will do the trick. There is no prefetch between the two I/O bus cycles. The write to GLUE immediately follows the write to PSG. BUT, this only gives you the possibility of writing at a +1 cycle. That is, at the cycle right after one aligned to a four-cycle boundary.

Unfortunately this won't work for writing at a +3 cycle. I cannot find a combination of instructions that would produce that kind of alignment. So this method doesn't allow writing at any arbitrary cycle. Of course, from ROM it is easy. But I have YET another idea for doing this from RAM ... :)
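
To see where the +1 comes from, the two move instructions can be traced as a list of bus cycles in Python. This is a sketch under stated assumptions: the `TARGETS` table and `start_cycles` are invented names, the first prefetch is assumed to start on an aligned cycle, and the 4-cycle length for a GLUE access with no wait state is an assumption of the sketch rather than something stated in this thread.

```python
# (aligned by MMU?, length in cycles) for each kind of bus cycle
TARGETS = {
    "ram":  (True, 4),   # prefetch from RAM, rounded up to four
    "psg":  (False, 5),  # 4 cycles + 1 wait state
    "glue": (False, 4),  # assumed: no alignment, no wait state
}


def start_cycles(sequence, t=0):
    """Return (name, start cycle) for each bus cycle in order."""
    out = []
    for name in sequence:
        aligned, length = TARGETS[name]
        if aligned:
            t = (t + 3) // 4 * 4
        out.append((name, t))
        t += length
    return out


# move.b D0,-(A0): prefetch, then PSG write
# move.b D1,(A1):  GLUE write, then prefetch
for name, t in start_cycles(["ram", "psg", "glue", "ram"]):
    print(name, t, t % 4)
```

In this model the GLUE write starts at cycle 9, one cycle past the four-cycle boundary, and the following prefetch is pulled back to cycle 16.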

troed wrote:Thanks as always for your research! Have you already seen the possibility to use this for something new? (My first thinking would be to revisit my lower border tests in mono, for example)


Nothing concrete yet. It crossed my mind that it might help with the issue that a couple of things don't work reliably in one of the wait states. But I'm not sure it would solve the problem.

Edit: Fixed some language errors.

Postby Steven Seagal » Mon Oct 17, 2016 7:58 pm

ijor wrote:Wow, didn't expect it so fast. Yeah ... you win, congratulations!


Some other aging B movie action stars could be slower, but with Steven Seagal, the response is always fast and brutal. You know!

If you knew it, then why is it not implemented in Steem? :)


It's a recent development and still under test (current beta v3.8.3), not known as fact (until now). :)

In the case of the ST, GLUE inserts just a single wait state when accessing the PSG audio chip.


I thought it had to do with the different clocks (2 MHz vs 8 MHz).

Postby ijor » Tue Oct 18, 2016 10:31 am

Steven Seagal wrote:
ijor wrote:In the case of the ST, GLUE inserts just a single wait state when accessing the PSG audio chip.

I thought it had to do with the different clocks (2 MHz vs 8 MHz).


Not really, or only very indirectly related to the slower clock, if you want. The PSG has an asynchronous bus interface. The clock is mostly used for the sound processing. The wait states are needed because it is a slower device that needs more time (not exactly more cycles) to process bus operations.

And anyway, the wait states have to be generated externally. It's not like the ACIAs, which are connected to the E clock.

Postby leonard » Tue Oct 25, 2016 8:52 pm

Ijor, that's just awesome :) I mean, I have the feeling that I can learn new stuff about the ATARI by reading the Atari forum until the end of my life :)

I implemented the strange PSG access timing in SainT using a very empirical rule, and I never suspected the "+1" wait state. In my wrong brain, it *should* be only 4 cycles, not 1, so how did I get the MOVEM-to-PSG timing right? I did many tests and in the end I empirically found that there was a 4-cycle delay every 4 PSG accesses.
I could never have imagined a "+1" cycle myself, never. That's awesome :)
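
As a quick sanity check of why an empirical "4 cycles per 4 accesses" rule can agree with the "+1" model, here is a small Python sketch. It assumes n back-to-back PSG accesses that start on an aligned cycle and are followed by a RAM access; the function name `extra_cycles` is invented, and this is not SainT's code.

```python
def extra_cycles(n_psg):
    """Extra cycles for n back-to-back PSG accesses followed by a RAM
    access, compared with n plain 4-cycle accesses."""
    end = 5 * n_psg               # each access: 4 cycles + 1 wait state
    aligned = (end + 3) // 4 * 4  # the following RAM access is aligned
    return aligned - 4 * n_psg


for n in range(1, 9):
    print(n, extra_cycles(n))
```

Under these assumptions the overhead is 4 cycles for 1 to 4 accesses and 8 cycles for 5 to 8, so the delay grows in steps of 4 cycles per group of 4 accesses.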

troed: regarding all your work about borders opening and the shifter/mmu state machine, do you think you can make a new fullscreen line using that new brainblasting trick?

Other question for ATARI demo fans: do you think some existing demo fullscreen instability could be explained by PSG access at the wrong place?
Leonard/OXYGENE.

Postby troed » Wed Oct 26, 2016 7:21 am

leonard wrote:troed: regarding all your work about borders opening and the shifter/mmu state machine, do you think you can make a new fullscreen line using that new brainblasting trick?


On my todo list (unfortunately way too long for the retro time I have) is to see whether the problems with my "new & improved" routine as used in Closure (only working in one out of two Shifter substates in WS2) could be solved or alleviated with a +1 trick. The main problem, at least for the demo, might be that I'm all out of cycles for "setup instructions", but it could at least serve as a proof of concept.

What I will do is probe all positions (already done for low/high at all even cycles) and see whether the possibility to now switch at 0,1,2 [4,5,6...] helps with some edge cases. Another item, as mentioned above, is to see whether the lower border in mono becomes possible, although we should be able to just do the math for that.

/Troed

