ST Chipset decap


ijor
Hardware Guru
Posts: 3817
Joined: Sat May 29, 2004 7:52 pm

Re: ST Chipset decap

Postby ijor » Sun Jan 07, 2018 1:18 pm

Steven Seagal wrote: Are you saying that the GLUE "M2CLOCK" can awake at an arbitrary 0-15 tick (~32 MHz) on the STF relative to DCYC, and not always on the same n, n+4, n+8 or n+12 tick?


No, it isn't that bad either. It will (or should) trigger always within the same cycle.

MarkP
Atarian
Posts: 5
Joined: Tue Jul 31, 2018 10:22 pm

Postby MarkP » Wed Aug 01, 2018 1:59 am

Wow... the ultimate level of hacking, peering right into the microstructure of the chips themselves, and using that to solve one particular longstanding enigma of how data moves through the shifter (that I've wondered about myself right since reading the classic ST Internals book way back in the day and seeing its rather sparing, mostly code-level description of how it appears to work from the outside).

I'm not sure if I'm reading it right, but does the shift matrix diagram (including that brilliant but kooky 2-bit-per-cell design that rolls the FIFO/IR into the shiftreg/RR) imply that it loads in 4 words just-in-time (except in the case of the every-16-pixels glitch) for the previous 4 to have run out in low rez, shunts them all down / shifts out the first pixel / starts loading the first word of the next block of four in more or less the one motion, and places four bits off the far right end of what are effectively 4 parallel, single tapped, 1-bit shift regs with each shift...? And for medium it only loads in two words at a time, twice as often because the individual bits in each effectively-parallel-1-bit reg are then shifted out at twice the frequency, with the lower 2 bits then always being zero so only 4 colours can be produced despite 4 bits technically being shifted out... and for hi-rez it only loads in a single word, and performs the internal transfer 4x as often as in low rez, with only the uppermost output ever changing?

Versus the old model I remember seeing drawn up in the past where the IR and RR are separate 1-bit deep, 64-bit wide registers with four taps arranged every 16 bits along the RR (like a more regularly spaced set of LFSR taps) and the output of *those* changing as the RR is rotated right with low using all four, medium using the second and fourth, and mono only using the fourth... with all modes loading four words into the IR at a time then dumping the full 64 bits down into the RR once full, at the exact same speed, with only the speed of rotation and number of taps changing (?)

What might that (and the actual clock divider structure) then mean for the possibility of tweaking the shifter for additional modes, both from a fantasy-historical and an actual (FPGA?) recreation basis? Easier, harder?

I'm thinking in terms both of adjusting the bitdepth vs frequency relationship (e.g. 3bpp at 32/3 MHz, or 5bpp/6bpp at 32/5, 32/6MHz with extended shifter matrix and palette depth, the latter being similar to the pixel clock used by a lot of 80s consoles plus computers like the MSX and the former being twice that, so 256 or 512 pixels would just hit overscan and 240/480 pixels for a slightly coarse 40 or 80 columns of text would just fit within the underscan limits... as well as a half Hrez 2bpp mode in mono), and aping the Amiga by changing the CPU (or Blitter, DMA controller, etc) vs shifter bandwidth relationship (as mediated by a slightly twiddled MCU)... so the speed available to the rest of the system could be sacrificed to give more colour and/or resolution (eg mono in full rez and 2bpp/half rez and 4bpp, medrez in 3 or 4bpp, original lowrez in 5 or 6 (or 8?) bpp, intermediate rez at 4bpp/reduced rez at 8bpp...), or colour and/or rez could be sacrificed to speed up the rest of the machine (especially making more use of the blitter and other DMA without slowing the CPU) by the MCU giving the shifter fewer cycles and the shifter running with a higher divider or shallower matrix depth... might be able to pull off some decent fresh tricks with that additional speed without the lower rez or colour set being immediately obvious...

(on the other hand, maybe the setup actually makes it harder to alter in that way vs the old assumptions?)

Smonson
Obsessive compulsive Atari behavior
Posts: 134
Joined: Sat Feb 20, 2016 9:45 am
Location: Canberra

Postby Smonson » Wed Aug 01, 2018 9:23 am

MarkP wrote:I'm thinking in terms both of adjusting the bitdepth vs frequency relationship (e.g. 3bpp at 32/3 MHz, or 5bpp/6bpp at 32/5, 32/6MHz with extended shifter matrix and palette depth


That's an interesting idea... it would be achievable with the FPGA shifter hardware I've already built. 256x200x32 seems like a decent trade-off.

Luckily, the shifter has 5 address lines, so there are 15 empty register slots that aren't currently used. At least one more register would be needed to access the additional palette slots.
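
A quick back-of-the-envelope check of the numbers in this exchange, as a sketch only. It assumes the stock layout of sixteen palette registers plus one resolution register inside the 32-word window given by 5 address lines, and that the video bit rate stays what it is in the stock modes (320 pixels at 4 bpp with an 8 MHz pixel clock). The function names are made up for illustration.

```python
from fractions import Fraction

MASTER_CLOCK_MHZ = 32   # Shifter master clock
ADDRESS_LINES = 5       # as stated in the post above
PALETTE_REGS = 16       # stock ST palette entries (assumed layout)
RESOLUTION_REGS = 1     # the shift mode register (assumed layout)

def free_register_slots():
    """Word-sized register slots left unused in the Shifter address window."""
    total = 2 ** ADDRESS_LINES
    return total - PALETTE_REGS - RESOLUTION_REGS

def mode(bpp):
    """Pixel clock (MHz), pixels per line and colours at a constant bit rate."""
    pixel_clock = Fraction(MASTER_CLOCK_MHZ, bpp)
    stock_bits_per_line = 320 * 4          # low rez: 320 pixels at 4 bpp
    pixels = Fraction(stock_bits_per_line, bpp)
    return pixel_clock, pixels, 2 ** bpp

print(free_register_slots())   # unused slots
print(mode(5))                 # the 5 bpp case discussed above
```

Under these assumptions, 5 bpp gives a 6.4 MHz pixel clock, 256 pixels per line and 32 colours, which is the 256x200x32 trade-off. The 16 additional palette entries do not fit into the 15 free slots, which is consistent with needing at least one more register to reach them.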

ijor
Hardware Guru
Posts: 3817
Joined: Sat May 29, 2004 7:52 pm

Postby ijor » Wed Aug 01, 2018 8:19 pm

MarkP wrote:I'm not sure if I'm reading it right, but does the shift matrix diagram (including that brilliant but kooky 2-bit-per-cell design that rolls the FIFO/IR into the shiftreg/RR) imply that it loads in 4 words just-in-time (except in the case of the every-16-pixels glitch) for the previous 4 to have run out in low rez, shunts them all down / shifts out the first pixel / starts loading the first word of the next block of four in more or less the one motion, and places four bits off the far right end of what are effectively 4 parallel, single tapped, 1-bit shift regs with each shift...?


To be honest, I'm not 100% sure I understand exactly everything in that very long sentence :), but mostly yes. That's how hardware works, in parallel. There is no "penalty" or "risk" for doing things in the precise last cycle. Actually it is harder and more expensive (requires more transistors) to do it otherwise.

And for medium it only loads in two words at a time, twice as often because the individual bits in each effectively-parallel-1-bit reg are then shifted out at twice the frequency, with the lower 2 bits then always being zero so only 4 colours can be produced despite 4 bits technically being shifted out... and for hi-rez it only loads in a single word, and performs the internal transfer 4x as often as in low rez, with only the uppermost output ever changing?


The transfer from the FIFO/IR to the RR registers is always 4 words (64 bits), always after every 4 LOAD pulses, regardless of the resolution. What happens is that, depending on the resolution, the output of one shift register is connected to the input of another (see the overall shift logic diagram in the same post as the shift matrix). So, for instance, in high rez the four shift registers are combined into a single 64-bit shift register. The unused output bits in the higher resolutions therefore might not be zero, but they are masked out later by the palette lookup logic, and the monochrome output obviously uses a single bit. That reminds me that I still haven't posted schematics of those later stages of the video output, sorry about that.
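
A toy model of the chaining described above, as a sketch and not a transcription of the schematics. The specific wiring in `CHAIN` (which register feeds which) is an assumption chosen so that the result matches the ST's interleaved bitplane memory layout; the names `CHAIN`, `SHIFTS`, `USED_BITS` and `shift_out` are invented for this example.

```python
MASK16 = 0xFFFF

# For register i: index of the register whose top bit is shifted into
# register i's bottom bit, or None if a 0 is shifted in. ASSUMED wiring.
CHAIN = {
    "low":  (None, None, None, None),   # four independent 16-bit registers
    "med":  (2, 3, None, None),         # two 32-bit chains
    "high": (1, 2, 3, None),            # one 64-bit chain
}
SHIFTS = {"low": 16, "med": 32, "high": 64}   # pixels per 4-word reload
USED_BITS = {"low": 4, "med": 2, "high": 1}   # bits kept by the later stages

def shift_out(words, mode):
    """Shift four loaded words out; return (raw 4-bit outputs, masked pixels)."""
    rr = list(words)
    raw_out, masked_out = [], []
    for _ in range(SHIFTS[mode]):
        msb = [(r >> 15) & 1 for r in rr]
        raw = msb[0] | (msb[1] << 1) | (msb[2] << 2) | (msb[3] << 3)
        raw_out.append(raw)
        # The masking happens in a later stage on the real chip.
        masked_out.append(raw & ((1 << USED_BITS[mode]) - 1))
        rr = [
            ((rr[i] << 1) & MASK16)
            | (0 if CHAIN[mode][i] is None else msb[CHAIN[mode][i]])
            for i in range(4)
        ]
    return raw_out, masked_out

raw, pixels = shift_out([0xFFFF, 0x0000, 0x0000, 0xFFFF], "med")
print(raw[0], pixels[0])   # raw output has an upper bit set, the pixel does not
```

In this model the reload is always four words; only the chaining and the number of bits that survive masking change with the mode, and the raw output can carry non-zero upper bits in medium and high rez.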

Versus the old model I remember seeing drawn up in the past where the IR and RR are separate 1-bit deep ...


I don't have the pictures from Alien's famous articles at hand right now, but I believe he got the model mostly right in this regard. Combining 1 bit of IR and RR in a single cell is just a physical organization. Logically it behaves exactly the same as if they were separate.

slingshot
Atari God
Posts: 1258
Joined: Mon Aug 06, 2018 3:05 pm

Postby slingshot » Mon Jul 22, 2019 1:57 pm

ijor wrote:So the unused output bits on the higher resolutions might be not zero, but they are masked out later by the palette lookup logic, and the monochrome output obviously uses a single bit. That reminds me that I still didn't post schematics of that later stages of the video output, sorry about that.


Maybe we can see it now? ;)

slingshot
Atari God
Posts: 1258
Joined: Mon Aug 06, 2018 3:05 pm

Postby slingshot » Thu Jul 25, 2019 7:09 pm

Meanwhile I also implemented a Shifter based on this great information. However I struggle at one point: Death of the left border. Inside the MMU/GLUE combo, the border is opened, the correct number of words are loaded (if 115 is the correct number). However the Shifter gets confused: sometimes the 3 leftover words are appearing at the beginning of the line, not at the end. I've made a capture:
lefborder.png

I switch the pixel clock as soon as CMPCS is asserted (when the transition from mono to low res is made), but I assume the "fast" pixCntr counter is still wrong there. However, I cannot slow down the pixel clock before CMPCS. So I wonder where the error is:
- pixCtrEn is not delayed enough - but it seems to work according to the schematics?
- CMPCS arrives too late after the first LOAD?
- pixel clock switched wrongly?

npomarede
Atari God
Posts: 1308
Joined: Sat Dec 01, 2007 7:38 pm
Location: France

Postby npomarede » Thu Jul 25, 2019 7:43 pm

Hi
I don't know if you took this into account, but this demo depends on the so called "wake up state", so it won't always work (on my STF, it works in WS1, WS2 and WS4 but not WS3)
Nicolas

slingshot
Atari God
Posts: 1258
Joined: Mon Aug 06, 2018 3:05 pm

Postby slingshot » Thu Jul 25, 2019 8:33 pm

The MMU/GLUE part is made from the GSTMCU schematics, which AFAIK is always WS0. I will try to check it somehow; I assume it is the CMPCS-to-LOAD difference that counts here. However, it's very small here: only 3 32 MHz cycles.
In the Shifter, LOAD to pxCtrEn is only 1.5 32 MHz cycles. Maybe too small?
(This problem also hits Closure, from the SYNC logo at the beginning through the whole demo.)

npomarede
Atari God
Posts: 1308
Joined: Sat Dec 01, 2007 7:38 pm
Location: France

Postby npomarede » Thu Jul 25, 2019 8:36 pm

Ah OK, you're talking about the STE Shifter. But in that case, AFAIR, the Death of the left border doesn't work on STE.

slingshot
Atari God
Posts: 1258
Joined: Mon Aug 06, 2018 3:05 pm

Postby slingshot » Fri Jul 26, 2019 10:00 am

Not sure - it opens the border correctly (the GLUE/MMU part works), just the planes are disordered.
It's the same with Closure. Btw, are there more versions of that demo? I read in a topic that it uses the illegal shiftmode ("11") to stop the Shifter pixel clock, but I couldn't capture such an event.

Upd.: I think an important piece of information is missing from this thread: I read in another thread by @ijor that the palette registers are latches. So as soon as CMPCS_N goes low, the gates are opened, and the values on the data bus pass through to the internal logic. But the shiftmode register seems to be a D flip-flop, clocked by CMPCS_N, so it is latched only at the end of the chip select cycle (I would have guessed sooner, since it's the same in the GSTMCU). However, simply storing it at the positive edge wasn't enough; I also had to add a delay of 2 (32 MHz) clock cycles. This fixed Death of the left border, and also Closure! (Well, the vertical scroller still sux, but most of the demo is correct.) I don't know if I just emulated some internal or chip-to-chip delays with this, or whether it's just an ugly hack, but it works so far.
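
To make the latch versus flip-flop distinction concrete, here is a small behavioural sketch. It is not taken from the schematics or from the Verilog: it assumes one list entry per 32 MHz tick, an active-low `cmpcs_n`, a transparent latch that follows the bus while `cmpcs_n` is low, a flip-flop that captures the bus on the rising edge of `cmpcs_n`, and the extra 2-tick delay described above. The function name `simulate` and the waveform are invented.

```python
def simulate(cmpcs_n, bus, delay=2):
    """Return per-tick values of (latch, flip-flop, delayed flip-flop)."""
    latch = ff = 0
    prev_cs, prev_bus = 1, 0
    ff_history = []
    trace = []
    for t, (cs, data) in enumerate(zip(cmpcs_n, bus)):
        if cs == 0:
            latch = data              # transparent while chip select is low
        if prev_cs == 0 and cs == 1:
            ff = prev_bus             # capture at the end of the select cycle
        ff_history.append(ff)
        delayed = ff_history[t - delay] if t >= delay else 0
        trace.append((latch, ff, delayed))
        prev_cs, prev_bus = cs, data
    return trace

cmpcs_n = [1, 1, 0, 0, 0, 1, 1, 1, 1, 1]
bus     = [0, 0, 2, 2, 2, 0, 0, 0, 0, 0]   # CPU writes $02 during the select
for tick, values in enumerate(simulate(cmpcs_n, bus)):
    print(tick, values)
```

With this waveform the latch output changes as soon as the select goes low, the flip-flop only when it goes high again, and the delayed copy two ticks after that. Whether the real chip has exactly this delay is not established here.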

ijor
Hardware Guru
Posts: 3817
Joined: Sat May 29, 2004 7:52 pm

Postby ijor » Tue Jul 30, 2019 8:57 pm

Hi,

I'm currently on a trip, at this time I can't really check anything. I will only reply and comment about the global concepts for the time being ...

slingshot wrote:Meanwhile I also implemented a Shifter based on this great information. However I struggle at one point: Death of the left border. Inside the MMU/GLUE combo, the border is opened, the correct number of words are loaded (if 115 is the correct number). However the Shifter gets confused: sometimes the 3 leftover words are appearing at the beginning of the line, not at the end.


This is a stabilization issue. As you probably know by now, Shifter is designed for a line length where the number of words is a multiple of four. If the number of words in a line is not a multiple of four, then Shifter might not process the end of the line correctly. For this purpose, software that opens the border usually implements a so-called stabilizer.

There are several things that could break stabilization. The main suspect is probably a wrong implementation of the Reload Control Logic, but again, it might be something else.
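
The multiple-of-four point can be illustrated with a few lines of arithmetic. This is a deliberately naive sketch: it assumes that words left over at the end of a line simply stay pending into the next line, and it ignores the Reload Control Logic and any self-stabilization. The 115-word figure is the one quoted above; 80 words is a normal 160-byte line. The function names are hypothetical.

```python
WORDS_PER_RELOAD = 4   # IR -> RR transfer happens after every 4 LOAD pulses

def leftover_words(words_in_line, pending=0):
    """Words still waiting in the input registers at the end of a line."""
    return (pending + words_in_line) % WORDS_PER_RELOAD

def pending_at_line_start(words_per_line, lines):
    """Pending-word count at the start of each line, with no stabilizer."""
    pending = 0
    result = []
    for _ in range(lines):
        result.append(pending)
        pending = leftover_words(words_per_line, pending)
    return result

print(leftover_words(80))              # normal line
print(leftover_words(115))             # overscan line from the post above
print(pending_at_line_start(115, 5))   # drift over consecutive lines
```

In this model a 115-word line leaves 3 words behind, which would then show up at the start of the following line with the planes out of order, matching the symptom described.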

npomarede wrote:Ah OK, you're talking about the STE shifter. but in that case AFAIR the death of the left border doesn't work on STE.

Not sure - it opens the border correctly (the GLUE/MMU part works), just the planes are disordered.


I don't know for sure and have no way to check it right now, if indeed this demo works or not on a real STE. But Nicolas is probably correct. If so, then something in your implementation is not correct.

Also the same with Closure - btw, is there more versions of that demo, I read in a topic that it uses the illegal shiftmode ("11") to stop the shifter pixel clock, but I couldn't capture such an event.


I'm not sure, but I think there is only a single public version. The public version certainly doesn't set both bits of the resolution register.

Upd.: I think an important info is missing from this thread: I read at another thread by @ijor that the palette registers are latches. So as soon as CMPCS_N goes down, the gates are opened, and the values on the data bus are enabled to the inner processes.


The palette registers are indeed asynchronous transparent latches. But this is not relevant for the issues you mention. The palette lookup happens at a later stage. It should not affect anything related to borders or stabilization. It is relevant mostly for the so-called Spectrum 512 effect, which changes palette registers on the fly.
Fx Cast: Atari St cycle accurate fpga core

slingshot
Atari God
Posts: 1258
Joined: Mon Aug 06, 2018 3:05 pm

Postby slingshot » Wed Jul 31, 2019 8:34 am

ijor wrote:Hi,

I'm currently on a trip, at this time I can't really check anything. I will only reply and comment about the global concepts for the time being ...


I wish you to enjoy your trip!

This is a stabilization issue. As you probably know by now, Shifter is designed for a line length where the number of words is a multiple of four. If the number of words in a line is not a multiple of four, then Shifter might not process the end of the line correctly. For this purpose, software that opens the border usually implements a so-called stabilizer.


AFAIK the Death of the left border was one of the first demos which opened the horizontal border, so the concept of stabilization probably wasn't known. I can accept this demo might or might not work (but actually made it work now).
However this reload after 4 load thing causes an interesting issue with STe hard scroll. STe in mono and mid modes is fetching 1 or 2 extra words at the beginning of the line, so it should do something with those extras at the line end, otherwise the next line will break (and the STe test cartridge has a scroll test at mid mode, which is not broken). It might have an internal stabilization when hard scroll is used.

The palette registers are indeed asynchronous transparent latches. But this is not relevant for the issues you mention. The palette lookup happens at a later stage. It should not affect anything related to borders or stabilization. It is relevant mostly only for the so called Spectrum 512 effect that change palette registers on the fly.



Meanwhile I figured out what was wrong: while the palette lookup is not important in this regard, the latching point of the resolution change is. Until the shifter "sees" the change from mono to low, it "fast-forwards" the pixel counter with 32 MHz to a point which will be OK for the later stages. Now this point for me is 2 clocks after CMPCS goes high (which should be not too far away from the real thing).
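
A toy model of the "fast-forward" effect, to show why the latching point matters. It only assumes the pixel clock dividers (32 MHz in high, 16 MHz in medium, 8 MHz in low rez) and a free-running divider that is not reset on a mode change; the `latch_delay` parameter and the function name are invented, and nothing here reproduces the real clock mux.

```python
DIVIDER = {"high": 1, "med": 2, "low": 4}   # 32 MHz ticks per pixel clock

def pixel_count(written_modes, latch_delay=0):
    """Count pixel-clock pulses over a per-tick list of written shift modes.

    latch_delay is the number of 32 MHz ticks before the Shifter "sees"
    a newly written mode (hypothetical parameter).
    """
    count = 0
    div = 0
    for t in range(len(written_modes)):
        seen = written_modes[max(0, t - latch_delay)]
        div += 1
        if div >= DIVIDER[seen]:
            count += 1
            div = 0
    return count

modes = ["high"] * 16 + ["low"] * 16
print(pixel_count(modes, latch_delay=0))
print(pixel_count(modes, latch_delay=2))
```

Even a two-tick difference in when the switch back to low rez takes effect changes how far the pixel counter has advanced, which is the kind of offset that has to be calibrated against real demos.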

This is my implementation:
async logic, like on your schematics: https://github.com/gyurco/gstmcu/blob/m ... eo_async.v
sync to clk32 logic: https://github.com/gyurco/gstmcu/blob/m ... er_video.v

npomarede
Atari God
Posts: 1308
Joined: Sat Dec 01, 2007 7:38 pm
Location: France

Postby npomarede » Wed Jul 31, 2019 8:50 am

slingshot wrote:AFAIK the Death of the left border was one of the first demos which opened the horizontal border, so the concept of stabilization probably wasn't known. I can accept this demo might or might not work (but actually made it work now).
However this reload after 4 load thing causes an interesting issue with STe hard scroll. STe in mono and mid modes is fetching 1 or 2 extra words at the beginning of the line, so it should do something with those extras at the line end, otherwise the next line will break (and the STe test cartridge has a scroll test at mid mode, which is not broken). It might have an internal stabilization when hard scroll is used.

I was wrong with my previous comment, never trust your memory when you can check on the real HW :)
I just connected my old STE on my TV this morning and the demo is indeed working, no stabilization issue, overscan is correct (only on the left part of the 1st overscan line one can see a few extra white pixels, this should be emulated if not already the case :) )

This demo didn't use a stabilizer but instead relied on doing the hi/low switch to remove the left border during 16 cycles, which worked in most cases on STF but depended on the wake up state.
Stable overscans later used a stabilizer and only 12 cycles to do the hi/low switch to remove left border.

Nicolas

slingshot
Atari God
Posts: 1258
Joined: Mon Aug 06, 2018 3:05 pm

Postby slingshot » Wed Jul 31, 2019 8:57 am

npomarede wrote:This demo didn't use a stabilizer but instead relied on doing the hi/low switch to remove the left border during 16 cycles, which worked in most of the case on STF but depended on the wake up state.
Stable overscans later used a stabilizer and only 12 cycles to do the hi/low switch to remove left border.
Nicolas

Just out of curiosity, which CPU instruction was used that is 4 cycles longer? Or is it just a simple NOP inserted between the two writes?

It looks the same on MiST now :) There's only one issue with the left border removing demos: they leave so little back-porch area that they're very dark on my big TV. Mostly OK on my small monitor.

npomarede
Atari God
Posts: 1308
Joined: Sat Dec 01, 2007 7:38 pm
Location: France

Postby npomarede » Wed Jul 31, 2019 9:04 am

Yes, they used a NOP between 2 move #xx to remove the left border :

cpu video_cyc= 32760 504@ 63 : 00000726 12bc 0002                MOVE.B #$02,(A1)
IO write.b $ff8260 = $02 pc=726
shifter=0x02 video_cyc_w=32768 line_cyc_w=0 @ nHBL=63/video_hbl_w=64 pc=726 instr_cyc=12
detect remove left
cpu video_cyc= 32772   4@ 64 : 0000072A 4e71                     NOP
cpu video_cyc= 32776   8@ 64 : 0000072C 12bc 0000                MOVE.B #$00,(A1)
IO write.b $ff8260 = $00 pc=72c
detect remove left with no stab
cpu video_cyc= 32788  20@ 64 : 00000730 4e71                     NOP
...
cpu video_cyc= 33136 368@ 64 : 000007B2 10bc 0000                MOVE.B #$00,(A0)
IO write.b $ff820a = $00 pc=7b2
sync=0x00 video_cyc_w=33144 line_cyc_w=376 @ nHBL=64/video_hbl_w=64 pc=7b2 instr_cyc=12
detect remove right
cpu video_cyc= 33148 380@ 64 : 000007B6 10bc 0002                MOVE.B #$02,(A0)
IO write.b $ff820a = $02 pc=7b6
sync=0x02 video_cyc_w=33156 line_cyc_w=388 @ nHBL=64/video_hbl_w=64 pc=7b6 instr_cyc=12
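
The cycle numbers in this trace can be checked with a little arithmetic. The sketch below takes the values from the trace where they are printed (instruction start at 32760, write at 32768, instr_cyc=12, the NOP taking 4 cycles) and assumes that the second MOVE.B writes at the same 8-cycle offset into the instruction as the first, which the trace does not print. The 512 cycles per line follow from 32768 / 64. Function names are made up.

```python
CYCLES_PER_LINE = 512   # 32768 / 64 in the trace
WRITE_OFFSET = 8        # write lands 8 cycles into MOVE.B #imm,(An): 32760 -> 32768
MOVE_B_IMM = 12         # instr_cyc=12 in the trace
NOP = 4                 # 32772 -> 32776 in the trace

def switch_length(nops_between):
    """CPU cycles between the hi-res write and the following lo-res write."""
    first_start = 32760
    first_write = first_start + WRITE_OFFSET
    second_start = first_start + MOVE_B_IMM + nops_between * NOP
    second_write = second_start + WRITE_OFFSET   # ASSUMED same offset
    return second_write - first_write

def line_cycle(video_cyc):
    """Position within the scanline for an absolute video cycle."""
    return video_cyc % CYCLES_PER_LINE

print(switch_length(1))    # with the NOP, as in this demo
print(switch_length(0))    # without the NOP
print(line_cycle(33144), line_cycle(33156))   # right border writes
```

With one NOP the switch lasts 16 cycles, and without it 12 cycles, which lines up with the 16-cycle and 12-cycle switches mentioned earlier in the thread.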

ijor
Hardware Guru
Posts: 3817
Joined: Sat May 29, 2004 7:52 pm

Postby ijor » Wed Aug 14, 2019 12:18 pm

slingshot wrote:However this reload after 4 load thing causes an interesting issue with STe hard scroll. STe in mono and mid modes is fetching 1 or 2 extra words at the beginning of the line, so it should do something with those extras at the line end, otherwise the next line will break (and the STe test cartridge has a scroll test at mid mode, which is not broken). It might have an internal stabilization when hard scroll is used.


Not sure about the STe, but ST Shifter can sometimes stabilize itself. It depends, of course, on the timing of the switches; some rarely used over/underscan modes don't need stabilizers. And sometimes it depends on the Shifter (not GLUE-MMU) wakeup.

Meanwhile I figured out what was wrong: while the palette lookup is not important in this regard, the latching point of the resolution change is. Until the shifter "sees" the change from mono to low, it "fast-forwards" the pixel counter with 32 MHz to a point which will be OK for the later stages. Now this point for me is 2 clocks after CMPCS goes high (which should be not too far away from the real thing).


The clock mux effect is delayed, of course. Can't be very specific about the STE. But you probably need more tests with other demos to calibrate it.

ijor
Hardware Guru
Posts: 3817
Joined: Sat May 29, 2004 7:52 pm

Postby ijor » Wed Aug 14, 2019 12:19 pm

Death of the left border

npomarede wrote:I was wrong with my previous comment, never trust your memory when you can check on the real HW :)
I just connected my old STE on my TV this morning and the demo is indeed working, no stabilization issue, overscan is correct (only on the left part of the 1st overscan line one can see a few extra white pixels, this should be emulated if not already the case :) )

This demo didn't use a stabilizer but instead relied on doing the hi/low switch to remove the left border during 16 cycles, which worked in most of the case on STF but depended on the wake up state.


Ah, I remember this demo now. This demo depends on the Shifter wakeup. It is not exactly a stabilization issue. But depending on the Shifter wakeup state you might get the "every 16 pixel background" effect. It probably also depends on the exact Shifter version.

npomarede
Atari God
Posts: 1308
Joined: Sat Dec 01, 2007 7:38 pm
Location: France

Postby npomarede » Wed Aug 14, 2019 7:18 pm

ijor wrote: Ah, I remember this demo now. This demo depends on the Shifter wakeup. It is not exactly a stabilization issue. But depending on the Shifter wakeup state you might get the "every 16 pixel background" effect. It probably also depends on the exact Shifter version.

Yes, in the tests I made on my STF some time ago, I noted that the demo worked in WS1, WS2 and WS4 but gave the vertical black bands in WS3 (WSx names are those from Troed's program to detect WS)

ijor
Hardware Guru
Posts: 3817
Joined: Sat May 29, 2004 7:52 pm

Postby ijor » Thu Aug 15, 2019 11:00 am

npomarede wrote: Yes, in the tests I made on my STF some time ago, I noted that the demo worked in WS1, WS2 and WS4 but gave the vertical black bands in WS3 (WSx names are those from Troed's program to detect WS)


I don't think it's about GLUE-MMU wake states, it depends on the Shifter wakeup.

npomarede
Atari God
Posts: 1308
Joined: Sat Dec 01, 2007 7:38 pm
Location: France

Postby npomarede » Thu Aug 15, 2019 12:34 pm

ijor wrote: I don't think it's about GLUE-MMU wake states, it depends on the Shifter wakeup.

You're right, more tests might be needed to check it was not a coincidence with WS3.
BTW, IIRC you sent me some time ago a small program to "display" the Shifter wake up state, by mixing low res and med res vertical patterns and seeing how they align together.
Did you write a more "final" version of this test ? Maybe something that could be mixed with glue-mmu wake states and display everything on a single screen ?

