ST Chipset decap

ijor
Hardware Guru
Posts: 3418
Joined: Sat May 29, 2004 7:52 pm

Re: ST Chipset decap

Postby ijor » Sun Jan 07, 2018 1:18 pm

Steven Seagal wrote: Are you saying that the GLUE "M2CLOCK" can awake at an arbitrary 0-15 tick (~32 MHz) on the STF relative to DCYC, and not always on the same n, n+4, n+8 or n+12 tick?


No, it isn't that bad either. It will (or should) always trigger within the same cycle.

MarkP
Atarian
Posts: 5
Joined: Tue Jul 31, 2018 10:22 pm

Re: ST Chipset decap

Postby MarkP » Wed Aug 01, 2018 1:59 am

Wow... the ultimate level of hacking, peering right into the microstructure of the chips themselves, and using that to solve one particular longstanding enigma of how data moves through the shifter (that I've wondered about myself right since reading the classic ST Internals book way back in the day and seeing its rather sparing, mostly code-level description of how it appears to work from the outside).

I'm not sure if I'm reading it right, but does the shift matrix diagram (including that brilliant but kooky 2-bit-per-cell design that rolls the FIFO/IR into the shiftreg/RR) imply that it loads in 4 words just-in-time (except in the case of the every-16-pixels glitch) for the previous 4 to have run out in low rez, shunts them all down / shifts out the first pixel / starts loading the first word of the next block of four in more or less the one motion, and places four bits off the far right end of what are effectively 4 parallel, single tapped, 1-bit shift regs with each shift...? And for medium it only loads in two words at a time, twice as often because the individual bits in each effectively-parallel-1-bit reg are then shifted out at twice the frequency, with the lower 2 bits then always being zero so only 4 colours can be produced despite 4 bits technically being shifted out... and for hi-rez it only loads in a single word, and performs the internal transfer 4x as often as in low rez, with only the uppermost output ever changing?

Versus the old model I remember seeing drawn up in the past where the IR and RR are separate 1-bit deep, 64-bit wide registers with four taps arranged every 16 bits along the RR (like a more regularly spaced set of LFSR taps) and the output of *those* changing as the RR is rotated right with low using all four, medium using the second and fourth, and mono only using the fourth... with all modes loading four words into the IR at a time then dumping the full 64 bits down into the RR once full, at the exact same speed, with only the speed of rotation and number of taps changing (?)

What might that (and the actual clock divider structure) then mean for the possibility of tweaking the shifter for additional modes, both from a fantasy-historical and an actual (FPGA?) recreation basis? Easier, harder?

I'm thinking in terms both of adjusting the bitdepth vs frequency relationship (e.g. 3bpp at 32/3 MHz, or 5bpp/6bpp at 32/5, 32/6MHz with extended shifter matrix and palette depth, the latter being similar to the pixel clock used by a lot of 80s consoles plus computers like the MSX and the former being twice that, so 256 or 512 pixels would just hit overscan and 240/480 pixels for a slightly coarse 40 or 80 columns of text would just fit within the underscan limits... as well as a half H-rez 2bpp mode in mono), and of aping the Amiga by changing the CPU (or Blitter, DMA controller, etc.) vs shifter bandwidth relationship (as mediated by a slightly twiddled MCU).

With the latter, the speed available to the rest of the system could be sacrificed to give more colour and/or resolution (e.g. mono in full rez and 2bpp/half rez and 4bpp, medrez in 3 or 4bpp, original lowrez in 5 or 6 (or 8?) bpp, intermediate rez at 4bpp/reduced rez at 8bpp...). Or colour and/or rez could be sacrificed to speed up the rest of the machine (especially making more use of the blitter and other DMA without slowing the CPU), by the MCU giving the shifter fewer cycles and the shifter running with a higher divider or shallower matrix depth. It might be possible to pull off some decent fresh tricks with that additional speed without the lower rez or colour set being immediately obvious...

(on the other hand, maybe the setup actually makes it harder to alter in that way vs the old assumptions?)
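
To put rough numbers on the bitdepth vs frequency idea above, here is a minimal Python sketch. It assumes the shifter's video bandwidth stays at the stock figure (8 MHz x 4bpp = 16 MHz x 2bpp = 32 Mbit/s) and that the visible line keeps the stock colour-mode width in time (320 pixels at 8 MHz). The helper names are made up for illustration and are not taken from any real shifter implementation.

```python
from fractions import Fraction

MASTER_CLOCK_MHZ = 32      # the "32" in the 32/3, 32/5, 32/6 MHz figures
STOCK_LOW_REZ = (4, 320)   # (bits per pixel, visible pixels per line)

def pixel_clock_mhz(bpp):
    """Pixel clock (MHz) if bpp * pixel clock stays at the stock 32 Mbit/s."""
    return Fraction(MASTER_CLOCK_MHZ, bpp)

def pixels_per_line(bpp):
    """Pixels that fit in the stock colour-mode line time at that clock."""
    stock_bpp, stock_pixels = STOCK_LOW_REZ
    # Assumption: the visible line time is unchanged (320 pixels at 8 MHz).
    line_time_us = Fraction(stock_pixels) / pixel_clock_mhz(stock_bpp)
    return line_time_us * pixel_clock_mhz(bpp)

# Colour modes only; the mono monitor runs at a different line rate.
for bpp in (2, 3, 4, 5, 6):
    print(bpp, "bpp:", float(pixel_clock_mhz(bpp)), "MHz,",
          float(pixels_per_line(bpp)), "pixels,", 2 ** bpp, "colours")
```

Under these assumptions only the power-of-two depths and 5bpp give a whole number of pixels per line; 3bpp and 6bpp would need a wider or narrower visible window, or a different share of the memory bandwidth.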

Smonson
Obsessive compulsive Atari behavior
Posts: 119
Joined: Sat Feb 20, 2016 9:45 am
Location: Canberra

Re: ST Chipset decap

Postby Smonson » Wed Aug 01, 2018 9:23 am

MarkP wrote: I'm thinking in terms both of adjusting the bitdepth vs frequency relationship (e.g. 3bpp at 32/3 MHz, or 5bpp/6bpp at 32/5, 32/6MHz with extended shifter matrix and palette depth


That's an interesting idea... it would be achievable with the FPGA shifter hardware I've already built. 256x200x32 seems like a decent trade-off.

Luckily, the shifter has 5 address lines, so there are 15 empty register slots that aren't currently used. At least one more register would be needed to access the additional palette slots.
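
The register arithmetic can be sketched in a few lines of Python. This assumes the stock ST layout of 16 palette registers plus one resolution register, and a hypothetical 5bpp mode that wants 32 palette entries; the constant names are illustrative only.

```python
ADDRESS_LINES = 5         # from the post above
PALETTE_REGISTERS = 16    # stock ST palette (assumed layout)
MODE_REGISTERS = 1        # stock resolution register (assumed layout)

total_slots = 2 ** ADDRESS_LINES                                # 32
free_slots = total_slots - PALETTE_REGISTERS - MODE_REGISTERS   # 15

# Hypothetical 5bpp mode: 32 colours need 16 palette entries beyond the stock 16.
extra_palette_entries = 2 ** 5 - PALETTE_REGISTERS              # 16

# The extra entries do not fit one-to-one into the free slots ...
fits_directly = extra_palette_entries <= free_slots             # False
# ... so one free slot could act as a bank/index register instead.
print(total_slots, free_slots, extra_palette_entries, fits_directly)
```

This is one way to read the "at least one more register" remark: 16 extra colours are one more than the 15 spare slots, so some form of banking or indexed access would be needed.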

ijor
Hardware Guru
Posts: 3418
Joined: Sat May 29, 2004 7:52 pm

Re: ST Chipset decap

Postby ijor » Wed Aug 01, 2018 8:19 pm

MarkP wrote: I'm not sure if I'm reading it right, but does the shift matrix diagram (including that brilliant but kooky 2-bit-per-cell design that rolls the FIFO/IR into the shiftreg/RR) imply that it loads in 4 words just-in-time (except in the case of the every-16-pixels glitch) for the previous 4 to have run out in low rez, shunts them all down / shifts out the first pixel / starts loading the first word of the next block of four in more or less the one motion, and places four bits off the far right end of what are effectively 4 parallel, single tapped, 1-bit shift regs with each shift...?


To be honest, I'm not 100% sure I understand everything in that very long sentence :), but mostly yes. That's how hardware works, in parallel. There is no "penalty" or "risk" in doing things in the very last cycle. Actually, it is harder and more expensive (it requires more transistors) to do it any other way.

MarkP wrote: And for medium it only loads in two words at a time, twice as often because the individual bits in each effectively-parallel-1-bit reg are then shifted out at twice the frequency, with the lower 2 bits then always being zero so only 4 colours can be produced despite 4 bits technically being shifted out... and for hi-rez it only loads in a single word, and performs the internal transfer 4x as often as in low rez, with only the uppermost output ever changing?


The transfer from the FIFO/IR to the RR registers is always 4 words (64 bits), always after every 4 LOAD pulses, regardless of the resolution. What changes with the resolution is that the output of one shift register is connected to the input of another (see the overall shift logic diagram in the same post as the shift matrix). For instance, in high rez the four shift registers are combined into a single 64-bit shift register. So the unused output bits in the higher resolutions might not be zero, but they are masked out later by the palette lookup logic, and the monochrome output obviously uses a single bit. That reminds me that I still haven't posted schematics of those later stages of the video output, sorry about that.
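
The behaviour described above can be sketched as a small Python model. It is only an illustration: the chain directions in `CHAIN` are an assumption chosen to match the ST's interleaved bitplane layout in memory, not a transcription of the actual diagram, and all names (`CHAIN`, `INDEX_MASK`, `shift_out`) are hypothetical. LOAD pulse timing is not modelled; the function starts at the point where four words have just been transferred into the RR.

```python
LOW, MEDIUM, HIGH = 0, 1, 2

# Assumed wiring: CHAIN[mode][i] is the register whose shifted-out bit feeds
# the serial input of register i, or None for a constant 0.
CHAIN = {
    LOW:    [None, None, None, None],  # four independent 16-bit registers
    MEDIUM: [2, 3, None, None],        # two 32-bit chains: RR2->RR0, RR3->RR1
    HIGH:   [1, 2, 3, None],           # one 64-bit chain: RR3->RR2->RR1->RR0
}
INDEX_MASK = {LOW: 0b1111, MEDIUM: 0b0011, HIGH: 0b0001}
PIXELS_PER_TRANSFER = {LOW: 16, MEDIUM: 32, HIGH: 64}

def shift_out(words, mode):
    """Shift out four 16-bit words and return the masked colour indices."""
    rr = list(words)  # IR -> RR transfer: always 4 words, in every mode
    pixels = []
    for _ in range(PIXELS_PER_TRANSFER[mode]):
        msb = [(r >> 15) & 1 for r in rr]        # bit leaving each register
        raw = msb[0] | msb[1] << 1 | msb[2] << 2 | msb[3] << 3
        pixels.append(raw & INDEX_MASK[mode])    # unused bits masked afterwards
        rr = [((rr[i] << 1) & 0xFFFF) | (msb[src] if src is not None else 0)
              for i, src in enumerate(CHAIN[mode])]
    return pixels

# Medium rez: the second pair of words reaches the outputs via the chain.
print(shift_out([0xFFFF, 0x0000, 0x0000, 0xFFFF], MEDIUM))
```

In the medium rez example, bit 3 of the raw output is 1 during the first 16 pixels (it comes from the fourth register), which illustrates why the unused outputs are not necessarily zero and have to be masked.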

MarkP wrote: Versus the old model I remember seeing drawn up in the past where the IR and RR are separate 1-bit deep ...


I don't have the pictures from Alien's famous articles at hand right now, but I believe he got the model mostly right in this regard. Combining 1 bit of IR and RR in a single cell is just a physical organization. Logically it behaves exactly the same as if they were separate.

