blitter


Postby gwEm » Thu Oct 07, 2004 11:14 am

i've been looking into using the blitter over the past few days, and it doesn't seem as hard as i thought, especially with the good guidance docs that are around the web.

anyway, a couple questions:

roughly how much speed-up are we talking about compared with an 8 MHz 68000? of course the zero-overhead manipulations increase the speed-up, but let's talk about a 1:1 copy. is it 2 or 3 times? or less?

say we have two arrays of a few KB each. is there some trick with the blitter we can use to add these arrays? or in this case is it better to use the 68k? again, i'm talking about a standard STE.

thanks

Postby tobe » Thu Oct 07, 2004 11:38 am

gwEm wrote:i've been looking into using the blitter over the past few days, and it doesn't seem as hard as i thought, especially with the good guidance docs that are around the web.

anyway, a couple questions:

roughly how much speed-up are we talking about compared with an 8 MHz 68000? of course the zero-overhead manipulations increase the speed-up, but let's talk about a 1:1 copy. is it 2 or 3 times? or less?

A 1:1 copy takes 2 NOPs (8 cycles) per word. I've heard the 68k is faster when using movem.l.

say we have two arrays of a few KB each. is there some trick with the blitter we can use to add these arrays? or in this case is it better to use the 68k? again, i'm talking about a standard STE.

thanks

If you plan to add the values (x+y | x<-xs, y<-ys), you should use the 68k, because the blitter can't do arithmetic, except when the bits of x and y are mutually exclusive, in which case you can use OR instead of ADD.
For xs+ys you can use the blitter, but i don't think it will be faster than the 68k.
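
For the OR-instead-of-ADD case, here is a small Python sketch (plain Python, not blitter code; the helper names and sample values are made up for illustration). It only shows the arithmetic: when two 16-bit words have no set bits in common, OR gives the same result as ADD, so a logic op could stand in for the addition.

```python
def can_or_replace_add(x: int, y: int) -> bool:
    """True when x and y share no set bits, so x | y == x + y."""
    return (x & y) == 0


def or_add(xs, ys):
    """Combine two word arrays with OR, the way a logic op would.

    Hypothetical helper: it raises if any pair overlaps, because OR
    would then lose the carry that a real ADD produces.
    """
    out = []
    for x, y in zip(xs, ys):
        if not can_or_replace_add(x, y):
            raise ValueError(f"bits overlap: {x:#06x} and {y:#06x}")
        out.append((x | y) & 0xFFFF)  # keep the result in one 16-bit word
    return out


# Sample data (assumed): the set bits of xs and ys never overlap.
xs = [0x00F0, 0x1200, 0x8000]
ys = [0x000F, 0x0034, 0x0001]
combined = or_add(xs, ys)
print([hex(w) for w in combined])
```

If any pair of words overlaps, `or_add` raises instead of returning a wrong sum; that is the case where the 68k is needed.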

Tobe.
step 1: introduce bug, step 2: fix bug, step 3: goto step 1.

Postby frost » Thu Oct 07, 2004 12:10 pm

tobé: no, the 68k is only a little slower (a few cycles, no more) when using movem.l. You can safely say it's the same speed.
My blog, mostly about Atari and demo stuff.

Postby Greenious » Thu Oct 07, 2004 3:41 pm

You should take into account that switching the bus between the 68k and the blitter takes quite a few clock cycles. So trying to run the 68k and the blitter side by side will make it slooooooooow.
(i.e., don't use the bus-sharing mode)

Postby punkrulesok » Thu Oct 07, 2004 4:15 pm

Am I right in saying that the blitter in the ST is not designed to be used in parallel with the CPU?

Really, how useful is the blitter in the ST?
For example: could it be used to move sprites around the screen, saving CPU time?

Postby Greenious » Thu Oct 07, 2004 4:42 pm

punkrulesok wrote:Am I right in saying that the blitter in the ST is not designed to be used in parallel with the CPU?

Really, how useful is the blitter in the ST?
For example: could it be used to move sprites around the screen, saving CPU time?


If you look at the hardware side of how the blitter, CPU, DMA, MMU & Shifter work:

The MMU controls all memory access.
It is a simple device though, and splits the memory accesses into 250 ns slices (2 clock cycles). 250 ns is just enough to access memory once.

Now, the MMU splits the memory accesses evenly between the CPU bus & the video shifter. So the shifter gets one go at memory, and the CPU the next. The CPU thus gets one go at memory every 4 clock cycles. This is also the reason all CPU instructions on the ST(E) always take n*4 clock cycles to complete.

Now, where do the DMA & blitter fit into this?

Whenever the DMA or blitter wants to access memory, the CPU is disconnected from the bus, and either the DMA or the blitter is awarded the memory access slots the CPU is usually given.

Unfortunately, disconnecting one device from the bus and connecting another takes a few clock cycles as well. I'm not sure how many, but a figure that keeps popping up in my head is 36 clock cycles.
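
The slot arithmetic above can be checked with a few lines of Python. This is only a sketch: it assumes a nominal 8 MHz clock, and `round_up_to_slot` is a hypothetical helper illustrating the n*4 rule, not a documented timing formula.

```python
CPU_CLOCK_HZ = 8_000_000  # nominal 8 MHz clock (assumption for the example)
SLOT_CYCLES = 2           # one memory slot = 2 clock cycles (from the post)
CPU_PERIOD_CYCLES = 4     # CPU and shifter alternate, so the CPU gets 1 slot in 4 cycles


def cycle_ns(clock_hz: int = CPU_CLOCK_HZ) -> float:
    """Length of one clock cycle in nanoseconds."""
    return 1e9 / clock_hz


def round_up_to_slot(cycles: int, period: int = CPU_PERIOD_CYCLES) -> int:
    """Round a cycle count up to the next multiple of the CPU's bus period."""
    return -(-cycles // period) * period


slot_ns = SLOT_CYCLES * cycle_ns()
print(slot_ns)               # length of one memory slot in ns
print(round_up_to_slot(6))   # a nominal 6-cycle operation stretched to the next slot
```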

Postby tobe » Thu Oct 07, 2004 7:23 pm

punkrulesok wrote:Am I right in saying that the blitter in the ST is not designed to be used in parallel with the CPU?

Really, how useful is the blitter in the ST?
For example: could it be used to move sprites around the screen, saving CPU time?

Yes, the blitter was made to draw sprites (and other things). It's not only faster, it has a lot of options, and it divides the memory needed for sprites by 16 because you don't have to store pre-shifted copies.
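
As a rough illustration of that memory saving, here is a Python sketch. The function name and the 32x30, 2-plane sprite are assumptions chosen for the example, and masks are ignored. It compares one unshifted copy with 16 pre-shifted copies.

```python
def sprite_bytes(width_px: int, height: int, planes: int,
                 copies: int = 1, extra_word: bool = False) -> int:
    """Bytes needed to store a planar sprite.

    width_px is rounded up to whole 16-pixel words; extra_word adds one
    word per line, as a pre-shifted copy may need room to shift into.
    """
    words_per_line = (width_px + 15) // 16 + (1 if extra_word else 0)
    return words_per_line * 2 * planes * height * copies


single = sprite_bytes(32, 30, 2)                # one copy, shifted by the blitter
preshifted = sprite_bytes(32, 30, 2, copies=16)  # 16 stored shifts for a CPU routine
print(single, preshifted, preshifted // single)
```

With an extra word per line for the shifted copies (`extra_word=True`), the stored set grows further, so a factor of 16 is the conservative figure in this sketch.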

Postby ijor » Thu Oct 07, 2004 7:39 pm

Greenious wrote:The MMU controls all memory access. It is a simple device though, and splits the memory accesses into 250 ns slices (2 clock cycles). 250 ns is just enough to access memory once.

Now, the MMU splits the memory accesses evenly between the CPU bus & the video shifter. So the shifter gets one go at memory, and the CPU the next.


Btw, it’s interesting to note that the MMU is the chip that actually implements most of the “smartness” that people attribute to the SHIFTER.

All the border-removal and other video tricks affect the MMU and not the SHIFTER. The official Atari documentation (and third-party documentation such as the Internals book) is misleading. The lower I/O addresses of the video I/O map ($FF8201 to $FF820A) are actually MMU registers.

The SHIFTER is indeed not much more than a shifter, a planar converter and a palette lookup device.

Postby Greenious » Thu Oct 07, 2004 7:53 pm

ijor wrote:Btw, it’s interesting to note that the MMU is the chip that actually implements most of the “smartness” that people attribute to the SHIFTER.

All the border-removal and other video tricks affect the MMU and not the SHIFTER. The official Atari documentation (and third-party documentation such as the Internals book) is misleading. The lower I/O addresses of the video I/O map ($FF8201 to $FF820A) are actually MMU registers.

The SHIFTER is indeed not much more than a shifter, a planar converter and a palette lookup device.


Actually, it is the GLUE that tells the MMU to show graphics, and the MMU in turn tells the SHIFTER what to do. :) That's why hardware overscan only works on the ST and not on the STE. (On the STE the GLUE & MMU are integrated, so you can't intercept the GLUE's display signal to the MMU.)

They all work intimately together, so you can't really talk about individual ICs.

Postby gwEm » Mon Oct 11, 2004 9:25 am

thanks for the responses. seems like a case of suck it and see ;) i'll investigate - maybe i can come up with a nice algorithm.

G

Postby tobe » Wed Oct 13, 2004 1:52 pm

Greenious wrote:Unfortunately, disconnecting one device from the bus and connecting another takes a few clock cycles as well. I'm not sure how many, but a figure that keeps popping up in my head is 36 clock cycles.

I was a bit curious about this, so i did a little test.

68000:
-set the bgcolor to blue
-start the blitter
-nop

Blitter:
-set the bgcolor to white
-set the bgcolor to red

68000:
-set the bgcolor to green
-set the bgcolor to black

The result is shown in the attached picture: the time needed to switch from the 68000 to the blitter is very short. You will also notice there's a nop after starting the blitter, to avoid executing the next instruction before the blitter operation has started.

Postby Greenious » Wed Oct 13, 2004 9:08 pm

Yeah, 36 clock cycles isn't that much, or whatever the real figure is. It would be interesting if someone could give an exact figure.

But it does have an impact on performance if you try to use the blitter & CPU in bus-sharing mode. They get 64 clock cycles each, but you lose a lot switching back and forth.

Postby unseenmenace » Wed Oct 13, 2004 9:48 pm

Am I right in thinking, then, that if you used the blitter in hog mode but got all the main code and blitting done within a single VBL, it would be worth using the blitter for sprites?
UNSEEN MENACE
Several STFM's, 4MB STE, 2MB TT with 1.2GB Hard Drive and 14MB Falcon with 540MB Hard Drive,
Lynx 2 and Jaguar with JagCD
Member of GamebaseST and AtariLegend team
Check out my website at http://unseenmenace.110mb.com

Postby tobe » Wed Oct 13, 2004 10:34 pm

Greenious wrote:Yeah, 36 clock cycles isn't that much, or whatever the real figure is. It would be interesting if someone could give an exact figure.

But it does have an impact on performance if you try to use the blitter & CPU in bus-sharing mode. They get 64 clock cycles each, but you lose a lot switching back and forth.

It's quite easy: the beam draws 1 pixel per cycle :)
So, looking at the screenshot and the cycle table, you can see it takes 4 cycles (1 nop) to switch from cpu to blitter and vice versa.
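
To put the two switch figures side by side, here is a rough Python model. It assumes, as a simplification, that each 64-cycle slice pays for one bus switch, and it uses the slice length quoted earlier in this thread; neither is a measured hardware figure.

```python
SLICE_CYCLES = 64  # per-owner slice length as quoted in this thread


def useful_fraction(switch_cycles: int, slice_cycles: int = SLICE_CYCLES) -> float:
    """Share of bus time left for real work if every slice costs one switch.

    Simplified model (assumption): total time per slice = slice + switch.
    """
    return slice_cycles / (slice_cycles + switch_cycles)


for switch in (36, 4):  # the remembered figure vs. the one read off the screenshot
    print(switch, round(useful_fraction(switch), 3))
```

Under this model a 4-cycle switch costs far less than a 36-cycle one would, which fits the short switch seen in the test.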

The cpu/blitter sharing mode is a strange feature... I used it for Roger because of the sidsound interrupts, and i used a trick called fast-restart to grab cycles from the cpu. It worked fine and i don't think it would be possible to scroll an entire screen with pixel precision at the same speed with the 68000 alone. But you can't use effects like rasters or overscan with interrupts while using blitter in sharing mode.

unseenmenace: sorry, my english is terribly poor and i don't think i can understand what you mean :oops:

Postby Nyh » Thu Oct 14, 2004 7:47 am

tobe wrote:But you can't use effects like rasters or overscan with interrupts while using blitter in sharing mode.


For rasters and overscan you can use the hogging mode. Just give the blitter only as much work as it can finish by the time you have to switch colors or open borders.
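
A sketch of how one might size such a blit, in Python. The 8 cycles per word is the 1:1 copy figure given earlier in the thread; the 512-cycle budget (one 50 Hz scanline) and the setup cost are assumptions for the example.

```python
def max_words_before_deadline(cycles_available: int,
                              cycles_per_word: int,
                              setup_cycles: int = 0) -> int:
    """Largest number of words the blitter can finish before the deadline."""
    usable = max(cycles_available - setup_cycles, 0)
    return usable // cycles_per_word


SCANLINE_CYCLES = 512  # assumed length of one 50 Hz scanline in CPU cycles
COPY_CYCLES = 8        # 1:1 copy, 2 NOPs per word

print(max_words_before_deadline(SCANLINE_CYCLES, COPY_CYCLES))
print(max_words_before_deadline(SCANLINE_CYCLES, COPY_CYCLES, setup_cycles=100))
```

The idea is to split a large blit into chunks no bigger than this count, so the CPU is back on the bus in time for the color change or border trick.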

Hans Wessels

Postby tobe » Thu Oct 14, 2004 11:43 am

Nyh wrote:
tobe wrote:But you can't use effects like rasters or overscan with interrupts while using blitter in sharing mode.


For rasters and overscan you can use the hogging mode. Just give the blitter only as much work as it can finish by the time you have to switch colors or open borders.

Hans Wessels

Yes of course, some kind of 'hand made' bus sharing, nice idea !

Postby Frank » Thu Oct 14, 2004 8:08 pm

Hi. For drawing sprites the blitter is quite a bit faster than the 68k and you get your shifts for free.

Each memory access costs the blitter four clock cycles (5 IIRC on the Falcon).
Clearing and filling memory is the fastest operation, taking only 4 cycles per word.
Things get a bit more complex when you take into account the endmasks.

:)

A copy takes 8 cycles per word: SRC (4-cycle read), DST (4-cycle write).

A logical operation say DST = DST & SRC takes 12 cycles IIRC.
4 to read the source
4 to read the destination
4 cycles to write out the result to destination

To mask a BOB on the screen (AND then OR) takes 24 clock cycles per word excluding set up time.
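
The per-word costs above can be turned into a quick budget check. A Python sketch follows; the cycle table is taken from this post, while the 160256-cycle frame (313 lines of 512 cycles at 50 Hz) and the helper names are assumptions, and setup time and endmask effects are ignored.

```python
# Cycles per destination word, as listed in this post.
CYCLES_PER_WORD = {"fill": 4, "copy": 8, "logic": 12, "masked_bob": 24}

FRAME_CYCLES = 313 * 512  # assumed 50 Hz frame length in CPU cycles


def bob_words(width_px: int, height: int, planes: int,
              extra_word: bool = False) -> int:
    """Destination words touched by one bob (extra_word: spill from shifting)."""
    words_per_line = (width_px + 15) // 16 + (1 if extra_word else 0)
    return words_per_line * planes * height


def blit_cycles(words: int, op: str) -> int:
    """Raw blitter cycles for `words` destination words of operation `op`."""
    return words * CYCLES_PER_WORD[op]


per_bob = blit_cycles(bob_words(32, 30, 2), "masked_bob")
per_bob_shifted = blit_cycles(bob_words(32, 30, 2, extra_word=True), "masked_bob")
print(per_bob, 31 * per_bob, FRAME_CYCLES)
print(per_bob_shifted, 31 * per_bob_shifted, FRAME_CYCLES)
```

In this simplified model 31 masked 32x30 two-plane bobs fit inside one frame even with the extra word per line; the real margin is smaller once setup time is counted.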

I once hacked away at a blitter routine of mine to see how quick I could make it. I managed to get 31 two-plane 32*30 bobs on screen per frame on a standard STE. The only cheat was that I didn't mask the very first one ;)
People of the world can be classified as 10 groups. Those that understand binary and those that don't ;)

Postby punkrulesok » Mon Oct 18, 2004 10:33 am

Just out of interest, what are the best examples of blitter use in games / demos? :)

Tobe: Do you use the blitter to draw the sprites and background in Roger?

Postby tobe » Mon Oct 18, 2004 11:54 am

punkrulesok wrote:Just out of interest, what are the best examples of blitter use in games / demos? :)

Tobe: Do you use the blitter to draw the sprites and background in Roger?

I use blitter in bus sharing mode with fast-restart for the background scrolling and hog mode for sprites. I wrote Roger in GFA, and nothing can beat the blitter under GFA :)