SuperVidel performance boost for byte/word/long access

mikro
Hardware Guru
Posts: 4722
Joined: Sat Sep 10, 2005 11:11 am
Location: Kosice, Slovakia

Re: SuperVidel performance boost for byte/word/long access

Post by mikro »

shoggoth wrote: As mentioned in some other thread - this feature has been implemented and will be included in the next driver revision.
Now, this was posted on "Tue Feb 18, 2014". Nature's website offers an updated package from "March 05 2014". However the archive contains SV_XBIOS.PRG from "15/05/2013".

So now I don't know whether PeP likes to have his Falcon running one year late or whether the site offers the wrong version. :-(

Post by mikro »

Aaaaaand I can reply to myself because the test program is still available. The version on their website is still the slow one, so we've been throttled to 2/3 of the possible performance for THREE YEARS so far. :-(
leech
Atari God
Posts: 1484
Joined: Tue Dec 01, 2015 3:26 pm

Post by leech »

Ouch, so where is the newest driver?
Atari 8Bits: 800xl, 600xl, XEGS, 800, 130xe, 130xe (VBXE, U1MB, Stereo POKEY)
Atari STs: 1040STf (broken shifter), 1040STe, Mega STe, TT030, Falcon (CT60e, SuperVidel)
shoggoth
Nature
Posts: 1447
Joined: Tue Aug 01, 2006 9:21 am
Location: Halmstad, Sweden

Post by shoggoth »

Communications mishap, I think. Sorry about that! Good thing we found out about it :)
Ain't no space like PeP-space.
wongck
Ultimate Atarian
Posts: 13541
Joined: Sat May 03, 2008 2:09 pm
Location: Far East

Post by wongck »

Isn't it normal that we wait a long time for a good piece of hardware or software?
It will finally be delivered sometime in the future.
My Stuff: FB/Falcon CT63 CTPCI ATI RTL8139 USB 512MB 30GB HDD CF HxC_SD/ TT030 68882 4+32MB 520MB Nova/ 520STFM 4MB Tos206 SCSI
Shared SCSI Bus:ScsiLink ethernet, 9GB HDD,SD-reader @ http://phsw.atari.org
instream
Nature
Posts: 176
Joined: Mon Aug 03, 2009 9:08 am
Location: Floda, Sweden

Post by instream »

Now the drivers are available at nature.atari.org for download :D just took three years...

Post by mikro »

Great! Now the only missing piece is a GEM app for FW11 updates. ;-) (for us suckers without a JTAG cable...)

Post by leech »

Even though I ended up getting a JTAG cable, I am still at FW10. Will possibly update it soon.
Rustynutt
Atari God
Posts: 1847
Joined: Wed Mar 21, 2012 7:38 am
Location: Oregon

Post by Rustynutt »

instream wrote: Now the drivers are available at nature.atari.org for download :D just took three years...
As a latecomer to installing my SV, I wish to express my gratitude to everyone who has supported the project over the years, and for their long hours!

Post by instream »

Rustynutt wrote:
instream wrote: Now the drivers are available at nature.atari.org for download :D just took three years...
As a latecomer to installing my SV, I wish to express my gratitude to everyone who has supported the project over the years, and for their long hours!
Thank you :)

Post by instream »

mikro wrote: Great! Now the only missing piece is a GEM app for FW11 updates. ;-) (for us suckers without a JTAG cable...)
Maybe I can kick my brother Henrik into doing that. He has a streak of GEM-coding right now. :D

Post by leech »

Sweet, while you've got the whip on him, get him to code... kidding!!

My brother doesn't do much, I should make him start coding on the Atari.

Post by mikro »

shoggoth wrote: Fri Jan 31, 2014 12:49 pm
mikro wrote: It doesn't require anything else. It just "limits" the amount of usable memory to 1 GB, then it perhaps crashes on a PMMU error because there are no PMMU entries for addresses between 0x40000000 and 0x7FFFFFFF (and these are no longer translated as copyback, they are illegal from now on). So the correct solution would of course be to map the SV-RAM area (0xA0000000 - 0xA7FFFFFF) in the PMMU tree but it's too much work for too little (zero) effect. It's software, can be changed later ;)
Awesome. I'll make this optional for now in case it clashes with some other hardware (CTPCI?).
As I occasionally ask myself how exactly this was supposed to work, I'll make a public note here. :)

Before my "patch", the PMMU was set in the following way:

0x00000000 - 0x20EFFFFF: translated using the PMMU (527 MiB: 14 MiB ST RAM, 512 MiB TT RAM, I/O)
0x40000000 - 0x7FFFFFFF: marked as copyback (1 GiB)
0x80000000 - 0xFFFFFFFF: marked as cache-inhibited, precise (2 GiB: hardware regs)

After my "patch", the PMMU is set in the following way:

0x00000000 - 0x20EFFFFF: translated using the PMMU (527 MiB: 14 MiB ST RAM, 512 MiB TT RAM, I/O)
0x80000000 - 0xFFFFFFFF: marked as cache-inhibited, precise (2 GiB: hardware regs)
0xA0000000 - 0xA7FFFFFF: marked as cache-inhibited, imprecise (128 MiB: DDR2 SDRAM; overrides the previous one)
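
A rough sketch of how the two non-PMMU entries of the "after" map could be expressed as 68060 transparent translation register (TTR) values. It assumes that the patch programs the data TTRs (the xTT0/xTT1 registers mentioned later in this thread), that the register layout is the one from the 68060 user's manual (address base in bits 31-24, address mask in bits 23-16, enable in bit 15, cache mode in bits 6-5), and that register 0 wins when both registers match, which would explain the "overrides the previous one" note. The function and enum names are made up for illustration; nothing here is taken from SV_XBIOS.PRG.

```c
#include <assert.h>
#include <stdint.h>

/* Cache-mode field (bits 6-5) of a 68060 TTR, as assumed in the lead-in. */
enum ttr_cache_mode {
    TTR_CM_WRITETHROUGH = 0,
    TTR_CM_COPYBACK     = 1,
    TTR_CM_CI_PRECISE   = 2,   /* cache-inhibited, precise   */
    TTR_CM_CI_IMPRECISE = 3    /* cache-inhibited, imprecise */
};

/* Build a TTR value covering first..last (inclusive). Only the top
 * 8 address bits take part, so the block must be a naturally aligned
 * power-of-two multiple of 16 MiB. Hypothetical helper. */
static uint32_t ttr_encode(uint32_t first, uint32_t last, enum ttr_cache_mode cm)
{
    uint32_t base = first >> 24;
    uint32_t mask = (first ^ last) >> 24;      /* address bits allowed to differ */

    assert((first & 0x00FFFFFFu) == 0);
    assert((last & 0x00FFFFFFu) == 0x00FFFFFFu);
    assert((mask & (mask + 1)) == 0);          /* mask must look like 0..01..1 */
    assert((base & mask) == 0);                /* block must be aligned */

    return (base << 24) | (mask << 16)
         | (1u << 15)                          /* E: register enabled */
         | (2u << 13)                          /* S = 1x: user and supervisor */
         | ((uint32_t)cm << 5);
}

/* Does an enabled TTR match this address? */
static int ttr_matches(uint32_t ttr, uint32_t addr)
{
    uint32_t base = ttr >> 24;
    uint32_t mask = (ttr >> 16) & 0xFFu;

    return (ttr & 0x8000u) != 0
        && ((((addr >> 24) ^ base) & ~mask & 0xFFu) == 0);
}

/* Cache mode applied to addr, or -1 if neither register matches and
 * the access falls through to the PMMU tree. Register 0 has priority. */
static int effective_mode(uint32_t dtt0, uint32_t dtt1, uint32_t addr)
{
    if (ttr_matches(dtt0, addr))
        return (int)((dtt0 >> 5) & 3u);
    if (ttr_matches(dtt1, addr))
        return (int)((dtt1 >> 5) & 3u);
    return -1;
}
```

Under these assumptions the SuperVidel entry comes out as 0xA007C060 and the hardware-register entry as 0x807FC040. An address that matches neither register (for example 0x40000000 after the patch) is left to the PMMU tree, which is where the "no PMMU entries" crash discussed in this thread would come from.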

My statement from 2014 wasn't very precisely formulated: the change does not limit the amount of usable memory, just replaces the 1 GiB block with another (128 MiB) one. You might ask: doesn't this cause trouble for some (future?) hardware? And the answer is yes and no. ;)

If we are to believe the CTPCI memory map, 0x40000000 should be occupied by 512 MiB of PCI address space and 512 MiB of reserved space. 0x40000000 can't be cached (otherwise the hardware couldn't get fresh data to display immediately) and there is even a "No cache" note next to it. So most likely 0x40000000 is marked the same way as the SuperVidel block (which is even marked in that map, btw, although at the wrong position), i.e. "cache-inhibited, imprecise", I'm assuming using the same registers as I'd used for SuperVidel (I'll try to verify this on a CTPCI machine).

So... when booting Falcon+CT60+CTPCI+SuperVidel (if that is even possible) with SV_XBIOS.PRG + SV.INF with "pmmu_boost = true", then yes, any write to the CTPCI address space will lead to a crash as there wouldn't be any PMMU entries to mark it as active (most likely as soon as the AUTO folder executes SV_XBIOS.PRG, because CTPCI TOS would like to write some pixels).

However: as SuperVidel uses either the original Videl registers (not used in CTPCI) or its own patched NVDI output (CTPCI uses its own fVDI), clearly, it's insane to have both at the same time because their VDIs would fight against each other. I can't remember whether you can disable video output in CTPCI and use just, say, USB or network cards. If this is the case then "pmmu_boost = false" has its place.

But let's be honest, how many people use CTPCI these days, let alone with SuperVidel? I may be the only one left with at least a theoretical option to do that. :) (and maybe I will try it, just for the sake of curiosity).

And as for new (CT60) hardware... yes, if something takes the free address space at 0x40000000 (or anywhere else) and would like to use the same PMMU register for its own setting, then it would conflict with "pmmu_boost = true".

But before that happens... make sure the option is enabled, my ScummVM port counts on it heavily! (otherwise you'd get worse performance with SuperVidel than without!)

Post by mikro »

mikro wrote: Sat Feb 18, 2023 3:47 pm If we are to believe the CTPCI memory map, 0x40000000 should be occupied by 512 MiB of PCI address space and 512 MiB of reserved space. 0x40000000 can't be cached (otherwise the hardware couldn't get fresh data to display immediately) and there is even a "No cache" note next to it. So most likely 0x40000000 is marked the same way as the SuperVidel block (which is even marked in that map, btw, although at the wrong position), i.e. "cache-inhibited, imprecise", I'm assuming using the same registers as I'd used for SuperVidel (I'll try to verify this on a CTPCI machine).
Thanks to Latz I was able to do so. CTPCI TOS indeed remaps 0x40000000 - 0x7FFFFFFF as cache-inhibited but ... precise! Not sure what the reason is here, but if it is the same kind of oversight as with SuperVidel, CTPCI's memory access time could be similarly boosted by ~30%. However, it is possible that the precise model is needed for USB and networking, as CTPCI handles not only graphics (in contrast to SuperVidel).

I guess that concludes my investigation. Using tools like Ozk's SET_MMU one could create a much nicer setup if needed.

Post by mikro »

OK, so I have done a few tests with the CTPCI as well.

As expected, there's absolutely no harm done by marking 0x40000000 - 0x7FFFFFFF as "cache-inhibited, imprecise". In both cases the precise model cripples copy speed from TT-RAM to VRAM to about 70% (SV: 26.109 MB/s -> 18.357 MB/s; CTPCI: 20.871 MB/s -> 14.894 MB/s).

So yeah, the absolute CTPCI copy speed is about 5.2 MB/s lower than SuperVidel's, not great, especially if you consider that TT-RAM -> TT-RAM peaks at about 28.3 MB/s. Also, what everyone should avoid doing is copying *from* VRAM to TT-RAM, be it SuperVidel (12.578 MB/s) or CTPCI (5.152 MB/s !!!).

Copying between VRAM regions (which you should never do, there's SuperBlitter / Radeon for that) is surprisingly both fast and balanced: SV 24.918 MB/s, CTPCI 20.871 MB/s.

Interestingly, Kronos didn't show much of a speed boost in any of its tests; perhaps it is blitting just between VRAM regions.

For anyone interested I'm attaching a short prg which enables the imprecise model. Put it in AUTO as early as possible. Most likely it will have an effect only on the future ScummVM release (I plan to put the functionality in CTPCI TOS but that will be ... far away in the future).

Btw, putting SuperVidel's VRAM at 0xA0000000 was really an unfortunate decision. Now it's completely impossible to use just the xTT0 & xTT1 registers to make the proper mapping for both cards. If SV were placed at 0x30000000, as implied by the CTPCI map linked above, that would be a piece of cake. My thinking out loud in the post above omitted one super-important scenario: if you want PCI peripherals (USB, ethernet etc.) and SuperVidel video output. Now for that scenario one has to completely rework the defined PMMU tree in CTPCI TOS instead of changing four register values. Or live with slow access to SuperVidel VRAM.

Post by mikro »

mikro wrote: Wed Jan 03, 2024 11:13 pm Most likely it will have an effect only on the future ScummVM release
OK, that was rather dumb. Of course it will have an effect on anything using offscreen bitmaps (placed in TT-RAM). Also anything compiled with a recent SDL, e.g. Led Blur, should definitely show some FPS improvement.

Post by mikro »

I guess this is becoming my private CTPCI/SV investigation blog. :-) But I like to publish my findings because then I don't have to store/remember them by myself.

I found this old post of mine as a follow-up to this thread's original discovery: https://www.dhs.nu/bbs-ct60/index.php?request=13456.

And indeed, it is true. Even with the imprecise model, both SV and CTPCI are pretty angry at byte and word accesses:

SuperVidel:

Copy a 32 MB TT-RAM -> TT-TRAM
------------------------------
single bytes (1 * move.b)     : 2.195000 secs, 14928.473804 kB/s
single words (1 * move.w)     : 1.554999 secs, 21072.682362 kB/s
single longwords (1 * move.l) : 1.324999 secs, 24730.584702 kB/s

Copy a 32 MB TT-RAM -> VRAM
---------------------------
single bytes (1 * move.b)     : 3.594999 secs, 9114.884316 kB/s
single words (1 * move.w)     : 2.034999 secs, 16102.219215 kB/s
single longwords (1 * move.l) : 1.274999 secs, 25700.412314 kB/s
CTPCI:

Copy a 32 MB TT-RAM -> TT-TRAM
------------------------------
single bytes (1 * move.b)     : 2.225026 secs, 14727.018921 kB/s
single words (1 * move.w)     : 1.559999 secs, 21005.141669 kB/s
single longwords (1 * move.l) : 1.309999 secs, 25013.759554 kB/s

Copy a 32 MB TT-RAM -> VRAM
---------------------------
single bytes (1 * move.b)     : 6.940025 secs, 4721.596824 kB/s
single words (1 * move.w)     : 3.084999 secs, 10621.721433 kB/s
single longwords (1 * move.l) : 1.564973 secs, 20938.380407 kB/s
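
For readers who want to reproduce this kind of measurement, here is a minimal C sketch of the three copy loops and the kB/s calculation. It is not the source of the benchmark used above (that one is written around single move.b / move.w / move.l instructions); the function names are hypothetical and the timing uses the portable clock() call only so that the sketch compiles anywhere. The kB/s figure uses 1 kB = 1024 bytes, which is what the listed numbers imply (32768 kB / 2.195 s = 14928 kB/s).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* One copy routine per access width. volatile keeps the compiler from
 * merging the accesses into wider ones or replacing the loop with memcpy. */
static void copy_bytes(volatile void *dst, const volatile void *src, size_t bytes)
{
    volatile uint8_t *d = dst;
    const volatile uint8_t *s = src;
    for (size_t i = 0; i < bytes; i++)
        d[i] = s[i];
}

static void copy_words(volatile void *dst, const volatile void *src, size_t bytes)
{
    volatile uint16_t *d = dst;
    const volatile uint16_t *s = src;
    assert(bytes % 2 == 0);
    for (size_t i = 0; i < bytes / 2; i++)
        d[i] = s[i];
}

static void copy_longs(volatile void *dst, const volatile void *src, size_t bytes)
{
    volatile uint32_t *d = dst;
    const volatile uint32_t *s = src;
    assert(bytes % 4 == 0);
    for (size_t i = 0; i < bytes / 4; i++)
        d[i] = s[i];
}

typedef void (*copy_fn)(volatile void *, const volatile void *, size_t);

/* Time one copy and return the throughput in kB/s (1 kB = 1024 bytes).
 * Returns 0.0 if the copy was too short for the clock resolution. */
static double measure_kbps(copy_fn fn, volatile void *dst,
                           const volatile void *src, size_t bytes)
{
    clock_t start = clock();
    fn(dst, src, bytes);
    double secs = (double)(clock() - start) / CLOCKS_PER_SEC;
    return secs > 0.0 ? ((double)bytes / 1024.0) / secs : 0.0;
}
```

On the real machine `dst` would point into SuperVidel or CTPCI VRAM and `src` into a TT-RAM buffer; the buffers should be 4-byte aligned and large enough (the runs above used 32 MB) for the timer resolution not to matter.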
So while writing longs to SV/CTPCI VRAM is really a good idea, writing random bytes isn't, especially on CTPCI (about a 3x slowdown, but SV doesn't shine either). Now you ask, hey, what about the first post in this thread: SuperVidel performance boost for byte/word/long access? Its linked post (https://dhs.nu/bbs-ct60/index.php?request=13274) clearly states writing 8-bit pixels, so how could that have worked?

Well, even byte write access is accelerated (see Re: SuperVidel performance boost for byte/word/long access). Not by much, but that alone could lead to the 41 -> 62 FPS boost:

41 FPS (precise model) / 62 FPS (imprecise model) = 66% so we are basically running on 2/3 of possible speed.

If I take a look at the memspeed benchmark (a single move.b copy from TT-RAM to SV-RAM):

5393.91 KB/s (precise model) / 9114.88 KB/s (imprecise model) = 59.2% (CTPCI is much worse, 4286.20 KB/s / 4721.60 KB/s = 90.8%)

Not exactly a 1:1 match, but keep in mind that a typical tunnel effect also does some texture lookup for each pixel, so while the test demo was writing a pixel to SV RAM, the 060 could have been reading texels from TT-RAM.
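
The ratios quoted above are plain divisions; a small helper makes it easy to redo them with other measurements. The function names are hypothetical, the input numbers are the ones from this post.

```c
#include <assert.h>

/* Speed of the precise model as a percentage of the imprecise one. */
static double percent_of(double precise, double imprecise)
{
    assert(imprecise > 0.0);
    return 100.0 * precise / imprecise;
}

/* The same comparison seen from the other side: how many percent
 * faster the imprecise model is. */
static double gain_percent(double precise, double imprecise)
{
    assert(precise > 0.0);
    return 100.0 * (imprecise / precise - 1.0);
}
```

For example, `percent_of(41.0, 62.0)` gives the roughly 66% FPS ratio, `percent_of(5393.91, 9114.88)` the roughly 59% byte-copy ratio on SuperVidel, and `gain_percent(41.0, 62.0)` shows that the same FPS step is a gain of about 51%.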

So no mystery here (perhaps the FSB buffer helps too, as implied in the first post) but it poses an interesting dilemma: for a generic case (think ScummVM), which mode should be the default? Writing to TT-RAM and then move16 to SV RAM, or writing directly to SV-RAM? (Obviously, ScummVM usually copies surfaces around, i.e. not byte by byte, but sometimes it does read each byte, modify it and write it back, and this certainly hurts.)
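
To make the first option concrete, here is a minimal sketch of the "work in TT-RAM, then push longwords" approach, under the assumption that the per-byte pass is a simple read-modify-write. Both function names are hypothetical and the halving operation is only a stand-in for whatever a real renderer does per pixel; a real implementation would use move16 or movem.l for the final copy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Per-byte read-modify-write pass, done entirely in the offscreen
 * buffer (TT-RAM), so video memory is never read. Here it just halves
 * every 8-bit pixel value as a placeholder operation. */
static void dim_pixels(uint8_t *offscreen, size_t count)
{
    for (size_t i = 0; i < count; i++)
        offscreen[i] >>= 1;
}

/* Push the finished buffer to video memory using longword writes only.
 * count is in bytes and must be a multiple of 4; both pointers must be
 * 4-byte aligned. */
static void flush_longs(volatile uint32_t *vram, const uint32_t *offscreen,
                        size_t count)
{
    assert(count % 4 == 0);
    for (size_t i = 0; i < count / 4; i++)
        vram[i] = offscreen[i];
}
```

The design choice is to pay for one extra pass over TT-RAM in exchange for touching VRAM only with the access size that the benchmarks above show to be fast, and never reading from it.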

Oh, and move16. I always refer to it as move16 but in reality, on the CT60 move16 definitely isn't the fastest way to copy memory. The good old movem.l is. I plan to kick it out of ScummVM soon but I was busy studying the CTPCI.

Btw, despite the popular (a decade old ;)) belief, there is a DMA burst read from TT-RAM to CTPCI VRAM available. However only for the vrt_cpyfm() function and only inside CTPCI TOS (i.e. even fvdi+radeon.sys can't use it). No clue why Didier didn't publish it. The opposite direction (VRAM -> TT-RAM) supposedly leads to freezes.
saulot
Captain Atari
Posts: 457
Joined: Sat Sep 18, 2004 9:09 pm
Location: Warszawa

Post by saulot »

mikro wrote: Btw, despite the popular (a decade old ;)) belief, there is a DMA burst read from TT-RAM to CTPCI VRAM available. However only for the vrt_cpyfm() function and only inside CTPCI TOS (i.e. even fvdi+radeon.sys can't use it). No clue why Didier didn't publish it. The opposite direction (VRAM -> TT-RAM) supposedly leads to freezes.
It's not a belief, that was official. And you can't really use something if you don't know about it. vrt_cpyfm() is a VDI function; you don't want to use the VDI when you want to use XBIOS.

The last message I got from R. Czuba regarding BURST mode was a response to the following question:
"...Are you planning adding BURST mode for PCI-to-Local Address (space #0 / SDRAM) ? If it is possible, then what it would be the speed increase in comparison to NO BURST mode (which is very slow at this moment)?.."

Response was:
".. Not possible on CTPCI because of double speed of the SDRAM to PLX bridge.
Only burst from SDRAM to PCI is possible (I have to fix it)..."

I asked about it and the answer was "no" on 17.06.2011. That's it. Maybe it was partially fixed at some point; it's not that burst mode DMA didn't work in general. I think I heard about those VRAM readback freezes back in the day (either from R. Czuba or Didier).

And even when it works, it doesn't mean that it couldn't be much faster, which is disappointing. Here is the original specs page: https://mikrosk.github.io/ct60tos/ctpci ... erview.htm.

Excerpt from the original announcement (from the link above):
".. BURST transfers on PCI bus (133MB/s).
- 060 BURST transfers from/to PCI bus (write at 66/100Mhz = 68/103 MB/s).
- PLX BURST transfers from SDRAM (read = 66MB/s at 66 Mhz) & PLX SINGLE to SDRAM (write = 44MB/s at 66Mhz)..."

And current benchmarks show "~25 MB/s", so it's still not there (maybe there's a difference without DMA?). I'm not a hardware person, but something is a 'little' off... I would expect transfers of around 44 MB/s, because it looks like a bottleneck on the PLX bridge, but reality looks much different. The MMU config improved things for sure.

I've also dug up this mail from Didier (from 2011-03-13) regarding SDRAM/Radeon transfers; maybe it will help. Maybe I should look through them all and save everything for future generations ;)...
>>> And since I enabled RADEON_RENDER now it's possible to use texture function, there are an example with init.c inside the drivers when I add multiples atari logo to the bootscreen with an alpha level. This feature was disabled because I hd no idea for the VDI, but with a separate XBIOS call like VIDIX, all is possible.
>>> For speed if you want use the screen area like the Videl, Rodolphe must add burst transfers. I hope before 2020.
>>>
>> From CTPCI hardware docs it seems that ther is no burst for SDRAM, but it isn't clearly indicated if it was planned or not. There is only statement that it is configurable. I wonder how this burst would speed up the whole process. I wrote to Radolphe and asked him about it, I'm curious what he will say about it.
>
> I mean PCI, the actual CPLD support just single access, no burst with/without DMA. And seep is very slow on this bridge (PLX9054) with single access.
>
> Else for transfers between CT60 SDRAM / Radeon SDRAM you can use the PCI bridge.
> I added since the beginning of the project (July 2005 oups...) 4 functions to the original PCI BIOS (only from XBIOS):
>
> #define dma_setbuffer(pci_address,local_address,size) (long)trap_14_wlll((short)(350),(unsigned long)(pci_address),(unsigned long)(local_address),(unsigned long)(size))
> #define dma_buffoper(mode) (long)trap_14_ww((short)(351),(short)(mode))
> #define read_mailbox(mailbox,pointer) (long)trap_14_wwl((short)(352),(short)(mailbox),(unsigned long *)(pointer))
> #define write_mailbox(mailbox,data) (long)trap_14_wwl((short)(353),(short)(mailbox),(unsigned long)(data))
>
> You need only the 2 fisrt functions for the transfer:
>
> /* dir = 2 Local Bus To PCI */
> /* dir = 1 PCI to Local Bus */
> dma_setbuffer(src, dest, size);
> dma_buffoper(dir);
> while(dma_buffoper(-1) == 1); /* buzy */
>
> Warning, for convert PCI address (src here) to local memory PCI address, read the PCI BIOS. It's not faster that the CPU, but you can do something with the CPU during the transfer.
Regarding SuperVidel reads/writes of byte/word sizes, I wonder if it could somehow be fixed on the firmware side. In some resolutions you would want to modify memory in a different way depending on the pixel size. For 32-bit ARGB or something like that it's fine, but for high colour / 8-bit modes you have to make difficult decisions, e.g. at 8-bit depth modify four aligned neighbouring bytes at once (which is not a standard use case). I think Evil raised a similar point on the DHS forum; he wanted to know "how it should be done and what is best on SV". I think that was a good question, and probably part of the reason you posted those benchmarks, which is great to see :).
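
As an illustration of the "four neighbouring bytes at once" idea on the software side (whether the firmware could do something similar is not known here), this is a minimal sketch of filling a horizontal span of 8-bit pixels: single bytes are written only at the unaligned edges, and the aligned middle part goes out as longwords. The function name and parameters are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fill pixels x0..x1-1 of an 8-bit scanline with one colour.
 * Byte writes are used only until the next 4-byte boundary and for
 * the remainder at the end; everything in between is written as
 * longwords holding four copies of the colour. The scanline is
 * assumed to live in memory that may be accessed as longwords
 * (true for a frame buffer). */
static void fill_span8(volatile uint8_t *line, size_t x0, size_t x1, uint8_t color)
{
    size_t x = x0;
    uint32_t quad = color * 0x01010101u;   /* colour replicated into 4 bytes */

    assert(x0 <= x1);

    /* leading pixels up to the next 4-byte boundary */
    while (x < x1 && ((uintptr_t)(line + x) & 3u) != 0)
        line[x++] = color;

    /* aligned middle part, four pixels per write */
    for (; x + 4 <= x1; x += 4)
        *(volatile uint32_t *)(line + x) = quad;

    /* trailing pixels */
    while (x < x1)
        line[x++] = color;
}
```

Because all four bytes of the longword are identical, the sketch does not depend on byte order. For arbitrary pixel data the same edge/middle split applies, but the longword then has to be assembled from four pixels first.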

Post by mikro »

saulot wrote: Fri Jan 12, 2024 1:31 am It's not a belief, that was official. And you can't really use something if you don't know about it. vrt_cpyfm() is a VDI function; you don't want to use the VDI when you want to use XBIOS.
Indeed. I think part of the confusion (especially on my side) is that there are three BURST modes, exactly as you linked:

1. BURST transfers on PCI bus (133MB/s)
2. 060 BURST transfers from/to PCI bus (write at 66/100Mhz = 68/103 MB/s)
3. PLX BURST transfers from SDRAM (read = 66MB/s at 66 Mhz) & PLX SINGLE to SDRAM (write = 44MB/s at 66Mhz)

Also it seems that this differs from the DMA functionality, i.e. copying without CPU involvement. Then one can find various notes:

[Apr 14, 2010] Kronos 1.91 runs with DMA ON and all tests are ok. Sure there are some redraw problems like in the desktop... and it is crashing in 65K at start of OPEN GL test (no problem without DMA).

Note in 256 colors modes with OPEN GL test we get a result of 14365 with 100% 060 use and 14563 with DMA PLX use. So, we can say here that the DMA is replacing the 060 with good performances. The DMA is actually Single transfers (1 long at each transfer) and I have to finish the BURST mode (4 longs per transfer only in DMA read mode from SDRAM).

Now we can use DMA and continue to resolve the software problems...


[May 2, 2010] If you want use the DMA (and option inside the CPX / Video), you need to update only ABE hardware with the JTAG cable => 1500% for video ram writing test in TC :-) (Kronos 1.70). With this lasted ABE I got always random read problem (memory corruption) so DMA isn't used for reading video ram.

[Sep 13, 2010] Without burst DMA usage, on single PCI access, the PLX9054 is very slow.

[Dec 26, 2011] The only thing I'm not sure to finish rapidely is the BURST for PLX read from SDRAM (actually we have Single accesses): this is very complex timing problem. PLX write burst to SDRAM will never be possible as I always told because of clock speed difference between SDRAM and PLX (half speed) and no dedicated DATA bus to buffer the PLX DATA BURST.

[Jan 29, 2012] PLX DMA RED BURST from SDRAM, the last thing to be done.

So in the end, I have no clue. Maybe someone more knowledgeable about hardware could shed some light on this confusing terminology.

Post by mikro »

Now I'm re-reading what I wrote... I think the gist of it is this:

You can copy between PCI and SDRAM with CPU or with DMA. The CPU transfer is basically what I do with my benchmark.

Then there are the DMA functions (documented nowhere, argh!) which do basically the same and, as Didier and Rodolphe mentioned in those snippets, you can use them too but they are about the same speed, because there is this "single PCI access" = 1 long written at a time. However, you can still use them for offloading the CPU, which is really, really nice!

Then there's the theoretical possibility of using the burst READ access (as WRITE is mentioned as "impossible") and this functionality hasn't been finished and would greatly help with uploading textures to Radeon.

I'm still confused about the three bursts' meaning but from a developer's point of view, I think the explanation above makes sense. I'm especially confused by this note:

1500% for video ram writing test in TC :-) (Kronos 1.70). With this lasted ABE I got always random read problem (memory corruption) so DMA isn't used for reading video ram.

This clearly indicates that he's talking about DMA copying between SDRAM and PCI and that write accesses are still not reliable, but where the heck is the 1500% coming from if the copy speed is about the same?

Btw there was an interesting post by Rodolphe:

Except something that I cannot correct : INT PLX line between PLX and CPLD needs a pull-up resistor that I forgot in the design.

Unfortunatelly there is no easy patch that can be done with iron solder except a micro surge operation on the PLX or CPLD pin : I do not recommend to try that!

So, Didier was in current since a long time (!) about that : the PLX INT principal use for him would to be informed (by an INT) for the end of a PLX DMA transfert. Dider has to check the DMAs status registers to see the end of the transfert. Sure this method consumes more CPU time, but we have no other choice !


If I'm reading this right, as the interrupt line between the PLX and CPLD isn't usable (it lacks a pull-up resistor), instead of relying on an "end of DMA transfer" interrupt one has to use polling, which is perhaps what Didier meant by that "while(dma_buffoper(-1) == 1); /* buzy */".
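
A minimal sketch of what such polling could look like when the CPU does useful work while the transfer is running. The two dma_* functions below are host-side stand-ins so that the sketch compiles anywhere; on a CTPCI machine they would be the XBIOS macros from Didier's mail quoted earlier. The wrapper, its timeout and the direction constants' names are my own additions, and the only behaviour taken from the mail is that dma_buffoper(-1) returns 1 while the transfer is busy.

```c
#include <assert.h>

/* --- Host-side stand-ins (hypothetical), replace with the XBIOS macros --- */
static int fake_busy_polls;   /* pretend the transfer needs this many polls */

static long dma_setbuffer(unsigned long pci_address, unsigned long local_address,
                          unsigned long size)
{
    (void)pci_address; (void)local_address; (void)size;
    fake_busy_polls = 3;
    return 0;
}

static long dma_buffoper(short mode)
{
    if (mode == -1)                       /* status query: 1 = busy */
        return fake_busy_polls-- > 0 ? 1 : 0;
    return 0;                             /* start of transfer */
}
/* ------------------------------------------------------------------------ */

#define DMA_PCI_TO_LOCAL 1   /* dir = 1 in the mail */
#define DMA_LOCAL_TO_PCI 2   /* dir = 2 in the mail */

/* Start a DMA transfer and poll for its end. While the transfer is
 * busy, work() is called (if not NULL) so the CPU is not wasted.
 * Returns the number of polls that reported busy, or -1 if the
 * transfer was still busy after max_polls polls. */
static long dma_copy_polled(unsigned long pci_address, unsigned long local_address,
                            unsigned long size, short dir,
                            void (*work)(void), long max_polls)
{
    long polls = 0;

    assert(max_polls >= 0);
    dma_setbuffer(pci_address, local_address, size);
    dma_buffoper(dir);

    while (dma_buffoper(-1) == 1) {
        if (++polls > max_polls)
            return -1;                    /* give up instead of hanging */
        if (work)
            work();
    }
    return polls;
}
```

The timeout is there because the thread mentions freezes in one direction; a plain `while` loop as in the mail would hang forever in that case. As the mail warns, the PCI address also has to be converted to a local memory PCI address first, which this sketch does not attempt.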

Post by mikro »

mikro wrote: Wed Jan 03, 2024 11:13 pm In both cases the precise model cripples copy speed from TT-RAM to VRAM to about 70% (SV: 26.109 MB/s -> 18.357 MB/s; CTPCI: 20.871 MB/s -> 14.894 MB/s).

So yeah, the absolute CTPCI copy speed is about 5.2 MB/s lower than SuperVidel's, not great, especially if you consider that TT-RAM -> TT-RAM peaks at about 28.3 MB/s. Also, what everyone should avoid doing is copying *from* VRAM to TT-RAM, be it SuperVidel (12.578 MB/s) or CTPCI (5.152 MB/s !!!).

Copying between VRAM regions (which you should never do, there's SuperBlitter / Radeon for that) is surprisingly both fast and balanced: SV 24.918 MB/s, CTPCI 20.871 MB/s.

mikro wrote: Thu Jan 11, 2024 10:22 pm Oh, and move16. I always refer to it as move16 but in reality, on the CT60 move16 definitely isn't the fastest way to copy memory. The good old movem.l is. I plan to kick it out of ScummVM soon but I was busy studying the CTPCI.
As explained in https://www.atari-forum.com/viewtopic.p ... 85#p478885, this was a rather untrue claim. In fact, I have verified on both SuperVidel and CTPCI that for copying between VRAM and TT-RAM (in any direction) move16 is absolutely the best possible option.

On SV, all combinations are ~30 MB/s (VRAM<->VRAM 24, as quoted above) with move16, basically doubling the speed against movem.l and others.

On CTPCI, writing to VRAM is fine (~30 MB/s); reading from it, however, is really terrible. move16 increased that 5 MB/s to 10 MB/s (and VRAM->VRAM from 4 MB/s to 6 MB/s; the number quoted above is wrong for some reason) but that's still 3x slower than the value we'd wish for.

But it was nice to see CTPCI running again.