Eero Tamminen wrote: I wasn't clear about what I was asking; it was about what should happen on *each* executed instruction. Should I record just that there was a miss, or how many misses executing the current instruction at this point (once) incurred, i.e. should I increase the sum of misses for the given address by 0-1 or by 0-3?
I think recording the sum of misses by 0-3 is more useful - this way all information is captured.
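To make the 0-3 option concrete, here is a minimal sketch of that bookkeeping. It assumes a hypothetical `record_instruction` hook that is called once per executed instruction with the number of misses that execution incurred; the names are illustrative only and are not taken from the actual profiler code.

```python
from collections import defaultdict

# Hypothetical per-address profile data; names are illustrative only.
miss_sum = defaultdict(int)     # address -> total misses over all executions
exec_count = defaultdict(int)   # address -> number of times executed
histogram = [0, 0, 0, 0]        # executions that incurred 0, 1, 2 or 3 misses


def record_instruction(address, misses):
    """Record one execution of the instruction at `address`.

    `misses` is how many cache misses this single execution incurred (0-3).
    """
    if not 0 <= misses <= 3:
        raise ValueError("expected 0-3 misses per instruction")
    exec_count[address] += 1
    miss_sum[address] += misses  # add 0-3, not just 0-1
    histogram[misses] += 1


# Example: the same instruction executed three times.
record_instruction(0x1000, 0)
record_instruction(0x1000, 2)
record_instruction(0x1000, 3)
print(miss_sum[0x1000], exec_count[0x1000], histogram)
```

Keeping the execution count next to the sum means the simpler "was there a miss at all" figure can still be derived from the histogram later, so nothing is lost by recording 0-3.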
Eero Tamminen wrote: Also, does the DSP have any kind of cache / cache misses?
No, it has a simple onboard / external static RAM arrangement for all memory, but it does have some less obvious complexities to its (otherwise quite simple) timing scheme.
There are 3 internal memories (X: data, Y: data, P: program/instructions). Assuming all memory sources for a given instruction come from those, the timing is 'ideal', i.e. usually 2 oscillator clocks or whatever the manual says for that instruction (most are 2, 4 or 6).
If one of the sources used by the instruction (be that the instruction fetch, or a data read/write) comes from external memory (anything above address $200 for X: and Y:, or above address $100 for P:), that access is free - no effect on timing.
However, if 2 or more sources come from external memory, there is a 2-clock penalty for each external source beyond the first. So if an instruction is in external memory and accesses both X: and Y: in external memory, there is a 2x2 clock penalty on top of the instruction time.
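The rule above can be sketched roughly as follows. This is only an illustration under stated assumptions: the first external access is free and each further one costs 2 clocks (which matches the 2x2 example), the boundaries are the addresses quoted above and are treated as inclusive, and the function and constant names are made up for this sketch. It ignores the addressing-mode and jump costs.

```python
# Internal/external boundaries as quoted in the post (assumed inclusive).
X_Y_EXTERNAL_START = 0x200
P_EXTERNAL_START = 0x100
PENALTY_CLOCKS = 2  # clocks per external access beyond the first


def is_external(space, address):
    """Return True if `address` in memory space 'X', 'Y' or 'P' is external."""
    if space in ("X", "Y"):
        return address >= X_Y_EXTERNAL_START
    if space == "P":
        return address >= P_EXTERNAL_START
    raise ValueError("unknown memory space: %r" % (space,))


def instruction_clocks(base_clocks, accesses):
    """Estimate clocks for one instruction.

    `base_clocks` is the 'ideal' figure from the manual (2, 4, 6, ...).
    `accesses` is a list of (space, address) pairs: the instruction fetch
    plus any data reads/writes made by the instruction.
    """
    external = sum(1 for space, addr in accesses if is_external(space, addr))
    extra = max(0, external - 1) * PENALTY_CLOCKS
    return base_clocks + extra


# Everything internal: ideal timing.
print(instruction_clocks(2, [("P", 0x40), ("X", 0x10), ("Y", 0x20)]))
# Only the instruction fetch is external: still free.
print(instruction_clocks(2, [("P", 0x400), ("X", 0x10), ("Y", 0x20)]))
# Fetch, X: and Y: all external: 2x2 clocks on top.
print(instruction_clocks(2, [("P", 0x400), ("X", 0x300), ("Y", 0x300)]))
```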
There is a further complexity which is also covered in the manual but limited to certain instructions. There are costs for special addressing modes and for jump address calculation (jx), the latter being more complex. The manual says '6+jx', but IIRC jx can be quite large depending on the opcode encoding and on where the instruction lives! There are one or two other details, but they are more trivial.
Eero Tamminen wrote: What about TT, does that also have burst mode normally off?
The TT sacrificed its Blitter to gain a 32-bit bus, so I imagine it retained its burst mode. I don't know for sure though.
Eero Tamminen wrote: Latter point affects just timings, right? Anyway, that's more for Laurent than me...
I think so, although I don't know that it affects them in a trivial way. I have seen a description/writeup of how the Falcon varies from Motorola timings because of this, but I can't vouch for that either. Laurent has a difficult task generally.
I started writing a new benchmark tool (really an instruction pattern timing tool) to help me figure some of these things out for optimizations. Some of the things I already know were found by experiment and testing only.
Eero Tamminen wrote: I get following statistics:
- 0: 29282934
- 1: 43281
- 2: 3323
- 3: 6510
If this is instruction cache only, it looks reasonable for longword misses, yes. I think it's important to keep i-cache and d-cache results separate though.
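As a quick sanity check on those figures, the totals can be worked out like this. The sketch assumes each line of the statistics is the number of executed instructions that incurred that many misses; the variable names are just for illustration.

```python
# Executed instructions grouped by misses incurred (figures quoted above).
histogram = {0: 29282934, 1: 43281, 2: 3323, 3: 6510}

executed = sum(histogram.values())
total_misses = sum(n * count for n, count in histogram.items())
with_miss = executed - histogram[0]

print("instructions executed:", executed)
print("total misses:", total_misses)
print("share with at least one miss: %.3f%%" % (100.0 * with_miss / executed))
```

Running the same summary on separate i-cache and d-cache histograms would keep the two sets of results apart.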
The analysis looks good! I'll do some tests with it as soon as I can. I think I know roughly what to expect already, so I should be able to tell if things make sense in context or not (hopefully).