Speech.tos
Hi,
Does anybody have any technical info, sources or docs about this proggy (speech.tos)? It is quite small (28 KB); is the phoneme sample data compressed, or in some non-PCM format, or something else?
thx
Hello !
This prog reminds me of the one I used to play with on the C64! It has the same voice!
I think it generates waves from meta-data; the real work is done by the sequencer/mixer (no samples!).
Some recent voice generators use samples, but the trick there is to mix not only phonemes but whole words together!
Bye !
Tobe.
edit: It could be nice to have speech synthesis inside a demo screen to read the scrolltext with an ugly computer voice!
step 1: introduce bug, step 2: fix bug, step 3: goto step 1.
i recently disassembled the speech.tos program. it seems it uses 3 sine waveforms at different frequencies and amplitudes mixed together.. the third channel can be replaced with PSG noise to simulate a 'hiss'. i think this is a way to simulate the formants present in the human voice + adding some hiss.
i also understand how it makes the PSG replay the mixed wave, i think just by the usual interrupt that sets PSG channel volumes. many sample replay routines use the same.
it's definitely not as natural sounding as a 'diphone' speech synth (about 3000 individual samples required), but it sounds nicely robotty
anyway, i have absolutely _no_ idea how the wave amplitudes and frequencies are controlled. somewhere in between phoneme conversion and the sample interrupt it got a bit blurry
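For anyone who wants to experiment with the idea described above, here is a minimal sketch of mixing three sine waves, with the third channel optionally swapped for noise. It is not taken from the speech.tos disassembly: the 8000 Hz rate, the function names and the formant frequencies and amplitudes are all assumptions chosen for illustration.

```python
import math
import random

SAMPLE_RATE = 8000  # Hz; assumed replay rate, not taken from speech.tos


def mix_formants(freqs, amps, n_samples, hiss=False, rate=SAMPLE_RATE, seed=0):
    """Mix three channels into floats in [-1, 1].

    freqs/amps hold one value per channel. With hiss=True the third
    channel outputs white noise instead of a sine wave.
    """
    rng = random.Random(seed)          # seeded so the output is repeatable
    total = sum(amps) or 1.0           # normalise so the mix cannot clip
    out = []
    for n in range(n_samples):
        t = n / rate
        value = 0.0
        for channel, (freq, amp) in enumerate(zip(freqs, amps)):
            if hiss and channel == 2:
                value += amp * rng.uniform(-1.0, 1.0)
            else:
                value += amp * math.sin(2.0 * math.pi * freq * t)
        out.append(value / total)
    return out


def to_unsigned_8bit(samples):
    """Convert floats in [-1, 1] to unsigned 8-bit sample values."""
    return [min(255, max(0, int(round(128 + 127 * s)))) for s in samples]


# Illustrative values only: three peaks loosely resembling an open vowel.
vowel = to_unsigned_8bit(mix_formants((700, 1200, 2600), (1.0, 0.5, 0.25), 400))
```

The resulting 8-bit buffer is the kind of data a volume-register replay interrupt would consume; how speech.tos picks the frequencies and amplitudes per phoneme is exactly the part this sketch does not answer.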

Hmm... interesting !
I've used the record-as-wav option in Steem to take a closer look at the waveform, and I've noticed most phonemes produced by speech.tos have a main periodic signal (but modulated with 'something' else).
I will try to experiment this way, with some sine waves & R6 noise.
Thanx for the clue
If you find out how the amplitudes & frequencies are managed, I will be happy to know about that.
That sounds robotty but would be pretty cool if used in a demo

- Nils Schneider
- Atari User
- Posts: 42
- Joined: Tue May 02, 2006 12:20 am
- Location: Neuss, Germany
- Contact:
- karlm
- Atari Super Hero
- Posts: 717
- Joined: Thu Nov 13, 2003 4:09 am
- Location: Top of the World - Australia
Man do I feel old now 
I remembered even the filename of this baby.
Used to be c_say.zoo, had a hard time finding even an unzoo program!
Now in zip format.
Has documentation on how to interface the speech2.tos program using C and also the documented assembler code for speech2.tos.
Don't ask me to explain it to you all though, my vocabulary is not that expansive
poo, I must nearly be as old as Mug ... now that is scary
cheers
karlm

karlm: i guess you did the same as i did but some years earlier 
don't know if i told this before, but speech.tos works like this:
1) convert text -> phoneme stuffs
2) magic waveform + hiss generation from phonemes
3) waveform + hiss replay using YM
(1) and (3) are relatively easy. (2) is quite elusive (magic) and not understood by anyone. that's why this is basically the only speech synth ever used on ST. I checked some PeeCee C source for a miniature speech synth but this sounded like complete nads compared to stspeech. the concept was almost the same.. converting phonemes into some sinewaves (matching formants). the speech synths in planet potion (amiga ppc demo) and windoze are pretty good, though.
i just hope someone more knowledgeable on this stuff comes to visit this board some day..

- Mug UK
- Administrator
- Posts: 12298
- Joined: Thu Apr 29, 2004 7:16 pm
- Location: Stockport (UK)
- Contact:
karlm wrote:Man do I feel old now
poo, I must nearly be as old as Mug ... now that is scary
karlm
Cheeky barst .. I may look 40+ but I'm only 36

Main site: www.mug-uk.co.uk - digging up bits from my past: Atari ST, ZX Spectrum, Sega 8-bit (game hacks) and NDS (ripping guides). I host a C64 Radio Show for a mate, Max Hall via www.chipsidshow.co.uk
I develop a free Word (for Windows) add-in for Word 2007 upwards. A toolbox that will allow power Word users to fix document errors. You can find it at: mikestoolbox.co.uk
STSPEECH has been converted to Windows and the Nintendo DS. Check this out: http://nintendo-ds.dcemu.co.uk/DSSpeech.shtml
and go nag the authors for a source release

Trivia: The original credits A.D.Beveridge as its co-author. Wonder if that was the same as the 'Andy Beveridge' shown in the credits of Cybercon III (assembly line games rulez)
George
is 73 Falcon patched atari games enough ? ^^
karlm wrote:Man do I feel old now
I remembered even the filename of this baby.
Used to be c_say.zoo, had a hard time finding even an unzoo program!
Now in zip format.
Has documentation on how to interface the speech2.tos program using C and also the documented assember code for speech2.tos.
Don't ask me to explain it to you all though, my vocabulary is not that expansive
poo, I must nearly be as old as Mug ... now that is scary
cheers
karlm





Thanks a lot Karlm for the sources

I'm commenting the code right now, it seems it needs a few optimisations in the parsing function

Maybe a demo with speech soon, who knows ?
- lp
- Fuji Shaped Bastard
- Posts: 2821
- Joined: Wed Nov 12, 2003 11:09 pm
- Location: GFA Headquarters
- Contact:
Does anyone remember the commercial release 'Smooth Talker'? This app sounded really good, a touch better than speech.tos, but it was useless as it had an awkward GUI and you could not call it externally. Sounded great though. It would read text files out loud.
That one might be a good one to hack if someone could dig up a copy.

Re: Speech.tos
this program also has a multitude of parameters:
* you can change the pitch
* you can modify the talking speed
* you can put in the phonemes directly
* you can insert pauses
I had quite some fun with it, but somehow forgot how it works.
The program even made it into the German charts in a techno song (U96: Das Boot)
1..2..3.. techno
Georges
lol
muguk wrote:Cheeky barst .. I may look 40+ but I'm only 36
karlm wrote:Man do I feel old now
poo, I must nearly be as old as Mug ... now that is scary
karlm
Mind you, the grey beard doesn't help matters!


/me young whippersnapper

glad you like it Tobe
cheers
karlm
gunstick:
didn't know it could change talking speed or pitch.. man.. it really is interesting stuffs.. wish i had some more time..
btw a bit off-topic, but i'm using a 4->16 bit adpcm for my demos now which seems to work very nice. i made a variation on this one:
http://www.syncscroller.net/psx/depack.c
is teh rules! =)
earx wrote:gunstick:
didn't know it could change talking speed or pitch.. man.. it really is interesting stuffs.. wish i had smoe more time..
btw a bit off-topic, but i'm using a 4->16 bit adpcm for my demos now which seems to work very nice. i made a variation on this one:
http://www.syncscroller.net/psx/depack.c
is teh rules! =)
Hoping to hear the results in a finished project soon

We're all talking about speech engines!
You can always make your own speech engine out of samples of your own vocal tones. I created one of these a few years ago but the result had some imperfections (clicks between phonemes). I've never got around to fixing it:
1. Get a table of phonetic sounds.
2. Record all phonetic sounds into sampler using your own voice. (You should use constant vocal notes at the same pitch)
3. Write a program to convert text into phonetic strings.
4. The program should now sew bits of phonetic noise together and play it.
5. Playback pitch can be adjusted by expanding or compressing the wave.
You could probably do a Fourier analysis and "fade" the harmonics to create smooth transitions between phonemes.
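Steps 4 and 5 above, plus a cure for the clicks, can be sketched as follows. Note that this swaps the Fourier idea for a much simpler time-domain crossfade, which is a different technique with the same goal. The function names and the sample bank layout (a dict from phoneme name to a list of float samples) are assumptions made for the example.

```python
def crossfade_join(a, b, overlap):
    """Join two sample lists, blending `overlap` samples linearly
    so the seam does not produce a click."""
    overlap = min(overlap, len(a), len(b))
    if overlap == 0:
        return list(a) + list(b)
    start = len(a) - overlap
    blended = []
    for i in range(overlap):
        w = (i + 1) / (overlap + 1)      # weight of the incoming sound
        blended.append((1.0 - w) * a[start + i] + w * b[i])
    return list(a[:start]) + blended + list(b[overlap:])


def resample(samples, ratio):
    """Change pitch by stretching or squeezing the wave.

    ratio > 1 raises the pitch (and shortens the sound), ratio < 1
    lowers it. Uses linear interpolation between neighbours.
    """
    if not samples:
        return []
    n_out = max(1, int(len(samples) / ratio))
    last = len(samples) - 1
    out = []
    for i in range(n_out):
        pos = i * ratio
        j = min(int(pos), last)
        frac = pos - j
        nxt = samples[min(j + 1, last)]
        out.append((1.0 - frac) * samples[j] + frac * nxt)
    return out


def speak(names, bank, overlap=32, pitch=1.0):
    """Sew the recorded phonemes in `bank` together in the given order."""
    result = []
    for name in names:
        result = crossfade_join(result, resample(bank[name], pitch), overlap)
    return result
```

As with any resampling approach, changing the pitch this way also changes the duration, which matches the "expanding or compressing the wave" description.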
Two years later... :)
I'm quite interested if there's anyone able to explain how speech.tos generates its phonemes.
Up to now, I've been using a set of short 4-bit phoneme samples, but it still takes too much space, even compressed (to fit in a 4Kb intro for example). That's why I'm still looking for a way to generate these samples using some simple additive synthesis (I do not care if it sounds robotic or not; if it's understandable, I will be quite happy :).
The attachment below includes 2 speech programs I've found on the ST, in case they can be of any use. Give a listen to the one in the "SMOOTH" folder :)
I'm currently studying the source code "c_say.s", adding comments and labels. As soon as i get something readable, i will post it here.
edit: Does someone know what kind of phonemes these are ?
EY AY OY OW WX YX AE IY ER AO UX UH AH AA OH
AX IX IH EH DH ZH CH CH LX RX SH NX TH /H V
Z J L R W Y Q P T K B D G M N
F S - ? . UL UM UN IL IM IN
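Since the program reportedly accepts phonemes directly, here is a small sketch of splitting a phoneme string into tokens from the list above. The greedy "try two characters, then one" rule is an assumption about how such a parser could work; it is not taken from c_say.s, and the example strings are made up.

```python
# Inventory copied from the list above (duplicates collapse in the set).
PHONEMES = """
EY AY OY OW WX YX AE IY ER AO UX UH AH AA OH
AX IX IH EH DH ZH CH CH LX RX SH NX TH /H V
Z J L R W Y Q P T K B D G M N
F S - ? . UL UM UN IL IM IN
""".split()
INVENTORY = set(PHONEMES)


def tokenize(text):
    """Split a phoneme string into codes, preferring two-character codes."""
    text = text.upper()
    tokens = []
    i = 0
    while i < len(text):
        if text[i].isspace():
            i += 1                      # spaces only separate codes
            continue
        for width in (2, 1):
            candidate = text[i:i + width]
            if len(candidate) == width and candidate in INVENTORY:
                tokens.append(candidate)
                i += width
                break
        else:
            raise ValueError(f"unknown phoneme at position {i}: {text[i]!r}")
    return tokens
```

For example, `tokenize("/HEHLOW")` splits into `/H`, `EH`, `L`, `OW`. A greedy rule like this can mis-split ambiguous strings, so separating codes with spaces is the safer input format.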
i have now released a nicely fixed (and sub-optimal) version of STSPEECH for falcon (and all the boosted ones: CT60, nemesis, CT2, phantom, etc): I removed the self-modifying code, the MOVEP instructions and the YM shadow register access. The whole thing still runs in timer A, so you can use it in parallel with most MP2/3 and MOD players.
the phonemes listed by tobe are very well disassembled in the C_SAY program. these phonemes match little programs like "noise pulse with specified frequency" or "3 formant waves at specified frequencies".
For instance, the "S" and "F" phonemes are noise-only using the pseudo-random generator of the YM2149. IIRC the "F" has a lower frequency than the "S", for the rest they are identical. And they don't even use an envelope (if i saw correctly) !
I now understand the whole ST Speech thing. The core of the thing is the phoneme table which contains little "structures" that translate phonemes into stuff a soundchip can understand.
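To make the idea of a phoneme table concrete, here is a purely hypothetical sketch of what such "structures" could look like. None of the field names or numbers come from the C_SAY source; the only thing taken from the description above is that "S" and "F" are noise-only, with "F" using a lower noise frequency than "S", while voiced sounds use three formant waves.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class PhonemeEntry:
    """One hypothetical table entry: either noise-only or three formants."""
    noise_rate: Optional[int] = None                      # arbitrary units
    formants: Optional[Tuple[Tuple[int, float], ...]] = None  # (Hz, amplitude)


# All values below are invented for illustration.
TABLE = {
    "S": PhonemeEntry(noise_rate=24),
    "F": PhonemeEntry(noise_rate=12),   # lower than "S", as described above
    "AA": PhonemeEntry(formants=((700, 1.0), (1200, 0.5), (2600, 0.25))),
}


def kind(name):
    """Tell which little 'program' a phoneme would select."""
    entry = TABLE[name]
    return "noise" if entry.formants is None else "formants"
```

A synth loop would look up each phoneme, call `kind()` and then drive either the noise generator or the three-sine mixer with the entry's parameters.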
ST Speech is big not primarily because of these but because of the premultiplied sinewave tables that are DC.B'ed into the program.. The 8-bit sample -> 3 channel YM amplitude conversion table is DC.B'ed in there as well. Another costly thing is the code that translates English into phonemes, and even the prompt (user interface). If you trash some of this code and replace the rest with table generation code, you may yet end up with a 4Ktro. Or a very attractive engine for use in combination with a 96ktro.
I can post my Falcon/TT/CT60 fixed C_SAY here this evening.
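As a rough illustration of the "table generation code" idea, both kinds of table can be computed at startup instead of being stored. This is a sketch under assumptions: the table sizes are guesses, and the YM volume curve is modelled as about 3 dB per step (a factor of sqrt(2)), which is only an approximation of the real chip. The actual STSPEECH tables may well have been built differently.

```python
import math
from bisect import bisect_left


def premultiplied_sines(n_amps=16, length=256):
    """table[a][i] = sine sample i already scaled by amplitude step a."""
    return [
        [round(a / (n_amps - 1) * 127 * math.sin(2 * math.pi * i / length))
         for i in range(length)]
        for a in range(n_amps)
    ]


# Assumed DAC model: volume 0 is silent, each step up is ~3 dB louder.
LEVELS = [0.0] + [2.0 ** ((v - 15) / 2.0) for v in range(1, 16)]


def ym_volume_table(size=256):
    """Map each 8-bit sample to the three channel volumes whose summed
    output level is closest to the sample value."""
    combos = sorted(
        (LEVELS[a] + LEVELS[b] + LEVELS[c], (a, b, c))
        for a in range(16) for b in range(a, 16) for c in range(b, 16)
    )
    sums = [s for s, _ in combos]
    peak = sums[-1]
    table = []
    for x in range(size):
        target = x / (size - 1) * peak
        k = bisect_left(sums, target)
        # The nearest sum is either just below or just above the target.
        candidates = [j for j in (k - 1, k) if 0 <= j < len(combos)]
        best = min(candidates, key=lambda j: abs(sums[j] - target))
        table.append(combos[best][1])
    return table
```

In a 4K intro the same loops would be written in 68000 assembly, trading a few hundred bytes of code for several kilobytes of DC.B data.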
If anyone happens to have a Hades, FSPEECH works. Very cool.
Nice work earx.
I downloaded the c_say archive and noticed there was a file called say.o in there. It appears to be a DRI format object file, so I linked it against a test program I made in GFA, but I didn't get what sounds like correct speech. I then assembled testsay.s, which worked. So I took sayspl.s, added opt l2, uncommented the .global lines and built a new object file. If I link against this new object file I get correct speech. My Hades uttered the word "Atari" from a compiled GFA program. lol
Do you plan to put the "english -> phonetic text" routine back in?