Speech and autism

(more on autism)

Ok, so I ripped the sound off a video called "Straight talk about autism: adolescent issues", and also off "21-up" (from the 7-up series) for some controls, then extracted a bunch of samples. I have samples of autistic adolescents, their parents, and some 21-year-olds.

Tried a bunch of different ways of analysing them. Couldn't find any particular difference in distribution of dynamics. Perhaps I'm not analysing it right, perhaps the sound has had dynamic range compression, perhaps there's no effect.

Did find an effect in the duration of utterances however.

I split each sample into speech and non-speech (see Appendix), then made a list of the duration of each segment of speech; these segments roughly correspond to syllables. I fitted Levy-stable distributions to this list in the same way I've been doing with paragraph lengths.
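
As a rough illustration of the segment-listing step, here is a minimal Python sketch. It assumes the speech/non-speech split is already available as one boolean label per frame. The function name `speech_durations` and the `frame_seconds` parameter are made up for this example and are not taken from the original analysis.

```python
from itertools import groupby

def speech_durations(is_speech, frame_seconds):
    """Return the duration in seconds of each unbroken run of speech frames."""
    durations = []
    for label, run in groupby(is_speech):
        if label:  # keep only the speech runs, skip the silences
            durations.append(sum(1 for _ in run) * frame_seconds)
    return durations

# Hypothetical labels: three speech segments of 2, 3 and 1 frames.
labels = [False, True, True, False, True, True, True, False, False, True]
print(speech_durations(labels, frame_seconds=0.5))  # [1.0, 1.5, 0.5]
```

The resulting list is what would then be handed to whatever stable-distribution fitting routine is in use.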

Results (leftmost column is alpha value, 1.0=Cauchy to 2.0=Gaussian):

1.38   Parent female 2, sample 2
1.39     21up female 1, sample 1
1.45 Autistic female 1, sample 2
1.45 Autistic male   3, sample 5
1.45     21up male   3, sample 1
1.51   Parent female 3, sample 1
1.53   Parent female 5, sample 2
1.57     21up male   4, sample 1
1.61 Autistic male   1, sample 3
1.63   Parent female 1, sample 1
1.63   Parent female 5, sample 1
1.63     21up male   4, sample 2
1.69   Parent female 4, sample 1
1.74   Parent male   1, sample 2
1.75   Parent female 2, sample 3
1.76 Autistic female 2, sample 1
1.80 Autistic female 1, sample 1
1.80     21up male   1, sample 1
1.81   Parent female 4, sample 2
1.85     21up female 2, sample 1
1.85     21up male   2, sample 2
1.96 Autistic male   3, sample 1
2.00   Parent female 2, sample 1
2.00   Parent female 5, sample 3
2.00   Parent male   1, sample 1
2.00 Autistic male   1, sample 1
2.00 Autistic male   1, sample 2
2.00 Autistic male   2, sample 1
2.00 Autistic male   2, sample 2
2.00 Autistic male   3, sample 2
2.00 Autistic male   3, sample 3
2.00 Autistic male   3, sample 4
2.00     21up female 1, sample 2
2.00     21up male   2, sample 1

It's not a perfect split, but it looks like there might be some kind of effect there (t-test, one tailed: t=1.67, df=32, p=0.052). These samples are all about 30 seconds long, which is not really enough to properly nail the alpha value. Slight changes in parameters shuffle the order a bit, but the autistic group consistently ends up clumped mostly at the bottom.
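
For anyone wanting to check the arithmetic, here is a small sketch of the t statistic computed from the table above. It assumes the comparison was the autistic samples against all the others (parents and 21up pooled), using an ordinary pooled-variance two-sample t-test. That grouping is a guess, but it does reproduce the quoted t and df.

```python
from math import sqrt
from statistics import mean

# Alpha values copied from the table above.
autistic = [1.45, 1.45, 1.61, 1.76, 1.80, 1.96] + [2.00] * 7
others = [
    1.38, 1.51, 1.53, 1.63, 1.63, 1.69, 1.74, 1.75, 1.81, 2.00, 2.00, 2.00,  # parents
    1.39, 1.45, 1.57, 1.63, 1.80, 1.85, 1.85, 2.00, 2.00,                    # 21up
]

def pooled_t(a, b):
    """Student's two-sample t statistic with pooled variance, plus degrees of freedom."""
    mean_a, mean_b = mean(a), mean(b)
    sum_sq = sum((x - mean_a) ** 2 for x in a) + sum((x - mean_b) ** 2 for x in b)
    df = len(a) + len(b) - 2
    std_err = sqrt(sum_sq / df * (1 / len(a) + 1 / len(b)))
    return (mean_a - mean_b) / std_err, df

t, df = pooled_t(autistic, others)
print(f"t={t:.2f}, df={df}")  # t=1.67, df=32
```

The one-tailed p-value then comes from the upper tail of a t distribution with 32 degrees of freedom.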

Not definitive, but hopefully enough for me to convince people it's worth checking out further: controlled conditions, no background noise, no dynamic range compression, matched subject and control groups. Well, I can dream.

Appendix

Short-time Fourier transform, 2048 bytes (at 44100Hz) per frame. Apply the k-means algorithm to the frames, with k=2 and the frequency magnitudes of each frame as the feature vector. The class of greater magnitude represents speech, the class of lesser magnitude silence.
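
The procedure above could be sketched roughly as follows, using NumPy. Several details are assumptions for the sake of a runnable example rather than a record of what was actually done: a mono signal, non-overlapping frames of 1024 samples (2048 bytes would be 1024 samples if the audio were 16-bit mono), a Hann window, and a deterministic initialisation of the two centroids from the quietest and loudest frames. The name `classify_frames` is hypothetical.

```python
import numpy as np

def classify_frames(signal, frame_len=1024, n_iter=20):
    """Label each frame True (speech) or False (silence) by 2-means on spectrum magnitudes."""
    n_frames = len(signal) // frame_len
    frames = np.reshape(signal[:n_frames * frame_len], (n_frames, frame_len))
    # Magnitude spectrum of each windowed frame is the feature vector.
    mags = np.abs(np.fft.rfft(frames * np.hanning(frame_len), axis=1))

    # Start the two centroids at the quietest and the loudest frame (assumption).
    energy = mags.sum(axis=1)
    centroids = np.stack([mags[energy.argmin()], mags[energy.argmax()]])

    for _ in range(n_iter):
        # Assign every frame to its nearest centroid (squared Euclidean distance).
        dists = ((mags[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of the frames assigned to it.
        for k in range(2):
            if np.any(labels == k):
                centroids[k] = mags[labels == k].mean(axis=0)

    # The class with the greater total magnitude is taken to be speech.
    return labels == centroids.sum(axis=1).argmax()

# Synthetic check: faint noise with a 440 Hz tone in frames 2 and 3.
rng = np.random.default_rng(0)
demo = 0.001 * rng.standard_normal(8 * 1024)
demo[2 * 1024:4 * 1024] += np.sin(2 * np.pi * 440 * np.arange(2048) / 44100)
mask = classify_frames(demo)
```

On real recordings with background noise or music the two clusters will not separate this cleanly, which is one reason the resulting segments only roughly correspond to syllables.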

Update 9/6/05... Results above are based on the difference in length between each syllable and the previous one. For straight syllable lengths, fitted using PyLevy (which can handle non-symmetric stable distributions), the results aren't as compelling:

0.83 Autistic female 1, sample 1
0.86     21up female 1, sample 1
0.96   Parent female 2, sample 1
1.02 Autistic male   3, sample 3
1.02 Autistic female 1, sample 2
1.03     21up male   4, sample 2
1.06 Autistic male   3, sample 5
1.15     21up female 1, sample 2
1.20     21up male   3, sample 1
1.22     21up male   2, sample 2
1.22     21up male   4, sample 1
1.23   Parent female 5, sample 1
1.23   Parent female 3, sample 1
1.24   Parent female 1, sample 1
1.24   Parent female 5, sample 2
1.26 Autistic male   3, sample 2
1.28   Parent female 2, sample 3
1.34   Parent female 4, sample 1
1.38   Parent female 2, sample 2
1.45   Parent female 4, sample 2
1.48 Autistic male   2, sample 1
1.53     21up female 2, sample 1
1.54     21up male   1, sample 1
1.56 Autistic male   1, sample 3
1.60     21up male   2, sample 1
1.60 Autistic male   2, sample 2
1.62 Autistic male   1, sample 2
1.65 Autistic male   3, sample 1
2.00   Parent male   1, sample 2
2.00   Parent male   1, sample 1
2.00 Autistic male   1, sample 1
2.00   Parent female 5, sample 3
2.00 Autistic male   3, sample 4
2.00 Autistic female 2, sample 1
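
To make the distinction in this update concrete, here is a tiny sketch of the differencing step that the first table was based on. The function name and the example durations are invented for illustration.

```python
def successive_differences(durations):
    """Difference between each duration and the one before it."""
    return [cur - prev for prev, cur in zip(durations, durations[1:])]

# Hypothetical syllable durations in seconds.
print(successive_differences([0.25, 0.5, 0.25, 1.0]))  # [0.25, -0.25, 0.75]
```

Raw durations are always positive, so their distribution is skewed, whereas differences can go either way and tend to sit roughly symmetrically around zero. That is presumably why the raw lengths needed a fitter that handles non-symmetric stable distributions.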

Syllable lengths are certainly temporally correlated (think Markov processes, Levy flights). I guess this needs to be taken into account when analysing the data, i.e. even autistic people will gradually vary the parameters of their speech over time; they just won't have sharemarket-like jumps.
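
One simple way to get a first look at that temporal correlation would be the lag-1 autocorrelation of the duration list. This is only a sketch of one possible check, not something the analysis above used, and `lag1_autocorrelation` is a hypothetical helper.

```python
from statistics import mean

def lag1_autocorrelation(values):
    """Correlation between each value and the next one, relative to the overall mean."""
    centre = mean(values)
    devs = [v - centre for v in values]
    numerator = sum(a * b for a, b in zip(devs, devs[1:]))
    denominator = sum(d * d for d in devs)
    return numerator / denominator

# A slowly drifting sequence gives a positive value, an alternating one a negative value.
drifting = lag1_autocorrelation([1, 2, 3, 4, 5, 6])
alternating = lag1_autocorrelation([1, 2, 1, 2, 1, 2])
```

A value near zero would suggest successive syllable lengths are close to independent. A clearly positive value would mean the gradual drift matters and should be modelled before reading much into the fitted alpha.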

Or there could be no effect, and the effect I seem to see is just because when you analyse things enough ways one of them will by chance give the result you were looking for. I need more data.