Auditory Motor Mapping for Autism
Auditory Motor Mapping: there’s a mouthful. Parents of children with autism are clamoring for this new therapy, which promises to enable non-verbal children to speak just by learning to sing their words! Wow, pretty amazing, or is it?
Does this idea make any sense based on what we know scientifically? There is evidence that specific brain areas are associated with different kinds of musical expertise, such as hearing singing (if you are a singer) versus hearing violin music (if you’re a violinist), and with singing versus speaking [Dick F, et al. Auditory-motor expertise alters "speech selectivity" in professional musicians and actors. Cereb Cortex. 2011 Apr;21(4):938-48; Jeffries KJ, et al. Words in melody: an H(2)15O PET study of brain activation during singing and speaking. Neuroreport. 2003 Apr 15;14(5):749-54].
It has been known for many years that many people who stutter stop doing so when they sing the words they had tried to say. That seems magical, especially to people in the arts who already think music is magical, which it is. The same is true for some people who have had strokes. Wow, all they need to do is go through life singing! Hmmm, no, not really. There’s a bit more to it than that.
Over the past couple of years, researchers in Gottfried Schlaug’s Music and Neuroimaging Laboratory in the Department of Neurology, Beth Israel Deaconess Medical Center and Harvard Medical School, have been studying brain activity changes associated with music and phonation (i.e. singing words versus saying words). Very recently that group published an article about “Auditory-Motor Mapping Training,” which claims to have demonstrated that exposure to combined music and speech therapy can induce speech in non-speaking children with autism [Wan CY, et al. Auditory-motor mapping training as an intervention to facilitate speech output in non-verbal children with autism: a proof of concept study. PLoS One. 2011;6(9):e25505. Epub 2011 Sep 29].
Their theoretical argument goes something like this. Much as it is possible to promote synaptic reorganization in brain areas involved in facial recognition in autism (e.g. the fusiform gyrus) by teaching facial recognition, it may be possible to reorganize the functioning of brain areas related to singing and speech, such as the superior temporal gyrus, through comparable practice. That’s an appealing idea and could make sense in principle, based on what is known about brain plasticity.
The Wan et al. study, which claims to show that this actually works in practice, is significantly flawed and doesn’t really show that, but the idea is interesting nonetheless. Here’s what was done. Five boys and one girl, aged 5 years 9 months to 8 years 9 months at baseline, were included in the study. At baseline, two were receiving articulation therapy and the rest were using PECS. All had educational activities in school, presumably some related to such things as recognizing pictures. All participants had receptive language skills of around 2 years of age or more, based on the Mullen Scales of Early Learning, so these were not among the lower-functioning children with autism. Children had to display the ability to 1) sit in a chair for more than 15 minutes; 2) follow one-step commands without prompting; and 3) imitate simple gross motor and oral motor movements such as clapping their hands, stomping their feet, and opening their mouth. Again, these criteria would rule out most of the lower-functioning children with any behavioral challenges.
A six-step procedure was used, and it is very complex. The children were shown pictures of familiar things with two-syllable names, which they first identified by pointing to them. They then heard two drums tuned to two pitches while an adult sang the names of those things at the same pitches. The child was led from listening, to unison production, to partially-supported production, to immediate repetition, and finally to producing the target word/phrase on their own. Fading procedures were used to gradually reduce the sung and drum prompts until the child produced the utterance on their own, at least for some of the utterances (see the data below).
The first procedure (identifying the pictures by pointing) is routinely used in ABA behavioral intervention, where it is called receptive matching to sample and is generally the first step in language production training; the second is expressive matching to sample. Each Auditory-Motor Mapping session lasted 45 minutes, including a pretest and post-test. Both sets of stimuli (trained and untrained) contained bi-syllabic words or phrases that were matched on frequency in typical early language acquisition and on difficulty of consonants. The authors did not indicate whether the child was already familiar with the words. The outcome measure of interest was the child's speech production when he or she was presented with the picture stimuli (trained and untrained sets) during the probe assessment sessions. For each target word/phrase, each child's utterances were transcribed and analyzed based on their best production of the target word within a trial, and by determining the number of consonants and vowels produced correctly. The criterion for approximate CV production was met if the child produced a consonant approximation combined with a correctly produced vowel. The main measure was the percent of consonant-vowel combinations said by the child that corresponded with the name of the thing shown in the picture. The main comparison was between the CV approximations produced during the best baseline probe and those produced during the 40th session, for the entire group. Keep this in mind when you look at the following graph… i.e. the statistical data are for the entire group.
Now for the problems.
(1) There is no control condition in which all of the same procedures were used except that the words were said rather than sung while the therapist simply clapped her/his hands (no singing, no drums), for 40 sessions of 45 minutes each. That is a very standard speech therapy procedure. I doubt any of the children had been exposed to such a procedure with this intensity and consistency in the past. Without that condition we have no idea whether the singing or drums were relevant to the outcome.
(2) The children were all described as non-verbal, and it was stated that they said no words at baseline, but the data in Figure 2 show that at baseline only two of the six children said either no words or almost no recognizable consonant-vowel combinations. So in fact 2/3 of the children had some expressive communication at baseline.
(3) The main outcome measure is the average of six children’s percent correct consonant-vowel combinations. It is possible to say a consonant and vowel without saying a word, e.g. dah, so this is NOT a measure of functional speech. One child’s percent correct increased from around 25% at baseline to 71% after therapy; most of the rest showed small or no changes. Averaging that child’s 71% with the other scores (i.e. 8%, 21%, 29%, 21% and 6% at the end of therapy, which were very similar to the baseline scores of those subjects) gives an inflated average for the group that is not at all representative of the group, yet that average was the measure used. A t-test, which is an inappropriate test under these circumstances, was used. Had a nonparametric test been used, such as a median test, it is very unlikely there would have been a significant difference attributable to therapy on average.
Note that in this table the authors state that the percent of correctly produced target words was zero at baseline, but their other graph indicates the children produced from zero to around 30 percent of consonant-vowel pairs at baseline.
This table suggests most learned to imitate 3-4 of 15 words, and one learned to imitate 11 words over 40 sessions.
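The averaging concern in problem (3) can be checked with a little arithmetic. The sketch below is only an illustration: it uses the six end-of-therapy percentages as quoted in this post (not the raw study data, and not the authors' actual baseline-versus-end analysis), and the variable names are made up for the example. It shows how far one high scorer pulls the group mean away from the median.

```python
from statistics import mean, median

# End-of-therapy percent-correct CV scores as quoted in this post
# (one child at 71%, the other five at 8, 21, 29, 21 and 6).
# These are approximate values read from the article, not raw data.
end_scores = [71, 8, 21, 29, 21, 6]

# The group mean is the kind of summary the study relied on.
group_mean = mean(end_scores)

# The median is far less sensitive to a single extreme score.
group_median = median(end_scores)

# Mean of the five children other than the top scorer.
mean_without_top = mean(sorted(end_scores)[:-1])

print(f"mean of all six children:    {group_mean:.1f}%")
print(f"median of all six children:  {group_median:.1f}%")
print(f"mean without the top scorer: {mean_without_top:.1f}%")
```

With these six numbers the mean is 26%, the median is 21%, and the mean of the other five children is 17%, so a single child accounts for roughly a third of the group average.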
Summary: Is this a dumb idea? No, it’s an interesting idea. Does the study show that singing and listening to a tuned drum are essential aspects of treatment? No. Does it show that this Auditory Motor Mapping Treatment does what it is claimed to do? No. It was clearly effective for only one of six children with autism, a child who had moderate consonant-vowel expressive use at baseline, and it is not clear that the musical part is relevant to the outcome. Most of the kids learned to say 3-4 of 15 words, which is not functional speech.