Learning to understand others' actions

Despite nearly two decades of research on mirror neurons, there is still much debate about what they do. The most enduring hypothesis is that they enable ‘action understanding’. However, recent critical reviews have failed to find compelling evidence in favour of this view. Instead, the authors of these reviews argue that mirror neurons are produced by associative learning and therefore that they cannot contribute to action understanding. The present opinion piece suggests that this argument is flawed. We argue that mirror neurons may both develop through associative learning and contribute to inferences about the actions of others.


INTRODUCTION
Mirror neurons, which have been discovered in the premotor area F5 [1] and inferior parietal lobule, area PF [2] of macaque monkeys, discharge not only when the monkey executes an action of a certain type (e.g. precision grip), but also when it observes the experimenter performing the same action. A number of neuroimaging studies have provided evidence that a similar system also exists in humans (e.g. [3]). A matter of much debate is whether activity in the so-called 'mirror neuron system' (MNS) reflects neural processes engaged in 'action understanding', that is, inferences about the goals and intentions driving an observed action. It has been suggested that mirror neurons are simply the result of learned sensorimotor associations, as proposed in the associative sequence learning (ASL) model [4,5], and that this ontogeny is inconsistent with a role in understanding the actions of others [6,7]. In contrast, we argue that mirror neurons may develop through associative learning and subsequently contribute to action understanding.

ASL MODEL
The ASL model [4,5] proposes that the mirror properties of the MNS emerge through sensorimotor associative learning. Under this hypothesis, we are not born with an MNS. Rather, experience in which observation of an action is correlated with its execution establishes excitatory links between sensory and motor representations of the same action. We have abundant experience of matching relationships between observed and executed actions during our lives [8]. Following such experience, observation of an action is sufficient to activate its motor representation. Therefore, representations that were originally motor become 'mirror' (activated when observing and executing the same action, figure 1).
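The learning process described above can be illustrated with a minimal sketch. It assumes a simple delta-rule update of the kind used in associative learning theory; the function names, the learning rate and the two example actions are illustrative assumptions and are not part of the ASL model as published.

```python
# Illustrative sketch only: a delta-rule learner linking the sight of an
# action to motor representations. All names and parameters are hypothetical.

ACTIONS = ["index_lift", "little_lift"]


def train(weights, experience, learning_rate=0.1):
    """Strengthen the link from each observed action to the executed one."""
    for observed, executed in experience:
        for motor in ACTIONS:
            target = 1.0 if motor == executed else 0.0
            # Move the link strength a small step towards the target.
            weights[observed][motor] += learning_rate * (target - weights[observed][motor])
    return weights


def most_activated_motor(weights, observed):
    """Return the motor representation most strongly excited by observation."""
    return max(weights[observed], key=weights[observed].get)


# Before learning there are no sensorimotor links.
weights = {seen: {motor: 0.0 for motor in ACTIONS} for seen in ACTIONS}

# Correlated experience: seeing an action while executing the same action.
matching_experience = [("index_lift", "index_lift"), ("little_lift", "little_lift")] * 50
weights = train(weights, matching_experience)

print(most_activated_motor(weights, "index_lift"))
```

In this toy setting, observation alone comes to activate the matching motor representation, which is the sense in which an originally motor representation becomes 'mirror'.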
If the ASL model is correct, mirror neurons do not have an 'adaptive function': they did not evolve 'for' action understanding or to meet the demands of any other cognitive task [5]. However, as a by-product of associative learning, mirror neurons could still be recruited in the course of development to play some part in a variety of cognitive tasks. Therefore, according to the ASL model, they could be useful without being essential, and without their utility explaining their origins. Specifically, mirror neurons could play a part in action understanding even if this functional role was not favoured by natural selection in the course of phylogenetic evolution.
So why has the ASL hypothesis been interpreted as evidence against a functional role of mirror neurons in action understanding? Hickok [6] argued that some of the evidence that has been published in support of ASL is inconsistent with the hypothesis that the MNS is involved in action understanding. The studies in question require participants to observe actions while systematically executing non-matching actions, and subsequently record indices of MNS functioning. The rationale for these experiments assumes that, if the MNS develops through associative learning, then experiences that differ from those typically encountered during life should reconfigure the MNS and change the way it operates. Consistent with this prediction, it has been found that training in which participants are required to perform index finger actions when they see little finger actions, and vice versa, results in activation of primary motor cortical representations of the index finger when passively observing little finger actions, and activation of representations of the little finger when observing index finger actions [9,10]. Catmur et al. [11] demonstrated that such training effects are likely to be mediated by cortical circuits that overlap with areas of the MNS. They required one group of participants to lift their hand when they saw a hand lift, and to lift their foot when they saw a foot lift (matching group). Another group was required to lift their hand when they saw a foot lift, and to lift their foot when they saw a hand lift (non-matching group). Following such training, voxels in premotor and inferior parietal cortices that responded more when observing hand than foot actions in the matching group responded more to foot than hand actions in the non-matching group. This finding suggests that, following non-matching training, observation of hand actions activates motor representations of foot actions. 
Similar 'countermirror' training effects have also been observed in behavioural paradigms (e.g. [12,13]; see also [14,15] for 'logically related' activations that may have been generated through naturally occurring non-matching experience).
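The logic of countermirror training can be sketched in the same spirit. The example below assumes a delta-rule learner and invented starting strengths standing in for a lifetime of matching experience; it is a toy illustration of the prediction that non-matching experience reconfigures the links, not a model of the reported data.

```python
# Illustrative sketch only: names, learning rate and trial counts are hypothetical.

def update(link, observed, executed, learning_rate=0.1):
    """One delta-rule trial: observing `observed` while executing `executed`."""
    for motor in link[observed]:
        target = 1.0 if motor == executed else 0.0
        link[observed][motor] += learning_rate * (target - link[observed][motor])


def strongest(link, observed):
    """Motor representation most strongly activated by the observed action."""
    return max(link[observed], key=link[observed].get)


# Start from links shaped by a lifetime of matching experience (assumed values).
link = {
    "hand_lift": {"hand": 1.0, "foot": 0.0},
    "foot_lift": {"hand": 0.0, "foot": 1.0},
}
before = strongest(link, "hand_lift")

# Non-matching training: lift the foot on seeing a hand lift, and vice versa.
for _ in range(20):
    update(link, "hand_lift", "foot")
    update(link, "foot_lift", "hand")

after = strongest(link, "hand_lift")
print(before, "->", after)
```

Under these assumptions the dominant response to an observed hand lift switches from the hand to the foot representation, mirroring the direction of the reversal reported after non-matching training.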
Hickok [6] argued that these studies provide evidence that mirror neurons cannot underlie action understanding. Embracing the idea that countermirror training reconfigures the MNS, making it responsive to the sight of one action and the execution of a different action, he reasoned that, if the MNS contributes to action understanding, this reconfiguration should have an impact on action understanding. However, he considered that participants who showed countermirror activation (e.g. stronger activation of the index finger muscle during observation of little finger than of index finger movement) 'presumably did not mistake the perception of index finger movement for little finger movement and vice versa' ([6], p. 1236). The key word here is 'presumably'. Neither the focal study by Catmur et al. [9], nor any other study, has examined the effects of countermirror training on indices of action understanding.

PREDICTIVE CODING AND ACTION UNDERSTANDING
The aim of the predictive coding (PC) account [16,17] was to answer the question 'if mirror neurons enable the observer to infer the intention of an observed action, how might they do this?' In many accounts of the MNS, it is assumed that mirror neurons are driven by the sensory data and that when the mirror neurons discharge, the action is 'understood'. However, within this scheme mirror neurons could only enable action understanding if there was a one-to-one mapping between the sensory stimulus and the intention of the action. This is not the case. If you see someone in the street raise their hand, they could be hailing a taxi or swatting a wasp. The context must establish which intention is more likely to drive an action. Consistent with the PC account, the empirical evidence does not support the view that mirror neurons are driven solely by sensory data from focal action stimuli. For example, Umilta et al. [18] found that neurons in F5, which fire both when the monkey executes and observes grasping actions, also fired when the monkey observed the experimenter's grasping action disappear behind a screen. That is, the premotor neurons represented a grasping action in its entirety, even though the grasping phase was not actually seen. Therefore, mirror neurons could not be driven entirely by the focal stimulus input. The PC account provides a framework that resolves these issues.
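The point that the same movement is compatible with more than one intention, and that context must tip the balance, can be made concrete with a small sketch. It assumes that the inference can be written as Bayes' rule over two candidate intentions; the probabilities and names below are invented for illustration.

```python
# Illustrative sketch only: the probabilities below are invented.

def posterior_over_intentions(prior, likelihood, movement):
    """Bayes' rule: P(intention | movement) is proportional to
    P(movement | intention) * P(intention)."""
    unnormalised = {
        intention: prior[intention] * likelihood[intention][movement]
        for intention in prior
    }
    total = sum(unnormalised.values())
    return {intention: p / total for intention, p in unnormalised.items()}


# The same movement is equally likely under both intentions, so the
# sensory data alone cannot separate them.
likelihood = {
    "hail_taxi": {"raise_hand": 0.9},
    "swat_wasp": {"raise_hand": 0.9},
}

no_context = {"hail_taxi": 0.5, "swat_wasp": 0.5}
taxi_in_view = {"hail_taxi": 0.9, "swat_wasp": 0.1}  # assumed contextual prior

ambiguous = posterior_over_intentions(no_context, likelihood, "raise_hand")
resolved = posterior_over_intentions(taxi_in_view, likelihood, "raise_hand")
print(ambiguous)
print(resolved)
```

Without context the two intentions remain equally probable after the hand is seen to rise; only the contextual prior makes one of them the more likely cause.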
The essence of the PC account is that, when we observe someone else executing an action, we use our own motor system to generate a model of how we would perform that action to understand it [19,20]. PC enables inference of the intentions of an observed action by assuming that the actions are represented at several different levels [21] and that these levels are organized hierarchically such that the description of one level will act as a prior constraint on subordinate levels. These levels include: (i) the intention level that defines the long-term desired outcome of an action, (ii) the goal level that describes intermediate outcomes that are necessary to achieve the long-term intention, and (iii) the kinematic level that describes, for example, the shape of the hand and the movement of the arm in space and time. Therefore, to understand the intentions or goals of an observed action, the observer must be able to represent the observed movement at either the goal level or the intention level, having access only to a visual representation of the kinematic level. PC proposes that contextual cues generate a prior expectation about the intention of the person we are observing. In the above example of the hand-raising action, these cues could be the presence of a taxi or wasp, or a facial expression. On the basis of these intentions, we can generate a prior expectation of the person's intermediate goals. Given their intermediate goals, we can predict the perceptual kinematics. Backward connections convey the prediction to the lower level where it is compared with the representation at this subordinate level to produce a prediction error. This prediction error is then sent back to the higher level, via forward connections, to update the representation at this level (figure 2). By minimizing the prediction error at all the levels of action representation, the most likely cause of the action, at both the intention and the intermediate goal level, will be inferred.
Thus, the PC process uses information, supplied by the MNS, about which goals are most likely, given a certain intention, and which kinematics are most likely, given a certain goal, to test hypotheses about the observed actors' intentions.
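The exchange of predictions and prediction errors between levels can be sketched as follows. This is a deliberately simplified illustration, assuming that each level is a single number, that each level predicts the one below through a fixed linear mapping, and that all errors are weighted equally; none of these choices is taken from the PC account itself, and all names and values are hypothetical.

```python
# Illustrative sketch only: a three-level linear hierarchy with equal
# weighting of errors. All names, mappings and values are hypothetical.

def infer(observed_kinematics, contextual_prior, goal_from_intention=1.0,
          kinematics_from_goal=1.0, step=0.1, iterations=500):
    """Estimate goal and intention by gradient descent on squared prediction error."""
    intention = contextual_prior  # start from the contextual expectation
    goal = goal_from_intention * intention
    for _ in range(iterations):
        # Backward connections: each level predicts the level below.
        predicted_goal = goal_from_intention * intention
        predicted_kinematics = kinematics_from_goal * goal
        # Prediction errors at each level.
        error_kinematics = observed_kinematics - predicted_kinematics
        error_goal = goal - predicted_goal
        error_intention = intention - contextual_prior
        # Forward connections: errors are used to revise the estimates.
        goal += step * (kinematics_from_goal * error_kinematics - error_goal)
        intention += step * (goal_from_intention * error_goal - error_intention)
    return goal, intention


goal, intention = infer(observed_kinematics=3.0, contextual_prior=0.0)
print(round(goal, 3), round(intention, 3))
```

With these values the estimates settle close to a goal of 2 and an intention of 1, that is, between what the contextual prior alone would suggest (0) and what the observed kinematics alone would imply (3). The mappings between levels play the role that the text assigns to the learned sensorimotor information supplied by the MNS.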
The assumptions of the PC model are consistent with those of ASL. If both models are correct, the MNS develops through associative learning and subsequently supports inferences about the goals and intentions driving others' actions. Therefore, it remains an open and important empirical question whether any intervention that systematically changes the MNS has correlated effects on action understanding.

CONCLUSION
PC and ASL accounts of the MNS address different questions and offer compatible answers. The PC account considers the requirements that are necessary to enable goal or intention inference during action observation. It assumes that the sensorimotor connection strengths have been learned, but does not propose a mechanism by which these are learned. ASL provides an associative mechanism for such learning. Although ASL does not provide a mechanistic account of how such learning could enable action understanding, it allows for the possibility that the MNS, once acquired, could support such functions. In other words, the MNS could enable inferences about the intentions of others, even if this function is not an evolutionary adaptation. Therefore, if both the PC and ASL hypotheses are correct, we learn, via the principles specified in associative learning theory, to predict others' intentions using our own motor systems.

Figure caption (partial): These predictions are compared with the representations at the subordinate level to produce a prediction error. This prediction error is then sent back to the higher level, via forward connections, to update the representation. By minimizing the prediction error at all the levels of the MNS, the most likely cause of the action will be inferred. Dotted line, prediction error; thick line, prediction.
Opinion piece. Learning to understand others' actions C. Press et al. 459