Although the motivations seem distinct, the conclusion is the same: in any system, the motivation for attention is to reduce the quantity of information that must be processed in order to complete some task (see Computational Foundations for Attentive Processes). Depending on one's interest, however, modeling efforts do not always have the same goals. One may be trying to model a particular set of experimental observations; one may be building a robotic vision system in which attention is used to select landmarks for navigation; or one may be interested in eye movements, in the executive control function, or in any one or more of the functional elements described in Visual Attention.
As a result, comparing models is not straightforward, fair, or useful. Comparing pieces that represent the same functionality is more relevant, but there are so many of these combinations that it would be an exercise beyond the scope of this overview.
The use of attentive methods has pervaded the computer vision literature, demonstrating their importance for reducing the amount of information to be processed. It is important to note that several early analyses of the extent of the information load issue appeared (Uhr; Feldman and Ballard; Tsotsos) with converging suggestions for its solution; those convergences appear in a number of the models below, particularly those of Burt or Tsotsos. Specifically, the methods can be grouped into four categories.
Within modern computer vision, there are many variations and combinations of these themes because, regardless of the impressive and rapid increases in the power of modern computers, the inherent difficulty of processing images demands attentional processes (see Computational Foundations for Attentive Processes). One way to reduce the amount of an image to be processed is to concentrate on the points or regions that are most interesting or relevant for the next stage of processing, such as recognition or action.
The idea is that perhaps 'interestingness' can be computed in parallel across the whole image and then those interesting points or regions can be processed in more depth serially. The first of these methods is due to Moravec and since then a large number of different kinds of 'interest point' computations have been used. It is interesting to note the parallel here with the Saliency Map Hypothesis described below.
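The "compute interest everywhere, then process the best points serially" idea can be illustrated with a small Moravec-style operator. This is a minimal sketch under stated assumptions, not a reconstruction of Moravec's original program: the four shift directions, the 3 x 3 window, the toy image, and the function names `moravec_interest` and `top_interest_points` are all chosen here for illustration.

```python
def moravec_interest(image, window=1):
    """Return a map of Moravec-style interest values for a 2-D list of numbers.

    For each pixel, the sum of squared differences (SSD) between the local
    window and the same window shifted by one pixel is computed for four
    directions; the interest value is the minimum of those SSDs.
    """
    rows, cols = len(image), len(image[0])
    shifts = [(0, 1), (1, 0), (1, 1), (1, -1)]  # assumed set of directions
    margin = window + 1  # keeps every shifted window inside the image
    interest = [[0.0] * cols for _ in range(rows)]
    for r in range(margin, rows - margin):
        for c in range(margin, cols - margin):
            ssds = []
            for dr, dc in shifts:
                ssd = 0.0
                for i in range(-window, window + 1):
                    for j in range(-window, window + 1):
                        diff = image[r + i + dr][c + j + dc] - image[r + i][c + j]
                        ssd += diff * diff
                ssds.append(ssd)
            interest[r][c] = min(ssds)
    return interest


def top_interest_points(interest, count):
    """Return up to `count` (row, col) positions with the highest interest."""
    scored = [(value, r, c)
              for r, row in enumerate(interest)
              for c, value in enumerate(row) if value > 0]
    scored.sort(reverse=True)
    return [(r, c) for _, r, c in scored[:count]]


# Toy image: a bright 4 x 4 square on a dark background.
image = [[1.0 if 4 <= r < 8 and 4 <= c < 8 else 0.0 for c in range(12)]
         for r in range(12)]
interest = moravec_interest(image)      # the "parallel" stage
points = top_interest_points(interest, 4)  # candidates for serial processing
```

In this toy image, flat regions and points along a straight edge score zero because at least one shift leaves the window unchanged, while the corner of the square scores above zero. Only the few selected points would then be passed on for more detailed processing.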
The computational load is due not so much to the large number of image locations (this is not so large a number as to cause much difficulty for modern computers) as to the combinatorial nature of the possible combinations of positions or regions.
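A short calculation makes the contrast concrete. The 256 x 256 image size and the helper name `candidate_groupings` are assumptions chosen only for illustration.

```python
from math import comb


def candidate_groupings(n_locations, group_size):
    """Number of distinct unordered groups of `group_size` image locations."""
    return comb(n_locations, group_size)


n_locations = 256 * 256                      # hypothetical image: 65,536 locations
pairs = candidate_groupings(n_locations, 2)  # every unordered pair of locations
subsets_of_20 = 2 ** 20                      # all subsets of just 20 locations

print(n_locations)    # visiting each location once is cheap
print(pairs)          # pairs alone already number in the billions
print(subsets_of_20)  # subsets grow exponentially with the number of items
```

Visiting each location once is a linear cost, whereas the number of candidate groupings grows polynomially with the group size and exponentially when all subsets are allowed, which is why some strategy for limiting the candidates is needed.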
In perceptual psychology, how the brain might organize items is a major concern, pioneered by the Gestaltists (Wertheimer). Thus, computer vision has used grouping strategies following Gestalt principles in order to limit the possible subsets of combinatorially defined items to consider. The first such use appeared in Muerle and Allen in the context of object segmentation.
Human eyes move, and humans move around their world in order to acquire visual information. Active vision in computer vision uses intelligent control strategies applied to the data acquisition process depending on the current state of data interpretation (Bajcsy; Tsotsos). A variety of methods have appeared following this idea; perhaps the earliest one most relevant to this discussion is the robotic binocular camera system of Clark and Ferrier, featuring a salience-based fixation control mechanism.
The application of domain and task knowledge to guide or predict processing is a powerful tool for limiting processing, a fact that has been formally proved (Tsotsos; Parodi et al.). The first use was for oriented line location in a face-recognition task (Kelly). The first instance for temporal window prediction was in a motion recognition task (Tsotsos et al.).
Clearly, in this class, the major motivation has always been to provide explanations for the characteristics of biological, especially human, vision. Typically, these have been developed to explain a particular body of experimental observations. This is a strength; the authors usually are the ones who have done some or all of the experiments and thus completely understand the experimental methods and conclusions.
Simultaneously, however, this is also a weakness because the models are often difficult to extend to a broader class of observations. Along the biological vision branch, three classes are identified here: descriptive, data-fitting, and algorithmic models.
The value of descriptive models lies in the explanation they provide of certain attentional processes; the abstractness of explanation is also their major problem because it is typically open to interpretation. Classic models, even though they were motivated by experiments in auditory attention, have been very influential.
The Biased Competition Model has garnered many followers, mostly because it combines competition with top-down bias, concepts that actually appeared in earlier models such as those of Grossberg or Tsotsos. These are conceptual frameworks, ways of thinking about the problem of attention.
Many have played important, indeed foundational, roles in how the field has developed.
Data-fitting models are mathematical and are developed to capture parameter variations in experimental data in as compact and parsimonious a form as possible. Their value lies primarily in how well they provide a fit to experimental data, and in interpolation or extrapolation of parameter values to other experimental scenarios. Good examples are the Theory of Visual Attention (Bundesen) and the set of models that employ normalization as a basic processing element. An early one is the model of Reynolds et al.
Algorithmic models provide mathematics and algorithms that govern their performance and, as a result, present a process by which attention might be computed and deployed. They do not, however, provide sufficient detail or methodology for the model to be tested on real stimuli. These models often provide simulations to demonstrate their actions. In a real sense they are a combination of descriptive and data-fitting models: they provide enough descriptive detail to be simulated while showing good agreement with experimental data at a qualitative and perhaps also a quantitative level.
It is interesting to note that the Saliency Map Model is strongly related to the Interest Point Operations on the other side of this taxonomy.
As mentioned earlier, the point of intersection between the computer vision and biological vision communities is represented by the set of computational models in the taxonomy. Computational Models not only include a process description for how attention is computed, but also can be tested by providing image inputs, similar to those an experimenter might present a subject, and then seeing how the model performs by comparison.
The biological connection is key and pure computer vision efforts are not included here. Under this definition, computational models generally provide more complete specifications and permit more objective evaluations as well. This greater level of detail is a strength but also a weakness because there are more details that require experimental validation.
Many models have elements from more than one class so the separation is not a strict one. Computational models necessarily are Algorithmic Models and often also include Data-Fitting elements. Nevertheless, in recent years four major schools of thought have emerged, schools that will be termed 'hypotheses' here since each has both supporting and detracting evidence. In what follows, an attempt is made to provide the intellectual antecedents for each of these major hypotheses.
The taxonomy is completed in Section 2. This hypothesis focuses on how attention solves the problems associated with stimulus selection and then transmission through the visual cortex. The issues of how signals in the brain are transmitted to ensure correct perception appear, in part, in a number of works.
Milner, for example, mentions that attention acts in part to activate feedback pathways to the early visual cortex for precise localization, implying a pathway search problem. The routing issues are described in Tsotsos et al. Among them is that each event interferes with the interpretation of other events in the visual field (the Cross-Talk Problem; see Figure 3c). Any model that uses a biologically plausible network of neural processing units needs to address these problems.
One class of solutions is that of an attentional 'beam' through the processing network, as shown in Figure 3d.
The saliency map hypothesis has its roots in Feature Integration Theory (Treisman and Gelade) and appears first in the class of algorithmic models above (Koch and Ullman). It includes the following elements (see Figure 4). Feature maps code conspicuity within a particular feature dimension.
The saliency map combines information from each of the feature maps into a global measure where points corresponding to one location in a feature map project to single units in the saliency map. Saliency at a given location is determined by the degree of difference between that location and its surround. The drive to discover the best representation of saliency or conspicuity is a major current activity; whether or not a single such representation exists in the brain remains an open question with evidence supporting many potential loci summarized in Tsotsos et al.
The earliest conceptualization of this idea seems to be due to Grossberg, who presented ideas and theoretical arguments regarding the relationship among neural oscillations, visual perception, and attention (see Grossberg). His work led to the ART model, which provided details on how neurons may reach stable states given both top-down and bottom-up signals and how they play roles in attention and learning (Grossberg). Milner also suggested that the unity of a figure at the neuronal level is defined by synchronized firing activity (Milner). Von der Malsburg defined a detailed model of how this might be accomplished, including neurons with dynamically modifiable synaptic strengths that became known as von der Malsburg synapses.
This is done by generating coherent semi-synchronous oscillations in the Hz range. These oscillations then activate a transient short-term memory. Models subscribing to this hypothesis typically consist of pools of excitatory and inhibitory neurons connected as shown in Figure 5. The actions of these neuron pools are governed by sets of differential equations; each model is a dynamical system. Strong support for this view appears in the summary by Sejnowski and Paulsen. A number of other models exist but do not conform to our definition of a computational model; they are mathematical models that only provide simulations of their performance.
As such, we cannot include them here but do provide these citations because of the intrinsic interest in this model class Niebur et al.
Clearly, there is room for expansion of these models into computational form. This hypothesis remains controversial (see Shadlen and Movshon).
The emergent attention hypothesis proposes that attention is a property of large assemblies of neurons involved in competitive interactions of the kind mediated by lateral connections, and that selection is the combined result of local dynamics and top-down biases (see Figure 6). In other words, there is no explicit selection process of any kind.
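A toy rate network shows how a single winner can emerge from competition plus bias without any explicit selection step. This is only a sketch under assumed dynamics, not one of the published emergent-attention models: the rate equation, the inhibition weight, the Euler step, and the name `run_competition` are all illustrative choices.

```python
def run_competition(inputs, biases, inhibition=2.0, dt=0.1, steps=500):
    """Integrate a small rate network with lateral inhibition (forward Euler).

    Each unit i follows
        dx_i/dt = -x_i + max(0, input_i + bias_i - inhibition * sum_{j != i} x_j)
    The equation and all parameter values are illustrative assumptions.
    """
    rates = [0.0] * len(inputs)
    for _ in range(steps):
        total = sum(rates)
        drives = [max(0.0, inp + bias - inhibition * (total - rate))
                  for inp, bias, rate in zip(inputs, biases, rates)]
        rates = [rate + dt * (drive - rate) for rate, drive in zip(rates, drives)]
    return rates


# Three equally strong stimuli; a hypothetical top-down bias favours unit 1.
rates = run_competition(inputs=[1.0, 1.0, 1.0], biases=[0.0, 0.3, 0.0])
winner = max(range(len(rates)), key=lambda i: rates[i])
```

With these settings the biased unit settles at its full drive while the other two are suppressed toward zero, so the selection is a by-product of the network's evolution and not of a separate selector.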
The mathematics of the dynamical system of equations leads through its evolution alone to single peaks of response that represent the focus of attention. When attention was directed into the receptive field of the recorded neuron the following changes were observed compared with attention directed outside the receptive field.
When the contrast of the stimulus was varied there could be saturation effects.
Cortical interneurons are part of dedicated networks connected by gap junctions and chemical inhibitory synapses [4, 5]. Interneuron networks synchronize when activated by excitatory neurotransmitters or neuromodulators. The firing rate of cortical output neurons is modulated by the resulting synchronous inhibition. Here we explore, using model simulations and in vitro experiments, the hypothesis that selective attention is mediated by changes in the synchrony of local interneuron networks. The effect of attention is modeled as an increase in the synchrony of interneuron networks. Changes in interneuron synchrony could thus potentially underlie a variety of seemingly unrelated observations.
A model interneuron network produced oscillatory activity consisting of a sequence of synchronized spike volleys [1]. The method for obtaining synchronous volleys is given in Ref. The resulting train of conductance pulses drove a single-compartment neuron with Hodgkin-Huxley voltage-gated sodium and potassium channels, a passive leak current, the synaptic currents described above, and a white noise current with mean I and variance 2D [11]. The model was run under four sets of parameters representing different experimental conditions. The effects of attention were modeled by changing the parameters of the synchronous inhibitory drive.
The stimulus-induced synaptic inputs that drove the neuron were not temporally patterned. To validate the computer model, neurophysiological experiments in slices were performed. These were carried out in accordance with animal protocols approved by the N. Coronal slices of the prelimbic and infralimbic areas of prefrontal cortex were obtained from 2- to 4-week-old Sprague-Dawley rats.
We recorded from regularly spiking layer 5 pyramidal cells that were identified morphologically. Current was injected into the neuron using dynamic clamp [9] to mimic the effect of an oscillatory inhibitory synaptic drive. Full experimental details are in Ref. Results on attentional modulation of the firing rate and coherence of neurons in macaque cortical area V4 were recently reported in three key papers [6, 7, 3]. We reproduced the results of McAdams and Maunsell [6] and those of Fries and coworkers [3] under the assumption that attention modulates the synchrony of local interneuron networks (results not shown).
Cortical neurons can fire at high rates with a high coefficient of variation (CV). A potential consequence of driving neurons with a synchronous oscillatory inhibitory drive is a decrease in variability [11]. Therefore, we investigated whether attentional modulation by inhibitory synchrony could operate on neurons with high CV values (Fig.).
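For readers unfamiliar with the measure, the sketch below assumes that CV refers to the interspike-interval CV, that is, the standard deviation of the intervals divided by their mean. It compares a Poisson spike train, whose CV is close to 1, with a perfectly regular train, whose CV is close to 0. The rate, duration, seed, and function names are hypothetical and are not taken from the study.

```python
import random
from statistics import mean, pstdev


def interspike_intervals(spike_times):
    """Differences between consecutive spike times."""
    return [later - earlier for earlier, later in zip(spike_times, spike_times[1:])]


def coefficient_of_variation(spike_times):
    """Standard deviation of the interspike intervals divided by their mean."""
    intervals = interspike_intervals(spike_times)
    return pstdev(intervals) / mean(intervals)


def poisson_spike_train(rate_hz, duration_s, rng):
    """Spike times of a homogeneous Poisson process (exponential intervals)."""
    spikes, t = [], rng.expovariate(rate_hz)
    while t < duration_s:
        spikes.append(t)
        t += rng.expovariate(rate_hz)
    return spikes


rng = random.Random(0)  # fixed seed so the example is repeatable
irregular = poisson_spike_train(rate_hz=40.0, duration_s=100.0, rng=rng)
regular = [i / 40.0 for i in range(4000)]  # one spike every 25 ms

cv_irregular = coefficient_of_variation(irregular)
cv_regular = coefficient_of_variation(regular)
```

A drive that makes firing more regular pulls the CV from the Poisson-like value toward the regular-train value, which is the decrease in variability the passage refers to.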
The model neuron was driven by a noisy synchronous inhibitory drive, a temporally homogeneous excitatory Poisson process, and a white noise current. During the attended state the firing rate increased (see Ref. for the spike field coherence). Increasing inhibitory input synchrony led to an increased firing rate. We show (a) the membrane potential, (b) the local field potential (LFP), (c) the firing rate as a function of time, and (d) the rastergram of the first 10 trials.
Ascending sensory inputs can often be represented as a depolarizing current I. The firing rate that such an input elicits is determined by the f-I curve, the shape of which depends on the modulating inputs to the receiving neuron. These inputs may represent the value of an additional variable, such as attentional state. There are important computational consequences when modulating inputs change the gain of the f-I curve multiplicatively [8]. We investigated how inhibitory synchrony altered the f-I curves. The response could saturate as a function of I (Fig.). In the other case, the response could be nonsaturating (Fig.).
Using numerical simulations, we studied the changes in the f-I curve over a wide range of parameters (Fig.). This indicates that the main effect of inhibition for this parameter range is gain modulation.
For the curves that did saturate, the firing rate did not increase further with stimulus strength at high contrast.
Hence, for this set of parameters the change in gain was minimal.
Figure caption: Subtractive and divisive modulation of f-I curves with inhibitory synchrony. (A) f-I curves for two different neurons in slices of rat prefrontal cortex. The firing rate did not saturate in (a), whereas it did saturate in (b). The solid lines are fits to a sigmoid function; filled circles are simulation results.
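The difference between a change in gain and a shift of the f-I curve can be made concrete with a sigmoid of the kind used for the fits. This is a sketch under assumptions: the logistic form and every parameter value are hypothetical and are not the fitted values from the study, and the subtractive case is modeled here as a shift along the current axis.

```python
import math


def sigmoid_fi(current, max_rate=60.0, half_current=1.0, slope=0.25):
    """Saturating f-I curve: firing rate as a logistic function of the current."""
    return max_rate / (1.0 + math.exp(-(current - half_current) / slope))


def gain_modulated(current, gain):
    """Multiplicative (gain > 1) or divisive (gain < 1) change of the curve."""
    return gain * sigmoid_fi(current)


def shifted(current, shift):
    """Subtractive change: the same curve moved along the current axis."""
    return sigmoid_fi(current - shift)


currents = [0.5, 1.0, 1.5, 2.0]
baseline = [sigmoid_fi(i) for i in currents]
divisive = [gain_modulated(i, 0.5) for i in currents]
subtractive = [shifted(i, 0.5) for i in currents]

# A pure gain change scales every point by the same factor.
ratios = [d / b for d, b in zip(divisive, baseline)]
```

Under a pure gain change the ratio to the baseline is the same at every input current, whereas a shift reproduces the baseline rates at displaced inputs and leaves the saturation level unchanged.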
The input activity a_IV is proportional to the size of the network times the mean number of neurons that are active on each cycle.