Jeremy Freeman - Publications http://jeremyfreeman.net/ Publications by Jeremy Freeman. Fri, 03 Feb 2012 03:25:29 -0700 en-us <![CDATA[Freeman J & Simoncelli EP (2011) Metamers of the ventral stream. Nature Neuroscience. 14(9): 1195-1201.]]> http://jeremyfreeman.net/publications/13 http://jeremyfreeman.net/publications/13 Mon, 01 Aug 2011 00:00:00 -0600 The human capacity to recognize complex visual patterns emerges in a sequence of brain areas known as the ventral stream, beginning with primary visual cortex (V1). We developed a population model for mid-ventral processing, in which nonlinear combinations of V1 responses are averaged in receptive fields that grow with eccentricity. To test the model, we generated novel forms of visual metamers, stimuli that differ physically but look the same. We developed a behavioral protocol that uses metameric stimuli to estimate the receptive field sizes in which the model features are represented. Because receptive field sizes change along the ventral stream, our behavioral results can identify the visual area corresponding to the representation. Measurements in human observers implicate visual area V2, providing a new functional account of neurons in this area. The model also explains deficits of peripheral vision known as crowding, and provides a quantitative framework for assessing the capabilities and limitations of everyday vision.

]]>
<![CDATA[Freeman J, Donner TH, & Heeger DJ (2011) Inter-area correlations in the ventral visual pathway reflect feature integration. Journal of Vision. 11(4):15, 1-23.]]> http://jeremyfreeman.net/publications/1 http://jeremyfreeman.net/publications/1 Tue, 26 Apr 2011 22:03:24 -0600 During object perception, the brain integrates simple features into representations of complex objects. A perceptual phenomenon known as visual crowding selectively interferes with this process. Here, we use crowding to characterize a neural correlate of feature integration. Cortical activity was measured with functional magnetic resonance imaging, simultaneously in multiple areas of the ventral visual pathway (V1-V4 and the visual word form area, VWFA, which responds preferentially to familiar letters), while human subjects viewed crowded and uncrowded letters. Temporal correlations between cortical areas were lower for crowded letters than for uncrowded letters, especially between V1 and VWFA. These differences in correlation were retinotopically specific, persisted when attention was diverted from the letters, and disappeared when we replaced the letters with grating patches that were not crowded under our stimulus conditions. We conclude that inter-area correlations reflect feature integration and are disrupted by crowding. We propose that crowding may perturb the transformations between neural representations along the ventral pathway that underlie the integration of features into objects.

]]>
<![CDATA[Freeman J, Brouwer GJ, Heeger DJ, & Merriam EP (2011) Orientation decoding depends on maps, not columns. The Journal of Neuroscience. 31(13):4792-4804.]]> http://jeremyfreeman.net/publications/21 http://jeremyfreeman.net/publications/21 Thu, 10 Mar 2011 17:53:15 -0700 The representation of orientation in primary visual cortex (V1) has been examined at a fine spatial scale corresponding to the columnar architecture. We present functional magnetic resonance imaging (fMRI) measurements providing evidence for a topographic map of orientation preference in human V1 at a much coarser scale, in register with the angular-position component of the retinotopic map of V1. This coarse-scale orientation map provides a parsimonious explanation for why multivariate pattern analysis methods succeed in decoding stimulus orientation from fMRI measurements, challenging the widely held assumption that decoding results reflect sampling of spatial irregularities in the fine-scale columnar architecture. Decoding stimulus attributes and cognitive states from fMRI measurements has proven useful for a number of applications, but our results demonstrate that the interpretation cannot assume that decoding reflects or exploits columnar organization.

]]>
<![CDATA[Freeman J & Ziemba CM (2011) Unwrapping the ventral stream. The Journal of Neuroscience. 31(7):2349-2351.]]> http://jeremyfreeman.net/publications/23 http://jeremyfreeman.net/publications/23 Sat, 26 Feb 2011 00:00:00 -0700 <![CDATA[Freeman J, Chakravarthi R, & Pelli DG (2011) Substitution and pooling in crowding. In press, Attention, Perception, & Psychophysics.]]> http://jeremyfreeman.net/publications/2 http://jeremyfreeman.net/publications/2 Tue, 25 Jan 2011 22:04:23 -0700 Unless we fixate directly on it, it is hard to see an object among other objects. This breakdown in object recognition, called “crowding,” severely limits peripheral vision. The effect is more severe when objects are more similar. When observers mistake the identity of a “target” among “flanker” objects, they often report a flanker. Many have taken these flanker reports as evidence of internal substitution of the target by a flanker. Here, we ask observers to identify a target presented between one similar and one dissimilar flanker. (Simple) substitution takes only one letter, which is often the target but, by unwitting mistake, is sometimes a flanker. The opposite of substitution is pooling, which takes in more than one letter. Having taken only one letter, the substitution process knows only its identity, not its similarity to the target. Thus, it must report similar and dissimilar flankers equally often. Contrary to this prediction, the similar flanker is reported much more often than the dissimilar flanker, showing that rampant flanker substitution cannot account for most flanker reports. Mixture modeling shows that simple substitution can account for at most about half the trials. Pooling and non-pooling (simple substitution) together comprise all possible models of crowding. When observers are asked to identify a crowded object, at least half of their reports are pooled, based on a combination of information from target and flankers, rather than being based on a single letter.
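The substitution argument reduces to a testable prediction: if the reported letter is taken without regard to its similarity to the target, flanker reports should split evenly between the similar and the dissimilar flanker. The sketch below checks that prediction with an exact two-sided binomial test. It is an illustration only, not the mixture model used in the paper, and the counts in the usage lines are hypothetical.

```python
from math import comb


def binomial_two_sided_p(k: int, n: int, p: float = 0.5) -> float:
    """Exact two-sided binomial test.

    Sums the probabilities of every outcome that is no more likely than the
    observed count k out of n, under success probability p.
    """
    probs = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = probs[k]
    # Small relative tolerance so outcomes tied with the observed one are counted.
    return min(1.0, sum(q for q in probs if q <= observed * (1 + 1e-9)))


# Hypothetical counts: of 100 trials on which a flanker was reported,
# 70 named the similar flanker and 30 the dissimilar one.
similar_reports, dissimilar_reports = 70, 30
p_value = binomial_two_sided_p(similar_reports, similar_reports + dissimilar_reports)
print(f"p = {p_value:.2g}")  # a small p argues against equal (substitution-only) reporting
```

Under the substitution-only hypothesis the two flankers are exchangeable, which is why p = 0.5 is the null here.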

]]>
<![CDATA[Wang HX, Freeman J, Merriam EP, Hasson U, & Heeger DJ (2011) Temporal eye movement strategies during naturalistic viewing. Under review.]]> http://jeremyfreeman.net/publications/24 http://jeremyfreeman.net/publications/24 Sat, 01 Jan 2011 00:00:00 -0700 <![CDATA[Pelli DG, Tillman KA, Freeman J, Su M, Berger TD, & Majaj NJ (2007) Crowding and eccentricity determine reading rate. Journal of Vision. 7(2):20, 1-36.]]> http://jeremyfreeman.net/publications/4 http://jeremyfreeman.net/publications/4 Fri, 19 Oct 2007 22:11:11 -0600 Bouma’s law of crowding predicts an uncrowded central window through which we can read and a crowded periphery through which we cannot. The old discovery that readers make several fixations per second, rather than a continuous sweep across the text, suggests that reading is limited by the number of letters that can be acquired in one fixation, without moving one’s eyes. That “visual span” has been measured in various ways, but remains unexplained. Here we show (1) that the visual span is simply the number of characters that are not crowded and (2) that, at each vertical eccentricity, reading rate is proportional to the uncrowded span. We measure rapid serial visual presentation (RSVP) reading rate for text, in both original and scrambled word order, as a function of size and spacing at central and peripheral locations. As text size increases, reading rate rises abruptly from zero to maximum rate. This classic reading rate curve consists of a cliff and a plateau, characterized by two parameters, critical print size and maximum reading rate. Joining two ideas from the literature explains the whole curve. These ideas are Bouma’s law of crowding and Legge’s conjecture that reading rate is proportional to visual span. We show that Legge’s visual span is the uncrowded span predicted by Bouma’s law. This result joins Bouma and Legge to explain reading rate’s dependence on letter size and spacing. 
Well-corrected fluent observers reading ordinary text with adequate light are limited by letter spacing (crowding), not size (acuity). More generally, it seems that this account holds true, independent of size, contrast, and luminance, provided only that text contrast is at least four times the threshold contrast for an isolated letter. For any given spacing, there is a central uncrowded span through which we read. This uncrowded span model explains the shape of the reading rate curve. We test the model in several ways. We use a “silent substitution” technique to measure the uncrowded span during reading. These substitutions spoil letter identification but are undetectable when the letters are crowded. Critical spacing is the smallest distance between letters that avoids crowding. We find that the critical spacing for letter identification predicts both the critical spacing and the span for reading. Thus, crowding predicts the parameters that characterize both the cliff and the plateau of the reading rate curve. Previous studies have found worrisome differences across observers and laboratories in the measured peripheral reading rates for ordinary text, which may reflect differences in print exposure, but we find that reading rate is much more consistent when word order is scrambled. In all conditions tested—all sizes and spacings, central and peripheral, ordered and scrambled—reading is limited by crowding. For each observer, at each vertical eccentricity, reading rate is proportional to the uncrowded span.
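A deliberately simplified, one-dimensional sketch of the uncrowded-span account follows. It assumes Bouma's law in the linear form critical spacing = b × (eccentricity + offset), counts the letter positions on a line centered at fixation whose spacing exceeds the local critical spacing, and treats reading rate as proportional to that count. The defaults for `b`, `offset_deg`, and the rate constant `k` are illustrative assumptions, not values fitted in the paper.

```python
def uncrowded_span(spacing_deg: float, b: float = 0.1, offset_deg: float = 0.2,
                   max_letters: int = 1000) -> int:
    """Number of letters, on a line centered at fixation, that escape crowding.

    A letter at eccentricity e counts as uncrowded when the letter spacing
    exceeds the critical spacing b * (e + offset_deg), a linear form of
    Bouma's law. b and offset_deg are hypothetical illustration values.
    """
    span = 0
    for i in range(-max_letters, max_letters + 1):
        eccentricity = abs(i) * spacing_deg
        if spacing_deg > b * (eccentricity + offset_deg):
            span += 1
    return span


def reading_rate(spacing_deg: float, k: float = 1.0, **bouma) -> float:
    """Reading rate modeled as proportional to the uncrowded span (k is hypothetical)."""
    return k * uncrowded_span(spacing_deg, **bouma)


for s in (0.01, 0.03, 0.3, 1.0, 4.0):
    print(s, uncrowded_span(s), reading_rate(s))
```

Sweeping the spacing reproduces the qualitative shape described above: the span is zero at very tight spacing, rises steeply, and then levels off at a plateau.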

]]>
<![CDATA[Freeman J, & Pelli DG (2007) An escape from crowding. Journal of Vision. 7(2):22, 1-14.]]> http://jeremyfreeman.net/publications/3 http://jeremyfreeman.net/publications/3 Fri, 19 Oct 2007 22:10:11 -0600 Crowding occurs when nearby flankers jumble the appearance of a target object, making it hard to identify. Crowding is feature integration over an inappropriately large region. What determines the size of that region? According to bottom-up proposals, the size is that of an anatomically determined isolation field. According to top-down proposals, the size is that of the spotlight of attention. Intriligator and Cavanagh (2001) proposed the latter, but we show that their conclusion rests on an implausible assumption. Here we investigate the role of attention in crowding using the change blindness paradigm. We measure capacity for widely and narrowly spaced letters during a change detection task, both with and without an interstimulus cue. We find that standard crowding manipulations—reducing spacing and adding flankers—severely impair uncued change detection but have no effect on cued change detection. Because crowded letters look less familiar, we must use longer internal descriptions (less compact representations) to remember them. Thus, fewer fit into working memory. The memory limit does not apply to the cued condition because the observer need remember only the cued letter. Cued performance escapes the effects of crowding, as predicted by a top-down account. However, our most parsimonious account of the results is bottom-up: Cued change detection is so easy that the observer can tolerate feature degradation and letter distortion, making the observer immune to crowding. The change detection task enhances the classic partial report paradigm by making the test easier (same/different instead of identifying one of many possible targets), which increases its sensitivity, so it can reveal degraded memory traces.

]]>
<![CDATA[Freeman J, Ziemba CM, Movshon JA, Simoncelli EP (2011) Building and testing a population model of V2. Presented at Society for Neuroscience conference in Washington DC, November 2011 (Poster).]]> http://jeremyfreeman.net/publications/26 http://jeremyfreeman.net/publications/26 Thu, 10 Nov 2011 00:00:00 -0700 <![CDATA[Ziemba CM, Freeman J, Movshon JA, Simoncelli EP (2011) Sensitivity to naturalistic texture differentiates V1 and V2. Presented at Society for Neuroscience in Washington DC, November 2011 (Poster).]]> http://jeremyfreeman.net/publications/27 http://jeremyfreeman.net/publications/27 Thu, 10 Nov 2011 00:00:00 -0700 <![CDATA[Freeman J, Ganguli D, & Simoncelli EP (2011) Do humans use Occam's Razor when learning probability distributions? Presented at Computational and Systems Neuroscience (CoSyNe) conference in Salt Lake City, Utah, February 2010 (Poster).]]> http://jeremyfreeman.net/publications/22 http://jeremyfreeman.net/publications/22 Tue, 01 Mar 2011 18:14:33 -0700 Human behavior appears to reflect probabilistic inference over sensory and motor variables. What types of distributions does the brain choose to encode, and what tradeoffs are involved in learning complex distributions? Here we present experimental evidence that learning a distribution depends systematically on properties of the distribution and the amount of observed data. We characterize the ability of human observers to learn bimodal distributions over the orientation of sequentially presented sinusoidal gratings. We focus on two distributions with modes that are either slightly overlapping or distinct. We use a recently developed technique to teach observers these distributions, and then have them produce samples from their learned distributions [1]. We demonstrate that observers can learn the distinct bimodal distribution after only 60 training samples.
For the same number of samples from the overlapping bimodal distribution, observers instead learn a unimodal distribution, typically centered between the two modes. However, with additional training, the learned distribution tends to approach the true one. We then provide an example of a model that exhibits this behavior. The model evaluates the probability of two hypotheses: did a set of samples come from a unimodal or a bimodal distribution? Evaluating the probability of each hypothesis requires integrating over the model parameters associated with that hypothesis. In a Bayesian setting, this integration naturally penalizes more complicated hypotheses, a form of Occam’s razor [2]. We use the model to select the best hypothesis given samples from a bimodal distribution, varying the separation between modes and the number of training samples. The model qualitatively predicts our observers’ behavior. This result suggests that humans use Occam’s razor to regularize the distribution learning problem when data are insufficient. Our findings may help constrain the strategies neural systems use to learn distributions to perform inference.

[1] Sanborn AN & Griffiths TL (2008) Markov chain Monte Carlo with people. NIPS.
[2] MacKay D (2001) Information Theory, Inference, and Learning Algorithms.
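The unimodal-versus-bimodal comparison described in this abstract can be sketched with a brute-force grid approximation to the marginal likelihood. The sketch assumes normal components with a shared width, a uniform prior over a coarse grid of means and widths, and treats orientation as a linear variable, ignoring wrap-around at 180 deg; these choices are illustrative and not taken from the poster. Because the bimodal hypothesis spreads its prior over many more parameter combinations, averaging the likelihood over the grid penalizes it unless the data clearly need two modes.

```python
import math


def log_normal(x, mu, sigma):
    """Log density of a normal distribution."""
    return -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))


def log_mean_exp(values):
    """Numerically stable log of the mean of exp(values)."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values)) - math.log(len(values))


def log_evidence(data, n_modes, means=range(-90, 91, 5), sigmas=(3, 6, 12, 24)):
    """Log marginal likelihood under a uniform prior over a parameter grid.

    Unimodal: one normal (mean, sigma). Bimodal: equal mixture of two normals
    (mean1, mean2) sharing one sigma. Averaging the likelihood over the grid
    integrates out the parameters, which penalizes the larger hypothesis space.
    """
    log_liks = []
    for sigma in sigmas:
        if n_modes == 1:
            for m in means:
                log_liks.append(sum(log_normal(x, m, sigma) for x in data))
        else:
            for m1 in means:
                for m2 in means:
                    ll = 0.0
                    for x in data:
                        a, b = log_normal(x, m1, sigma), log_normal(x, m2, sigma)
                        hi = max(a, b)
                        ll += hi + math.log(0.5 * math.exp(a - hi) + 0.5 * math.exp(b - hi))
                    log_liks.append(ll)
    return log_mean_exp(log_liks)


def log_bayes_factor(data):
    """Positive values favor the bimodal hypothesis, negative the unimodal one."""
    return log_evidence(data, 2) - log_evidence(data, 1)


# Hypothetical orientation samples (deg).
separated = [-42.0, -41.0, -40.0, -39.0, -38.0, 38.0, 39.0, 40.0, 41.0, 42.0]
clustered = [-2.0, -1.0, 0.0, 1.0, 2.0] * 2
print(log_bayes_factor(separated))  # positive: bimodal favored
print(log_bayes_factor(clustered))  # negative: unimodal favored
```

Note that the bimodal grid contains the unimodal fits (m1 = m2), so the penalty comes entirely from integration over parameters rather than from a worse best fit.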

]]>
<![CDATA[Freeman J, Brouwer GJ, Heeger DJ, & Merriam EP (2010) Topographic map of orientation in human visual cortex Presented at Society for Neuroscience conference in San Diego, CA, November 2010 (Poster).]]> http://jeremyfreeman.net/publications/25 http://jeremyfreeman.net/publications/25 Thu, 11 Nov 2010 02:41:18 -0700 The representation of orientation in primary visual cortex (V1) has been examined extensively at a fine spatial scale corresponding to the columnar architecture. In humans, orientation can be decoded from functional magnetic resonance imaging (fMRI) signals using multivariate classification methods, but it is unknown whether orientation decoding depends on fine-scale, columnar architecture. We used a phase-encoded mapping procedure to test the hypothesis that orientation is represented in human cortex at a coarse spatial scale, and that this organization provides the basis for orientation decoding.

Cortical responses were measured using fMRI (3T Siemens Allegra, 8-channel phased-array surface coil, 2 × 2 × 2 mm voxels, 24 slices perpendicular to the calcarine sulcus), while subjects viewed an oriented sinusoidal grating (0.5 cycles/deg) that filled a 4 deg peripheral annulus with a smooth edge. The orientation of the stimulus changed every 1.5 s, cycling through sixteen evenly spaced angles (0-180 deg). The response of each voxel was fit with a sinusoid at the period of the stimulus cycle. The phase of the best-fitting sinusoid indicated the preferred orientation of the voxel.
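The phase-encoded analysis described above amounts to fitting each voxel's time series with a sinusoid at the stimulus frequency. A minimal sketch, assuming a whole number of orientation cycles per run and uniform sampling: projecting the mean-removed series onto a cosine and a sine at that frequency gives the least-squares amplitude and phase, and because one phase cycle spans the 180 deg orientation cycle, phase maps to preferred orientation by halving. The sketch ignores the hemodynamic delay, which shifts the measured phase in real data.

```python
import math


def fit_sinusoid(timeseries, n_cycles):
    """Least-squares fit of a sinusoid with n_cycles periods per run.

    Returns (amplitude, phase) with phase in [0, 2*pi), such that the fit is
    amplitude * cos(2*pi*n_cycles*t/N - phase). Assumes 0 < n_cycles < N/2
    and a whole number of cycles per run.
    """
    n = len(timeseries)
    mean = sum(timeseries) / n
    w = 2 * math.pi * n_cycles / n
    c = sum((y - mean) * math.cos(w * t) for t, y in enumerate(timeseries))
    s = sum((y - mean) * math.sin(w * t) for t, y in enumerate(timeseries))
    return 2 * math.hypot(c, s) / n, math.atan2(s, c) % (2 * math.pi)


def phase_to_orientation_deg(phase):
    """Map one full phase cycle (360 deg) onto the 180 deg orientation cycle."""
    return math.degrees(phase) / 2.0


# Hypothetical voxel: one sample per 1.5 s step gives 16 samples per cycle;
# 10 cycles per run gives 160 samples. Its phase of pi/2 corresponds to 45 deg.
voxel = [1.5 * math.cos(2 * math.pi * 10 * t / 160 - math.pi / 2) + 3.0 for t in range(160)]
amplitude, phase = fit_sinusoid(voxel, n_cycles=10)
print(amplitude, phase_to_orientation_deg(phase))
```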

We observed a topographic map of orientation preference in human V1, confirming and extending previous reports of a quadrant bias for radial orientations (Sasaki et al., Neuron, 2006). The map was tightly co-localized with the retinotopic map: at each location within V1, responses exhibited a preference for radial orientations. Circular correlation was used to quantify the similarity between the orientation and polar angle maps (r = 0.75; p < 0.0001). Control experiments confirmed that the orientation map was robust to a variety of stimulus parameters, and was not due to either attention or eye movements. Multivariate classification analyses were applied to decode stimulus orientation. Averaging the data in a manner consistent with the structure of the topographic map did not affect decoding accuracy, demonstrating that the map was sufficient to classify orientation. Our results strongly suggest that orientation decoding does not reflect the irregular spatial arrangements of orientation columns.
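The abstract does not specify which circular correlation coefficient was used; the sketch below assumes the Jammalamadaka-SenGupta coefficient as one common choice. Orientation repeats every 180 deg while polar angle repeats every 360 deg, so both are doubled before correlating: a purely radial preference at polar angle θ is θ mod 180, and doubling puts both variables on the same 360 deg circle, so a perfectly radial map gives r = 1.

```python
import math


def circular_mean(angles):
    """Mean direction (radians) of a list of angles in radians."""
    return math.atan2(sum(math.sin(a) for a in angles), sum(math.cos(a) for a in angles))


def circular_correlation(a, b):
    """Jammalamadaka-SenGupta circular correlation between paired angles (radians)."""
    ma, mb = circular_mean(a), circular_mean(b)
    sa = [math.sin(x - ma) for x in a]
    sb = [math.sin(y - mb) for y in b]
    num = sum(x * y for x, y in zip(sa, sb))
    return num / math.sqrt(sum(x * x for x in sa) * sum(y * y for y in sb))


def orientation_vs_polar_angle(pref_orientation_deg, polar_angle_deg):
    """Correlate voxel orientation preferences (0-180 deg) with polar angle (0-360 deg)."""
    a = [math.radians(2 * o) for o in pref_orientation_deg]
    b = [math.radians(2 * p) for p in polar_angle_deg]
    return circular_correlation(a, b)


# Hypothetical voxels whose preferred orientation is exactly radial.
polar = [10, 50, 100, 170, 220, 300]
print(orientation_vs_polar_angle([p % 180 for p in polar], polar))
```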

There is a coarse-scale topographic map of orientation in V1. The orientation map provides a parsimonious explanation for how multivariate classification methods decode stimulus orientation from fMRI measurements, and challenges the conjecture that decoding reflects random irregularities in the fine-scale columnar architecture.

]]>
<![CDATA[Freeman J & Simoncelli EP (2010) Crowding and metamerism in the ventral stream. Presented at Vision Sciences Society conference in Naples, FL, May 2010 (Talk).]]> http://jeremyfreeman.net/publications/16 http://jeremyfreeman.net/publications/16 Fri, 14 May 2010 00:00:00 -0600 Vision is degraded in the periphery. The phenomenon of “crowding” provides a striking example: objects closer together than half their eccentricity are unrecognizable. Crowding has been described as statistical or textural averaging of features over spatial regions (Parkes et al., 2001), and recently Balas et al. (2009) showed that applying a texture analysis-synthesis model (Portilla & Simoncelli, 2000) to crowded stimuli simulates crowding effects. We develop this hypothesis with an explicit model of extrastriate ventral stream processing that performs eccentricity-dependent pooling across the entire visual field. Images are decomposed with V1-like filters, followed by simple and complex-cell-like nonlinearities. Pairwise products among V1 outputs are averaged within overlapping spatial regions that grow with eccentricity according to a single scaling parameter (ratio of size-to-eccentricity). If this model captures the information available to human observers, then two properly fixated images with identical model responses should be metamers. We perform experiments to determine the scaling parameter that produces metameric images. Given a natural image, we generate images that have identical model responses, but are otherwise as random as possible. We measure discriminability between such synthetic images as a function of scaling. When images are statistically matched within small pooling regions, performance is at chance (50%), despite substantial differences in the periphery. With larger pooling regions, peripheral differences increase, and discriminability approaches 100%. 
We fit the psychometric function to estimate the pooling regions (scaling) over which the observer estimates statistics. The result is consistent with the known eccentricity-dependence of crowding, and also with receptive field sizes in macaque mid-ventral areas, particularly V2. Finally, we show that metamers synthesized from classic crowding stimuli (e.g., groups of letters) yield images with jumbled, unidentifiable objects. Thus, the model associates the spatial extent of crowding with mid-ventral receptive field sizes, and provides specific hypotheses for the computations performed by underlying neural populations.

]]>
<![CDATA[Pelli DG, Freeman J, & Chakravarthi R (2010) Crowding combines. Presented at Vision Sciences Society conference in Naples, FL, May 2010 (Talk).]]> http://jeremyfreeman.net/publications/17 http://jeremyfreeman.net/publications/17 Fri, 14 May 2010 00:00:00 -0600 Visual crowding provides a window into object recognition: observers fail to recognize objects in clutter. Here we ask, what do they see instead? We analyze observers’ errors to show that crowding necessarily reflects the combination of information across multiple complex objects, rather than the mislocalization (or substitution) of one object for another. First, we presented single letters, randomly chosen, in noise in the periphery and tabulated a confusion matrix based on observers’ (n=3) reports. We then tested the same observers in a classic crowding task, in which they viewed a triplet (target and two flankers) of closely spaced letters in the periphery (10 deg) and reported the identity of the middle target. For each observer, we tailored the triplets based on that observer’s single-letter confusion matrix. One flanker was chosen to be a letter that was most confused with (most “similar” to) the target, and the other was chosen to be a letter that was least confused (least similar). Consistent with the literature, when mistaken, observers tend to report the flankers. The crucial issue, however, is which of the two flankers observers report on these trials. Blind substitution predicts that the two flankers (similar and dissimilar) are equally likely to be reported. Instead, we find that observers are more likely to report the similar flanker (70%) than the dissimilar flanker (30%). The effect of similarity on erroneous responses proves that the response combines information from both the target and the reported flanker. By systematically tailoring the stimuli, we induced a bias in the reports that reveals a pooled, “mongrel-like,” underlying percept.
Our method, applicable to any object, generalizes the evidence for “compulsory pooling” from the narrow domain of grating orientation (Parkes et al., 2001) to complex, everyday objects.
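The per-observer tailoring step can be sketched as follows. It assumes a confusion table of raw report counts from the single-letter task and picks, for a given target, the non-target letter reported most often (the similar flanker) and least often (the dissimilar flanker). The counts in the usage line are hypothetical, and the abstract does not say how ties or normalization were handled; here ties go to whichever letter appears first in the table.

```python
def choose_flankers(confusions, target):
    """Return (similar, dissimilar) flankers for a target letter.

    confusions[t][r] counts how often target t was reported as r in the
    single-letter task; the target's own (correct) reports are excluded.
    """
    row = {r: n for r, n in confusions[target].items() if r != target}
    similar = max(row, key=row.get)
    dissimilar = min(row, key=row.get)
    return similar, dissimilar


# Hypothetical counts for one observer and target "e".
print(choose_flankers({"e": {"e": 40, "c": 7, "o": 3, "x": 0}}, "e"))  # ('c', 'x')
```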

]]>
<![CDATA[Ganguli D, Freeman J, Rajashekar U, & Simoncelli EP (2010) Orientation statistics at fixation. Presented at Vision Sciences Society conference in Naples, FL, May 2010 (Poster).]]> http://jeremyfreeman.net/publications/18 http://jeremyfreeman.net/publications/18 Fri, 14 May 2010 00:00:00 -0600 Eye movements are not random. When viewing images, human observers tend to fixate on regions that, on average, have higher contrast than randomly selected regions (Reinagel & Zador, 1999). We extend this analysis to the study of local orientation statistics via the “orientation tensor” (Granlund & Knutsson, 1994), computed as the 2x2 covariance matrix of local horizontal and vertical derivatives (i.e., the gradient vector) within an image patch. This may be converted into three natural parameters: energy, orientedness, and orientation. Energy is the total variance in the gradients, and is related to contrast; orientedness indicates the strength of the dominant orientation; orientation indicates the predominant orientation. We use an eye movement database (van der Linde et al., 2009) to measure the orientation tensor within local 1 deg image patches that are either fixated by human observers (n=29), or selected at random (by using fixations for a different, randomly chosen image). We then obtain image-specific log distributions of the three parameters of the orientation tensor. Averaged across all images and subjects, energy is higher in fixated patches, consistent with similar reports on contrast, but we do not observe such differences for orientation or orientedness. However, when we compare fixated and random distributions of these parameters on an image-by-image basis, we observe systematic differences. In particular, for the majority of images, the distribution of fixated patches, when compared to that of random patches from that image, is closer to the generic distribution averaged over all images.
We use multivariate techniques to characterize this effect across the database. We find that fixated distributions shift towards the generic distribution by about 10 to 20%, and the trend is significant for all three parameters. Our results suggest that when viewing a particular image, observers’ fixations are biased towards locations that reflect the typical orientation statistics of natural scenes.
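The orientation tensor and its three parameters can be computed from a patch as in the sketch below. It assumes central-difference derivatives and uses (λ1 - λ2)/(λ1 + λ2) of the tensor's eigenvalues as orientedness, a common definition that the abstract does not spell out. Energy is the trace (total gradient variance), as described above. The returned angle is the dominant gradient direction; the orientation of the corresponding edges or stripes is perpendicular to it.

```python
import math


def orientation_tensor(patch):
    """Return (energy, orientedness, gradient_angle_rad) for a 2D list patch.

    The tensor is the 2x2 covariance of horizontal (gx) and vertical (gy)
    central-difference derivatives over the interior pixels of the patch.
    """
    gx, gy = [], []
    for r in range(1, len(patch) - 1):
        for c in range(1, len(patch[0]) - 1):
            gx.append((patch[r][c + 1] - patch[r][c - 1]) / 2.0)
            gy.append((patch[r + 1][c] - patch[r - 1][c]) / 2.0)
    n = len(gx)
    mx, my = sum(gx) / n, sum(gy) / n
    cxx = sum((x - mx) ** 2 for x in gx) / n
    cyy = sum((y - my) ** 2 for y in gy) / n
    cxy = sum((x - mx) * (y - my) for x, y in zip(gx, gy)) / n
    energy = cxx + cyy  # trace = lambda1 + lambda2
    eig_diff = math.hypot(cxx - cyy, 2 * cxy)  # lambda1 - lambda2
    orientedness = eig_diff / energy if energy > 0 else 0.0
    angle = 0.5 * math.atan2(2 * cxy, cxx - cyy)  # dominant gradient direction
    return energy, orientedness, angle


# Vertical stripes: luminance varies only along x, so the gradient points along x.
stripes = [[math.sin(0.7 * c) for c in range(16)] for r in range(16)]
print(orientation_tensor(stripes))  # orientedness 1.0, gradient angle 0.0
```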

]]>
<![CDATA[Wang H, Freeman J, Merriam EP, Hasson U, & Heeger DJ (2010) Temporal scramble disrupts eye movements to naturalistic stimuli. Presented at Vision Sciences Society conference in Naples, FL, May 2010 (Poster).]]> http://jeremyfreeman.net/publications/19 http://jeremyfreeman.net/publications/19 Fri, 14 May 2010 00:00:00 -0600 When viewing a scene, humans rapidly move their eyes to foveate visual features and objects of interest. In natural conditions, this process is temporally complex, yet little is known about how the temporal structure of naturalistic stimuli affects the dynamics of eye movements under free viewing. We tracked eye position while observers watched a 6-minute scene from a feature film that was shot as a continuous sequence (with no cuts). Consistent with previous reports (Hasson et al., J Neurosci, 2008), eye movements were highly reliable, both across repeated presentations and across observers. We then divided the scene into clips of various durations (ranging from 500 ms to 30 s) and scrambled the temporal order of the clips, thereby introducing cuts. Eye-movement reliability, quantified as the covariance between eye positions during the scrambled clips and those during the corresponding portions of the full-length scene, was found to increase as a power-law function of clip duration, from ~0 for the 500 ms clips to an asymptote for clips >30 s in duration. We developed a model that assumed that observers searched randomly following each cut, fixating at arbitrary locations until finding a target of interest and then tracking it faithfully. We fit the model to the data by analytically deriving the model’s prediction for the relationship between clip duration and eye movement reliability (covariance). While simple, this model fit the data well with only two free parameters (number of possible target locations, asymptotic covariance). However, the model fits exhibited a systematic bias at the shortest scramble durations.
We conclude that exploratory fixations depend critically on the temporal continuity of stimuli, and that human observers might utilize a random search strategy when viewing naturalistic, time-varying stimuli.
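One way to make the random-search model concrete is sketched below. The abstract states that the model has two free parameters, the number of possible target locations and the asymptotic covariance; the fixed fixation duration used here, and the assumption that reliability scales with the expected fraction of the clip spent tracking, are illustrative assumptions rather than the paper's analytic derivation.

```python
def expected_reliability(clip_s, n_locations, asymptote, fixation_s=0.3):
    """Expected eye-movement reliability for a clip lasting clip_s seconds.

    After each cut the observer fixates random candidate locations, one per
    fixation of (hypothetical) duration fixation_s, each hitting the target
    with probability 1/n_locations; once found, the target is tracked
    faithfully. Reliability is the asymptote scaled by the expected fraction
    of the clip spent tracking.
    """
    p = 1.0 / n_locations
    expected_tracking_s = 0.0
    k = 1
    while k * fixation_s < clip_s:
        prob_found_on_k = p * (1 - p) ** (k - 1)  # geometric distribution
        expected_tracking_s += prob_found_on_k * (clip_s - k * fixation_s)
        k += 1
    return asymptote * expected_tracking_s / clip_s


for duration in (0.5, 1, 2, 5, 10, 30):
    print(duration, round(expected_reliability(duration, n_locations=4, asymptote=1.0), 3))
```

Under these assumptions the predicted reliability rises with clip duration and approaches the asymptote for long clips; whether it follows the reported power law depends on details the abstract does not give.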

]]>
<![CDATA[Freeman J & Simoncelli EP (2010) Metamers of the ventral stream. Presented at Computational and Systems Neuroscience (CoSyNe) conference in Salt Lake City, Utah, February 2010 (Talk).]]> http://jeremyfreeman.net/publications/15 http://jeremyfreeman.net/publications/15 Fri, 26 Feb 2010 15:28:31 -0700 How is image structure encoded in the extrastriate ventral visual pathway? Direct characterization of the stimulus selectivity of individual extrastriate cells has proven difficult. However, one robust population-level property of all visual areas is that receptive field sizes grow with eccentricity. It has also been reported (Gattass et al., 1988) that the rate of growth increases along the ventral stream. We hypothesize that this successive increase in pooling region size causes information loss. A well-known example occurs in the retina, where spatial pooling in the periphery means that high spatial frequency information is lost. In general, stimuli that differ only in terms of information discarded by the visual system will be indistinguishable to a human observer. Such stimuli are called metamers. Here, we probe the population-level computations of the ventral stream using novel metameric stimuli. Starting from any prototype image, we generate stimuli that match in terms of the responses of a simple model for extrastriate ventral computation. The model is based on measurements previously used to characterize visual texture (Portilla & Simoncelli, 2000). The model decomposes an image using a bank of V1-like filters tuned for local orientation and spatial frequency, computing both simple and complex-cell responses. Extrastriate responses are then computed by taking pairwise products amongst these V1 responses, and averaging within overlapping spatial regions that grow with eccentricity. Stimuli are generated by using gradient descent to adjust a random (white noise) image to match the model responses of the original prototype.
Previous work showed that the same statistics, averaged over an entire image, allow for the analysis and synthesis of homogeneous visual textures.
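The eccentricity-dependent pooling step can be illustrated in one radial dimension. The sketch assumes hard, non-overlapping windows whose width equals a scaling factor times their inner eccentricity, so window edges grow geometrically, and averages the product of two response signals within each window. The model described above instead uses smooth, overlapping two-dimensional regions, and the response lists here are hypothetical stand-ins for V1-like filter outputs.

```python
def pooling_boundaries(e_min, e_max, scaling):
    """Radial window edges; each window's width is scaling * its inner eccentricity."""
    if e_min <= 0 or scaling <= 0:
        raise ValueError("e_min and scaling must be positive")
    edges = [e_min]
    while edges[-1] < e_max:
        edges.append(edges[-1] * (1 + scaling))
    return edges


def pooled_products(ecc, resp_a, resp_b, edges):
    """Mean of resp_a * resp_b over samples whose eccentricity falls in each window."""
    out = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        prods = [a * b for e, a, b in zip(ecc, resp_a, resp_b) if lo <= e < hi]
        out.append(sum(prods) / len(prods) if prods else float("nan"))
    return out


edges = pooling_boundaries(1.0, 10.0, scaling=0.5)
print(edges)  # [1.0, 1.5, 2.25, 3.375, 5.0625, 7.59375, 11.390625]
print(pooled_products([1.2, 1.4, 2.0, 3.0], [1, 2, 3, 4], [1, 1, 2, 2], edges[:4]))
```

A larger scaling factor yields fewer, larger windows, which is the quantity the behavioral experiments vary.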

If this model accurately reflects representations in early extrastriate areas, then images synthesized to produce identical model responses should be metameric to a human observer. For each of several natural images and pooling region sizes, we generate multiple samples that are statistically matched but otherwise as random as possible. We use a standard psychophysical task to measure observers’ ability to discriminate between image samples, as a function of the rate at which the statistical pooling regions grow with eccentricity. When image samples are statistically matched within small pooling regions, observers perform at chance (50%), failing to notice substantial differences in the periphery. When images are matched within larger pooling regions, discriminability approaches 100%. We fit the psychometric function to estimate the pooling region over which the observer estimates statistics. The result is consistent with receptive field sizes in macaque mid-ventral areas (particularly V2).

Our model also fully instantiates a recently proposed explanation (Balas et al., 2009) of the phenomenon of “visual crowding”, in which humans fail to recognize a peripheral target object surrounded by background clutter. In our model, crowding occurs because multiple objects fall within the same pooling region and the model responses cannot uniquely identify the target object. We synthesize images that are metameric to classic crowding stimuli (e.g., groups of letters), and find that stimulus configurations that produce crowding yield synthesized images with jumbled, unidentifiable objects.

]]>
<![CDATA[Freeman J, Hallum LE, Landy MS, & Heeger DJ (2009) Hierarchical representation of naturalistic texture in cortex. Presented at Society for Neuroscience conference in Chicago, IL, October 2009 (Poster).]]> http://jeremyfreeman.net/publications/14 http://jeremyfreeman.net/publications/14 Sun, 18 Oct 2009 15:22:15 -0600 Cortical processing of complex visual information, like form, texture, and object identity, is thought to be hierarchical. Naturalistic textures are a useful tool for probing these hierarchical representations because they are well characterized by a hierarchical statistical model (Portilla & Simoncelli 2000). According to the model, image pairs will appear to be composed of the same texture if they match with respect to a small collection of image statistics. These statistics are based on a multi-scale, oriented image representation, and include both marginal statistics (e.g., energy in each band) and correlations (between neighboring filter responses in space, orientation and scale). Using this model, one can synthesize a large number of different naturalistic images that are texturally equivalent (with respect to the model’s statistics). We tested the hypothesis that the set of texture samples from a given natural texture category produce a reliable, unique and distributed pattern of fMRI responses across visual cortical areas.

We measured fMRI responses in multiple retinotopic areas while subjects viewed texture samples drawn from 6 different texture categories. Categories were based on texture images selected from the Brodatz database. These images were preprocessed to ensure that their power spectra, averaged over orientation, were equal. For each category, many samples were generated using the Portilla/Simoncelli model. On each trial (inter-stimulus interval: 3.5-6.5 s), we showed a 1 s burst at 5 Hz of random texture samples drawn from the same category, mitigating any response specific to a particular texture sample. Texture category was randomized across trials.  Stimuli were presented within an annulus (radii: 0.5 and 9 deg). To divert and control attention, subjects performed a difficult letter-identification task at fixation.

We used a multivariate pattern-classification analysis with leave-one-run-out cross validation to test whether distributed patterns of fMRI responses reliably differed between the texture categories. Classification was significantly above chance in multiple visual areas, including primary visual cortex (V1), and areas V2 and V4. Classification performance was more robust across texture categories in V2 and V4 than in V1. Also, classification performance in V1 was highest for those texture categories with larger orientation differences, but this was not the case for V2 and V4. We conclude that responses in striate and extra-striate cortex reflect the statistical properties that distinguish natural texture categories, and that higher-tier areas extract increasingly complex statistics.
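The leave-one-run-out procedure can be illustrated with a minimal sketch. The abstract does not name the classifier, so the nearest-centroid rule (Euclidean distance), the data layout (each run as a list of `(pattern, label)` pairs), and the category names below are hypothetical assumptions, not the analysis actually used.

```python
from collections import defaultdict
from math import dist


def leave_one_run_out_accuracy(runs):
    """Leave-one-run-out accuracy of a nearest-centroid classifier (assumed rule).

    runs: list of scanning runs; each run is a list of (pattern, label) pairs,
    where pattern is a sequence of voxel responses (hypothetical layout).
    """
    correct = total = 0
    for held_out in range(len(runs)):
        # Accumulate per-category sums over every run except the held-out one.
        sums = {}
        counts = defaultdict(int)
        for i, run in enumerate(runs):
            if i == held_out:
                continue
            for pattern, label in run:
                prev = sums.get(label, [0.0] * len(pattern))
                sums[label] = [s + p for s, p in zip(prev, pattern)]
                counts[label] += 1
        centroids = {lab: [s / counts[lab] for s in vec] for lab, vec in sums.items()}
        # Assign each held-out pattern to the category with the nearest centroid.
        for pattern, label in runs[held_out]:
            predicted = min(centroids, key=lambda lab: dist(pattern, centroids[lab]))
            correct += predicted == label
            total += 1
    return correct / total


# Hypothetical two-voxel patterns for two texture categories in three runs.
runs = [
    [([1.0, 0.1], "texture_a"), ([0.1, 1.0], "texture_b")],
    [([0.9, 0.0], "texture_a"), ([0.0, 0.8], "texture_b")],
    [([1.1, 0.2], "texture_a"), ([0.2, 1.2], "texture_b")],
]
print(leave_one_run_out_accuracy(runs))  # 1.0
```

Because each run is held out in turn, the test patterns never contribute to the centroids used to classify them. With six categories, chance accuracy is 1/6, and the question for each visual area is whether cross-validated accuracy exceeds that level.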

]]>
<![CDATA[Freeman J, Donner TH, & Heeger DJ (2008) Interactions between human inferotemporal and early visual areas reflect feature integration. Presented at Society for Neuroscience conference in Washington, DC, November 2008 (Talk).]]> http://jeremyfreeman.net/publications/5 http://jeremyfreeman.net/publications/5 Wed, 19 Nov 2008 22:13:04 -0700 The visual system constructs object representations hierarchically. Features are detected in early visual areas, and then progressively integrated in the ventral visual pathway, with neurons at higher levels of processing showing selectivity to increasingly complex stimuli. Interactions between areas in this recurrent pathway should mediate feature integration. We tested this hypothesis by measuring inter-area interactions with and without disruptions of feature integration.

We used fMRI to measure the effect of letter crowding (which disrupts feature integration while preserving feature detection) on inter-area correlations in the ventral visual pathway. Observers viewed closely spaced letters (8° eccentricity, presented at 1 Hz for blocks of 15-21 s, separated by 15-21 s blocks of no stimulation). Adjacent letters were displayed in alternation in the uncrowded condition and simultaneously in the crowded condition. During fMRI, observers performed a demanding contrast discrimination task at fixation to ensure that attention was diverted from the letter stimuli. In a separate psychophysics experiment, we confirmed that letter identification was impaired by crowding under these stimulus conditions. Retinotopic visual areas (including V1, V2, V3, V4) were defined in a separate session using standard procedures, and the visual word-form area (VWFA) in inferotemporal cortex was defined by measuring responses to alternating English and Chinese character strings. In each retinotopic area, we identified sub-regions corresponding to the letter locations in the main experiment. To quantify interactions between cortical areas, we first removed the mean stimulus-driven response, separately for each cortical area and each condition (crowded, uncrowded), and then computed pairwise correlations between the residuals for each pair of cortical areas and both conditions. Because the stimulus-driven component had been removed, the residual correlations reflected interactions between areas rather than common input from the stimulus.

Crowding reduced residual correlations between several pairs of visual areas, particularly between early visual areas (V1-V4) and VWFA. These differences in residual correlations occurred despite little or no difference between conditions (crowded, uncrowded) in either mean response or residual variance. The differences were eliminated when letters were replaced with Gabor stimuli, which are elementary features that do not require integration.

We conclude that interactions between visual areas, especially between early areas and higher inferotemporal areas, reflect feature integration during object processing.

]]>
<![CDATA[Freeman J, Donner TH, & Heeger DJ (2008) Inter-area correlations in the human ventral visual pathway reflect feature integration. Presented at Vision Sciences Society conference in Naples, FL, May 2008 (Poster).]]> http://jeremyfreeman.net/publications/6 http://jeremyfreeman.net/publications/6 Mon, 19 May 2008 22:14:54 -0600 The neural representation of an object is constructed hierarchically. Elementary features are detected in early visual cortex and are progressively integrated throughout higher levels of processing. This hierarchical computation should be reflected in interactions between lower and higher visual areas. Furthermore, disrupting feature integration should reduce these neural interactions. We used letter crowding to disrupt feature integration, testing whether crowding affects the interactions between visual cortical areas. We used fMRI to measure neural activity in multiple visual areas while observers viewed closely spaced letters (8° eccentricity, presented at 1 Hz for blocks of 15-21 s, separated by 15-21 s blocks of no stimulation). Adjacent letters were displayed in alternation in the uncrowded condition and simultaneously in the crowded condition. We performed a control psychophysics experiment to confirm that letter identification was impaired by crowding under these stimulus conditions. However, during the fMRI experiment, observers performed a demanding contrast discrimination task at fixation to ensure that attention was diverted from the letter stimuli. In each observer, we defined sub-regions in retinotopic visual areas that corresponded to the letter locations, and the visual word-form area (VWFA) in inferotemporal cortex. For each area, we removed the mean response using orthogonal projection, and then computed pairwise correlations between the residuals separately for each condition. We found that crowding reduced correlations between several pairs of visual areas.
This effect was particularly strong between retinotopic visual areas (V1, V2, V4) and VWFA. The effect was eliminated in a control experiment using Gabor patches, which are elementary features that do not require feature integration. Crowding reduces correlations between early visual areas and higher visual areas, presumably by disrupting feature integration. We conclude that interactions between early feature-selective areas and higher object-selective areas reflect the feature integration process.
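Removing the mean response by orthogonal projection and then correlating residuals can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline. It treats the mean response as a single regressor time course (`template`, hypothetical), handles only one condition, and uses made-up time series. In practice the projection would be repeated for each area and condition, and the resulting correlations compared between crowded and uncrowded conditions.

```python
from math import sqrt


def project_out(y, x):
    """Remove the component of time series y along regressor x (orthogonal projection)."""
    beta = sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)
    return [yi - beta * xi for yi, xi in zip(y, x)]


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    da = [v - ma for v in a]
    db = [v - mb for v in b]
    num = sum(p * q for p, q in zip(da, db))
    return num / sqrt(sum(p * p for p in da) * sum(q * q for q in db))


# Hypothetical data: a mean stimulus-driven response (scaled differently per area)
# plus a fluctuation shared by both areas, standing in for an inter-area interaction.
template = [0.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0]
shared = [0.3, 0.2, 0.5, -0.4, 0.2, -0.6, 0.1, 0.0]
area_v1 = [t + s for t, s in zip(template, shared)]
area_vwfa = [2.0 * t + s for t, s in zip(template, shared)]

res_v1 = project_out(area_v1, template)
res_vwfa = project_out(area_vwfa, template)
print(round(pearson(res_v1, res_vwfa), 6))  # 1.0
```

After projection, each residual is orthogonal to the stimulus regressor. Any remaining correlation therefore comes from fluctuations the two areas share beyond the stimulus-locked response. In this toy case the shared fluctuation is identical in both areas, so the residual correlation is 1; in real data it would be lower.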

]]>
<![CDATA[Huk A, Freeman J, & Durgin FH (2007) Motion capture is motion integration. Journal of Vision, 7(9):397, 397a. Presented at Vision Sciences Society conference in Sarasota, FL, June 2007 (Poster).]]> http://jeremyfreeman.net/publications/8 http://jeremyfreeman.net/publications/8 Tue, 19 Jun 2007 22:18:07 -0600 The phenomenon of “motion capture” has been described as a suppression of the motion of high spatial-frequency (SF) elements in favor of the motion of low-SF components, allowing larger forms to capture their smaller features when they move (e.g., Ramachandran, 1990). However, recent evidence suggests that motion capture might not reflect active suppression of high-SF signals, but instead is simply an instance of weighted combination of high- and low-SF signals (Durgin, Freeman, Huk, VSS 2005). Here we show that increasing the contrast (and thus the motion energy) of high-SF components allows them to “capture” the low-SF components, arguing against the need for the specialized motion capture mechanism originally posited. We used displays in which we independently manipulated the contrasts and speeds of two superimposed motion components, consisting of randomly scattered 2D Gaussian elements of two different sizes (low-SF and high-SF). Participants were instructed to judge the speed of either the high-SF or low-SF elements relative to a standard speed (2.5 deg/sec). The irrelevant (unjudged) elements moved at 75%, 100%, or 125% of the standard speed. The contrast of the low-SF elements was always 20%; the contrast of the high-SF elements was either 20% or 80%. For each irrelevant-element speed, we measured the physical speed of the relevant elements needed to perceptually match the standard speed. When nominal contrasts were equal, the low-SF elements exerted larger effects on the perceived speed of the high-SF elements than vice versa. But when the high-SF elements were higher in contrast, the relative weights reversed.
A generic model of weighted motion integration accounts for both traditional motion capture as well as for the reversed capture of low SF components by high SF elements.

]]>
<![CDATA[Freeman J, & Pelli DG (2007) Attention can relieve crowding. Journal of Vision, 7(9):330, 330a. Presented at Vision Sciences Society conference in Sarasota, FL, June 2007 (Talk).]]> http://jeremyfreeman.net/publications/7 http://jeremyfreeman.net/publications/7 Tue, 19 Jun 2007 22:15:53 -0600 Crowding occurs when nearby flankers hinder the identification of a target object. Crowding is feature integration over an inappropriately large region, but what determines the size of that region? According to bottom-up proposals, the size is that of an anatomically determined isolation field. According to top-down proposals, the size is that of the spotlight of attention. Here we investigate the role of attention in crowding using the change blindness paradigm. We measure capacity for widely and narrowly spaced letters during a change detection task, with or without an inter-stimulus cue. Consistent with both bottom-up and top-down accounts, we find that standard crowding manipulations – reducing spacing and adding flankers – impair uncued change detection performance. However, the same crowding manipulations fail to impair cued change detection. The cue relieves crowding. Such relief has been sought before but never found. We suppose that object recognition consists of several steps: detection, segmentation, and integration. Perhaps feature detection is always followed by a coarse, pre-attentive segmentation, which is sometimes followed by a further, finer segmentation (triggered by an inter-stimulus cue), before the final integration of all the features within each segment.

]]>
<![CDATA[Durgin FH, Freeman J, & Huk A (2006) Reciprocal interaction between high and low frequencies in the perception of motion. Journal of Vision, 6(6):574, 574a. Presented at Vision Sciences Society conference in Sarasota, FL, June 2006 (Poster).]]> http://jeremyfreeman.net/publications/9 http://jeremyfreeman.net/publications/9 Mon, 19 Jun 2006 22:20:53 -0600 The phenomenon of “motion capture” posits that coherent object motions are estimated after discarding high spatial frequencies. Here we show that “captured” dot motion signals are not discarded, but are instead combined with the low-frequency carrier signal. Although low-frequency carrier gratings influence the perceived speed of superimposed high-frequency dot elements (consistent with capture), the speed of the dot elements also affects the perceived speed of the carrier grating. Stimuli were vertical gratings (0.5 cpd; 40% peak contrast) with superimposed random dots (20 dots/deg²; 40% peak contrast) that moved coherently in a Gaussian window. Dot and grating speed were varied independently. Seven frames of motion were shown at 10 Hz. Naïve subjects compared test stimuli to a remembered standard (2.67 deg/s). Subjects were instructed to judge either dot or grating speed exclusively. For grating trials, superimposed dot motions of 2.00, 2.67 or 3.33 deg/s yielded average PSEs of 3.21, 2.91, and 2.71 deg/s of grating motion, respectively. This is consistent with dot speed having a weight of about 0.25 in the estimation of grating motion. For dot trials, carrier gratings with speeds of 2.00, 2.67, or 3.33 deg/s yielded average PSEs of 3.43, 2.89, and 2.53 deg/s of dot motion, consistent with a 0.40 weight for grating speed in the estimation of dot speed. We conclude that motion mechanisms estimate coherent motion by integrating across a wide spectral range (with limited task-dependent tuning).
Motion capture phenomena probably do not depend on selective capture (or inhibition) of high-frequency signals by low-frequency carriers.
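The reported weights can be approximately recovered from the PSEs above under a simple linear integration model. That model is an assumption here; the abstract does not state how the weights were fit. Suppose perceived speed = (1 - w) × relevant speed + w × irrelevant speed. Matching a standard S then gives PSE = (S - w·d)/(1 - w), where d is the irrelevant speed. The slope s of PSE against d is therefore -w/(1 - w), which rearranges to w = s/(s - 1). The sketch fits s by ordinary least squares.

```python
def fitted_slope(xs, ys):
    """Ordinary least-squares slope of ys on xs."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def irrelevant_weight(irrelevant_speeds, pses):
    """Weight w of the irrelevant component under the assumed linear model
    perceived = (1 - w) * relevant + w * irrelevant, so the PSE slope s = -w / (1 - w)."""
    s = fitted_slope(irrelevant_speeds, pses)
    return s / (s - 1.0)


speeds = [2.00, 2.67, 3.33]        # irrelevant-component speeds (deg/s)
grating_pses = [3.21, 2.91, 2.71]  # grating trials: dot speed varied
dot_pses = [3.43, 2.89, 2.53]      # dot trials: grating speed varied

print(round(irrelevant_weight(speeds, grating_pses), 2))  # 0.27
print(round(irrelevant_weight(speeds, dot_pses), 2))      # 0.4
```

Under this assumed model, the weights come out near 0.27 and 0.40, close to the values quoted in the abstract. The asymmetry is the point of the result: neither component fully discards the other.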

]]>
<![CDATA[Freeman J (2009) "The eye of the beholder." The National.]]> http://jeremyfreeman.net/publications/10 http://jeremyfreeman.net/publications/10 Mon, 19 Jan 2009 22:22:53 -0700 Review of Simon Ings’s “A Natural History of Seeing: The Art and Science of Vision.”

]]>
<![CDATA[Freeman J (2008) "Mind games." The National.]]> http://jeremyfreeman.net/publications/11 http://jeremyfreeman.net/publications/11 Sat, 19 Jul 2008 22:23:29 -0600 Review of George Lakoff’s “The Political Mind: Why You Can’t Understand 21st Century American Politics with an 18th Century Brain.”

]]>