Saliency Benchmarking

I am currently running the MIT Saliency Benchmark. We are constantly updating the results page with new models, and will continue to refine the way models are evaluated to give the most accurate sense of progress in the field of saliency modeling.

Bylinskii, Z., Judd, T., Borji, A., Itti, L., Durand, F., Oliva, A., and Torralba, A. "MIT Saliency Benchmark", available at:

Can we model context and observer effects on memorability?

Bylinskii, Z., Isola, P., Bainbridge, C., Torralba, A., Oliva, A. "Intrinsic and Extrinsic Effects on Image Memorability", under review at Vision Research. FIGRIM (FIne GRained Image Memorability) dataset coming soon

What makes a visualization memorable, comprehensible, and effective?

A collaboration with Harvard University's visualization group, this line of work aims to understand how people interact with data visualizations (graphs, charts, infographics, etc.). We are interested in answering questions such as: Which visualizations are easily remembered, and why? What information can people quickly extract from visualizations? How can we measure comprehensibility? Does chart junk help or hinder understanding and memorability? What do people pay the most attention to?

Kim, N.W., Bylinskii, Z., Borkin, A.M., Oliva, A., Gajos, K.Z., and Pfister, H., 2015, "A Crowdsourced Alternative to Eye-tracking for Visualization Understanding", CHI'15 Extended Abstracts (accepted).

pdf    poster   

Borkin, M., Vo, A., Bylinskii, Z., Isola, P., Sunkavalli, S., Oliva, A., and Pfister, H., 2013, "What Makes a Visualization Memorable?", IEEE Transactions on Visualization and Computer Graphics (Proceedings of InfoVis 2013).

pdf    media coverage

More to come on this front!

Image Memorability in the Eye of the Beholder

We are using gaze paths and pupillometry data to measure human behavior during image memory tasks. Can we predict whether an image is being remembered or forgotten by looking at the observer's eyes? We think so!

Our initial pupillometry findings were presented at:

Vo, M., Gavrilov, Z., and Oliva, A. (2013) Image Memorability in the Eye of the Beholder: Tracking the Decay of Visual Scene Representations. Vision Sciences Society. (oral)


Results to be published...

Are all training examples equally valuable?

When learning a new concept, not all training examples prove equally useful: some have higher or lower training value than others. The goal of this paper is to bring two considerations to the attention of the vision community: (1) some examples are better than others for training detectors or classifiers, and (2) in the presence of better examples, some examples may negatively impact performance, and removing them may be beneficial. In this paper, we propose an approach for measuring the training value of an example, and use it to rank and greedily sort examples. We test our methods on different vision tasks, models, datasets, and classifiers. Our experiments show that the performance of current state-of-the-art detectors and classifiers can be improved by training on a subset rather than on the whole training set.

Lapedriza, A., Pirsiavash, H., Bylinskii, Z., Torralba, A. (2013) arXiv:1311.6510 [cs.CV]
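The ranking-and-greedy-sorting idea can be illustrated with a toy sketch. Everything below is a hypothetical stand-in, not the paper's method: the classifier is a trivial nearest-centroid model on 1-D features, and examples are greedily dropped when removing them strictly improves validation accuracy.

```python
# Toy illustration (hypothetical, not the paper's detectors): rank training
# examples by their marginal value on a validation set and greedily drop
# examples whose removal strictly improves held-out accuracy.

def train(examples):
    """Fit a nearest-centroid model: per-class mean of 1-D feature values."""
    sums, counts = {}, {}
    for x, y in examples:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}

def accuracy(centroids, val):
    """Fraction of validation examples assigned to the nearest class centroid."""
    correct = 0
    for x, y in val:
        pred = min(centroids, key=lambda c: abs(centroids[c] - x))
        correct += (pred == y)
    return correct / len(val)

def greedy_filter(train_set, val_set):
    """Greedily remove training examples with negative training value."""
    kept = list(train_set)
    base = accuracy(train(kept), val_set)
    for ex in list(kept):
        if sum(1 for _, y in kept if y == ex[1]) == 1:
            continue  # never drop the last example of a class
        trial = [e for e in kept if e is not ex]
        score = accuracy(train(trial), val_set)
        if score > base:  # removal strictly helped: keep the smaller set
            kept, base = trial, score
    return kept, base
```

For instance, with a mislabeled outlier `(5.0, 'a')` in the training set, the filter discards it and validation accuracy rises, mirroring the paper's observation that a subset can outperform the full training set.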

Detecting Reduplication in Videos of American Sign Language

Abstract: A framework is proposed for the detection of reduplication in digital videos of American Sign Language (ASL). In ASL, reduplication is used for a variety of linguistic purposes, including overt marking of plurality on nouns, aspectual inflection on verbs, and nominalization of verbal forms. Reduplication involves the repetition, often partial, of the articulation of a sign. In this paper, the Apriori algorithm for mining frequent patterns in data streams is adapted for finding reduplication in videos of ASL. The proposed algorithm can account for varying weights on items in the Apriori algorithm's input sequence. In addition, the Apriori algorithm is extended to allow for inexact matching of similar hand motion subsequences and to provide robustness to noise. The formulation is evaluated on 105 lexical signs produced by two native signers. To demonstrate the formulation, overall hand motion direction and magnitude are considered; however, the formulation should be amenable to combining these features with others, such as hand shape, orientation, and place of articulation.

Gavrilov, Z., Sclaroff, S., Neidle, C., Dickinson, S.

Published in: Proc. Eighth International Conf. on Language Resources and Evaluation (LREC), 2012.

pdf    poster
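The core Apriori-style step can be sketched as follows, under simplifying assumptions: hand motion is quantized into discrete direction symbols, matching is exact rather than the weighted, inexact matching described in the abstract, and all function names are hypothetical.

```python
# Hypothetical sketch of Apriori-style mining of repeated (reduplicated)
# subsequences in a stream of quantized hand-motion directions.
# Exact matching only; the paper extends this with item weights,
# inexact matching, and robustness to noise.

def count_occurrences(seq, pattern):
    """Count (possibly overlapping) occurrences of pattern in seq."""
    n, m = len(seq), len(pattern)
    return sum(1 for i in range(n - m + 1) if tuple(seq[i:i + m]) == pattern)

def mine_repeats(seq, min_count=2, max_len=4):
    """Grow frequent patterns one symbol at a time (Apriori candidate extension)."""
    alphabet = sorted(set(seq))
    # Level 1: single symbols that repeat at least min_count times.
    frequent = {(s,) for s in alphabet if count_occurrences(seq, (s,)) >= min_count}
    results = set(frequent)
    for _ in range(max_len - 1):
        # Apriori pruning: only extend patterns that are already frequent.
        candidates = {p + (s,) for p in frequent for s in alphabet}
        frequent = {c for c in candidates if count_occurrences(seq, c) >= min_count}
        if not frequent:
            break
        results |= frequent
    return results
```

On a motion stream like `['U', 'D', 'U', 'D', 'L']`, the repeated up-down stroke `('U', 'D')` surfaces as a frequent pattern, which is the kind of repetition signature reduplication produces.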

Skeletal Part Learning for Efficient Object Indexing

Ongoing work:

The goal of this project is to construct an indexing and matching framework that operates on graph encodings of object shapes. A parts-based indexing mechanism offers greater robustness to occlusion and part articulation, while the graph-based representation provides invariance to angle and size. The idea is to match object graphs pairwise in order to extract common recurring subgraphs, which then constitute the part vocabulary. Given a novel query object, its graph can be matched against the parts, which vote for object hypotheses. Classifiers can additionally be used to learn associations between object categories and object-to-part similarity values.
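A minimal sketch of the indexing-and-voting idea, with hypothetical names and plain hashable part signatures standing in for the skeletal subgraphs and graph matching of the actual project:

```python
# Hypothetical sketch of part-based object indexing: a vocabulary maps
# recurring shape parts to the categories they came from, and the parts
# of a query shape vote for object hypotheses. Real parts would be
# recurring subgraphs found by pairwise graph matching.

from collections import Counter

def build_index(training_objects):
    """training_objects: {category: iterable of part signatures}."""
    index = {}
    for category, parts in training_objects.items():
        for part in parts:
            index.setdefault(part, set()).add(category)
    return index

def vote(index, query_parts):
    """Each query part votes for every category it indexes; return ranked hypotheses."""
    votes = Counter()
    for part in query_parts:
        for category in index.get(part, ()):
            votes[category] += 1
    return votes.most_common()
```

For example, a query sharing a 'handle' and a 'cylinder' part with the vocabulary ranks 'mug' above 'cup', since two of its parts vote for 'mug' and only one for 'cup'.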

Gavrilov, Z., Macrini, D., Zemel, R., Dickinson, S.

Funded by the Natural Sciences and Engineering Research Council of Canada via the Undergraduate Summer Research Award - NSERC USRA (2010, 2011, 2012)