Language and Vision Ambiguities (LAVA) Corpus

Language and Vision Ambiguities (LAVA) is a multimodal corpus for studying ambiguous language grounded in vision. It pairs ambiguous sentences with visual scenes that depict each of their possible interpretations. LAVA sentences cover a wide range of linguistic ambiguities, including prepositional phrase (PP) and verb phrase (VP) attachment, conjunctions, logical form, anaphora and ellipsis.


Examples (the Video, Image and Syntactic Parses columns of the original table are visual and not reproduced):

Sentence: Danny approached the chair with a yellow bag.
  Visual setup:
    1. Danny with bag
    2. Chair with bag
  Semantic parses:
    1. λx.λy.λz.person(x)∧chair(y)∧bag(z)∧yellow(z)∧has(x,z)∧approach(x,y)
    2. λx.λy.λz.person(x)∧chair(y)∧bag(z)∧yellow(z)∧has(y,z)∧approach(x,y)

Sentence: Danny looked at Andrei picking-up a yellow bag.
  Visual setup:
    1. Danny picking-up bag
    2. Andrei picking-up bag
  Semantic parses:
    1. λx.λy.λz.yellow(x)∧bag(x)∧person(y)∧person(z)∧look-at(y,z)∧pick-up(y,x)
    2. λx.λy.λz.yellow(x)∧bag(x)∧person(y)∧person(z)∧look-at(y,z)∧pick-up(z,x)
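In both examples the two semantic parses share every predicate except one (has(x,z) vs. has(y,z), pick-up(y,x) vs. pick-up(z,x)), so a scene resolves the ambiguity by making exactly one of them true. The minimal Python sketch below illustrates this for the first sentence. It assumes a scene can be summarized as a set of hand-written ground facts and reads the λ-bound variables existentially; the entity names (danny, chair1, bag1) and the helpers PARSES, satisfied and consistent_readings are hypothetical. This is an illustration of the idea only, not the disambiguation approach of the EMNLP paper cited below, which works from the visual data rather than symbolic facts.

```python
from itertools import product

# Hypothetical encoding of the two semantic parses of
# "Danny approached the chair with a yellow bag."
# Each atom is (predicate, argument variables). The parses share all atoms
# except the one saying who "has" the bag.
COMMON = [
    ("person", ("x",)), ("chair", ("y",)), ("bag", ("z",)),
    ("yellow", ("z",)), ("approach", ("x", "y")),
]
PARSES = {
    "Danny with bag": COMMON + [("has", ("x", "z"))],
    "Chair with bag": COMMON + [("has", ("y", "z"))],
}

def satisfied(parse, entities, facts):
    """True if some assignment of entities to the variables makes every
    atom of the parse a fact of the scene (existential reading)."""
    variables = sorted({v for _, args in parse for v in args})
    for values in product(entities, repeat=len(variables)):
        binding = dict(zip(variables, values))
        if all((pred, tuple(binding[a] for a in args)) in facts
               for pred, args in parse):
            return True
    return False

def consistent_readings(entities, facts):
    """Names of the parses that the (hypothetical) scene supports."""
    return [name for name, parse in PARSES.items()
            if satisfied(parse, entities, facts)]

# Hypothetical scene: Danny carries the yellow bag while approaching the chair.
entities = ["danny", "chair1", "bag1"]
facts = {
    ("person", ("danny",)), ("chair", ("chair1",)), ("bag", ("bag1",)),
    ("yellow", ("bag1",)), ("approach", ("danny", "chair1")),
    ("has", ("danny", "bag1")),
}
print(consistent_readings(entities, facts))  # → ['Danny with bag']
```

Replacing the fact has(danny, bag1) with has(chair1, bag1) flips the result to the second reading. The second example can be encoded the same way, with pick-up(y,x) and pick-up(z,x) as the distinguishing atoms.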


The corpus is publicly available.


Yevgeni Berzak, Andrei Barbu, Daniel Harari, Boris Katz, and Shimon Ullman (2015). Do You See What I Mean? Visual Resolution of Linguistic Ambiguities. Conference on Empirical Methods in Natural Language Processing (EMNLP), Lisbon, Portugal.


This material is based upon work supported by the Center for Brains, Minds, and Machines (CBMM), funded by NSF STC award CCF-1231216.