"Toward a universal decoder of linguistic meaning from brain activation"

Datasets (MATLAB)

Experiments

The results described in the text were obtained on data from three separate imaging experiments: 180 sentences (Experiment 1), 384 sentences (Experiment 2), and 243 sentences (Experiment 3). Experiment 1 has three variants: sentence stimuli, picture (+word) stimuli, and word cloud stimuli.

Data

Experiment 1 was carried out on all participants, whereas Experiments 2 and 3 were only carried out on a subset of them: those for whom we obtained good decoding results in Experiment 1 and who were available for further scanning. The links below will allow you to download all the data available for each participant, in MATLAB format (described in the next section).

Stimuli and semantic vectors

The stimuli used for the three experiments were
  1. 180 concepts (presented embedded in sentences, together with pictures, or surrounded by related words)
  2. 384 sentences (as presented) and 384 sentences (after dereferencing pronouns or making implicit nouns explicit, used in generating semantic vectors).
  3. 243 sentences (as presented) and 243 sentences (after dereferencing pronouns or making implicit nouns explicit, used in generating semantic vectors).
The sentences in experiments 2 and 3 belong to 96 and 72 passages, respectively; the Format section below explains how to recover the passage each sentence belongs to from the distributed data. The corresponding GloVe semantic vectors (300-dimensional, 42B token version) are
  1. 180 concepts (vectors for words naming the concepts)
  2. 384 sentences (average vector for content words in each sentence)
  3. 243 sentences (average vector for content words in each sentence)
The content words were all the words left after removing stopwords from sentences with dereferenced pronouns; words were not singularized, so different vectors were used for singular and plural of each word.
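As a sketch of this construction, the following Python snippet averages toy word vectors over the content words of a sentence. The vectors and stopword list here are invented for illustration; the real vectors are the 300-dimensional GloVe (42B token) vectors described above.

```python
import numpy as np

# Toy illustration of how the sentence vectors are built: remove
# stopwords, then average the vectors of the remaining content words.
# The vectors and stopword list below are invented for the example.
glove = {
    "dog":   np.array([0.2, 0.4]),
    "dogs":  np.array([0.1, 0.5]),   # plurals keep their own vector
    "barks": np.array([0.6, 0.0]),
}
stopwords = {"the", "a", "an"}

def sentence_vector(sentence):
    """Average the vectors of the content words in a sentence."""
    content = [w for w in sentence.lower().split() if w not in stopwords]
    return np.mean([glove[w] for w in content], axis=0)

print(sentence_vector("the dog barks"))  # mean of "dog" and "barks"
```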

Format

Experiment 1

Once you unpack the tar file for each participant, you will get a directory containing a separate MATLAB file for each experimental design. We will provide examples using the data from participant P01, as they were scanned in all three experiments; in folder P01 you will find one file per design. Let us begin with a file from experiment 1. If you type
	  load examples_180concepts_sentences.mat
	  whos
	
you will see the following variables
	  examples              180x201011
	  keyConcept            180x1
	  labelsConcept         180x1
	  labelsConcreteness    180x1
	  meta                  1x1
	
The matrix examples contains beta coefficient images for each of 180 concepts, obtained by deconvolution of the imaging data described in the paper. The corresponding concepts are listed in keyConcept. The concreteness ratings for the word naming each concept are provided in labelsConcreteness (5-point scale going from abstract to concrete, more details in [Brysbaert, M., Warriner, A.B., & Kuperman, V. "Concreteness ratings for 40 thousand generally known English word lemmas". Behavior Research Methods, 46, 904-911 (2014)]).

The matrix examples has as many columns as there are voxels in a whole-head mask; for efficiency, we do not store all voxels in each 3D volume. The information needed to map between each column and the 3D position of the corresponding voxel is stored in fields of the meta structure:

      dimensions     - the dimensions of the imaging volume
      dimx,dimy,dimz - the same dimensions, stored separately for convenience
      indicesIn3D    - vector with one element per voxel, containing the linear indices, into a dimx x dimy x dimz volume, of the voxels inside the mask.
For example, to reconstruct the 3D volume for "apartment" (4th row of examples), you would do
  volume = zeros(meta.dimensions);
  volume(meta.indicesIn3D) = examples(4,:);
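If you load these files outside MATLAB (for example in Python via scipy.io.loadmat), bear in mind that indicesIn3D contains 1-based, column-major linear indices. A sketch of the same reconstruction in Python, with toy dimensions, indices, and data standing in for the real meta fields and a row of examples:

```python
import numpy as np

# Same reconstruction in Python (e.g. after scipy.io.loadmat).
# MATLAB linear indices are 1-based and column-major, so subtract 1
# and unravel with order="F". The values below are toy stand-ins for
# meta.dimensions, meta.indicesIn3D, and one row of examples.
dims = (3, 4, 2)
indices_in_3d = np.array([1, 5, 24])
row = np.array([10.0, 20.0, 30.0])

volume = np.zeros(dims)
coords = np.unravel_index(indices_in_3d - 1, dims, order="F")
volume[coords] = row
```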
To get information about each voxel specifically, you can use these fields of meta
     colToCoord     - #voxels x 3 matrix
     coordToCol     - dimx x dimy x dimz  matrix
You can use vector = meta.colToCoord(i,:) to place the 3D coordinates of voxel i in a vector. Conversely, you can use i = meta.coordToCol(x,y,z) to get the column corresponding to the voxel with 3D coordinates x, y, and z.
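The following Python sketch illustrates the convention with toy data (both arrays are made up here; the real ones live in the meta struct, with MATLAB's 1-based coordinates): colToCoord holds one (x, y, z) row per column of examples, and coordToCol is the inverse lookup volume.

```python
import numpy as np

# Toy illustration of the two mappings. colToCoord stores one
# (x, y, z) row per voxel column; coordToCol is the inverse lookup
# volume. Both arrays are invented stand-ins for the meta fields.
dims = (2, 2, 2)
col_to_coord = np.array([[1, 1, 1],   # voxel column 1 (1-based coords)
                         [2, 1, 2]])  # voxel column 2
coord_to_col = np.zeros(dims, dtype=int)
for col, (x, y, z) in enumerate(col_to_coord, start=1):
    coord_to_col[x - 1, y - 1, z - 1] = col

# The mappings are inverses of each other:
x, y, z = col_to_coord[1]             # coordinates of voxel column 2
assert coord_to_col[x - 1, y - 1, z - 1] == 2
```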

The remaining fields of meta
    roiMultimaskAAL - vector containing ROI indices of each voxel in the AAL atlas (or 0 if not in the atlas)
    roiMultimaskGordon - vector containing ROI indices of each voxel in the Gordon atlas (or 0 if not in the atlas)
    roisAAL - key of ROI names in the AAL atlas
    roisGordon - key of ROI names in the Gordon atlas
    roiColumnsAAL - columns of voxels in each ROI in the AAL atlas (one ROI per cell)
    roiColumnsGordon - columns of voxels in each ROI in the Gordon atlas (one ROI per cell)
are used to construct smaller versions of the examples matrix, segmented according to one of two atlases. The following steps will create the reduced dataset for the Gordon 2016 atlas used in the paper:
  indicesInGordon = find(meta.roiMultimaskGordon);
  examplesGordon = examples(:,indicesInGordon);
  voxelROI = meta.roiMultimaskGordon(indicesInGordon);
and you can use voxelROI to identify the ROI each voxel came from (via meta.roisGordon).
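A Python sketch of the same reduction, with a made-up multimask standing in for meta.roiMultimaskGordon (0 marks voxels outside the atlas, positive values are ROI indices):

```python
import numpy as np

# Python sketch of the ROI reduction above. The multimask below is
# invented: 0 means the voxel is outside the atlas, positive values
# are ROI indices (standing in for meta.roiMultimaskGordon).
examples = np.arange(12.0).reshape(2, 6)       # 2 items x 6 voxels
roi_multimask = np.array([0, 2, 1, 0, 1, 3])

in_atlas = np.nonzero(roi_multimask)[0]        # columns inside the atlas
examples_gordon = examples[:, in_atlas]        # reduced examples matrix
voxel_roi = roi_multimask[in_atlas]            # ROI index of each kept voxel
```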

Experiments 2 and 3

The files for experiments 2 and 3 have a similar format to those of experiment 1. If you load the file for experiment 2
          load examples_384sentences.mat;
          whos
you will see the following variables
          examples                          384x201011
          keyPassageCategory                  1x24
          keyPassages                        96x1
          keySentences                      384x1
          labelsPassageCategory              96x1
          labelsPassageForEachSentence      384x1
          labelsSentences                   384x1
          meta                              1x1
	
The matrix examples is similar to that described for experiment 1; the main difference is having 384 rows, corresponding to 384 sentences. The struct meta can be used to reduce the dataset and obtain voxel locations in different atlases, as described above.

There are three separate sets of numeric labels: labelsSentences (which sentence each row corresponds to; key in keySentences), labelsPassageForEachSentence (which passage each sentence came from; passage topic key in keyPassages), and labelsPassageCategory (the category of each passage: the superordinate topic in experiment 2 and the topic in experiment 3, since multiple passages share the same topic; key in keyPassageCategory).
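As a sketch, the following Python snippet uses made-up label vectors (standing in for the real 384-element labelsPassageForEachSentence and keyPassages) to pull out the rows of examples belonging to one passage and look up its topic:

```python
import numpy as np

# Sketch of grouping rows by passage. The labels and topic names
# below are invented stand-ins for the real
# labelsPassageForEachSentence and keyPassages.
labels_passage_for_each_sentence = np.array([1, 1, 2, 2, 2, 3])
key_passages = ["beekeeping", "gambling", "opera"]

passage = 2
rows = np.nonzero(labels_passage_for_each_sentence == passage)[0]
topic = key_passages[passage - 1]   # labels are 1-based keys into keyPassages
```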

Datasets (processed NIfTI data)

Processed datasets in NIfTI format will be made available at a future date.