"Toward a universal decoder of linguistic meaning from brain activation"
Datasets (MATLAB)
Experiments
The results described in the text were obtained on data from three separate imaging experiments: 180 concepts (Experiment 1), 384 sentences (Experiment 2), and 243 sentences (Experiment 3). Experiment 1 has three variants: sentence
stimuli, picture (+word) stimuli, and word cloud stimuli.
Data
Experiment 1 was carried out on all participants, whereas Experiments 2 and 3 were only carried out on a subset of them: participants for whom we obtained good decoding results in Experiment 1, when they were available for further scanning. The
links below will allow you to download all the data available for each participant, in MATLAB format (described in the next section).
- P01 (experiment 1, 2, and 3)
- M01 (experiment 1)
- M02 (experiment 1, 2, and 3)
- M03 (experiment 1 and 3)
- M04 (experiment 1, 2, and 3)
- M05 (experiment 1)
- M06 (experiment 1)
- M07 (experiment 1, 2, and 3)
- M08 (experiment 1 and 2)
- M09 (experiment 1 and 2)
- M10 (experiment 1)
- M13 (experiment 1)
- M14 (experiment 1 and 2)
- M15 (experiment 1, 2, and 3)
- M16 (experiment 1)
- M17 (experiment 1)
Stimuli and semantic vectors
The stimuli used for the three experiments were
- 180 concepts (presented embedded in sentences, together with pictures, or surrounded by related words)
- 384 sentences (as presented) and 384 sentences (after dereferencing pronouns or making implicit nouns explicit, used in generating semantic vectors)
- 243 sentences (as presented) and 243 sentences (after dereferencing pronouns or making implicit nouns explicit, used in generating semantic vectors)
The sentences in experiments 2 and 3 belong to 96 and 72 passages, respectively; the format description below explains how to recover the passage each sentence belongs to from the distributed data.
The corresponding GloVe semantic vectors (300-dimensional, 42B token version) are
- 180 concepts (vectors for words naming the concepts)
- 384 sentences (average vector for content words in each sentence)
- 243 sentences (average vector for content words in each sentence)
The content words were all the words left after removing stopwords from the sentences with dereferenced pronouns; words were not singularized, so different vectors
were used for the singular and plural forms of each word.
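The averaging procedure described above can be sketched in Python. This is an illustrative analogue, not part of the distributed data: the two-dimensional toy embeddings and the tiny stopword list below are stand-ins for the actual 300-dimensional GloVe vectors and the stopword list used by the authors.

```python
import numpy as np

# Toy stand-ins: real GloVe vectors are 300-dimensional; the
# vocabulary and stopword list here are illustrative only.
embeddings = {
    "dog":    np.array([1.0, 0.0]),
    "chased": np.array([0.0, 1.0]),
    "cat":    np.array([1.0, 1.0]),
}
stopwords = {"the", "a", "an"}

def sentence_vector(sentence, embeddings, stopwords):
    """Average the embeddings of the content words in a sentence.

    Words are lowercased but not singularized, matching the
    description above (singular and plural get distinct vectors).
    """
    words = [w for w in sentence.lower().split() if w not in stopwords]
    vectors = [embeddings[w] for w in words if w in embeddings]
    return np.mean(vectors, axis=0)

vec = sentence_vector("The dog chased a cat", embeddings, stopwords)
print(vec)  # → [0.66666667 0.66666667]
```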
Format
Experiment 1
Once you unpack the tar file for each participant, you will get a directory containing a separate MATLAB file for each experimental design. We will provide examples using the data from participant P01, as they were scanned on all three
experiments. In folder P01 you will find
- examples_180concepts_sentences.mat
- examples_180concepts_pictures.mat
- examples_180concepts_wordclouds.mat
- examples_384sentences.mat
- examples_243sentences.mat
Let us begin with a file from experiment 1. If you type
load examples_180concepts_sentences.mat
whos
you will see the following variables
examples 180x201011
keyConcept 180x1
labelsConcept 180x1
labelsConcreteness 180x1
meta 1x1
The matrix examples contains beta coefficient images for each of 180 concepts, obtained by deconvolution of the imaging data described in the paper. The corresponding concepts are listed in keyConcept. The
concreteness ratings for the word naming each concept are provided in labelsConcreteness (5-point scale going from abstract to concrete, more details in [Brysbaert, M., Warriner, A.B., & Kuperman, V. "Concreteness ratings
for 40 thousand generally known English word lemmas". Behavior Research Methods, 46, 904-911 (2014)]).
The matrix examples has as many columns as there are voxels in a whole-head mask; for efficiency, we do not store all voxels in each 3D volume. The information needed to map between each column and the 3D position of the
corresponding voxel is stored in fields of the meta structure:
dimensions - the dimensions of the imaging volume
dimx, dimy, dimz - the same dimensions, stored individually for convenience
indicesIn3D - an m-element vector of linear indices into a [dimx] x [dimy] x [dimz] matrix, one for each voxel where the mask is 1
For example, to reconstruct the 3D volume for "apartment" (4th row of examples), you would do
volume = zeros(meta.dimensions);
volume(meta.indicesIn3D) = examples(4,:);
To get information about each voxel specifically, you can use these fields of meta
colToCoord - #voxels x 3 matrix
coordToCol - dimx x dimy x dimz matrix
You can use vector = meta.colToCoord(i,:) to place the 3D coordinates of voxel i in a vector. Conversely, you can use i = meta.coordToCol(x,y,z) to get the column corresponding to the voxel with 3D
coordinates x, y, and z.
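If you work outside MATLAB, the same volume reconstruction can be sketched in Python with NumPy (the .mat files themselves can be read with scipy.io.loadmat). The dimensions, mask indices, and beta values below are synthetic stand-ins for meta.dimensions, meta.indicesIn3D, and a row of examples; the one real subtlety is that MATLAB linear indices are 1-based and column-major, so they must be shifted by 1 and unraveled in Fortran order.

```python
import numpy as np

def reconstruct_volume(example_row, indices_in_3d, dims):
    """Rebuild a 3D volume from one row of `examples`.

    MATLAB linear indices are 1-based and column-major, so we
    subtract 1 and unravel with Fortran ('F') order.
    """
    volume = np.zeros(dims)
    coords = np.unravel_index(np.asarray(indices_in_3d) - 1, dims, order="F")
    volume[coords] = example_row
    return volume

# Synthetic demo: a 2x2x2 volume with 3 in-mask voxels.
dims = (2, 2, 2)
indices = np.array([1, 4, 8])      # MATLAB-style linear indices (indicesIn3D)
row = np.array([0.5, -1.0, 2.0])   # beta values for those voxels
vol = reconstruct_volume(row, indices, dims)
print(vol[0, 0, 0], vol[1, 1, 0], vol[1, 1, 1])  # 0.5 -1.0 2.0
```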
The remaining fields of meta
roiMultimaskAAL - vector containing ROI indices of each voxel in the AAL atlas (or 0 if not in the atlas)
roiMultimaskGordon - vector containing ROI indices of each voxel in the Gordon atlas (or 0 if not in the atlas)
roisAAL - key of ROI names in the AAL atlas
roisGordon - key of ROI names in the Gordon atlas
roiColumnsAAL - columns of voxels in each ROI in the AAL atlas (one ROI per cell)
roiColumnsGordon - columns of voxels in each ROI in the Gordon atlas (one ROI per cell)
are used to construct smaller versions of the examples matrix segmented according to one of two atlases:
-
AAL - [Tzourio-Mazoyer, N. et al., "Automated anatomical labeling of activations in SPM using a macroscopic anatomical parcellation of the MNI MRI single-subject brain". Neuroimage, 15(1), 273-289 (2002)]
-
Gordon - [Gordon, E.M. et al. "Generation and Evaluation of a Cortical Area Parcellation from Resting-State Correlations". Cereb. Cortex 26, 288-303 (2016)]
The following steps will create the reduced dataset for the Gordon 2016 atlas used in the paper:
indicesInGordon = find(meta.roiMultimaskGordon);
examplesGordon = examples(:,indicesInGordon);
voxelROI = meta.roiMultimaskGordon(indicesInGordon);
and you can use voxelROI to identify the ROI each voxel came from (via meta.roisGordon).
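The same atlas-based reduction translates directly to Python. The arrays below are small synthetic stand-ins for meta.roiMultimaskGordon and examples; the logic (keep only columns with a nonzero ROI index, and carry the ROI index along) mirrors the MATLAB snippet above.

```python
import numpy as np

# Synthetic stand-ins for the real fields:
# roiMultimaskGordon assigns each voxel column an ROI index (0 = outside atlas).
roi_multimask = np.array([0, 2, 1, 0, 2])            # one entry per voxel column
examples = np.arange(10, dtype=float).reshape(2, 5)  # 2 stimuli x 5 voxels

# Keep only voxels that fall inside the atlas, as in the MATLAB snippet.
in_atlas = np.flatnonzero(roi_multimask)
examples_gordon = examples[:, in_atlas]
voxel_roi = roi_multimask[in_atlas]

print(examples_gordon.shape)  # (2, 3)
print(voxel_roi)              # [2 1 2]
```

voxel_roi can then be used to look up each surviving voxel's ROI name, just as voxelROI indexes into meta.roisGordon above.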
Experiments 2 and 3
The files for experiments 2 and 3 have a similar format to those of experiment 1. If you load the file for experiment 2
load examples_384sentences.mat;
whos
you will see the following variables
examples 384x201011
keyPassageCategory 1x24
keyPassages 96x1
keySentences 384x1
labelsPassageCategory 96x1
labelsPassageForEachSentence 384x1
labelsSentences 384x1
meta 1x1
The matrix examples is similar to that described for experiment 1; the main difference is having 384 rows, corresponding to 384 sentences. The struct meta can be used to reduce the dataset and obtain voxel
locations in different atlases, as described above.
There are three separate sets of numeric labels: labelsSentences (which sentence, key in keySentences), labelsPassageForEachSentence (which passage each sentence came from, passage topic key in
keyPassages), and labelsPassageCategory (the superordinate topic in experiment 2, the topic in experiment 3, since there are multiple passages with the same topic; key in keyPassageCategory).
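As one example of using these labels, the sentence-level rows can be grouped by passage. This is a Python sketch with synthetic stand-ins for labelsPassageForEachSentence and examples (here 6 sentences from 3 passages rather than 384 from 96); the labels are 1-based, as in MATLAB.

```python
import numpy as np

# Synthetic labels: 6 sentences drawn from 3 passages (1-based, as in MATLAB).
labels_passage = np.array([1, 1, 2, 2, 3, 3])
examples = np.arange(12, dtype=float).reshape(6, 2)  # 6 sentences x 2 voxels

# Average the sentence images within each passage.
passage_ids = np.unique(labels_passage)
passage_means = np.stack(
    [examples[labels_passage == p].mean(axis=0) for p in passage_ids]
)
print(passage_means.shape)  # (3, 2)
print(passage_means[0])     # mean of the first passage's two rows → [1. 2.]
```

The resulting per-passage rows could then be labeled with topics via keyPassages, and with superordinate categories via labelsPassageCategory and keyPassageCategory.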
Datasets (processed NIfTI data)
Processed datasets in NIfTI format will be made available at a future date.