Exercise IX: The Haxby Experiment (2001)#

This exercise is based on an example from nilearn’s documentation.

The Haxby experiment (2001) [1] showed that the representation of objects from different categories is distributed and overlapping across brain regions that were broadly considered “specialized” for particular stimulus categories (e.g. the fusiform face area for faces and the parahippocampal place area for places).

(Figure: schematic of the decoding pipeline, from stimuli such as cats and faces, through masking, to classification.)

For the purposes of this exercise, we will predict stimulus categories based on the signal acquired from the ventral temporal cortex alone. As we will see, while there is some variation across categories, classification performance is relatively high for all of them.

Let’s start things off by using nilearn’s datasets module to download a single subject’s dataset:

from nilearn import datasets

haxby_data = datasets.fetch_haxby()
[_add_readme_to_default_data_locations] Added README.md to /home/runner/nilearn_data
[get_dataset_dir] Dataset created in /home/runner/nilearn_data/haxby2001
[fetch_single_file] Downloading data from https://www.nitrc.org/frs/download.php/7868/mask.nii.gz ...
[fetch_single_file]  ...done. (0 seconds, 0 min)

[fetch_single_file] Downloading data from http://data.pymvpa.org/datasets/haxby2001/MD5SUMS ...
[fetch_single_file]  ...done. (0 seconds, 0 min)

[fetch_single_file] Downloading data from http://data.pymvpa.org/datasets/haxby2001/subj2-2010.01.14.tar.gz ...
[fetch_single_file]  ...done. (2 seconds, 0 min)
[uncompress_file] Extracting data from /home/runner/nilearn_data/haxby2001/9cabe068089e791ef0c5fe930fc20e30/subj2-2010.01.14.tar.gz...
[uncompress_file] .. done.
haxby_data
{'anat': ['/home/runner/nilearn_data/haxby2001/subj2/anat.nii.gz'],
 'func': ['/home/runner/nilearn_data/haxby2001/subj2/bold.nii.gz'],
 'session_target': ['/home/runner/nilearn_data/haxby2001/subj2/labels.txt'],
 'mask_vt': ['/home/runner/nilearn_data/haxby2001/subj2/mask4_vt.nii.gz'],
 'mask_face': ['/home/runner/nilearn_data/haxby2001/subj2/mask8b_face_vt.nii.gz'],
 'mask_house': ['/home/runner/nilearn_data/haxby2001/subj2/mask8b_house_vt.nii.gz'],
 'mask_face_little': ['/home/runner/nilearn_data/haxby2001/subj2/mask8_face_vt.nii.gz'],
 'mask_house_little': ['/home/runner/nilearn_data/haxby2001/subj2/mask8_house_vt.nii.gz'],
 'mask': '/home/runner/nilearn_data/haxby2001/mask.nii.gz',
 'description': '.. _haxby_dataset:\n\nHaxby dataset\n=============\n\nAccess\n------\nSee :func:`nilearn.datasets.fetch_haxby`.\n\nNotes\n-----\nResults from a classical :term:`fMRI` study that investigated the differences between\nthe neural correlates of face versus object processing in the ventral visual\nstream. Face and object stimuli showed widely distributed and overlapping\nresponse patterns.\n\nSee :footcite:t:`Haxby2001`.\n\nContent\n-------\nThe "simple" dataset includes:\n    :\'func\': Nifti images with bold data\n    :\'session_target\': Text file containing run data\n    :\'mask\': Nifti images with employed mask\n    :\'session\': Text file with condition labels\n\nThe full dataset additionally includes\n    :\'anat\': Nifti images with anatomical image\n    :\'func\': Nifti images with bold data\n    :\'mask_vt\': Nifti images with mask for ventral visual/temporal cortex\n    :\'mask_face\': Nifti images with face-reponsive brain regions\n    :\'mask_house\': Nifti images with house-reponsive brain regions\n    :\'mask_face_little\': Spatially more constrained version of the above\n    :\'mask_house_little\': Spatially more constrained version of the above\n\nReferences\n----------\n\n.. footbibliography::\n\nFor more information see:\nPyMVPA provides a tutorial using this dataset :\nhttp://www.pymvpa.org/tutorial.html\n\nMore information about its structure :\nhttp://dev.pymvpa.org/datadb/haxby2001.html\n\n\nLicense\n-------\nunknown\n'}

Review the dataset’s description:

print(haxby_data.description)

Visualization#

fMRI data is generally represented as a 4D array: a series of 3D EPI volumes sampled over time.

To read the fMRI volume from a .nii.gz file we’ll use nibabel’s load() function:

from nilearn import plotting
import nibabel as nib

fmri_path = haxby_data.func[0]
fmri_data = nib.load(fmri_path)
fmri_data.shape
(40, 64, 64, 1452)
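
A single 3D volume can also be extracted from the 4D series, for example with nilearn’s index_img (a minimal sketch using the fmri_data image loaded above):

from nilearn.image import index_img

# Extract the first EPI volume from the 4D time series
first_volume = index_img(fmri_data, 0)
first_volume.shape  # (40, 64, 64)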

To plot a 3D image representing the mean fMRI signal over time for the single subject in the dataset:

from nilearn.image import mean_img

fmri_mean_epi = mean_img(fmri_data)
plotting.view_img(fmri_mean_epi, threshold=None)
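
view_img produces an interactive viewer; if a static figure is preferred, the same mean image can be rendered with nilearn’s plot_epi (an alternative sketch):

# Static alternative to the interactive viewer above
plotting.plot_epi(fmri_mean_epi, title="Mean EPI image")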

Masking#

In order to analyze the data, we need to produce a matrix containing the measured signal in the ventral temporal (VT) cortex over time.

First, let’s have a look at the provided mask:

import matplotlib.pyplot as plt

mask = haxby_data.mask_vt[0]
anatomical = haxby_data.anat[0]

figure = plt.figure(figsize=(14, 5))
_ = plotting.plot_roi(mask,
                      bg_img=anatomical,
                      cmap="spring",
                      black_bg=False,
                      figure=figure)

Now, we will use this ROI in order to mask our 4D fMRI data:

(Figure: masking illustration, reducing the 4D fMRI series to a 2D time-by-voxels matrix.)

from nilearn.maskers import NiftiMasker

# standardize=True z-scores each voxel's time series
masker = NiftiMasker(mask_img=mask, standardize=True)
masked_data = masker.fit_transform(fmri_data)

The result contains time points as rows and voxels as columns:

masked_data.shape
(1452, 464)
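
The masking step is invertible: the masker can project the 2D matrix back into a 4D image in brain space, which will come in handy when visualizing results (a quick sketch):

# Project the masked 2D matrix back into a 4D NIfTI image;
# voxels outside the VT mask are filled with zeros
unmasked = masker.inverse_transform(masked_data)
unmasked.shape  # (40, 64, 64, 1452)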

For example, to plot the measured signal in the first five voxels over time:

START_VOXEL, END_VOXEL = 0, 5
SIGNAL_PLOT_CONFIGURATION = {
    "title": "Voxel Time Series",
    "xlabel": "Scan Number",
    "ylabel": "Normalized Signal",
    "xlim": (0, len(masked_data))
}

fig, ax = plt.subplots(figsize=(15, 6))
ax.plot(masked_data[:, START_VOXEL:END_VOXEL], linewidth=0.75)
_ = ax.set(**SIGNAL_PLOT_CONFIGURATION)
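
Because the masker was created with standardize=True, each voxel’s time series should be (approximately) z-scored. A quick sanity check, assuming the default z-scoring behavior:

# Each voxel's time series should have ~zero mean and ~unit variance
print(masked_data.mean(axis=0)[:5].round(4))
print(masked_data.std(axis=0)[:5].round(4))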

Prediction#

Let’s try and predict which stimulus category the subject was viewing based on the masked fMRI data.

Reading the labels#

The session_target file contains a labels column, describing the stimulus category presented at each acquisition block, as well as a chunks column indicating which of the 12 sessions each sample belongs to.

import pandas as pd

LABEL_COLUMNS = {'labels': 'Label', 'chunks': 'Session'}

labels_path = haxby_data.session_target[0]
labels = pd.read_csv(labels_path, delimiter=' ')
labels.rename(columns=LABEL_COLUMNS, inplace=True)
labels
      Label  Session
0      rest        0
1      rest        0
2      rest        0
3      rest        0
4      rest        0
...     ...      ...
1447   rest       11
1448   rest       11
1449   rest       11
1450   rest       11
1451   rest       11

1452 rows × 2 columns

import numpy as np

LABEL_COLORS = {
    "rest": "white",
    "scissors": "grey",
    "face": "red",
    "cat": "orange",
    "shoe": "black",
    "house": "cyan",
    "scrambledpix": "blue",
    "bottle": "green",
    "chair": "brown"
}


def legend_without_duplicate_labels(ax):
    handles, labels = ax.get_legend_handles_labels()
    unique = [(h, l) for i, (h, l) in enumerate(zip(handles, labels))
              if l not in labels[:i]]
    ax.legend(*zip(*unique), bbox_to_anchor=(1.15, 1))


category_fig, category_ax = plt.subplots(figsize=(14, 3))
for label in labels.Label.unique():
    y1 = np.zeros(len(labels))
    y2 = np.ones(len(labels)) * (labels.Label == label)
    category_ax.fill_between(labels.index,
                             y1,
                             y2,
                             color=LABEL_COLORS[label],
                             label=label)
for session_id in labels.Session.unique():
    color = "black" if session_id % 2 == 0 else "white"
    y1 = np.ones(len(labels))
    y2 = np.ones(len(labels))
    y2[labels.Session == session_id] = 2
    category_ax.fill_between(labels.index, y1, y2, color=color)
legend_without_duplicate_labels(category_ax)
_ = category_ax.set(title="Stimulus Categories and Sessions Over Time",
                    yticks=[],
                    xlabel="Time")

Let’s create a few references to the data for convenience and readability.

is_task = labels["Label"] != 'rest'

task_data = masked_data[is_task]
task_labels = labels["Label"][is_task]
task_sessions = labels["Session"][is_task]

categories = task_labels.unique()
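
As a quick sanity check, after dropping the rest periods each of the eight task categories should appear roughly the same number of times, since the design is balanced across runs:

# Count the occurrences of each remaining stimulus category
task_labels.value_counts()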

Logistic Regression#

To grid search across \(\ell_1\) and \(\ell_2\) penalties with a range of C values, scoring by ROC AUC:

from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, LeaveOneGroupOut

SOLVER = "liblinear"  # See note 1
model = LogisticRegression(random_state=0, solver=SOLVER)
C = [10**i for i in range(-2, 3)]
PARAM_GRID = {"C": C, "penalty": ["l1", "l2"]}
SCORING = "roc_auc"

searcher = GridSearchCV(model,
                        param_grid=PARAM_GRID,
                        scoring=SCORING,
                        cv=LeaveOneGroupOut())  # See note 2

Note

sklearn offers a variety of solvers for logistic regression; in our case, we need to make sure we use one that supports both \(\ell_1\) and \(\ell_2\) regularization. For more information, see the LogisticRegression model’s user guide.

Note

The fMRI data is acquired session by session, and the noise is autocorrelated within a session. Hence, it is better to predict across sessions when cross-validating.
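
With the session labels passed as groups, LeaveOneGroupOut holds out one full session per fold. A minimal check, reusing the task_data and task_sessions variables defined above:

from sklearn.model_selection import LeaveOneGroupOut

# One fold per session: train on 11 sessions, test on the held-out one
logo = LeaveOneGroupOut()
logo.get_n_splits(task_data, groups=task_sessions)  # 12 folds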

Our goal now is to evaluate how well we can classify the target stimulus category based on the signal from the ventral temporal cortex alone:

# Create a DataFrame to store the results
scores = pd.DataFrame(index=categories, columns=["Penalty", "C", "AUC"])

# Iterate over categories and record the best score per category
for category in categories:
    target = task_labels == category
    searcher.fit(
        X=task_data,
        y=target,
        groups=task_sessions,
    )
    scores.loc[category, "Penalty"] = searcher.best_estimator_.penalty
    scores.loc[category, "C"] = searcher.best_estimator_.C
    scores.loc[category, "AUC"] = searcher.best_score_
scores
             Penalty    C       AUC
scissors          l2   10  0.912551
face              l1    1  0.986919
cat               l2  100  0.961052
shoe              l2    1  0.925779
house             l1  0.1  0.998089
scrambledpix      l1   10  0.989418
bottle            l1    1  0.898883
chair             l1   10   0.93107

Not bad! Clearly, the signal from the ventral temporal cortex carries significant information for decoding the representation of the various stimulus categories (and not just faces).
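
As a possible extension (not part of the original exercise), the weights of a fitted classifier can be projected back into brain space with the masker and visualized. A hedged sketch, refitting for the “face” category; the face_target and coef_img names are just illustrative:

# Refit the grid search for a single category of interest (illustrative)
face_target = task_labels == "face"
searcher.fit(X=task_data, y=face_target, groups=task_sessions)

# Project the best estimator's weight vector back into brain space
coef_img = masker.inverse_transform(searcher.best_estimator_.coef_[0])
plotting.plot_stat_map(coef_img,
                       bg_img=anatomical,
                       title="Logistic regression weights: face")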