diff --git a/01 - Intro-ML-for-Imaging.ipynb b/01 - Intro-ML-for-Imaging.ipynb
new file mode 100644
index 0000000000000000000000000000000000000000..836ac06773953e51adebd7174168ae9a8173442d
--- /dev/null
+++ b/01 - Intro-ML-for-Imaging.ipynb
@@ -0,0 +1,732 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# Tutorial 1: Introduction to Machine Learning for Imaging"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Running on Colab"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "! wget https://www.doc.ic.ac.uk/~bglocker/teaching/notebooks/supervised-data.zip\n",
+ "! unzip supervised-data.zip\n",
+ "\n",
+ "# data directory\n",
+ "data_dir = 'data/mnist/'"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Running on DoC lab machines"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# data directory\n",
+ "data_dir = '/vol/lab/course/416/data/mnist/'"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Image classification with Python"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "In this tutorial, we will learn the basics of image I/O, simple image processing, and visualisation in Python. \n",
+ "If you want to refresh your Python basics, please check this [tutorial](http://cs231n.github.io/python-numpy-tutorial/) from the computer vision course at Stanford.\n",
+ "\n",
+ "By the end of the tutorial, you should be able to:\n",
+ "1. Use Python and NumPy, and run Jupyter notebooks\n",
+ "2. Build a simple binary classifier \n",
+ "3. Implement a logistic regression classifier using NumPy"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "---\n",
+ "### Import stuff and set up some helper functions"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# import common libraries\n",
+ "import numpy as np\n",
+ "\n",
+ "# adjust settings to plot nice figures inline\n",
+ "%matplotlib inline\n",
+ "import matplotlib\n",
+ "import matplotlib.pyplot as plt\n",
+ "plt.rcParams['axes.labelsize'] = 14\n",
+ "plt.rcParams['xtick.labelsize'] = 12\n",
+ "plt.rcParams['ytick.labelsize'] = 12"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import numpy as np\n",
+ "import matplotlib\n",
+ "import matplotlib.pyplot as plt\n",
+ "\n",
+ "#########################################################\n",
+ "# functions to plot digits\n",
+ "#########################################################\n",
+ "\n",
+ "def plot_digit(data):\n",
+ " image = data.reshape(28, 28)\n",
+ " plt.imshow(image, cmap = matplotlib.cm.gray,\n",
+ " interpolation=\"nearest\")\n",
+ " plt.colorbar()\n",
+ " # plt.axis(\"off\")\n",
+ "\n",
+ "\n",
+ "def plot_digits(data, n_samples_row=10):\n",
+ " images = [image.reshape(28,28) for image in data]\n",
+ " n_rows = (len(images) - 1) // n_samples_row + 1\n",
+ " # append empty images if the last row is not complete\n",
+ " empty_images = n_rows * n_samples_row - len(data)\n",
+ " images.append(np.zeros((28, 28 * empty_images)))\n",
+ " # draw row by row\n",
+ " images_row = []\n",
+ " for current_row in range(n_rows):\n",
+ " tmp_row_images = images[current_row * n_samples_row : (current_row + 1) * n_samples_row]\n",
+ " images_row.append(np.concatenate(tmp_row_images, axis=1))\n",
+ " # draw all in one image\n",
+ " image = np.concatenate(images_row, axis=0)\n",
+ " plt.figure(figsize=(n_samples_row,n_rows))\n",
+ " plt.imshow(image, cmap = matplotlib.cm.gray)\n",
+ " plt.colorbar()\n",
+ " # plt.axis(\"off\")\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "---\n",
+ "\n",
+ "## MNIST digit recognition\n",
+ "\n",
+ "In a real ML task, data would be available in a database and organised in tables, documents or files. In this tutorial, we will be using the [MNIST dataset](http://yann.lecun.com/exdb/mnist/), small images of digits handwritten by high school students and employees of the US Census Bureau. It consists of a training set of 60,000 examples and a test set of 10,000 examples. Each image is size-normalised and centred in a fixed-size image of 28x28 pixels, and labelled with the digit it represents. It is kind of the *hello world* of machine learning for imaging. You can find more benchmark datasets [here](https://pytorch.org/docs/stable/torchvision/datasets.html).\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import torchvision.datasets as dset\n",
+ "\n",
+ "# train data\n",
+ "train_set = dset.MNIST(root=data_dir, train=True, download=True)\n",
+ "train_data = np.array(train_set.train_data)\n",
+ "train_labels = np.array(train_set.train_labels)\n",
+ "\n",
+ "# test data\n",
+ "test_set = dset.MNIST(root=data_dir, train=False, download=True)\n",
+ "test_data = np.array(test_set.test_data)\n",
+ "test_labels = np.array(test_set.test_labels)\n",
+ "\n",
+ "# print train and test data details\n",
+ "print('Train data:')\n",
+ "print('shape (images, x,y) = {}'.format(train_data.shape))\n",
+ "print('labels = {}'.format(np.unique(train_labels)))\n",
+ "\n",
+ "print('Test data:')\n",
+ "print('shape (images, x,y) = {}'.format(test_data.shape))\n",
+ "print('labels = {}'.format(np.unique(test_labels)))\n",
+ "\n",
+ "\n",
+ "# plot sample digits\n",
+ "plot_digits(train_set.train_data[:100])\n",
+ "plt.show()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "---\n",
+ "Here, we will sort our data and fix the random seed to ensure getting the same results every time you run the experiments. Then we plot some sample digits after sorting the data."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# we will sort our data and fix the random generator seed to get similar results from different runs\n",
+ "np.random.seed(42)\n",
+ "\n",
+ "# sort dataset\n",
+ "def sort_data(data, labels):\n",
+ " sorted_idxs = np.array(sorted([(target, i) for i, target in enumerate(labels)]))[:, 1]\n",
+ " return data[sorted_idxs], labels[sorted_idxs]\n",
+ "\n",
+ "############################################################################\n",
+ "# Q: use the previous function to sort both training and testing data\n",
+ "############################################################################\n",
+ "\n",
+ "#INSERT CODE HERE\n",
+ "\n",
+ "############################################################################\n",
+ "\n",
+ "# plot sampled images from sorted data\n",
+ "# this picks 20 samples of digits [0,1], 30 samples of [2,3,4], and 50 samples of [5,6,7,8,9] (roughly 10 samples per digit)\n",
+ "example_images = np.r_[train_data[:12000:600], train_data[13000:30600:600], train_data[30600:60000:590]]\n",
+ "plot_digits(example_images)\n",
+ "plt.show()"
+ ]
+ },
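The sorting exercise above is left blank for you to fill in; as a sketch of how `sort_data` is meant to be used, here it is applied to tiny made-up arrays instead of the real MNIST tensors (the function body is copied from the cell above; the data is illustrative only):

```python
import numpy as np

# sort samples by label while keeping data and labels aligned
def sort_data(data, labels):
    sorted_idxs = np.array(sorted([(target, i) for i, target in enumerate(labels)]))[:, 1]
    return data[sorted_idxs], labels[sorted_idxs]

data = np.arange(4 * 2 * 2).reshape(4, 2, 2)   # four tiny 2x2 "images"
labels = np.array([2, 0, 1, 0])

sorted_data, sorted_labels = sort_data(data, labels)
print(sorted_labels)  # labels come out in ascending order
```

In the exercise, the same call would be made once for the training arrays and once for the test arrays.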
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "---\n",
+ "### Simple Binary Classifier\n",
+ "\n",
+ "Now that our data is cleaned and sorted, we will train a simple binary classifier to distinguish between two selected digits. \n",
+ "\n",
+ "Data is usually divided into three sets: training, validation, and testing. The training set is used to fit the model's parameters, the validation set is used to tune the model's hyperparameters, and the performance of the trained model is finally evaluated on the test set. For this tutorial we will split the data into train and test only, for simplicity. \n",
+ "\n",
+ "**Task**\n",
+ "\n",
+ "1. Extract ones and eights from both training and testing data\n",
+ "2. Shuffle training data\n",
+ "3. Plot number of images versus number of 'white' pixels per image\n",
+ "4. Can you predict the label based only on the number of 'white' pixels? What's the training and testing error for such an approach?"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "############################################################################\n",
+ "# Extract sample digits of ones and eights\n",
+ "############################################################################\n",
+ "\n",
+ "def sample_data_digits(data, labels, labels_to_select):\n",
+ " # convert input 3d arrays to 2d arrays\n",
+ " nsamples, nx, ny = data.shape\n",
+ " data_vec = np.reshape(data,(nsamples,nx*ny))\n",
+ " \n",
+ " selected_indexes = np.isin(labels, labels_to_select)\n",
+ " selected_data = data_vec[selected_indexes]\n",
+ " selected_labels = labels[selected_indexes]\n",
+ " \n",
+ " # Convert images from gray to binary by thresholding intensity values\n",
+ " selected_data = 1.0 * (selected_data >= 128)\n",
+ "\n",
+ " # convert labels to binary: digit_0=False, digit_1=True\n",
+ " selected_labels = selected_labels==labels_to_select[1]\n",
+ " \n",
+ " # shuffle data\n",
+ " shuffle_index = np.random.permutation(len(selected_labels))\n",
+ " selected_data, selected_labels = selected_data[shuffle_index], selected_labels[shuffle_index]\n",
+ "\n",
+ " return selected_data, selected_labels\n",
+ "\n",
+ "\n",
+ "############################################################################\n",
+ "# Q: extract ones and eights digits from both training and testing data \n",
+ "############################################################################\n",
+ "\n",
+ "#INSERT CODE HERE\n",
+ "\n",
+ "############################################################################\n",
+ "\n",
+ "# plot sampled digits\n",
+ "plot_digits(selected_train_data[0:50])\n",
+ "plt.show()\n",
+ "plot_digits(selected_train_data[8000:8050])\n",
+ "plt.show()"
+ ]
+ },
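The digit-extraction exercise above boils down to `np.isin` masking plus intensity thresholding, as in `sample_data_digits`. A minimal self-contained sketch of those two operations on made-up arrays (the values and shapes here are illustrative, not MNIST's):

```python
import numpy as np

labels = np.array([0, 1, 8, 3, 8, 1])
data = np.arange(6 * 4).reshape(6, 4).astype(float)  # six fake "images" of 4 pixels

labels_to_select = [1, 8]
mask = np.isin(labels, labels_to_select)              # keep only 1s and 8s
selected_data = data[mask]
selected_labels = labels[mask] == labels_to_select[1] # True = "8", False = "1"

# threshold intensities to binary, as in sample_data_digits
binary = 1.0 * (selected_data >= 12)
```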
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "############################################################################\n",
+ "# Q: plot the number of images versus the number of 'white' foreground pixels \n",
+ "# for both the 1s and 8s classes.\n",
+ "############################################################################\n",
+ "\n",
+ "#INSERT CODE HERE"
+ ]
+ },
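One way to approach the plotting exercise is a pair of overlaid histograms of white-pixel counts, one per class. A sketch on synthetic binary images (the sparsity levels are made up so the two classes separate; `matplotlib.use('Agg')` is only there so the sketch runs headless):

```python
import numpy as np
import matplotlib
matplotlib.use('Agg')  # non-interactive backend for this standalone sketch
import matplotlib.pyplot as plt

rng = np.random.RandomState(0)
# synthetic binary images: "1s" have few white pixels, "8s" have many
ones = (rng.rand(50, 784) < 0.08).astype(float)
eights = (rng.rand(50, 784) < 0.25).astype(float)

white_ones = ones.sum(axis=1)      # white-pixel count per image
white_eights = eights.sum(axis=1)

plt.hist(white_ones, bins=20, alpha=0.5, label='1s')
plt.hist(white_eights, bins=20, alpha=0.5, label='8s')
plt.xlabel("number of 'white' pixels per image")
plt.ylabel('number of images')
plt.legend()
```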
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "############################################################################\n",
+ "# Q: select a threshold value to separate the two classes\n",
+ "############################################################################\n",
+ "\n",
+ "#INSERT CODE HERE"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "############################################################################\n",
+ "# Q: classify digits using a threshold \n",
+ "############################################################################\n",
+ "\n",
+ "#INSERT CODE HERE"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "############################################################################\n",
+ "# Q: calculate both training and testing accuracy\n",
+ "# You should get accuracies around 89-90%\n",
+ "############################################################################\n",
+ "\n",
+ "train_acc = #INSERT CODE HERE\n",
+ "print('Train accuracy = {:.2f}%'.format(train_acc))\n",
+ "\n",
+ "test_acc = #INSERT CODE HERE\n",
+ "print('Test accuracy = {:.2f}%'.format(test_acc))"
+ ]
+ },
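The threshold classifier and the accuracy computation asked for above can be sketched in a few lines of NumPy. The counts, labels and threshold below are invented for illustration; in the exercise they would come from the real 1s/8s data and your histogram:

```python
import numpy as np

# made-up white-pixel counts: "1s" have few, "8s" have many
white_counts = np.array([50, 60, 55, 140, 150, 145])
labels = np.array([False, False, False, True, True, True])  # True = "8"

threshold = 100                     # e.g. read off the two histograms
preds = white_counts >= threshold   # predict "8" when there are many white pixels

accuracy = np.mean(preds == labels) * 100
print('Accuracy = {:.2f}%'.format(accuracy))
```

The same `np.mean(preds == labels)` pattern gives both the train and the test accuracy.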
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "---\n",
+ "**Task**\n",
+ "\n",
+ "Repeat the previous steps to classify digits 0 and 8 instead of 1 and 8. Can the threshold-based binary classifier differentiate between the two categories based on the number of 'white' pixels?"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "############################################################################\n",
+ "# Q: extract zeros and eights digits from both training and testing data\n",
+ "############################################################################\n",
+ "\n",
+ "#INSERT CODE HERE\n",
+ "\n",
+ "############################################################\n",
+ "# Q: plot the number of images versus the number of 'white' pixels\n",
+ "############################################################\n",
+ "\n",
+ "#INSERT CODE HERE"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "############################################################################\n",
+ "# Q: select a threshold value to separate the two classes\n",
+ "############################################################################\n",
+ "\n",
+ "#INSERT CODE HERE\n",
+ "\n",
+ "############################################################################\n",
+ "# Q: classify digits using a threshold \n",
+ "############################################################################\n",
+ "\n",
+ "#INSERT CODE HERE"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "train_acc = #INSERT CODE HERE\n",
+ "print('Train accuracy = {:.2f}%'.format(train_acc))\n",
+ "\n",
+ "test_acc = #INSERT CODE HERE\n",
+ "print('Test accuracy = {:.2f}%'.format(test_acc))"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "---\n",
+ "## Logistic Regression using Numpy\n",
+ "\n",
+ "In the previous example, we used a simple threshold to classify each digit image based on a single feature (the number of 'white' pixels).\n",
+ "\n",
+ "Here, we will use a logistic regression model for the same task but using raw pixel information as input features. The logistic regression function is defined as: $h_{\\Theta}(\\mathbf{x}) = \\frac{1}{1 + \\exp(- \\Theta^{\\top} \\mathbf{x})}$.\n",
+ "\n",
+ "It's useful to group all training samples into one big matrix $\\mathbf{X}$ of size *(number_samples x number_features)*, and their labels into one vector $\\mathbf{y}$ as in the code below.\n",
+ "\n",
+ "Training our model is a loop over three main steps:\n",
+ "1. Evaluate the cost function $J(\\Theta) = -\\frac{1}{m} \\sum_{i=1}^{m} \\left[ y^{(i)} \\log h_{\\Theta}(\\mathbf{x}^{(i)}) + (1 - y^{(i)}) \\log \\left(1 - h_{\\Theta}(\\mathbf{x}^{(i)})\\right) \\right]$ (the binary cross-entropy over the $m$ training samples)\n",
+ "2. Compute the partial derivatives $\\frac{\\partial J}{\\partial \\Theta}$\n",
+ "3. Update the model parameters\n",
+ "\n",
+ "---\n",
+ "\n",
+ "**Task**\n",
+ "\n",
+ "1. Complete the logistic regression class below \n",
+ "2. Train a logistic regression model on the data from the previous example\n",
+ "3. Compute train and test accuracies, and compare with the previous results\n",
+ "4. Plot the trained parameters and comment on the figure"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "class LogisticRegression:\n",
+ " def __init__(self, lr=0.05, num_iter=1000, add_bias=True, verbose=True):\n",
+ " self.lr = lr\n",
+ " self.verbose = verbose\n",
+ " self.num_iter = num_iter\n",
+ " self.add_bias = add_bias\n",
+ " \n",
+ " def __add_bias(self, X):\n",
+ " bias = np.ones((X.shape[0], 1))\n",
+ " return np.concatenate((bias, X), axis=1)\n",
+ " \n",
+ "\n",
+ "\n",
+ " def __loss(self, h, y):\n",
+ " ''' computes loss values '''\n",
+ " y = np.array(y,dtype=float) \n",
+ " ############################################################################\n",
+ " # Q: compute the loss \n",
+ " ############################################################################\n",
+ " return #INSERT CODE HERE\n",
+ "\n",
+ " \n",
+ " def fit(self, X, y):\n",
+ " ''' \n",
+ " Optimise our model using gradient descent\n",
+ " Arguments:\n",
+ " X input features\n",
+ " y labels from training data\n",
+ " \n",
+ " '''\n",
+ " if self.add_bias:\n",
+ " X = self.__add_bias(X)\n",
+ " \n",
+ " ############################################################################\n",
+ " # Q: initialise weights randomly with normal distribution N(0,0.01)\n",
+ " ############################################################################\n",
+ " self.theta = #INSERT CODE HERE\n",
+ " \n",
+ " for i in range(self.num_iter):\n",
+ " ############################################################################\n",
+ " # Q: forward propagation\n",
+ " ############################################################################\n",
+ " z = #INSERT CODE HERE\n",
+ " h = #INSERT CODE HERE\n",
+ " ############################################################################\n",
+ " # Q: backward propagation\n",
+ " ############################################################################\n",
+ " gradient = #INSERT CODE HERE\n",
+ " # update parameters\n",
+ " self.theta -= #INSERT CODE HERE\n",
+ " ############################################################################\n",
+ " # Q: print loss\n",
+ " ############################################################################\n",
+ " if(self.verbose == True and i % 50 == 0):\n",
+ " h = #INSERT CODE HERE\n",
+ " print('loss: {} \\t'.format(self.__loss(h, y)))\n",
+ " \n",
+ " def predict_probs(self,X):\n",
+ " ''' returns output probabilities\n",
+ " '''\n",
+ " ############################################################################\n",
+ " # Q: forward propagation\n",
+ " ############################################################################\n",
+ " if self.add_bias:\n",
+ " X = self.__add_bias(X)\n",
+ " z = #INSERT CODE HERE\n",
+ " return #INSERT CODE HERE\n",
+ "\n",
+ " def predict(self, X, threshold=0.5):\n",
+ " ''' returns output classes\n",
+ " '''\n",
+ " return self.predict_probs(X) >= threshold\n"
+ ]
+ },
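The blanks in the class above follow the standard logistic regression recipe: sigmoid forward pass, cross-entropy gradient, gradient descent update. A self-contained sketch of that recipe on synthetic data (the data, learning rate and iteration count are made up for illustration, not the tutorial's settings):

```python
import numpy as np

rng = np.random.RandomState(42)

# synthetic binary problem: the label depends on the first two real features
X = rng.rand(50, 3)
X = np.concatenate((np.ones((50, 1)), X), axis=1)   # prepend a bias column
y = (X[:, 1] + X[:, 2] > 1.0).astype(float)

theta = rng.normal(0, 0.01, X.shape[1])  # initialise weights from N(0, 0.01)
lr = 0.3

for _ in range(1000):
    z = X.dot(theta)                      # forward pass: linear part
    h = 1.0 / (1.0 + np.exp(-z))          # forward pass: sigmoid
    gradient = X.T.dot(h - y) / len(y)    # backward pass: cross-entropy gradient
    theta -= lr * gradient                # gradient descent update

# binary cross-entropy loss and training accuracy after fitting
loss = -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))
train_acc = np.mean((h >= 0.5) == y)
```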
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "#########################################################################\n",
+ "# Q: train our model\n",
+ "#########################################################################\n",
+ "\n",
+ "#INSERT CODE HERE"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "#########################################################################\n",
+ "# Q: Evaluate the trained model - compute train and test accuracies\n",
+ "# You should get accuracies around 98-99%\n",
+ "#########################################################################\n",
+ "train_preds = #INSERT CODE HERE\n",
+ "logistic_train_acc = #INSERT CODE HERE\n",
+ "print('Train accuracy = {:.2f}%'.format(logistic_train_acc))\n",
+ "\n",
+ "test_preds = #INSERT CODE HERE\n",
+ "logistic_test_acc = #INSERT CODE HERE\n",
+ "print('Test accuracy = {:.2f}%'.format(logistic_test_acc))"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "#########################################################################\n",
+ "# Plot trained model params (weights) as an image of size (28x28)\n",
+ "#########################################################################\n",
+ "plt.imshow(model.theta[1:].reshape(28,28)) # skip the bias term, which is the first element of theta\n",
+ "plt.colorbar()\n",
+ "plt.show()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "---\n",
+ "## Using explicit features for classification\n",
+ "\n",
+ "We have now seen how we can build a digit classifier using the raw pixel information as features. In some ML applications, it is possible (or even desirable) to hand-engineer the feature extraction stage. Here, we explore how far we can get with morphometric features extracted from the MNIST digits, namely area, length, thickness, slant, width, and height."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import os\n",
+ "import pandas as pd\n",
+ "\n",
+ "# Reload MNIST\n",
+ "train_data = np.array(train_set.train_data)\n",
+ "train_labels = np.array(train_set.train_labels)\n",
+ "\n",
+ "# test data\n",
+ "test_data = np.array(test_set.test_data)\n",
+ "test_labels = np.array(test_set.test_labels)\n",
+ "\n",
+ "# Read the meta data using pandas\n",
+ "train_features = pd.read_csv(data_dir + 'train-morpho.csv')\n",
+ "test_features = pd.read_csv(data_dir + 't10k-morpho.csv')\n",
+ "train_features.head() # show the first five data entries of the training set"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "plt.scatter(test_features['area'],test_features['length'], marker='.', c=test_labels)\n",
+ "plt.grid()\n",
+ "plt.xlabel('area')\n",
+ "plt.ylabel('length')\n",
+ "plt.show()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Reformat the data: stack the six morphometric features into (n_samples x n_features) matrices\n",
+ "feature_names = ['area', 'length', 'thickness', 'slant', 'width', 'height']\n",
+ "X_train = train_features[feature_names].values\n",
+ "X_test = test_features[feature_names].values"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "def sample_data(data, labels, labels_to_select):\n",
+ " selected_indexes = np.isin(labels, labels_to_select)\n",
+ " selected_data = data[selected_indexes]\n",
+ " selected_labels = labels[selected_indexes]\n",
+ "\n",
+ " # convert labels to binary: digit_0=False, digit_1=True\n",
+ " selected_labels = selected_labels==labels_to_select[1]\n",
+ " \n",
+ " # shuffle data\n",
+ " shuffle_index = np.random.permutation(len(selected_labels))\n",
+ " selected_data, selected_labels = selected_data[shuffle_index], selected_labels[shuffle_index]\n",
+ "\n",
+ " return selected_data, selected_labels"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Similar to above, let's pick data for a simple binary classification between two digits. Let's start with 0s and 8s."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "labels_to_select = [0,8]\n",
+ "selected_train_data, selected_train_labels = sample_data(X_train,train_labels,labels_to_select)\n",
+ "selected_test_data, selected_test_labels = sample_data(X_test,test_labels,labels_to_select)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "---\n",
+ "**Task**\n",
+ "\n",
+ "This time we use a logistic regression model from [scikit-learn](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html).\n",
+ "\n",
+ "Train the logistic regression on the data and calculate the classification accuracy for both training and testing."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from sklearn import linear_model\n",
+ "\n",
+ "#########################################################################\n",
+ "# Q: Use scikit-learn's logistic regression model\n",
+ "#########################################################################\n",
+ "\n",
+ "#INSERT CODE HERE"
+ ]
+ },
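The scikit-learn exercise above reduces to constructing, fitting and evaluating a `linear_model.LogisticRegression`. A sketch on synthetic stand-ins for the six morphometric features (the data and variable names are invented; in the exercise you would fit on `selected_train_data` and `selected_train_labels`):

```python
import numpy as np
from sklearn import linear_model

rng = np.random.RandomState(0)

# synthetic stand-ins: 200 samples with 6 features, labels driven by the first feature
X_train = rng.rand(200, 6)
y_train = X_train[:, 0] > 0.5

model = linear_model.LogisticRegression(solver='lbfgs')
model.fit(X_train, y_train)

train_preds = model.predict(X_train)
train_acc = np.mean(train_preds == y_train) * 100
print('Train accuracy = {:.2f}%'.format(train_acc))
```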
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "#########################################################################\n",
+ "train_preds = model.predict(selected_train_data)\n",
+ "logistic_train_acc = #INSERT CODE HERE\n",
+ "print('Train accuracy = {:.2f}%'.format(logistic_train_acc))\n",
+ "\n",
+ "test_preds = model.predict(selected_test_data)\n",
+ "logistic_test_acc = #INSERT CODE HERE\n",
+ "print('Test accuracy = {:.2f}%'.format(logistic_test_acc))"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "---\n",
+ "***Bonus (optional)***\n",
+ "\n",
+ "Extend all of the above to the full 10-class classification problem.\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": []
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "Python 3",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.6.7"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
diff --git a/02 - Intro-Medical-Image-Computing.ipynb b/02 - Intro-Medical-Image-Computing.ipynb
new file mode 100644
index 0000000000000000000000000000000000000000..6983623808d7c2da4c8f6ef2ccb5d5f4aadca3a9
--- /dev/null
+++ b/02 - Intro-Medical-Image-Computing.ipynb
@@ -0,0 +1,1715 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# Tutorial 2: Introduction to Medical Image Computing"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Running on Colab"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "! pip install SimpleITK==1.2.2 \n",
+ "! wget https://www.doc.ic.ac.uk/~bglocker/teaching/notebooks/mic-data.zip\n",
+ "! unzip mic-data.zip\n",
+ "\n",
+ "# data directory\n",
+ "data_dir = \"data/mic/\""
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Running on DoC lab machines"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# data directory\n",
+ "data_dir = '/vol/lab/course/416/data/mic/'"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Data"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import os\n",
+ "print(os.listdir(data_dir))"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Loading a 3D medical image\n",
+ "\n",
+ "We will be using a library called SimpleITK to handle medical image files. It is a simplified interface around the Insight Segmentation and Registration Toolkit (ITK), one of the most popular C++ image processing libraries for medical imaging."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import SimpleITK as sitk"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "We will use it primarily for reading and saving medical volumes encoded in the NIfTI format, with the functions `sitk.ReadImage(filename)` and `sitk.WriteImage(image, filename)`, respectively.\n",
+ "\n",
+ "**Task:** Try loading the image `\"ct-brain.nii.gz\"` in our data directory `data_dir`:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "img = sitk.ReadImage(data_dir + \"ct-brain.nii.gz\")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Explore image information\n",
+ "\n",
+ "Medical image files typically store raw volumetric data as a three-dimensional array. In addition, formats such as NIfTI and DICOM include a header containing a wealth of meta-information, such as:\n",
+ "- size: number of voxels in each dimension\n",
+ "- resolution/spacing: physical size of a voxel (e.g. 1mm x 1mm x 1mm)\n",
+ "- data type: e.g. `int32`, `uint8`, `float64`, vectors (+ number of components)\n",
+ "- scanner's origin and direction of coordinate axes\n",
+ "- voxel coordinate transformation matrices\n",
+ "- ... and [much more](https://brainder.org/2012/09/23/the-nifti-file-format/).\n",
+ "\n",
+ "**Task:** Print the SimpleITK image object to see a summary of the loaded meta-information:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "scrolled": true
+ },
+ "outputs": [],
+ "source": [
+ "# Print image object"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "print(img)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "SimpleITK also allows us to access each field directly.\n",
+ "\n",
+ "**Task:** Let us have a look at the size and spacing of this image, with the methods `.GetSize()` and `.GetSpacing()`:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "scrolled": true
+ },
+ "outputs": [],
+ "source": [
+ "# Print image size and spacing"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "print(img.GetSize())\n",
+ "print(img.GetSpacing())"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Although SimpleITK offers great functionality for manipulating the raw image data, we will often need to convert it to a NumPy array so it plays well with other Python packages, such as Matplotlib, which we will use for visualisation. This is done with the function `sitk.GetArrayFromImage()` (and vice-versa with `sitk.GetImageFromArray()`).\n",
+ "\n",
+ "**Task:** Convert the SimpleITK image to a NumPy array"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "img_array = sitk.GetArrayFromImage(img) # Convert the SimpleITK image to a NumPy array\n",
+ "print(img_array.shape)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "If you are curious about what else the SimpleITK `Image` class offers, [this](http://simpleitk.github.io/SimpleITK-Notebooks/01_Image_Basics.html) is a good place to look. Additionally, you can run Python's `help` command on your image object to see all the available methods."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Visualisation\n",
+ "\n",
+ "The NumPy array you obtained from SimpleITK is three-dimensional, meaning we cannot visualise it directly. Fortunately, NumPy allows you to access entire 2D slices at a time, with `img_array[i,:,:]`, `img_array[:,j,:]` or `img_array[:,:,k]`.\n",
+ "\n",
+ "**Task:** Try printing a slice of your choice:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Print an image slice"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "print(img_array[100])"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "As expected, we get a large two-dimensional array filled with numbers we cannot directly interpret.\n",
+ "\n",
+ "To graphically display volume slices, we will use a library called Matplotlib. It is the most widely used Python plotting library and offers, besides more advanced functionality, a simple command-based interface similar to MATLAB's."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import matplotlib.pyplot as plt\n",
+ "%matplotlib inline"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "To display images, Matplotlib offers the function `plt.imshow()`. It has [many options](https://matplotlib.org/api/pyplot_api.html#matplotlib.pyplot.imshow), including which colour map to use to visualise the given data (`cmap`). We will typically want to visualise medical images in greyscale (`cmap='gray'`), but feel free to play with [any of the ones available](http://matplotlib.org/examples/color/colormaps_reference.html).\n",
+ "\n",
+ "By default, Matplotlib will place the origin of the y-axis at the top, increasing downwards. We can reverse it with the option `origin='lower'`.\n",
+ "\n",
+ "**Task:** Investigate how to visualise axial (xy), coronal (xz) and sagittal (yz) slices according to the radiological convention."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Display image slices"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "plt.imshow(img_array[100,:,:], cmap='gray');"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "If you tried coronal (xz) or sagittal (yz) slices, you will have noticed that the aspect ratio was wrong: the image seems 'squashed' in one dimension. This is because the resolution of this scan is *anisotropic*, i.e. the voxels do not have the same length in all directions.\n",
+ "\n",
+ "`plt.imshow` has an option that lets you rescale the axes: `extent=(<left>, <right>, <bottom>, <top>)`.\n",
+ "\n",
+ "**Task:** Using the `GetSize` and `GetSpacing` methods discussed earlier (note that SimpleITK's indexing convention is XYZ, whereas NumPy's is ZYX), try to set the display axes to the correct scale, in millimetres."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Obtain the physical dimensions of the image, in millimetres\n",
+ "size = img.GetSize()\n",
+ "spacing = img.GetSpacing()\n",
+ "\n",
+ "width = size[0] * spacing[0]\n",
+ "height = size[1] * spacing[1]\n",
+ "depth = size[2] * spacing[2]"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Display axial slice with the correct dimensions"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "plt.imshow(img_array[100,:,:], cmap='gray', extent=(0, width, height, 0));"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Display coronal slice with the correct dimensions"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "plt.imshow(img_array[:,180,:], origin='lower', cmap='gray', extent=(0, width, 0, depth));"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Display sagittal slice with the correct dimensions"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "plt.imshow(img_array[:,:,150], origin='lower', cmap='gray', extent=(0, height, 0, depth));"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Image statistics\n",
+ "\n",
+ "Since the image data is numerical, it often makes sense to look at some of its statistics. Many basic statistics are readily available in NumPy, e.g. `np.min`, `np.max`, `np.mean`, `np.std` (standard deviation), `np.percentile` etc.\n",
+ "\n",
+ "**Task:** Have a look at the minimum, maximum and mean values of your image array."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import numpy as np\n",
+ "\n",
+ "# Print minimum, maximum and mean of image array"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "print(np.min(img_array), np.max(img_array), np.mean(img_array))"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "To get a better idea of the distribution of intensities, it is helpful to analyse the image *histogram*. It is a simple bar plot expressing the frequencies of pixel intensity values found in the data array, in regularly-spaced intervals ('bins').\n",
+ "\n",
+ "**Task:** Use Matplotlib's `plt.hist` function to display the distribution of intensities in your image:\n",
+ "```\n",
+ "plt.hist(<img_array>.flatten(), bins=<nbins>, density=True)\n",
+ "```"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Plot the image histogram with different values for the number of bins, e.g., 32, 64, 128, 256"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "plt.hist(img_array.flatten(), bins=128, density=True);"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "You will notice two main peaks in the histogram, with the left one corresponding to values between -1000 and -500. This is the \"background\" area of the image, corresponding to non-tissue or air. This area often takes up a significant part of the image domain.\n",
+ "\n",
+ "**Task:** Try plotting the histogram again by excluding pixels from the background region, and once more plotting only values close to the second peak (in the range [-500,500]). Hint: use logical operators on the NumPy array."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "plt.hist(img_array.flatten()[img_array.flatten() > -500], bins=256, density=True);"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "plt.hist(img_array.flatten()[np.logical_and(img_array.flatten() > -500,img_array.flatten() < 500)], bins=256, density=True);"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Window/Level\n",
+ "\n",
+ "In general, medical images have pixel intensity ranges beyond the typical 256 grayscales (8-bit) that we know from conventional digital images. The loaded CT scan, for example, has a range of about [-1000,2000].\n",
+ "\n",
+ "By default, `plt.imshow` will display the entire intensity range, mapping the minimum and maximum values in the array to the extremes of the colour scale. However, we can manually specify the display range, setting `clim=(<low>, <high>)` or independently `vmin=<low>` and/or `vmax=<high>`.\n",
+ "\n",
+ "**Task:** Using the concept of window/level, think about how to calculate parameters `clim=(, )` given a particular set of values for window and level. Also think about formulae for calculating the window and level that capture the full range of image intensities."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Calculate parameters low and high from window and level\n",
+ "def wl_to_lh(window, level):\n",
+ " low = level - window/2\n",
+ " high = level + window/2\n",
+ " return low,high\n",
+ "\n",
+ "print(wl_to_lh(160,70))\n",
+ "print(wl_to_lh(2000,300))\n",
+ "print(wl_to_lh(350,50))"
+ ]
+ },
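+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "As a sanity check, this mapping is easily inverted: the window is the difference `high - low` and the level is their midpoint. The helper below (`lh_to_wl` is not part of the tutorial, just an illustration) should round-trip with `wl_to_lh`:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Recover window/level from display limits (inverse of wl_to_lh)\n",
+ "def lh_to_wl(low, high):\n",
+ "    window = high - low\n",
+ "    level = (low + high) / 2\n",
+ "    return window, level\n",
+ "\n",
+ "print(lh_to_wl(*wl_to_lh(160, 70)))"
+ ]
+ },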
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "**Task:** Explore displaying the loaded brain CT scan `\"ct-brain.nii.gz\"` with different window/level settings. For example, using a window = 120 and level = 40 should give a good setting for displaying brain tissue. Other useful settings for CT images are a window = 2000 and level = 300 for good bone contrast, or window = 350 and level = 50 for abdominal organs."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Display image slices with different window/level settings"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "low,high = wl_to_lh(120,40)\n",
+ "plt.imshow(img_array[100,:,:], cmap='gray', clim=(low, high));"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "**Task:** Try also some of the other images provided in the data folder, such as `\"ct-abdomen.nii.gz\"` for an abdominal CT scan showing organs such as the liver or kidneys, or `\"mri-brain.nii.gz\"` for an example of a brain MRI scan. Try to find a good window/level setting for the MRI scan that shows good contrast for gray and white matter tissue (Hint: z-slice 130 shows a good cross-section through the ventricles)."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import os\n",
+ "\n",
+ "# Display other images with different window/level settings\n",
+ "print(os.listdir(data_dir))"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "img_mri = sitk.ReadImage(data_dir + 'mri-brain.nii.gz')\n",
+ "\n",
+ "low,high = wl_to_lh(300,200)\n",
+ "plt.imshow(sitk.GetArrayFromImage(img_mri)[130,:,:], cmap='gray', clim=(low, high));"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Multiplanar Image Viewer\n",
+ "\n",
+ "**Task:** Now try writing a function to visualise an arbitrary medical volume, based on what you have done so far. It should take as input a SimpleITK image object (`img`), the slice indices (`x,y,z`) and window/level parameters (`window,level`), and display the specified axial (`z`), coronal (`y`) and sagittal (`x`) slices in grayscale and with correct axis dimensions and window/level contrast.\n",
+ "\n",
+ "Note: If (`x,y,z`) are not specified (`=None`), the function should display the centre slices. If (`window,level`) are not specified, the function should calculate the window/level setting to cover the full intensity range."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "def display_image(img, x=None, y=None, z=None, window=None, level=None):\n",
+ " # Convert SimpleITK image to NumPy array\n",
+ " img_array = sitk.GetArrayFromImage(img)\n",
+ " \n",
+ " # Get image dimensions in millimetres\n",
+ " size = img.GetSize()\n",
+ " spacing = img.GetSpacing()\n",
+ " width = size[0] * spacing[0]\n",
+ " height = size[1] * spacing[1]\n",
+ " depth = size[2] * spacing[2]\n",
+ " \n",
+ " if x is None:\n",
+ " x = np.floor(size[0]/2).astype(int)\n",
+ " if y is None:\n",
+ " y = np.floor(size[1]/2).astype(int)\n",
+ " if z is None:\n",
+ " z = np.floor(size[2]/2).astype(int)\n",
+ " \n",
+ " if window is None:\n",
+ " window = np.max(img_array) - np.min(img_array)\n",
+ " \n",
+ " if level is None:\n",
+ " level = window / 2 + np.min(img_array)\n",
+ " \n",
+ " low,high = wl_to_lh(window,level)\n",
+ "\n",
+ " # Display the orthogonal slices\n",
+ " fig, (ax1, ax2, ax3) = plt.subplots(1, 3, figsize=(10, 4))\n",
+ "\n",
+ " ax1.imshow(img_array[z,:,:], cmap='gray', clim=(low, high), extent=(0, width, height, 0))\n",
+ " ax2.imshow(img_array[:,y,:], origin='lower', cmap='gray', clim=(low, high), extent=(0, width, 0, depth))\n",
+ " ax3.imshow(img_array[:,:,x], origin='lower', cmap='gray', clim=(low, high), extent=(0, height, 0, depth))\n",
+ "\n",
+ " # Additionally display crosshairs\n",
+ " ax1.axhline(y * spacing[1], lw=1)\n",
+ " ax1.axvline(x * spacing[0], lw=1)\n",
+ " \n",
+ " ax2.axhline(z * spacing[2], lw=1)\n",
+ " ax2.axvline(x * spacing[0], lw=1)\n",
+ " \n",
+ " ax3.axhline(z * spacing[2], lw=1)\n",
+ " ax3.axvline(y * spacing[1], lw=1)\n",
+ "\n",
+ " plt.show()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "The code below should give you an interactive way of displaying 3D medical images based on your `display_image` function."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "scrolled": false
+ },
+ "outputs": [],
+ "source": [
+ "from ipywidgets import interact, fixed\n",
+ "from IPython.display import display\n",
+ "\n",
+ "def interactive_view(img):\n",
+ " size = img.GetSize()\n",
+ " img_array = sitk.GetArrayFromImage(img)\n",
+ " interact(display_image,img=fixed(img),\n",
+ " x=(0, size[0] - 1),\n",
+ " y=(0, size[1] - 1),\n",
+ " z=(0, size[2] - 1),\n",
+ " window=(0,np.max(img_array) - np.min(img_array)),\n",
+ " level=(np.min(img_array),np.max(img_array)));\n",
+ "\n",
+ "interactive_view(img)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Accessing voxel values\n",
+ "\n",
+ "In Python, we can access a single element of a sequence with an integer index (`[index]`), or we can retrieve a contiguous sub-sequence with so-called *slice notation*: `[start:stop]`. Either or both arguments can be left blank: `[start:]` (`stop` defaults to the end), `[:stop]` (`start` defaults to `0`) or `[:]` (entire sequence).\n",
+ "\n",
+ "A multi-dimensional array (such as our CT volume) can be indexed simultaneously in all dimensions, allowing us to access not only 2D slices, but also lines, rectangles, cuboids or individual voxels. A complete reference for NumPy array indexing can be found [here](https://docs.scipy.org/doc/numpy-1.11.0/reference/arrays.indexing.html).\n",
+ "\n",
+ "**Task:** Explore the indexing options for extracting 1D, 2D and 3D sub-arrays, and check the resulting shapes, e.g. `img_array[50, :, 100:120].shape`, comparing with the shape of the original array:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "print(img_array.shape)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Print the shape of indexed sub-arrays\n",
+ "print(img_array[50, :, 100:120].shape)"
+ ]
+ },
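+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "The same indexing rules can be checked on a small synthetic array, independent of the CT data (the toy shape below is arbitrary):"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import numpy as np\n",
+ "\n",
+ "toy = np.zeros((4, 5, 6))      # a small volume in ZYX order\n",
+ "print(toy[2, :, :].shape)      # 2D slice: (5, 6)\n",
+ "print(toy[:, 1, 2:5].shape)    # 2D sub-array: (4, 3)\n",
+ "print(toy[1:3, :, :].shape)    # 3D cuboid: (2, 5, 6)\n",
+ "print(toy[0, 0, 0])            # a single voxel value"
+ ]
+ },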
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Indexing is not just useful for reading data in an array, we can also set values in entire regions without having to explicitly write iterative loops.\n",
+ "\n",
+ "**Task:** Select any 2D slice, try setting a rectangular region to an arbitrary value and visualise the result with `plt.imshow`:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Extract a 2D slice\n",
+ "# Set a rectangular region to a constant (e.g. 0)\n",
+ "# Visualise the result with plt.imshow\n",
+ "\n",
+ "img = sitk.ReadImage(data_dir + 'mri-brain.nii.gz')\n",
+ "slice_2d = sitk.GetArrayFromImage(img)[120,:,:]  # avoid shadowing the built-in 'slice'\n",
+ "slice_2d[100:150,50:150] = 400\n",
+ "plt.imshow(slice_2d, cmap='gray');"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Image arithmetic\n",
+ "\n",
+ "Besides simply computing statistics, we can also perform arithmetic operations with image intensities, such as multiplication by scalars, contrast adjustment or any pointwise function (e.g. [gamma correction](https://en.wikipedia.org/wiki/Gamma_correction), also known as the Power-Law Transform).\n",
+ "\n",
+ "**Task:** Load the image `\"mri-brain-t1-contrast.nii.gz\"` and visualise slices (`x=105, y=170, z=95`) with window/level (`window=800, level=400`) using your `display_image` function."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "img1 = sitk.ReadImage(data_dir + 'mri-brain-t1-contrast.nii.gz')\n",
+ "display_image(img1, 105, 170, 95, 800, 400)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "**Task:** Implement a function for performing gamma correction, and apply it to the `\"mri-brain-t1-contrast.nii.gz\"` image (e.g., using (`c=10, gamma=3`)).\n",
+ "\n",
+ "Note, for gamma correction you should first normalise the image intensity range to [0,1], apply gamma correction, and transform back to the original range. Some operations require the image data type to be floating point (`float`). NumPy arrays can be easily converted using `astype()`."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "def gamma_correction(img, c, gamma):\n",
+ " img_array = sitk.GetArrayFromImage(img).astype(float)\n",
+ " min_value = np.min(img_array)\n",
+ " max_value = np.max(img_array) \n",
+ " img_array = (img_array - min_value) / (max_value - min_value)\n",
+ " img_array = c * np.power(img_array,gamma)\n",
+ " img_array = img_array * (max_value - min_value) + min_value\n",
+ " return sitk.GetImageFromArray(img_array) "
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "img1_corrected = gamma_correction(img1, 10, 3)\n",
+ "display_image(img1_corrected, 105, 170, 95, 800, 400)"
+ ]
+ },
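+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "The effect of the power law is easy to verify on a toy array with plain NumPy: with `c=1` the endpoints of the range are preserved, while `gamma > 1` pushes mid-range intensities down (the three values below are purely illustrative):"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import numpy as np\n",
+ "\n",
+ "values = np.array([0.0, 50.0, 100.0])   # minimum, midpoint, maximum\n",
+ "normalised = (values - values.min()) / (values.max() - values.min())\n",
+ "corrected = 1 * np.power(normalised, 3)  # c=1, gamma=3\n",
+ "restored = corrected * (values.max() - values.min()) + values.min()\n",
+ "\n",
+ "print(restored)   # endpoints unchanged, midtone darkened"
+ ]
+ },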
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "For the image above, an intravenous contrast agent has been used to highlight blood and tumourous tissue. In this case, we also have a corresponding brain scan taken before the contrast agent has been administered.\n",
+ "\n",
+ "**Task:** Load the image `\"mri-brain-t1.nii.gz\"` and compare the two MRI scans with and without contrast using your `display_image` function. Hint: use the same window/level for both scans so you can better see the differences."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "img2 = sitk.ReadImage(data_dir + 'mri-brain-t1.nii.gz')\n",
+ "\n",
+ "display_image(img1, 105, 170, 95, 800, 400)\n",
+ "display_image(img2, 105, 170, 95, 800, 400)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Note that these two images of the same patient are registered (i.e. spatially aligned).\n",
+ "\n",
+ "**Task:** Now try displaying `(img1-img2)` (SimpleITK supports arithmetic with image objects) to see what was highlighted by the contrast agent. Hint: you might need to adjust the window/level for better visibility."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Display the difference image"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "display_image(img1-img2, 105, 170, 95, 200, 100)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "**Task:** Try the same with gamma corrected version of the contrast image. Note, you might need to convert the data type of the non-contrast image to be able to subtract it from the gamma corrected image."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Display the difference image of the gamma corrected images"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# c=1, gamma=1 leaves intensities unchanged but converts the image to float\n",
+ "img2_corrected = gamma_correction(img2, 1, 1)\n",
+ "display_image(img1_corrected-img2_corrected, 105, 170, 95, 200, 100)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Intensity normalisation\n",
+ "\n",
+ "When performing statistical analysis or running machine learning methods on images, it is often useful to first *standardise* the intensities, i.e. make them have zero mean and unit variance. This is achieved by subtracting the mean and dividing by the standard deviation of the intensities. In mathematical terms, the standardised image $\\tilde{I}$ is computed as\n",
+ "$$\\tilde{I} = \\frac{I - \\mu}{\\sigma}, \\qquad\\qquad \\mu = \\frac{1}{N} \\sum_{n=1}^N I_n, \\quad \\sigma = \\sqrt{\\frac{1}{N} \\sum_{n=1}^N (I_n - \\mu)^2},$$\n",
+ "where $I$ is the original image, $N$ is its total number of voxels and $I_n$ is the intensity of voxel $n$.\n",
+ "\n",
+ "**Task:** Try standardising the intensities of `img1`, using `np.mean` and `np.std`, and plot the resulting histogram:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import numpy as np\n",
+ "\n",
+ "img1_array = sitk.GetArrayFromImage(img1) # Convert img1 to a NumPy array\n",
+ "img1_array = img1_array[img1_array > 0] # Exclude the background voxels, with intensity 0\n",
+ "\n",
+ "# img1_array is now an unstructured 'flat' array containing only the intensities of the brain voxels\n",
+ "\n",
+ "# Compute its mean and standard deviation\n",
+ "# Standardise the intensity array\n",
+ "# Plot the histogram before and after normalisation"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "plt.hist(img1_array, bins=100, density=True);"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "plt.hist((img1_array-np.mean(img1_array)) / np.std(img1_array), bins=100, density=True);"
+ ]
+ },
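+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "A quick numerical check on synthetic data confirms the guarantee: whatever the original scale and offset, standardised intensities have (approximately) zero mean and unit variance. (The synthetic intensities below are illustrative only.)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import numpy as np\n",
+ "\n",
+ "rng = np.random.default_rng(42)\n",
+ "intensities = rng.normal(loc=300.0, scale=75.0, size=10000)\n",
+ "\n",
+ "standardised = (intensities - np.mean(intensities)) / np.std(intensities)\n",
+ "print(np.mean(standardised), np.std(standardised))"
+ ]
+ },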
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Image enhancement\n",
+ "\n",
+ "We can also automatically enhance the contrast with [*histogram equalisation*](https://en.wikipedia.org/wiki/Histogram_equalization). It is a transformation applied to the image intensities which makes their distribution roughly uniform, i.e. all values occur approximately equally often in the image. Although it is not ideal for every application, the method is fast, simple to implement and useful for visualisation.\n",
+ "\n",
+ "Below we define a function, `hist_eq`, which equalises the histogram of a given array, taking as input also the desired number of histogram bins (defaults to the maximum). The resulting array will have its values distributed almost uniformly between the original minimum and maximum values. The helper function `hist_eq_img` does exactly the same but can be called directly on SimpleITK images.\n",
+ "\n",
+ "**Task:** Try to understand the individual steps of the implementation of `hist_eq`."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "def hist_eq(data, nbins=None):\n",
+ " shape = data.shape\n",
+ " data_min, data_max = data.min(), data.max()\n",
+ " data = data.flatten()\n",
+ "\n",
+ " # nbins defaults to the integer range of values\n",
+ " if nbins is None:\n",
+ " nbins = int(data_max - data_min)\n",
+ "\n",
+ " # Compute image histogram\n",
+ " counts, bins = np.histogram(data, bins=nbins)\n",
+ " \n",
+ " # Compute cumulative distribution function (CDF)\n",
+ " cdf = counts.cumsum() / counts.sum()\n",
+ " \n",
+ " # Use linear interpolation of CDF to find new intensity values\n",
+ " data_eq = np.interp(data, bins[:-1], (data_max - data_min) * cdf + data_min)\n",
+ " \n",
+ " return data_eq.reshape(shape)\n",
+ "\n",
+ "def hist_eq_img(img, nbins=None):\n",
+ " data = sitk.GetArrayFromImage(img)\n",
+ " data_eq = hist_eq(data, nbins)\n",
+ " img_eq = sitk.GetImageFromArray(data_eq)\n",
+ " img_eq.CopyInformation(img)\n",
+ " return img_eq"
+ ]
+ },
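+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Before applying it to a scan, the behaviour of `hist_eq` can be checked on a synthetic example: a heavily skewed distribution should come out roughly uniform, with the maximum of the range preserved. (The exponential test data below is purely illustrative.)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import numpy as np\n",
+ "\n",
+ "rng = np.random.default_rng(0)\n",
+ "skewed = rng.exponential(scale=10.0, size=100000)\n",
+ "\n",
+ "equalised = hist_eq(skewed, nbins=256)\n",
+ "\n",
+ "# The output stays within the original range...\n",
+ "print(skewed.min(), skewed.max(), equalised.min(), equalised.max())\n",
+ "\n",
+ "# ...and its histogram is roughly flat: each tenth holds about 10% of values\n",
+ "counts, _ = np.histogram(equalised, bins=10)\n",
+ "print(counts / counts.sum())"
+ ]
+ },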
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "**Task:** Now apply `hist_eq_img` to one of the SimpleITK images you have loaded and visualise it next to its original with your `display_image`:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Apply histogram equalisation to an image\n",
+ "# Display original image\n",
+ "# Display equalised image"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "img = sitk.ReadImage(data_dir + 'ct-brain.nii.gz')\n",
+ "img_orig = img\n",
+ "img_eq = hist_eq_img(img_orig)\n",
+ "print(\"Before histogram equalisation:\")\n",
+ "display_image(img_orig)\n",
+ "print(\"After histogram equalisation:\")\n",
+ "display_image(img_eq)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "The following plots compare the intensity distributions before and after histogram equalisation."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "data_orig = sitk.GetArrayFromImage(img_orig).flatten()\n",
+ "data_eq = sitk.GetArrayFromImage(img_eq).flatten()\n",
+ "\n",
+ "plt.figure(figsize=(10, 4))\n",
+ "plt.subplot(121)\n",
+ "plt.hist(data_orig, bins=128, density=True, histtype='step', cumulative=False)\n",
+ "plt.hist(data_eq, bins=128, density=True, histtype='step', cumulative=False)\n",
+ "plt.subplot(122)\n",
+ "plt.hist(data_orig, bins=128, density=True, histtype='step', cumulative=True)\n",
+ "plt.hist(data_eq, bins=128, density=True, histtype='step', cumulative=True);"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Filtering with SimpleITK\n",
+ "\n",
+ "#### Smoothing\n",
+ "\n",
+ "Occasionally we will acquire medical scans which include some amount of undesired noise.\n",
+ "\n",
+ "One such noisy image might look like this:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "scrolled": true
+ },
+ "outputs": [],
+ "source": [
+ "# We convert it to `float32` for compatibility with some functions we'll use later\n",
+ "img = sitk.Cast(sitk.ReadImage(data_dir + 'mri-brain-noisy.nii.gz'), sitk.sitkFloat32)\n",
+ "display_image(img)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "A basic denoising technique is to *convolve* the image with a smoothing filter.\n",
+ "\n",
+ "We can achieve this with SimpleITK using `sitk.DiscreteGaussian()` (it has a `variance` option, default `=1.0`).\n",
+ "\n",
+ "**Task:** Try applying a Gaussian filter to the loaded image. Try out different values for the `variance` parameter:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "img_gauss = sitk.DiscreteGaussian(img, 1)\n",
+ "display_image(img_gauss)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Comparing to the original noisy image, we can see that most of the visible noise is gone, but the image edges also lose their sharpness, in particular with larger values for the variance.\n",
+ "\n",
+ "Interestingly, Gaussian smoothing can be interpreted as *isotropic diffusion*, i.e. image intensities are 'diffused' (think heat conduction) homogeneously in all directions for a length of 'time' proportional to the variance of the Gaussian filter. Extending on this idea, another popular approach for denoising is *anisotropic diffusion*, which adjusts the local 'conductivity' based on the image gradients. In other words, it attempts to smooth out flat regions while preserving the edges.\n",
+ "\n",
+ "**Task:** Try out the SimpleITK function `sitk.GradientAnisotropicDiffusion()` (can take a few seconds). Play around with different values for the parameters of this filter:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "img_diffusion = sitk.GradientAnisotropicDiffusion(img)\n",
+ "display_image(img_diffusion)\n",
+ "#help(sitk.GradientAnisotropicDiffusion)"
+ ]
+ },
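+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "The idea behind the filter can be illustrated in one dimension with plain NumPy: the diffusion step is an ordinary smoothing update, but each flux is weighted by a conductivity that shrinks towards zero across strong gradients, so the step edge survives while the noise is diffused away. (A minimal Perona-Malik-style sketch, not SimpleITK's implementation; all parameter values are illustrative.)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import numpy as np\n",
+ "\n",
+ "def diffuse_1d(signal, iterations=100, kappa=0.2, step=0.2):\n",
+ "    u = signal.astype(float).copy()\n",
+ "    for _ in range(iterations):\n",
+ "        # Forward/backward differences, zero flux at the boundaries\n",
+ "        d_fwd = np.append(np.diff(u), 0.0)\n",
+ "        d_bwd = np.append(0.0, np.diff(u))\n",
+ "        # Conductivity drops towards zero across strong gradients\n",
+ "        c_fwd = np.exp(-(d_fwd / kappa) ** 2)\n",
+ "        c_bwd = np.exp(-(d_bwd / kappa) ** 2)\n",
+ "        u += step * (c_fwd * d_fwd - c_bwd * d_bwd)\n",
+ "    return u\n",
+ "\n",
+ "rng = np.random.default_rng(0)\n",
+ "noisy = np.where(np.arange(100) < 50, 0.0, 1.0) + rng.normal(0, 0.05, 100)\n",
+ "smooth = diffuse_1d(noisy)\n",
+ "\n",
+ "# The flat regions are denoised while the step at index 50 is preserved\n",
+ "print(np.std(noisy[:40]), np.std(smooth[:40]))\n",
+ "print(noisy[55] - noisy[45], smooth[55] - smooth[45])"
+ ]
+ },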
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Let's visualise all three together. Pay attention to the overall noise level and the sharpness of the edges. You might want to adjust the window/level setting for better contrast:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "display_image(img, window=400, level=200)\n",
+ "display_image(img_gauss, window=400, level=200)\n",
+ "display_image(img_diffusion, window=400, level=200)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "**Task:** Now try the same two smoothing approaches with `'ct-brain-noisy.nii.gz'`:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "img2 = sitk.Cast(sitk.ReadImage(data_dir + 'ct-brain-noisy.nii.gz'), sitk.sitkFloat32)\n",
+ "display_image(img2)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "img2_gauss = sitk.DiscreteGaussian(img2)\n",
+ "display_image(img2_gauss)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "img2_diffusion = sitk.GradientAnisotropicDiffusion(img2)\n",
+ "display_image(img2_diffusion)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "**Task:** Again, visualise all three (noisy, blurred and diffused) in the CT intensity range for gray/white matter (`window=120, level=40`) and compare noise levels and edge sharpness."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "display_image(img2, window=120, level=40)\n",
+ "display_image(img2_gauss, window=120, level=40)\n",
+ "display_image(img2_diffusion, window=120, level=40)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "You will notice that each method gives better results for one image but worse for the other. Why do you think that is the case?\n",
+ "\n",
+ "*Hint:* Think of the magnitude of the noise compared to the amplitude of the true image variations. If the signal-to-noise ratio is low, the algorithm has no way of knowing what needs to be preserved and what should be smoothed out."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "#### Edge detection\n",
+ "\n",
+ "Another common application of image filtering is estimating image gradients for edge detection.\n",
+ "\n",
+ "Let us compute some spatial derivatives with `sitk.Derivative(<img>, direction=<d>)`, where `<d>` is 0, 1 or 2 for X, Y or Z, respectively."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "img_dx = sitk.Derivative(img, direction=0)\n",
+ "img_dy = sitk.Derivative(img, direction=1)\n",
+ "img_dz = sitk.Derivative(img, direction=2)\n",
+ "display_image(img_dx, level=0)\n",
+ "display_image(img_dy, level=0)\n",
+ "display_image(img_dz, level=0)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "When we compute the magnitude of the gradients of an image, we obtain what is called an *edge map*, which is simply a local measure of the 'strength' of an edge.\n",
+ "\n",
+ "This operation is readily available in SimpleITK with `sitk.SobelEdgeDetection()`:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "img_sobel = sitk.SobelEdgeDetection(img)\n",
+ "display_image(img_sobel)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Note how the derivatives look quite 'grainy', as we are differentiating the superimposed noise as well.\n",
+ "\n",
+ "**Task**: How could you improve the edge detection on a noisy image?"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# We run edge detection on the smoothed image\n",
+ "img_sobel2 = sitk.SobelEdgeDetection(img_gauss)\n",
+ "display_image(img_sobel2)"
+ ]
+ },
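+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "The principle behind an edge map can be reproduced with plain NumPy: estimate derivatives by finite differences and combine them into a gradient magnitude, which peaks on the edge. (This toy example uses central differences via `np.gradient`, not the actual Sobel kernels.)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import numpy as np\n",
+ "\n",
+ "# A 2D image with a vertical step edge between columns 7 and 8\n",
+ "image = np.zeros((16, 16))\n",
+ "image[:, 8:] = 1.0\n",
+ "\n",
+ "# Central-difference derivatives along rows (gy) and columns (gx)\n",
+ "gy, gx = np.gradient(image)\n",
+ "magnitude = np.sqrt(gx ** 2 + gy ** 2)\n",
+ "\n",
+ "# The response is non-zero only on the two columns flanking the step\n",
+ "print(magnitude[0])"
+ ]
+ },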
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "#### Edge sharpening\n",
+ "\n",
+ "**Task:** Here we implement a `sharpen()` function which applies unsharp masking to an image. It uses `sitk.DiscreteGaussian`, as above, and takes the following arguments:\n",
+ "- `scale`: the standard deviation of the Gaussian filter\n",
+ "- `strength`: the scaling factor for the smoothed image"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "def sharpen(img, scale=1, strength=2): \n",
+ " # Apply unsharp masking\n",
+ " img_smooth = sitk.DiscreteGaussian(img, scale)\n",
+ " img_sharpened = img + (img - img_smooth) * strength\n",
+ " return img_sharpened\n",
+ "\n",
+ "img_sharp = sharpen(img_gauss, 1, 2)\n",
+ "display_image(img_gauss, window=400, level=200)\n",
+ "display_image(img_sharp, window=400, level=200)"
+ ]
+ },
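+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "What unsharp masking does to an edge can be seen in one dimension with plain NumPy: subtracting a smoothed copy isolates the high-frequency detail, and adding that detail back scaled produces a slight under- and overshoot on either side of the edge, which is what makes it appear sharper. (This sketch uses a 3-tap moving average as a stand-in for the Gaussian.)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import numpy as np\n",
+ "\n",
+ "# A soft 1-D edge profile\n",
+ "profile = np.array([0.0, 0.0, 0.0, 0.25, 0.5, 0.75, 1.0, 1.0, 1.0])\n",
+ "\n",
+ "# Smooth with a 3-tap moving average, replicating the edge values\n",
+ "padded = np.pad(profile, 1, mode='edge')\n",
+ "smoothed = np.convolve(padded, np.ones(3) / 3, mode='valid')\n",
+ "\n",
+ "# Unsharp masking: original + strength * (original - smoothed)\n",
+ "sharpened = profile + 2.0 * (profile - smoothed)\n",
+ "\n",
+ "print(np.round(sharpened, 3))  # note the undershoot before and overshoot after the edge"
+ ]
+ },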
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Resampling\n",
+ "\n",
+ "Many image processing or computer vision algorithms adopt a multi-scale approach, for example making a coarse search over the entire image and then refining it at smaller scales. This can be achieved efficiently through the use of so-called *image pyramids*, which are formed by the base image and itself downsampled to lower resolutions, typically by a factor of 2 at each level.\n",
+ "\n",
+ "**Task:** Implement a naïve downsampling function, by simply keeping one in every `<factor>` (integer-valued) pixels in each dimension.\n",
+ "\n",
+ "*Hint:* SimpleITK image objects also support Python's indexing notation: `[start:stop:step]`, potentially omitting any of the arguments."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "def downsample_naive(img, factor=2):\n",
+ " return img[::factor, ::factor, ::factor]"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Now let us test with the MRI volume from before:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "img = sitk.ReadImage(data_dir + \"mri-brain-noisy.nii.gz\")\n",
+ "display_image(img)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "img_down_naive_1 = downsample_naive(img)\n",
+ "img_down_naive_2 = downsample_naive(img_down_naive_1)\n",
+ "img_down_naive_3 = downsample_naive(img_down_naive_2)\n",
+ "\n",
+ "display_image(img)\n",
+ "display_image(img_down_naive_1)\n",
+ "display_image(img_down_naive_2)\n",
+ "display_image(img_down_naive_3)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "It is known that naïve undersampling can produce *aliasing*: high spatial frequencies (edges and noise) that cannot be represented at the coarser resolution reappear as spurious artefacts, making the downsampled image look jagged and 'blocky'. We usually circumvent this issue by first smoothing the image with a low-pass filter (e.g. Gaussian) before resampling the pixel values.\n",
+ "\n",
+ "**Task:** Now try implementing a `downsample` method which first applies a Gaussian smoothing and then downsamples by an integer factor (no interpolation needed).\n",
+ "\n",
+ "*Hint:* Recall the `variance` option for `sitk.DiscreteGaussian`. A Gaussian standard deviation of `0.5*factor` works well in practice, but feel free to experiment!"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "def downsample(img, factor=2):\n",
+ " smoothed = sitk.DiscreteGaussian(img, (.5 * factor) ** 2) \n",
+ " return smoothed[::factor, ::factor, ::factor]"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Let's have a look at the results for this approach:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "img_down_1 = downsample(img)\n",
+ "img_down_2 = downsample(img_down_1)\n",
+ "img_down_3 = downsample(img_down_2)\n",
+ "\n",
+ "display_image(img)\n",
+ "display_image(img_down_1)\n",
+ "display_image(img_down_2)\n",
+ "display_image(img_down_3)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "#### Resampling with SimpleITK"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "SimpleITK offers advanced resampling features that can be useful in many scenarios, e.g. when downsampling by non-integer factors or resampling to a specific resolution (e.g. isotropic).\n",
+ "\n",
+ "Have a look at the `resample()` function we have implemented below and try to understand the role of each of the arguments to the `sitk.ResampleImageFilter`."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "def resample(img, new_size=None, new_spacing=None):\n",
+ " old_size = img.GetSize()\n",
+ " old_spacing = img.GetSpacing()\n",
+ " \n",
+ " if new_size is None and new_spacing is None:\n",
+ " return img\n",
+ " \n",
+ " if new_size is None:\n",
+ " # Compute new image dimensions based on the desired rescaling of the voxel spacing\n",
+ " new_size = [int(np.ceil(old_size[d] * old_spacing[d] / new_spacing[d])) for d in range(3)]\n",
+ "\n",
+ " if new_spacing is None:\n",
+ " # Compute new voxel spacing based on the desired rescaling of the image dimensions\n",
+ " new_spacing = [old_spacing[d] * old_size[d] / new_size[d] for d in range(3)]\n",
+ "\n",
+ " # Smooth the input image with anisotropic Gaussian filter\n",
+ " img_smoothed = img\n",
+ " for d in range(3):\n",
+ " # Note how the blurring strength can be different in each direction,\n",
+ " # if the scaling factors are different.\n",
+ " factor = new_spacing[d] / old_spacing[d]\n",
+ " sigma = 0.2 * factor\n",
+ " img_smoothed = sitk.RecursiveGaussian(img_smoothed, sigma=sigma, direction=d)\n",
+ "\n",
+ " # Finally, apply the resampling operation\n",
+ " img_resampled = sitk.ResampleImageFilter().Execute(\n",
+ " img_smoothed, # Input image\n",
+ " new_size, # Output image dimensions\n",
+ " sitk.Transform(), # Coordinate transformation. sitk.Transform() is a dummy identity transform,\n",
+ " # as we want the brain to be in exactly the same place. When we do image registration,\n",
+ " # for example, this can be a linear or nonlinear transformation.\n",
+ " sitk.sitkLinear, # Interpolation method (cf. also sitk.sitkNearestNeighbor and many others)\n",
+ " img.GetOrigin(), # Output image origin (same)\n",
+ " new_spacing, # Output voxel spacing\n",
+ " img.GetDirection(), # Output image orientation (same)\n",
+ " 0, # Fill value for points outside the input domain\n",
+ " img.GetPixelID()) # Voxel data type (same)\n",
+ "\n",
+ " return img_resampled"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Let's resample the MR image to an element spacing of 2x4x8mm."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "img_resampled = resample(img, new_spacing=[2, 4, 8])\n",
+ "\n",
+ "print(\"Spacing: {}\".format(img.GetSpacing()))\n",
+ "display_image(img, window=400, level=200)\n",
+ "\n",
+ "print(\"Spacing: {}\".format(img_resampled.GetSpacing()))\n",
+ "display_image(img_resampled, window=400, level=200)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Medical imaging data is often anisotropic. For many image analysis algorithms, however, it is easier to work with isotropic input data."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "img3 = sitk.ReadImage(data_dir + 'mri-brain-anisotropic.nii.gz')\n",
+ "display_image(img3, z=10, window=800, level=400)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "**Task:** Try resampling the above image to an isotropic 1mm resolution, and display the result showing approximately the same xy-plane by setting an appropriate value for the `z` parameter in the `display_image` function."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "img3_resampled = resample(img3, new_spacing=[1, 1, 1])\n",
+ "\n",
+ "print(\"Spacing: {}\".format(img3.GetSpacing()))\n",
+ "display_image(img3, z=10, window=800, level=400)\n",
+ "\n",
+ "print(\"Spacing: {}\".format(img3_resampled.GetSpacing()))\n",
+ "display_image(img3_resampled, z=50, window=800, level=400)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Segmentation\n",
+ "\n",
+ "First let us define some helper functions to overlay an image with a segmentation for better visualisation. `display_overlay()` takes as input the base image, a binary segmentation image, and all the usual arguments for `display_image()`:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "def label_overlay(img, seg):\n",
+ " minmax = sitk.MinimumMaximumImageFilter()\n",
+ " minmax.Execute(img)\n",
+ " low, high = minmax.GetMinimum(), minmax.GetMaximum()\n",
+ " img_norm = (img - low) / (high - low)\n",
+ " img_uint8 = sitk.Cast(255 * img_norm, sitk.sitkUInt8)\n",
+ " return sitk.LabelOverlay(img_uint8, seg)\n",
+ "\n",
+ "def display_overlay(img, seg, *args, **kwargs):\n",
+ " display_image(label_overlay(img, seg), *args, **kwargs)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "collapsed": true
+ },
+ "source": [
+ "#### Thresholding\n",
+ "\n",
+ "The simplest form of segmentation is based on a single intensity cutoff, with a manually specified threshold. Alternatively, we can specify an upper (U) and a lower (L) threshold.\n",
+ "\n",
+ "**Task:** Try to find good upper and lower thresholds to segment the lesion in the noisy CT scan.\n",
+ "\n",
+ "*Hint 1:* SimpleITK images support comparison (`<`, `>`, `<=`, `>=`, `==`, `!=`) and logical ('and' `&`, 'or' `|`, 'xor' `^`, 'not' `~`) operators to produce binary images.\n",
+ "\n",
+ "*Hint 2:* Image noise causes major problems for thresholding approaches. Try removing noise before thresholding the image and compare the results."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "img = sitk.ReadImage(data_dir + 'ct-brain-noisy.nii.gz')\n",
+ "display_image(img, x=70, y=100, z=90, window=120, level=40)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "seg = (img > 50) & (img < 100)\n",
+ "display_overlay(img, seg, x=70, y=100, z=90)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "img_gauss = sitk.DiscreteGaussian(img)\n",
+ "seg = (img_gauss > 50) & (img_gauss < 100)\n",
+ "display_overlay(img_gauss, seg, x=70, y=100, z=90)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Here we can observe one of the main shortcomings of purely intensity-based segmentation methods: we have no control over the location or spatial contiguity of the segmented regions. So even if we manage to segment most of the lesion, many areas outside the structure of interest that fall in the same intensity range are included as well."
+ ]
+ },
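+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "One common remedy is to keep only a spatially connected piece of the thresholded result. As a sketch (the filters `sitk.ConnectedComponent` and `sitk.RelabelComponent` are standard SimpleITK, but keeping just the largest component is our assumption here, and it is not guaranteed to be the structure of interest), we could post-process `seg` as follows:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Label each spatially connected region of the binary segmentation with a unique integer,\n",
+ "# sort the labels by size, and keep only the largest component (label 1 after relabelling).\n",
+ "components = sitk.ConnectedComponent(seg)\n",
+ "largest = sitk.RelabelComponent(components, sortByObjectSize=True) == 1\n",
+ "display_overlay(img_gauss, largest, x=70, y=100, z=90)"
+ ]
+ },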
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "#### Region growing\n",
+ "\n",
+ "Region growing offers an alternative to simple thresholding that addresses the issue of spatial contiguity: it only labels the voxels which are *reachable* from a set of manually-specified *seed points*. In its simplest form, it is equivalent to thresholding, but limited to regions connected to the seeds.\n",
+ "\n",
+ "**Task:** Implement region growing with UL thresholding.\n",
+ "\n",
+ "*Hints:*\n",
+ "- To initialise your zero-filled segmentation image, use `sitk.Image(img.GetSize(), sitk.sitkUInt8)`. Don't forget to also call `.CopyInformation()` to copy the meta-data (spacing, origin, orientation) from the input image.\n",
+ "- You can use Python's `collections.deque` (double-ended queue). Use `.append()` or `.extend()` to enqueue elements and `.popleft()` to dequeue an element (`.pop()` would work as a stack instead).\n",
+ "- Your algorithm should remember which voxels have already been visited. This can be achieved with a Python set (`set()`), with which you can do `visited.add(x)` and `if x in visited:`.\n",
+ "- Use tuples to represent the voxel locations (e.g. `point=(x,y,z)`), then you can index directly into the SimpleITK image with them (e.g. `image[point]`).\n",
+ "- `neighbours(x, size)` returns the list of immediate neighbours of the voxel `x`, clipped at the image borders."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "def neighbours(x, size):\n",
+ " nbs = []\n",
+ " for d in range(len(x)):\n",
+ " if x[d] > 0:\n",
+ " nb = list(x)\n",
+ " nb[d] -= 1\n",
+ " nbs.append(tuple(nb))\n",
+ " if x[d] < size[d] - 1:\n",
+ " nb = list(x)\n",
+ " nb[d] += 1\n",
+ " nbs.append(tuple(nb))\n",
+ " return nbs"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from collections import deque\n",
+ "\n",
+ "def region_growing(img, seeds, low, high):\n",
+ " size = img.GetSize()\n",
+ " seg = sitk.Image(size, sitk.sitkUInt8)\n",
+ " queue = deque()\n",
+ " queue.extend(seeds)\n",
+ " visited = set()\n",
+ " while len(queue) > 0:\n",
+ " x = queue.popleft()\n",
+ " if x in visited:\n",
+ " continue\n",
+ " visited.add(x)\n",
+ " if low <= img[x] < high:\n",
+ " seg[x] = 1\n",
+ " queue.extend(neighbours(x, size))\n",
+ " seg.CopyInformation(img)\n",
+ " return seg"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "display_image(img_gauss, x=70, y=100, z=90, window=120, level=40)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "seed = (70, 100, 90)\n",
+ "low, high = 50, 100\n",
+ "seg2 = region_growing(img_gauss, [seed], low, high)\n",
+ "display_overlay(img_gauss, seg2, *seed)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Let's visually compare the results to a manual reference segmentation."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "ref = sitk.ReadImage(data_dir + 'ct-brain-lesion.nii.gz')\n",
+ "\n",
+ "print(\"Thresholding\")\n",
+ "display_image(label_overlay(img_gauss, seg), *seed)\n",
+ "print(\"Region growing\")\n",
+ "display_image(label_overlay(img_gauss, seg2), *seed)\n",
+ "print(\"Reference segmentation\")\n",
+ "display_image(label_overlay(img_gauss, ref), *seed)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "#### Segmentation Evaluation"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Let's now quantitatively evaluate the segmentations using different performance measures. SimpleITK has many important measures already implemented.\n",
+ "\n",
+ "First, we extract the surfaces from the segmentations."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "seg_contour = sitk.LabelContour(seg)\n",
+ "seg2_contour = sitk.LabelContour(seg2)\n",
+ "ref_contour = sitk.LabelContour(ref)\n",
+ "\n",
+ "print(\"Thresholding - Surface\")\n",
+ "display_image(label_overlay(img_gauss, seg_contour), *seed)\n",
+ "print(\"Region growing - Surface\")\n",
+ "display_image(label_overlay(img_gauss, seg2_contour), *seed)\n",
+ "print(\"Reference segmentation - Surface\")\n",
+ "display_image(label_overlay(img_gauss, ref_contour), *seed)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Then we can use SimpleITK to compute overlap measures such as the Jaccard Index (JI) and the Dice Similarity Coefficient (DSC). We can also directly compute the Hausdorff Distance (HD) and the Symmetric Average Surface Distance (ASD) from the segmentation contours using SimpleITK's `HausdorffDistanceImageFilter`. (Note: the ASD is called `AverageHausdorffDistance` in SimpleITK.)"
+ ]
+ },
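+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "As a reminder, for a predicted segmentation $A$ and a reference segmentation $B$ (viewed as voxel sets), the two overlap measures are defined as\n",
+ "\n",
+ "$$ JI(A,B) = \\frac{|A \\cap B|}{|A \\cup B|}, \\qquad DSC(A,B) = \\frac{2\\,|A \\cap B|}{|A| + |B|}, $$\n",
+ "\n",
+ "which are related by $DSC = 2\\,JI / (1 + JI)$, so both rank segmentations identically."
+ ]
+ },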
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "overlap_measures_filter = sitk.LabelOverlapMeasuresImageFilter()\n",
+ "hausdorff_distance_filter = sitk.HausdorffDistanceImageFilter()\n",
+ "\n",
+ "overlap_measures_filter.Execute(ref, seg)\n",
+ "hausdorff_distance_filter.Execute(ref_contour, seg_contour)\n",
+ "print('\\nThresholding')\n",
+ "print('JI\\t' + str(overlap_measures_filter.GetJaccardCoefficient()))\n",
+ "print('DSC\\t' + str(overlap_measures_filter.GetDiceCoefficient()))\n",
+ "print('HD\\t' + str(hausdorff_distance_filter.GetHausdorffDistance()))\n",
+ "print('ASD\\t' + str(hausdorff_distance_filter.GetAverageHausdorffDistance()))\n",
+ "\n",
+ "overlap_measures_filter.Execute(ref, seg2)\n",
+ "hausdorff_distance_filter.Execute(ref_contour, seg2_contour)\n",
+ "print('\\nRegion growing')\n",
+ "print('JI\\t' + str(overlap_measures_filter.GetJaccardCoefficient()))\n",
+ "print('DSC\\t' + str(overlap_measures_filter.GetDiceCoefficient()))\n",
+ "print('HD\\t' + str(hausdorff_distance_filter.GetHausdorffDistance()))\n",
+ "print('ASD\\t' + str(hausdorff_distance_filter.GetAverageHausdorffDistance()))"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": []
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "Python 3",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.6.7"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
diff --git a/03 - Neural-Nets-and-CNNs.ipynb b/03 - Neural-Nets-and-CNNs.ipynb
new file mode 100644
index 0000000000000000000000000000000000000000..4acb73eae2ef906a3e127d14b91fad718a3c5039
--- /dev/null
+++ b/03 - Neural-Nets-and-CNNs.ipynb
@@ -0,0 +1,1039 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# Tutorial 3: Neural Networks & CNNs"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Running on Colab"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "! wget https://www.doc.ic.ac.uk/~bglocker/teaching/notebooks/supervised-data.zip\n",
+ "! unzip supervised-data.zip\n",
+ "\n",
+ "# data directory\n",
+ "data_dir = 'data/mnist/'"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Running on DoC lab machines"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# data directory\n",
+ "data_dir = '/vol/lab/course/416/data/mnist/'"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "**Aims of this tutorial**:\n",
+ "- Introduce you to PyTorch, a library for working with neural networks.\n",
+ "- Implement Multi-Layer Perceptrons (MLP), starting from linear regression.\n",
+ "- Implement a small Convolutional Neural Network (CNN).\n",
+ "\n",
+ "Along the way you will implement forward and backward propagation as well as stochastic gradient descent.\n",
+ "\n",
+ "It may look long, but it should be easy to complete. Understanding the core points of this tutorial and familiarising yourselves with the PyTorch library is **essential for following the next tutorials and the coursework**. Invest some time to study and understand it, and don't hesitate to ask if you don't understand something.\n",
+ "\n",
+ "**Prerequisites**:\n",
+ "- Familiar with python and numpy\n",
+ "- Familiar with logistic regression and MNIST\n",
+ "\n",
+ "\n",
+ "**Notes**:\n",
+ "- Docs for PyTorch's functions you will need: \n",
+ "https://pytorch.org/docs/stable/tensors.html \n",
+ "https://pytorch.org/docs/stable/nn.html \n",
+ "- Some helper functions are included below for loading and plotting data. They will be used out of the box below."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Preliminary: Loading and refreshing MNIST\n",
+ "\n",
+ "We will be using MNIST data again in this tutorial. Because the images are small, the database allows small networks to be trained quickly on a CPU. Anything larger afterwards will require GPUs.\n",
+ "\n",
+ "An important point to understand is the structure of the loaded data, especially the **shape** of the loaded numpy arrays, because we need to handle these shapes carefully when processing the data with neural networks.\n",
+ "\n",
+ "Let's load and inspect the data..."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import numpy as np\n",
+ "import torchvision.datasets as dset\n",
+ "\n",
+ "\n",
+ "def get_mnist(data_dir, train):\n",
+ " # data_dir: path to local directory where data is, or should be stored.\n",
+ " # train: if True, return training data. If False, return test data.\n",
+ " # download: if data not in data_dir, download it.\n",
+ " data_set = dset.MNIST(root=data_dir, train=train, transform=None, download=True)\n",
+ " # Students should only deal with numpy arrays, so that it's easy to follow.\n",
+ " if train:\n",
+ " data_x = np.asarray(data_set.train_data, dtype='uint8') # newer torchvision: data_set.data\n",
+ " data_y = np.asarray(data_set.train_labels, dtype='int16') # newer torchvision: data_set.targets (int64 by default)\n",
+ " else:\n",
+ " data_x = np.asarray(data_set.test_data, dtype='uint8')\n",
+ " data_y = np.asarray(data_set.test_labels, dtype='int16')\n",
+ " return data_x, data_y\n",
+ "\n",
+ "\n",
+ "def make_lbls_onehot(lbls, num_classes):\n",
+ " # lbls: np.array of shape [N]\n",
+ " lbls_onehot = np.zeros(shape=(lbls.shape[0], num_classes ) )\n",
+ " lbls_onehot[ np.arange(lbls_onehot.shape[0]), lbls ] = 1\n",
+ " return lbls_onehot\n",
+ "\n",
+ "\n",
+ "# If datasets are not at specified location, they will be downloaded.\n",
+ "train_imgs, train_lbls = get_mnist(data_dir=data_dir, train=True)\n",
+ "test_imgs, test_lbls = get_mnist(data_dir=data_dir, train=False)\n",
+ "\n",
+ "print(\"[train_imgs] Type: \", type(train_imgs), \"|| Shape:\", train_imgs.shape, \"|| Data type: \", train_imgs.dtype )\n",
+ "print(\"[train_lbls] Type: \", type(train_lbls), \"|| Shape:\", train_lbls.shape, \"|| Data type: \", train_lbls.dtype )\n",
+ "print('Class labels in train = ', np.unique(train_lbls))\n",
+ "\n",
+ "print(\"[test_imgs] Type: \", type(test_imgs), \"|| Shape:\", test_imgs.shape, \" || Data type: \", test_imgs.dtype )\n",
+ "print(\"[test_lbls] Type: \", type(test_lbls), \"|| Shape:\", test_lbls.shape, \" || Data type: \", test_lbls.dtype )\n",
+ "print('Class labels in test = ', np.unique(test_lbls))\n",
+ "\n",
+ "S_tr_samples = train_imgs.shape[0] # S hereafter. Number of training samples in database.\n",
+ "H_height = train_imgs.shape[1] # H hereafter\n",
+ "W_width = train_imgs.shape[2] # W hereafter\n",
+ "C_classes = len(np.unique(train_lbls)) # C hereafter"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Above we see that the data have been loaded into *numpy arrays*. \n",
+ "Arrays with images have **shape ( S = number of samples, H = height, W = width )**. \n",
+ "Arrays with labels have **shape ( S = number of samples)**, holding one integer per image, the digit's class.\n",
+ "\n",
+ "MNIST comprises a **train set of S_tr = 60000 images** and a **test set of S_te = 10000 images**. \n",
+ "Model development uses the train set. Finally, a model is evaluated on the test set.\n",
+ "\n",
+ "Let's plot a few of them in one collage to have a better look..."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import matplotlib\n",
+ "import matplotlib.pyplot as plt\n",
+ "%matplotlib inline\n",
+ "\n",
+ "\n",
+ "def plot_image(image, interpol=\"nearest\", cmap=\"gray\", vminmax=[None,None], figsize=None):\n",
+ " # image: np.array of one of the following shapes:\n",
+ " # grayscale image: (height, width)\n",
+ " # grayscale image: (height, width, 1)\n",
+ " # rgb image: (height, width, 3)\n",
+ " print(\"Plotting image of shape: \", image.shape)\n",
+ " plt.figure(figsize=figsize) # (figsize=(n_imgs_per_row*0.5, n_rows*0.5)) # size (width, height), in inches.\n",
+ " if len(image.shape) == 2:\n",
+ " fig = plt.imshow(image, cmap=cmap, interpolation=interpol, vmin=vminmax[0], vmax=vminmax[1]) # imshow: (w,h) or (w,h,3)\n",
+ " plt.colorbar(fig)\n",
+ " elif len(image.shape) == 3 and image.shape[2] == 1:\n",
+ " fig = plt.imshow(image[:,:,0], cmap=cmap, interpolation=interpol, vmin=vminmax[0], vmax=vminmax[1]) # imshow: (w,h) or (w,h,3)\n",
+ " plt.colorbar(fig)\n",
+ " elif len(image.shape) == 3 and image.shape[2] == 3 :\n",
+ " _ = plt.imshow(image, interpolation=interpol)\n",
+ " else:\n",
+ " raise ValueError(\"Wrong shape of given image for plotting.\")\n",
+ "\n",
+ "\n",
+ "def plot_grid_of_images(imgs, n_imgs_per_row=10, interpol=\"nearest\", cmap=\"gray\", vminmax=[None,None]):\n",
+ " # imgs: numpy array of one of the following shapes:\n",
+ " # grayscales images: (number-of-images, height, width)\n",
+ " # grayscales images: (number-of-images, height, width, 1)\n",
+ " # color images: (number-of-images, height, width, 3)\n",
+ " n_rows = imgs.shape[0]//n_imgs_per_row + 1*int(imgs.shape[0]%n_imgs_per_row > 0)\n",
+ " print(\"n_rows=\",n_rows)\n",
+ " # Append empty images if the last row is not complete\n",
+ " n_empty_imgs = n_rows * n_imgs_per_row - imgs.shape[0]\n",
+ " imgs_to_plot = np.concatenate( [imgs, np.zeros((n_empty_imgs, imgs.shape[1], imgs.shape[2]))], axis=0)\n",
+ " \n",
+ " # draw row by row\n",
+ " row_images = [] # each element will be (image-height, image-width X n_imgs_per_row)\n",
+ " for current_row in range(n_rows):\n",
+ " tmp_row_images = imgs_to_plot[current_row * n_imgs_per_row : (current_row + 1) * n_imgs_per_row]\n",
+ " row_images.append( np.concatenate(tmp_row_images, axis=1) )\n",
+ " # draw all row-images in one image\n",
+ " collage_of_images = np.concatenate(row_images, axis=0) # array.shape: (height X n_imgs_per_row, width X n_imgs_per_row)\n",
+ " \n",
+ " plot_image(collage_of_images, interpol=interpol, cmap=cmap, vminmax=vminmax)\n",
+ "\n",
+ "\n",
+ "plot_grid_of_images(train_imgs[0:100], n_imgs_per_row=10)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Notice that the intensities in the images take **values from 0 to 255**."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Preliminary: Data pre-processing\n",
+ "\n",
+ "A first step in almost all pipelines is to pre-process the data, to make them more appropriate for a model.\n",
+ "\n",
+ "Below, we will perform three pre-processing steps:\n",
+ "\n",
+ "1. Change the labels from an integer representation to a **one-hot representation** of the **C=10 classes**.\n",
+ "2. Normalize the image **intensities** (originally 0-255) to have **zero mean** and **unit standard deviation**.\n",
+ "3. **Vectorise the 2D images into 1D vectors for the MLP**, which only takes a vector input per sample. The CNN later will use the original 2D images."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# a) Change representation of labels to one-hot vectors of length C=10, using the helper defined earlier.\n",
+ "train_lbls_onehot = make_lbls_onehot(train_lbls, C_classes)\n",
+ "test_lbls_onehot = make_lbls_onehot(test_lbls, C_classes)\n",
+ "print(\"BEFORE: [train_lbls] Type: \", type(train_lbls), \"|| Shape:\", train_lbls.shape, \" || Data type: \", train_lbls.dtype )\n",
+ "print(\"AFTER : [train_lbls_onehot] Type: \", type(train_lbls_onehot), \"|| Shape:\", train_lbls_onehot.shape, \" || Data type: \", train_lbls_onehot.dtype )"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# b) Normalize image intensities to have mean=0 and std=1 across the whole database.\n",
+ "# This commonly facilitates learning:\n",
+ "# - A zero-centered signal with small magnitude allows avoiding exploding/vanishing problems easier.\n",
+ "# - Features of arbitrary & heterogeneous ranges are harder to learn how to model.\n",
+ "def normalize_int_whole_database(data):\n",
+ " # data: shape [num_samples, H, W] for grayscale MNIST (or [num_samples, H, W, C] for multi-channel data)\n",
+ " mu = np.mean(data, axis=(0,1,2), keepdims=True) # Mean intensity over samples and pixels (per channel, if any).\n",
+ " std = np.std(data, axis=(0,1,2), keepdims=True) # Same shape as mu, broadcastable against data.\n",
+ " norm_data = (data - mu) / std\n",
+ " return norm_data\n",
+ "\n",
+ "train_imgs = normalize_int_whole_database(train_imgs)\n",
+ "test_imgs = normalize_int_whole_database(test_imgs)\n",
+ "\n",
+ "# Lets plot one image.\n",
+ "index = 0 # Try any index from 0 to 59999\n",
+ "print(\"Plotting image of index: [\", index, \"]\")\n",
+ "print(\"Class label for this image is: \", train_lbls[index])\n",
+ "print(\"One-hot label representation: [\", train_lbls_onehot[index], \"]\")\n",
+ "plot_image(train_imgs[index])\n",
+ "# Notice the magnitude of intensities. Black is now negative and white is positive float."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# c) Flatten the images, from 2D matrices to 1D vectors. MLPs take feature-vectors as input, not 2D images.\n",
+ "train_imgs_flat = train_imgs.reshape([train_imgs.shape[0], -1]) # Preserve 1st dim (N = num samples), flatten others.\n",
+ "test_imgs_flat = test_imgs.reshape([test_imgs.shape[0], -1])\n",
+ "print(\"Shape of numpy array holding the training database:\")\n",
+ "print(\"Original : [S, H, W] = [\", train_imgs.shape , \"]\")\n",
+ "print(\"Flattened: [S, H*W] = [\", train_imgs_flat.shape , \"]\")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Task 1: Stochastic Gradient Descent for Training Neural Nets\n",
+ "\n",
+ "Below you are given the main training function, `gradient_descent`. This will be called by all following parts of the tutorial. \n",
+ "The function takes a model and data and performs an iteration of gradient descent. Every few steps, it will also predict the testing data and report accuracy.\n",
+ "\n",
+ "If the training database is large, processing the whole database per iteration can be very slow. An efficient alternative is **stochastic** gradient descent, where a training batch is randomly sampled per iteration.\n",
+ "\n",
+ "In the code below, make gradient descent stochastic by sampling a **random** batch per iteration instead of always the same training samples.\n",
+ "\n",
+ "(*Task 1 can be completed independently from Task 2. Try both before and after, to observe the different behaviour.*)\n"
+ ]
+ },
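+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Concretely, at iteration $t$ SGD updates the parameters $\\theta$ with the gradient of the loss $L$ computed on a batch $B_t$ of $N$ samples drawn uniformly at random from the training set:\n",
+ "\n",
+ "$$ \\theta_{t+1} = \\theta_t - \\eta \\, \\nabla_\\theta \\frac{1}{N} \\sum_{(x,y) \\in B_t} L(f_\\theta(x), y), $$\n",
+ "\n",
+ "where $\\eta$ is the learning rate. In expectation the batch gradient equals the full-database gradient, which is why randomly sampled batches work where a fixed subset would bias training."
+ ]
+ },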
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "def plot_train_progress(loss_l, acc_train_l, acc_test_l, iters_per_point, total_iters=None):\n",
+ "\n",
+ " fig, axes = plt.subplots(1, 2, sharex=False, sharey=False)\n",
+ " assert len(loss_l) == len(acc_train_l) == len(acc_test_l)\n",
+ " x_points = range(0, len(loss_l)*iters_per_point, iters_per_point)\n",
+ " \n",
+ " axes[0].plot(x_points, loss_l, color=\"black\", label=\"Training loss\", linewidth=5)\n",
+ " axes[0].set_title(\"Training loss\", fontsize=10, y=1.022)\n",
+ " axes[0].yaxis.grid(True, zorder=0)\n",
+ " axes[0].set_xlabel('Iteration', fontsize=10)\n",
+ " if total_iters is not None:\n",
+ " axes[0].set_xlim([0,total_iters])\n",
+ " axes[0].set_ylim([0,None])\n",
+ " axes[0].legend(loc='upper right')\n",
+ " \n",
+ " axes[1].set_title(\"Accuracy\", fontsize=10, y=1.022)\n",
+ " axes[1].plot(x_points, acc_train_l, color=\"blue\", label=\"Train\", linewidth=5)\n",
+ " axes[1].plot(x_points, acc_test_l, color=\"red\", label=\"Test\", linewidth=5)\n",
+ " axes[1].yaxis.grid(True, zorder=0)\n",
+ " axes[1].set_xlabel('Iteration', fontsize=10)\n",
+ " if total_iters is not None:\n",
+ " axes[1].set_xlim([0,total_iters])\n",
+ " axes[1].set_ylim([0,100])\n",
+ " axes[1].legend(loc='lower right')\n",
+ " \n",
+ " plt.show()\n",
+ " \n",
+ " \n",
+ "def get_random_batch(train_imgs, train_lbls, batch_size, rng):\n",
+ " # train_imgs: Images for training. Numpy array of shape [S, H, W]\n",
+ " # train_lbls: Labels of the training images. Numpy array [S], one integer for each of S samples.\n",
+ " # batch_size: integer. Size that the batch should have.\n",
+ " \n",
+ " ####### TODO: Sample a random batch of images for training. Fill in the blanks (???) ######### \n",
+ " indices = rng.randint(low=0, high=train_imgs.shape[0], size=batch_size, dtype='int32')\n",
+ " ##############################################################################################\n",
+ " \n",
+ " train_imgs_batch = train_imgs[indices]\n",
+ " train_lbls_batch = train_lbls[indices]\n",
+ " return [train_imgs_batch, train_lbls_batch]\n",
+ "\n",
+ "def gradient_descent(net, loss_func, update_params_func, rng,\n",
+ " train_imgs, train_lbls, test_imgs, test_lbls,\n",
+ " batch_size, learning_rate, total_iters, iters_per_test=-1 ):\n",
+ " # net: Instance of a model. See classes: MLP_Numpy, MLP_Torch, MLP_Torch_Autograd, CNN_Torch_Autograd.\n",
+ " # loss_func: Function that computes the loss. See functions: cross_entropy_numpy/torch.\n",
+ " # update_params_func: Function performing SGD parameter updates. See: grad_descent_update_numpy/torch/autograd\n",
+ " # rng: numpy random number generator\n",
+ " # train_imgs: The training images. Numpy array, shape [S_tr, H, W]\n",
+ " # test_imgs: Same as above, but for testing images. [S_te, H, W]\n",
+ " # train_lbls: One hot representation of labels corresponding to train_imgs. Numpy array, shape [S_tr, C]\n",
+ " # test_lbls: As above, but for testing data. [S_te, C]\n",
+ " # batch_size: Size N of the batch that should be processed per SGD iteration by a model.\n",
+ " # learning_rate: self explanatory.\n",
+ " # total_iters: how many iterations in total to perform.\n",
+ " # iters_per_test: Integer. Every that many iterations, the model predicts on the test data and accuracy is reported.\n",
+ " values_to_plot = {'loss':[], 'acc_train': [], 'acc_test': []}\n",
+ " \n",
+ " for t in range(total_iters):\n",
+ " # Sample batch for this SGD iteration\n",
+ " train_imgs_batch, train_lbls_batch = get_random_batch(train_imgs, train_lbls, batch_size, rng)\n",
+ " \n",
+ " # Forward pass\n",
+ " y_pred = net.forward_pass(train_imgs_batch)\n",
+ "\n",
+ " # Compute loss: Cross Entropy\n",
+ " y_real = train_lbls_batch\n",
+ " loss = loss_func(y_pred, y_real)\n",
+ "\n",
+ " # Backwards pass. Compute gradients.\n",
+ " grads = net.backward_pass(loss, y_real)\n",
+ "\n",
+ " # Update weights with gradient descent\n",
+ " update_params_func(net.params, grads, learning_rate=learning_rate)\n",
+ " \n",
+ " \n",
+ " # ==== Report training loss and accuracy ======\n",
+ " # y_pred and loss can be either np.array, or torch.tensor (see later). If tensor, make it np.array.\n",
+ " y_pred_numpy = y_pred if type(y_pred) is np.ndarray else y_pred.detach().numpy()\n",
+ " y_pred_lbls = np.argmax(y_pred_numpy, axis=1) # y_pred holds class probabilities. Argmax gives the hard label (class index).\n",
+ " y_real_lbls = np.argmax(y_real, axis=1)\n",
+ " \n",
+ " acc_train = np.mean(y_pred_lbls == y_real_lbls) * 100. # percentage\n",
+ " \n",
+ " loss_numpy = loss if isinstance(loss, float) else loss.item() # loss is a plain/numpy float, or a torch scalar tensor.\n",
+ " print(\"[iter:\", t, \"]: Training Loss: {0:.2f}\".format(loss_numpy), \"\\t Accuracy: {0:.2f}\".format(acc_train))\n",
+ " \n",
+ " # =============== Every few iterations, predict the testing-database ================#\n",
+ " if t==total_iters-1 or t%iters_per_test == 0:\n",
+ " y_pred_test = net.forward_pass(test_imgs)\n",
+ " # ==== Report test accuracy ======\n",
+ " y_pred_test_numpy = y_pred_test if type(y_pred_test) is np.ndarray else y_pred_test.detach().numpy()\n",
+ " y_pred_lbls_test = np.argmax(y_pred_test_numpy, axis=1)\n",
+ " y_real_lbls_test = np.argmax(test_lbls, axis=1)\n",
+ " acc_test = np.mean(y_pred_lbls_test == y_real_lbls_test) * 100.\n",
+ " print(\"\\t\\t\\t\\t\\t\\t\\t\\t Testing Accuracy: {0:.2f}\".format(acc_test))\n",
+ " \n",
+ " # Keep list of metrics to plot progress.\n",
+ " values_to_plot['loss'].append(loss_numpy)\n",
+ " values_to_plot['acc_train'].append(acc_train)\n",
+ " values_to_plot['acc_test'].append(acc_test)\n",
+ " # At the end of the process, plot loss and accuracy on training and testing data.\n",
+ " plot_train_progress(values_to_plot['loss'], values_to_plot['acc_train'], values_to_plot['acc_test'], iters_per_test)\n",
+ " "
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Task 2: Classification with Multi-Layer Perceptron (MLP) in Numpy\n",
+ "\n",
+ "We will use Numpy to build a *Multi-Layer Perceptron* for classification, i.e. a neural network (NN) with hidden layer(s). \n",
+ "In the previous tutorial we used *logistic regression*. An MLP is very similar, but additionally contains at least one hidden layer.\n",
+ "Let's build this in Numpy. Please consult the previous tutorial on logistic regression as a reference.\n",
+ "\n",
+ "*(If you get stuck implementing the backward pass in Task 2 or Task 3, you can continue straight with Task 4.)*\n",
+ "\n",
+ "**Hint:** To debug all the below sections, it is very useful to print the shapes of arrays/tensors (with `print(tensor_name.shape)`). Follow how they change layer after layer, to double-check that behaviour is as expected."
+ ]
+ },
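+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "*(Optional extra, nothing to fill in here.)* Below is a small Numpy shape sanity-check, following the hint above that tracking shapes layer after layer is the quickest way to debug a forward pass. The sizes (N=4 samples, 6 features, a 3-unit layer) are made up for illustration and are not the tutorial's data; the pattern is the one used in forward_pass(): append a unary bias feature, apply a dot product, and keep summed axes with keepdims for broadcasting."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import numpy as np\n",
+ "demo_x = np.ones((4, 6)) # hypothetical batch: N=4 samples, 6 features each\n",
+ "demo_bias = np.ones((4, 1)) # extra unary feature for the bias, as in forward_pass()\n",
+ "demo_x_ext = np.concatenate((demo_x, demo_bias), axis=1)\n",
+ "print(demo_x_ext.shape) # (4, 7)\n",
+ "demo_w = np.ones((7, 3)) # maps 7 input features to a hypothetical 3-unit layer\n",
+ "demo_h = demo_x_ext.dot(demo_w)\n",
+ "print(demo_h.shape) # (4, 3): one activation vector per sample\n",
+ "demo_sum = np.sum(demo_h, axis=1, keepdims=True)\n",
+ "print(demo_sum.shape) # (4, 1): keepdims keeps the summed axis, so it can broadcast against [4, 3]"
+ ]
+ },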
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# -*- coding: utf-8 -*- \n",
+ "import numpy as np\n",
+ " \n",
+ "class MLP_Numpy():\n",
+ " def __init__(self, rng):\n",
+ " # Construct and initialize network parameters\n",
+ " D_in = H_height*W_width # Dimension of input feature-vectors. Length of a vectorised image.\n",
+ " D_hid = 1000 # Dimension of Hidden layer.\n",
+ " D_out = C_classes # Dimension of Output layer. Number of classes \n",
+ " \n",
+ " ###### TODO: Initialize parameters of MLP by sampling from N(0,0.01). Fill the blanks #####\n",
+ " # Also see forward_pass(), to see how they are used. It will help you infer correct shapes.\n",
+ " w1 = rng.normal(loc=0.0, scale=0.01, size=(D_in+1, ?????)) # D_in+1 because we add bias in forward pass\n",
+ " w2 = rng.normal(loc=0.0, scale=0.01, size=(??????, D_out)) # Params of 2nd layer\n",
+ " self.params = [w1, w2]\n",
+ " #################################################################################\n",
+ " \n",
+ " def forward_pass(self, batch_imgs):\n",
+ " # This function applies the model to the input. It essentially defines the model's architecture.\n",
+ " # Input: batch_imgs: Input batch. Numpy.array of [N=number of samples, H=height, W=width] floats.\n",
+ " # Returns: y_pred: array of [N, D_out] floats. y_pred[i] contains class posterior (probs) for sample i.\n",
+ " [w1, w2] = self.params\n",
+ " \n",
+ " unary_feature_for_bias = np.ones(shape=(batch_imgs.shape[0], 1)) # [N, 1] column vector.\n",
+ " x = np.concatenate((batch_imgs, unary_feature_for_bias), axis=1) # Extra feature=1 for bias.\n",
+ " \n",
+ " ################ TODO: Fill the blanks (???) ##############################################\n",
+ " # Build network of form: x -> fully connected -> relu -> fully connected -> softmax \n",
+ " # Hidden layer\n",
+ " h1 = x.dot(?????)\n",
+ " h1_relu = np.maximum(????, ????)\n",
+ " # Fully-connected classifier.\n",
+ " h1_ext = np.concatenate((?????, unary_feature_for_bias), axis=1) # Extend for bias. Output shape: [N, D_hid+1]\n",
+ " h2 = ????.????(w2)\n",
+ " ????? = h2\n",
+ " ############################################################################################\n",
+ " \n",
+ " # Softmax activation function.\n",
+ " exp_logits = np.exp(logits)\n",
+ " y_pred = exp_logits / np.sum(exp_logits, axis=1, keepdims=True) \n",
+ " # sum with keepdims=True returns a [N,1] array. It would be [N] if keepdims=False.\n",
+ " # Numpy broadcasts [N,1] to [N,D_out] via repetition, to divide exp_logits (which is [N,D_out]) elementwise.\n",
+ " \n",
+ " # Store activations. Will be needed for backprop.\n",
+ " self.__activations = [x, h1, h1_relu, h1_ext, h2, y_pred]\n",
+ " \n",
+ " return y_pred\n",
+ " \n",
+ " \n",
+ " def backward_pass(self, loss, y_real): # NOTE: Loss value is not directly needed for computing grads in closed form.\n",
+ " # Performs back-propagation in closed form.\n",
+ " # Computes the gradient of the loss with respect to trainable parameters.\n",
+ " # y_real: Array [N, D_out]. y_real[i] is one-hot representation of real training label for i-th sample.\n",
+ " # returns grads: list of arrays, one for each trainable parameter. \n",
+ " \n",
+ " [x, h1, h1_relu, h1_ext, h2, y_pred] = self.__activations\n",
+ " [w1, w2] = self.params\n",
+ " \n",
+ " N = y_pred.shape[0]\n",
+ " D_out = y_pred.shape[1]\n",
+ " \n",
+ " # Derivative of cross entropy wrt the input to softmax (derived directly, skipping intermediate terms)\n",
+ " # Difficult derivation. Nice explanation at: https://deepnotes.io/softmax-crossentropy\n",
+ " grad_logits = (y_pred - y_real) / N \n",
+ " \n",
+ " ##### TODO: Fill in the blanks (???) to complete the back-prop for the 2-layers MLP ###########\n",
+ " # Consult Tutorial 1 to compare with Logistic Regression.\n",
+ " grad_w2 = h1_ext.transpose().dot(?????) # shape: [D_hid+1, D_out]\n",
+ " grad_h1_ext = grad_logits.dot(w2.transpose()) # shape: [N, D_hid+1]\n",
+ " grad_h1_relu = grad_h1_ext[:, :-1] # Drop the extra feature dimension for the bias [N, D_hid]\n",
+ " grad_h1 = grad_h1_relu.copy() # Derivatives of activation functions.\n",
+ " grad_h1[ ????? < 0] = 0 # Derivatives of activation functions.\n",
+ " grad_w1 = ?????.transpose().dot(grad_h1) # Shape [D_in+1, D_hid]\n",
+ " ###############################################################################################\n",
+ " \n",
+ " # Elements of below list must be in same order as in self.params.\n",
+ " # Each of the subarrays has the same shape as that of corresponding trainable parameter in self.params.\n",
+ " grads = [grad_w1, grad_w2]\n",
+ " return grads\n",
+ " \n",
+ " \n",
+ "def cross_entropy_numpy(y_pred, y_real, eps=1e-7):\n",
+ " # Cross entropy\n",
+ " # y_pred: Predicted class-posterior probabilities, returned by forward_pass. Matrix of shape [N, D_out]\n",
+ " # y_real: One-hot representation of real training labels. Same shape as y_pred.\n",
+ " \n",
+ " ################# TODO: Complete the calculation of cross-entropy for each sample ################\n",
+ " x_entr_per_sample = - np.????( ???? * np.????( ????? + eps), axis=1) # Sum over classes, axis=1\n",
+ " ##################################################################################################\n",
+ " \n",
+ " loss = np.mean(x_entr_per_sample, axis=0) # Expectation of loss: Mean over samples (axis=0).\n",
+ " return loss\n",
+ "\n",
+ "\n",
+ "def grad_descent_update_numpy(params, grads, learning_rate):\n",
+ " # params: list of trainable parameters, created in initializer of the model.\n",
+ " # grads: list of gradients, returned by backward pass of model. grads[i] corresponds to params[i]\n",
+ " # learning_rate: a float.\n",
+ " assert len(params)==len(grads)\n",
+ " for i in range(len(params)):\n",
+ " ##### TODO: Perform the updates of the parameters by filling the blanks (????) ###########\n",
+ " params[i] = ????? - ????? * grads[i]\n",
+ " ##########################################################################################\n",
+ " \n",
+ "# Create the network\n",
+ "SEED = 42\n",
+ "rng = np.random.RandomState(seed=SEED) # Random number generator\n",
+ "net = MLP_Numpy(rng=rng)\n",
+ "# Start training\n",
+ "gradient_descent(net,\n",
+ " cross_entropy_numpy, # We are using the Numpy version we wrote above.\n",
+ " grad_descent_update_numpy, # We are using the Numpy version we wrote above.\n",
+ " rng,\n",
+ " train_imgs_flat,\n",
+ " train_lbls_onehot,\n",
+ " test_imgs_flat,\n",
+ " test_lbls_onehot,\n",
+ " batch_size=40, \n",
+ " learning_rate=1e-2,\n",
+ " total_iters=400,\n",
+ " iters_per_test=10)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "If you build the MLP correctly, the above should result in a report that shows the model is learning, with training *loss* decreasing towards zero, while training and testing *accuracy* increasing.\n",
+ "\n",
+ "When the process finishes (400 iterations), the metrics will be plotted at the bottom of the report (**scroll down if needed**).\n",
+ "\n",
+ "If you have *not* changed gradient_descent() in Task 1, you will see the model overfitting: training accuracy reaches 100\\%, while testing accuracy finishes at 59.57\\%.\n",
+ "\n",
+ "**Q: Why does the model overfit if Task 1 has not been addressed?** \n",
+ "\n",
+ "If you have made gradient descent *stochastic* (random), overfitting will not occur with this small network, but you will notice instability in the plotted metrics. Training and testing accuracy should finish at 87.50\\% and 88.15\\% respectively.\n",
+ "\n",
+ "**Q: What factors cause the instability in the observed metrics, if Task 1 has been addressed?** "
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Task 3: MLP in PyTorch\n",
+ "\n",
+ "We now introduce PyTorch, a library built for the development of deep neural networks. It will be used for the rest of the tutorials and the coursework, \n",
+ "so it is very important to understand how it works.\n",
+ "\n",
+ "Below we will write *exactly the same* MLP as in Task 2, translated to PyTorch. The changes you should make correspond to those in Task 2, just in a different (but very similar) syntax. This will show you how similar writing Numpy and PyTorch code is."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# -*- coding: utf-8 -*-\n",
+ "import torch\n",
+ "\n",
+ "\n",
+ "class MLP_Torch():\n",
+ " def __init__(self, rng):\n",
+ " # Construct and initialize network parameters\n",
+ " D_in = H_height*W_width # Dimension of input feature-vectors. Length of a vectorised image.\n",
+ " D_hid = 1000 # Dimension of Hidden layer.\n",
+ " D_out = C_classes # Dimension of Output layer. Number of classes \n",
+ " \n",
+ " ######## TODO: Initialize parameters of the MLP by sampling them from N(0,0.01). ###########\n",
+ " # For Pytorch, we first make a numpy array with the *initial* weights...\n",
+ " w1_init = rng.normal(loc=0.0, scale=0.01, size=(D_in+1, ?????)) # Similar to Task1\n",
+ " w2_init = rng.normal(loc=0.0, scale=0.01, size=(??????, ?????)) # Similar to Task1\n",
+ " # ... and then we initialize with them the actual *tensor*. This one will change with training.\n",
+ " w1 = torch.tensor(w1_init, dtype=torch.float)\n",
+ " w2 = torch.tensor(w2_init, dtype=torch.float)\n",
+ " self.params = [w1, w2]\n",
+ " ##########################################################################################\n",
+ " \n",
+ " def forward_pass(self, batch_imgs):\n",
+ " # compute predicted y\n",
+ " [w1, w2] = self.params\n",
+ " \n",
+ " batch_imgs_t = torch.tensor(batch_imgs, dtype=torch.float)\n",
+ " \n",
+ " unary_feature_for_bias = torch.ones(size=(batch_imgs.shape[0], 1)) # [N, 1] column vector.\n",
+ " x = torch.cat((batch_imgs_t, unary_feature_for_bias), dim=1) # Extra feature=1 for bias.\n",
+ " \n",
+ " ################ TODO: Fill the blanks (???) ##############################################\n",
+ " # Build network of form: x -> fully connected -> relu -> fully connected -> softmax \n",
+ " # Hidden layer\n",
+ " h1 = x.mm(?????)\n",
+ " h1_relu = ????.clamp(min=????)\n",
+ " # Fully-connected classifier (aka Densely connected classifier)\n",
+ " h1_ext = torch.cat((?????, unary_feature_for_bias), dim=????)\n",
+ " h2 = ????.????(w2)\n",
+ " ????? = h2\n",
+ " ######################################################################################\n",
+ " \n",
+ " # Softmax activation function.\n",
+ " exp_logits = torch.exp(logits)\n",
+ " y_pred = exp_logits / torch.sum(exp_logits, dim=1, keepdim=True) \n",
+ " # sum with keepdim=True returns a [N,1] tensor. It would be [N] if keepdim=False.\n",
+ " # Torch broadcasts [N,1] to [N,D_out] via repetition, to divide exp_logits (which is [N,D_out]) elementwise.\n",
+ "\n",
+ " # Store activations. Will be needed for backprop.\n",
+ " self.__activations = [x, h1, h1_relu, h1_ext, h2, y_pred]\n",
+ " \n",
+ " return y_pred \n",
+ " \n",
+ " \n",
+ " def backward_pass(self, loss, y_real): # NOTE: The loss value is not needed here; grads are computed in closed form.\n",
+ " # Performs back-propagation in closed form.\n",
+ " # Computes the gradient of the loss with respect to trainable parameters.\n",
+ " # y_real: Array [N, D_out]. y_real[i] is one-hot representation of real training label for i-th sample.\n",
+ " # returns grads: list of arrays, one for each trainable parameter. \n",
+ " \n",
+ " [x, h1, h1_relu, h1_ext, h2, y_pred] = self.__activations\n",
+ " [w1, w2] = self.params\n",
+ " \n",
+ " y_real = torch.tensor(y_real, dtype=torch.float) if type(y_real) is np.ndarray else y_real\n",
+ " \n",
+ " N = y_pred.shape[0]\n",
+ " D_out = y_pred.shape[1]\n",
+ " \n",
+ " # Backprop from loss all the way back, to compute gradients of trainable parameters with respect to loss.\n",
+ " \n",
+ " # Derivative of cross entropy wrt the input to softmax (derived directly, skipping intermediate terms)\n",
+ " # Difficult derivation. Nice explanation at: https://deepnotes.io/softmax-crossentropy\n",
+ " grad_logits = (y_pred - y_real) / N \n",
+ " \n",
+ " ##### TODO: Fill in the blanks (???) to complete the back-prop for the 2-layers MLP ###########\n",
+ " # Compare with Numpy, it may help.\n",
+ " grad_w2 = h1_ext.t().mm(?????) # shape: [D_hid+1, D_out]\n",
+ " grad_h1_ext = grad_logits.mm(????.????()) # shape: [N, D_hid+1]. Compare with Numpy, it may help.\n",
+ " grad_h1_relu = grad_h1_ext[:, :-1] # Drop the extra feature dimension for the bias [N, D_hid]\n",
+ " grad_h1 = grad_h1_relu.clone()\n",
+ " grad_h1[????? < 0] = 0 # Derivatives of activation functions.\n",
+ " grad_w1 = ????.t().mm(grad_h1)\n",
+ " ################################################################################################\n",
+ " \n",
+ " # Elements of below list must be in same order as in self.params.\n",
+ " # Each of the subarrays has the same shape as that of corresponding trainable parameter in self.params.\n",
+ " grads = [grad_w1, grad_w2]\n",
+ " return grads\n",
+ " \n",
+ "\n",
+ "# Also used by the Autograd and CNN tasks below.\n",
+ "def cross_entropy_torch(y_pred, y_real, eps=1e-7):\n",
+ " # Cross entropy\n",
+ " # y_pred: Predicted class-posterior probabilities, returned by forward_pass. Numpy array of shape [N, D_out]\n",
+ " # y_real: One-hot representation of real training labels. Same shape as y_pred.\n",
+ " \n",
+ " # If a numpy array is given, convert it to a Torch tensor.\n",
+ " y_pred = torch.tensor(y_pred, dtype=torch.float) if type(y_pred) is np.ndarray else y_pred\n",
+ " y_real = torch.tensor(y_real, dtype=torch.float) if type(y_real) is np.ndarray else y_real\n",
+ " \n",
+ " ######## TODO: (similar to numpy) Complete the calculation of cross-entropy for each sample ###########\n",
+ " x_entr_per_sample = - torch.????( ???? * torch.????( ???? + eps ), dim=1) # Sum over classes, axis=1\n",
+ " #######################################################################################################\n",
+ " \n",
+ " loss = torch.mean(x_entr_per_sample, dim=0) # Expectation of loss: Mean over samples (axis=0).\n",
+ " return loss\n",
+ "\n",
+ "\n",
+ "def grad_descent_update_torch(params, grads, learning_rate):\n",
+ " # params: list of trainable parameters, created in initializer of the model.\n",
+ " # grads: list of gradients, returned by backward pass of model. grads[i] corresponds to params[i]\n",
+ " # learning_rate: a float.\n",
+ " assert len(params)==len(grads)\n",
+ " for i in range(len(params)):\n",
+ " ############## TODO: Perform the updates of the parameters ###############################\n",
+ " # Similar to Numpy, but for torch tensors you should use the param.data and grad.data for assignments.\n",
+ " params[i].data = ??????.data - ??????? * grads[i].data\n",
+ " ##########################################################################################\n",
+ " \n",
+ "# Create the network\n",
+ "rng = np.random.RandomState(seed=SEED) # Random number generator\n",
+ "net = MLP_Torch(rng=rng)\n",
+ "# Start training\n",
+ "gradient_descent(net,\n",
+ " cross_entropy_torch, # We are using the Torch version we wrote above.\n",
+ " grad_descent_update_torch, # We are using the Torch version we wrote above.\n",
+ " rng,\n",
+ " train_imgs_flat,\n",
+ " train_lbls_onehot,\n",
+ " test_imgs_flat,\n",
+ " test_lbls_onehot,\n",
+ " batch_size=40, \n",
+ " learning_rate=1e-2,\n",
+ " total_iters=400,\n",
+ " iters_per_test=10)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "If the correct changes are made, you should get *exactly* the same results as in Task 2.\n",
+ "\n",
+ "**For more material on transitioning from Numpy to PyTorch, you can check this tutorial: https://pytorch.org/tutorials/beginner/pytorch_with_examples.html**"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Task 4: Auto-differentiation (aka Autograd in PyTorch)\n",
+ "\n",
+ "Hopefully Task 3 convinced you that writing Numpy and writing PyTorch code are quite similar. Why bother with PyTorch, then?\n",
+ "\n",
+ "Because it offers auto-differentiation (called autograd in PyTorch). In Task 2 we wrote the back-propagation manually. For this, we had to derive the derivative of each of the network's components and implement it, which gets extremely tedious for larger nets. Libraries built for deep learning (like PyTorch, TensorFlow, Theano, Chainer, etc.) offer auto-differentiation: when the forward pass is implemented, the framework keeps track of the operations performed on the tensors. The backend already has an implementation of each operation's derivative, so during backprop the library applies the chain rule automatically.\n",
+ "\n",
+ "Below, we implement a new MLP class, inheriting from the Task 3 class, and make only the minor changes required to use Autograd in the backward pass..."
+ ]
+ },
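+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "*(Optional extra, nothing to fill in here.)* A minimal autograd sketch on a toy tensor (unrelated to the model below), showing the two mechanics this task relies on: backward() fills .grad with the chain-rule result, and .grad *accumulates* across calls until explicitly zeroed."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import torch\n",
+ "tx = torch.tensor([2.0, 3.0], requires_grad=True) # tells torch to record ops on tx\n",
+ "ty = (tx ** 2).sum() # forward pass: y = sum(x^2)\n",
+ "ty.backward() # autograd applies the chain rule backwards\n",
+ "print(tx.grad) # dy/dx = 2x -> tensor([4., 6.])\n",
+ "(tx ** 2).sum().backward() # a second backward pass...\n",
+ "print(tx.grad) # tensor([8., 12.]): gradients were ACCUMULATED, not replaced\n",
+ "tx.grad.zero_() # hence we must zero grads between iterations"
+ ]
+ },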
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# -*- coding: utf-8 -*-\n",
+ "import torch\n",
+ "\n",
+ "class MLP_Torch_Autograd(MLP_Torch):\n",
+ " def __init__(self, rng):\n",
+ " # Construct and initialize network parameters\n",
+ " D_in = H_height*W_width # Dimension of input feature-vectors. Length of a vectorised image.\n",
+ " D_hid = 1000 # Dimension of Hidden layer.\n",
+ " D_out = C_classes # Dimension of Output layer. Number of classes \n",
+ " \n",
+ " ######## TODO: Add a hidden layer, to change this perceptron to an MLP ##########\n",
+ " # EXACTLY as above for MLP_Torch (copy-paste)...\n",
+ " w1_init = rng.normal(loc=0.0, scale=0.01, size=(D_in+1, ????)) # Similar to task 1/2\n",
+ " w2_init = rng.normal(loc=0.0, scale=0.01, size=(?????, ????)) # Similar to task 1/2\n",
+ " # ... EXCEPT for passing one more argument!\n",
+ " # requires_grad=True tells Torch that it will later need to automatically compute grads for this!\n",
+ " w1 = torch.tensor(w1_init, dtype=torch.float, requires_grad=????) # <------- Important. Set it!\n",
+ " w2 = torch.tensor(w2_init, dtype=torch.float, requires_grad=????) # <------- Important Set it!\n",
+ " self.params = [w1, w2]\n",
+ " ###########################################################################\n",
+ " \n",
+ " \n",
+ " def forward_pass(self, batch_imgs):\n",
+ " return MLP_Torch.forward_pass(self, batch_imgs) # Calls parent's (MLP_Torch). No change.\n",
+ " \n",
+ " def backward_pass(self, loss, y_real):\n",
+ " ######### TODO: Return automatically computed gradients in a list (ala task1/2) ################\n",
+ " # We no longer need to write down the backward pass. For parameters created with requires_grad=True,\n",
+ " # calling loss.backward() makes torch's Autograd automatically compute the grad of the loss wrt each parameter p...\n",
+ " # ... and put it in p.grad. Return the grads in a list.\n",
+ " loss.backward()\n",
+ " grads = [param.????? for param in self.params]\n",
+ " return grads\n",
+ " ################################################################################################\n",
+ " \n",
+ "\n",
+ "def grad_descent_update_autograd(params, grads, learning_rate):\n",
+ " # params: Type: Tensor. List of trainable parameters, created in initializer of the model. \n",
+ " # grads: Type: Tensor. List of gradients, returned by backward pass. grads[i] corresponds to params[i].\n",
+ " # learning_rate: a float.\n",
+ " assert len(params)==len(grads)\n",
+ " for i in range(len(params)):\n",
+ " \n",
+ " ############## TODO: Perform the updates of the parameters ###############################\n",
+ " # Same as function grad_descent_update_torch() previously (copy paste).\n",
+ " ?????????.data -= ??????????? * ?????????.data\n",
+ " # IMPORTANT: Weirdness of Torch's Autograd: Need to manually set gradients to zero at end of iteration.\n",
+ " # ... otherwise loss.backward() keeps accumulating them, and you would get wrong results.\n",
+ " ?????[i].zero_() # *** IMPORTANT. REMEMBER THIS FOR FUTURE & COURSEWORK! ***\n",
+ " ##########################################################################################\n",
+ "\n",
+ "# Create the network\n",
+ "rng = np.random.RandomState(seed=SEED) # Random number generator\n",
+ "net = MLP_Torch_Autograd(rng=rng)\n",
+ "# Start training\n",
+ "gradient_descent(net,\n",
+ " cross_entropy_torch, # Same as before.\n",
+ " grad_descent_update_autograd, # Use the above version, with auto-differentiation.\n",
+ " rng,\n",
+ " train_imgs_flat,\n",
+ " train_lbls_onehot,\n",
+ " test_imgs_flat,\n",
+ " test_lbls_onehot,\n",
+ " batch_size=40, \n",
+ " learning_rate=1e-2,\n",
+ " total_iters=400,\n",
+ " iters_per_test=10)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "If completed correctly, the results should be *exactly* the same as in Tasks 2 and 3. \n",
+ "**Please note** the implementation detail of zeroing the gradients at the end of grad_descent_update_autograd(), which is needed whenever Autograd is used.\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Task 5: Building a basic Convolutional Neural Network in PyTorch\n",
+ "\n",
+ "The predominant neural networks used in computer vision are Convolutional Neural Networks (CNNs). Their most popular components are convolutions, for processing information and deriving more complex features, and pooling layers for aggregating & downsampling the information.\n",
+ "\n",
+ "Below we will familiarize ourselves with convolutions and pooling in PyTorch. We will change the 2-layer MLP of the previous tasks into a CNN of the form: \n",
+ "` x -> conv -> bias -> pool -> flatten -> bias -> fully connected layer -> y_pred` \n",
+ "\n",
+ "Aims:\n",
+ "- Understand the corresponding functions of PyTorch.\n",
+ "- Understand how each convolutional/pooling layer changes the dimensions of the input feature maps. \n",
+ "\n",
+ "These points are important for building more complex models in later tutorials/courseworks, so spend some time experimenting."
+ ]
+ },
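+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "*(Optional extra, nothing to fill in here.)* A useful tool for reasoning about how conv/pool layers change spatial dimensions: the standard output-size formula along one dimension is `out = floor((in + 2*padding - kernel) / stride) + 1`. The sketch below applies it to a *hypothetical* 32x32 input, deliberately not the tutorial's image size, so it does not give away the D_in_fc question in the next cell."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "def out_size(n, kernel, stride=1, padding=0):\n",
+ "    # Spatial output size of a conv/pool layer along one dimension (floor mode).\n",
+ "    return (n + 2 * padding - kernel) // stride + 1\n",
+ "\n",
+ "h = out_size(32, kernel=5) # unpadded 5x5 conv: 32 -> 28\n",
+ "print(h)\n",
+ "h = out_size(h, kernel=2, stride=2) # 2x2 max-pool with stride 2: 28 -> 14\n",
+ "print(h) # a [C, 32, 32] input would thus flatten to C*14*14 features per sample"
+ ]
+ },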
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# -*- coding: utf-8 -*-\n",
+ "import torch\n",
+ "from torch.nn import functional as F\n",
+ "\n",
+ "\"\"\"\n",
+ "We provide the main API of the conv2d and max_pool2d operations here for convenience:\n",
+ "\n",
+ "conv2d(input, weight, bias=None, stride=1, padding=0, dilation=1)\n",
+ " input: input tensor of shape: (minibatch, in_channels, Height, Width)\n",
+ " weight: filters of shape: (out_channels, in_channels, kernel's Height, kernel's Width)\n",
+ " bias: optional bias tensor of shape: (out_channels).\n",
+ " stride: stride of the convolving kernel. single scalar or (sH, sW).\n",
+ " padding: zero paddings on both sides of input. Single number or tuple (padH, padW).\n",
+ " dilation: dilation rate for dilated convolutions (though we don't discuss it here).\n",
+ " \n",
+ "For docs, search for *conv2d* at: https://pytorch.org/docs/stable/_modules/torch/nn/functional.html\n",
+ "... or create a new cell in this notebook, and after the above import, write and run: F.conv2d?\n",
+ "\n",
+ "F.max_pool2d(input, kernel_size, stride=None, padding=0, ceil_mode=False)\n",
+ " input: input tensor of shape: (minibatch, in_channels, Height, Width)\n",
+ " kernel_size: size of the pooling kernel. (Kernel's Height, kernel's Width)\n",
+ " stride: stride of the pooling kernel as it convolves the input. single scalar or (stride-H, stride-W).\n",
+ " padding: zero paddings on both sides of input. Single number or tuple (padH, padW).\n",
+ " ceil_mode: If True, will use `ceil` instead of `floor` to compute output shape when kernel partly runs out of input borders.\n",
+ "\n",
+ "For docs, search for *max_pool2d* at: https://pytorch.org/docs/stable/_modules/torch/nn/functional.html\n",
+ "... or create a new cell in this notebook, and after the above import, write and run: F.max_pool2d?\n",
+ "\"\"\"\n",
+ "\n",
+ "class CNN_Torch_Autograd():\n",
+ " def __init__(self, rng):\n",
+ " # Construct and initialize network parameters, to build a CNN of the form:\n",
+ " # x -> conv[5x5] -> bias -> maxpool k[2x2],s[2,2] -> flatten -> bias -> fully conn. layer -> y_pred\n",
+ " D_in = 1 # Channels of input image.\n",
+ " D_hid = 10\n",
+ " ######### QUESTION: How is the below number calculated for this architecture? ###################\n",
+ " # D_in_fc is the length of hidden feature vector per sample, given as input to last fully conn. layer.\n",
+ " # Why this number? It's architecture dependent.\n",
+ " # To answer, check the rest of this function and forward_pass() to understand the architecture.\n",
+ " # Consider how a conv and a pool change the shape of the input. Print shapes in forward_pass to investigate.\n",
+ " D_in_fc = 1440\n",
+ " #################################################################################################\n",
+ " D_out = C_classes # Dimension of Output layer. Number of classes \n",
+ " \n",
+ " ######## TODO: Initialize [5x5] convolutional kernels and biases ############\n",
+ " # Initialize conv kernels and biases for CNN of form:\n",
+ " # x -> conv[5x5] -> bias -> maxpool k[2x2],s[2,2] -> flatten -> bias -> fully conn. layer -> y_pred\n",
+ " # Consult the forward pass, to get an idea about the architecture, to help filling this section.\n",
+ " # Notice: biases are implemented differently in CNNs than MLPs....\n",
+ " # ... They are implemented by adding a different *trainable* bias to each channel (feature dimension).\n",
+ " w1_init = rng.normal(loc=0.0, scale=0.01, size=(?????, D_in, 5, ?????)) # See Conv's API for expected shape.\n",
+ " b1_init = rng.normal(loc=0.0, scale=0.01, size=(D_hid)) # One bias per channel of hidden layer\n",
+ " w2_init = rng.normal(loc=0.0, scale=0.01, size=(D_in_fc, ?????)) # Classification layer, similar to MLP.\n",
+ " b2_init = rng.normal(loc=0.0, scale=0.01, size=(?????)) # One bias per output class\n",
+ " w1 = torch.tensor(w1_init, dtype=torch.float, requires_grad=True)\n",
+ " b1 = torch.tensor(b1_init, dtype=torch.float, requires_grad=True)\n",
+ " w2 = torch.tensor(w2_init, dtype=torch.float, requires_grad=True)\n",
+ " b2 = torch.tensor(b2_init, dtype=torch.float, requires_grad=True)\n",
+ " self.params = [w1, b1, w2, b2]\n",
+ " #############################################################################################\n",
+ " \n",
+ " \n",
+ " def forward_pass(self, batch_imgs):\n",
+ " # compute predicted y\n",
+ " [w1, b1, w2, b2] = self.params\n",
+ " \n",
+ " x = torch.tensor(batch_imgs, dtype=torch.float)\n",
+ " \n",
+ " ####################### TODO: Fill in the blanks to create a basic CNN ######################\n",
+ " # Make CNN: x -> conv[5x5] -> bias -> maxpool k[2x2],s[2,2] -> flatten -> bias -> fully conn. layer -> y_pred\n",
+ " # Hidden layer\n",
+ " print(\"[x] shape: \", x.shape)\n",
+ " h1 = F.conv2d(input=??????, ?????=w1, bias=None, stride=1, padding=0, dilation=1)\n",
+ " print(\"[h1] shape: \", h1.shape)\n",
+ " b1_resh = b1.reshape([1, b1.shape[0], 1, 1]) # Add unary dims, so that afterwards it can get broadcasted\n",
+ " h1_bias = ?????? + b1_resh #Unary dims (N,-,W,H) get broadcasted for the addition.\n",
+ " h1_relu = ??????.clamp(min=0)\n",
+ " print(\"[h1_relu] shape: \", h1_relu.shape)\n",
+ " h1_pool = F.max_pool2d(????, kernel_size=[2,2], stride=[2,2], padding=0, ceil_mode=True)\n",
+ " print(\"[h1_pool] shape: \", h1_pool.shape)\n",
+ " # Fully-connected classifier (aka Densely connected classifier)\n",
+ " h1_relu_flat = h1_pool.reshape( [h1_pool.shape[0], -1] ) # Flatten activations, to pass to fully conn layer.\n",
+ " print(\"[h1_relu_flat] shape: \", h1_relu_flat.shape)\n",
+ " h2 = h1_relu_flat.mm(??????)\n",
+ " logits = ????? + ??????.reshape([1, b2.shape[0]]) # Add bias to activation, after reshape for broadcasting.\n",
+ " print(\"[logits] shape: \", logits.shape)\n",
+ " ##############################################################################################\n",
+ " \n",
+ " # Softmax activation function\n",
+ " exp_logits = torch.exp(logits)\n",
+ " y_pred = exp_logits / torch.sum(exp_logits, dim=1, keepdim=True) \n",
+ " # sum with Keepdim=True returns [N,1] array. It would be [N] if keepdim=False.\n",
+ " # Torch broadcasts [N,1] to [N,D_out] via repetition, to divide elementwise exp_h2 (which is [N,D_out]).\n",
+ " \n",
+ " return y_pred\n",
+ " \n",
+ " def backward_pass(self, loss, y_real):\n",
+ " # Same as MLP in autograd.\n",
+ " loss.backward()\n",
+ " grads = [param.grad for param in self.params]\n",
+ " return grads\n",
+ " \n",
+ " \n",
+ "########################## IMPORTANT NOTE (nothing to do) ###############################################\n",
+ "# The MLP receives each sample as a vector. Thus we had vectorized/flattened the 2D images prior to Task 1.\n",
+ "# The CNN processes 2D inputs directly. Vectorizing them is not needed.\n",
+ "# BUT PyTorch receives input to conv/pool of Torch needs to be of shape [N, Channels, H, W], with...\n",
+ "# ... second dimension being the number of input channels (number of input feature maps if hidden layer).\n",
+ "# For grayscale images, there is only 1 image channel. For RGB images, this would be 3. We add this dimension:\n",
+ "IMAGE_CHANNELS = 1\n",
+ "train_imgs_cnn = train_imgs.reshape([train_imgs.shape[0], IMAGE_CHANNELS, train_imgs.shape[1], train_imgs.shape[2]])\n",
+ "test_imgs_cnn = test_imgs.reshape([test_imgs.shape[0], IMAGE_CHANNELS, test_imgs.shape[1], test_imgs.shape[2]])\n",
+ "#######################################################################################################\n",
+ "\n",
+ "# Create the network\n",
+ "rng = np.random.RandomState(seed=SEED) # Random number generator\n",
+ "net = CNN_Torch_Autograd(rng=rng)\n",
+ "# Start training\n",
+ "gradient_descent(net,\n",
+ " cross_entropy_torch, # Same as MLP.\n",
+ " grad_descent_update_autograd, # Same as MLP.\n",
+ " rng,\n",
+ " train_imgs_cnn,\n",
+ " train_lbls_onehot,\n",
+ " test_imgs_cnn,\n",
+ " test_lbls_onehot,\n",
+ " batch_size=40, \n",
+ " learning_rate=1e-2,\n",
+ " total_iters=400,\n",
+ " iters_per_test=10)"
+ ]
+ },
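The softmax-with-broadcasting pattern used in `forward_pass` above, in miniature (standalone toy tensors, nothing from the network):

```python
import torch

# Row-wise softmax: the sum with keepdim=True yields shape [N, 1],
# which broadcasts against the [N, D_out] numerator in the division.
logits = torch.tensor([[1.0, 2.0], [3.0, 0.0]])
exp_logits = torch.exp(logits)
y_pred = exp_logits / torch.sum(exp_logits, dim=1, keepdim=True)
print(y_pred.sum(dim=1))  # each row sums to 1
```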
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "If completed successfully, this basic CNN should be trainable in less than a minute on a CPU. Its testing accuracy should finish at 87.61\\% for the stochastic version of gradient descent, similar to that of the MLP.\n",
+ "\n",
+ "\n",
+ "**Q: How do we compute the number of input neurons to the last hidden layer?** \n",
+ "D_in_fc has been given equal to 1440. Why? How is this number computed? It is architecture dependent. To answer, consider how the shape of a tensor changes as it propagates through the CNN. How do the convs and pool operators change the shape of a tensor, with respect to their kernel size, stride, use of padding etc? This is very important to understand, to be able to implement larger and more complex networks.\n"
+ ]
+ },
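One way to answer is to compute the shapes step by step. A minimal sketch using this tutorial's sizes (28x28 input, 5x5 conv with stride 1 and no padding, 2x2 max-pool with stride 2); the factor of 10 is an assumption that the conv layer has 10 feature maps, consistent with 10 * 12 * 12 = 1440:

```python
import math

def conv2d_out(size, kernel, stride=1, padding=0):
    # Standard conv output size: floor((size + 2*padding - kernel) / stride) + 1
    return (size + 2 * padding - kernel) // stride + 1

def max_pool2d_out(size, kernel, stride, padding=0, ceil_mode=True):
    # Same formula; ceil_mode=True rounds up instead of down.
    n = (size + 2 * padding - kernel) / stride + 1
    return math.ceil(n) if ceil_mode else math.floor(n)

h = conv2d_out(28, kernel=5)               # 28 -> 24
h = max_pool2d_out(h, kernel=2, stride=2)  # 24 -> 12
print(10 * h * h)                          # 1440 = D_in_fc, assuming 10 feature maps
```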
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Bonus (optional):\n",
+ " \n",
+ "Experiment yourselves with adding more conv/pool modules, to build a deeper net. Deeper networks (ala LeNet) can exceed 99\\% test accuracy. Can you approach this by adding more layers? \n",
+ "You will find that deeper nets not only quickly become expensive to train, but are also not easy to train. They require careful hyperparameter configuration (learning rate, batch size, initialization, total training iterations, etc). How hyper-parameters change behaviour during training is not well understood theoretically, and currently depends on practical experience. Get some if you can.\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## This notebook:\n",
+ "Copyright 2020, Imperial College London \n",
+ "For issues e-mail: konstantinos.kamnitsas12@imperial.ac.uk"
+ ]
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "Python 3",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.6.7"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
diff --git a/04 - FCNs-for-segmentation.ipynb b/04 - FCNs-for-segmentation.ipynb
new file mode 100644
index 0000000000000000000000000000000000000000..17182f4167f6a1a0ef5bfa4a767b28c38dd48735
--- /dev/null
+++ b/04 - FCNs-for-segmentation.ipynb
@@ -0,0 +1,1145 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# Tutorial 4: Fully Convolutional Nets for Image Segmentation"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Running on Colab"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "! wget https://www.doc.ic.ac.uk/~bglocker/teaching/notebooks/supervised-data.zip\n",
+ "! unzip supervised-data.zip\n",
+ "\n",
+ "! wget https://www.doc.ic.ac.uk/~bglocker/teaching/notebooks/brain2d-data.zip\n",
+ "! unzip brain2d-data.zip\n",
+ "\n",
+ "# data directories\n",
+ "data_dir = 'data/mnist/'\n",
+ "data_brain_dir = 'data/brain2d/'"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Running on DoC lab machines"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# data directories\n",
+ "data_dir = '/vol/lab/course/416/data/mnist/'\n",
+ "data_brain_dir = '/vol/lab/course/416/data/brain2d/'"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "**Aims of this tutorial**:\n",
+ "- Introduce Fully Convolutional Networks.\n",
+ "- Showcase that fully connected/dense layers can be equally turned to convolutional.\n",
+ "- Show how Fully Convolutional Networks can be used for segmentation.\n",
+ "\n",
+ "The tutorial contains material that you will need for the **Coursework**.\n",
+ "Along the way you will implement a CNN using the nn.module of Pytorch for extra simplicity. \n",
+ "You will also get to see how ML can help segmenting brain tumours! \n",
+ "\n",
+ "**Prerequisites**:\n",
+ "- Familiar with python and numpy and Pytorch\n",
+ "- Familiar with logistic regression and MNIST\n",
+ "- Familiar with SGD, Cross entropy\n",
+ "\n",
+ "\n",
+ "**Notes**:\n",
+ "- Docs for Pytorch's nn.Module you will use: \n",
+ "https://pytorch.org/docs/stable/_modules/torch/nn/modules/module.html\n",
+ "- Some helper functions are included below for loading and plotting data. They will be used out of the box below."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Preliminaries: Loading and pre-processing MNIST (from Tutorial 3)\n",
+ "We will use MNIST as in previous tutorial. Below, we prepare the data for a CNN, *exactly* as in Tutorial 3. \n",
+ "Make sure to refresh your memory about **shape of the data** involved. "
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import numpy as np\n",
+ "import torchvision.datasets as dset\n",
+ "\n",
+ "\n",
+ "def get_mnist(data_dir, train):\n",
+ " # data_dir: path to local directory where data is, or should be stored.\n",
+ " # train: if True, return training data. If False, return test data.\n",
+ " # download: if data not in data_dir, download it.\n",
+ " data_set = dset.MNIST(root=data_dir, train=train, transform=None, download=True)\n",
+ " # Students should only deal with numpy arrays, so that it's easy to follow.\n",
+ " if train:\n",
+ " data_x = np.asarray(data_set.train_data, dtype='uint8')\n",
+ " data_y = np.asarray(data_set.train_labels, dtype='int16') # int64 by default\n",
+ " else:\n",
+ " data_x = np.asarray(data_set.test_data, dtype='uint8')\n",
+ " data_y = np.asarray(data_set.test_labels, dtype='int16') # int64 by default\n",
+ " return data_x, data_y\n",
+ "\n",
+ "\n",
+ "def make_lbls_onehot(lbls, num_classes):\n",
+ " # lbls: np.array of shape [N]\n",
+ " lbls_onehot = np.zeros(shape=(lbls.shape[0], num_classes ) )\n",
+ " lbls_onehot[ np.arange(lbls_onehot.shape[0]), lbls ] = 1\n",
+ " return lbls_onehot\n",
+ "\n",
+ "\n",
+ "# If datasets are not at specified location, they will be downloaded.\n",
+ "train_imgs, train_lbls = get_mnist(data_dir=data_dir, train=True)\n",
+ "test_imgs, test_lbls = get_mnist(data_dir=data_dir, train=False)\n",
+ "\n",
+ "print(\"[train_imgs] Type: \", type(train_imgs), \"|| Shape:\", train_imgs.shape, \"|| Data type: \", train_imgs.dtype )\n",
+ "print(\"[train_lbls] Type: \", type(train_lbls), \"|| Shape:\", train_lbls.shape, \"|| Data type: \", train_lbls.dtype )\n",
+ "print('Class labels in train = ', np.unique(train_lbls))\n",
+ "\n",
+ "print(\"[test_imgs] Type: \", type(test_imgs), \"|| Shape:\", test_imgs.shape, \" || Data type: \", test_imgs.dtype )\n",
+ "print(\"[test_lbls] Type: \", type(test_lbls), \"|| Shape:\", test_lbls.shape, \" || Data type: \", test_lbls.dtype )\n",
+ "print('Class labels in test = ', np.unique(test_lbls))\n",
+ "\n",
+ "S_tr_samples = train_imgs.shape[0] # S hereafter. Number of training samples in database.\n",
+ "H_height = train_imgs.shape[1] # H hereafter\n",
+ "W_width = train_imgs.shape[2] # W hereafter\n",
+ "C_classes = len(np.unique(train_lbls)) # C hereafter"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Change representation of labels to one-hot vectors of length C=10. (Lec.5, sl.16)\n",
+ "train_lbls_onehot = np.zeros(shape=(train_lbls.shape[0], C_classes ) )\n",
+ "train_lbls_onehot[ np.arange(train_lbls_onehot.shape[0]), train_lbls ] = 1\n",
+ "test_lbls_onehot = np.zeros(shape=(test_lbls.shape[0], C_classes ) )\n",
+ "test_lbls_onehot[ np.arange(test_lbls_onehot.shape[0]), test_lbls ] = 1\n",
+ "print(\"BEFORE: [train_lbls] Type: \", type(train_lbls), \"|| Shape:\", train_lbls.shape, \" || Data type: \", train_lbls.dtype )\n",
+ "print(\"AFTER : [train_lbls_onehot] Type: \", type(train_lbls_onehot), \"|| Shape:\", train_lbls_onehot.shape, \" || Data type: \", train_lbls_onehot.dtype )"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# (Same as in Tut. 3) \n",
+ "# Normalize intensities (from 0-255) to have 0-mean and 1-stdev. (Lec.5, sl.68) \n",
+ "def normalize_int_whole_database(data):\n",
+ " # data: shape [num_samples, H, W, C]\n",
+ " mu = np.mean(data, axis=(0,1,2), keepdims=True) # Mean int of channel C, over samples and pixels.\n",
+ " std = np.std(data, axis=(0,1,2), keepdims=True) # Returned shape: [1, 1, 1, C]\n",
+ " norm_data = (data - mu) / std\n",
+ " return norm_data\n",
+ "\n",
+ "train_imgs = normalize_int_whole_database(train_imgs)\n",
+ "test_imgs = normalize_int_whole_database(test_imgs)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# Cross entropy for Classifier (from Tutorial 3)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "For better understanding, we use our own implementation. Of course, PyTorch has the cross entropy loss already implemented: https://pytorch.org/docs/stable/nn.html#crossentropyloss"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "def cross_entropy(y_pred, y_real, eps=1e-7):\n",
+ " # y_pred: Tensor of shape [N, D_out]. Predicted class-posterior probabilities from forward.\n",
+ " # y_real: Same shape as y_pred. One-hot representation of real labels.\n",
+ " x_entr_per_sample = - torch.sum( y_real*torch.log(y_pred+eps), dim=1) # Sum over classes, axis=1\n",
+ " loss = torch.mean(x_entr_per_sample, dim=0) # Expectation of loss: Mean over samples (axis=0).\n",
+ " return loss"
+ ]
+ },
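For comparison, a standalone check (toy numbers, independent of the function above) that PyTorch's built-in `F.cross_entropy` computes the same quantity; note that it takes raw *logits* and integer class labels, rather than probabilities and one-hot vectors:

```python
import torch
import torch.nn.functional as F

logits = torch.tensor([[2.0, 0.5, -1.0]])  # one sample, three classes
target = torch.tensor([0])                 # true class index
probs = F.softmax(logits, dim=1)
manual = -torch.log(probs[0, target[0]])   # -log p(true class), as in our cross_entropy
builtin = F.cross_entropy(logits, target)  # built-in takes logits + integer labels
print(manual.item(), builtin.item())       # the two values agree
```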
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# Stochastic Gradient Descent (from Tutorial 3)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "PyTorch has also very useful functionality for handling datasets and sampling of training data. However, to fully understand what's going, again, we make use of our own implementation here. More details about how PyTorch can help with handling data can be found here: https://pytorch.org/docs/stable/data.html"
+ ]
+ },
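For reference, a minimal `DataLoader` sketch on random toy data (this is roughly what PyTorch's data utilities would replace our `get_random_batch` with; it is not used below):

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

xs = torch.randn(100, 1, 28, 28)   # 100 fake grayscale images, [N, C, H, W]
ys = torch.randint(0, 10, (100,))  # 100 fake integer labels
loader = DataLoader(TensorDataset(xs, ys), batch_size=40, shuffle=True)
for xb, yb in loader:
    print(xb.shape, yb.shape)      # batches of up to 40 samples
```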
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Helpers\n",
+ "import matplotlib.pyplot as plt\n",
+ "%matplotlib inline\n",
+ "\n",
+ "def plot_train_progress(loss_l, acc_train_l, acc_test_l, iters_per_point, total_iters=None):\n",
+ "\n",
+ " fig, axes = plt.subplots(1, 2, sharex=False, sharey=False)\n",
+ " assert len(loss_l) == len(acc_train_l) == len(acc_test_l)\n",
+ " x_points = range(0, len(loss_l)*iters_per_point, iters_per_point)\n",
+ " \n",
+ " axes[0].plot(x_points, loss_l, color=\"black\", label=\"Training loss\", linewidth=5)\n",
+ " axes[0].set_title(\"Training loss\", fontsize=10, y=1.022)\n",
+ " axes[0].yaxis.grid(True, zorder=0)\n",
+ " axes[0].set_xlabel('Iteration', fontsize=10)\n",
+ " if total_iters is not None:\n",
+ " axes[0].set_xlim([0,total_iters])\n",
+ " axes[0].set_ylim([0,None])\n",
+ " axes[0].legend(loc='upper right')\n",
+ " \n",
+ " axes[1].set_title(\"Accuracy\", fontsize=10, y=1.022)\n",
+ " axes[1].plot(x_points, acc_train_l, color=\"blue\", label=\"Train\", linewidth=5)\n",
+ " axes[1].plot(x_points, acc_test_l, color=\"red\", label=\"Test\", linewidth=5)\n",
+ " axes[1].yaxis.grid(True, zorder=0)\n",
+ " axes[1].set_xlabel('Iteration', fontsize=10)\n",
+ " if total_iters is not None:\n",
+ " axes[1].set_xlim([0,total_iters])\n",
+ " axes[1].set_ylim([0,100])\n",
+ " axes[1].legend(loc='lower right')\n",
+ " \n",
+ " plt.show()\n",
+ " \n",
+ "\n",
+ "def plot_image(image, interpol=\"nearest\", cmap=\"gray\", vminmax=[None,None], figsize=None):\n",
+ " # image: np.array of one of the following shapes:\n",
+ " # grayscale image: (height, width)\n",
+ " # grayscale image: (height, width, 1)\n",
+ " # rgb image: (height, width, 3)\n",
+ " print(\"Plotting image of shape: \", image.shape)\n",
+ " plt.figure(figsize=figsize) # (figsize=(n_imgs_per_row*0.5, n_rows*0.5)) # size (width, height), in inches.\n",
+ " if len(image.shape) == 2:\n",
+ " fig = plt.imshow(image, cmap=cmap, interpolation=interpol, vmin=vminmax[0], vmax=vminmax[1]) # imshow: (w,h) or (w,h,3)\n",
+ " plt.colorbar(fig)\n",
+ " elif len(image.shape) == 3 and image.shape[2] == 1:\n",
+ " fig = plt.imshow(image[:,:,0], cmap=cmap, interpolation=interpol, vmin=v_minmax[0], vmax=vminmax[1]) # imshow: (w,h) or (w,h,3)\n",
+ " plt.colorbar(fig)\n",
+ " elif len(image.shape) == 3 and image.shape[2] == 3 :\n",
+ " _ = plt.imshow(image, interpolation=interpol)\n",
+ " else:\n",
+ " raise Error(\"Wrong shape of given image for plotting.\")\n",
+ "\n",
+ "\n",
+ "def display_image_dynamically(image, interpol=\"nearest\", cmap=\"gray\", vminmax=[None,None], figsize=None):\n",
+ " # image: np.array of one of the following shapes:\n",
+ " # grayscale image: (height, width)\n",
+ " # grayscale image: (height, width, 1)\n",
+ " \n",
+ " print(\"Plotting image of shape: \", image.shape)\n",
+ " plt.figure(figsize=figsize) # (figsize=(n_imgs_per_row*0.5, n_rows*0.5)) # size (width, height), in inches.\n",
+ " if len(image.shape) == 2:\n",
+ " fig = plt.imshow(image, cmap=cmap, interpolation=interpol, vmin=vminmax[0], vmax=vminmax[1]) # imshow: (w,h) or (w,h,3)\n",
+ " plt.colorbar(fig)\n",
+ " elif len(image.shape) == 3 and image.shape[2] == 1:\n",
+ " fig = plt.imshow(image[:,:,0], cmap=cmap, interpolation=interpol, vmin=v_minmax[0], vmax=vminmax[1]) # imshow: (w,h) or (w,h,3)\n",
+ " plt.colorbar(fig)\n",
+ " elif len(image.shape) == 3 and image.shape[2] == 3 :\n",
+ " _ = plt.imshow(image, interpolation=interpol)\n",
+ " else:\n",
+ " raise Error(\"Wrong shape of given image for plotting.\")\n",
+ " \n",
+ " #display.clear_output(wait=True)\n",
+ " display.display(plt.gcf())"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "def compute_accuracy(lbls_pred, lbls_real):\n",
+ " # lbls_pred, lbls_read: np.arrays of labels (after argmax), not probabilities/onehot.\n",
+ " acc = np.mean(lbls_pred == lbls_real) * 100.\n",
+ " return acc\n",
+ "\n",
+ "def get_random_batch(train_imgs, train_lbls, N_batch_size, rng):\n",
+ " # train_imgs: Images for training. Numpy array of shape [S, H, W]\n",
+ " # train_lbls: Labels of the training images.\n",
+ " # N_batch_size: integer. Size that the batch should have.\n",
+ " indices = rng.randint(low=0, high=train_imgs.shape[0], size=N_batch_size, dtype='int32')\n",
+ " train_imgs_batch = train_imgs[indices]\n",
+ " train_lbls_batch = train_lbls[indices]\n",
+ " return [train_imgs_batch, train_lbls_batch]\n",
+ "\n",
+ "def gradient_descent(net, loss_func, grad_descent_optimizer, rng,\n",
+ " train_imgs, train_lbls, test_imgs, test_lbls,\n",
+ " N_batch_size, total_iters, iters_per_test=-1,\n",
+ " is_tumor_segm=False ):\n",
+ " # net: Instance of a model.\n",
+ " # loss_func: Function that computes the loss. See functions: cross_entropy.\n",
+ " # grad_descent_optimizer: From torch.optim (see Task 2)\n",
+ " # rng: numpy random number generator\n",
+ " # train_imgs: The training images. Numpy array, shape [S_tr, H, W]\n",
+ " # test_imgs: Save as above, for testing images. [S_te, H, W]\n",
+ " # train_lbls: One hot representation of labels corresponding to train_imgs. Numpy array, shape [S_tr, C]\n",
+ " # test_lbls: As above, but for testing data. [S_te, C]\n",
+ " # N_batch_size: Size N of the batch that should be processed per SGD iteration by a model.\n",
+ " # learning_rate: self explanatory.\n",
+ " # total_iters: how many iterations in total to perform.\n",
+ " # iters_per_test: Integer. Every that many iterations the model predicts the test data and accuracy is reported.\n",
+ " values_to_plot = {'loss':[], 'acc_train': [], 'acc_test': []}\n",
+ " \n",
+ " for t in range(total_iters):\n",
+ " \n",
+ " x, y_real = get_random_batch(train_imgs, train_lbls, N_batch_size, rng)\n",
+ " \n",
+ " y_pred = net.forward( torch.tensor(x, dtype=torch.float) )\n",
+ " \n",
+ " loss = loss_func(y_pred, torch.tensor(y_real, dtype=torch.float)) # Cross Entropy\n",
+ "\n",
+ " loss.backward() # Computes grads with auto-differentiation. Stores them in each params.grads\n",
+ " \n",
+ " # Update weights with gradient descent. One of optimizers given by torch.optim\n",
+ " grad_descent_optimizer.step()\n",
+ " grad_descent_optimizer.zero_grad() # zero the parameter gradients.\n",
+ " \n",
+ " # ==== Report training loss and accuracy ======\n",
+ " lbls_pred = np.argmax(y_pred.detach().numpy(), axis=1) # Get labels from the probabilities.\n",
+ " lbls_real = np.argmax(y_real, axis=1) # Get labels from one-hot\n",
+ " acc_train = compute_accuracy(lbls_pred, lbls_real)\n",
+ " print(\"[iter:\", t, \"]: Training Loss: {0:.2f}\".format(loss.item()), \"\\t Accuracy: {0:.2f}\".format(acc_train))\n",
+ " \n",
+ " # =============== Every few iterations, predict the testing-database ================#\n",
+ " if t==total_iters-1 or t%iters_per_test == 0:\n",
+ " y_pred_test = net.forward(torch.tensor(test_imgs, dtype=torch.float))\n",
+ " y_pred_test_numpy = y_pred_test.detach().numpy()\n",
+ " # ==== Report test accuracy ======\n",
+ " lbls_pred_test = np.argmax(y_pred_test_numpy, axis=1)\n",
+ " lbls_real_test = np.argmax(test_lbls, axis=1)\n",
+ " acc_test = compute_accuracy(lbls_pred_test, lbls_real_test)\n",
+ " print(\"\\t\\t\\t\\t\\t\\t\\t\\t Testing Accuracy: {0:.2f}\".format(acc_test))\n",
+ " \n",
+ " # Keep list of metrics to plot progress.\n",
+ " values_to_plot['loss'].append(loss.item())\n",
+ " values_to_plot['acc_train'].append(acc_train)\n",
+ " values_to_plot['acc_test'].append(acc_test)\n",
+ " \n",
+ " if is_tumor_segm:\n",
+ " display_image_dynamically(y_pred_test_numpy[0,1,:,:])\n",
+ " #scipy.misc.imsave('./pred_prob.png', y_pred_test_numpy[0,1,:,:])\n",
+ " #scipy.misc.imsave('./pred_segm.png', lbls_pred_test[0,:,:])\n",
+ " \n",
+ " \n",
+ " # In the end of the process, plot loss accuracy on training and testing data.\n",
+ " plot_train_progress(values_to_plot['loss'], values_to_plot['acc_train'], values_to_plot['acc_test'], iters_per_test)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# Task 1: Build LeNet\n",
+ "\n",
+ "In tutorial 3 we had built a tiny CNN, with just 1 convolution and 1 fully connected layer. \n",
+ "Here, we will build the legendary LeNet. \n",
+ "\n",
+ "**Replicate** the code given to you in the lecture **CNNs for Image Segmentation** to build the architecture below:\n",
+ "\n",
+ "We will build this by using the **nn** and **nn.Module** of PyTorch.\n",
+ "\n",
+ "- **nn**: Helps developing nets easier by providing building blocks such as *layers*. For example, a **nn.Conv2d** layer internally initializes a weight kernel for its convolution operator, encapsulates a bias term, can apply padding to its input, etc. If we would do this using the basic operators given by *torch.nn.functional* (what we did in the last turorial), we have more transparency, but we got to do everything ourselves. Choose wisely. \n",
+ "- **nn.Module**: A further abstraction module. This module should be used as a **parent class** for classes that represent whole networks. Then, Pytorch automatically keeps track of the net's parameters within **self.parameters**. All you need is to define the **forward** pass. This makes things clean as you will see... "
+ ]
+ },
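Before LeNet, here is a tiny complete `nn.Module` (a made-up single-layer example, just to show the pattern: subclass, create layers in `__init__`, define `forward`):

```python
import torch
import torch.nn as nn

class TinyNet(nn.Module):
    def __init__(self):
        super(TinyNet, self).__init__()
        self.fc = nn.Linear(4, 2, bias=True)  # weight and bias created internally

    def forward(self, x):
        return self.fc(x)

net = TinyNet()
# nn.Module tracks all parameters automatically:
print(sum(p.numel() for p in net.parameters()))  # 4*2 weights + 2 biases = 10
```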
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import torch\n",
+ "import torch.nn as nn\n",
+ "import torch.nn.functional as F\n",
+ "\n",
+ "# Docs of nn.Conv2d and nn.Linear and max_pool2d:\n",
+ "# https://pytorch.org/docs/stable/_modules/torch/nn/modules/conv.html#Conv2d\n",
+ "# https://pytorch.org/docs/stable/_modules/torch/nn/modules/linear.html#Linear\n",
+ "# https://pytorch.org/docs/stable/nn.html#torch.nn.functional.max_pool2d\n",
+ "\n",
+ "class LeNet(nn.Module):\n",
+ " def __init__(self, num_classes):\n",
+ " super(LeNet, self).__init__()\n",
+ " ######################### TODO: Build LeNet ########################\n",
+ " # Make the above architecture of LeNet.\n",
+ " # bias=True makes the layer create a bias internally. Nice and clean.\n",
+ " # Conv2d gets args: (num input channs, num out channs, kernel_size ...)\n",
+ " # Linear gets args: (num input neurons, num out neurons, ...)\n",
+ " self.conv1 = nn.Conv2d(1, ???, kernel_size=???, bias=True, padding=0) # Params are initialized internally.\n",
+ " self.conv2 = nn.Conv2d(???, ???, kernel_size=???, bias=True, padding=0)\n",
+ " self.fc1 = nn.Linear(??? * ??? * ???, ???, bias=True) # nn.Linear is a fully connected layer.\n",
+ " self.fc2 = nn.Linear(???, ???, bias=True)\n",
+ " self.fc3 = nn.Linear(???, num_classes, bias=True)\n",
+ " # All parameters of a nn.Module are afterwards accessible by self.parameters()\n",
+ " # Each layer's weights and biases are accessible by eg by self.conv1.weight & self.conv1.bias\n",
+ " \n",
+ " def forward(self, x):\n",
+ " # x: Input tensor (batch of images) of shape [N, Channels, H, W]\n",
+ " # returns: tensor of shape [N, classes]. The class posterior probabilities.\n",
+ " # Make the forward pass.\n",
+ " x = F.max_pool2d(F.relu(self.conv1(x)), kernel_size=2, stride=2, padding=0, ceil_mode=False)\n",
+ " x = F.???(F.relu(self.???(x)), kernel_size=2, stride=2, padding=0, ceil_mode=False)\n",
+ " x = x.reshape(x.shape[0], -1)\n",
+ " x = F.relu(self.???(x))\n",
+ " x = F.???(self.???(x))\n",
+ " x = self.???(x)\n",
+ " y_pred = F.softmax(x, dim=1) # y_pred.shape = [N, 10]\n",
+ " ####################################################################\n",
+ " return y_pred\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# Task 2: Train LeNet classifier"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# (From Tutorial 3)\n",
+ "# Pytorch needs input to conv/pool to be of shape [N, Channels, H, W].\n",
+ "# For grayscale images, there is only 1 image channel. We add this *channels dimension*:\n",
+ "train_imgs_cnn = train_imgs.reshape([train_imgs.shape[0], 1, train_imgs.shape[1], train_imgs.shape[2]])\n",
+ "test_imgs_cnn = test_imgs.reshape([test_imgs.shape[0], 1, test_imgs.shape[1], test_imgs.shape[2]])\n",
+ "\n",
+ "################ TODO: Pad images to fit to LeNet correctly ###################\n",
+ "# MNIST is 28x28 images. LeNet **by construction** requires images of shape 32x32 !\n",
+ "# with 2 voxels before and after H & W dimensions, to make them 32x32!\n",
+ "train_imgs_cnn = np.pad(train_imgs_cnn, ((0,0), (0,0), (2,???), (???,2)), mode='edge')\n",
+ "test_imgs_cnn = np.pad(test_imgs_cnn, ((0,0), (0,0), (???,???), (???,???)), mode='edge')\n",
+ "print(\"train_imgs_cnn.shape should be (60000, 1, 32, 32). It is: \", train_imgs_cnn.shape)\n",
+ "###############################################################################\n",
+ "\n",
+ "# Create the network.\n",
+ "lenet = LeNet(num_classes=C_classes)\n",
+ "\n",
+ "############## NOTE: Pytorch's optimizers (nothing to do here) #################\n",
+ "# Last time we wrote our own function for optimizing weights: w'=w-lr*grads.\n",
+ "# There are more complex ones (RMSProp, Adam, etc)\n",
+ "# Pytorch provides out-of-the-box optimizers that do this, so we dont need to write them.\n",
+ "# Create one, then call optimizer.step() & optimizer.zero_grad(). See SGD function above.\n",
+ "import torch.optim as optim\n",
+ "grad_descent_optimizer = optim.SGD(lenet.parameters(), lr=0.03, momentum=0.0)\n",
+ "################################################################################\n",
+ "\n",
+ "# Start training\n",
+ "SEED = 42\n",
+ "rng = np.random.RandomState(seed=SEED) # Random number generator\n",
+ "gradient_descent(lenet,\n",
+ " cross_entropy,\n",
+ " grad_descent_optimizer,\n",
+ " rng,\n",
+ " train_imgs_cnn,\n",
+ " train_lbls_onehot,\n",
+ " test_imgs_cnn,\n",
+ " test_lbls_onehot,\n",
+ " N_batch_size=80,\n",
+ " total_iters=400,\n",
+ " iters_per_test=20,\n",
+ " is_tumor_segm=False)\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "If done correctly, accuracy should surpass 90% both on training and testing samples. \n",
+ "You should see a pretty plot at the bottom of the output when process finishes."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# Task 3: Build Fully Convolutional LeNet\n",
+ "\n",
+ "Fully convolutional networks are networks that do not have dense/fully-connected layers. Any network with dense layers has an **exactly equivalent** fully-convolutional architecture. A dense layer can be turned to a convolution with mathematically exact same behaviour. See lecture slides...\n",
+ "\n",
+ "Explanation (check together with the slides):\n",
+ "- Assume **input X** to a dense layer is a matrix of shape **(num_in_channs, height, width)**, which is the output from a previous conv layer that has (num_inp_channs) **feature maps**, each of **dimensions** \\[height, width\\]. \n",
+ "- A dense layer with (num-out) neurons has **weight matrix W** of shape: \\[num-out, num-in-channs * height * width\\]. \n",
+ "- **Each of the output neurons** in a dense layer has (num_inp_channs * height_of_inp * width_of_inp) weights connecting it to the above input. \n",
+ "- Out of these, there are (height*width) weights connecting each output neuron to each input feature map. Each of these weights is different.\n",
+ "- Thus the above (height*width) weights can be seen as a kernel of shape \\[height, width\\].\n",
+ "- The weights connecting each output neuron to all input feature maps can be seen as a convolutional kernel of shape \\[1, num_in_channs, height, width\\].\n",
+ "- The whole weight matrix W can be reorganized as a convolution kernel \\[out_channs, num_in_channs, height, width\\]\n",
+ "- The whole **dense layer with num-out-neurons** can be reorganized as a **convolutional layer with num-out feature-maps**, where **ach output feature-map gives only 1 activation** (is of height=1 and width=1). \n",
+ "- The operation applied by a dense layer, **dot_product(X,W)**, can be now cast as a **convolution(X,W_reorganized)**, with the two being **mathematically equivallent**. \n",
+ "\n",
+ "(Note: Won't be needed here, but in some implementations of conv/cross-correlation, you would need to flip the kernel. Check difference between conv and cross-correlation.)\n",
+ "\n",
+ "Below, we are going to build a Fully Convolutional LeNet, that corresponds exactly to the previous LeNet..."
+ ]
+ },
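The claimed equivalence can be checked numerically on toy shapes (3 input channels, 4x4 spatial size, 5 output neurons; all made up, independent of LeNet):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(1, 3, 4, 4)                 # [N, in_channs, H, W]
dense = nn.Linear(3 * 4 * 4, 5, bias=True)
conv = nn.Conv2d(3, 5, kernel_size=4, bias=True)
# Reorganize the dense weight matrix [5, 48] into a conv kernel [5, 3, 4, 4]:
conv.weight.data = dense.weight.data.reshape(5, 3, 4, 4)
conv.bias.data = dense.bias.data.clone()

out_dense = dense(x.reshape(1, -1))         # [1, 5]
out_conv = conv(x).reshape(1, -1)           # [1, 5, 1, 1] flattened to [1, 5]
print(torch.allclose(out_dense, out_conv, atol=1e-5))  # True
```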
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "class FCLeNet(nn.Module):\n",
+ " def __init__(self, num_classes):\n",
+ " super(FCLeNet, self).__init__()\n",
+ " ################ TODO: Build a Fully Convolutional LeNet ####################\n",
+ " # Adjust LeNet's code, and replace all nn.Linear layers with appropriate nn.Conv2d\n",
+ " # Conv2d gets args: (num input channs, num out channs, kernel_size ...)\n",
+ " self.conv1 = nn.Conv2d(1, ???, kernel_size=5, bias=True, padding=0)\n",
+ " self.conv2 = nn.Conv2d(???, ???, kernel_size=5, bias=True, padding=0)\n",
+ " self.conv3 = nn.Conv2d(???, 120, kernel_size=???, bias=True, padding=0)\n",
+ " self.conv4 = nn.Conv2d(???, 84, kernel_size=1, bias=True, padding=0)\n",
+ " self.conv5 = nn.Conv2d(???, num_classes, kernel_size=???, bias=True, padding=0)\n",
+ " \n",
+ " def forward(self, x):\n",
+ " # x: Input tensor (batch of images) of shape [N, Channels, H, W]\n",
+ " # returns: tensor of shape [N, classes] if input is of shape 32x32...\n",
+ " # ... or Tensor of shape [N, classes, H_out, W_out if input >= 36x36.\n",
+ " # Make the forward pass.\n",
+ " x = F.???(F.relu(self.conv1(x)), kernel_size=2, stride=2, padding=0, ceil_mode=False)\n",
+ " x = F.???(F.relu(self.???(x)), kernel_size=2, stride=2, padding=0, ceil_mode=False)\n",
+ " x = F.relu(self.???(x))\n",
+ " x = F.???(self.???(x))\n",
+ " x = self.???(x)\n",
+ " \n",
+ " y_pred = F.softmax(x, dim=1) # y_pred.shape = [N, 10, 1, 1]\n",
+ " #############################################################################\n",
+ " \n",
+ " # The output of LeNet was [N, 10] for input x of size 32x32.\n",
+ " # y_pred here will be [N,10,1,1] if input x of size 32x32, ...\n",
+ " # ... Or of shape [N, 10, H_out, W_out] if input larger than 32x32 is given.\n",
+ " # If shape is [N,10,1,1], drop unary dimensions to have same behaviour as LeNet.\n",
+ " if y_pred.shape[2] == 1 and y_pred.shape[3] == 1:\n",
+ " y_pred = y_pred.reshape([y_pred.shape[0], y_pred.shape[1]])\n",
+ " \n",
+ " return y_pred"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "We above hopefully created a Fully Convolutional LeNet that we claimed can be mathematically equivalent to a LeNet with dense layers. Is it? Lets check...\n",
+ "\n",
+ "Below, we will **transfer parameters** of the **pre-trained** LeNet (from Task 2) to FCLeNet, after we **reorganize them in convolutional kernels** of appropriate shape.\n",
+ "\n",
+ "Then, we will apply both LeNet and FCLeNet, to check if we get the same result..."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "def transfer_weights_from_LeNet_to_FCLeNet(lenet, fclenet):\n",
+ " # Docs on conv and linear layers, to see their .weight and .bias attributes:\n",
+ " # https://pytorch.org/docs/stable/_modules/torch/nn/modules/conv.html#Conv2d\n",
+ " # https://pytorch.org/docs/stable/_modules/torch/nn/modules/linear.html#Linear\n",
+ "\n",
+ " \n",
+ " ############# TODO: Complete the below #######################################\n",
+ " # Match which layer of lenet should be transfered to which of FCLeNet.\n",
+ " # Reshape the weights of dense layers of LeNet to the shape needed by the convs.\n",
+ " \n",
+ " # Shape of weights : conv [out channels, in channels, kh, kw] linear: [out ch, in ch]\n",
+ " # Shape of biases: conv & linear have same shape: [out channels]\n",
+ " # NOTE: Shape of conv.weight is switched in comparison to how declared in nn.Conv2d(...) above.\n",
+ " # ... Here, [out_chans, in_chans, H, W]. Above at nn.Conv2d(in_chans, out_chans, H, W)\n",
+ " fclenet.conv1.weight.data = lenet.conv1.weight.data.clone()\n",
+ " fclenet.conv1.bias.data = lenet.conv1.bias.data.clone()\n",
+ " fclenet.conv2.weight.data = lenet.???.weight.data.clone()\n",
+ " fclenet.conv2.bias.data = lenet.???.bias.data.clone()\n",
+ " fclenet.conv3.weight.data = lenet.???.weight.data.clone().reshape([120,16,5,5])\n",
+ " fclenet.conv3.bias.data = lenet.???.bias.data.clone()\n",
+ " fclenet.conv4.weight.data = lenet.???.weight.data.clone().reshape([84,120,1,1])\n",
+ " fclenet.conv4.bias.data = lenet.???.bias.data.clone()\n",
+ " fclenet.conv5.weight.data = lenet.???.weight.data.clone().reshape([-1,84,1,1])\n",
+ " fclenet.conv5.bias.data = lenet.???.bias.data.clone()\n",
+ " ##################################################################################\n",
+ " \n",
+ "# Initialize an FCLeNet, and then transfer the weights from pre-trained LeNet to FCLeNet.\n",
+ "fclenet = FCLeNet(num_classes=C_classes)\n",
+ "transfer_weights_from_LeNet_to_FCLeNet(lenet, fclenet)\n",
+ "\n",
+ "# Test with LeNet\n",
+ "y_pred_lenet = lenet.forward(torch.tensor(test_imgs_cnn, dtype=torch.float))\n",
+ "lbls_pred_lenet = np.argmax(y_pred_lenet.detach().numpy(), axis=1)\n",
+ "acc_lenet = compute_accuracy(lbls_pred_lenet, test_lbls)\n",
+ "# Test with FCLeNet\n",
+ "y_pred_fclenet = fclenet.forward(torch.tensor(test_imgs_cnn, dtype=torch.float))\n",
+ "lbls_pred_fclenet = np.argmax(y_pred_fclenet.detach().numpy(), axis=1)\n",
+ "acc_fclenet = compute_accuracy(lbls_pred_fclenet, test_lbls)\n",
+ "\n",
+ "print(\"Accuracy of LeNet {0:.2f}\".format(acc_lenet), \" and FCLeNet {0:.2f}\".format(acc_fclenet))\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "If done correctly, you should get **exactly** the same result from LeNet and FCLeNet. \n",
+ "It should also be the same as the final test-accuracy when training LeNet above."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# Task 4: Beyond classification with Fully Conv. Nets\n",
+ "\n",
+ "If we convert a fully connected layer to a conv layer appropriately, we get the same output. What's useful about it then?\n",
+ "\n",
+ "*As all operations are convolutional, the network can receive input larger than its receptive field.* \n",
+ "\n",
+ "LeNet had to receive 32x32 size of input by construction, and would return 1 output.\n",
+ "We say that it has a **receptive field** of size 32x32 by construction. *Size of receptive field* is how many pixels the net processes for its output neurons to give one output activation (prediction). The size of the receptive field is defined by the size of the kernels, their strides, and the numbers of layers.\n",
+ "\n",
+ "FCLeNet has exactly the same size of receptive field with LeNet, since as we saw the two architectures are equivalent. It processes 32x32 pixels via convs and pools, and gives one output. But... \n",
+ "\n",
+ "... As whole FCLeNet is made of kernels convolving an output (pooling is the same), nothing stops it from receiving a larger input. If given an input larger than 32x32, kernels at each layer will be applied as normally, they will simply output more activations at the output feature maps. All feature maps will expand, *including the output feature maps*, that used to be 1x1 for 32x32 input, which will give more than one prediction.\n",
+ "\n",
+ "Lets see this in practice..."
+ ]
+ },
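+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Before running it on the trained network, the core claim — a 'valid' convolution accepts any input at least as large as its kernel, and simply yields a larger output map — can be checked with plain numpy. A self-contained sketch, independent of the networks above:\n",
+ "\n",
+ "```python\n",
+ "import numpy as np\n",
+ "\n",
+ "def valid_conv2d(img, kernel):\n",
+ "    # Naive 'valid' cross-correlation: the output shrinks by (k-1) per dimension.\n",
+ "    kh, kw = kernel.shape\n",
+ "    oh, ow = img.shape[0] - kh + 1, img.shape[1] - kw + 1\n",
+ "    out = np.zeros((oh, ow))\n",
+ "    for i in range(oh):\n",
+ "        for j in range(ow):\n",
+ "            out[i, j] = np.sum(img[i:i+kh, j:j+kw] * kernel)\n",
+ "    return out\n",
+ "\n",
+ "kernel = np.ones((5, 5)) / 25.\n",
+ "print(valid_conv2d(np.random.rand(32, 32), kernel).shape)  # (28, 28)\n",
+ "print(valid_conv2d(np.random.rand(64, 64), kernel).shape)  # (60, 60)\n",
+ "```\n",
+ "\n",
+ "The same kernel applies to both inputs with no change; only the output size differs. This is exactly the property FCLeNet inherits by being made only of convs and pools."
+ ]
+ },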
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Choose an index of an image to test on...\n",
+ "index_of_img = 4\n",
+ "image_for_cnn = test_imgs_cnn[ index_of_img : index_of_img+1, :, :, : ] # [1, 1, 32, 32]\n",
+ "real_lbl = test_lbls[index_of_img]\n",
+ "plot_image(image_for_cnn[0,0,:,:], cmap=\"gray\")\n",
+ "\n",
+ "######### TODO: Run inference on the ORIGINAL 32x32 image. ###################\n",
+ "# Do a forward pass with fclenet\n",
+ "pred_probs = fclenet.???( torch.tensor(image_for_cnn, dtype=torch.float) ) # Outp shape: [1, 10]\n",
+ "pred_probs = pred_probs.detach().numpy() # make tensor numpy\n",
+ "pred_lbls = np.argmax(pred_probs, axis=???) # Shape: [1]\n",
+ "#############################################################################\n",
+ "print(\"For input of shape:\", image_for_cnn.shape, \" FCLeNet gave output of shape \", pred_probs.shape)\n",
+ "print(\"Predicted class posterior probabilities:\", pred_probs)\n",
+ "print(\"Predicted class (argmax) is:\", pred_lbls)\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Now make a larger image to apply FCLeNet on...\n",
+ "# Pad the 32x32 image in the H and W dimension enough to become 64 x 64\n",
+ "image_for_cnn_padded = np.pad(image_for_cnn, ((0,0),(0,0),(5,27),(27,5)), mode='edge')\n",
+ "print(\"image_for_cnn_padded.shape :\", image_for_cnn_padded.shape)\n",
+ "plot_image(image_for_cnn_padded[0,0,:,:])"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "######### TODO: Run inference with FCLeNet on the padded 64 x 64 image. ###############\n",
+ "pred_probs_padded = fclenet.???( torch.tensor(image_for_cnn_padded, dtype=torch.float) ) # shape: [1, 10, H_out, W_out]\n",
+ "pred_probs_padded = pred_probs_padded.detach().numpy() # make tensor numpy\n",
+ "pred_lbls_padded = np.argmax(pred_probs_padded, axis=1) # Shape: [1, H_out, W_out]\n",
+ "########################################################################################\n",
+ "print(\"For input of shape:\", image_for_cnn_padded.shape, \" FCLeNet gave output of shape:\", pred_probs_padded.shape)\n",
+ "print(\"Plotting output of Channel #\", real_lbl, \"(real class) of the last conv layer (classification layer):\")\n",
+ "plot_image(pred_probs_padded[0,real_lbl,:,:], cmap=\"bwr\")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "For a 64 x 64 input, FCLeNet.forward() should output a \\[1,10,9,9\\] matrix. For each of the 10 classes, it outputs a *probability map* of size \\[9,9\\] (the feature map output by the last conv). Above you should see plotted the probability map for the *real* class of the digit in the input.\n",
+ "\n",
+ "**Q: Why 9x9 pixels in the output? (It has to do with the strides of the pooling layers and the size of input.)** \n",
+ "**A: In the lecture slides...**"
+ ]
+ },
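+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "A quick way to verify the arithmetic yourself is to chain the standard 'valid' output-size formula, out = floor((in - k) / stride) + 1, through FCLeNet's layer configuration (kernel sizes and strides as defined above). A minimal sketch:\n",
+ "\n",
+ "```python\n",
+ "def out_size(n, k, stride=1):\n",
+ "    # 'valid' conv/pool output size along one spatial dimension\n",
+ "    return (n - k) // stride + 1\n",
+ "\n",
+ "def fclenet_out_size(n):\n",
+ "    # conv5 -> pool2/2 -> conv5 -> pool2/2 -> conv5 -> conv1 -> conv1\n",
+ "    for k, s in [(5, 1), (2, 2), (5, 1), (2, 2), (5, 1), (1, 1), (1, 1)]:\n",
+ "        n = out_size(n, k, s)\n",
+ "    return n\n",
+ "\n",
+ "print(fclenet_out_size(32))  # 1: a single prediction for a 32x32 input\n",
+ "print(fclenet_out_size(64))  # 9: a 9x9 output map for a 64x64 input\n",
+ "```\n",
+ "\n",
+ "The same chain also confirms the receptive field: the smallest input for which the output is 1x1 is 32."
+ ]
+ },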
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# Task 5: Upsample the output to the original resolution\n",
+ "\n",
+ "Comparing the output with the padded 64x64 image, it looks as if the network **localizes** the digit in the image. But the output is in low resolution. Why? Because of the x4 downsampling done inside the network by the 2 pooling layers (each downsamples x2).\n",
+ "\n",
+ "Lets **upsample** it back to original resolution with **interpolation**. We will do this following the method discussed in the lecture slides.\n",
+ "- **Repeat elements x4** in the spatial dimensions (H,W). \n",
+ "- Then **convolve with a uniform kernel** in the spatial dimensions (H,W)."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from scipy import signal\n",
+ "\n",
+ "############# TODO: Repeat elements x4 along the two spatial dimensions #####################\n",
+ "pred_probs_padded_ups = np.repeat(pred_probs_padded, repeats=4, axis=???) # [N,Class,H,W]\n",
+ "pred_probs_padded_ups = np.repeat(pred_probs_padded_ups, repeats=???, axis=3)\n",
+ "print(\"Plotting image after repetition....................\")\n",
+ "plot_image(pred_probs_padded_ups[0,real_lbl,:,:], cmap=\"bwr\")\n",
+ "print(\"Upsampled image after repetition has shape:\", pred_probs_padded_ups.shape) # (1, 10, 33, 33)\n",
+ "############################################################################################\n",
+ "# Convolve with uniform kernel:\n",
+ "kernel = np.ones([1,1,4,4])/16.\n",
+ "pred_probs_padded_ups = signal.convolve(pred_probs_padded_ups, kernel, mode='valid')\n",
+ "print(\"Plotting image after convolution...................\")\n",
+ "plot_image(pred_probs_padded_ups[0,4], cmap=\"bwr\")\n",
+ "\n",
+ "print(\"Upsampled image after convolution has shape:\", pred_probs_padded_ups.shape) # (1, 10, 33, 33)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "If done right, you will notice it looks like a very good localisation of the digit.\n",
+ "\n",
+ "Notice the shape of the upsampled image. It should be 33x33. \n",
+ "- The initial 9x9 voxels were repeated x4, up to 36x36. \n",
+ "- The convolution with mode='valid' requires that the kernel is fully contained in the image for a response. \n",
+ "- Thus, convolving a 36x36 image with a 4x4 kernel gives 33x33 responses. \n",
+ "- The whole process can be seen as if we interpolated 3 voxels in between each of the original 9x9 voxels.\n",
+ "\n",
+ "*Each 'valid' convolution with kernel of size k reduces the shape of the input by k-1.* \n",
+ "**Why do we care?** Because this is what happens within the neural network as well with every conv/pool. \n",
+ "\n",
+ "**Eventhough we upsampled, the output is not same size as the original input, 64x64. Why?** \n",
+ "Exactly because of the above behaviour of the **'valid' convolutions within the net (and pools with ceiling=True)**. \n",
+ "With every conv or pool with kernel of size k, the feature maps get reduced in size by (k-1). **Except** if we use **padding=True** in the net, though this may cause problems for segmentation (cf. lecture slides). \n",
+ "Padding was quite irrelevant for classification nets, but becomes important for segmentation.\n",
+ "\n",
+ "**Q: The output (33x33) is 31 pixels smaller in each spatial dimension in comparison to the input (64x64). Can you come up with an equation that computes how much is the output's size, as a function of the input's size and the size of the receptive field of the network?** \n",
+ "**A: ???**\n",
+ "\n",
+ "Note: Very good material is in Theano's documentation (discontinued, original DL library): http://deeplearning.net/software/theano/tutorial/conv_arithmetic.html\n",
+ "\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# Task 6: Segmentation of Brain Tumors with FCLeNet\n",
+ "\n",
+ "Recap of what we have seen:\n",
+ "- Fully convolutional nets can be used to **localise** something in an image.\n",
+ "- Their **output** is of **lower resolution** due to conv/pool with *strides*.\n",
+ "- **Output is smaller** than the input image, shrinking around the borders with every **'valid' conv/pool**.\n",
+ "\n",
+ "In this last, most important task, we will combine all we have learned:\n",
+ "- We will use FCLeNet for **segmentation** of an image.\n",
+ "- We will extend FCLeNet for segmentation, to **learn to upsample within the net**.\n",
+ "- We will ensure to segment the whole image, by **padding the input** as required.\n",
+ "\n",
+ "For this, we will use a new database. We will segment brain tumors, from 2D slices of brain MRIs. \n",
+ "\n",
+ "**Note: This is related to the task you will get for the coursework, so invest time to understand it.**\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Preliminary: Load data & make one-hot ground truth labels\n",
+ "\n",
+ "The below will load brain images and segmentation labels for training and testing. \n",
+ "It is exactly the same as the steps we were taking for MNIST, except that we don't normalize intentisities. **They are already loaded normalized**. \n",
+ "\n",
+ "The only **important difference** are the ground truth labels for segmentation..."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "scrolled": true
+ },
+ "outputs": [],
+ "source": [
+ "import numpy as np\n",
+ "import imageio\n",
+ "\n",
+ "\n",
+ "def get_tumor_data(data_dir, train):\n",
+ " if train:\n",
+ " paths_to_imgs =[data_dir + \"image_0.png\",\n",
+ " data_dir + \"image_1.png\"]\n",
+ " paths_to_gts = [data_dir + \"seg_0.png\",\n",
+ " data_dir + \"seg_1.png\"]\n",
+ " else: # test\n",
+ " paths_to_imgs =[data_dir + \"image_2.png\"]\n",
+ " paths_to_gts = [data_dir + \"seg_2.png\"]\n",
+ "\n",
+ " assert len(paths_to_imgs) == len(paths_to_gts)\n",
+ " data_x = None\n",
+ " data_y = None\n",
+ " for i in range(len(paths_to_imgs)):\n",
+ " img = imageio.imread(paths_to_imgs[i]) # [H, W]\n",
+ " img = np.reshape(img, newshape=[1]+list(img.shape)) # [1, H, W]\n",
+ " gt = imageio.imread(paths_to_gts[i]) # [H, W]\n",
+ " gt = np.reshape(gt, newshape=[1]+list(gt.shape)) # [1, H, W]\n",
+ " if data_x is None:\n",
+ " data_x = img.copy()\n",
+ " data_y = gt.copy()\n",
+ " else:\n",
+ " data_x = np.concatenate((data_x, img), axis=0) # [N, H, W]\n",
+ " data_y = np.concatenate((data_y, gt), axis=0) # [N, H, W]\n",
+ " \n",
+ " data_x = (data_x - np.mean(data_x, axis=(1,2), keepdims=True))/np.std(data_x, axis=(1,2), keepdims=True)\n",
+ " data_y[data_y>0] = 1\n",
+ " return data_x, data_y\n",
+ "\n",
+ "\n",
+ "# Load the data. NOTE: Their intensities are already normalized.\n",
+ "train_imgs, train_lbls = get_tumor_data(data_dir=data_brain_dir, train=True)\n",
+ "test_imgs, test_lbls = get_tumor_data(data_dir=data_brain_dir, train=False)\n",
+ "\n",
+ "print(\"[train_imgs] Type: \", type(train_imgs), \"|| Shape:\", train_imgs.shape, \"|| Data type: \", train_imgs.dtype )\n",
+ "print(\"[train_lbls] Type: \", type(train_lbls), \"|| Shape:\", train_lbls.shape, \"|| Data type: \", train_lbls.dtype )\n",
+ "print('[train] Intensities in images [min,max]:', np.min(train_imgs), np.max(train_imgs))\n",
+ "print('[train] Values in class labels:', np.unique(train_lbls))\n",
+ "print(\"[test_imgs] Type: \", type(test_imgs), \"|| Shape:\", test_imgs.shape, \"|| Data type: \", test_imgs.dtype )\n",
+ "print(\"[test_lbls] Type: \", type(test_lbls), \"|| Shape:\", test_lbls.shape, \"|| Data type: \", test_lbls.dtype )\n",
+ "print('[test] Intensities in images [min,max]:', np.min(test_imgs), np.max(test_imgs))\n",
+ "print('[test] Values in class labels:', np.unique(test_lbls))\n",
+ "\n",
+ "C_classes = len(np.unique(train_lbls))\n",
+ "\n",
+ "# Change representation of labels to one-hot vectors of length C=2.\n",
+ "def make_NHW_lbls_onehot(lbls, num_classes):\n",
+ " # lbls: np.array [N, H, W], ints\n",
+ " # returns: np.array [N, H, W, C], 0 or 1\n",
+ " onehot_shape = (lbls.shape[0], num_classes, lbls.shape[1], lbls.shape[2])\n",
+ " lbls_onehot = np.zeros(shape=onehot_shape)\n",
+ " for n in range(lbls.shape[0]):\n",
+ " for x in range(lbls.shape[1]):\n",
+ " for y in range(lbls.shape[2]):\n",
+ " for c in range(num_classes):\n",
+ " lbls_onehot[n,c,x,y] = 1 if lbls[n,x,y] == c else 0\n",
+ " return lbls_onehot\n",
+ "\n",
+ "train_lbls_onehot = make_NHW_lbls_onehot(train_lbls, C_classes)\n",
+ "test_lbls_onehot = make_NHW_lbls_onehot(test_lbls, C_classes)\n",
+ "print(\"[train_lbls] Type: \", type(train_lbls), \"|| Shape:\", train_lbls.shape, \" || Data type: \", train_lbls.dtype )\n",
+ "print(\"[train_lbls_onehot] Type: \", type(train_lbls_onehot), \"|| Shape:\", train_lbls_onehot.shape, \" || Data type: \", train_lbls_onehot.dtype )\n",
+ "print(\"[test_lbls] Type: \", type(test_lbls), \"|| Shape:\", test_lbls.shape, \" || Data type: \", test_lbls.dtype )\n",
+ "print(\"[test_lbls_onehot] Type: \", type(test_lbls_onehot), \"|| Shape:\", test_lbls_onehot.shape, \" || Data type: \", test_lbls_onehot.dtype )\n",
+ "\n",
+ "# Plot the test image...\n",
+ "print(\"Plotting the test image and its ground truth segmentation label.\")\n",
+ "plot_image(test_imgs[0,:,:])\n",
+ "plot_image(test_lbls[0,:,:])\n"
+ ]
+ },
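+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "A side note on the implementation of make_NHW_lbls_onehot above: the quadruple loop is easy to read but slow for large images. An equivalent vectorized sketch, producing the same `[N, C, H, W]` output via numpy broadcasting:\n",
+ "\n",
+ "```python\n",
+ "import numpy as np\n",
+ "\n",
+ "def make_NHW_lbls_onehot_fast(lbls, num_classes):\n",
+ "    # lbls: int array [N, H, W]. Returns one-hot array [N, C, H, W].\n",
+ "    class_ids = np.arange(num_classes).reshape(1, num_classes, 1, 1)\n",
+ "    return (lbls[:, None, :, :] == class_ids).astype(np.float64)\n",
+ "```\n",
+ "\n",
+ "Comparing the class-index axis against the label map by broadcasting gives 1 exactly where lbls\\[n,h,w\\] == c, matching the loop version."
+ ]
+ },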
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Observe in the output of the above:\n",
+ "- We have 2 training images, each of shape 169 x 129. (train_imgs.shape = (2, 169, 129)) \n",
+ "- We have 1 testing image, also of shape 169 x 129. (test_imgs.shape = (1, 169, 129)) \n",
+ "\n",
+ "Each has an associated ground truth **segmentation mask**:\n",
+ "- Each segmentation mask is also of shape 169 x 129. **Aligned** with corresponding image.\n",
+ "- Each segmentation mask is **binary**. A pixel has value 0 if it is **backround**, 1 if it is **tumor**.\n",
+ "- We have 2 segmentation masks for training (train_lbls.shape=(2,169,129))\n",
+ "- We have 1 segmentation mask for testing (test_lbls.shape=(2,169,129))\n",
+ "- The one-hot segmentation masks are respectively of shape (2, 2, 169, 129) and (1, 2, 169, 129).\n",
+ "\n",
+ "For example: \n",
+ "train_lbls_onehot\\[0,1,50,60\\]==1 if in the 1st training image, the pixel at position \\[50,60\\] is a tumor.\n",
+ "\n",
+ "**The segmentation masks can be perceived as if they provide classification labels per pixel.** \n",
+ "Subsequently, we are going to train a **CNN for segmentation**. \n",
+ "It can be perceived as if we train it to do **pixel-wise classification**..."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# Build FCLeNet with learnt upsampling for segmentation\n",
+ "\n",
+ "For segmentation, we need fully-convolutional networks, but with output that is at the same resolution as their input. So that they give us a predicted class posterior probability for each pixel.\n",
+ "\n",
+ "Below, we will extend FCLeNet by performing upsampling x4 within the network. \n",
+ "This is to reverse the 2x downsampling made by the 2 pooling layers. \n",
+ "We will do this similarly to how we upsampled the output of FCLeNet in task 6: \n",
+ "- Repeat by x4 in each spatial dimension the feature maps of the last hidden layer.\n",
+ "- Convolve with a kernel of \\[height,width\\]=\\[4,4\\]\n",
+ "- We are going to *learn* the kernel for the convolution of the upsampling module, instead for a fixed one.\n",
+ "\n",
+ "**Note: The above procedure is similar to what transpose-convolution internally implements (often called deconvolution in DL literature).**"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Helper function\n",
+ "def torch_repeat(in_tens, repeats, axis):\n",
+ " # Repeat FM in the 2 last dimensions, to upsample back to the normal resolution space.\n",
+ " # in_tens: [batch size, num of FMs, H, W]. Input/output of conv layers.\n",
+ " # repeat_per_dim: [repetitions axis H, repetitions axis W]\n",
+ " # returns: Tensor of size: [batch size, num of FMs, H*repeat[0], W*repeat[1] ]\n",
+ " \n",
+ " # NOTE: Exactly the same API and behaviour as numpy.repeat(...)\n",
+ " # Because torch.repeat is similar to numpy.tile(...), not what we want.\n",
+ " # See: https://stackoverflow.com/questions/35227224/torch-repeat-tensor-like-numpy-repeat\n",
+ " tens = in_tens\n",
+ " assert axis>0\n",
+ " # Combine dimension to previous\n",
+ " shape_flat = list(in_tens.shape)\n",
+ " shape_flat[axis-1] = in_tens.shape[axis-1]*in_tens.shape[axis]\n",
+ " shape_flat[axis] = 1\n",
+ " tens_flat = tens.reshape( shape_flat )\n",
+ " # Tile\n",
+ " repeat_per_dim = [1]*len(in_tens.shape)\n",
+ " repeat_per_dim[axis] = repeats\n",
+ " tens_rep_flat = tens_flat.repeat( repeat_per_dim ) # This is what numpy.tile(...) does.\n",
+ " # Reshape to what it should be.\n",
+ " shape_result = list(in_tens.shape)\n",
+ " shape_result[axis] = in_tens.shape[axis]*repeats\n",
+ " tens_rep = tens_rep_flat.reshape( shape_result )\n",
+ " return tens_rep"
+ ]
+ },
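+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "The comment in torch_repeat draws a distinction between numpy.repeat-style behaviour (what we want) and tiling (what torch's Tensor.repeat does). The difference is easy to see in numpy itself:\n",
+ "\n",
+ "```python\n",
+ "import numpy as np\n",
+ "\n",
+ "a = np.array([1, 2, 3])\n",
+ "print(np.repeat(a, 2))  # [1 1 2 2 3 3] - each element repeated in place\n",
+ "print(np.tile(a, 2))    # [1 2 3 1 2 3] - the whole array repeated\n",
+ "```\n",
+ "\n",
+ "torch_repeat reproduces the first behaviour using only tiling, by flattening the target axis into its predecessor, tiling, and reshaping back."
+ ]
+ },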
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "class FCLeNet_Seg(nn.Module):\n",
+ " def __init__(self, num_classes):\n",
+ " super(FCLeNet_Seg, self).__init__()\n",
+ " ############## TODO: Complete the implementation ##################################\n",
+ " # Create exactly the same implementation as FCLeNet, but with one added layer:\n",
+ " # conv(5x5)->pool(2x2)->conv(5x5)->pool(2x2)->conv(5x5)->conv(1x1)->upsample->conv(1x1)\n",
+ " self.conv1 = nn.Conv2d(1, ???, kernel_size=5, bias=True, padding=0)\n",
+ " self.conv2 = nn.Conv2d(???, 16, kernel_size=???, bias=True, padding=0)\n",
+ " self.conv3 = nn.Conv2d(???, ???, kernel_size=???, bias=True, padding=0)\n",
+ " self.conv4 = nn.Conv2d(120, ???, kernel_size=1, bias=True, padding=0)\n",
+ " self.conv_ups = nn.Conv2d(84, 84, kernel_size=4, bias=False, padding=0) # Kernel for the upsampling step.\n",
+ " self.conv5 = nn.Conv2d(???, num_classes, kernel_size=???, bias=True, padding=0)\n",
+ "\n",
+ " def forward(self, x):\n",
+ " x = F.max_pool2d(F.relu(self.conv1(x)), kernel_size=2, stride=2, padding=0, ceil_mode=False)\n",
+ " x = F.max_pool2d(F.relu(self.conv2(x)), kernel_size=2, stride=2, padding=0, ceil_mode=False)\n",
+ " x = F.relu(self.conv3(x))\n",
+ " x = F.relu(self.conv4(x))\n",
+ " x = torch_repeat(x, repeats=4, axis=???) # Repeat 1 of upsampling step, ala Task 6.\n",
+ " x = torch_repeat(x, repeats=???, axis=3) # Repeat 2 of upsampling step, ala Task 6.\n",
+ " x = self.???(x) \n",
+ " x = self.conv5(x)\n",
+ " # Note: The repeats and conv_ups can be replaced with a single torch.nn.ConvTranspose2d ...\n",
+ " # ... but we prefer looking under the hood.\n",
+ " ###################################################################################\n",
+ " \n",
+ " y_pred = F.softmax(x, dim=1) # y_pred.shape = [N, Classes, H_out, W_out]\n",
+ " return y_pred\n",
+ " "
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Cross entropy for segmentation\n",
+ "\n",
+ "We below provide the cross-entropy loss that will be used in the case of segmentation. \n",
+ "It is *almost* the same as the one used for classification (preliminaries of this tutorial and Tutorial 3). \n",
+ "\n",
+ "As we previously mentioned, a segmentation network can be seen as if it is trained for pixel-wise classification. \n",
+ "As such, it can be perceived as if **in the loss, each pixel is treated as another sample**. \n",
+ "Thus the final torch.mean(...) is taken over the batch samples (images N) and each of the H_out*W_out pixels. \n",
+ "Compare with classification cross entropy for clarity (beginning of tutorial).\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "def cross_entropy_segmentation(y_pred, y_real, eps=1e-7):\n",
+ " # Cross entropy for segmentation.\n",
+ " # y_pred: Tensor of shape [N, Classes, H_out, W_out]. Predicted class-posterior probabilities from forward().\n",
+ " # y_real: One-hot representation of segmentation masks. Same shape as y_pred.\n",
+ " \n",
+ " x_entr_per_sample = - torch.sum( y_real * torch.log(y_pred + eps), dim=1) # Sum over classes.\n",
+ " # x_entr_per_sample is now of shape [N, H_out, W_out]\n",
+ " loss = torch.mean(x_entr_per_sample, dim=(0,1,2)) # Mean over batch & over spatial positions.\n",
+ " \n",
+ " return loss"
+ ]
+ },
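+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "As a sanity check of the formula, here is a numpy sketch mirroring the same computation. A perfect prediction should give a loss near 0, and a maximally uncertain (uniform) prediction over C=2 classes a loss of ln 2, about 0.693:\n",
+ "\n",
+ "```python\n",
+ "import numpy as np\n",
+ "\n",
+ "def xent_seg_np(y_pred, y_real, eps=1e-7):\n",
+ "    # Same computation as cross_entropy_segmentation, in numpy:\n",
+ "    # sum over the class axis, then mean over batch and spatial positions.\n",
+ "    return np.mean(-np.sum(y_real * np.log(y_pred + eps), axis=1))\n",
+ "\n",
+ "y_real = np.zeros((1, 2, 4, 4))\n",
+ "y_real[:, 1] = 1.                     # every pixel belongs to class 1\n",
+ "uniform = np.full((1, 2, 4, 4), 0.5)  # maximally uncertain prediction\n",
+ "print(xent_seg_np(y_real, y_real))    # ~0\n",
+ "print(xent_seg_np(uniform, y_real))   # ~0.693\n",
+ "```"
+ ]
+ },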
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# Train FCLeNet for segmentation\n",
+ "\n",
+ "Finally, we will train the network.\n",
+ "\n",
+ "But don't forget. Our network's outputs are 31 pixels less in each spatial dimension (H,W) than the input. \n",
+ "So, to get a segmentation for the whole input, we will first **pad input with 31 pixels along dimensions (H,W)**."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from IPython import display\n",
+ "# Pytorch needs input to conv/pool to be of shape [N, Channels, H, W].\n",
+ "# For grayscale images, there is only 1 image channel. We add this *channels dimension*:\n",
+ "IMAGE_CHANNELS = 1\n",
+ "train_imgs_cnn = train_imgs.reshape([train_imgs.shape[0], IMAGE_CHANNELS, train_imgs.shape[1], train_imgs.shape[2]])\n",
+ "test_imgs_cnn = test_imgs.reshape([test_imgs.shape[0], IMAGE_CHANNELS, test_imgs.shape[1], test_imgs.shape[2]])\n",
+ "\n",
+ "######## TODO: Fill the blanks (????) ###########################################################\n",
+ "# Pad 31 voxels along each of the spatial dimensions of the input.\n",
+ "# ... So that the final output is of same size as original input.\n",
+ "# Preferably (roughly) half padding before, half after.\n",
+ "train_imgs_cnn = np.pad(train_imgs_cnn, ((0,0), (0,0), (15,???), (???,16)), mode='edge')\n",
+ "test_imgs_cnn = np.pad(test_imgs_cnn, ((0,0), (0,0), (???,???), (???,???)), mode='edge')\n",
+ "print(\"train_imgs_cnn.shape=\", train_imgs_cnn.shape)\n",
+ "print(\"train_lbls.shape=\", train_lbls.shape)\n",
+ "##################################################################################################\n",
+ "\n",
+ "# Create the network\n",
+ "fclenet_seg = FCLeNet_Seg(num_classes=C_classes)\n",
+ "\n",
+ "import torch.optim as optim\n",
+ "grad_descent_optimizer = optim.SGD(fclenet_seg.parameters(), lr=0.05, momentum=0.0)\n",
+ "\n",
+ "# Start training\n",
+ "rng = np.random.RandomState(seed=SEED)\n",
+ "gradient_descent(fclenet_seg,\n",
+ " cross_entropy_segmentation,\n",
+ " grad_descent_optimizer,\n",
+ " rng,\n",
+ " train_imgs_cnn,\n",
+ " train_lbls_onehot,\n",
+ " test_imgs_cnn,\n",
+ " test_lbls_onehot,\n",
+ " N_batch_size=2, \n",
+ " total_iters=300,\n",
+ " iters_per_test=10,\n",
+ " is_tumor_segm=True)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "If the whole task was performed correctly, you should see the network learning. \n",
+ "The probability map for the tumor class will be plotted every few iterations, for the test image. \n",
+ "If the task was performed correctly, you will see the tumor bright-up after ~200 iterations."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## This notebook:\n",
+ "Copyright 2020, Imperial College London \n",
+ "Author: Konstantinos Kamnitsas (konstantinos.kamnitsas12@imperial.ac.uk)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": []
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "Python 3",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.6.7"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
diff --git a/05 - Clustering-and-PCA.ipynb b/05 - Clustering-and-PCA.ipynb
new file mode 100644
index 0000000000000000000000000000000000000000..700e5897b720080206dd22b52007a445327ae953
--- /dev/null
+++ b/05 - Clustering-and-PCA.ipynb
@@ -0,0 +1,1029 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# Tutorial 5: Clustering & PCA"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Running on Colab"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "! pip install SimpleITK==1.2.2 \n",
+ "! wget https://www.doc.ic.ac.uk/~bglocker/teaching/notebooks/unsupervised-data.zip\n",
+ "! unzip unsupervised-data.zip\n",
+ "\n",
+ "# data directory\n",
+ "data_dir = 'data/unsupervised/'"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Running on DoC lab machines"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# data directory\n",
+ "data_dir = '/vol/lab/course/416/data/unsupervised/'"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Data"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import os\n",
+ "print(os.listdir(data_dir))"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Set up the image viewer"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import numpy as np\n",
+ "import SimpleITK as sitk\n",
+ "import matplotlib.pyplot as plt\n",
+ "\n",
+ "from ipywidgets import interact, fixed\n",
+ "from IPython.display import display\n",
+ "\n",
+ "%matplotlib inline\n",
+ "\n",
+ "# Calculate parameters low and high from window and level\n",
+ "def wl_to_lh(window, level):\n",
+ " low = level - window/2\n",
+ " high = level + window/2\n",
+ " return low,high\n",
+ "\n",
+ "def display_image(img, x=None, y=None, z=None, window=None, level=None):\n",
+ " # Convert SimpleITK image to NumPy array\n",
+ " img_array = sitk.GetArrayFromImage(img)\n",
+ " \n",
+ " # Get image dimensions in millimetres\n",
+ " size = img.GetSize()\n",
+ " spacing = img.GetSpacing()\n",
+ " width = size[0] * spacing[0]\n",
+ " height = size[1] * spacing[1]\n",
+ " depth = size[2] * spacing[2]\n",
+ " \n",
+ " if x is None:\n",
+ " x = np.floor(size[0]/2).astype(int)\n",
+ " if y is None:\n",
+ " y = np.floor(size[1]/2).astype(int)\n",
+ " if z is None:\n",
+ " z = np.floor(size[2]/2).astype(int)\n",
+ " \n",
+ " if window is None:\n",
+ " window = np.max(img_array) - np.min(img_array)\n",
+ " \n",
+ " if level is None:\n",
+ " level = window / 2 + np.min(img_array)\n",
+ " \n",
+ " low,high = wl_to_lh(window,level)\n",
+ "\n",
+ " # Display the orthogonal slices\n",
+ " fig, (ax1, ax2, ax3) = plt.subplots(1, 3, figsize=(15, 5))\n",
+ "\n",
+ " ax1.imshow(img_array[z,:,:], cmap='gray', clim=(low, high), extent=(0, width, height, 0))\n",
+ " ax2.imshow(img_array[:,y,:], origin='lower', cmap='gray', clim=(low, high), extent=(0, width, 0, depth))\n",
+ " ax3.imshow(img_array[:,:,x], origin='lower', cmap='gray', clim=(low, high), extent=(0, height, 0, depth))\n",
+ "\n",
+ " # Additionally display crosshairs\n",
+ " ax1.axhline(y * spacing[1], lw=1)\n",
+ " ax1.axvline(x * spacing[0], lw=1)\n",
+ " \n",
+ " ax2.axhline(z * spacing[2], lw=1)\n",
+ " ax2.axvline(x * spacing[0], lw=1)\n",
+ " \n",
+ " ax3.axhline(z * spacing[2], lw=1)\n",
+ " ax3.axvline(y * spacing[1], lw=1)\n",
+ "\n",
+ " plt.show()\n",
+ " \n",
+ "def interactive_view(img):\n",
+ " size = img.GetSize() \n",
+ " img_array = sitk.GetArrayFromImage(img)\n",
+ " interact(display_image,img=fixed(img),\n",
+ " x=(0, size[0] - 1),\n",
+ " y=(0, size[1] - 1),\n",
+ " z=(0, size[2] - 1),\n",
+ " window=(0,np.max(img_array) - np.min(img_array)),\n",
+ " level=(np.min(img_array),np.max(img_array)));"
+ ]
+ },
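+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "As a concrete example of the window/level convention used by display_image: the setting window=400, level=200 (used for the brain MRI below) maps to the display range (0, 400), so intensities at or below 0 render black and at or above 400 render white. Reproducing wl_to_lh standalone:\n",
+ "\n",
+ "```python\n",
+ "def wl_to_lh(window, level):\n",
+ "    # Convert a window/level pair to the (low, high) display range.\n",
+ "    low = level - window / 2\n",
+ "    high = level + window / 2\n",
+ "    return low, high\n",
+ "\n",
+ "print(wl_to_lh(400, 200))  # (0.0, 400.0)\n",
+ "```"
+ ]
+ },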
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Image segmentation via clustering\n",
+ "\n",
+ "We can segment an image into consistent regions using unsupervised clustering techniques. This is different to semantic segmentation, as in clustering a segmented region does not have necessarily have a semantic meaning. However, clustering can still be useful as sometimes it is relatively easy to infer which region corresponds to which anatomical structure, as we will see.\n",
+ "\n",
+ "Let us start by loading an MRI brain scan `\"mri-brain.nii.gz\"`. For this scan, we already obtained a brain mask using a neuroimaging toolkit which allows us to mask out non-brain structures."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "img = sitk.ReadImage(data_dir + 'mri-brain.nii.gz')\n",
+ "msk = sitk.ReadImage(data_dir + 'mri-brain-mask.nii.gz')\n",
+ "print('MR image')\n",
+ "display_image(img, window=400, level=200)\n",
+ "print('Brain mask')\n",
+ "display_image(msk)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "We use the brain mask to mask out non-brain regions by setting them to zero"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "img_array = sitk.GetArrayFromImage(img)\n",
+ "msk_array = sitk.GetArrayFromImage(msk)\n",
+ "\n",
+ "masked_array = img_array.copy()  # copy so that img_array is not modified in place\n",
+ "masked_array[msk_array==0] = 0\n",
+ "\n",
+ "img_masked = sitk.GetImageFromArray(masked_array)\n",
+ "img_masked.CopyInformation(img)\n",
+ "\n",
+ "print('Masked image')\n",
+ "display_image(img_masked, window=400, level=200)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "#### Data preparation for clustering\n",
+ "\n",
+ "We want to run different clustering methods on the masked brain data. As the clustering techniques we will be using don't take any spatial information into account, we can simply flatten the 3D imaging data into one big 1D vector $X$.\n",
+ "\n",
+ "Let's visualise the data by plotting a normalised histogram."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Take all non-zero voxels and flatten the data into a 1D numpy array\n",
+ "X = img_array[msk_array > 0].flatten().reshape(-1, 1)\n",
+ "\n",
+ "# Get the number of points\n",
+ "num_pts = len(X.flatten())\n",
+ "\n",
+ "# Extract the minimum and maximum intensity values and calculate the number of bins for the histogram\n",
+ "lim_low = int(np.min(X))\n",
+ "lim_high = int(np.max(X))\n",
+ "num_bins = (lim_high - lim_low + 1)\n",
+ "\n",
+ "plt.figure(figsize=(10, 4), dpi=100)\n",
+ "plt.hist(X, bins=num_bins, density=True, range=(lim_low, lim_high), color='lightgray');\n",
+ "plt.xlim([0,350]); # we limit the x-axis to the range of interest\n",
+ "plt.show()\n",
+ "\n",
+ "print('Number of points ' + str(num_pts))"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "#### Random subsampling\n",
+ "\n",
+ "We have about 1.4 million data points, which can make clustering computationally expensive. A common strategy in such cases is to use a random subset of the original data.\n",
+ "\n",
+ "For example, we can randomly subsample the data with a given percentage, say 5%. The histogram is a bit noisier, but preserves the overall shape."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "sampling = 0.05\n",
+ "X_subset = np.random.choice(X.flatten(), int(num_pts*sampling), replace=False).reshape(-1, 1)\n",
+ "\n",
+ "plt.figure(figsize=(10, 4), dpi=100)\n",
+ "plt.hist(X_subset, bins=num_bins, density=True, range=(lim_low, lim_high), color='lightgray');\n",
+ "plt.xlim([0,350]);\n",
+ "plt.show()\n",
+ "\n",
+ "print('Number of points ' + str(len(X_subset)))"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "#### How many clusters do we want?\n",
+ "\n",
+ "That's the key question when applying clustering techniques. Sometimes we have a good idea about the number of clusters that we would like our data to be partitioned into. Sometimes we don't, and finding out the number of clusters is part of data exploration in unsupervised learning.\n",
+ "\n",
+ "For now, let's assume we know that our brain consists of mostly three tissue types, grey matter (GM), white matter (WM), and cerebrospinal fluid (CSF). So we set the number of clusters to three."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "num_clusters = 3"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "#### K-means\n",
+ "\n",
+ "Let's start with one of the most popular and well known clustering techniques, k-means (as discussed in the lecture). `scikit-learn` provides us with an efficient implementation.\n",
+ "\n",
+ "**Task:** Go through the steps below, and try to understand what's happening in each step. You might want to check out the scikit-learn's [overview of clustering](http://scikit-learn.org/stable/modules/clustering.html) and the [k-means documentation](http://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html#sklearn.cluster.KMeans)."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import sklearn.cluster as cluster\n",
+ "\n",
+ "# Create a k-means instance\n",
+ "kmeans = cluster.KMeans(n_clusters=num_clusters)\n",
+ "\n",
+ "# Running k-means via the fit function\n",
+ "kmeans.fit(X_subset)\n",
+ "\n",
+ "# Produce the clustering result for all image points\n",
+ "y = kmeans.predict(img_array.flatten().reshape(-1, 1))\n",
+ "\n",
+ "# K-means will produce labels between 0 and (k-1), we want 0 to be background, so we shift the labels by one\n",
+ "y = y + 1 # shift labels\n",
+ "y[(msk_array == 0).flatten()] = 0 # zero background\n",
+ "\n",
+ "# Construct a 3D label map\n",
+ "lab_array = y.reshape(img_array.shape).astype('uint8')\n",
+ "seg_kmeans = sitk.GetImageFromArray(lab_array)\n",
+ "seg_kmeans.CopyInformation(img)\n",
+ "\n",
+ "# Display the results using SimpleITK mapping of label maps to colours\n",
+ "display_image(sitk.LabelToRGB(seg_kmeans))"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "#### Gaussian Mixture Models\n",
+ "\n",
+ "There are many other clustering techniques we can apply, and `scikit-learn` makes it very easy to swap between methods as they share a common interface based on `fit` and `predict` functions.\n",
+ "\n",
+ "**Task:** Try out the [GaussianMixture](http://scikit-learn.org/stable/modules/mixture.html#mixture) model"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "scrolled": true
+ },
+ "outputs": [],
+ "source": [
+ "import sklearn.mixture as mixture\n",
+ "\n",
+ "#INSERT CODE HERE\n",
+ "gmm = "
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "#### Plotting the GMM result\n",
+ "\n",
+ "The cell below will plot the GMM on top of the image histogram"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from scipy.stats import norm\n",
+ "\n",
+ "def plot_gmm(x, gmm):\n",
+ "    # plot each weighted Gaussian component of the fitted mixture\n",
+ "    omega = gmm.weights_\n",
+ "    mu = gmm.means_.flatten()\n",
+ "    sigma = np.sqrt(gmm.covariances_).flatten()\n",
+ "    for ind in range(omega.shape[0]):\n",
+ "        plt.plot(x, omega[ind]*norm.pdf(x, mu[ind], sigma[ind]), linewidth=2, label='GMM Component '+str(ind))\n",
+ "\n",
+ "plt.figure(figsize=(10, 4), dpi=100)\n",
+ "plt.hist(X, bins=num_bins, density=True, range=(lim_low, lim_high), label='Intensity histogram', color='lightgray');\n",
+ "x = np.linspace(lim_low,lim_high,num_bins).reshape(-1,1)\n",
+ "plot_gmm(x,gmm)\n",
+ "plt.plot(x,np.exp(gmm.score_samples(x)), linewidth=2, color='k', label='Gaussian Mixture Model')\n",
+ "plt.xlim([0,350])\n",
+ "plt.legend(loc=0, shadow=True, fontsize=12)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "#### Other clustering methods\n",
+ "\n",
+ "**Optional Task:** Try out other [clustering methods](http://scikit-learn.org/stable/modules/clustering.html#clustering). For example, you can try out [BayesianGaussianMixture](http://scikit-learn.org/stable/modules/mixture.html#variational-bayesian-gaussian-mixture). You can also re-use the GMM plotting code from above to plot the results of the Bayesian GMM fitting."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "#INSERT CODE HERE"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "#### Evaluation\n",
+ "\n",
+ "For the MRI brain scan, we also have a reference segmentation of GM, WM, and CSF."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "ref = sitk.ReadImage(data_dir + 'mri-brain-tissues.nii.gz')\n",
+ "display_image(sitk.LabelToRGB(ref))"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "**Task (a):** Devise a heuristic that allows you to identify which of the clusters obtained with a clustering method (e.g., k-means or GMMs) corresponds to which tissue class. In the reference segmentation, the labels are CSF=1, GM=2, and WM=3.\n",
+ "\n",
+ "**Task (b):** Using your heuristic, calculate the Dice Similarity Coefficients (DSC) for CSF, GM, and WM when comparing the clustering results to the reference segmentation."
+ ]
+ },
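+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "As a starting point for Task (b), here is a sketch of a Dice helper for two binary masks (assuming numpy arrays; the cluster-to-tissue matching heuristic of Task (a) is left for you to devise):"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "def dice_score(seg, ref):\n",
+ "    # DSC = 2 |A intersect B| / (|A| + |B|) for two binary masks\n",
+ "    seg_bin = seg > 0\n",
+ "    ref_bin = ref > 0\n",
+ "    return 2.0 * np.sum(seg_bin & ref_bin) / (np.sum(seg_bin) + np.sum(ref_bin))"
+ ]
+ },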
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "#INSERT CODE HERE"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Lesion segmentation using clustering\n",
+ "\n",
+ "**Optional task:** Try out the above clustering methods with **different numbers of clusters** on the provided `\"ct-brain.nii.gz\"` image."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "img = sitk.ReadImage(data_dir + 'ct-brain.nii.gz')\n",
+ "msk = sitk.ReadImage(data_dir + 'ct-brain-mask.nii.gz')\n",
+ "\n",
+ "print('CT image')\n",
+ "display_image(img, x=70, y=100, z=90, window=120, level=40)\n",
+ "\n",
+ "print('Brain mask')\n",
+ "display_image(msk, x=70, y=100, z=90)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "img_array = sitk.GetArrayFromImage(img)\n",
+ "msk_array = sitk.GetArrayFromImage(msk)\n",
+ "\n",
+ "masked_array = img_array.copy()  # copy so that img_array is not modified in place\n",
+ "masked_array[msk_array==0] = 0\n",
+ "\n",
+ "img_masked = sitk.GetImageFromArray(masked_array)\n",
+ "img_masked.CopyInformation(img)\n",
+ "\n",
+ "print('Masked image')\n",
+ "display_image(img_masked, x=70, y=100, z=90, window=120, level=40)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Take all non-zero voxels and flatten the data into a 1D numpy array\n",
+ "X = img_array[msk_array > 0].flatten().reshape(-1, 1)\n",
+ "\n",
+ "# Get the number of points\n",
+ "num_pts = len(X.flatten())\n",
+ "\n",
+ "# Extract the minimum and maximum intensity values and calculate the number of bins for the histogram\n",
+ "lim_low = -20 # manually set intensity range of interest\n",
+ "lim_high = 100 # manually set intensity range of interest\n",
+ "num_bins = (lim_high - lim_low + 1)\n",
+ "\n",
+ "plt.figure(figsize=(10, 4), dpi=100)\n",
+ "plt.hist(X, bins=num_bins, density=True, range=(lim_low,lim_high), color='lightgray');\n",
+ "plt.xlim([0,80]) # we limit the x-axis to the range of interest\n",
+ "plt.show()\n",
+ "\n",
+ "print('Number of points ' + str(num_pts))"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "sampling = 0.05\n",
+ "X_subset = np.random.choice(X.flatten(), int(num_pts*sampling), replace=False).reshape(-1, 1)\n",
+ "\n",
+ "plt.figure(figsize=(10, 4), dpi=100)\n",
+ "plt.hist(X_subset, bins=num_bins, density=True, range=(lim_low, lim_high), color='lightgray');\n",
+ "plt.xlim([0,80]);\n",
+ "plt.show()\n",
+ "\n",
+ "print('Number of points ' + str(len(X_subset)))"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "num_clusters = 5"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import sklearn.cluster as cluster\n",
+ "\n",
+ "# Create a k-means instance\n",
+ "kmeans = cluster.KMeans(n_clusters=num_clusters)\n",
+ "\n",
+ "# Running k-means via the fit function\n",
+ "kmeans.fit(X_subset)\n",
+ "\n",
+ "# Produce the clustering result for all image points\n",
+ "y = kmeans.predict(img_array.flatten().reshape(-1, 1))\n",
+ "\n",
+ "# K-means will produce labels between 0 and (k-1), we want 0 to be background, so we shift the labels by one\n",
+ "y = y + 1 # shift labels\n",
+ "y[(msk_array == 0).flatten()] = 0 # zero background\n",
+ "\n",
+ "# Construct a 3D label map\n",
+ "lab_array = y.reshape(img_array.shape).astype('uint8')\n",
+ "seg_kmeans = sitk.GetImageFromArray(lab_array)\n",
+ "seg_kmeans.CopyInformation(img)\n",
+ "\n",
+ "# Display the results using SimpleITK mapping of label maps to colours\n",
+ "display_image(sitk.LabelToRGB(seg_kmeans))"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "#INSERT CODE HERE"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "collapsed": true
+ },
+ "source": [
+ "### Principal Component Analysis\n",
+ "\n",
+ "As discussed in the unsupervised learning lecture, Principal Component Analysis (PCA) allows us to do dimensionality reduction and construct statistical models. The idea here is to use PCA to find the principal modes (or components) of largest variation in the data.\n",
+ "\n",
+ "This is illustrated below with a simple 2D toy example, similar to the one in the lecture slides."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import numpy as np\n",
+ "import matplotlib.pyplot as plt\n",
+ "%matplotlib inline\n",
+ "\n",
+ "# generate 2D toy data, 1000 samples\n",
+ "m = 1000\n",
+ "x = np.random.normal(5,2,m)\n",
+ "y = 2*x + np.random.normal(0,3,m)\n",
+ "\n",
+ "plt.figure(figsize=(7, 7), dpi=100)\n",
+ "plt.scatter(x,y,marker='.')\n",
+ "plt.xlim([-10,20])\n",
+ "plt.ylim([-5,25])\n",
+ "plt.axis('equal');"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "#### Running PCA\n",
+ "\n",
+ "We now perform the steps of PCA as outlined in the lecture."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "**Step 1:** Put the data together in one big matrix $X$"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "X = np.vstack((x,y))\n",
+ "n, m = X.shape\n",
+ "print('Dimension:\\t' + str(n))\n",
+ "print('Samples:\\t' + str(m))"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "**Step 2:** Normalise the data"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "mu_X = np.mean(X, axis=1)\n",
+ "X_prime = (1 / np.sqrt(m-1)) * (X - np.tile(mu_X, (m, 1)).T)"
+ ]
+ },
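+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "As an optional sanity check: with this normalisation, $X' X'^T$ is exactly the sample covariance matrix of the data (with denominator $m-1$), so it should match `np.cov`."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# X' X'^T equals the sample covariance of the data\n",
+ "print(np.allclose(np.matmul(X_prime, X_prime.T), np.cov(X)))  # expect True"
+ ]
+ },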
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "**Step 3:** Run singular value decomposition (SVD)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "U, D, V = np.linalg.svd(np.matmul(X_prime,X_prime.T))"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "We plot the results by overlaying the principal components (directions of largest variation) scaled by three times the standard deviation (capturing 99.7% of values around the mean of a normal distribution, see also the [68–95–99.7 rule](https://en.wikipedia.org/wiki/68%E2%80%9395%E2%80%9399.7_rule))."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# first principal component\n",
+ "p1a = mu_X + U[:,0] * np.sqrt(D[0])*3;\n",
+ "p1b = mu_X - U[:,0] * np.sqrt(D[0])*3;\n",
+ "\n",
+ "# second principal component\n",
+ "p2a = mu_X + U[:,1] * np.sqrt(D[1])*3;\n",
+ "p2b = mu_X - U[:,1] * np.sqrt(D[1])*3;\n",
+ "\n",
+ "plt.figure(figsize=(7, 7), dpi=100)\n",
+ "plt.scatter(X[0,:],X[1,:],marker='.', color='lightgray')\n",
+ "plt.plot((p1a[0],p1b[0]),(p1a[1],p1b[1]), linewidth=3)\n",
+ "plt.plot((p2a[0],p2b[0]),(p2a[1],p2b[1]), linewidth=3)\n",
+ "plt.xlim([-10,20])\n",
+ "plt.ylim([-5,25])\n",
+ "plt.axis('equal');"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "We can confirm the Gaussian nature of PCA by overlaying a contour plot generated from a multivariate normal distribution parameterised by the sample mean and covariance."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from scipy.stats import multivariate_normal\n",
+ "covar = np.matmul(X_prime,X_prime.T)\n",
+ "gauss_2d = multivariate_normal(mu_X, covar)\n",
+ "\n",
+ "i, j = np.mgrid[-10:20:.01, -5:25:.01]\n",
+ "pos = np.empty(i.shape + (2,))\n",
+ "pos[:, :, 0] = i; pos[:, :, 1] = j\n",
+ "\n",
+ "plt.figure(figsize=(7, 7), dpi=100)\n",
+ "plt.scatter(X[0,:],X[1,:],marker='.', color='lightgray')\n",
+ "plt.plot((p1a[0],p1b[0]),(p1a[1],p1b[1]), linewidth=3)\n",
+ "plt.plot((p2a[0],p2b[0]),(p2a[1],p2b[1]), linewidth=3)\n",
+ "plt.contour(i, j, gauss_2d.pdf(pos), cmap='viridis')\n",
+ "plt.xlim([-10,20])\n",
+ "plt.ylim([-5,25])\n",
+ "plt.axis('equal');"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "#### Running PCA using scikit-learn\n",
+ "\n",
+ "We can also utilise the [PCA implementation](http://scikit-learn.org/stable/modules/decomposition.html#pca) of scikit-learn which is easier than doing it manually as above. It also has some additional features such as whitening."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import sklearn.decomposition as decomp\n",
+ "\n",
+ "# Create PCA instance\n",
+ "pca = decomp.PCA()\n",
+ "\n",
+ "# Fit the data\n",
+ "pca.fit(X.T)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Let's confirm that we get the same results. Note that scikit-learn's PCA gives us the singular values, which can be converted to eigenvalues via $\\lambda_i=s_i^2 \\, / \\, (m-1)$."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Get the mean from PCA\n",
+ "mu_X = pca.mean_\n",
+ "\n",
+ "# Get principal modes (a.k.a. components) from PCA\n",
+ "U = pca.components_.T\n",
+ "\n",
+ "# Get the eigenvalues from PCA's singular values\n",
+ "D = pca.singular_values_**2 / (m - 1)\n",
+ "\n",
+ "gauss_2d = multivariate_normal(mu_X, pca.get_covariance())\n",
+ "\n",
+ "# first principal component\n",
+ "p1a = mu_X + U[:,0] * np.sqrt(D[0])*3;\n",
+ "p1b = mu_X - U[:,0] * np.sqrt(D[0])*3;\n",
+ "\n",
+ "# second principal component\n",
+ "p2a = mu_X + U[:,1] * np.sqrt(D[1])*3;\n",
+ "p2b = mu_X - U[:,1] * np.sqrt(D[1])*3;\n",
+ "\n",
+ "plt.figure(figsize=(7, 7), dpi=100)\n",
+ "plt.scatter(X[0,:],X[1,:],marker='.', color='lightgray')\n",
+ "plt.plot((p1a[0],p1b[0]),(p1a[1],p1b[1]), linewidth=3)\n",
+ "plt.plot((p2a[0],p2b[0]),(p2a[1],p2b[1]), linewidth=3)\n",
+ "plt.contour(i, j, gauss_2d.pdf(pos), cmap='viridis')\n",
+ "plt.xlim([-10,20])\n",
+ "plt.ylim([-5,25])\n",
+ "plt.axis('equal');"
+ ]
+ },
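+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "As a quick optional check: for a symmetric positive semi-definite matrix such as $X' X'^T$, the singular values are the eigenvalues, so the values from our manual SVD should agree with the eigenvalues derived from scikit-learn's singular values."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# both routes should yield the same eigenvalues of the sample covariance\n",
+ "D_manual = np.linalg.svd(np.matmul(X_prime, X_prime.T))[1]\n",
+ "print(np.allclose(D_manual, D))  # expect True"
+ ]
+ },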
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### PCA Shape Models of the Spine\n",
+ "\n",
+ "Now, let's use PCA to construct a statistical shape model of the spine.\n",
+ "\n",
+ "We load the spine data consisting of 200 examples of 26 3D coordinates corresponding to the centroids of the vertebral bodies."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import numpy as np\n",
+ "\n",
+ "spine_data = np.genfromtxt(data_dir + 'spine-data.txt')\n",
+ "n, m = spine_data.shape\n",
+ "\n",
+ "# the data is stored as one 78x200 matrix\n",
+ "# for visualisation purposes we split the data\n",
+ "# into three sets of x, y, and z coordinates\n",
+ "num_centroids = n//3;\n",
+ "x_ind = range(num_centroids)\n",
+ "y_ind = range(num_centroids,num_centroids*2)\n",
+ "z_ind = range(num_centroids*2,num_centroids*3)\n",
+ "\n",
+ "cx = spine_data[x_ind,:];\n",
+ "cy = spine_data[y_ind,:];\n",
+ "cz = spine_data[z_ind,:];\n",
+ "\n",
+ "print('Dimension:\\t' + str(n))\n",
+ "print('Samples:\\t' + str(m))"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Let's visualise the raw input data"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import numpy as np\n",
+ "from mpl_toolkits.mplot3d import Axes3D\n",
+ "import matplotlib.pyplot as plt\n",
+ "%matplotlib inline\n",
+ "\n",
+ "def plot_spines(x,y,z,max_range=None,marker_size=10,figure_size=5):\n",
+ "\n",
+ " fig = plt.figure(figsize=(figure_size, figure_size), dpi=100)\n",
+ "    ax = fig.add_subplot(projection='3d')\n",
+ " for s in range(x.shape[1]):\n",
+ " ax.scatter(x[:,s], y[:,s], z[:,s], s=marker_size, marker='.')\n",
+ " ax.plot(x[:,s], y[:,s], z[:,s])\n",
+ "\n",
+ "    if max_range is None:\n",
+ " max_range = np.array([x.max()-x.min(), y.max()-y.min(), z.max()-z.min()]).max() / 2.0\n",
+ "\n",
+ " mid_x = (x.max()+x.min()) * 0.5\n",
+ " mid_y = (y.max()+y.min()) * 0.5\n",
+ " mid_z = (z.max()+z.min()) * 0.5\n",
+ " ax.set_xlim(mid_x - max_range, mid_x + max_range)\n",
+ " ax.set_ylim(mid_y - max_range, mid_y + max_range)\n",
+ " ax.set_zlim(mid_z - max_range, mid_z + max_range)\n",
+ " ax.view_init(5,45)\n",
+ " ax.grid()\n",
+ " \n",
+ "plot_spines(cx, cy, cz)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "We notice that the data is all over the place, which is expected as the raw coordinates extracted from the original CT images are not in the same coordinate space. Let's spatially normalise the data by centering all spines at their geometric mean."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# spatial normalisation\n",
+ "cx_norm = cx - np.tile(np.mean(cx,axis=0),(num_centroids,1))\n",
+ "cy_norm = cy - np.tile(np.mean(cy,axis=0),(num_centroids,1))\n",
+ "cz_norm = cz - np.tile(np.mean(cz,axis=0),(num_centroids,1))\n",
+ "\n",
+ "plot_spines(cx_norm, cy_norm, cz_norm)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Let's have a look at what the average human spine looks like"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "cx_mean = np.mean(cx_norm,axis=1)\n",
+ "cy_mean = np.mean(cy_norm,axis=1)\n",
+ "cz_mean = np.mean(cz_norm,axis=1)\n",
+ "\n",
+ "plot_spines(cx_mean.reshape(-1,1), cy_mean.reshape(-1,1), cz_mean.reshape(-1,1), marker_size=100)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "#### Running PCA\n",
+ "\n",
+ "**Task:** Use scikit-learn's implementation to perform PCA on the normalised spine data. Make sure to assign the following variables correctly: `mu_X` (mean), `U` (eigenvectors), `D` (eigenvalues)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import sklearn.decomposition as decomp\n",
+ "\n",
+ "X = np.vstack((cx_norm, cy_norm, cz_norm))\n",
+ "n, m = X.shape\n",
+ "print('Dimension:\\t' + str(n))\n",
+ "print('Samples:\\t' + str(m))\n",
+ "\n",
+ "#INSERT CODE HERE"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "#### Visualisation\n",
+ "\n",
+ "Let's visualise the PCA shape model. We plot the mean shape and positive and negative deviations from the mean along the first three principal modes."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "num_modes = 3\n",
+ "for i in range(num_modes):\n",
+ "\n",
+ "    # add and subtract 3 times the standard deviation from the mean\n",
+ " sp = mu_X + U[:,i] * np.sqrt(D[i]) * 3\n",
+ " sn = mu_X - U[:,i] * np.sqrt(D[i]) * 3\n",
+ " \n",
+ " cx = np.vstack((mu_X[x_ind], sp[x_ind], sn[x_ind])).T\n",
+ " cy = np.vstack((mu_X[y_ind], sp[y_ind], sn[y_ind])).T\n",
+ " cz = np.vstack((mu_X[z_ind], sp[z_ind], sn[z_ind])).T\n",
+ " \n",
+ " plot_spines(cx, cy, cz)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "#### Interactive shape model viewer\n",
+ "\n",
+ "The interactive viewer below allows you to generate spines using our PCA shape model. Each slider controls the variation from the mean shape along one of the first six principal modes. Here, we allow variations of up to ten times the standard deviation, which can result in quite extreme deformations."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from ipywidgets import interact, fixed\n",
+ "\n",
+ "def plot_spine(mean_shape,modes,s1,s2,s3,s4,s5,s6):\n",
+ "    spine = mean_shape + modes[:,0] * s1 + modes[:,1] * s2 + modes[:,2] * s3 + modes[:,3] * s4 + modes[:,4] * s5 + modes[:,5] * s6\n",
+ " sx = spine[x_ind].reshape(-1,1)\n",
+ " sy = spine[y_ind].reshape(-1,1)\n",
+ " sz = spine[z_ind].reshape(-1,1)\n",
+ " plot_spines(sx, sy, sz, max_range=300, marker_size=100)\n",
+ "\n",
+ "def interactive_pca(mu_X,U,D):\n",
+ " interact(plot_spine,mean_shape=fixed(mu_X),modes=fixed(U),\n",
+ " **{'s%d' % (i+1): (-np.sqrt(D[i]) * 10, np.sqrt(D[i]) * 10, np.sqrt(D[i])) for i in range(6)});\n",
+ "\n",
+ "interactive_pca(mu_X,U,D)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": []
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "Python 3",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.6.7"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}