Unverified Commit 387dde5c authored by Nick Pawlowski

add instructions to run on colab

parent ac02ecc8
......@@ -16,6 +16,27 @@
"By Ben Glocker, Biomedical Image Analysis Group, Imperial College London"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Running on Colab\n",
"We need to install dependencies and grab the data if you're running on colab:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"! git clone https://gitlab.doc.ic.ac.uk/np716/rcs-summer-school.git\n",
"! mv rcs-summer-school/utils .\n",
"! pip install SimpleITK==1.2.2 \n",
"! wget https://www.doc.ic.ac.uk/~bglocker/teaching/rcs/data.zip\n",
"! unzip data.zip"
]
},
{
"cell_type": "markdown",
"metadata": {},
......@@ -31,7 +52,10 @@
"source": [
"import os\n",
"\n",
"data_dir = \"../data/mic/\"\n",
"if 'data' in os.listdir('.'):\n",
" data_dir = \"data/mic/\"\n",
"else:\n",
" data_dir = \"../data/mic/\"\n",
"print(os.listdir(data_dir))"
]
},
......@@ -1679,7 +1703,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.7"
"version": "3.7.0"
}
},
"nbformat": 4,
......
%% Cell type:markdown id: tags:
# RCS Summer School
%% Cell type:markdown id: tags:
## Introduction to Medical Image Computing
By Ben Glocker, Biomedical Image Analysis Group, Imperial College London
%% Cell type:markdown id: tags:
## Running on Colab
We need to install dependencies and grab the data if you're running on Colab:
%% Cell type:code id: tags:
``` python
! git clone https://gitlab.doc.ic.ac.uk/np716/rcs-summer-school.git
! mv rcs-summer-school/utils .
! pip install SimpleITK==1.2.2
! wget https://www.doc.ic.ac.uk/~bglocker/teaching/rcs/data.zip
! unzip data.zip
```
%% Cell type:markdown id: tags:
### Data
%% Cell type:code id: tags:
``` python
import os
data_dir = "../data/mic/"
if 'data' in os.listdir('.'):
data_dir = "data/mic/"
else:
data_dir = "../data/mic/"
print(os.listdir(data_dir))
```
%% Cell type:markdown id: tags:
### Loading a 3D medical image
We will be using a library called SimpleITK to handle medical image files. It is a simplified interface around the Insight Segmentation and Registration Toolkit (ITK), one of the most popular C++ image processing libraries for medical imaging.
%% Cell type:code id: tags:
``` python
import SimpleITK as sitk
```
%% Cell type:markdown id: tags:
We will use it primarily for reading and saving medical volumes encoded in NIfTI format, with the functions `sitk.ReadImage(<file_path>)` and `sitk.WriteImage(<image object>, <file_path>)`, respectively.
**Task:** Try loading the image `"ct-brain.nii.gz"` in our data directory `data_dir`:
%% Cell type:code id: tags:
``` python
img = sitk.ReadImage(data_dir + "ct-brain.nii.gz")
```
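%% Cell type:markdown id: tags:
Saving works the same way. A minimal sketch, assuming you want to write a copy of the loaded image back to disk (the output filename is purely illustrative):
%% Cell type:code id: tags:
``` python
# Write the image to disk in NIfTI format (illustrative filename)
sitk.WriteImage(img, "ct-brain-copy.nii.gz")
```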
%% Cell type:markdown id: tags:
### Explore image information
Medical image files typically store raw volumetric data as a three-dimensional array. In addition, formats such as NIfTI and DICOM include a header containing a wealth of meta-information, such as:
- size: number of voxels in each dimension
- resolution/spacing: physical size of a voxel (e.g. 1mm x 1mm x 1mm)
- data type: e.g. `int32`, `uint8`, `float64`, vectors (+ number of components)
- scanner's origin and direction of coordinate axes
- voxel coordinate transformation matrices
- ... and [much more](https://brainder.org/2012/09/23/the-nifti-file-format/).
**Task:** Print the SimpleITK image object to see a summary of the loaded meta-information:
%% Cell type:code id: tags:
``` python
# Print image object
```
%% Cell type:code id: tags:
``` python
print(img)
```
%% Cell type:markdown id: tags:
SimpleITK also allows us to access each field directly.
**Task:** Let us have a look at the size and spacing of this image, with the methods `<image>.GetSize()` and `<image>.GetSpacing()`:
%% Cell type:code id: tags:
``` python
# Print image size and spacing
```
%% Cell type:code id: tags:
``` python
print(img.GetSize())
print(img.GetSpacing())
```
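%% Cell type:markdown id: tags:
Other header fields can be accessed in the same way. A small sketch using standard SimpleITK accessors:
%% Cell type:code id: tags:
``` python
# Inspect further meta-information from the header
print(img.GetOrigin())               # scanner origin
print(img.GetDirection())            # direction cosines of the coordinate axes
print(img.GetPixelIDTypeAsString())  # voxel data type
```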
%% Cell type:markdown id: tags:
Although SimpleITK offers great functionality for manipulating the raw image data, we will often need to convert it to a NumPy array so it plays well with other Python packages, such as Matplotlib, which we will use for visualisation. This is done with the function `sitk.GetArrayFromImage(<image>)` (and vice-versa with `sitk.GetImageFromArray(<array>)`).
**Task:** Convert the SimpleITK image to a NumPy array
%% Cell type:code id: tags:
``` python
img_array = sitk.GetArrayFromImage(img) # Convert the SimpleITK image to a NumPy array
```
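%% Cell type:markdown id: tags:
Note that the NumPy array carries none of the image meta-information. A minimal sketch of the reverse conversion, copying the header back from the original image:
%% Cell type:code id: tags:
``` python
# Convert the array back into a SimpleITK image
img_back = sitk.GetImageFromArray(img_array)
# The converted image has default spacing and origin, so copy the meta-data from the original
img_back.CopyInformation(img)
print(img_back.GetSpacing())
```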
%% Cell type:markdown id: tags:
If you are curious about what else the SimpleITK `Image` class offers, [this](http://simpleitk.github.io/SimpleITK-Notebooks/01_Image_Basics.html) is a good place to look. Additionally, you can run Python's `help` command on your image object to see all the available methods.
%% Cell type:markdown id: tags:
### Visualisation
The NumPy array you obtained from SimpleITK is three-dimensional, meaning we cannot visualise it directly. Fortunately, NumPy allows you to access entire 2D slices at a time, with `img_array[<z index>,:,:]`, `img_array[:,<y index>,:]` or `img_array[:,:,<x index>]`.
**Task:** Try printing a slice of your choice:
%% Cell type:code id: tags:
``` python
# Print an image slice
```
%% Cell type:code id: tags:
``` python
print(img_array[100])
```
%% Cell type:markdown id: tags:
As expected, we get a large two-dimensional array filled with numbers we cannot directly interpret.
To graphically display volume slices, we will use a library called Matplotlib. It is the most widely used Python plotting library and offers, besides more advanced functionality, a simple command-based interface similar to Matlab's.
%% Cell type:code id: tags:
``` python
import matplotlib.pyplot as plt
%matplotlib inline
```
%% Cell type:markdown id: tags:
To display images, Matplotlib offers the function `plt.imshow(<array>)`. It has [many options](https://matplotlib.org/api/pyplot_api.html#matplotlib.pyplot.imshow), including which colour map to use to visualise the given data (`cmap`). We will typically want to visualise medical images in greyscale (`cmap='gray'`), but feel free to play with [any of the ones available](http://matplotlib.org/examples/color/colormaps_reference.html).
By default, Matplotlib will place the origin of the y-axis at the top, increasing downwards. We can reverse it with the option `origin='lower'`.
**Task:** Investigate how to visualise axial (xy), coronal (xz) and sagittal (yz) slices according to the radiological convention.
%% Cell type:code id: tags:
``` python
# Display image slices
```
%% Cell type:code id: tags:
``` python
plt.imshow(img_array[100,:,:], cmap='gray');
```
%% Cell type:markdown id: tags:
If you tried coronal (xz) or sagittal (yz) slices, you will have noticed that the aspect ratio was wrong: the image seems 'squashed' in one dimension. This is because the resolution of this scan is *anisotropic*, i.e. the voxels do not have the same length in all directions.
`plt.imshow` has an option that lets you rescale the axes: `extent=(<left>, <right>, <bottom>, <top>)`.
**Task:** Using the `GetSize` and `GetSpacing` methods discussed earlier (note that SimpleITK's indexing convention is XYZ, whereas NumPy's is ZYX), try to set the display axes to the correct scale, in millimetres.
%% Cell type:code id: tags:
``` python
# Obtain the physical dimensions of the image, in millimetres
size = img.GetSize()
spacing = img.GetSpacing()
width = size[0] * spacing[0]
height = size[1] * spacing[1]
depth = size[2] * spacing[2]
```
%% Cell type:code id: tags:
``` python
# Display axial slice with the correct dimensions
```
%% Cell type:code id: tags:
``` python
plt.imshow(img_array[100,:,:], cmap='gray', extent=(0, width, height, 0));
```
%% Cell type:code id: tags:
``` python
# Display coronal slice with the correct dimensions
```
%% Cell type:code id: tags:
``` python
plt.imshow(img_array[:,180,:], origin='lower', cmap='gray', extent=(0, width, 0, depth));
```
%% Cell type:code id: tags:
``` python
# Display sagittal slice with the correct dimensions
```
%% Cell type:code id: tags:
``` python
plt.imshow(img_array[:,:,150], origin='lower', cmap='gray', extent=(0, height, 0, depth));
```
%% Cell type:markdown id: tags:
### Image statistics
Since the image data is numerical, it often makes sense to look at some of its statistics. Many basic statistics are readily available in NumPy, e.g. `np.min`, `np.max`, `np.mean`, `np.std` (standard deviation), `np.percentile` etc.
**Task:** Have a look at the minimum, maximum and mean values of your image array.
%% Cell type:code id: tags:
``` python
import numpy as np
# Print minimum, maximum and mean of image array
```
%% Cell type:code id: tags:
``` python
print(np.min(img_array), np.max(img_array), np.mean(img_array))
```
%% Cell type:markdown id: tags:
To get a better idea of the distribution of intensities, it is helpful to analyse the image *histogram*. It is a simple bar plot expressing the frequencies of pixel intensity values found in the data array, in regularly-spaced intervals ('bins').
**Task:** Use Matplotlib's `plt.hist` function to display the distribution of intensities in your image:
```
plt.hist(<array>.flatten(), bins=<no. bins>, density=True)
```
%% Cell type:code id: tags:
``` python
# Plot the image histogram with different values for the number of bins, e.g. 32, 64, 128, 256
```
%% Cell type:code id: tags:
``` python
plt.hist(img_array.flatten(), bins=128, density=True);
```
%% Cell type:markdown id: tags:
You will notice two main peaks in the histogram, with the left one corresponding to values between -1000 and -500. This is the "background" area of the image, corresponding to non-tissue or air. This area often takes up a significant part of the image domain.
**Task:** Try plotting the histogram again by excluding pixels from the background region, and once more plotting only values close to the second peak (in the range [-500,500]). Hint: use logical operators on the NumPy array.
%% Cell type:code id: tags:
``` python
plt.hist(img_array.flatten()[img_array.flatten() > -500], bins=256, density=True);
```
%% Cell type:code id: tags:
``` python
plt.hist(img_array.flatten()[np.logical_and(img_array.flatten() > -500,img_array.flatten() < 500)], bins=256, density=True);
```
%% Cell type:markdown id: tags:
### Window/Level
In general, medical images have pixel intensity ranges beyond the typical 256 grayscales (8-bit) that we know from conventional digital images. The loaded CT scan, for example, has a range of about [-1000,2000].
By default, `plt.imshow` will display the entire intensity range, mapping the minimum and maximum values in the array to the extremes of the colour scale. However, we can manually specify the display range, setting `clim=(<low>, <high>)` or independently `vmin=<low>` and/or `vmax=<high>`.
**Task:** Using the concept of window/level, think about how to calculate parameters `clim=(<low>, <high>)` given a particular set of values for window and level. Also think about formulae for calculating the window and level that capture the full range of image intensities.
%% Cell type:code id: tags:
``` python
# Calculate parameters low and high from window and level
def wl_to_lh(window, level):
low = level - window/2
high = level + window/2
return low,high
print(wl_to_lh(160,70))
print(wl_to_lh(2000,300))
print(wl_to_lh(350,50))
```
%% Cell type:markdown id: tags:
**Task:** Explore displaying the loaded brain CT scan `"ct-brain.nii.gz"` with different window/level settings. For example, window = 120 and level = 40 should give a good setting for displaying brain tissue. Other useful settings for CT images are window = 2000 and level = 300 for good bone contrast, or window = 350 and level = 50 for abdominal organs.
%% Cell type:code id: tags:
``` python
# Display image slices with different window/level settings
```
%% Cell type:code id: tags:
``` python
low,high = wl_to_lh(120,40)
plt.imshow(img_array[100,:,:], cmap='gray', clim=(low, high));
```
%% Cell type:markdown id: tags:
**Task:** Try also some of the other images provided in the data folder, such as `"ct-abdomen.nii.gz"` for an abdominal CT scan showing organs such as the liver or kidneys, or `"mri-brain.nii.gz"` for an example of a brain MRI scan. Try to find a good window/level setting for the MRI scan that shows good contrast for gray and white matter tissue (Hint: z-slice 130 shows a good cross-section through the ventricles).
%% Cell type:code id: tags:
``` python
import os
# Display other images with different window/level settings
print(os.listdir(data_dir))
```
%% Cell type:code id: tags:
``` python
img_mri = sitk.ReadImage(data_dir + 'mri-brain.nii.gz')
low,high = wl_to_lh(300,200)
plt.imshow(sitk.GetArrayFromImage(img_mri)[130,:,:], cmap='gray', clim=(low, high));
```
%% Cell type:markdown id: tags:
### Multiplanar Image Viewer
**Task:** Now try writing a function to visualise an arbitrary medical volume, based on what you have done so far. It should take as input a SimpleITK image object (`img`), the slice indices (`x,y,z`) and window/level parameters (`window,level`), and display the specified axial (`z`), coronal (`y`) and sagittal (`x`) slices in grayscale and with correct axis dimensions and window/level contrast.
Note: If (`x,y,z`) are not specified (`=None`), the function should display the centre slices. If (`window,level`) are not specified, the function should calculate the window/level setting to cover the full intensity range.
%% Cell type:code id: tags:
``` python
def display_image(img, x=None, y=None, z=None, window=None, level=None):
# Convert SimpleITK image to NumPy array
img_array = sitk.GetArrayFromImage(img)
# Get image dimensions in millimetres
size = img.GetSize()
spacing = img.GetSpacing()
width = size[0] * spacing[0]
height = size[1] * spacing[1]
depth = size[2] * spacing[2]
if x is None:
x = np.floor(size[0]/2).astype(int)
if y is None:
y = np.floor(size[1]/2).astype(int)
if z is None:
z = np.floor(size[2]/2).astype(int)
if window is None:
window = np.max(img_array) - np.min(img_array)
if level is None:
level = window / 2 + np.min(img_array)
low,high = wl_to_lh(window,level)
# Display the orthogonal slices
fig, (ax1, ax2, ax3) = plt.subplots(1, 3, figsize=(10, 4))
ax1.imshow(img_array[z,:,:], cmap='gray', clim=(low, high), extent=(0, width, height, 0))
ax2.imshow(img_array[:,y,:], origin='lower', cmap='gray', clim=(low, high), extent=(0, width, 0, depth))
ax3.imshow(img_array[:,:,x], origin='lower', cmap='gray', clim=(low, high), extent=(0, height, 0, depth))
# Additionally display crosshairs
ax1.axhline(y * spacing[1], lw=1)
ax1.axvline(x * spacing[0], lw=1)
ax2.axhline(z * spacing[2], lw=1)
ax2.axvline(x * spacing[0], lw=1)
ax3.axhline(z * spacing[2], lw=1)
ax3.axvline(y * spacing[1], lw=1)
plt.show()
```
%% Cell type:markdown id: tags:
The code below should give you an interactive way of displaying 3D medical images based on your `display_image` function.
%% Cell type:code id: tags:
``` python
from ipywidgets import interact, fixed
from IPython.display import display
def interactive_view(img):
size = img.GetSize()
img_array = sitk.GetArrayFromImage(img)
interact(display_image,img=fixed(img),
x=(0, size[0] - 1),
y=(0, size[1] - 1),
z=(0, size[2] - 1),
window=(0,np.max(img_array) - np.min(img_array)),
level=(np.min(img_array),np.max(img_array)));
interactive_view(img)
```
%% Cell type:markdown id: tags:
### Accessing voxel values
In Python, we can access a single element of a sequence with an integer index (`[index]`), or we can retrieve a contiguous sub-sequence with so-called *slice notation*: `[start:stop]`. Either or both arguments can be left blank: `[start:]` (`stop` defaults to the end), `[:stop]` (`start` defaults to `0`) or `[:]` (entire sequence).
A multi-dimensional array (such as our CT volume) can be indexed simultaneously in all dimensions, allowing us to access not only 2D slices, but also lines, rectangles, cuboids or individual voxels. A complete reference for NumPy array indexing can be found [here](https://docs.scipy.org/doc/numpy-1.11.0/reference/arrays.indexing.html).
**Task:** Explore the indexing options for extracting 1D, 2D and 3D sub-arrays, and check the resulting shapes, e.g. `img_array[50, :, 100:120].shape`, comparing with the shape of the original array:
%% Cell type:code id: tags:
``` python
print(img_array.shape)
```
%% Cell type:code id: tags:
``` python
# Print the shape of indexed sub-arrays
print(img_array[50, :, 100:120].shape)
```
%% Cell type:markdown id: tags:
Indexing is not just useful for reading data in an array; we can also set values in entire regions without having to explicitly write iterative loops.
**Task:** Select any 2D slice, try setting a rectangular region to an arbitrary value and visualise the result with `plt.imshow`:
%% Cell type:code id: tags:
``` python
# Extract a 2D slice
# Set a rectangular region to a constant (e.g. 0)
# Visualise the result with plt.imshow
img = sitk.ReadImage(data_dir + 'mri-brain.nii.gz')
img_slice = sitk.GetArrayFromImage(img)[120,:,:]  # avoid shadowing Python's built-in slice
img_slice[100:150,50:150] = 400
plt.imshow(img_slice, cmap='gray')
```
%% Cell type:markdown id: tags:
### Image arithmetic
Besides simply computing statistics, we can also perform arithmetic operations with image intensities, such as multiplication by scalars, contrast adjustment or any pointwise function (e.g. [gamma correction](https://en.wikipedia.org/wiki/Gamma_correction), also known as the power-law transform).
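For a normalised intensity $r \in [0,1]$, gamma correction computes
$$s = c \cdot r^{\gamma},$$
where $c$ is a scaling constant and $\gamma > 0$ controls the adjustment: $\gamma > 1$ darkens mid-range intensities, while $\gamma < 1$ brightens them.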
**Task:** Load the image `"mri-brain-t1-contrast.nii.gz"` and visualise slices (`x=105, y=170, z=95`) with window/level (`window=800, level=400`) using your `display_image` function.
%% Cell type:code id: tags:
``` python
img1 = sitk.ReadImage(data_dir + 'mri-brain-t1-contrast.nii.gz')
display_image(img1, 105, 170, 95, 800, 400)
```
%% Cell type:markdown id: tags:
**Task:** Implement a function for performing gamma correction, and apply it to the `"mri-brain-t1-contrast.nii.gz"` image (e.g., using (`c=10, gamma=3`)).
Note, for gamma correction you should first normalise the image intensity range to [0,1], apply gamma correction, and transform back to the original range. Some operations require the image data type to be floating point (`float`). NumPy arrays can be easily converted using `astype(<dtype>)`.
%% Cell type:code id: tags:
``` python
def gamma_correction(img, c, gamma):
    img_array = sitk.GetArrayFromImage(img).astype(float)
    min_value = np.min(img_array)
    max_value = np.max(img_array)
    # Normalise intensities to [0,1], apply the power law, and map back to the original range
    img_array = (img_array - min_value) / (max_value - min_value)
    img_array = c * np.power(img_array, gamma)
    img_array = img_array * (max_value - min_value) + min_value
    img_corrected = sitk.GetImageFromArray(img_array)
    img_corrected.CopyInformation(img)  # preserve spacing, origin and orientation
    return img_corrected
```
%% Cell type:code id: tags:
``` python
img1_corrected = gamma_correction(img1, 10, 3)
display_image(img1_corrected, 105, 170, 95, 800, 400)
```
%% Cell type:markdown id: tags:
For the image above, an intravenous contrast agent was used to highlight blood and tumourous tissue. In this case, we also have a corresponding brain scan taken before the contrast agent was administered.
**Task:** Load the image `"mri-brain-t1.nii.gz"` and compare the two MRI scans with and without contrast using your `display_image` function. Hint: use the same window/level for both scans so you can better see the differences.
%% Cell type:code id: tags:
``` python
img2 = sitk.ReadImage(data_dir + 'mri-brain-t1.nii.gz')
display_image(img1, 105, 170, 95, 800, 400)
display_image(img2, 105, 170, 95, 800, 400)
```
%% Cell type:markdown id: tags:
Note that these two images of the same patient are registered (i.e. spatially aligned).
**Task:** Now try displaying `(img1-img2)` (SimpleITK supports arithmetic with image objects) to see what was highlighted by the contrast agent. Hint: you might need to adjust the window/level for better visibility.
%% Cell type:code id: tags:
``` python
# Display the difference image
```
%% Cell type:code id: tags:
``` python
display_image(img1-img2, 105, 170, 95, 200, 100)
```
%% Cell type:markdown id: tags:
**Task:** Try the same with the gamma-corrected version of the contrast image. Note, you might need to convert the data type of the non-contrast image to be able to subtract it from the gamma-corrected image.
%% Cell type:code id: tags:
``` python
# Display the difference image of the gamma corrected images
```
%% Cell type:code id: tags:
``` python
img2_corrected = gamma_correction(img2, 1, 1)  # c=1, gamma=1 leaves intensities unchanged but converts the image to float
display_image(img1_corrected-img2_corrected, 105, 170, 95, 200, 100)
```
%% Cell type:markdown id: tags:
### Intensity normalisation
When performing statistical analysis or running machine learning methods on images, it is often useful to first *standardise* the intensities, i.e. make them have zero mean and unit variance. This is achieved by subtracting the mean and dividing by the standard deviation of the intensities. In mathematical terms, the standardised image $\tilde{I}$ is computed as
$$\tilde{I} = \frac{I - \mu}{\sigma}, \qquad\qquad \mu = \frac{1}{N} \sum_{n=1}^N I_n, \quad \sigma = \sqrt{\frac{1}{N} \sum_{n=1}^N (I_n - \mu)^2},$$
where $I$ is the original image, $N$ is its total number of voxels and $I_n$ is the intensity of voxel $n$.
**Task:** Try standardising the intensities of `img1`, using `np.mean` and `np.std`, and plot the resulting histogram:
%% Cell type:code id: tags:
``` python
import numpy as np
img1_array = sitk.GetArrayFromImage(img1) # Convert img1 to a NumPy array
img1_array = img1_array[img1_array > 0] # Exclude the background voxels, with intensity 0
# img1_array is now an unstructured 'flat' array containing only the intensities of the brain voxels
# Compute its mean and standard deviation
# Standardise the intensity array
# Plot the histogram before and after normalisation
```
%% Cell type:code id: tags:
``` python
plt.hist(img1_array, bins=100, density=True);
```
%% Cell type:code id: tags:
``` python
plt.hist((img1_array-np.mean(img1_array)) / np.std(img1_array), bins=100, density=True);
```
%% Cell type:markdown id: tags:
### Image enhancement
We can also automatically enhance the contrast with [*histogram equalisation*](https://en.wikipedia.org/wiki/Histogram_equalization). It is a transformation applied to the image intensities which makes their distribution roughly uniform, i.e. all values occur approximately equally often in the image. Although it is not ideal for every application, the method is fast, simple to implement and useful for visualisation.
Below we define a function, `hist_eq`, which equalises the histogram of a given array, taking as input also the desired number of histogram bins (defaulting to the integer range of the data). The resulting array will have its values distributed almost uniformly between the original minimum and maximum values. The helper function `hist_eq_img` does exactly the same but can be called directly on SimpleITK images.
**Task:** Try to understand the individual steps of the implementation of `hist_eq`.
%% Cell type:code id: tags:
``` python
def hist_eq(data, nbins=None):
shape = data.shape
data_min, data_max = data.min(), data.max()
data = data.flatten()
# nbins defaults to the integer range of values
if nbins is None:
nbins = int(data_max - data_min)
# Compute image histogram
counts, bins = np.histogram(data, bins=nbins)
# Compute cumulative distribution function (CDF)
cdf = counts.cumsum() / counts.sum()
# Use linear interpolation of CDF to find new intensity values
data_eq = np.interp(data, bins[:-1], (data_max - data_min) * cdf + data_min)
return data_eq.reshape(shape)
def hist_eq_img(img, nbins=None):
data = sitk.GetArrayFromImage(img)
data_eq = hist_eq(data, nbins)
img_eq = sitk.GetImageFromArray(data_eq)
img_eq.CopyInformation(img)
return img_eq
```
%% Cell type:markdown id: tags:
**Task:** Now apply `hist_eq_img` to one of the SimpleITK images you have loaded and visualise it next to its original with your `display_image`:
%% Cell type:code id: tags:
``` python
# Apply histogram equalisation to an image
# Display original image
# Display equalised image
```
%% Cell type:code id: tags:
``` python
img = sitk.ReadImage(data_dir + 'ct-brain.nii.gz')
img_orig = img
img_eq = hist_eq_img(img_orig)
print("Before histogram equalisation:")
display_image(img_orig)
print("After histogram equalisation:")
display_image(img_eq)
```
%% Cell type:markdown id: tags:
The following plots compare the intensity distributions before and after histogram equalisation.
%% Cell type:code id: tags:
``` python
data_orig = sitk.GetArrayFromImage(img_orig).flatten()
data_eq = sitk.GetArrayFromImage(img_eq).flatten()
plt.figure(figsize=(10, 4))
plt.subplot(121)
plt.hist(data_orig, bins=128, density=True, histtype='step', cumulative=False)
plt.hist(data_eq, bins=128, density=True, histtype='step', cumulative=False)
plt.subplot(122)
plt.hist(data_orig, bins=128, density=True, histtype='step', cumulative=True)
plt.hist(data_eq, bins=128, density=True, histtype='step', cumulative=True);
```
%% Cell type:markdown id: tags:
### Filtering with SimpleITK
#### Smoothing
Occasionally we will acquire medical scans which include some amount of undesired noise.
One such noisy image might look like this:
%% Cell type:code id: tags:
``` python
# We convert it to `float32` for compatibility with some functions we'll use later
img = sitk.Cast(sitk.ReadImage(data_dir + 'mri-brain-noisy.nii.gz'), sitk.sitkFloat32)
display_image(img)
```
%% Cell type:markdown id: tags:
A basic denoising technique is to *convolve* the image with a smoothing filter.
We can achieve this with SimpleITK using `sitk.DiscreteGaussian(<img>)` (it has a `variance` option, default `=1.0`).
**Task:** Try applying a Gaussian filter to the loaded image. Try out different values for the `variance` parameter:
%% Cell type:code id: tags:
``` python
img_gauss = sitk.DiscreteGaussian(img, 1)
display_image(img_gauss)
```
%% Cell type:markdown id: tags:
Compared to the original noisy image, we can see that most of the visible noise is gone, but the image edges also lose their sharpness, in particular with larger values for the variance.
Interestingly, Gaussian smoothing can be interpreted as *isotropic diffusion*, i.e. image intensities are 'diffused' (think heat conduction) homogeneously in all directions for a length of 'time' proportional to the variance of the Gaussian filter. Extending on this idea, another popular approach for denoising is *anisotropic diffusion*, which adjusts the local 'conductivity' based on the image gradients. In other words, it attempts to smooth out flat regions while preserving the edges.
**Task:** Try out the SimpleITK function `sitk.GradientAnisotropicDiffusion(<img>)` (can take a few seconds). Play around with different values for the parameters of this filter:
%% Cell type:code id: tags:
``` python
img_diffusion = sitk.GradientAnisotropicDiffusion(img)
display_image(img_diffusion)
#help(sitk.GradientAnisotropicDiffusion)
```
%% Cell type:markdown id: tags:
Let's visualise all three together. Pay attention to the overall noise level and the sharpness of the edges. You might want to adjust the window/level setting for better contrast:
%% Cell type:code id: tags:
``` python
display_image(img, window=400, level=200)
display_image(img_gauss, window=400, level=200)
display_image(img_diffusion, window=400, level=200)
```
%% Cell type:markdown id: tags:
**Task:** Now try the same two smoothing approaches with `'ct-brain-noisy.nii.gz'`:
%% Cell type:code id: tags:
``` python
img2 = sitk.Cast(sitk.ReadImage(data_dir + 'ct-brain-noisy.nii.gz'), sitk.sitkFloat32)
display_image(img2)
```
%% Cell type:code id: tags:
``` python
img2_gauss = sitk.DiscreteGaussian(img2)
display_image(img2_gauss)
```
%% Cell type:code id: tags:
``` python
img2_diffusion = sitk.GradientAnisotropicDiffusion(img2)
display_image(img2_diffusion)
```
%% Cell type:markdown id: tags:
**Task:** Again, visualise all three (noisy, blurred and diffused) in the CT intensity range for gray/white matter (`window=120, level=40`) and compare noise levels and edge sharpness.
%% Cell type:code id: tags:
``` python
display_image(img2, window=120, level=40)
display_image(img2_gauss, window=120, level=40)
display_image(img2_diffusion, window=120, level=40)
```
%% Cell type:markdown id: tags:
You will notice that each method gives better results for one image but worse for the other. Why do you think that is the case?
*Hint:* Think of the magnitude of the noise compared to the amplitude of the true image variations. If the signal-to-noise ratio is low, the algorithm has no way of knowing what needs to be preserved or smoothed out.
%% Cell type:markdown id: tags:
#### Edge detection
Another common application of image filtering is estimating image gradients for edge detection.
Let us compute some spatial derivatives with `sitk.Derivative(<img>, direction=<dir>)`, where `<dir>` is 0, 1 or 2 for X, Y or Z, respectively.
%% Cell type:code id: tags:
``` python
img_dx = sitk.Derivative(img, direction=0)
img_dy = sitk.Derivative(img, direction=1)
img_dz = sitk.Derivative(img, direction=2)
display_image(img_dx, level=0)
display_image(img_dy, level=0)
display_image(img_dz, level=0)
```
%% Cell type:markdown id: tags:
When we compute the magnitude of the gradients of an image, we obtain what is called an *edge map*, which is simply a local measure of the 'strength' of an edge.
This operation is readily available in SimpleITK with `sitk.SobelEdgeDetection(<img>)`:
%% Cell type:code id: tags:
``` python
img_sobel = sitk.SobelEdgeDetection(img)
display_image(img_sobel)
```
%% Cell type:markdown id: tags:
Note how the derivatives look quite 'grainy', as we are differentiating the superimposed noise as well.
**Task**: How could you improve the edge detection on a noisy image?
%% Cell type:code id: tags:
``` python
# We run edge detection on the smoothed image
img_sobel2 = sitk.SobelEdgeDetection(img_gauss)
display_image(img_sobel2)
```
%% Cell type:markdown id: tags:
#### Edge sharpening
**Task:** Try to understand the `sharpen()` function implemented below, which applies unsharp masking to an image. It uses `sitk.DiscreteGaussian`, as above, and takes the following arguments:
- `scale`: the standard deviation of the Gaussian filter
- `strength`: the scaling factor for the smoothed image
%% Cell type:code id: tags:
``` python
def sharpen(img, scale=1, strength=2):
# Apply unsharp masking
img_smooth = sitk.DiscreteGaussian(img, scale)
img_sharpened = img + (img - img_smooth) * strength
return img_sharpened
img_sharp = sharpen(img_gauss, 1, 2)
display_image(img_gauss, window=400, level=200)
display_image(img_sharp, window=400, level=200)
```
%% Cell type:markdown id: tags:
### Resampling
Many image processing or computer vision algorithms adopt a multi-scale approach, for example making a coarse search over the entire image and then refining it at smaller scales. This can be achieved efficiently through the use of so-called *image pyramids*, which are formed by the base image and itself downsampled to lower resolutions, typically by a factor of 2 at each level.
**Task:** Implement a naïve downsampling function, by simply taking one in every `<factor>` (integer-valued) pixels in each dimension.
*Hint:* SimpleITK image objects also support Python's indexing notation: `[start:stop:step]`, potentially omitting any of the arguments.
%% Cell type:code id: tags:
``` python
def downsample_naive(img, factor=2):
return img[::factor, ::factor, ::factor]
```
%% Cell type:markdown id: tags:
Now let us test with the MRI volume from before:
%% Cell type:code id: tags:
``` python
img = sitk.ReadImage(data_dir + "mri-brain-noisy.nii.gz")
display_image(img)
```
%% Cell type:code id: tags:
``` python
img_down_naive_1 = downsample_naive(img)
img_down_naive_2 = downsample_naive(img_down_naive_1)
img_down_naive_3 = downsample_naive(img_down_naive_2)
display_image(img)
display_image(img_down_naive_1)
display_image(img_down_naive_2)
display_image(img_down_naive_3)
```
%% Cell type:markdown id: tags:
It is known that naïve undersampling can produce *aliasing*, i.e. high spatial frequencies (edges and noise) are misrepresented in the downsampled image, making it look jagged and 'blocky'. We usually circumvent this issue by first smoothing the image with a low-pass filter (e.g. Gaussian) before resampling the pixel values.
**Task:** Now try implementing a `downsample` method which first applies a Gaussian smoothing and then downsamples by an integer factor (no interpolation needed).
*Hint:* Recall the `variance` option for `sitk.DiscreteGaussian`. A Gaussian standard deviation of `0.5*factor` works well in practice, but feel free to experiment!
%% Cell type:code id: tags:
``` python
def downsample(img, factor=2):
    # sitk.DiscreteGaussian takes a variance, so square the desired standard deviation of 0.5*factor
    smoothed = sitk.DiscreteGaussian(img, (.5 * factor) ** 2)
    return smoothed[::factor, ::factor, ::factor]
```
%% Cell type:markdown id: tags:
Let's have a look at the results for this approach:
%% Cell type:code id: tags:
``` python
img_down_1 = downsample(img)
img_down_2 = downsample(img_down_1)
img_down_3 = downsample(img_down_2)
display_image(img)
display_image(img_down_1)
display_image(img_down_2)
display_image(img_down_3)
```
%% Cell type:markdown id: tags:
#### Resampling with SimpleITK
%% Cell type:markdown id: tags:
SimpleITK offers advanced resampling features that can be useful in many scenarios, e.g. when downsampling by non-integer factors or resampling to a specific resolution (e.g. isotropic).
Have a look at the `resample()` function we have implemented below and try to understand the role of each of the arguments to the `sitk.ResampleImageFilter`.
%% Cell type:code id: tags:
``` python
def resample(img, new_size=None, new_spacing=None):
old_size = img.GetSize()
old_spacing = img.GetSpacing()
if new_size is None and new_spacing is None:
return img
if new_size is None:
# Compute new image dimensions based on the desired rescaling of the voxel spacing
new_size = [int(np.ceil(old_size[d] * old_spacing[d] / new_spacing[d])) for d in range(3)]
if new_spacing is None:
# Compute new voxel spacing based on the desired rescaling of the image dimensions
new_spacing = [old_spacing[d] * old_size[d] / new_size[d] for d in range(3)]
# Smooth the input image with anisotropic Gaussian filter
img_smoothed = img
for d in range(3):
# Note how the blurring strength can be different in each direction,
# if the scaling factors are different.
factor = new_spacing[d] / old_spacing[d]
sigma = 0.2 * factor
img_smoothed = sitk.RecursiveGaussian(img_smoothed, sigma=sigma, direction=d)
# Finally, apply the resampling operation
img_resampled = sitk.ResampleImageFilter().Execute(
img_smoothed, # Input image
new_size, # Output image dimensions
sitk.Transform(), # Coordinate transformation. sitk.Transform() is a dummy identity transform,
# as we want the brain to be in exactly the same place. When we do image registration,
# for example, this can be a linear or nonlinear transformation.
sitk.sitkLinear, # Interpolation method (cf. also sitk.sitkNearestNeighbor and many others)
img.GetOrigin(), # Output image origin (same)
new_spacing, # Output voxel spacing
img.GetDirection(), # Output image orientation (same)
0, # Fill value for points outside the input domain
img.GetPixelID()) # Voxel data type (same)
return img_resampled
```
%% Cell type:markdown id: tags:
Let's resample the MR image to an element spacing of 2x4x8mm.
%% Cell type:code id: tags:
``` python
img_resampled = resample(img, new_spacing=[2, 4, 8])
print("Spacing: {}".format(img.GetSpacing()))
display_image(img, window=400, level=200)
print("Spacing: {}".format(img_resampled.GetSpacing()))
display_image(img_resampled, window=400, level=200)
```
%% Cell type:markdown id: tags:
Medical imaging data is often anisotropic. For many image analysis algorithms, however, it is easier to work with isotropic input data.
%% Cell type:code id: tags:
``` python
img3 = sitk.ReadImage(data_dir + 'mri-brain-anisotropic.nii.gz')
display_image(img3, z=10, window=800, level=400)
```
%% Cell type:markdown id: tags:
**Task:** Try resampling the above image to an isotropic 1mm resolution, and display the result showing approximately the same xy-plane by setting an appropriate value for the `z` parameter in the `display_image` function.
%% Cell type:code id: tags:
``` python
img3_resampled = resample(img3, new_spacing=[1, 1, 1])
print("Spacing: {}".format(img3.GetSpacing()))
display_image(img3, z=10, window=800, level=400)
print("Spacing: {}".format(img3_resampled.GetSpacing()))
display_image(img3_resampled, z=50, window=800, level=400)
```
%% Cell type:markdown id: tags:
### Segmentation
First let us define some helper functions to overlay an image with a segmentation, for good visualisation. `display_overlay()` takes as input the base image, a segmentation image (binary) and all the usual arguments for `display_image()`:
%% Cell type:code id: tags:
``` python
def label_overlay(img, seg):
minmax = sitk.MinimumMaximumImageFilter()
minmax.Execute(img)
low, high = minmax.GetMinimum(), minmax.GetMaximum()
img_norm = (img - low) / (high - low)
img_uint8 = sitk.Cast(256 * img_norm, sitk.sitkUInt8)
return sitk.LabelOverlay(img_uint8, seg)
def display_overlay(img, seg, *args, **kwargs):
display_image(label_overlay(img, seg), *args, **kwargs)
```
%% Cell type:markdown id: tags:
#### Thresholding
The simplest form of segmentation is just based on a single cutoff in the intensities, with a manually specified threshold. Alternatively, we can specify an upper (U) and a lower (L) threshold.
**Task:** Try to find a good UL thresholding to segment the lesion in the noisy CT scan.
*Hint 1:* SimpleITK images support comparison (`<`, `>`, `<=`, `>=`, `==`, `!=`) and logical ('and' `&`, 'or' `|`, 'xor' `^`, 'not' `~`) operators to produce binary images.
*Hint 2:* Image noise causes major problems for thresholding approaches. Try removing noise before thresholding the image and compare the results.
%% Cell type:code id: tags:
``` python
img = sitk.ReadImage(data_dir + 'ct-brain-noisy.nii.gz')
display_image(img, x=70, y=100, z=90, window=120, level=40)
```
%% Cell type:code id: tags:
``` python
seg = (img > 50) & (img < 100)
display_overlay(img, seg, x=70, y=100, z=90)
```
%% Cell type:code id: tags:
``` python
img_gauss = sitk.DiscreteGaussian(img)
seg = (img_gauss > 50) & (img_gauss < 100)
display_overlay(img_gauss, seg, x=70, y=100, z=90)
```
%% Cell type:markdown id: tags:
Here we can observe one of the main shortcomings of purely intensity-based segmentation methods: we have no control over the location or spatial contiguity of the segmented regions. So even if we manage to segment most of the lesion, a lot of areas outside the structure of interest with intensities in the same range are included as well.
%% Cell type:markdown id: tags:
#### Region growing
Region growing offers an alternative to simple thresholding that addresses the issue of spatial contiguity: it only labels the voxels which are *reachable* from a set of manually-specified *seed points*. In its simplest form, it is equivalent to thresholding, but limited to regions connected to the seeds.
**Task:** Implement region growing with UL thresholding.
*Hints:*
- To initialise your zero-filled segmentation image, use `sitk.Image(<img>.GetSize(), sitk.sitkUInt8)`. Don't forget to also call `<seg>.CopyInformation(<img>)` to copy the meta-data (spacing, origin, orientation) from the input image.
- You can use Python's `collections.deque` (double-ended queue). Use `.append(<elem>)` or `.extend(<list>)` to enqueue elements and `.popleft()` to dequeue an element (`.pop()` would work as a stack instead).
- Your algorithm should remember which voxels have already been visited. This can be achieved with a Python set (`set()`), with which you can do `.add(<elem>)` and `if <elem> in <set>:`.
- Use tuples to represent the voxel locations (e.g. `point=(x,y,z)`), then you can index directly into the SimpleITK image with them (e.g. `image[point]`).
- `neighbours(<point>, <img_size>)` returns the list of immediate neighbours of `<point>`, clipped at the image borders.
%% Cell type:code id: tags:
``` python
def neighbours(x, size):
nbs = []
for d in range(len(x)):
if x[d] > 0:
nb = list(x)
nb[d] -= 1
nbs.append(tuple(nb))
if x[d] < size[d] - 1:
nb = list(x)
nb[d] += 1
nbs.append(tuple(nb))
return nbs
```
%% Cell type:code id: tags:
``` python
from collections import deque
def region_growing(img, seeds, low, high):
size = img.GetSize()
seg = sitk.Image(size, sitk.sitkUInt8)
queue = deque()
queue.extend(seeds)
visited = set()
while len(queue) > 0:
x = queue.popleft()
if x in visited:
continue
visited.add(x)
if low <= img[x] < high:
seg[x] = 1
queue.extend(neighbours(x, size))
seg.CopyInformation(img)
return seg
```
%% Cell type:code id: tags:
``` python
display_image(img_gauss, x=70, y=100, z=90, window=120, level=40)
```
%% Cell type:code id: tags:
``` python
seed = (70, 100, 90)
low, high = 50, 100
seg2 = region_growing(img_gauss, [seed], low, high)
display_overlay(img_gauss, seg2, *seed)
```
%% Cell type:markdown id: tags:
Let's visually compare the results to a manual reference segmentation.
%% Cell type:code id: tags:
``` python
ref = sitk.ReadImage(data_dir + 'ct-brain-lesion.nii.gz')
print("Thresholding")
display_image(label_overlay(img_gauss, seg), *seed)
print("Region growing")
display_image(label_overlay(img_gauss, seg2), *seed)
print("Reference segmentation")
display_image(label_overlay(img_gauss, ref), *seed)
```
%% Cell type:markdown id: tags:
#### Segmentation Evaluation
%% Cell type:markdown id: tags:
Let's now quantitatively evaluate the segmentations using different performance measures. SimpleITK has many important measures already implemented.
First, we extract the surfaces from the segmentations.
%% Cell type:code id: tags:
``` python
seg_contour = sitk.LabelContour(seg)
seg2_contour = sitk.LabelContour(seg2)
ref_contour = sitk.LabelContour(ref)
print("Thresholding - Surface")
display_image(label_overlay(img_gauss, seg_contour), *seed)
print("Region growing - Surface")
display_image(label_overlay(img_gauss, seg2_contour), *seed)
print("Reference segmentation - Surface")
display_image(label_overlay(img_gauss, ref_contour), *seed)
```
%% Cell type:markdown id: tags:
Then we can use SimpleITK to compute overlap measures such as Jaccard Index (JI) and Dice Similarity Coefficient (DSC). We can also directly compute the Hausdorff Distance (HD) and Symmetric Average Surface Distance (ASD) from the segmentation contours using SimpleITK's `HausdorffDistanceImageFilter`. (Note, ASD is called AverageHausdorffDistance in SimpleITK)
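As a reminder, for a predicted segmentation $A$ and a reference segmentation $B$, the two overlap measures are defined as
$$\mathrm{JI}(A,B) = \frac{|A \cap B|}{|A \cup B|}, \qquad \mathrm{DSC}(A,B) = \frac{2\,|A \cap B|}{|A| + |B|}.$$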
%% Cell type:code id: tags:
``` python
overlap_measures_filter = sitk.LabelOverlapMeasuresImageFilter()
hausdorff_distance_filter = sitk.HausdorffDistanceImageFilter()
overlap_measures_filter.Execute(ref, seg)
hausdorff_distance_filter.Execute(ref_contour, seg_contour)
print('\nThresholding')
print('JI\t' + str(overlap_measures_filter.GetJaccardCoefficient()))
print('DSC\t' + str(overlap_measures_filter.GetDiceCoefficient()))
print('HD\t' + str(hausdorff_distance_filter.GetHausdorffDistance()))
print('ASD\t' + str(hausdorff_distance_filter.GetAverageHausdorffDistance()))
overlap_measures_filter.Execute(ref, seg2)
hausdorff_distance_filter.Execute(ref_contour, seg2_contour)
print('\nRegion growing')
print('JI\t' + str(overlap_measures_filter.GetJaccardCoefficient()))
print('DSC\t' + str(overlap_measures_filter.GetDiceCoefficient()))
print('HD\t' + str(hausdorff_distance_filter.GetHausdorffDistance()))
print('ASD\t' + str(hausdorff_distance_filter.GetAverageHausdorffDistance()))
```
%% Cell type:code id: tags:
``` python
```
......
......@@ -7,6 +7,27 @@
"# RCS Summer School"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Running on Colab:\n",
"We again need to install dependencies and download the data:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"! git clone https://gitlab.doc.ic.ac.uk/np716/rcs-summer-school.git\n",
"! mv rcs-summer-school/utils .\n",
"! pip install SimpleITK==1.2.2 \n",
"! wget https://www.doc.ic.ac.uk/~bglocker/teaching/rcs/data.zip\n",
"! unzip data.zip"
]
},
{
"cell_type": "markdown",
"metadata": {},
......@@ -41,8 +62,12 @@
"source": [
"# Read the meta data using pandas\n",
"import pandas as pd\n",
"import os\n",
"\n",
"data_dir = \"../data/brain/\"\n",
"if 'data' in os.listdir('.'):\n",
" data_dir = \"data/brain/\"\n",
"else:\n",
" data_dir = \"../data/brain/\"\n",
"\n",
"meta_data = pd.read_csv(data_dir + 'meta/clean_participant_data.csv')\n",
"meta_data.head() # show the first five data entries"
......@@ -596,7 +621,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.7"
"version": "3.7.0"
}
},
"nbformat": 4,
......
%% Cell type:markdown id: tags:
# RCS Summer School
%% Cell type:markdown id: tags:
## Running on Colab
We again need to install dependencies and download the data:
%% Cell type:code id: tags:
``` python
! git clone https://gitlab.doc.ic.ac.uk/np716/rcs-summer-school.git
! mv rcs-summer-school/utils .
! pip install SimpleITK==1.2.2
! wget https://www.doc.ic.ac.uk/~bglocker/teaching/rcs/data.zip
! unzip data.zip
```
%% Cell type:markdown id: tags:
## Age regression from brain MRI
By Ben Glocker, Biomedical Image Analysis Group, Imperial College London
%% Cell type:markdown id: tags:
Predicting age from a brain MRI scan can have diagnostic value for a number of diseases that cause structural changes and damage to the brain. Discrepancy between the predicted, biological age and the real, chronological age of a patient might indicate the presence of disease and abnormal changes to the brain. For this we need an accurate predictor of brain age which may be learned from a set of healthy reference subjects.
The objective of the project is to implement two different supervised learning approaches for age regression from brain MRI. Data from 600 healthy subjects will be provided. Each approach will require a processing pipeline with different components. There are dedicated sections in the Jupyter notebook for each approach which contain detailed instructions, hints and notes.
%% Cell type:markdown id: tags:
### Getting started and familiarising ourselves with the data
The following cells provide some helper functions to load the data, along with an overview and visualisation of the statistics over the population of 600 subjects. Let's start by loading the meta-data, that is, the data containing the subject IDs, age, and gender.
%% Cell type:code id: tags:
``` python
# Read the meta data using pandas
import pandas as pd
import os
data_dir = "../data/brain/"
if 'data' in os.listdir('.'):
data_dir = "data/brain/"
else:
data_dir = "../data/brain/"
meta_data = pd.read_csv(data_dir + 'meta/clean_participant_data.csv')
meta_data.head() # show the first five data entries
```
%% Cell type:markdown id: tags:
Let's have a look at some population statistics.
%% Cell type:code id: tags:
``` python
%matplotlib inline
import matplotlib.pyplot as plt
import seaborn as sns
sns.catplot(x="gender_text", data=meta_data, kind="count")
plt.title('Gender distribution')
plt.xlabel('Gender')
plt.show()
sns.distplot(meta_data['age'], bins=[10,20,30,40,50,60,70,80,90])
plt.title('Age distribution')
plt.xlabel('Age')
plt.show()
plt.scatter(range(len(meta_data['age'])),meta_data['age'], marker='.')
plt.grid()
plt.xlabel('Subject')
plt.ylabel('Age')
plt.show()
```
%% Cell type:markdown id: tags:
### Set up a simple medical image viewer and import SimpleITK
%% Cell type:code id: tags:
``` python
import numpy as np
import SimpleITK as sitk
import matplotlib.pyplot as plt
from ipywidgets import interact, fixed
from IPython.display import display
from utils.image_viewer import display_image
```
%% Cell type:markdown id: tags:
### Imaging data
Let's check out the imaging data that is available for each subject. This cell also shows how to retrieve data given a particular subject ID from the meta data.
%% Cell type:code id: tags:
``` python
import glob
# Subject with index 0
ID = meta_data['ID'][0]
age = meta_data['age'][0]
# Data folders
image_dir = data_dir + 'images/'
image_filenames = glob.glob(image_dir + '*.nii.gz')
seg_dir = data_dir + 'segs_refs/'
seg_filenames = glob.glob(seg_dir + '*.nii.gz')
greymatter_dir = data_dir + 'greymatter/'
greymatter_filenames = glob.glob(greymatter_dir + '*.nii.gz')
image_filename = [f for f in image_filenames if ID in f][0]
img = sitk.ReadImage(image_filename)
seg_filename = [f for f in seg_filenames if ID in f][0]
seg = sitk.ReadImage(seg_filename)
greymatter_filename = [f for f in greymatter_filenames if ID in f][0]
gm = sitk.ReadImage(greymatter_filename)
print('Imaging data of subject ' + ID + ' with age ' + str(age))
print('\nMR Image')
display_image(img, window=400, level=200)
print('Segmentation (used in part A)')
display_image(sitk.LabelToRGB(seg))
print('Spatially normalised grey matter maps (used in part B)')
display_image(gm)
```
%% Cell type:markdown id: tags:
## Part A: Volume-based regression using brain structure segmentation
The first approach aims to regress the age of a subject using the volumes of brain tissues as features. The structures include grey matter (GM), white matter (WM), and cerebrospinal fluid (CSF). It is known that with increasing age the ventricles enlarge (filled with CSF), while it is assumed that grey and white matter volume may decrease over time. However, as overall brain volume varies across individuals, taking the absolute volumes of tissues might not be predictive. Instead, relative volumes need to be computed as the ratios between each tissue volume and overall brain volume. To this end, a four-class (GM, WM, CSF, and background) brain segmentation is needed. Here, we assume this has already been implemented and segmentation results are already provided.
Different regression techniques should be explored, and it might be beneficial to investigate what the best set of features is for this task. Are all volume features equally useful, or is it even better to combine some of them and create new features? How does a simple linear regression perform compared to a model with higher order polynomials? Do you need regularisation? How about other regression methods such as regression trees or neural networks? The accuracy of different methods should be evaluated using two-fold cross-validation, and average age prediction accuracy should be compared and reported appropriately.
*Note:* For part A, only the segmentations of the MR images will be used. These can be found in subfolder `segs_refs`. The spatially normalised grey matter maps are used in part B only.
%% Cell type:markdown id: tags:
### TASK A-1: Feature calculation
Start by calculating the three absolute tissue volumes for each subject. Plot the volumes against the subjects' ages. Taking the absolute volumes of tissues as features, however, might not be predictive. Instead, relative volumes need to be computed as the ratios between each tissue volume and overall brain volume. But you might also want to explore using different combinations or even polynomial features.
Implement a function that constructs a big matrix $X$ with a row for each subject and features across the columns. Start with just calculating three simple features of relative tissue volumes for GM, WM and CSF, and compare these to the absolute volumes plotted above.
%% Cell type:code id: tags:
``` python
## CALCULATE ABSOLUTE TISSUE VOLUMES
import os
seg_dir = data_dir + 'segs_refs/'
vols = np.zeros((3,meta_data['ID'].count()))
for i in range(meta_data['ID'].count()):
ID = meta_data['ID'][i]
seg_filename = seg_dir + ID + '.nii.gz'
if os.path.exists(seg_filename):
print('Reading segmentation ' + seg_filename)
seg = sitk.ReadImage(seg_filename)
seg_array = sitk.GetArrayFromImage(seg)
vols[0,i] = np.sum(seg_array.flatten() == 1)
vols[1,i] = np.sum(seg_array.flatten() == 2)
vols[2,i] = np.sum(seg_array.flatten() == 3)
```
%% Cell type:markdown id: tags:
Plot features versus age.
%% Cell type:code id: tags:
``` python
plt.scatter(vols[0,:],meta_data['age'], marker='.')
plt.scatter(vols[1,:],meta_data['age'], marker='.')
plt.scatter(vols[2,:],meta_data['age'], marker='.')
plt.grid()
plt.title('Unnormalised')
plt.xlabel('Volume')
plt.ylabel('Age')
plt.legend(('CSF','GM','WM'))
plt.show()
```
%% Cell type:code id: tags:
``` python
## CALCULATE RELATIVE TISSUE VOLUMES
vols_normalised = vols / np.sum(vols, axis=0)
plt.scatter(vols_normalised[0,:],meta_data['age'], marker='.')
plt.scatter(vols_normalised[1,:],meta_data['age'], marker='.')
plt.scatter(vols_normalised[2,:],meta_data['age'], marker='.')
plt.grid()
plt.title('Normalised')
plt.xlabel('Volume')
plt.ylabel('Age')
plt.legend(('CSF','GM','WM'))
plt.show()
```
%% Cell type:code id: tags:
``` python
X = vols_normalised.T
y = meta_data['age'].values.reshape(-1,1)
print(X.shape)
print(y.shape)
```
%% Cell type:markdown id: tags:
### TASK A-2: Age regression and cross-validation
Experiment with different regression methods from the [scikit-learn toolkit](http://scikit-learn.org/stable/supervised_learning.html#supervised-learning). Remember to construct the output vector $y$ containing the age for each of the subjects.
Evaluate the methods using two-fold [cross-validation](http://scikit-learn.org/stable/modules/cross_validation.html#cross-validation) where the dataset of 600 subjects is split into two equally sized sets $(X_1,y_1)$ and $(X_2,y_2)$ which are used for training and testing in an alternating way (so each set is used as $(X_{\text{train}},y_{\text{train}})$ and $(X_{\text{test}},y_{\text{test}})$ exactly once).
Try using at least three different regression methods, and generate a plot that allows easy comparison of the performance of the three methods. Useful [error metrics](https://scikit-learn.org/stable/modules/model_evaluation.html#regression-metrics) to report include mean absolute error and r2 score. You might also want to plot the real vs predicted ages.
*Note:* These [scikit-learn examples](https://scikit-learn.org/stable/auto_examples/) might serve as an inspiration.
*Hint:* Be careful how you split the dataset into two folds. Take into account the data characteristics shown at the top of the notebook.
%% Cell type:code id: tags:
``` python
from sklearn.model_selection import cross_val_predict
from sklearn.model_selection import KFold
from sklearn.metrics import mean_absolute_error
from sklearn.metrics import r2_score
from sklearn.pipeline import Pipeline
from sklearn import linear_model
from sklearn.preprocessing import PolynomialFeatures
regressor = Pipeline([('poly', PolynomialFeatures(degree=3)),('linear', linear_model.LinearRegression())])
#regressor = linear_model.LinearRegression()
kfold = KFold(n_splits=2, shuffle=True, random_state=0)
predicted = cross_val_predict(regressor, X, y, cv=kfold)
print('mean absolute error: {0}'.format(mean_absolute_error(y,predicted)))
print('r2 score: {0}'.format(r2_score(y,predicted)))
fig, ax = plt.subplots()
ax.scatter(y, predicted, marker='.')
ax.plot([y.min(), y.max()], [y.min(), y.max()], 'k--', lw=2)
ax.set_xlabel('Real Age')
ax.set_ylabel('Predicted Age')
plt.show()
```
%% Cell type:markdown id: tags:
## Part B: Image-based regression using grey matter maps
The second approach will make use of grey matter maps that have been already extracted from the MRI scans and aligned to a common reference space to obtain spatially normalised maps. For this, we have used an advanced, state-of-the-art neuroimaging toolkit, called SPM12. The reference space corresponds to the commonly used MNI atlas.
Because these grey matter maps are spatially normalised (i.e., registered), voxel locations across images from different subjects roughly correspond to the same anatomical locations. This means that each voxel location in the grey matter maps can be treated as an individual feature. Because those maps are quite large at their full resolution, there would be a very large number of features to deal with (more than 850,000). A dimensionality reduction using PCA may need to be performed before training a suitable regressor on the low-dimensional feature representation obtained with PCA. It might also be beneficial to apply some pre-processing (downsampling, smoothing, etc.) before running PCA, which should be explored. The implemented pipeline should be evaluated using two-fold cross-validation with the same data splits as in part A, so the two different approaches can be directly compared in terms of average age prediction accuracy.
*Note:* For part B, only the spatially normalised grey matter maps should be used.
%% Cell type:markdown id: tags:
### TASK B-1: Pre-processing
Before running PCA to reduce the dimensionality of the feature space for grey matter maps, it might be beneficial to run some pre-processing on the maps. In voxel-based analysis where each voxel location is a feature, it is common to apply some smoothing beforehand. This is to reduce noise and to compensate for errors of the spatial normalisation that had been applied to the maps.
Because the maps are quite large, it might also be worthwhile to explore whether downsampling could be performed even before PCA. This would further reduce the dimensionality, and might even be needed if PCA at the original resolution runs into memory issues. You may want to consider other ways of pre-processing, and you can find inspiration in the notebook on medical image computing `1-Medical-Image-Computing.ipynb`.
Implement a function that performs suitable pre-processing on each grey matter map.
*Hint:* You may want to save the pre-processed maps using `sitk.WriteImage` to avoid recomputation each time you run the notebook.
%% Cell type:code id: tags:
``` python
import numpy as np

def resample(img, new_size=None, new_spacing=None):
old_size = img.GetSize()
old_spacing = img.GetSpacing()
if new_size is None and new_spacing is None:
return img
if new_size is None:
# Compute new image dimensions based on the desired rescaling of the voxel spacing
new_size = [int(np.ceil(old_size[d] * old_spacing[d] / new_spacing[d])) for d in range(3)]
if new_spacing is None:
# Compute new voxel spacing based on the desired rescaling of the image dimensions
new_spacing = [old_spacing[d] * old_size[d] / new_size[d] for d in range(3)]
# Smooth the input image with anisotropic Gaussian filter
img_smoothed = img
for d in range(3):
# Note how the blurring strength can be different in each direction,
# if the scaling factors are different.
factor = new_spacing[d] / old_spacing[d]
sigma = 0.2 * factor
img_smoothed = sitk.RecursiveGaussian(img_smoothed, sigma=sigma, direction=d)
# Finally, apply the resampling operation
img_resampled = sitk.ResampleImageFilter().Execute(
img_smoothed, # Input image
new_size, # Output image dimensions
sitk.Transform(), # Coordinate transformation. sitk.Transform() is a dummy identity transform,
# as we want the brain to be in exactly the same place. When we do image registration,
# for example, this can be a linear or nonlinear transformation.
sitk.sitkLinear, # Interpolation method (cf. also sitk.sitkNearestNeighbor and many others)
img.GetOrigin(), # Output image origin (same)
new_spacing, # Output voxel spacing
img.GetDirection(), # Output image orientation (same)
0, # Fill value for points outside the input domain
img.GetPixelID()) # Voxel data type (same)
return img_resampled
```
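%% Cell type:markdown id: tags:
As a quick sanity check of the two calling modes (a target voxel spacing *or* a target grid size, with the other quantity derived automatically), something along these lines should work, assuming `gm` is a grey matter map loaded earlier with `sitk.ReadImage`:
%% Cell type:code id: tags:
``` python
# Resample to an isotropic 4mm voxel spacing; the grid size is derived automatically
gm_4mm = resample(gm, new_spacing=[4, 4, 4])
print(gm_4mm.GetSize(), gm_4mm.GetSpacing())

# Alternatively, resample to a fixed grid size; the voxel spacing is derived automatically
gm_small = resample(gm, new_size=[64, 64, 64])
print(gm_small.GetSize(), gm_small.GetSpacing())
```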
%% Cell type:code id: tags:
``` python
def preproc(img):
    # Gaussian smoothing (variance 8.0) to reduce noise and residual misalignment,
    # followed by downsampling to an isotropic 4mm voxel spacing
    img_smooth = sitk.DiscreteGaussian(img, 8.0)
    img_resampled = resample(img_smooth, new_spacing=[4, 4, 4])
    return img_resampled

gm_preproc = preproc(gm)
display_image(gm_preproc)
```
%% Cell type:code id: tags:
``` python
import os

preproc_dir = data_dir + 'preproc/'
if not os.path.exists(preproc_dir):
    os.makedirs(preproc_dir)

img_data = []
img_size = None
for i in range(meta_data['ID'].count()):
    ID = meta_data['ID'][i]
    preproc_filename = preproc_dir + ID + '.nii.gz'
    if not os.path.exists(preproc_filename):
        # Pre-process the grey matter map and cache the result on disk
        greymatter_filename = [f for f in greymatter_filenames if ID in f][0]
        gm = sitk.ReadImage(greymatter_filename)
        gm_preproc = preproc(gm)
        sitk.WriteImage(gm_preproc, preproc_filename)
    else:
        # Re-use the cached pre-processed map to avoid recomputation
        gm_preproc = sitk.ReadImage(preproc_filename)
    img_array = sitk.GetArrayFromImage(gm_preproc)
    img_size = img_array.shape
    img_data.append(img_array.flatten())

# Stack the flattened maps into a (subjects x voxels) feature matrix
img_data = np.array(img_data, dtype=np.float64)
```
%% Cell type:code id: tags:
``` python
X = img_data
y = meta_data['age'].values.reshape(-1,1)
print(img_size)
print(X.shape)
print(y.shape)
```
%% Cell type:markdown id: tags:
### TASK B-2: Dimensionality reduction
Implement dimensionality reduction for grey matter maps using [scikit-learn's PCA](http://scikit-learn.org/stable/modules/decomposition.html#pca). PCA has an option to set the percentage of variance to be preserved (by setting the parameter `n_components` to a value between 0 and 1). The number of principal modes, that is, the new dimensionality of the data, is then determined automatically. Try initially to preserve 95% of the variance (`n_components=0.95`).
*Note:* When dimensionality reduction is used as a pre-processing step for supervised learning, as it is here, it is important that PCA is fitted to the training data only, but then applied to both the training and testing data. So make sure your implementation consists of two separate steps: 1) fitting the PCA model to $X_{\text{train}}$ (using the `fit` function), and 2) applying the dimensionality reduction to both $X_{\text{train}}$ and $X_{\text{test}}$ using the `transform` function.
%% Cell type:code id: tags:
``` python
import numpy as np
from sklearn import decomposition

def run_pca(X_train, X_test):
    # Keep as many principal modes as needed to preserve 95% of the variance
    pca = decomposition.PCA(n_components=0.95, whiten=False)

    # Fit on the training data only, then transform both training and testing data
    X_train_prime = pca.fit_transform(X_train)
    X_test_prime = pca.transform(X_test)

    mu = pca.mean_                                        # mean grey matter map
    U = pca.components_.T                                 # principal modes (one per column)
    D = pca.singular_values_**2 / (X_train.shape[0] - 1)  # per-mode variances

    # Cumulative variance retained as a function of the number of modes
    fig, ax = plt.subplots()
    ax.plot(np.cumsum(pca.explained_variance_ratio_))
    ax.set_xlabel('Mode')
    ax.set_ylabel('Retained Variance')
    plt.show()
    return X_train_prime, X_test_prime, mu, U, D
```
%% Cell type:markdown id: tags:
### TASK B-3: Age regression and cross-validation
Experiment with different regression methods from the [scikit-learn toolkit](http://scikit-learn.org/stable/supervised_learning.html#supervised-learning). Evaluate the methods using two-fold [cross-validation](http://scikit-learn.org/stable/modules/cross_validation.html#cross-validation) in the same way as in Part A, so the results can be directly compared. Generate similar plots.
Try using at least three different regression methods.
*Hint:* Remember, when you use cross-validation where you swap training and testing sets in each fold, you need to fit PCA to the training set of each fold.
%% Cell type:code id: tags:
``` python
from sklearn.model_selection import KFold
from sklearn.metrics import mean_absolute_error
from sklearn.metrics import r2_score
from sklearn import linear_model

regressor = linear_model.LinearRegression()

# Same two-fold split as in Part A, so the two approaches can be compared directly
kfold = KFold(n_splits=2, shuffle=True, random_state=0)
predicted = np.zeros(y.shape)
mu = []
U = []
D = []
for train_index, test_index in kfold.split(X):
    X_train = X[train_index,:]
    y_train = y[train_index]
    X_test = X[test_index,:]
    y_test = y[test_index]

    # Fit PCA on the training fold only, then project both folds (see the hint above)
    X_train_prime, X_test_prime, mu, U, D = run_pca(X_train, X_test)
    print('Size before PCA: ' + str(X_train.shape))
    print('Size after PCA: ' + str(X_train_prime.shape))

    regressor.fit(X_train_prime, y_train)
    p = regressor.predict(X_test_prime)
    predicted[test_index] = p

# Visualise the first three principal modes (from the last fold's PCA),
# each scaled by three standard deviations
mode1 = U[:,0] * np.sqrt(D[0]) * 3
mode2 = U[:,1] * np.sqrt(D[1]) * 3
mode3 = U[:,2] * np.sqrt(D[2]) * 3
display_image(sitk.GetImageFromArray(mode1.reshape(img_size)), colormap='jet')
display_image(sitk.GetImageFromArray(mode2.reshape(img_size)), colormap='jet')
display_image(sitk.GetImageFromArray(mode3.reshape(img_size)), colormap='jet')

print('mean absolute error: {0}'.format(mean_absolute_error(y, predicted)))
print('r2 score: {0}'.format(r2_score(y, predicted)))

fig, ax = plt.subplots()
ax.scatter(y, predicted, marker='.')
ax.plot([y.min(), y.max()], [y.min(), y.max()], 'k--', lw=2)
ax.set_xlabel('Real Age')
ax.set_ylabel('Predicted Age')
plt.show()
```
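%% Cell type:markdown id: tags:
To cover the "at least three different regression methods" requirement, one straightforward variant is to wrap the fold loop above in a helper and pass in different regressors. The sketch below assumes all cells above have been run (so `X`, `y`, `kfold`, and `run_pca` are defined); `KernelRidge` and `RandomForestRegressor` are example choices only, and note that `run_pca` will display its retained-variance plot on every call.
%% Cell type:code id: tags:
``` python
from sklearn.kernel_ridge import KernelRidge
from sklearn.ensemble import RandomForestRegressor

def evaluate_regressor(regressor, X, y, kfold):
    # Two-fold cross-validation with per-fold PCA, returning out-of-fold predictions
    predicted = np.zeros(y.shape)
    for train_index, test_index in kfold.split(X):
        # PCA is re-fitted on each training fold (see the hint above)
        X_train_prime, X_test_prime, _, _, _ = run_pca(X[train_index,:], X[test_index,:])
        regressor.fit(X_train_prime, y[train_index].ravel())
        predicted[test_index] = regressor.predict(X_test_prime).reshape(-1, 1)
    return predicted

for name, reg in [('Kernel Ridge', KernelRidge(kernel='rbf', alpha=1.0)),
                  ('Random Forest', RandomForestRegressor(n_estimators=100, random_state=0))]:
    pred = evaluate_regressor(reg, X, y, kfold)
    print('{0}: mean absolute error = {1:.2f}, r2 score = {2:.2f}'.format(
        name, mean_absolute_error(y, pred), r2_score(y, pred)))
```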
%% Cell type:code id: tags:
``` python
```