.. _intro_fast_reconstruction: Fast SIRT reconstruction using PyTorch on the GPU ================================================= .. note:: To keep the size of the documentation in version control manageable, we do not include images of the volume and the reconstruction. It is recommended to follow along in your favorite python environment so you can see what is going on. In the :ref:`previous tutorial `, we implemented SIRT reconstruction using NumPy. This was not very fast, because intermediate data had to be moved back and forth to the GPU during the execution of the algorithm. In this tutorial, we implement SIRT using PyTorch, which can compute on both the CPU and the GPU. Before you continue, make sure to :ref:`install pytorch `. To get a better feel for the difference in speed, we use a larger volume and detector: .. testcode:: session import tomosipo as ts import numpy as np N = 256 M = (N * 3) // 2 vg = ts.volume(shape=(N, N, N)) pg = ts.parallel(angles=M, shape=(M, M)) A = ts.operator(vg, pg) Torch support is automatic if it has been installed in the host environment. To use torch, we import it: .. testcode:: session :skipif: (not cuda_available) or (not torch_available) import torch Using torch, we recreate the hollow cube phantom and forward project to obtain the sinogram stack `y`: .. testcode:: session :skipif: (not cuda_available) or (not torch_available) x = torch.ones(A.domain_shape) x[8:-8, 8:-8, 8:-8] = 0.0 y = A(x) Note that torch uses `float32` values by default, so we do not have to specify `dtype=torch.float32` explicitly. Also, `y` is a torch tensor, because the input to the operator `A` is a torch tensor: .. doctest:: session :skipif: (not cuda_available) or (not torch_available) >>> y.dtype torch.float32 Now we prepare the preconditioning matrices `R` and `C` using torch: .. testcode:: session :skipif: (not cuda_available) or (not torch_available) R = 1 / A(torch.ones(A.domain_shape)) torch.clamp(R, max=1 / ts.epsilon, out=R) C = 1 / A.T(torch.ones(A.range_shape)) torch.clamp(C, max=1 / ts.epsilon, out=C) Next, we reconstruct from the sinogram stack `y` into `x_rec`: .. testcode:: session :skipif: True num_iterations = 50 x_rec = torch.zeros(A.domain_shape) for i in range(num_iterations): x_rec += C * A.T(R * (y - A(x_rec))) This code is in fact not much faster than the NumPy code from the previous tutorial. We still use tensors that are stored "on the CPU", i.e., system RAM. We can create a reconstruction function that works on tensors whose `device` location is either the CPU or GPU: .. testcode:: session :skipif: (not cuda_available) or (not torch_available) def sirt(A, y, num_iterations=10): dev = y.device R = 1 / A(torch.ones(A.domain_shape, device=dev)) torch.clamp(R, max=1 / ts.epsilon, out=R) C = 1 / A.T(torch.ones(A.range_shape, device=dev)) torch.clamp(C, max=1 / ts.epsilon, out=C) x_rec = torch.zeros(A.domain_shape, device=dev) for i in range(num_iterations): x_rec += C * A.T(R * (y - A(x_rec))) return x_rec .. testcode:: session :skipif: True from timeit import default_timer as timer y_cpu = y y_gpu = y.to("cuda") start_cpu = timer() sirt(A, y_cpu) end_cpu = timer() start_gpu = timer() sirt(A, y_gpu) end_gpu = timer() print(f"cpu : {end_cpu - start_cpu:0.2f} seconds") print(f"gpu : {end_gpu - start_gpu:0.2f} seconds") :: cpu : 4.95 seconds gpu : 2.46 seconds As you can see, the GPU code is almost twice as fast! The SIRT algorithm is implemented in the `ts_algorithms `_ package with some additional optimizations. This package contains some well-tested reconstruction algorithms for use with `tomosipo`.