# Inverse Problem (Part 2)

In the last post I have written about inverse problems. A simplified toy example was presented, which showed you how to translate this problem into an optimization problem. Optimization problems can be solved with multiple algorithms, e.g. gradient descent or evolutionary algorithms.

This article presents a more sophisticated inverse problem. We want to classify images of digits.

In general inverse problems have causal factors from which observations arise. The test images are our observations. In our classification problem there are two kinds of forward models:

- The first forward problem is the projection from the latent space (I use the words “causal factors” and “latent space” interchangeably) to the image space. This is modeled as being a linear combination of example images and the latent vector. The latent space has 10 dimensions. Each dimension is represented as an example image of each class.
- The second forward problem is to get from the latent space to the class prediction. Here it is simply the argmax operation. In other cases it could be a more complex model.

Defining the forward problems in this way, leads to a latent space, in which each dimension describes the weight of the corresponding class. It is not a probability distribution, since the values don’t sum up to one.

More complex causal factors, like translation and orientation could be modeled into the latent space as well. This isn’t done here in this article to keep everything simple.

```
%matplotlib inline
from sklearn import datasets
import matplotlib.pylab as plt
digits = datasets.load_digits(10)
f, ax = plt.subplots(2, 5, sharex='col', sharey='row', figsize=(10, 8))
for i in range(2):
for j in range(5):
ax[i, j].imshow(
digits.images[i * 5 + j, :], cmap='gray_r')
```

These images are stacked into the model matrix. Each example image is flattened and represented as a row in the model matrix.

To show that this approach actually work, we want to classify the image of a zero, which is not contained in the model matrix.

```
import numpy as np
cur_causal_factors = np.zeros(10)
print('cur_causal_factors: {}'.format(cur_causal_factors.shape))
model_params = digits.images[:10].reshape((10, -1))
print('model_params: {}'.format(model_params.shape))
observation = digits.images[10].reshape(-1)
print('observation: {}'.format(observation.shape))
plt.imshow(observation.reshape((8, 8)), cmap='gray_r')
```

Solving this inverse problem requires modeling the forward problem. In this example the forward problem is simply modeled as the matrix multiplication of the model matrix and flattened test image. The loss is defined as the mean squared error of the difference between the observation and the forward projected causal factors.

I’m using the Einstein notation here. The Einstein notation is implemented as a domain specific language, which abstracts away many vector, matrix and tensor operations. E.g. matrix multiplication, matrix transpose and dot product. A well written blog post about this topic can be found here.

Solving the optimization problem with gradient descent leads to the best prediction of the causal factors.

```
def forward(causal_factors):
simulation = np.einsum('i,ij->j', causal_factors, model_params)
return simulation
def loss(observation, causal_factors):
# mean squared error
return np.mean((observation - forward(causal_factors))**2)
def derivative_loss(observation, causal_factors):
n = observation.shape[0]
return -2.0 / n * np.einsum('i,ji->j', (observation - forward(causal_factors)), model_params)
alpha = 5e-6
losses = [loss(observation, cur_causal_factors), ]
for i in range(100):
dL = -2 * np.einsum('i,ji->j', (observation - forward(cur_causal_factors)), model_params)
cur_causal_factors = cur_causal_factors - alpha * dL
losses.append(loss(observation, cur_causal_factors))
plt.plot(losses)
plt.grid()
plt.xlabel('iteration')
plt.ylabel('loss')
plt.figure()
```

In the next plot we see the result of the inverse problem. The class is predicted correctly as zero, because the zero dimension of the latent space has the highest value for the observed image.

```
prediction = np.argmax(cur_causal_factors)
print('predicted class: {}'.format(prediction))
plt.bar(range(10), cur_causal_factors)
plt.grid()
```

The next figure shows the results. At the left is the actual observed test image. The middle figure shows the projection of the resulting latent space into the image space. It can be seen, that the image is reproduced quite well. Since we have modeled the latent space (10 dimensions) to be smaller than the image space (64 dimensions) it is not possible to reproduce the observed image perfectly. The difference of both images is shown in the right figure.

```
f, ax = plt.subplots(1, 3, figsize=(10, 4))
ax[0].imshow(observation.reshape((8, 8)), cmap='gray_r')
ax[0].set_title('Model')
ax[1].imshow(forward(cur_causal_factors).reshape((8, 8)), cmap='gray_r')
ax[1].set_title('Forward Projection')
ax[2].imshow(forward(cur_causal_factors).reshape((8, 8)) - observation.reshape((8, 8)), cmap='gray_r')
ax[2].set_title('Difference')
```

We have seen, that classification problems can be translated into inverse problems and can be solved with gradient based optimization procedures. However, for classification there is not so much value in this approach, because the forward model has to be defined. If we are talking about classification, we are also talking about training the model with data. This is in fact possible. I show you in the next article how this is done. It is possible to use the same methods as we have seen already and to solve it as an optimization problem.