Applications¶

Various forms of image corruption are investigated.
A U-Net type architecture is employed, where the input and the target have been independently corrupted using the same type of corruption.
It seems that during training the inputs and targets are corrupted with randomly chosen parameters each time they are fed to the model, so that the corruption is different each time, whilst at validation the corruption uses fixed parameters.
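The training and validation set-up described above can be sketched as follows. This is a minimal illustration, not the paper's code: the function names, the use of Gaussian noise as the example corruption, the `sigma_range` values, and the choice to share one `sigma` between input and target are all assumptions.

```python
import numpy as np

def corrupt_gaussian(clean, sigma, rng):
    """Add zero-mean white Gaussian noise with standard deviation sigma."""
    return clean + rng.normal(0.0, sigma, size=clean.shape)

def training_pair(clean, rng, sigma_range=(0.0, 0.2)):
    """Corrupt the same clean image twice, independently.

    A fresh sigma is drawn on every call, so the corruption differs each
    time the image is fed to the model. Sharing one sigma between input
    and target is an assumption made for this sketch.
    """
    sigma = rng.uniform(*sigma_range)
    noisy_input = corrupt_gaussian(clean, sigma, rng)
    noisy_target = corrupt_gaussian(clean, sigma, rng)
    return noisy_input, noisy_target

def validation_input(clean, sigma=0.1, seed=0):
    """Corrupt with fixed parameters and a fixed seed, so validation is repeatable."""
    rng = np.random.default_rng(seed)
    return corrupt_gaussian(clean, sigma, rng)
```

Calling `training_pair` repeatedly on the same clean image gives a different noisy pair each time, while `validation_input` always returns the same corrupted image.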

Random pixel noise and $L_2$ loss¶

Gaussian noise
- White Gaussian noise corruption is where each pixel $x_i$ in the image is replaced with $\hat{x}_i = x_i + \epsilon$ where $\epsilon \sim \mathcal{N}(0, \sigma^2)$
- Brown Gaussian noise corruption is where interpixel correlation is introduced by blurring the white Gaussian noise image using a spatial Gaussian filter
Poisson noise is where $x_i$ is replaced by a value drawn from a Poisson distribution with mean $x_i$
Bernoulli noise is where $x_i$ is deleted at random according to a Bernoulli distribution with probability $p$; for this noise, gradients are not backpropagated into the missing pixels
On average across different datasets, noise2noise yields results comparable to training with clean targets (31.63 dB/31.61 dB clean/noisy for Gaussian, 30.59 dB/30.57 dB for Poisson and 31.85 dB/32.02 dB for Bernoulli)
For all these methods the $L_2$ loss works because the expected value of the noisy image will be the original image:
- This is the case for Bernoulli noise as well because of how the gradients are masked for missing pixels.
- The network "sees" each pixel in approximately a fraction $1 - p$ of the $N$ times it sees the image, since the pixel is deleted with probability $p$.
- The average is therefore across $(1 - p)N$ observations rather than $N$, and each time the pixel has the same value, so the average is the original pixel value.
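The argument that the mean of the noisy observations recovers the original image can be checked numerically. The sketch below is only an illustration: the image, the noise levels, the Poisson `scale` factor and the number of observations `n` are all assumed values, and plain averaging stands in for what a network trained with the $L_2$ loss would learn.

```python
import numpy as np

rng = np.random.default_rng(0)
clean = rng.uniform(0.2, 0.8, size=(8, 8))  # hypothetical clean image
n = 20000                                    # number of noisy observations

# White Gaussian noise is zero-mean, so the sample mean approaches the clean image.
gauss = clean + rng.normal(0.0, 0.1, size=(n, 8, 8))
gauss_mean = gauss.mean(axis=0)

# Poisson noise: each value is drawn with mean equal to the (scaled) clean value.
scale = 30.0  # assumed scale from intensities to counts
poisson = rng.poisson(clean * scale, size=(n, 8, 8)) / scale
poisson_mean = poisson.mean(axis=0)

# Bernoulli noise: each pixel is deleted with probability p.
p = 0.5
keep = rng.random((n, 8, 8)) >= p  # True where the pixel survives
masked = clean * keep

# Averaging only over surviving pixels mirrors masking the gradients.
bern_mean = masked.sum(axis=0) / keep.sum(axis=0)

# Averaging over all observations, zeros included, is biased towards zero.
naive_mean = masked.mean(axis=0)
```

Here `gauss_mean`, `poisson_mean` and `bern_mean` all land close to `clean`, whereas `naive_mean` is roughly `(1 - p) * clean`, which is why the missing pixels have to be excluded.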

Text removal and $L_1$ loss¶

Text of different colours and font styles is overlaid on top of the image.
With any reasonable quantity of overlaid text a pixel will retain its original value more often than not across iterations

$$p(\hat{x}_i = x_i) \geq p(\hat{x}_i \neq x_i) \implies p(\hat{x}_i = x_i) + p(\hat{x}_i = x_i) \geq p(\hat{x}_i = x_i) + p(\hat{x}_i \neq x_i) = 1 \implies p(\hat{x}_i = x_i) \geq \frac{1}{2}$$

The median $m$ of a distribution is defined as a value which satisfies

$$p(x \geq m) \geq \frac{1}{2} \quad \text{and} \quad p(x \leq m) \geq \frac{1}{2}$$

Thus $x_i$, the original pixel value, is the median of $\hat{x}_i$ since the following inequalities hold:

$$\begin{aligned}
p(\hat{x}_i \geq x_i) &= p(\hat{x}_i > x_i) + p(\hat{x}_i = x_i) \geq \frac{1}{2} \\
p(\hat{x}_i \leq x_i) &= p(\hat{x}_i < x_i) + p(\hat{x}_i = x_i) \geq \frac{1}{2}
\end{aligned}$$

So for this type of corruption the $L_1$ loss works better than the $L_2$ loss as the latter leads to an averaging over the original pixel value and the unrelated text colours.
The performance using $L_1$ is 35.75 dB which is close to 35.82 dB with clean targets.
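A small numerical check of the median argument, for a single pixel. The pixel value `x`, the coverage probability `q` and the uniform distribution of text colours are assumptions made for illustration; the sample median stands in for the minimiser of the $L_1$ loss and the sample mean for the minimiser of the $L_2$ loss.

```python
import numpy as np

rng = np.random.default_rng(0)
x = 0.3    # original pixel value (hypothetical)
q = 0.3    # assumed probability that overlaid text covers this pixel
n = 10001  # number of corrupted observations of the pixel

covered = rng.random(n) < q
text_colour = rng.uniform(0.0, 1.0, size=n)  # unrelated text colours
observed = np.where(covered, text_colour, x)

mean_est = observed.mean()        # minimiser of the L2 loss
median_est = np.median(observed)  # minimiser of the L1 loss
```

Because the pixel keeps its original value in more than half of the observations, `median_est` equals `x` exactly, while `mean_est` is pulled towards the average text colour.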

Random-valued impulse noise and $L_0$ loss¶

The image, with pixel values normalised such that $x_i \in [0,1]$, is perturbed according to the following distribution:

$$p(\hat{x}_i) = \left\{\begin{array}{ll}
1-p, & \hat{x}_i = x_i \\
p, & \hat{x}_i \in [0,x_i) \cup (x_i,1]
\end{array}\right.$$

Strictly, each pixel in an RGB image needs three values, but the channels are sampled independently so each pixel in each channel can be analysed separately.
To see that this is a valid distribution:

$$\int_0^1 p(\hat{x}_i)\cdot d\hat{x}_i = \int_0^{x_i} p \cdot d\hat{x}_i+ (1 - p) + \int_{x_i}^1 p\cdot d\hat{x}_i = px_i + (1 - p) + p(1 - x_i) = 1 - p + p = 1$$

$x_i$ is the mode of this distribution
To see this intuitively note that $p(\hat{x}_i = x_i) = 1-p$ but for any other $x_j \neq x_i$ the probability of the small region between $x_j - \frac{\Delta}{2}$ and $x_j + \frac{\Delta}{2}$ will be $p\Delta \ll 1 - p$
For this example therefore the $L_0$ loss, since it is minimised by the mode, works better than $L_1$ or $L_2$.
Performance with $L_0$ is 28.43 dB which is comparable to 28.86 dB using clean targets.
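The difference between the mean, median and mode under this corruption can be seen in a single-pixel simulation. The values of `x`, `p` and `n` are assumptions, and the histogram peak is only a simple stand-in for the mode; it is not the $L_0$ loss used in training.

```python
import numpy as np

rng = np.random.default_rng(0)
x = 0.1     # original pixel value (hypothetical)
p = 0.7     # probability that the pixel is replaced; above 1/2 on purpose
n = 100000  # number of corrupted observations of the pixel

replaced = rng.random(n) < p
observed = np.where(replaced, rng.uniform(0.0, 1.0, size=n), x)

mean_est = observed.mean()        # minimiser of the L2 loss
median_est = np.median(observed)  # minimiser of the L1 loss

# Estimate the mode as the centre of the fullest histogram bin.
counts, edges = np.histogram(observed, bins=256, range=(0.0, 1.0))
k = np.argmax(counts)
mode_est = 0.5 * (edges[k] + edges[k + 1])
```

With more than half of the observations replaced, both `mean_est` and `median_est` drift away from `x`, while `mode_est` stays within one histogram bin of it.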

Monte Carlo rendered images¶

Monte Carlo path tracing is used to generate physically accurate rendering of virtual environments.
It involves drawing random sequences of scattering events (light paths) that connect light sources and virtual sensors in the scene, and integrating the radiance they carry across all paths.
Noise is difficult to get rid of as its distribution can be complex for various reasons:
- Varies from pixel to pixel
- Depends a lot on scene configuration and rendering parameters
- Possibly arbitrarily multi-modal
- Sometimes extremely long-tailed with rare outliers
Pixel luminances can vary significantly so they are compressed to a fixed range using a non-linear function.
The non-linearity of this function makes the MSE loss unsuitable, so a different loss function, more appropriate for high dynamic range images, is used:

$$L_\text{HDR} = \frac{\left(f_\theta\left(\hat{x}\right) - \hat{y}\right)^2}{\left(f_\theta\left(\hat{x}\right) + 0.01\right)^2}$$

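The HDR loss can be written as a short NumPy function. This is a sketch of the formula only: the function name `hdr_loss`, the `eps` parameter name and the averaging over pixels are assumptions, and any special treatment of gradients during training is not modelled here.

```python
import numpy as np

def hdr_loss(prediction, target, eps=0.01):
    """Squared error relative to the squared predicted value, averaged over pixels.

    prediction: network output f_theta(x_hat)
    target:     noisy target y_hat
    eps:        the 0.01 term in the denominator
    """
    prediction = np.asarray(prediction, dtype=float)
    target = np.asarray(target, dtype=float)
    return np.mean((prediction - target) ** 2 / (prediction + eps) ** 2)
```

Dividing by the squared prediction makes the loss roughly scale-invariant: being wrong by a factor of two costs about the same for a dim pixel as for a very bright one, whereas the MSE would be dominated by the bright pixel.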
Using a fixed set of images, it takes about twice as long using noisy images to get similar performance as using clean ones (31.83 dB).
However it is much faster to render noisy images so there seems to be a quality/speed tradeoff.
In an online setting, using noisy targets yielded improvements almost as good as using clean ones (values are not given, but from the plot in Figure 8 both seem to be around 30 dB, with clean marginally higher) and did so significantly faster.

MRI¶

MRI produces 3D images essentially by sampling the Fourier transform of the signal