Colorizing the Prokudin-Gorskii photo collection

Introduction

Sergei Mikhailovich Prokudin-Gorskii was convinced that color photography was the wave of the future, so he won the Tzar's special permission to travel across the vast Russian Empire and take color photographs of everything he saw, including the only color portrait of Leo Tolstoy. He recorded three exposures of every scene onto a glass plate using a red, a green, and a blue filter. In this project, we take the digitized Prokudin-Gorskii glass plate images and, using image processing techniques, automatically produce a color image. Specifically, the code extracts the three color channel images, places them on top of each other, and aligns them to form a single RGB color image.
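The extraction and stacking step can be sketched as follows. This is a minimal illustration, not the project's actual code: the function names are hypothetical, and it assumes the digitized plate is a single grayscale array with the three exposures stacked vertically in blue, green, red order from top to bottom.

```python
import numpy as np

def split_plate(plate):
    """Split a vertically stacked glass-plate scan into three equal channel images."""
    height = plate.shape[0] // 3  # drop leftover rows so all thirds have the same size
    # Assumption: the scan stores blue on top, then green, then red.
    blue = plate[:height]
    green = plate[height:2 * height]
    red = plate[2 * height:3 * height]
    return red, green, blue

def stack_rgb(red, green, blue):
    """Stack three single-channel images into an H x W x 3 RGB array."""
    return np.dstack([red, green, blue])

# Tiny synthetic plate: 9 rows, 4 columns, each third filled with a constant value.
plate = np.concatenate([np.full((3, 4), v, dtype=float) for v in (0.1, 0.5, 0.9)])
red, green, blue = split_plate(plate)
rgb = stack_rgb(red, green, blue)
```

Stacking the channels without alignment produces visible color fringes on real plates, which is why the alignment steps in the following sections are needed.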

Single Scale Alignment

For lower-resolution images, the easiest way to align the three channel images is to exhaustively search over a window of possible displacements, score each one with a metric, and keep the displacement with the best score. As suggested in the project description, I used a displacement range of [-15, 15] pixels. For the scoring metric, I experimented with the Sum of Absolute Differences (SAD) and Normalized Cross-Correlation (NCC), and found that SAD performed better for single-scale alignment. However, the images were still not very well aligned, so I cropped the image borders by 10% to further improve the results. I also parallelized the code using a ThreadPoolExecutor and its map function, which noticeably improved the speed.
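A minimal sketch of this exhaustive search, assuming the channels are float NumPy arrays of equal size. The helper names are hypothetical, and two details are assumptions of the sketch rather than facts from this write-up: shifts are applied with `np.roll` (which wraps around at the edges), and they are reported in (row, column) order, which may differ from the order used in the galleries.

```python
import numpy as np

def crop_borders(img, frac=0.10):
    """Remove `frac` of the height and width from each side."""
    h, w = img.shape
    dh, dw = int(h * frac), int(w * frac)
    return img[dh:h - dh, dw:w - dw]

def sad(a, b):
    """Sum of Absolute Differences: lower means more similar."""
    return np.sum(np.abs(a - b))

def ncc(a, b):
    """Normalized Cross-Correlation: higher means more similar."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(np.sum(a * b) / denom) if denom > 0 else 0.0

def align_single_scale(img, ref, max_shift=15, metric="sad"):
    """Try every shift in [-max_shift, max_shift]^2 and return the best one."""
    ref_c = crop_borders(ref)
    best_shift, best_cost = (0, 0), np.inf
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            cand = crop_borders(np.roll(img, (dy, dx), axis=(0, 1)))
            # Turn both metrics into a cost so that lower is always better.
            cost = sad(cand, ref_c) if metric == "sad" else -ncc(cand, ref_c)
            if cost < best_cost:
                best_shift, best_cost = (dy, dx), cost
    return best_shift

# Synthetic check: displace a random image, then recover the correcting shift.
rng = np.random.default_rng(0)
ref = rng.random((60, 60))
img = np.roll(ref, (-3, 5), axis=(0, 1))
shift = align_single_scale(img, ref)
```

Scoring only the cropped center keeps the plate borders, and the rows and columns that `np.roll` wraps around for small shifts, out of the comparison.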

Low Resolution Image Gallery

cathedral — R Shift: (3, 11), G Shift: (2, 5)

monastery — R Shift: (2, 3), G Shift: (2, -3)

tobolsk — R Shift: (3, 6), G Shift: (3, 3)

Multi Scale Alignment

However, exhaustive search becomes too computationally expensive for higher-resolution images with larger displacements. To combat this issue, I implemented an image pyramid that searches for shifts at multiple scales.

Algorithm Steps

Downscale both the reference image and the image to align using multiple scaling factors, then process the scales from the coarsest (smallest) image to the finest (full resolution).
As in the single-scale implementation, crop 10% of the image borders at each scale to focus on the central region.
At each scale, search for the best displacement (shift) by trying different pixel shifts in height and width directions.
For each shift, calculate a similarity score using normalized cross-correlation.
Once the best displacement is found at a coarse scale, adjust the search range for the next finer scale using the best shift found so far.
After refining the alignment at all scales, apply the final best displacement to the original image to align it with the reference image.
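The steps above can be sketched as a recursive coarse-to-fine search. This is an illustration under stated assumptions, not the project's actual implementation: the function names, the factor-of-2 downscaling by block averaging, the search radii, and the `min_size` stopping rule are all hypothetical choices. Shifts are (row, column) offsets applied with `np.roll`.

```python
import numpy as np

def crop_borders(img, frac=0.10):
    """Remove `frac` of the height and width from each side."""
    h, w = img.shape
    dh, dw = int(h * frac), int(w * frac)
    return img[dh:h - dh, dw:w - dw]

def ncc(a, b):
    """Normalized cross-correlation: higher means more similar."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(np.sum(a * b) / denom) if denom > 0 else 0.0

def downscale_by_2(img):
    """Halve the resolution by averaging 2x2 blocks (assumed downscaling method)."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    return img[:h, :w].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def search(img, ref, center, radius):
    """Score every shift within `radius` of `center` and return the best one."""
    ref_c = crop_borders(ref)
    best_shift, best_score = center, -np.inf
    for dy in range(center[0] - radius, center[0] + radius + 1):
        for dx in range(center[1] - radius, center[1] + radius + 1):
            cand = crop_borders(np.roll(img, (dy, dx), axis=(0, 1)))
            score = ncc(cand, ref_c)
            if score > best_score:
                best_shift, best_score = (dy, dx), score
    return best_shift

def align_pyramid(img, ref, min_size=64, coarse_radius=4, fine_radius=2):
    """Estimate the shift at the coarsest scale, then refine it at each finer scale."""
    if min(img.shape) <= min_size:
        return search(img, ref, (0, 0), coarse_radius)
    coarse = align_pyramid(downscale_by_2(img), downscale_by_2(ref),
                           min_size, coarse_radius, fine_radius)
    # A shift of k pixels at half resolution corresponds to 2k pixels here.
    center = (2 * coarse[0], 2 * coarse[1])
    return search(img, ref, center, fine_radius)

# Synthetic check on a 128x128 image with a three-level pyramid (128, 64, 32).
rng = np.random.default_rng(1)
ref = rng.random((128, 128))
img = np.roll(ref, (-8, 12), axis=(0, 1))
shift = align_pyramid(img, ref, min_size=32)
```

Because each level only searches a small window around the doubled coarse estimate, the number of full-resolution score evaluations stays small even when the final displacement is large.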

As before, I also parallelized the code using a ThreadPoolExecutor, which brought the runtime down to roughly 30 s per picture from up to 2 minutes.
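One way such parallelization could look is sketched below. The write-up does not say which part of the work was distributed, so scoring the candidate shifts in parallel is an assumption, and `best_shift_parallel` and its parameters are hypothetical names. Threads can help here to the extent that NumPy releases the global interpreter lock during large array operations.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product

import numpy as np

def ncc(a, b):
    """Normalized cross-correlation: higher means more similar."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(np.sum(a * b) / denom) if denom > 0 else 0.0

def best_shift_parallel(img, ref, radius=15, workers=4):
    """Score all candidate shifts with a thread pool and return the best one."""
    shifts = list(product(range(-radius, radius + 1), repeat=2))

    def score(shift):
        return ncc(np.roll(img, shift, axis=(0, 1)), ref)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in the same order as `shifts`.
        scores = list(pool.map(score, shifts))
    return shifts[int(np.argmax(scores))]

# Synthetic check with a small search radius.
rng = np.random.default_rng(2)
ref = rng.random((40, 40))
img = np.roll(ref, (2, -4), axis=(0, 1))
shift = best_shift_parallel(img, ref, radius=5)
```

An alternative with the same API would be to submit the red and green alignments as two independent tasks, since each channel is matched against the reference separately.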

High Resolution Image Gallery

emir — R Shift: (-240, 96), G Shift: (24, 48)

church — R Shift: (-8, 56), G Shift: (0, 24)

harvesters — R Shift: (16, 120), G Shift: (16, 56)

R Shift: (24, 88), G Shift: (16, 40)

lady — R Shift: (8, 112), G Shift: (8, 48)

melons — R Shift: (8, 176), G Shift: (8, 80)

onion_church — R Shift: (40, 104), G Shift: (24, 48)

sculpture — R Shift: (-24, 136), G Shift: (-8, 32)

self_portrait — R Shift: (40, 176), G Shift: (32, 80)

three_generations — R Shift: (8, 112), G Shift: (16, 56)

train — R Shift: (32, 88), G Shift: (8, 40)

Notes:

One thing to notice is that the Emir picture does not seem to have aligned properly, while the rest of the images look good. This is most likely because the blue plate was used as the reference for matching: in this picture, there is a very high amount of blue but a meager amount of green. Aligning the red and blue plates to the green plate instead fixes the issue in this case.
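Switching the reference plate can be sketched by making the reference a parameter. This is a hypothetical illustration: `find_shift` is a small brute-force SAD search standing in for whichever alignment routine is used, and `colorize` and its `reference` argument are assumed names.

```python
import numpy as np

def find_shift(img, ref, radius=5):
    """Brute-force search for the (row, col) roll that minimizes SAD against ref."""
    best, best_cost = (0, 0), np.inf
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            cost = np.abs(np.roll(img, (dy, dx), axis=(0, 1)) - ref).sum()
            if cost < best_cost:
                best, best_cost = (dy, dx), cost
    return best

def colorize(channels, reference="green"):
    """Align every channel to the chosen reference channel and stack as RGB."""
    ref = channels[reference]
    aligned = {}
    for name, img in channels.items():
        shift = (0, 0) if name == reference else find_shift(img, ref)
        aligned[name] = np.roll(img, shift, axis=(0, 1))
    return np.dstack([aligned["red"], aligned["green"], aligned["blue"]])

# Synthetic check: three displaced copies of one image should line up exactly.
rng = np.random.default_rng(3)
base = rng.random((32, 32))
channels = {
    "red": np.roll(base, (1, 2), axis=(0, 1)),
    "green": base,
    "blue": np.roll(base, (-3, 1), axis=(0, 1)),
}
rgb = colorize(channels, reference="green")
```

Calling `colorize(channels, reference="blue")` would reproduce the blue-referenced setup for comparison.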

Bells & Whistles

To fix the issue with the Emir picture, I used the Structural Similarity Index Measure (SSIM) as the scoring metric, and the corrected result can be seen below. For the rest of the images, the shifts stayed almost the same. NCC is generally much faster than SSIM, but SSIM produces better results.
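To show what an SSIM score looks like as an alignment metric, here is a simplified sketch. It computes a single global SSIM value from whole-image statistics, which is a deliberate simplification: standard SSIM averages the same formula over local windows, as library implementations such as scikit-image's `structural_similarity` do. The write-up does not say which implementation was used, and the function names and the `data_range` default below are assumptions.

```python
import numpy as np

def global_ssim(a, b, data_range=1.0):
    """Single-window SSIM computed from whole-image means, variances, and covariance."""
    c1 = (0.01 * data_range) ** 2  # stabilizes the luminance term
    c2 = (0.03 * data_range) ** 2  # stabilizes the contrast/structure term
    mu_a, mu_b = a.mean(), b.mean()
    var_a, var_b = a.var(), b.var()
    cov = ((a - mu_a) * (b - mu_b)).mean()
    numerator = (2 * mu_a * mu_b + c1) * (2 * cov + c2)
    denominator = (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2)
    return float(numerator / denominator)

def best_shift_ssim(img, ref, radius=5):
    """Return the (row, col) shift that maximizes the global SSIM score."""
    shifts = [(dy, dx)
              for dy in range(-radius, radius + 1)
              for dx in range(-radius, radius + 1)]
    scores = [global_ssim(np.roll(img, s, axis=(0, 1)), ref) for s in shifts]
    return shifts[int(np.argmax(scores))]

# Synthetic check: recover a known displacement.
rng = np.random.default_rng(4)
ref = rng.random((40, 40))
img = np.roll(ref, (4, -1), axis=(0, 1))
shift = best_shift_ssim(img, ref)
```

A windowed SSIM evaluates these statistics for many local patches per candidate shift, which is consistent with it being slower than NCC.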

NCC vs SSIM

emir ncc image — NCC Picture: R Shift: (-240, 96), G Shift: (24, 48)

emir ssim image — SSIM Picture: R Shift: (40, 104), G Shift: (24, 48)

three_generations ncc image — NCC Picture: R Shift: (8, 112), G Shift: (16, 56)

three_generations ssim image — SSIM Picture: R Shift: (8, 112), G Shift: (16, 48)

Programming Project #1: Colorizing the Prokudin-Gorskii photo collection

By Sai Kolasani