How I made sense of dimensionality reduction for images - draft

Probably the coolest thing I have ever come across...

Posted by Tanya Dixit on January 02, 2022 · 1 min read

Dimensionality reduction is an interesting topic in Machine Learning, and most of us have read about PCA (Principal Component Analysis) as a prime example. PCA is a linear dimensionality reduction technique, and if you need a refresher, please visit here.
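
As a quick refresher in code, here is a minimal PCA sketch using scikit-learn; the toy data and parameter values are my own illustrative choices, nothing prescribed:

```python
import numpy as np
from sklearn.decomposition import PCA

# Toy data: 500 points that really live on a 2-D plane,
# embedded *linearly* into a 50-D space.
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 2))   # the "true" 2-D coordinates
mixing = rng.normal(size=(2, 50))    # linear embedding into 50-D
X = latent @ mixing

# PCA projects onto the directions of maximum variance.
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)

# Since the data is exactly rank 2, two components capture
# essentially all of the variance.
print(X_2d.shape)                           # (500, 2)
print(pca.explained_variance_ratio_.sum())  # ~1.0
```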

When I first got to know about non-linear dimensionality reduction techniques like LLE (Locally Linear Embedding), I was blown away by the mathematical theory behind them. I am making an attempt to summarize all of that for you here.
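
Before the theory, here is a minimal LLE sketch on the classic swiss-roll toy dataset, again via scikit-learn; the dataset and the neighbour count are my own illustrative choices:

```python
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import LocallyLinearEmbedding

# A 3-D "swiss roll": points that actually live on a 2-D sheet
# rolled up inside 3-D space, the classic non-linear manifold.
X, color = make_swiss_roll(n_samples=1500, random_state=0)

# LLE unrolls the sheet by preserving each point's local
# linear relationship to its nearest neighbours.
lle = LocallyLinearEmbedding(n_neighbors=12, n_components=2, random_state=0)
X_2d = lle.fit_transform(X)

print(X.shape, "->", X_2d.shape)  # (1500, 3) -> (1500, 2)
```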

Think about images - they are multi-dimensional arrays of pixels. Each pixel has an intensity value for each of its R, G, and B channels, an integer from 0 to 255, i.e. it can take 256 values.

Let’s say we have an image of 1x1, i.e. 1 pixel. The total number of images possible is 256x256x256 (the pixel can take 256 values in each of the red, green, and blue channels), which means there are 2^24 ≈ 16.8 million possible 1x1 images - that’s quite a lot.
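
A throwaway sketch to sanity-check that arithmetic:

```python
# One RGB pixel: 3 channels, 256 possible values per channel
values_per_channel = 256
channels = 3

total = values_per_channel ** channels
print(total)            # 16777216, i.e. ~16.8 million
print(total == 2**24)   # True
```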

But who works with 1x1 images? So let’s take a more realistic example: 256x256 images. A 256x256 image has 2^8 * 2^8 = 2^16 pixels, and each pixel can take 2^24 values, so there are (2^24)^(2^16) = 2^(24 * 2^16) = 2^1572864 possible images. This is an unimaginably huge number. For comparison, one of the largest image datasets - Imagenet - has about 14 million = 1.4*10^7 images (with differing sizes).
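
That count is far too large to print directly, but a quick sketch (again, just a sanity check) can at least tell us how many decimal digits it has:

```python
import math

# A 256x256 RGB image: 256*256 pixels, 24 bits per pixel
bits = 256 * 256 * 24
print(bits)  # 1572864, so there are 2**1572864 possible images

# Number of decimal digits in 2**bits
digits = math.floor(bits * math.log10(2)) + 1
print(digits)  # 473480 -- a number with nearly half a million digits

# ImageNet, by comparison, has ~14 million images: an 8-digit number.
```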

Now, I like to imagine what lives inside this enormous space of possible images. We can have images of different cars, different hand gestures, different kinds of zebras, or whatever else is possible at 256x256 resolution.

Let’s say we are just looking at images of hand gestures.

TBC...