Dimensionality reduction
Fitting an autoencoder by hand
Let’s say you have the following dataset, $X$, shown below.
Sample   Height   Weight
-------- -------- --------
#1       1        2
#2       1        4
#3       1        6
#4       1        13
Say you use this data and find an optimal linear autoencoder with one latent dimension. You get two functions: an encoder $f: \mathbb{R}^2 \to \mathbb{R}$ and a decoder $g: \mathbb{R} \to \mathbb{R}^2$.
What will be the value of the mean squared reconstruction error, $\frac{1}{4} \sum_{i=1}^{4} \lVert x_i - g(f(x_i)) \rVert^2$?
(Hint: it may be helpful to try to explicitly find suitable functions $f$ and $g$.)
Solutions
Zero. Here is a suitable autoencoder: $f(h, w) = w$ and $g(z) = (1, z)$. With these choices, $g(f(x)) = x$ for every point in the dataset, so the reconstruction error is zero.
The dataset has no diversity in the Height variable, so, as far as this dataset is concerned, we can summarize everything about each sample by looking at Weight alone.
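As a quick sanity check, here is a minimal NumPy sketch (the code and variable names are ours, not part of the original problem) that runs this hand-fit autoencoder over the table above and confirms the mean squared reconstruction error is zero.

```python
import numpy as np

# The dataset from the table above: columns are (Height, Weight).
X = np.array([[1, 2],
              [1, 4],
              [1, 6],
              [1, 13]], dtype=float)

# Hand-fit autoencoder: the encoder keeps Weight; the decoder
# re-attaches the constant Height of 1.
def f(x):
    return x[1]

def g(z):
    return np.array([1.0, z])

reconstructions = np.array([g(f(x)) for x in X])
mse = np.mean(np.sum((X - reconstructions) ** 2, axis=1))
print(mse)  # 0.0 -- every sample is reconstructed exactly
```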
Reconstruction error
You have been given an autoencoder: an encoder $f$ that maps a point to its summary, and a decoder $g$ that maps a summary back to feature space.
What is the squared reconstruction error for a point $x$?
Solutions
The summary is $z = f(x)$ and the reconstruction is $\hat{x} = g(z) = g(f(x))$. So the squared reconstruction error is $\lVert x - \hat{x} \rVert^2 = \lVert x - g(f(x)) \rVert^2$.
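The specific encoder, decoder, and point from the original problem did not survive, so the sketch below uses hypothetical stand-ins of the same shape to show how this computation goes in general; only `squared_reconstruction_error` itself reflects the formula above.

```python
import numpy as np

def squared_reconstruction_error(x, f, g):
    """Squared Euclidean distance between x and its reconstruction g(f(x))."""
    x = np.asarray(x, dtype=float)
    x_hat = np.asarray(g(f(x)), dtype=float)
    return float(np.sum((x - x_hat) ** 2))

# Hypothetical stand-in autoencoder with one latent dimension
# (NOT the problem's original f and g, which were lost):
f = lambda x: x[1]                # encode: keep the second coordinate
g = lambda z: np.array([1.0, z])  # decode: guess 1.0 for the first coordinate

print(squared_reconstruction_error([2.0, 5.0], f, g))  # (2 - 1)^2 + (5 - 5)^2 = 1.0
```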
PCA and scaling
You have a dataset with $n$ samples and $d$ features per sample. You run PCA to get a 2d latent representation (summary) for each point. You store these summaries in an $n \times 2$ matrix, $Z$. Then you standardize your dataset with the standard scaler, run PCA on the standardized data, and get a different matrix of summaries, $\tilde{Z}$. Which is true?
$Z = \tilde{Z}$.
There exists a matrix $A$ such that $\tilde{Z} = Z A$.
None of the above.
Solutions
None of the above is correct. Running PCA on standardized data will give you fundamentally different latent representations. One way to see why is to consider that PCA minimizes the mean squared reconstruction error, and that reconstruction error is measured in terms of Euclidean distance in the feature space. If you rescale the feature space, you change what distance means, and you change what counts as a “large” error.
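To make this concrete, here is an illustrative sketch (ours, not part of the original exercise) using scikit-learn's `PCA` and `StandardScaler` on the wine dataset, whose features live on very different scales. It fits the best linear map $A$ from $Z$ to $\tilde{Z}$ by least squares; if $\tilde{Z} = Z A$ held for some $A$, the residual would be numerically zero, but it should come out substantially nonzero.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X = load_wine().data  # 13 features on very different scales

Z = PCA(n_components=2).fit_transform(X)            # summaries of the raw data
X_std = StandardScaler().fit_transform(X)
Z_tilde = PCA(n_components=2).fit_transform(X_std)  # summaries after standardizing

# Best linear map A minimizing ||Z A - Z_tilde|| in the least-squares sense.
A, _, _, _ = np.linalg.lstsq(Z, Z_tilde, rcond=None)
residual = np.linalg.norm(Z @ A - Z_tilde)
print(residual)  # substantially nonzero: no A turns Z into Z_tilde
```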
Variance explained
Let $X \in \mathbb{R}^d$ be a random vector. Let $\mu = \mathbb{E}[X]$ and $\sigma^2 = \mathbb{E}\left[\lVert X - \mu \rVert^2\right]$. Say you have fit an autoencoder $x \mapsto g(f(x))$. Say the mean squared reconstruction error is given by $\varepsilon^2 = \mathbb{E}\left[\lVert X - g(f(X)) \rVert^2\right]$. What is the formula for the proportion of variance explained by the autoencoder?
Solutions
$1 - \varepsilon^2 / \sigma^2$. The autoencoder leaves $\varepsilon^2$ of the total variance $\sigma^2$ unexplained, so the explained proportion is the remainder.
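As a sanity check of this formula (our own sketch, not from the original notes), the code below uses a two-component PCA as the fitted linear autoencoder and verifies that $1 - \varepsilon^2 / \sigma^2$ matches the sum of scikit-learn's `explained_variance_ratio_`. Both quantities are estimated with matching $n - 1$ denominators so that they agree exactly.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA

X = load_wine().data
n = X.shape[0]

# A linear autoencoder: encode with PCA, decode with its inverse transform.
pca = PCA(n_components=2).fit(X)
X_hat = pca.inverse_transform(pca.transform(X))

sigma2 = np.sum((X - X.mean(axis=0)) ** 2) / (n - 1)  # total variance
eps2 = np.sum((X - X_hat) ** 2) / (n - 1)             # mean squared reconstruction error

print(1 - eps2 / sigma2)                    # proportion of variance explained
print(pca.explained_variance_ratio_.sum())  # same value, straight from sklearn
```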