Graph Signal Processing – Background
Last modified: 2019-03-14
A gentle but rather lengthy introduction to key concepts (graph Laplacians, eigenvalues/eigenvectors, matrix mathematics, etc.) that are needed to comprehend this domain.
Main Content
 Background
 Introduction to Graph Signal Processing
 Applications of graph Signal Processing
 Relevant Excerpts
 References
This Page
 Matrix / GNU Octave Basics
 Adjacency Matrix
 Degree Matrix
 The Graph Laplacian
 Fourier Transform
 Eigenvalues and Eigenvectors
 Calculating the Determinant
 Identity Matrix
 Calculating the Eigenvalues
 Calculating the Eigenvectors (Eigenspace)
 Sources
Adjacency Matrix
Suppose we have this graph,
[Image source.] By listing the nodes of the graph along the rows and columns of a $\small [row \times column]$ matrix, we can easily represent that graph as an adjacency matrix (shown above) – placing a $\small 1$ in the matrix wherever two nodes connect with one another, $\small 0$ otherwise. Thus, in the image above, node $\small A$ connects with nodes $\small B$, $\small C$, and $\small D$ – so we place $\small 1$’s in those positions in the $\small A$ row (likewise for the $\small A$ column, which simply mirrors the $\small A$ row).
In graph theory and computer science, an adjacency matrix is a square matrix used to represent a finite graph. The elements of the matrix indicate whether pairs of vertices are adjacent or not in the graph.
Matrices are indexed with the $\small (i,j)$ notation, where $\small i$ is the row position, and $\small j$ is the column position:
[Image source.] Similarly, we can represent this graph,
[Image source (slide 7).] with this adjacency matrix,

$ \mathbf{A} = \begin{bmatrix}
0 & 1 & 1 & 0 \\\
1 & 0 & 1 & 1 \\\
1 & 1 & 0 & 0 \\\
0 & 1 & 0 & 0
\end{bmatrix}$
given these definitions

$\small \mathbf{A} :=
\begin{cases}
\mathbf{A}_{ij} = 1 & \text{if there is an edge } e_{ij} \\\
\mathbf{A}_{ij} = 0 & \text{if there is no edge} \\\
\mathbf{A}_{ii} = 0
\end{cases} $
In this example, the first row (column) of the adjacency matrix indicates that node $\small \mathcal{\mathbf{v}}_1$ connects to nodes $\small \mathbf{v}_2$ and $\small \mathbf{v}_3$. … Likewise, the fourth row indicates that node $\small \mathcal{\mathbf{v}}_4$ connects to node $\small \mathbf{v}_2$.
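To make the definition concrete, here is a minimal pure-Python sketch (variable names are my own) that builds the $\small [4 \times 4]$ adjacency matrix above from the edge list $\small \{e_{12}, e_{13}, e_{23}, e_{24}\}$ and checks its symmetry:

```python
# Build the 4-node adjacency matrix A from the edge list of the example graph.
# Node labels v1..v4 are mapped to indices 0..3.
edges = [(0, 1), (0, 2), (1, 2), (1, 3)]  # e12, e13, e23, e24

n = 4
A = [[0] * n for _ in range(n)]
for i, j in edges:
    A[i][j] = 1
    A[j][i] = 1  # undirected: the matrix is symmetric

for row in A:
    print(row)

# The diagonal stays 0 (no self-loops), and A equals its transpose.
assert all(A[i][j] == A[j][i] for i in range(n) for j in range(n))
```

Printing `A` reproduces the matrix shown above, row by row.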
Counting the connections between nodes defines a simple 1-to-1 weight matrix, in both undirected and directed graphs (adjacency matrices):
[Image source.] Here is a slightly more complicated example, a weighted adjacency matrix for a directed graph (here, replace the empty positions in the matrix with zeros):
Weighted matrices also find use (for example) as the weights ($\small \mathbf{W}$) between nodes in different layers in the feedforward stage of artificial neural networks.
Formally:
 $\small \mathcal{G} = (\mathcal{V}, \mathcal{E})$ is an unweighted graph;
 $\small \mathcal{G} = (\mathcal{V}, \mathcal{E}, \mathbf{w})$ is a weighted graph.
An undirected graph $\small \mathcal{G}$ is represented by a pair $\small (\mathcal{V}, \mathcal{E})$ where $\small \mathcal{V}$ is the set of vertices $\small v_1 ,v_2, \ldots, v_n$ and $\small \mathcal{E}$ the set of edges of the graph. A pair $\small \{i,j\} \in \mathcal{E}$ if vertices $\small v_i$ and $\small v_j$ are adjacent. We denote the edge connecting $\small v_i$ and $\small v_j$ by $\small e_{i,j}$.
A weighted undirected graph is represented by $\small (\mathcal{V}, \mathcal{E},w)$ where $\small w : \mathcal{V} \times \mathcal{V} \to \mathbb{R}$ and satisfies

$\small w_{i,j} = w_{j,i}$
and

$\small w_{i,j} \geq 0$.
There are various operators that can be associated with a graph. The most commonly known operator of a graph $\small \mathcal{G}$ is the adjacency matrix. It is a symmetric matrix defined by

$\small \mathbf{A}_{i,j} =
\begin{cases}
1 & \text{if $\small (i,j) \in \mathcal{E}$} \\\
0 & \text{otherwise}.
\end{cases}$
For a weighted graph $\small (\mathcal{V},\mathcal{E},w)$ the definition is similar; the weighted adjacency matrix of the graph is

$\small \mathbf{W}_{i,j} =
\begin{cases}
w_{i,j} & \text{if $\small (i,j) \in \mathcal{E}$} \\\
0 & \text{otherwise}.
\end{cases}$
$\small \mathbf{W}$ reduces to $\small \mathbf{A}$ in the case where we restrict the weights to be $\small \{0,1\}$.
[… from / continued in Ch. 2 in Spectral Graph Theory and Deep Learning on Graphs.]
This is neat (excerpted from the pggraphblas repository on GitHub).
Matrix multiplication has a remarkable property of being useful for finding the neighbors of any node in a graph algorithm. By using different combinations of operations (semiring) different graph algorithms can step and accumulate different results, interpreting the data in unique ways, even over the same graphs.
[Image source, adapted from Graphs, Matrices, and the GraphBLAS: Seven Good Reasons.]
When you multiply the transpose of $\small \mathbf{A}$ and $\small \mathbf{v}$, the result vector contains all of the neighboring nodes for the nodes specified in $\small \mathbf{v}$. In this case, $\small \mathbf{v}$ contains a value for node 4, shown in red. By multiplying the matrix by the input vector, the result is the output vector with node 4’s two neighboring nodes, 1 and 3. Iterating this multiplication process produces the most common graph operation: breadth-first search.
[Refer here for a refresher on matrix $\small [h \times w]$ multiplication.] For the matrix operation above, we have a $\small [7 \times 7] \times [7 \times 1] = [7 \times 1]$ output, calculated as follows:

$\small \begin{equation}
{\begin{bmatrix}
0{\times}0 + 0{\times}0 + 0{\times}0 + \mathbf{\color{Green}1}{\times}\color{Red}1 + 0{\times}0 + 0{\times}0 + 0{\times}0 \\\
1{\times}0 + 0{\times}0 + 0{\times}0 + 0{\times}\color{Red}1 + 0{\times}0 + 0{\times}0 + 0{\times}0 \\\
0{\times}0 + 0{\times}0 + 0{\times}0 + \mathbf{\color{Green}1}{\times}\color{Red}1 + 0{\times}0 + 1{\times}0 + 1{\times}0 \\\
1{\times}0 + 0{\times}0 + 0{\times}0 + 0{\times}\color{Red}1 + 0{\times}0 + 0{\times}0 + 1{\times}0 \\\
0{\times}0 + 1{\times}0 + 0{\times}0 + 0{\times}\color{Red}1 + 0{\times}0 + 0{\times}0 + 1{\times}0 \\\
0{\times}0 + 0{\times}0 + 1{\times}0 + 0{\times}\color{Red}1 + 1{\times}0 + 0{\times}0 + 0{\times}0 \\\
0{\times}0 + 1{\times}0 + 0{\times}0 + 0{\times}\color{Red}1 + 0{\times}0 + 0{\times}0 + 0{\times}0 \end{bmatrix}}
{\begin{matrix}
\\\
\\\
\\\
\mathbf{=} \\\
\\\
\\\
\\\
\end{matrix}}
{\begin{bmatrix}
1 \\\
0 \\\
1 \\\
0 \\\
0 \\\
0 \\\
0
\end{bmatrix}}
\end{equation}$
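That worked multiplication can be verified with a small pure-Python sketch (my own illustration, not from the pggraphblas source); the $\small [7 \times 7]$ matrix below is $\small \mathbf{A}^T$ as recovered from the expansion above, and $\small \mathbf{v}$ marks node 4:

```python
# A^T for the 7-node example; v marks node 4 (1-based), i.e. index 3.
At = [
    [0, 0, 0, 1, 0, 0, 0],
    [1, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 0, 1, 1],
    [1, 0, 0, 0, 0, 0, 1],
    [0, 1, 0, 0, 0, 0, 1],
    [0, 0, 1, 0, 1, 0, 0],
    [0, 1, 0, 0, 0, 0, 0],
]
v = [0, 0, 0, 1, 0, 0, 0]

def matvec(M, x):
    """Plain matrix-vector product: one 'step' from the marked nodes
    to all of their neighbors."""
    return [sum(M[i][k] * x[k] for k in range(len(x))) for i in range(len(M))]

print(matvec(At, v))  # -> [1, 0, 1, 0, 0, 0, 0]: nodes 1 and 3
```

Feeding the output back in as the next input vector is one iteration of the breadth-first search described above.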
Degree Matrix
While the adjacency matrix describes the basic structure of the graph, the degree matrix ($\small \mathcal{D}$) describes some basic information about each of those nodes (their overall connectivity).
In the mathematical field of graph theory the degree matrix is a diagonal matrix which contains information about the degree of each vertex – that is, the number of edges attached to each vertex. It is used together with the adjacency matrix to construct the Laplacian matrix of a graph.
The degree matrix is an example of a diagonal matrix.
Given a graph $\small \mathcal{G = (V,E)}$ with $\small \vert\mathcal{V}\vert = n$, the degree matrix $\small \mathcal{D}$ for $\small \mathcal{G}$ is an $\small [n \times n]$ diagonal matrix defined as

$\small d_{i,j} :=
\begin{cases}
\text{deg}(\mathcal{v}_i) & \text{if } i = j \\\
0 & \text{otherwise}
\end{cases} $
For example,
[Image source.] In that example, node $\small \mathcal{\mathbf{1}}$ has degree four:
 nodes $\small \mathcal{\mathbf{2}}$ and $\small \mathcal{\mathbf{5}}$ connect to it, and
 since this is an undirected graph, node $\small \mathcal{\mathbf{1}}$ also connects to itself – twice: that self-referencing edge (self-loop) connects both ways, contributing 2 to the degree!
So, the node $\small \mathcal{\mathbf{1}}$ position on the diagonal ($\small i = j = 1$) gets the value 4 (and the rest of that row (1) and column (1) gets zero-valued entries).
Likewise, node $\small \mathcal{\mathbf{2}}$ has degree three, so matrix position ($\small i = j = 2$) gets the value 3 (and the rest of that row (2) and column (2) gets zero values). Etc.
Formally:
The weighted degree of a vertex is the total weight of the edges in which it participates. The degree matrix of a graph is defined as

$\small \mathbf{D}_{i,j} =
\begin{cases}
d_i & \text{if $\small i = j$} \\\
0 & \text{otherwise}.
\end{cases}$
Thus, the degree matrix is a diagonal matrix with entries the weighted degree $\small d_i$ of the vertex $\small i$ defined as

$\small \begin{align} d_i = \sum_{j:(i,j) \in \mathcal{E}} w_{i,j} \end{align}$
[… from / continued in Ch. 2 in Spectral Graph Theory and Deep Learning on Graphs.]
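As a quick sketch of that definition (the weights below are made up purely for illustration), the weighted degree matrix can be computed from a symmetric weighted adjacency matrix $\small \mathbf{W}$ by summing each row:

```python
# Hypothetical weighted adjacency matrix W (symmetric, no self-loops).
W = [
    [0.0, 2.0, 1.0],
    [2.0, 0.0, 3.0],
    [1.0, 3.0, 0.0],
]

n = len(W)
D = [[0.0] * n for _ in range(n)]
for i in range(n):
    D[i][i] = sum(W[i])  # weighted degree d_i = sum_j w_ij

print([D[i][i] for i in range(n)])  # -> [3.0, 5.0, 4.0]
```

With $\small \{0,1\}$ weights this reduces to the ordinary (unweighted) degree matrix.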
The Graph Laplacian

"The Laplace operator in its various manifestations is the most beautiful and central object in all of mathematics. Probability theory, mathematical physics, Fourier analysis, partial differential equations, the theory of Lie groups, and differential geometry all revolve around this sun, and its light even penetrates such obscure regions as number theory and algebraic geometry."
[Nelson E (1967) Tensor Analysis, p. 100]
Next, we need to understand the Laplacian of a graph (the graph Laplacian), which is determined by the graph adjacency matrix. The graph Laplacian is later used to derive its associated eigenvectors, which in turn are used to interpret and process the signals (values) associated with the nodes of the graph.
While we can use definitions to derive the graph Laplacian (illustrated below), we can also use linear algebra (matrix mathematics) to calculate graph Laplacians (also illustrated below).
Method 1: Rules-based
While this MIT course note [local copy] formalizes and concisely summarizes these ideas, without a priori knowledge it is hard to follow their derivation of the graph Laplacian $\small \mathcal{L}_H$ for adjacency matrix $\small \mathcal{A}_H$ on their page 2 without the material I have provided here. Once you work through this material, the derivation of graph Laplacians should be demystified!
For the MIT example, they give adjacency matrix $\small \mathcal{A}_H$ for graph $\small \mathcal{H}$,

$\small \mathbf{A}_\mathcal{H} = \begin{bmatrix}
0 & 1 & 0 & 1 & 0 \\\
1 & 0 & 1 & 0 & 0 \\\
0 & 1 & 0 & 1 & 1 \\\
1 & 0 & 1 & 0 & 0 \\\
0 & 0 & 1 & 0 & 0
\end{bmatrix}$
The totals in each row (node) give the degree (number of edges) for those nodes; here: 2, 2, 3, 2, 1.
This is evident in the graph that is specified by that adjacency matrix:
[Plotted online at Graph Online. Click image to open in new window.]
The graph Laplacian in the MIT document,

$\small \mathbf{L}_\mathcal{H} =
\begin{bmatrix}
2 & -1 & 0 & -1 & 0 \\\
-1 & 2 & -1 & 0 & 0 \\\
0 & -1 & 3 & -1 & -1 \\\
-1 & 0 & -1 & 2 & 0 \\\
0 & 0 & -1 & 0 & 1
\end{bmatrix}$
now becomes obvious, when considered along with their Definition 5 (here I substituted $\small \mathcal{G}$ with $\small \mathcal{H}$):

Given an unweighted graph $\small \mathcal{H}$, the Laplacian matrix $\small \mathcal{L} = \mathcal{L}_\mathcal{H}$ is the $\small [n \times n]$ matrix given by

$\small \mathcal{L}_{i,j} =
\begin{cases}
-1 & \text{if } (i,j) \in \mathcal{E} \\\
d_i & \text{if } i = j \\\
0 & \text{otherwise}
\end{cases} $
Based on those definitions, to derive the Laplacian from the adjacency matrix we:

replace all the edges ($\small 1$) in the adjacency matrix with $\small -1$;

on the diagonal ($\small i=j$) – i.e. nodes $\small 1$, $\small 2$, …, $\small 5$ – we replace those values in the adjacency matrix with that node’s value from the degree matrix;

lastly, we replace the remaining unsubstituted values with $\small 0$.
Graph Laplacians have the following properties:
 symmetric ($\small [n \times n]$)
 non-diagonal entries are non-positive
 rows (columns) sum to zero
Although succinctly formalized mathematically, that’s a rather cumbersome way of computing the graph Laplacian. However, using linear algebra (matrix mathematics) we can computationally simplify this process.
Method 2: Linear algebra
An equivalent definition for the Laplacian matrix is

$\small \mathcal{L}_\mathcal{H} = \mathcal{D}_\mathcal{H} - \mathcal{A}_\mathcal{H}$
where $\small \mathcal{D}_\mathcal{H}$ is the diagonal matrix with $\small i^{th}$ diagonal entry equal to the degree of node $\small v_i$.
In our MIT example we already have the real value ($\small \mathbb{R}$) adjacency matrix $\small \mathcal{A}_H$,

$\small \mathbf{A}_\mathcal{H} = \begin{bmatrix}
0 & 1 & 0 & 1 & 0 \\\
1 & 0 & 1 & 0 & 0 \\\
0 & 1 & 0 & 1 & 1 \\\
1 & 0 & 1 & 0 & 0 \\\
0 & 0 & 1 & 0 & 0
\end{bmatrix}$
so we need the diagonal (degree) matrix $\small \mathcal{D}_\mathcal{H}$, which we can determine programmatically from the adjacency matrix (remembering that the row or column totals give those values)

$\small \mathbf{D}_\mathcal{H} = \begin{bmatrix}
2 & 0 & 0 & 0 & 0 \\\
0 & 2 & 0 & 0 & 0 \\\
0 & 0 & 3 & 0 & 0 \\\
0 & 0 & 0 & 2 & 0 \\\
0 & 0 & 0 & 0 & 1
\end{bmatrix}$
Then, by simple element-wise matrix subtraction (remembering of course that matrices must have the same dimensions for addition or subtraction), $\small \mathcal{L}_\mathcal{H} = \mathcal{D}_\mathcal{H} - \mathcal{A}_\mathcal{H}$, we get the graph Laplacian in one simple processing step:

$\begin{equation}
{\begin{bmatrix}
2 & 0 & 0 & 0 & 0 \\\
0 & 2 & 0 & 0 & 0 \\\
0 & 0 & 3 & 0 & 0 \\\
0 & 0 & 0 & 2 & 0 \\\
0 & 0 & 0 & 0 & 1
\end{bmatrix} \atop \mathbf{D}_\mathcal{H}}
-
{\begin{bmatrix}
0 & 1 & 0 & 1 & 0 \\\
1 & 0 & 1 & 0 & 0 \\\
0 & 1 & 0 & 1 & 1 \\\
1 & 0 & 1 & 0 & 0 \\\
0 & 0 & 1 & 0 & 0
\end{bmatrix} \atop \mathbf{A}_\mathcal{H}}
=
{\begin{bmatrix}
2 & -1 & 0 & -1 & 0 \\\
-1 & 2 & -1 & 0 & 0 \\\
0 & -1 & 3 & -1 & -1 \\\
-1 & 0 & -1 & 2 & 0 \\\
0 & 0 & -1 & 0 & 1
\end{bmatrix} \atop \mathbf{L}_\mathcal{H}}
\end{equation}$
Interestingly, the Laplacian succinctly shows the connectivity of the adjacency matrix, plus the degree/numbers of connections, plus the locations of those connections (above; below).
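The one-step computation $\small \mathcal{L} = \mathcal{D} - \mathcal{A}$ is easy to verify programmatically; here is a pure-Python sketch (my own, mirroring the MIT example) that also checks the three Laplacian properties listed earlier:

```python
# Adjacency matrix A_H from the MIT example.
A = [
    [0, 1, 0, 1, 0],
    [1, 0, 1, 0, 0],
    [0, 1, 0, 1, 1],
    [1, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
]
n = len(A)

# Degree matrix: row sums on the diagonal, zeros elsewhere.
D = [[sum(A[i]) if i == j else 0 for j in range(n)] for i in range(n)]

# Graph Laplacian by element-wise subtraction: L = D - A.
L = [[D[i][j] - A[i][j] for j in range(n)] for i in range(n)]
for row in L:
    print(row)

# Properties: symmetric; non-positive off-diagonals; rows sum to zero.
assert all(L[i][j] == L[j][i] for i in range(n) for j in range(n))
assert all(L[i][j] <= 0 for i in range(n) for j in range(n) if i != j)
assert all(sum(row) == 0 for row in L)
```

The printed rows reproduce $\small \mathbf{L}_\mathcal{H}$ exactly.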
[Image source (slide 24).]
Fourier Transform
The Fourier transform is important in mathematics, engineering, and the physical sciences [1]. Its discrete counterpart, the Discrete Fourier Transform (DFT), which is normally computed using the so-called Fast Fourier Transform (FFT), has revolutionized modern society, as it is ubiquitous in digital electronics and signal processing. Radio astronomers are particularly avid users of Fourier transforms because Fourier transforms are key components in data processing (e.g., periodicity searches) and instruments (e.g., antennas, receivers, spectrometers), and they are the cornerstones of interferometry and aperture synthesis.
The Fourier transform is a reversible, linear transform with many important properties [1]. For any function $\small f(x)$ (which in astronomy is usually realvalued, but $\small f(x)$ may be complex), the Fourier transform can be denoted $\small F(s)$, where the product of $\small x$ and $\small s$ is dimensionless. Often $\small x$ is a measure of time $\small t$ (i.e., the timedomain signal) and so $\small s$ corresponds to inverse time, or frequency (i.e., the frequencydomain signal).
The Fourier transform is defined by

$\begin{align} F(s) \equiv \int_{-\infty}^{\infty} f(x)e^{-2\pi isx}dx \end{align}$
which is usually known as the forward transform, and

$\begin{align} f(x) \equiv \int_{-\infty}^{\infty} F(s)e^{2\pi isx}ds \end{align}$
which is the inverse transform. In both cases, $\small i \equiv \sqrt{-1}$. Alternative definitions of the Fourier transform are based on angular frequency ($\small \omega = 2\pi\nu$), have different normalizations, or the opposite sign convention in the complex exponential. Since Fourier transformation is reversible, the logical equivalence symbol $\small \Leftrightarrow$ is often used to mean “is the Fourier transform of” – e.g., $\small f(x) \Leftrightarrow F(s)$.
The complex exponential is the heart of the transform. A complex exponential is simply a complex number where both the real and imaginary parts are sinusoids. The exact relation is called Euler’s formula,

$e^{i\phi} = \cos\phi + i\sin\phi$
which leads to the famous (and beautiful) Euler’s identity $\small e^{i\pi} + 1 = 0$ that relates five of the most important numbers in mathematics. Complex exponentials are much easier to manipulate than trigonometric functions, and they provide a compact notation for dealing with sinusoids of arbitrary phase, which form the basis of the Fourier transform.
Complex exponentials (or sines and cosines) are periodic functions, and the set of complex exponentials is complete and orthogonal [1]. Thus the Fourier transform can represent any piecewise continuous function and minimizes the least-squares error between the function and its representation. There exist other complete and orthogonal sets of periodic functions; for example, Walsh functions (square waves) are useful for digital electronics. Why do we always encounter complex exponentials when solving physical problems? Why are monochromatic waves sinusoidal, and not periodic trains of square waves or triangular waves? The reason is that the derivatives of complex exponentials are just rescaled complex exponentials. In other words, the complex exponentials are the eigenfunctions of the differential operator. Most physical systems obey linear differential equations. Thus an analog electronic filter will convert a sine wave into another sine wave having the same frequency (but not necessarily the same amplitude and phase), while a filtered square wave will not be a square wave. This property of complex exponentials makes the Fourier transform uniquely useful in fields ranging from radio propagation to quantum mechanics.

The Fourier transform decomposes a function of time (a signal) into the frequencies that make it up, in a way similar to how a musical chord can be expressed as the frequencies (or pitches) of its constituent notes.

Here is a succinct description:
[Image source. Click image to open in new window.]
From Wikipedia:
The Fourier transform of the function $\small f$ is traditionally denoted by adding a circumflex: $\small \hat{f}$. There are several common conventions for defining the Fourier transform of an integrable function $\small f : \mathbb {R} \to \mathbb{C}$. Here we will use the following definition for any real number $\small \xi$ (Greek letter “xi ”):

$\small \begin{align} \hat{f}(\xi) = \int_{-\infty}^{\infty} f(x)e^{-2\pi ix\xi} dx \end{align}$.
The reason for the negative sign convention in the definition of $\small \hat{f}(\xi)$ is that the integral produces the amplitude and phase of the function $\small f(x)e^{-2\pi ix\xi}$ at frequency zero (0), which is identical to the amplitude and phase of the function $\small f(x)$ at frequency $\small \xi$, which is what $\small \hat{f}(\xi)$ is supposed to represent.
When the independent variable $\small x$ represents time, the transform variable $\small \xi$ represents frequency (e.g. if time is measured in seconds, then the frequency is in hertz). Under suitable conditions, $\small f$ is determined by $\small \hat{f}$ via the Fourier inverse transform for any real number $\small x$:

$\small \begin{align} f(x) = \int_{-\infty}^{\infty} \hat{f}(\xi)e^{2\pi ix\xi} d\xi \end{align}$.
The statement that $\small f$ can be reconstructed from $\small \hat{f}$ is known as the Fourier inversion theorem, and was first introduced in Fourier’s Analytical Theory of Heat, although what would be considered a proof by modern standards was not given until much later. The functions $\small f$ and $\small \hat{f}$ often are referred to as a Fourier integral pair or Fourier transform pair.
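These ideas carry over to the discrete setting. As a toy sketch of my own (using the same sign conventions as above), the DFT of a pure sinusoid concentrates its energy in a single frequency bin (and its mirror), and the inverse transform recovers the original signal:

```python
import cmath
import math

def dft(f):
    """Discrete forward transform: F[s] = sum_x f[x] e^{-2*pi*i*s*x/N}."""
    N = len(f)
    return [sum(f[x] * cmath.exp(-2j * cmath.pi * s * x / N) for x in range(N))
            for s in range(N)]

def idft(F):
    """Inverse transform: note the opposite sign and the 1/N normalization."""
    N = len(F)
    return [sum(F[s] * cmath.exp(2j * cmath.pi * s * x / N) for s in range(N)) / N
            for x in range(N)]

N = 8
signal = [math.cos(2 * math.pi * 2 * x / N) for x in range(N)]  # frequency bin 2

spectrum = dft(signal)
peak = max(range(N), key=lambda s: abs(spectrum[s]))
print(peak)  # energy concentrates at bin 2 (or its mirror, bin 6)

# Reversibility: idft(dft(f)) recovers f (up to floating-point error).
recovered = idft(spectrum)
assert all(abs(signal[x] - recovered[x]) < 1e-9 for x in range(N))
```

The mirror peak at bin $\small N - 2$ arises because a real cosine is a sum of two complex exponentials of opposite frequency (Euler’s formula, above).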
For other common conventions and notations, including using the angular frequency $\small \omega$ instead of the frequency $\small \xi$ … The Fourier transform on Euclidean space is treated separately, in which the variable $\small x$ often represents position and $\small \xi$ represents the momentum. The conventions chosen in this article are those of harmonic analysis, and are characterized as the unique conventions such that the Fourier transform is both unitary on $\small L^2$ and an algebra homomorphism from $\small L^1$ to $\small L^\infty$, without renormalizing the Lebesgue measure.
Many other characterizations of the Fourier transform exist. …
Sources for this subsection
Eigenvalues and Eigenvectors
Eigenvalues and eigenvectors play a prominent role in the study of ordinary differential equations and in many applications in the physical sciences [1].
In linear algebra, an eigenvector or characteristic vector of a linear transformation is a nonzero vector that changes by only a scalar factor when that linear transformation is applied to it. More formally, if $\small T$ is a linear transformation from a vector space $\small V$ over a field $\small F$ into itself and $\small \mathbf{v}$ is a vector in $\small V$ that is not the zero vector, then $\small \mathbf{v}$ is an eigenvector of $\small T$ if $\small T(\mathbf{v})$ is a scalar multiple of $\small \mathbf{v}$. This condition can be written as the equation

$T \mathbf{v} = \lambda \mathbf{v}$,
where $\small \lambda$ is a scalar in the field $\small F$, known as the eigenvalue, characteristic value, or characteristic root associated with the eigenvector $\small \mathbf{v}$.
Definition: let $\small A$ be an $\small [n \times n]$ matrix. The number $\small \lambda$ is an eigenvalue of $\small A$ if there exists a nonzero vector $\small \mathbf{v}$ such that $\small A\mathbf{v} = \lambda\mathbf{v}$. In this case, vector $\small \mathbf{v}$ is called an eigenvector of $\small A$ corresponding to $\small \lambda$.
If the vector space $\small V$ is finitedimensional, then the linear transformation $\small T$ can be represented as a square matrix $\small A$, and the vector $\small \mathbf{v}$ by a column vector, rendering the above mapping as a matrix multiplication on the left hand side and a scaling of the column vector on the right hand side in the equation.

$A \mathbf{v} = \lambda \mathbf{v}$

$\lambda$: eigenvalue
$\mathbf{v}$: eigenvector
There is a direct correspondence between $\small [n \times n]$ square matrices and linear transformations from an $\small n$dimensional vector space to itself, given any basis of the vector space. For this reason, it is equivalent to define eigenvalues and eigenvectors using either the language of matrices or the language of linear transformations.
Geometrically an eigenvector, corresponding to a real nonzero eigenvalue, points in a direction that is stretched by the transformation and the eigenvalue is the factor by which it is stretched. If the eigenvalue is negative, the direction is reversed.
In the image below, due to a shear mapping the red arrow changes direction but the blue arrow does not. The blue arrow is an eigenvector of this shear mapping because it does not change direction, and since its length is unchanged, its eigenvalue is 1.
[https://en.wikipedia.org/wiki/Eigenvalues_and_eigenvectors]
Almost all vectors change direction, when they are multiplied by $\small A$ [3]. Certain exceptional vectors $\small \mathbf{x}$ are in the same direction as $\small A \mathbf{x}$. Those are the “eigenvectors.” Multiply an eigenvector by $\small A$, and the vector $\small A \mathbf{x}$ is a number times the original $\small \mathbf{x}$.
The basic equation is $\small A \mathbf{x} = \lambda \mathbf{x}$. The number $\small \lambda$ is an eigenvalue of $\small A$. The eigenvalue tells whether the “special vector” $\small \mathbf{x}$ – the eigenvector – is {stretched | shrunk | reversed | left unchanged} when it is multiplied by $\small A$. We may find $\small \lambda = 2$ or $\small \frac{1}{2}$ or $\small -1$ or $\small 1$. The eigenvalue could be zero – in which case, $\small A \mathbf{x} = 0\mathbf{x} = \mathbf{0}$ means that this eigenvector $\small \mathbf{x}$ is in the nullspace.
To calculate the eigenvalues and the eigenvectors, we need to:

be able to calculate the determinant of a matrix;

understand the identity matrix ($\small I$), used in $\small (A - \lambda I)\mathbf{v}$;

calculate the determinant of $\small (A - \lambda I)$, which gives us the eigenvalues ($\small \lambda$);

use those eigenvalues to solve the characteristic polynomial of $\small \mathbf{A}$, which gives us the eigenvectors (“eigenspace”) for each of those eigenvalues;

those eigenvectors also form a “basis” for those eigenvalues.
This excellent MIT tutorial, Singular Value Decomposition (SVD), gives a good overview of the necessary matrix mathematics (discussed at greater length, below).
Calculating the Determinant
The determinant of a matrix is a special number that can be calculated from a square matrix [2]. The determinant tells us things about the matrix that are useful in systems of linear equations, helps us find the inverse of a matrix, is useful in calculus and more.
Like the symbol for an absolute value $\small \vert -1 \vert = 1$, $\small \vert \mathbf{A} \vert$ means “the determinant of matrix $\small \mathbf{A}$.” However, note that in this case the vertical lines do not mean absolute value: the determinant can be negative.
While the methods for calculating the determinant of $\small [2 \times 2]$ and $\small [3 \times 3]$ matrices are relatively straightforward, approaches for addressing larger matrices through minor and cofactor expansion – while not overly complicated – are omitted here in the interest of brevity (refer instead to Sources [15], below).
Examples
$\small [2 \times 2]$ matrix:

$\small \mathbf{A} =
\begin{bmatrix}
a & b \\\
c & d
\end{bmatrix}$
Determinant $\small \vert A \vert = ad - bc$
$\small \mathbf{B} = \begin{bmatrix} 4 & 6 \\\ 3 & 8 \end{bmatrix}$
$\small \vert B \vert = (4 \times 8) - (6 \times 3) = 32 - 18 = 14$
$\small [3 \times 3]$ matrix:

$\small \mathbf{A} =
\begin{bmatrix}
a & b & c \\\
d & e & f \\\
g & h & i
\end{bmatrix}$
$\small \vert A \vert = a \cdot \begin{vmatrix} e & f \\\ h & i \end{vmatrix} - b \cdot \begin{vmatrix} d & f \\\ g & i \end{vmatrix} + c \cdot \begin{vmatrix} d & e \\\ g & h \end{vmatrix}$
[Image source.]
Determinant $\small \vert A \vert = a(ei - fh) - b(di - fg) + c(dh - eg)$
A similar "expansion" holds for larger matrices [2], with the operators in the alternating order "$\small + \ - \ + \ - \ \ldots$" [see the "sign chart" on p. 4 in [4], and Section 9 (p. 6) in [5]] – which is why the "$\small -b(di - fg)$" term, above, is subtracted.
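Here is a pure-Python sketch of that cofactor expansion (my own illustration) that handles a square matrix of any size by recursing along the first row with the alternating signs just described:

```python
def det(M):
    """Determinant by cofactor expansion along the first row,
    with alternating signs + - + - ..."""
    n = len(M)
    if n == 1:
        return M[0][0]
    total = 0
    for j in range(n):
        # Minor: delete row 0 and column j.
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        total += (-1) ** j * M[0][j] * det(minor)
    return total

print(det([[4, 6], [3, 8]]))                     # -> 14
print(det([[1, 2, 3], [4, 5, 6], [7, 8, 10]]))   # -> -3
```

The first call reproduces the $\small [2 \times 2]$ example above; note the expansion is exponential in $\small n$, so real implementations use LU decomposition instead.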
Identity Matrix
In linear algebra, the identity matrix, or sometimes ambiguously called a unit matrix, of size $\small n$ is the $\small [n \times n]$ square matrix with ones on the main diagonal and zeros elsewhere. It is denoted by $\small I_n$, or simply by $\small I$ if the size is immaterial or can be trivially determined by the context.
$\small I_{1}={\begin{bmatrix}1\end{bmatrix}},\ \small I_{2}={\begin{bmatrix}1&0\\0&1\end{bmatrix}},\ \small I_{3}={\begin{bmatrix}1&0&0\\0&1&0\\0&0&1\end{bmatrix}},\ \cdots ,\ \small I_{n}={\begin{bmatrix}1&0&0&\cdots &0\\0&1&0&\cdots &0\\0&0&1&\cdots &0 \\ \vdots &\vdots &\vdots &\ddots &\vdots \\0&0&0&\cdots &1 \end{bmatrix}}$
The product of any matrix and the appropriate identity matrix is always the original matrix, regardless of the order in which the multiplication was performed! Multiplying by the identity matrix $\small I$ doesn’t change anything, just like multiplying a number by 1 doesn’t change anything: $\small A \cdot I = I \cdot A = A$.
Restated, when $\small A$ is $\small [m \times n]$, it is a property of matrix multiplication that $\small I_{m}A = AI_{n} = A$. This is illustrated in the following example.
[Source.] The identity matrix also has the property that, when it is the product of two square matrices, the matrices can be said to be the inverse of one another.
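A quick pure-Python check of the $\small A \cdot I = I \cdot A = A$ property (my own sketch):

```python
def matmul(X, Y):
    """Naive matrix product of X [m x n] and Y [n x p]."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def identity(n):
    """n x n matrix with ones on the main diagonal, zeros elsewhere."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

A = [[2, 4], [1, -1]]
I = identity(2)

assert matmul(A, I) == A and matmul(I, A) == A  # A.I = I.A = A
print(matmul(A, I))  # -> [[2, 4], [1, -1]]
```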
Calculating the Eigenvalues
We can rewrite the condition $\small A \mathbf{v} = \lambda \mathbf{v}$ as $\small (A - \lambda I)\mathbf{v} = 0$, where $\small I$ is the $\small [n \times n]$ identity matrix.
Now, in order for a nonzero vector $\small \mathbf{v}$ to satisfy the equation

$\small (A - \lambda I)\mathbf{v} = 0$,
$\small A - \lambda I$ must not be invertible, as we are looking for a nonzero vector $\small \mathbf{v}$.
Otherwise, if $\small A - \lambda I$ has an inverse,
$\small (A - \lambda I)^{-1} (A - \lambda I)\mathbf{v} = (A - \lambda I)^{-1}(0)$
$\small \mathbf{v} = 0$.
That is, the determinant of $\small A - \lambda I$ must equal 0:

$\small det(A - \lambda I) = 0$
i.e.

$\small \vert \mathbf{A} - \lambda I \vert = 0$
We call

$\small p(\lambda) = det(A - \lambda I)$
the characteristic polynomial of $\small A$. The eigenvalues of $\small A$ are simply the roots of the characteristic polynomial of $\small A$.
Example
Let
$\small \mathbf{A} = \begin{bmatrix} 2 & 4 \\ 1 & -1 \\ \end{bmatrix}$
then
$\small p(\lambda) = det(A - \lambda I) = det \begin{bmatrix} 2 - \lambda & 4 \\ 1 & -1 - \lambda \\ \end{bmatrix}$
For a $\small [2 \times 2]$ matrix, $\small I = \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ \end{bmatrix}$, so for a square matrix $\small \mathbf{A}$, $\small \lambda I = \begin{bmatrix} \lambda & 0 \\ 0 & \lambda \\ \end{bmatrix}$.
Thus $\small A - \lambda I = \begin{bmatrix} a - \lambda & b \\ c & d - \lambda \\ \end{bmatrix}$.
Generally, for any square matrix $\small \mathbf{A}$, to form $\small (\mathbf{A} - \lambda I)$ just subtract $\small \lambda$ from the diagonal entries $\small a_{ij}$ where $\small i = j$.
$\small p(\lambda) = ((2 - \lambda) \times (-1 - \lambda)) - ((4) \times (1))$
$= (\lambda^2 - \lambda - 2) - (4) = \lambda^2 - \lambda - 6$
This happens to factor as $= (\lambda - 3)(\lambda + 2)$, giving $\small \lambda_1 = 3$ and $\small \lambda_2 = -2$, the eigenvalues of $\small \mathbf{A}$.
Generally, however, you would need to solve the quadratic formula,

$\small ax^2 + bx + c = 0$
$\small x = \dfrac{-b \pm \sqrt{b^2 - 4ac}}{2a}$
here

$\small \lambda^2 - \lambda - 6 = 0$
so,
$\small \lambda = \dfrac{-(-1) \pm \sqrt{(-1)^2 - 4(1)(-6)}}{2(1)} = \dfrac{1 \pm \sqrt{1 + 24}}{2}$
$\small \lambda_1 = \frac{1 + 5}{2} = 3$
and
$\small \lambda_2 = \frac{1 - 5}{2} = -2$.
While those eigenvalues happen to be integers, in general eigenvalues may be negative-valued, irrational, or even complex numbers; e.g., the solution to
$\small \mathbf{A} = \begin{bmatrix} 4 & 6 \\ 3 & 8 \end{bmatrix}$
calculated similarly gives eigenvalues $\small \lambda_1 \approx 1.3096$ and $\small \lambda_2 \approx 10.6904$
…
$\small (4 - \lambda)(8 - \lambda) - 18 = 0$
$\small \lambda^2 - 12\lambda + 14 = 0$
…
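The characteristic-polynomial/quadratic-formula procedure for a $\small [2 \times 2]$ matrix can be packaged in a few lines of pure Python (my own sketch; it assumes real eigenvalues, i.e. a non-negative discriminant), and it reproduces both examples above:

```python
import math

def eig2x2(a, b, c, d):
    """Eigenvalues of [[a, b], [c, d]] via the characteristic polynomial
    p(lambda) = lambda^2 - (a + d)*lambda + (ad - bc) and the quadratic formula."""
    tr = a + d          # trace
    det = a * d - b * c  # determinant
    disc = math.sqrt(tr * tr - 4 * det)  # assumes real eigenvalues
    return (tr - disc) / 2, (tr + disc) / 2

print(eig2x2(2, 4, 1, -1))  # -> (-2.0, 3.0)

l1, l2 = eig2x2(4, 6, 3, 8)
print(round(l1, 4), round(l2, 4))  # -> 1.3096 10.6904
```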
Calculating those by hand is a good exercise but error-prone; a good (Linux) alternative is GNU Octave:
 The last example in the textarea, above, is from [1].
 Note that the eigenvalues returned by $\small \text{eig()}$ in GNU Octave are not ordered.
 See also:
Calculating the Eigenvectors (Eigenspace)
Continuing my

$\small \mathbf{A} =
\begin{bmatrix}
4 & 6 \\\
3 & 8
\end{bmatrix}$
example above, which yielded eigenvalues $\small \lambda_1 = 1.3096$ and $\small \lambda_2 = 10.6904$ … we will use those eigenvalues to calculate the eigenvectors/eigenspace, like the examples in [1]

$\small det(\mathbf{A} - \lambda I) = 0$
$\small det \begin{bmatrix} 4 - \lambda & 6 \\\ 3 & 8 - \lambda \\\ \end{bmatrix} = 0$
$\small (4 - \lambda)(8 - \lambda) - (6)(3) = 0$
$\small \lambda^2 - 12\lambda + 14 = 0$
Solve the quadratic $\small \lambda = \dfrac{-b \pm \sqrt{b^2 - 4ac}}{2a}$, giving

$\small \lambda_1 = 1.3096$
$\small \lambda_2 = 10.6904$.
Now, we want to solve $\small (\mathbf{A}  \lambda I)\mathbf{v} = 0$, for each eigenvalue ($\small \lambda_1$ and $\small \lambda_2$).
For $\small \lambda_1 = 1.3096$, let $\small \mathbf{v} = \begin{bmatrix} v_1 \\ v_2 \end{bmatrix}$
Hence, solve $\small \left(\begin{bmatrix} 4 & 6 \\ 3 & 8 \end{bmatrix} - 1.3096 \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \right) \mathbf{v} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$
$\small \begin{bmatrix} 4 - 1.3096 & 6 \\ 3 & 8 - 1.3096 \end{bmatrix} \begin{bmatrix} v_1 \\ v_2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$
$\small \begin{bmatrix} 2.6904 & 6 \\ 3 & 6.6904 \end{bmatrix} \begin{bmatrix} v_1 \\ v_2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$
Note: that is a matrix multiplication step:
$\small [2 \times 2][2 \times 1] = [2 \times 1]$
$\small \begin{bmatrix} a & b \\ c & d \end{bmatrix} \begin{bmatrix} e \\ f \end{bmatrix} = \begin{bmatrix} ae + bf \\ ce + df \end{bmatrix}$
$\small \begin{matrix} (1) \\ (2) \end{matrix} \ \ \begin{bmatrix}2.6904v_1 + 6v_2 \\ 3v_1 + 6.6904v_2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$
Just solve one of those two equations, (1) or (2):
$\small 2.6904v_1 + 6v_2 = 0$
$\small 2.6904v_1 = -6v_2$
Let $\small v_2 = 1$
$\small v_1 = -6 / 2.6904 = -2.230$
Since we set $\small v_2 = 1$, we have our eigenvector for $\small \lambda_1$: $\small \mathbf{v} = \begin{bmatrix} v_1 \\ v_2 \end{bmatrix} = \begin{bmatrix} -2.230 \\ 1 \end{bmatrix}$, which spans the eigenspace for that eigenvalue.
$\small \left\{ \begin{bmatrix} -2.230 \\ 1 \end{bmatrix} \right\}$ provides a basis for the eigenspace for $\small \lambda_1 = 1.3096$.
Repeating that process for $\small \lambda_2 = 10.6904$ gives our eigenvector for $\small \lambda_2$: $\small \mathbf{v} = \begin{bmatrix} v_1 \\ v_2 \end{bmatrix} = \begin{bmatrix} 0.8968 \\ 1 \end{bmatrix}$.
Thus we have $\small \begin{bmatrix} -2.230 & 0.897 \\ 1 & 1 \end{bmatrix}$ as our non-normalized eigenvectors.
Normalized Eigenvectors
To calculate the normalized values, for a nonzero vector $\small (a,b)$ the normalized vector is $\small \frac{1}{\sqrt{a^2 + b^2}} (a,b)$.
So, for our first eigenvector, $\small \begin{bmatrix} -2.230 \\ 1 \end{bmatrix}$, we calculate $\small \frac{1}{\sqrt{(-2.230)^2 + 1^2}} (-2.230, 1) = (0.4092)(-2.230, 1)$
$\small = \begin{bmatrix} -0.912 \\ 0.409 \end{bmatrix}$ as our *normalized* eigenvector for eigenvalue $\small \lambda_1 = 1.3096$.
Likewise for the non-normalized eigenvector $\small \begin{bmatrix} 0.897 \\ 1 \end{bmatrix}$ for eigenvalue $\small \lambda_2 = 10.6904$, our normalized eigenvector is $\small \begin{bmatrix} 0.668 \\ 0.744 \end{bmatrix}$.
Thus, the complete set of normalized eigenvectors for $\small \mathbf{A} = \begin{bmatrix} 4 & 6 \\ 3 & 8 \end{bmatrix}$ and eigenvalues $\small \lambda_1 = 1.3096$ and $\small \lambda_2 = 10.6904$ are $\small \begin{bmatrix} -0.912 & 0.668 \\ 0.409 & 0.744 \end{bmatrix}$.
That agrees with our output from GNU Octave:
octave:>> a = [4,6;3,8]
a =
   4   6
   3   8
octave:>> [V,lambda]=eig(a)
V =
  -0.91247  -0.66765
   0.40915  -0.74448
lambda =
Diagonal Matrix
    1.3096         0
         0   10.6904
octave:>>
Don’t worry about the signs of those eigenvectors in Octave, cf. my calculated values. See:
 Does the sign of eigenvectors matter?, and
 PCA: Eigenvectors of opposite sign and not being able to compute eigenvectors with ‘solve’ in R.
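As a final check on this example, the whole derivation can be verified numerically. A short Python sketch (standard library only; the matrix and the hand-calculated eigenpairs are copied from above):

```python
import math

A = [[4, 6], [3, 8]]
# Hand-calculated eigenpairs from above (non-normalized eigenvectors):
pairs = [(1.3096, [-2.230, 1.0]), (10.6904, [0.8968, 1.0])]

for lam, v in pairs:
    # A*v should reproduce lambda*v (up to the rounding in our hand values)
    Av = [sum(A[i][j] * v[j] for j in range(2)) for i in range(2)]
    assert all(abs(Av[i] - lam * v[i]) < 1e-2 for i in range(2))
    # normalize to unit length
    n = math.sqrt(sum(x * x for x in v))
    print([round(x / n, 3) for x in v])
# prints [-0.912, 0.409] then [0.668, 0.744]
```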
Calculating Eigenvectors: Example 2
This second example is taken from the top of p. 2 (their p. 284; my local copy) of this MIT chapter. Since I cite that document [3] in my Sources, below, I wanted to include it, as they don’t clearly explain how they derived their eigenvectors, which trivially “differ” in scale (likely due to a scalar multiplier, as the proportions are identical) from my calculated values and from those provided by GNU Octave and the online calculator, described below.
This fabulous online calculator also provides the (non-normalized) eigenvectors and – more significantly – provides the stepwise derivations. This includes the use of reduced row echelon form (also available in GNU Octave as the $\small \text{rref()}$ function), which I needed to solve Example 3, below.
So, given

$\small \mathbf{A} =
\begin{bmatrix}
0.8 & 0.3 \\\
0.2 & 0.7
\end{bmatrix}$
for which we solve the characteristic polynomial of $\small A$,

$\small p(\lambda) = det(\mathbf{A} - \lambda I) = 0$.
[As mentioned, the eigenvalues of $\small A$ are simply the roots of the characteristic polynomial of $\small A$.]
$\small p(\lambda) = det \begin{bmatrix} 0.8 - \lambda & 0.3 \\ 0.2 & 0.7 - \lambda \end{bmatrix}$
$\small p(\lambda) = (0.8 - \lambda)(0.7 - \lambda) - (0.3)(0.2) = (\lambda^2 - 1.5\lambda + 0.56) - 0.06 = \lambda^2 - 1.5\lambda + 0.5$
Solving the quadratic, we get

$\small \lambda_1 = 1$
$\small \lambda_2 = 0.5$
which we use to calculate the eigenvectors.
We want to solve $\small (\mathbf{A}  \lambda I)\mathbf{x} = 0$, for eigenvalues $\small \lambda_1$ and $\small \lambda_2$.
For $\small \lambda_1 = 1$, let $\small \mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}$ and solve
$\small \begin{bmatrix} 0.8 - 1 & 0.3 \\ 0.2 & 0.7 - 1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$
$\small \begin{bmatrix} -0.2 & 0.3 \\ 0.2 & -0.3 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$
Remember: that is a matrix multiplication step!
Just solve one of the two equations:

$\small -0.2x_1 + 0.3x_2 = 0$
$\small \ \ \ 0.2x_1 - 0.3x_2 = 0$
Solving the first equation, setting $\small x_2 = 1$ (tip: although it doesn’t matter, try choosing the term with the largest coefficient; if you choose the other term, you’ll get numerically different values, but – more importantly – the ratio of the values will be unchanged):
$\small 0.2x_1 = 0.3$
$\small \ \ \ \ \ \ x_1 = 1.5$
Since we set $\small x_2 = 1$, we have as our eigenvector for $\small \lambda_1$: $\small \mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 1.5 \\ 1 \end{bmatrix}$.
Repeating that process for $\small \lambda_2 = 0.5$ gives our eigenvector for $\small \lambda_2$: $\small \mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} -1 \\ 1 \end{bmatrix}$.
Thus we have $\small \begin{bmatrix} 1.5 & -1 \\ 1 & 1 \end{bmatrix}$ as our non-normalized eigenvectors.
To calculate the normalized values
for a nonzero vector $\small (a,b)$ the normalized vector is $\small \frac{1}{\sqrt{a^2 + b^2}} (a,b)$
for our first eigenvector, $\small \begin{bmatrix} 1.5 \\ 1 \end{bmatrix}$, we calculate $\small \frac{1}{\sqrt{(1.5)^2 + 1^2}} (1.5, 1) = (0.5547)(1.5, 1)$
$\small = \begin{bmatrix} 0.832 \\ 0.555 \end{bmatrix}$ as our *normalized* eigenvector for eigenvalue $\small \lambda_1 = 1$.
Likewise for the non-normalized eigenvector $\small \begin{bmatrix} -1 \\ 1 \end{bmatrix}$ for eigenvalue $\small \lambda_2 = 0.5$, our normalized eigenvector is $\small \begin{bmatrix} -0.707 \\ 0.707 \end{bmatrix}$.
Thus, the complete set of normalized eigenvectors for $\small \mathbf{A} = \begin{bmatrix} 0.8 & 0.3 \\ 0.2 & 0.7 \end{bmatrix}$ and eigenvalues $\small \lambda_1 = 1$ and $\small \lambda_2 = 0.5$ are $\small \begin{bmatrix} 0.832 & -0.707 \\ 0.555 & 0.707 \end{bmatrix}$.
That agrees with our output from GNU Octave:
octave:>> a=[0.8,0.3; 0.2,0.7]
a =
   0.80000   0.30000
   0.20000   0.70000
octave:>> [V,lambda]=eig(a)
V =
   0.83205  -0.70711
   0.55470   0.70711
lambda =
Diagonal Matrix
   1.00000         0
         0   0.50000
octave:>>
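The same numeric sanity check works here; a Python sketch (standard library only), using the eigenpairs derived above:

```python
import math

A = [[0.8, 0.3], [0.2, 0.7]]
eigenpairs = [(1.0, [1.5, 1.0]), (0.5, [-1.0, 1.0])]  # from the derivation above

for lam, v in eigenpairs:
    # A*v = lambda*v, to floating-point tolerance
    Av = [sum(A[i][j] * v[j] for j in range(2)) for i in range(2)]
    assert all(abs(Av[i] - lam * v[i]) < 1e-12 for i in range(2))
    n = math.sqrt(sum(x * x for x in v))
    print([round(x / n, 3) for x in v])  # normalized eigenvector
# prints [0.832, 0.555] then [-0.707, 0.707]
```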
Calculating Eigenvectors: Example 3
This example is more complicated: a $\small [3 \times 3]$ matrix, which coincidentally generates a set of equations that are simple multiples of one another (thus, we essentially have one equation with three unknowns, which we cannot solve directly, but for which we have workarounds). Mentioned earlier, one of those very useful “tricks” (workarounds) is the use of the reduced row echelon form, which is employed in this online calculator (which shows the steps; see also here), is described here, and is available in GNU Octave as the rref function.
This example is taken from [1] (local copy), which doesn’t adequately explain its $\small \lambda_2$ eigenvector solution.
We are working with

$\small \mathbf{A} =
\begin{bmatrix}
5 & 8 & 16 \\\
4 & 1 & 8 \\\
-4 & -4 & -11
\end{bmatrix}$
for which we solve the characteristic polynomial of $\small A$,

$\small p(\lambda) = det(\mathbf{A} - \lambda I) = 0$
to get the eigenvalues of $\small A$ (the roots of the characteristic polynomial of $\small A$).
We could solve those manually, but it is much easier to use GNU Octave, or an online tool such as this or this – all of which give (unsorted)

$\small \lambda_1 = 1$
$\small \lambda_2 = -3$
$\small \lambda_3 = -3$
which we use to calculate the eigenvectors.
Like before, we want to solve $\small (\mathbf{A}  \lambda I)\mathbf{x} = 0$, for eigenvalues $\small \lambda_1, \lambda_2, \lambda_3$.
…
For the first eigenvalue, $\small \lambda_1 = 1$,
$\small \begin{bmatrix} 4 & 8 & 16 \\ 4 & 0 & 8 \\ -4 & -4 & -12 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}$
gives this set of equations, which we can easily solve for the eigenvector components (tip: solve the equation with the fewest terms, and set the variable in the term with the largest coefficient to 1):

$\small 4x_1 + 8x_2 + 16x_3 = 0$
$\small 4x_1 + 8x_3 = 0$ ← let $\small x_3 = 1$
$\small -4x_1 - 4x_2 - 12x_3 = 0$
… then solve for $\small x_1, x_2$, giving the eigenvector for this eigenvalue,
$\small \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} -2 \\ -1 \\ 1 \end{bmatrix}$.
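A quick Python check (standard library only; my own sketch, not from the cited sources) that this really is an eigenvector:

```python
A = [[5, 8, 16],
     [4, 1, 8],
     [-4, -4, -11]]
v = [-2, -1, 1]  # hand-derived eigenvector for lambda = 1

# matrix-vector product A*v; for an eigenpair it equals lambda*v = 1*v
Av = [sum(A[i][j] * v[j] for j in range(3)) for i in range(3)]
print(Av)  # [-2, -1, 1]
```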
We run into issues with the other eigenvalues ($\small \lambda_2 = -3$, $\small \lambda_3 = -3$), however: duplicated equations with too many unknowns. E.g., for $\small \lambda_2 = -3$, we get

$\small 8x_1 + 8x_2 + 16x_3 = 0$
$\small 4x_1 + 4x_2 + 8x_3 = 0$
$\small -4x_1 - 4x_2 - 8x_3 = 0$
You can try solving these manually, using the source material [local copy] as a guide and setting one term to 1 and one of the other two terms to 0, but it’s quite confusing.
However, that source and especially this online calculator offer a solution – the “reduced row echelon form.” Calculating it by hand is tedious, but that online calculator (as well as GNU Octave) provides those solutions!
[Image source. Click image to open in new window.]As can be seen from that image (above), rather than determining the eigenvectors for eigenvalues $\small \lambda_2 = 3$ and $\small \lambda_3 = 3$ from
$\small \begin{bmatrix} 8 & 8 & 16 \\ 4 & 4 & 8 \\ -4 & -4 & -8 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}$
we can much more tractably work with the reduced row echelon version:
$\small \begin{bmatrix} 1 & 1 & 2 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}$
which rather astoundingly preserves the information from the source matrix!
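For the curious, reduced row echelon form is just systematic Gauss-Jordan elimination. This bare-bones Python sketch (my own, not the calculator’s implementation; no pivoting refinements, and it assumes well-scaled entries) reproduces the matrix above:

```python
def rref(M, tol=1e-10):
    """Reduce a matrix to reduced row echelon form via Gauss-Jordan elimination."""
    M = [row[:] for row in M]  # work on a copy
    rows, cols = len(M), len(M[0])
    r = 0
    for c in range(cols):
        # find a pivot row for this column
        pivot = next((i for i in range(r, rows) if abs(M[i][c]) > tol), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]
        M[r] = [x / M[r][c] for x in M[r]]       # scale pivot entry to 1
        for i in range(rows):
            if i != r and abs(M[i][c]) > tol:    # clear the column elsewhere
                M[i] = [a - M[i][c] * b for a, b in zip(M[i], M[r])]
        r += 1
    return M

print(rref([[8, 8, 16], [4, 4, 8], [-4, -4, -8]]))
# [[1.0, 1.0, 2.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
```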
Like the solution shown here, that online calculator (screenshot above) uses the setting of variables to “t” and “s” to solve the eigenspace:
[Image source. Click image to open in new window.]$\small v_1 + v_2 + 2v_3 = 0$; let $\small v_2 = t$ and $\small v_3 = s$
$\small v_1 + t + 2s = 0$
$\small v_1 = -2s - t$
Therefore,
$\small \begin{bmatrix} v_1 \\ v_2 \\ v_3 \end{bmatrix} = \begin{bmatrix} -2s - t \\ t \\ s \end{bmatrix} = \begin{bmatrix} -2 \\ 0 \\ 1 \end{bmatrix} \cdot s + \begin{bmatrix} -1 \\ 1 \\ 0 \end{bmatrix} \cdot t$
I regard that factorization as being similar to partial derivatives, where constants “disappear” (see my note, here):

$\small \frac{\partial}{\partial x}2x^3 = 6x^2$
$\small \frac{\partial}{\partial x}(2x^3 + 3c) = 6x^2 + 0 = 6x^2$
$\small \frac{\partial}{\partial x}3c = 0$
$\small \frac{\partial}{\partial x}c = 0$
$\small \frac{\partial}{\partial x}(2x^3 + 5z^4) = 6x^2$
$\small \frac{\partial}{\partial z}(2x^3 + 5z^4) = 20z^3$
etc.
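Reading the two basis vectors off that factorization, a few lines of Python (standard library only; my own check) confirm that each really is an eigenvector for $\small \lambda = -3$:

```python
A = [[5, 8, 16],
     [4, 1, 8],
     [-4, -4, -11]]

# basis vectors of the lambda = -3 eigenspace, from the s/t factorization above
for v in ([-2, 0, 1], [-1, 1, 0]):
    Av = [sum(A[i][j] * v[j] for j in range(3)) for i in range(3)]
    print(Av, [-3 * x for x in v])  # the two lists should match
# prints [6, 0, -3] [6, 0, -3]
#        [3, -3, 0] [3, -3, 0]
```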
octave:>> e
e =
    5    8   16
    4    1    8
   -4   -4  -11
octave:>> ## CALCULATE (e - lambda*I) for lambda = -3:
octave:>> eye(3)
ans =
Diagonal Matrix
   1   0   0
   0   1   0
   0   0   1
octave:>> g = e - (-3*eye(3))
g =
    8    8   16
    4    4    8
   -4   -4   -8
octave:>> ## CALCULATE REDUCED ROW ECHELON FORM:
octave:>> h = rref(g)
h =
   1   1   2
   0   0   0
   0   0   0
octave:>>
So … the set of non-normalized eigenvectors for

$\small \mathbf{A} =
\begin{bmatrix}
5 & 8 & 16 \\\
4 & 1 & 8 \\\
4 & 4 & 11
\end{bmatrix}$
with eigenvalues $\small (1, -3, -3)$ is $\small \begin{bmatrix} -2 & -2 & -1 \\\ -1 & 0 & 1 \\\ 1 & 1 & 0 \end{bmatrix}$.
Normalization:

for $\small \lambda_1$, multiply the associated eigenvector values by $\small \frac{1}{\sqrt{(-2)^2 + (-1)^2 + 1^2}} = \frac{1}{\sqrt{6}} = 0.40825$

for $\small \lambda_2$, multiply the associated eigenvector values by $\small \frac{1}{\sqrt{(-2)^2 + 1^2}} = \frac{1}{\sqrt{5}} = 0.44721$

for $\small \lambda_3$, multiply the associated eigenvector values by $\small \frac{1}{\sqrt{(-1)^2 + 1^2}} = \frac{1}{\sqrt{2}} = 0.70711$
Thus, the normalized eigenvectors are
$\small \begin{bmatrix} -0.8165 & -0.8944 & -0.7071 \\ -0.4082 & 0 & 0.7071 \\ 0.4082 & 0.4472 & 0 \end{bmatrix}$
Compare that to the GNU Octave output (remembering that the output is unordered, and the signs of the vectors may be reversed):
octave:>> e
e =
    5    8   16
    4    1    8
   -4   -4  -11
octave:>> [V,lambda]=eig(e)
V =
   0.81650 + 0.00000i   0.14037 - 0.55581i   0.14037 + 0.55581i
   0.40825 + 0.00000i  -0.71521 + 0.00000i  -0.71521 - 0.00000i
  -0.40825 + 0.00000i   0.28742 + 0.27790i   0.28742 - 0.27790i
lambda =
Diagonal Matrix
   1.00000 + 0.00000i                     0                     0
                     0  -3.00000 + 0.00000i                     0
                     0                     0  -3.00000 - 0.00000i
octave:>>
I’m not sure why Octave stumbled there (returning complex-valued eigenvectors), as the online calculators and my hand calculations all agree; e.g. (non-normalized, below):
[Image source. Click image to open in new window.]EDIT (2018-09-28):
This is OK! See:
 Eigenvectors in Matlab/Octave, function “eig()”, why are the eigenvectors output like that?
 Eigenvalues in octave with eig()
 Octave eig(L,’vector’) not functioning as Matlab
 [Octave docs] Linear Algebra on Sparse Matrices

 [Octave docs] eig()
 Why does MATLAB/Octave wipe the floor with C++ in Eigenvalue Problems?
Calculating Eigenvectors: Example 4
Ok, this is pretty cool (taken from Octave eig(L,’vector’) not functioning as Matlab)!
To make it easier to view those results, I saved them in this text file ($\small \text{Ctrl +/-}$ to resize that font).
This is the Octave $\small \text{spy(W)}$ plot:
[Click image to open in new window.]Calculating Eigenvectors: Example 5
One last example! I created this graph,
[Graph created online at GraphOnline.ru. Click image to open in new window.]from this adjacency matrix,
$\small \begin{bmatrix} 0 & 1 & 1 & 0 & 0 & 0 & 1 \\ 1 & 0 & 1 & 0 & 1 & 1 & 1 \\ 1 & 1 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 & 0 \\ 1 & 1 & 0 & 0 & 0 & 0 & 0 \end{bmatrix}$
Solving that eigenspace online (dCode.fr) – for comparison to GNU Octave – gives eigenvalues

$\small {\\ \lambda_1 = 2.87167 \\ \lambda_2 = -2.07964 \\ \lambda_3 = -1.46488 \\ \lambda_4 = 0.843532 \\ \lambda_5 = -0.612888 \\ \lambda_6 = 0.442204 \\ \lambda_7 = 0}$
and eigenvectors

$\small {\mathbf{v_1} = [1.29393, 1.57774, 1.138, 0.396284, 0.549416, 0.549416, 1.] \\
\mathbf{v_2} = [-0.204806, -1.87484, -1.30076, 0.625473, 0.901519, 0.901519, 1.] \\
\mathbf{v_3} = [-3.02839, 1.56351, 1.8727, -1.2784, -1.06733, -1.06733, 1.] \\
\mathbf{v_4} = [-0.338065, 1.1816, -2.46677, -2.92433, 1.40077, 1.40077, 1.] \\
\mathbf{v_5} = [-0.554125, -0.0587637, -0.60162, 0.981614, 0.09588, 0.09588, 1.] \\
\mathbf{v_6} = [0.831455, -0.389251, -0.243076, -0.549693, -0.880254, -0.880254, 1.] \\
\mathbf{v_7} = [0., 0., 0., 0., 1., -1., 0.]}$
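Both the zero eigenvalue and the dominant eigenvalue are easy to verify in plain Python (standard library only; `matvec` and the power-iteration loop are my own quick sketch, using the adjacency matrix above):

```python
import math

W = [[0, 1, 1, 0, 0, 0, 1],
     [1, 0, 1, 0, 1, 1, 1],
     [1, 1, 0, 1, 0, 0, 0],
     [0, 0, 1, 0, 0, 0, 0],
     [0, 1, 0, 0, 0, 0, 0],
     [0, 1, 0, 0, 0, 0, 0],
     [1, 1, 0, 0, 0, 0, 0]]

def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

# lambda_7 = 0: nodes 5 and 6 have identical neighbourhoods (both connect only
# to node 2), so their difference vector is annihilated by W
print(matvec(W, [0, 0, 0, 0, 1, -1, 0]))  # [0, 0, 0, 0, 0, 0, 0]

# power iteration converges to the dominant eigenpair
v = [1.0] * 7
for _ in range(200):
    w = matvec(W, v)
    n = math.sqrt(sum(x * x for x in w))
    v = [x / n for x in w]
# Rayleigh quotient v'Wv estimates lambda_1 (~2.87167)
print(round(sum(matvec(W, v)[i] * v[i] for i in range(7)), 5))
```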
In GNU Octave:
I looked at those GNU Octave data in LibreOffice Calc, best viewed in this text file (reduce the font size if needed: Ctrl -).
While different from the online (or hand-calculated result: not shown) values, it looks fine. The Octave eigenvectors are normalized: the sums of the squares of the row entries (ditto the column entries) each total 1.0.
Sources for “Graph Signal Processing  Background”
[1] Eigenvalues and Eigenvectors [local copy]
[2] Determinant of a Matrix [local copy]
[3] Eigenvalues and Eigenvectors [local copy]
[4] The Determinant of a Square Matrix [local copy]
[5] Matrix Determinants [local copy]
Additional Reading
 Graph Signal Processing – my “main page” on the subject.

Graph Signal Processing – my more recent literature reviews.

Mathematics for Machine Learning, notably:
 Chapter 2: Linear Algebra
 Chapter 3: Analytic Geometry
 Chapter 4: Matrix Decompositions. Includes:
 Ch. 4.1: Determinant and Trace
 Ch. 4.2: Eigenvalues and Eigenvectors
 Ch. 4.3: Cholesky Decomposition
 …
 Chapter 5: Vector Calculus

Of interest – linear algebra, eigenvectors, PageRank and Google:
 Bryan, K., & Leise, T. (2006). The $25,000,000,000 Eigenvector: The Linear Algebra Behind Google. SIAM Review, 48(3), 569-581. [local copy]
 Gallardo, P. F. (2007). Google’s Secret and Linear Algebra. EMS Newsletter, 63, 10-15. [local copy]; [discussion]