Graph Signal Processing
Last modified: 20181204
Contents
 Background
 Introduction to Graph Signal Processing
 Applications of Graph Signal Processing
 Relevant Excerpts
 References / Additional Reading
Background  Introduction  Applications  Excerpts  References
Background
A gentle but rather lengthy introduction to key concepts (graph Laplacians, eigenvalues/eigenvectors, matrix mathematics, etc.) that are needed to comprehend this domain.
Background  Introduction  Applications  Excerpts  References
Introduction to Graph Signal Processing

The basic intuition regarding graph signal processing is as follows:

a graph contains nodes (vertices) and edges (relationships) connecting the nodes;

the nodes have values (data): a signal;

the relationships (edges) between the nodes indicate how those signals are related;

by studying those signals, we can better understand and leverage the data within the graph.

[Introduction]
Basics
Suppose we have this graph,
[Image source. Click image to open in new window.]By listing the nodes in the graph in a $\small [row \times column]$ matrix, we can easily represent that graph as an adjacency matrix, $\small \mathbf{A}$.
Formally,
 $\small \mathcal{G} = (\mathcal{V}, \mathcal{E})$ is an unweighted graph;
 $\small \mathcal{G} = (\mathcal{V}, \mathcal{E}, \mathbf{w})$ is a weighted graph.
An undirected graph $\small \mathcal{G}$ is represented by a pair $\small (\mathcal{V}, \mathcal{E})$ where $\small \mathcal{V}$ is the set of vertices $\small v_1 ,v_2, \ldots, v_n$ and $\small \mathcal{E}$ the set of edges of the graph. A pair $\small \{i,j\} \in \mathcal{E}$ if vertices $\small v_i$ and $\small v_j$ are adjacent. We denote the edge connecting $\small v_i$ and $\small v_j$ by $\small e_{i,j}$.
A weighted undirected graph is represented by $\small (\mathcal{V}, \mathcal{E},w)$ where $\small w : \mathcal{V} \times \mathcal{V} \to \mathbb{R}$ and satisfies

$\small w_{i,j} = w_{j,i}$
and

$\small w_{i,j} \geq 0$.
There are various operators that can be associated with a graph. The most commonly known operator of a graph $\small \mathcal{G}$ is the adjacency matrix. It is a symmetric matrix defined by

$\small \mathbf{A}_{i,j} =
\begin{cases}
1 & \text{if $\small (i,j) \in \mathcal{E}$} \\\
0 & \text{otherwise}.
\end{cases}$
For a weighted graph ($\small \mathcal{V},\mathcal{E},w$) the definition is similar, the weighted adjacency matrix of the graph

$\small \mathbf{W}_{i,j} =
\begin{cases}
w_{i,j} & \text{if $\small (i,j) \in \mathcal{E}$} \\\
0 & \text{otherwise}.
\end{cases}$
$\small \mathbf{W}$ reduces to $\small \mathbf{A}$ in the case where we restrict the weights to be $\small \{0,1\}$.
While the adjacency matrix describes the basic structure of the graph, the degree matrix ($\small \mathcal{D}$) describes some basic information about each of those nodes (their overall connectivity). Given a graph $\small \mathcal{G = (V,E)}$ with $\small \vert\mathcal{V}\vert = n$, the degree matrix $\small \mathcal{D}$ for $\small \mathcal{G}$ is a $\small [n \times n]$ diagonal matrix defined as

$\small d_{i,j} :=
\begin{cases}
\text{deg}(\mathcal{v}_i) & \text{if } i = j \\\
0 & \text{otherwise}
\end{cases} $
For example,
[Image source. Click image to open in new window.]In that example, node $\small \mathcal{\mathbf{1}}$ has four degrees: nodes $\small \mathcal{\mathbf{2}}$ and $\small \mathcal{\mathbf{5}}$ connect to it, and since this is an undirected graph, node $\small \mathcal{\mathbf{1}}$ also connects to itself, twice (the selfreferencing edge connects both ways).
Formally,
The weighted degree of a vertex is the total weight of the edges in which it participates. The degree matrix of a graph is defined as

$\small \mathbf{D}_{i,j} =
\begin{cases}
d_i & \text{if $\small i = j$} \\\
0 & \text{otherwise}.
\end{cases}$
where the weighted degree $\small d_i$ of the vertex $\small i$ is

$\small \begin{align} d_i = \sum_{j:(i,j) \in \mathcal{E}} w_{i,j} \end{align}$.
Given the adjacency matrix and the degree matrix, we can easily calculate the graph Laplacian, $\small \mathcal{D}$, which can be used to derive its associated eigenvalues and eigenvectors:

$\small \mathcal{L}_\mathcal{H} = \mathcal{D}_\mathcal{H}  \mathcal{A}_\mathcal{H}$
where $\small \mathcal{D}_\mathcal{H}$ is the diagonal matrix with $\small i^{th}$ diagonal entry equal to the degree of node $\small v_i$.
For example, given graph $\small \mathcal{H}$
[Plotted online at Graph Online. Click image to open in new window.]
and its real value ($\small \mathbb{R}$) adjacency matrix $\small \mathcal{A}_H$,

$\small \mathbf{A}_\mathcal{H} = \begin{bmatrix}
0 & 1 & 0 & 1 & 0 \\\
1 & 0 & 1 & 0 & 0 \\\
0 & 1 & 0 & 1 & 1 \\\
1 & 0 & 1 & 0 & 0 \\\
0 & 0 & 1 & 0 & 0
\end{bmatrix}$
we can determine the diagonal (degree) matrix $\small \mathcal{D}_\mathcal{H}$ from the adjacency matrix (remembering that the sums of each row or column gives those values)

$\small \mathbf{D}_\mathcal{H} = \begin{bmatrix}
2 & 0 & 0 & 0 & 0 \\\
0 & 2 & 0 & 0 & 0 \\\
0 & 0 & 3 & 0 & 0 \\\
0 & 0 & 0 & 2 & 0 \\\
0 & 0 & 0 & 0 & 1
\end{bmatrix}$
Then, by simple elementwise matrix subtraction (remembering of course that matrices must have similar dimensions for addition or subtraction), $\small \mathcal{L}_\mathcal{H} = \mathcal{D}_\mathcal{H}  \mathcal{A}_\mathcal{H}$, we get the graph Laplacian in one simple processing step:

$\begin{equation}
{\begin{bmatrix}
2 & 0 & 0 & 0 & 0 \\\
0 & 2 & 0 & 0 & 0 \\\
0 & 0 & 3 & 0 & 0 \\\
0 & 0 & 0 & 2 & 0 \\\
0 & 0 & 0 & 0 & 1
\end{bmatrix} \atop \mathbf{D}_\mathcal{H}}
{\begin{matrix}
\\\
\\\
\\\
\\\
\mathbf{} \\\
\\\
\\\
\end{matrix} \atop \ }
{\begin{bmatrix}
0 & 1 & 0 & 1 & 0 \\\
1 & 0 & 1 & 0 & 0 \\\
0 & 1 & 0 & 1 & 1 \\\
1 & 0 & 1 & 0 & 0 \\\
0 & 0 & 1 & 0 & 0
\end{bmatrix} \atop \mathbf{A}_\mathcal{H}}
{\begin{matrix}
\\\
\\\
\\\
\\\
\mathbf{=} \\\
\\\
\\\
\end{matrix} \atop \ }
{\begin{bmatrix}
\phantom{\ }2 & 1 & \phantom{}0 & 1 & \phantom{}0 \\\
1 & \phantom{}2 & 1 & \phantom{}0 & \phantom{}0 \\\
\phantom{}0 & 1 & \phantom{}3 & 1 & 1 \\\
1 & \phantom{}0 & 1 & \phantom{}2 & \phantom{}0 \\\
\phantom{}0 & \phantom{}0 & 1 & \phantom{}0 & \phantom{}1
\end{bmatrix} \atop \mathbf{L}_\mathcal{H}}
\end{equation}$
Interestingly, the Laplacian succinctly shows the connectivity of the adjacency matrix, plus the degree/numbers of connections, plus the locations of those connections (above; below).
[Image source (slide 24). Click image to open in new window.][Introduction]
Graph Signals
Networks and graphs are used to represent pairwise relationships between elements of a set, and are objects of intrinsic interest [5]. In applications such as social, energy, transportation, sensor, and neuronal networks, highdimensional data naturally reside on the vertices of weighted graphs [1]. When data vectors are associated with graph vertices, a socalled graph signal is obtained.
A signal or function $\small f : \mathcal{V} \to \mathbf{R}$ defined on the vertices of the graph may be represented as a vector $\small f \in \mathbf{R}^N$, where the $\small i^{th}$ component of the vector $\small f$ represents the function value at the $\small i^{th}$ vertex in $\small \mathcal{V}$ [1].
The following figure provides an example of a graph signal.
[Image source. Click image to open in new window.]In order to analyze data residing on the edges of an unweighted graph, one option is to build its line graph ($\small \cdots — \circ — \circ — \circ \cdots$), where we associate a vertex to each edge and connect two vertices in the line graph if their corresponding edges in the original graph share a common vertex, and then analyze the data on the vertices of the line graph.
If we consider a quantitative dataset for which we are given information about the relationship between its elements, we can represent it as a numericalvalued signal indexed by a graph [8]. For example, for a set of sensor measurements, the relation between measurements from different sensors can be expressed through the physical distance between sensors. For a collection of researchers and their publication records, the relation can be given by their collaborations and publication coauthorship. Assuming that the dataset is finite, we can treat it as a set of vectors

$\small \mathcal{S} = \{ \mathbf{s} : \mathbf{s} = (s_0, \ldots, s_{N1})^T , s_n \in \mathbb{C}\}$.
Then, we can represent the relation between coefficients $\small s_n$ of $\small \mathbf{s}$ with a graph $\small \mathcal{G} = (\mathcal{V},\mathcal{\mathbf{A}})$, so that $\small \mathcal{V} = \{v_0, \ldots, v_{N1}\}$ is a set of $\small N$ nodes, and $\small \mathcal{\mathbf{A}}$ is a $\small [N \times N]$ weighted adjacency matrix. Each coefficient $\small s_n$ corresponds to (is indexed by) node $\small v_n$, and the weight $\small \mathcal{\mathbf{A}}_{n,m}$ of the directed edge from $\small v_m$ to $\small v_n$ expresses the degree of relation of $\small s_n$ to $\small s_m$ . Note that edge weights $\small A_{n,m}$ can take arbitrary real or complex values (for example, if data elements are negatively correlated). We call a signal $\small \mathbf{s}$ indexed by a graph $\small \mathcal{G}$ a graph signal.
Graph signals, in general, can be complexvalued [8]. Furthermore, they can be added together and scaled by constant coefficients. Hence, they form a vector space. If no additional assumptions are made on their values, the set $\small \mathcal{S}$ of graph signals corresponds to the $\small N$dimensional complex vector space $\small \mathcal{S} = \mathbb{C}^N$.
It is possible to add attributes to nodes on a graph, modeling those attribute as signals; for example, the year of graduation in a social network, or the temperature in a given city on a given day in a weather network, etc [9]. Doing so requires us to extend classical signal processing concepts and tools such as Fourier transform, filtering and frequency response to data residing on graphs. It also leads us to tackle complex tasks such as sampling in a principled way. The field that gathers all these questions under a common umbrella is called graph signal processing (GSP). Basically, a graph signal is a set of values residing on a set of nodes.
[Introduction]
Graph Signal Processing
In graph signal processing (GSP), the object of study is not the network itself but a signal supported on the vertices of the graph [5]. The graph is intended to represent a notion of proximity between the components of the signal, provides a structure that can be exploited in its processing.
The emerging field of signal processing on graphs merges algebraic and spectral graph concepts with computational harmonic analysis to process such signals on graphs [1]. The weight associated with each edge in the graph often represents the similarity between the two vertices it connects. The connectivities and edge weights are either dictated by the physics of the problem at hand or inferred from the data. For instance, the edge weight may be inversely proportional to the physical distance between nodes in the network. The data on these graphs can be visualized as a finite collection of samples, with one sample at each vertex in the graph. Collectively, we refer to these samples as a graph signal.
The use of a graph to process a signal is formalised by the definition of the graph shift operator (GSO), which is a matrix representation of the graph, and the graph Fourier transform (GFT), which projects signals in the eigenvector space of the GSO [5]. The GSO and GFT have been proven useful in the processing of brain activity signals using brain networks, the study of epidemic spreading over social networks, and analysis of Markov random fields.
[Introduction]
The Graph Laplacian
The nonnormalized graph Laplacian, also called the combinatorial graph Laplacian, is defined as $\small \mathcal{L} := \mathbf{D}  \mathbf{W}$, where the degree matrix $\small \mathbf{D}$ is a diagonal matrix whose $\small i^{th}$ diagonal element $\small d_i$ is equal to the sum of the weights of all the edges incident to vertex $\small i$ [1]. The graph Laplacian is a difference operator: for any signal $\small \mathbf{f} \in \mathbb{R}^N$, it satisfies

$\small \begin{align} (\mathcal{L}f)(i) = \sum_{j \in \mathcal{N}_i } W_{i,j} [ f(i)  f(j) ] \end{align}$,
where the neighborhood $\small \mathcal{N}_i$ is the set of vertices connected to vertex $\small i$ by an edge. More generally, we denote by $\small N(i,k)$ the set of vertices connected to vertex $\small i$ by a path of $\small k$ or fewer edges.
Because the graph Laplacian $\small \mathcal{L}$ is a real symmetric matrix, it has a complete set of orthonormal eigenvectors, which we denote by $\small \{\mathbf{u}_l\}_{l=0,1,\ldots,N1}$ [1]. These eigenvectors have associated real, nonnegative eigenvalues $\small \{\lambda_l\}_{l=0,1,\ldots,N1}$ satisfying $\small \mathcal{L} \mathbf{u}_l = \lambda_l \mathbf{u}_l$, for $\small \lambda = 0, 1, \ldots, N1$.
Zero appears as an eigenvalue with multiplicity equal to the number of connected components of the graph, and thus, since we consider connected graphs, we assume the graph Laplacian eigenvalues are ordered as

$\small 0 = \lambda_0 < \lambda_1 \leq \lambda_2 \ldots \leq \lambda_{N1} := \lambda_{max}$.
We denote the entire spectrum by

$\small \sigma (\mathcal{L}) := \{\lambda_0, \lambda_1, \ldots, \lambda_{N1}\}$.
For a realsymmetric matrix $\small \mathbf{A}$, the eigenspace $\small S_i$ contains the eigenvectors associated with eigenvalues $\small \lambda_i$:

$\small S_i = \{ \mathbf{x} \in \mathbb{R}^n \vert \mathbf{Ax} = \lambda_i \mathbf{x} \}$.
The eigenvectors of the adjacency matrix $\small \mathbf{Ax} = \lambda \mathbf{x}$ can be viewed as eigenfunctions:
[Image source (slide 9). Click image to open in new window.][Introduction]
Graph Fourier Transform and Notion of Frequency
The classical Fourier transform

$\begin{align} \hat{f}(\xi) := \langle f, e^{2 \pi i \xi t} \rangle = \int\limits_{\mathbb{R}} f(t) e^{2 \pi i \xi t} dt \end{align}$
is the expansion of a function $\small f$ in terms of the complex exponentials, which are the eigenfunctions of the onedimensional Laplace operator [1]:

$\Delta(e^{2 \pi i \xi t}) = \frac{\partial^2}{\partial{t^2}} e^{2 \pi i \xi t} = (2 \pi \xi)^2 e^{2 \pi i \xi t}$.
Analogously, we can define the graph Fourier transform of any function $\small f \in \mathbb{R}^N$ on the vertices of $\small \mathcal{G}$ as the expansion of $\small f$ in terms of the eigenvectors of the graph Laplacian:

$\small \begin{align} \hat{f}(\lambda_l) := \langle \mathbf{f}, \mathbf{u}_l \rangle = \sum_{i=1}^{N} f(i) u^*_l (i) \end{align}$.
The inverse graph Fourier transform is then given by

$\small \begin{align} f(i) = \sum_{l=0}^{N1} \hat{f}(\lambda_l)u_l(i) \end{align}$.
In classical Fourier analysis, the eigenvalues $\small \{(2 \pi \xi)^2\}_{\xi \in \mathbb{R}}$ in the onedimensional Laplace operator (equation, further above) carry a specific notion of frequency: for $\small \xi$ close to zero (low frequencies), the associated complex exponential eigenfunctions are smooth, slowly oscillating functions, whereas for $\small \xi$ far from zero (high frequencies), the associated complex exponential eigenfunctions oscillate much more rapidly.
In the graph setting, the graph Laplacian eigenvalues and eigenvectors provide a similar notion of frequency. For connected graphs, the Laplacian eigenvector $\small \mu_0$ associated with the eigenvalue $\small 0$ is constant and equal to $\small \frac{1}{\sqrt{N}}$ at each vertex. The graph Laplacian eigenvectors associated with low frequencies $\small \lambda_l$ vary slowly across the graph; i.e., if two vertices are connected by an edge with a large weight, the values of the eigenvector at those locations are likely to be similar. The eigenvectors associated with larger eigenvalues oscillate more rapidly and are more likely to have dissimilar values on vertices connected by an edge with high weight. This is demonstrated in both Figure 2, which shows different graph Laplacian eigenvectors for a random sensor network graph, and Figure 3, which shows the number $\small \vert \mathcal{Z}_{\mathcal{G}} (\cdot) \vert $ of zero crossings of each graph Laplacian eigenvector.
[Image source (slide 33). "Hammond11" is Wavelets on Graphs via Spectral Graph Theory. Click image to open in new window.]The set of zero crossings of a signal $\small \mathbf{f}$ on a graph $\small \mathcal{G}$ is defined as

$\small \mathcal{Z}_{\mathcal{G}} (\mathbf{f}) := \{ e = (i,j) \in \mathcal{E} : f (i) f (j) < 0 \}$;
that is, the set of edges connecting a vertex with a positive signal to a vertex with a negative signal.
[Click image to open in new window.]Classical Fourier transform thus provides the frequency domain representation of the signals. Similarly, the graph Fourier transform and its inverse give us a way to equivalently represent a signal in two different domains:
 the vertex (spatial) domain, and
 the frequency (graph spectral) domain.
The latter (graph spectral domain) is important for the analysis of signal properties.
[Image source (local copy). Click image to open in new window.]While we often start with a signal $\small \mathbf{g}$ in the vertex domain, it may also be useful to define a signal $\small \mathbf{\hat{g}}$ directly in the graph spectral domain. We refer to such signals as kernels. In Figure 4 (below), one such signal, a heat kernel, is shown in both domains. Analogously to the classical analog case, the graph Fourier coefficients of a smooth signal such as the one shown in Figure 4 decay rapidly. Such signals are compressible as they can be closely approximated by just a few graph Fourier coefficients.
[Image source. Click image to open in new window.]The Laplacian allows a natural link between discrete representations, such as graphs, and continuous representations, such as vector spaces and manifolds. [Thus, we can view equivalent representations of a graph signal in the vertex domain and in the graph spectral domain, as indicated in Fig. 4 in The Emerging Field of Signal Processing on Graphs (above).]
[Image source (local copy). Click image to open in new window. Refer here for a discussion regarding the derivation of the graph Laplacian.][Introduction]
Operators on Vertices: Laplacian Operators
[Image source (local copy). Click image to open in new window.]
In mathematics, the Laplace operator or Laplacian is a differential operator given by the divergence of the gradient of a function on Euclidean space. It is usually denoted by the symbols $\small \nabla \cdot \nabla$, $\small \nabla^2$, or $\small \Delta$. The Laplacian $\small \Delta f(p)$ of a function $\small f$ at a point $\small p$, is (up to a factor) the rate at which the average value of $\small f$ over spheres centered at $\small p$ deviates from $\small f(p)$ as the radius of the sphere grows. In a Cartesian coordinate system, the Laplacian is given by the sum of second partial derivatives of the function with respect to each independent variable. In other coordinate systems such as cylindrical and spherical coordinates, the Laplacian also has a useful form.
Definition. The Laplace operator is a second order differential operator in the $\small n$dimensional Euclidean space, defined as the divergence ($\small \nabla \cdot$) of the gradient ($\small \nabla f$). Thus if $\small f$ is a twicedifferentiable realvalued function, then the Laplacian of $\small \nabla f$ is defined by

$\small \Delta f=\nabla ^{2}f=\nabla \cdot \nabla f$
where the latter notations derive from formally writing

$\nabla =\left({\frac {\partial }{\partial x_{1}}},\ldots ,{\frac {\partial }{\partial x_{n}}}\right)$
Equivalently, the Laplacian of $\small f$ is the sum of all the unmixed second partial derivatives in the Cartesian coordinates $\small x_i$:

$\small \begin{align} \Delta f = \sum_{i=1}^n \frac {\partial^2 f}{\partial x^2_i} \end{align}$
As a secondorder differential operator, the Laplace operator maps $\small \mathcal{C}^k$ functions to $\small \mathcal{C}^{k2}$ functions for $\small k \geq 2$. The two expressions above define an operator $\small \Delta : \mathcal{C}^k(\mathbb{R}^n) \rightarrow \mathcal{C}^{k2}(\mathbb{R}^n)$, or more generally an operator $\small \Delta : \mathcal{C}^k(\Omega) \rightarrow \mathcal{C}^{k2}(\Omega)$ for any open set $\small \Omega$.
 Section 2 in [6] also discusses Laplace operators.
Spectral theory. The spectrum of the Laplace operator consists of all eigenvalues $\small \lambda$ for which there is a corresponding eigenfunction $\small f$ with

$\small \Delta f = \lambda f$.
This is known as the Helmholtz equation.
If $\small \Omega$ is a bounded domain in $\small \mathbb{R}^n$ then the eigenfunctions of the Laplacian are an orthonormal basis for the Hilbert space $\small \mathcal{L}^2(\Omega)$. This result essentially follows from the spectral theorem on compact selfadjoint operators, applied to the inverse of the Laplacian (which is compact, by the Poincaré inequality and the RellichKondrachov theorem). It can also be shown that the eigenfunctions are infinitely differentiable functions. More generally, these results hold for the LaplaceBeltrami operator on any compact Riemannian manifold with boundary, or indeed for the Dirichlet eigenvalue problem of any elliptic operator with smooth coefficients on a bounded domain. When $\small \Omega$ is the $\small n$sphere, the eigenfunctions of the Laplacian are the wellknown spherical harmonics.

Orthonormal basis. particularly linear algebra, an orthonormal basis for an inner product space $\small V$ with finite dimension is a basis for $\small V$ whose vectors are orthonormal, that is, they are all unit vectors [vectors of length 1] and orthogonal [perpendicular] to each other. For example, the standard basis for a Euclidean space $\small \mathbb{R}^n$ is an orthonormal basis, where the relevant inner product is the dot product of vectors. The image of the standard basis under a rotation or reflection (or any orthogonal transformation) is also orthonormal, and every orthonormal basis for $\small \mathbb{R}^n$ arises in this fashion.
For a general inner product space $\small V$, an orthonormal basis can be used to define normalized orthogonal coordinates on $\small V$. Under these coordinates, the inner product becomes a dot product of vectors. Thus the presence of an orthonormal basis reduces the study of a finitedimensional inner product space to the study of $\small \mathbb{R}^n$ under dot product. Every finitedimensional inner product space has an orthonormal basis, which may be obtained from an arbitrary basis using the GramSchmidt process.

Eigenvalues and eigenvectors. In linear algebra, an eigenvector or characteristic vector of a linear transformation is a nonzero vector that changes by only a salar factor when that linear transformation is applied to it. More formally, if $\small T$ is a linear transformation from a vector space $\small T$ over a field $\small F$ into itself and $\small \mathbf{v}$ is a vector in $\small V$ that is not the zero vector, then $\small \mathbf{v}$ is an eigenvector of $\small T$ if $\small T(\mathbf{v})$ is a scalar multiple of $\small \mathbf{v}$. This condition can be written as the equation

$\small T(\mathbf{v}) = \lambda \mathbf{v}$
where $\small \lambda$ is a scalar in the field $\small F$, known as the eigenvalue, characteristic value, or characteristic root associated with the eigenvector $\small \mathbf{v}$.
If the vector space $\small V$ is finitedimensional, then the linear transformation $\small T$ can be represented as a square matrix $\small A$, and the vector $\small \mathbf{v}$ by a column vector, rendering the above mapping as a matrix multiplication on the left hand side and a scaling of the column vector on the right hand side in the equation

$\small A\mathbf{v} = \lambda \mathbf{v}$.
There is a direct correspondence between $\small [n \times n]$ square matrices and linear transformations from an $\small n$dimensional vector space to itself, given any basis of the vector space. For this reason, it is equivalent to define eigenvalues and eigenvectors using either the language of matrices or the language of linear transformations.
Geometrically an eigenvector, corresponding to a real nonzero eigenvalue, points in a direction that is stretched by the transformation and the eigenvalue is the factor by which it is stretched. If the eigenvalue is negative, the direction is reversed.
The following, taken from Section 2.3.2 (“Operators on Vertices”) in [4], also discusses the need to study graph eigenvalues and eigenvectors.
If $\small \mathbf{A}$ is a square matrix, and $\small \mathbf{v}_1 , \mathbf{v}_2 , \ldots, \mathbf{v}_n$ its set of eigenvectors that form a basis, we know that

$\small \mathbf{A} \mathbf{v}_i = \lambda \mathbf{v}_i$ .
Multiplication of a vector x with the matrix $\small \mathbf{A}$ can be viewed through the lens of the eigenvectors. The vector $\small \mathbf{x}$ can be written as

$\small \begin{align} \mathbf{x} = \sum_i c_i \mathbf{v}_i \end{align}$,
and therefore the multiplication with $\small \mathbf{A}$ is

$\small \begin{align} \mathbf{A}^k \mathbf{x} = \sum_i c_i \mathbf{A}^k \mathbf{v}_i = \sum_i c_i \lambda^k_i \mathbf{v}_i \end{align}$ .
This is important because the operators often associated with undirected graphs are symmetric. Symmetric operators admit an eigenvalue decomposition into a set of orthogonal eigenvectors that form a basis. This is known as the spectral theorem. Therefore, matrix multiplications with powers of these operators are easier to study through their spectrum (eigenvalues), since any vector can be expressed in the eigenbasis of the operator.
The vertices of a graph $\small \mathcal{G}$ can be mapped to a vector $\small \mathbf{x} \in \mathbb{R}^{\vert V \vert}$. The vector $\small \mathbf{x}$ can be viewed as a function $\small \mathbf{x} : \mathcal{V} \rightarrow \mathbb{R}$.
Let $\small \mathbf{A}$ be a square $\small [n \times n]$ matrix, and $\small \lambda$ a (real or complex) scalar variable. The graph Fourier Transform (GFT) is eigenmatrix of graph Laplacian $\small \mathcal{L}$:

$\small \mathcal{L}\mathbf{u}_i = \lambda_i \mathbf{u}_i$ ,
where $\small \lambda_i$ are the eigenvalues and $\small \mathbf{u}_i$ are the eigenvectors of the Laplacian matrix:
 $\small 0 = \lambda_0 < \lambda_1 \leq \lambda_2 \leq \ldots \leq \lambda_{N1}$
 $\small \mathcal{L} \mathbf{x}_i = \lambda_i \mathbf{x}_i$
 note: for $\small \mathcal{L}$, $\small \mathbf{x}_0 = 1$
 Edge weights affect the shapes of the eigenvectors.
On any graph, the eigenvectors $\small \mathbf{\chi}_i$ of the Laplacian matrix $\small \mathcal{L}$ will be considered as the Fourier vectors, and its eigenvalues $\small \lambda_i$ the associated (squared) frequencies.
[Image source (local copy). Click image to open in new window.][Introduction]
Spectral Clustering, Eigenvalues and Eigenvectors
 “eigen” [German] : “own” or “characteristic”
 See also: Eigenvalues and Eigenvectors
Clustering is one of the most widely used techniques for exploratory data analysis, with applications ranging from statistics, computer science, biology to social sciences or psychology [2]. In virtually every scientific field dealing with empirical data, people attempt to get a first impression on their data by trying to identify groups of “similar behavior” in their data. In recent years, spectral clustering has become one of the most popular modern clustering algorithms. It is simple to implement, can be solved efficiently by standard linear algebra software, and very often outperforms traditional clustering algorithms such as the kmeans algorithm.
The main tools for spectral clustering are graph Laplacian matrices [2]. There exists a whole field dedicated to the study of those matrices, called spectral graph theory (e.g., see Chung, 1997). In mathematics, spectral graph theory is the study of the properties of a graph in relationship to the characteristic polynomial, eigenvalues, and eigenvectors of matrices associated with the associated graph matrices: the adjacency matrix, and the graph Laplacian, $\small \mathcal{L}$ (Laplacian matrix) and its variants. Both matrices have been extremely well studied from an algebraic point of view.
The adjacency matrix of a simple graph is a real symmetric matrix and is therefore orthogonally diagonalizable; its eigenvalues are real algebraic integers. While the adjacency matrix depends on the vertex labeling, its spectrum is a graph invariant, although not a complete one. Spectral graph theory is also concerned with graph parameters that are defined via multiplicities of eigenvalues of matrices associated to the graph.
The Laplacian allows a natural link between discrete representations, such as graphs, and continuous representations, such as vector spaces and manifolds. [Thus, we can view equivalent representations of a graph signal in the vertex domain and in the graph spectral domain, as indicated in Fig. 4 in The Emerging Field of Signal Processing on Graphs.]
This is the graphical equivalent the Fourier transform. In signal processing, timefrequency analysis comprises those techniques that study a signal in both the time and frequency domains simultaneously, using various timefrequency representations. A signal can be converted between the time and frequency domains with a pair of mathematical operators called a transform. An example is the Fourier transform, which decomposes a function into the sum of a (potentially infinite) number of sine wave frequency components. The ‘spectrum’ of frequency components is the frequency domain representation of the signal. The inverse Fourier transform converts the frequency domain function back to a time function.
The most important application of the Laplacian is spectral clustering that corresponds to a computationally tractable solution to the graph partitioning problem. Another application is spectral matching that solves for graph matching. [Source (slide 2).]
The study of complex systems greatly benefits from graph models and their analysis [3]. In particular,

the eigendecomposition of the graph Laplacian lets properties of global organization emerge from local interactions [3].
For example, the Fiedler vector is the smallest nonzero eigenvalue and plays a key role for graph clustering. Spectral graph theory supports an important class of methods for studying graph topology, as well as analyzing graph signals.
 The first nonnull eigenvalue $\small \lambda_{k+1}$ is called the Fiedler value.
 The corresponding eigenvector $\small \mathbf{u}_{k+1}$ is called the Fiedler vector.
 The multiplicity of the Fiedler eigenvalue is always equal to 1.
 The Fiedler value is the algebraic connectivity of a graph: the further from 0, the more connected the graph.
 The Fidler vector has been extensively used for spectral bipartitioning.
 Source (slide 20).
As in classical signal processing, graph signals can have properties such as smoothness that need to be appropriately defined [9]. They can also be represented via basic atoms and can have a spectral representation. In particular, the graph Fourier transform allows us to develop the intuition gathered in the classical setting and extend it to graphs; we can talk about the notions of frequency and bandlimiting, for example. We can filter graph signals. They can be sampled, a notoriously hard problem; with graph signal processing, one gains access to principled tools mimicking the classical ones. We can denoise graph signals, we can learn their underlying structure, we can model them. If the graphs cannot be directly observed, we can also learn their structure from data.
[Image source. Click image to open in new window.][Introduction]
Spectral Graph Theory
Main Points

Spectral graph theory studies the properties of graphs via the eigenvalues and eigenvectors of their associated graph matrices: the adjacency matrix and the graph Laplacian and its variants.

The Laplacian allows a natural link between discrete representations (such as graphs), and continuous representations (such as vector spaces and manifolds).

The most important application of the Laplacian is spectral clustering that corresponds to a computationally tractable solution to the graph partitioning problem.
 Another application is spectral matching that solves for graph matching. Given a set of feature points, graph nodes can represent the points and graph edges can encode the relationships between two points. The purpose of graph matching is then to find correspondences between the nodes of two feature graphs such that both the unary information (e.g., feature descriptors) on the nodes and the relation information (e.g., distance or angle) associated with the edges are preserved as much as possible. [Source: Fast and Scalable Approximate Spectral Matching for HigherOrder Graph Matching (2014).]
Source: A Short Tutorial on Graph Laplacians, Laplacian Embedding, and Spectral Clustering [local copy]
Spectral graph theory is the field concerned with the study of the eigenvectors and eigenvalues of the matrices that are naturally associated with graphs (Ch. 2 in [4]). One of the goals is to determine important properties of the graph from its graph spectrum. A lot of invariant properties of the graph that are fundamental in understanding it are closely associated with its spectrum. As we will later explore in more detail, spectral graph theory has interesting links to mathematical areas such as differential geometry and spectral Riemannian geometry.
Apart from theoretical interest it also has a wide range of applications. Some of the earliest applications of spectral graph theory have been in chemistry (Ch. 2 in [4]). Since then, it has been successfully applied to a plethora of problems in biology, communication networks and more recently image processing, computer graphics, and data science. Graph spectra also naturally arise in various problems of theoretical physics and quantum mechanics, such as energy minimization in Hamiltonian systems.
The spectrum of a graph $\small \mathcal{G}$ refers to the set of its eigenvalues (Ch. 2 in [4]). The definitions of the graph spectrum are not entirely consistent in the literature as different authors prefer to focus on the eigenvalues of different matrices of the graph. In Spectral Graph Theory the spectrum of a graph $\small \mathcal{G}$ is defined as the set of the eigenvalues of its Laplacian matrix, while in Section 28.3 (p. 285) in Handbook of Linear Algebra [local copy] and Ch. 8.1 (p. 164) in Algebraic Graph Theory [local copy] the eigenvalues of the adjacency matrix are used instead.
Spectral graph theory has historically focused on constructing, analyzing, and manipulating graphs, as opposed to signals on graphs [1]. In the area of signal processing on graphs, spectral graph theory has been leveraged as a tool to define frequency spectra and expansion bases for graph Fourier transforms. It has proved particularly useful for the construction of expander graphs, graph visualization, spectral clustering, graph coloring and numerous other applications in biology, chemistry, physics, computer science, and other domains.
[Image source (same authors as [1]). Click image to open in new window.][Introduction]
Graph (Representation) Learning
Related to the bolded comment, further above (reiterated here):

"The study of complex systems greatly benefits from graph models and their analysis [3]. In particular,
the eigendecomposition of the graph Laplacian lets properties of global organization emerge from local interactions [3].
the following discussion is relevant.
Wavelets on Graphs via Spectral Graph Theory [7] likewise discusses the utility of graph signal processing for generalizing from local to more global environments:

“Many signal processing techniques are based on transform methods, where the input data is represented in a new basis before analysis or processing. One of the most successful types of transforms in use is wavelet analysis. Wavelets have proved over the past 25 years to be an exceptionally useful tool for signal processing. Much of the power of wavelet methods comes from their ability to simultaneously localize signal content in both space and frequency. For signals whose primary information content lies in localized singularities, such as step discontinuities in time series signals or edges in images, wavelets can provide a much more compact representation than either the original domain or a transform with global basis elements such as the Fourier transform.
“An enormous body of literature exists for describing and exploiting this wavelet sparsity. We include a few representative references for applications to signal compression, denoising, and inverse problems including deconvolution. As the individual waveforms comprising the wavelet transform are selfsimilar, wavelets are also useful for constructing scale invariant descriptions of signals. This property can be exploited for pattern recognition problems where the signals to be recognized or classified may occur at different levels of zoom. In a similar vein, wavelets can be used to characterize fractal selfsimilar processes.”
While the math and jargon in Discrete Signal Processing on Graphs [10] is dense, it eventually settles down – providing two very interesting examples, that suggest the embedding of global properties in sampled subsets of the graph.

“Classification and labeling are important problems in data analysis. These problems have traditionally been studied in machine learning. Here, we propose a novel data classification algorithm by demonstrating that a classifier system can be interpreted as a filter on a graph. … Our approach is based on the label propagation, which is a simple, yet efficient technique that advances known labels from labeled graph nodes along edges to unlabeled nodes. … instead of propagating labels as in a Markov chain, we construct a filter $\small h(\mathbf{A})$ that produces labels $\small \tilde{s} = h(\mathbf{A})s$ [where given $\small \mathcal{G} = (\mathcal{V}\mathbf{A})$, $\small \mathbf{A}$ is the adjacency matrix, and $\small s$ is the signal].
“The following example illustrates our approach. Consider a set of $\small N = 1,224$ political blogs on the Web that we wish to classify as “conservative” or “liberal” based on the their context. Reading and labeling each blog is very timeconsuming. Instead, we read and label only a few blogs, and use these labels to adaptively build a filter $\small h(\mathbf{A})$ [shown above].
“Let signal $\small s$ contain initially known labels, where ‘conservative’, ‘liberal ’, and unclassified blogs are represented by values $\small s_n = +1, 1$, and $\small 0$, respectively. Also, let signal $\small t$ contain training labels, a subset of known labels from $\small s$. Both $\small s$ and $\small t$ are represented by a graph $\small \mathcal{G} = ( \mathcal{V}, \mathbf{A})$, where node $\small v_n$ containing the label of the $\small n^{th}$ blog, and edge $\small A_{n,m} = 1$ if and only if there is a hyperlink reference from the $\small n^{th}$ to the $\small m^{th}$ blog; hence the graph is directed. Observe that the discovery of hyperlink references is a fast, easily automated task, unlike reading the blogs. An example subgraph for 50 blogs is shown in Fig. 1(d).
[Image source. Click image to open in new window.][Aside. Regarding Fig. 1(c), above – that is another very interesting example! While the weather stations are plotted on the sensor geolocations (thus the map adopts the mainland USA shape), the analysis of the signals on those sensors (nodes) reveals the temperature gradient across the USA. image
Source.]Recall that the graph shift $\small \mathbf{A}$ replaces each signal coefficient with a weighted combination of its neighbors. In this case, processing training labels $\small t$ with the filter

$\small \mathbf{I}_N + h_1 \mathbf{A}$
produces new labels
$\small \tilde{\mathbf{t}} = \mathbf{t} + h_1 \mathbf{At}$. Here, every node label is adjusted by a scaled sum of its neighbors’ labels. The parameter $\small h_1$ can be interpreted as the “confidence” in our knowledge of current labels: the higher the confidence $\small h_1$, the more neighbors’ labels should affect the current labels. We restrict the value of $\small h_1$ to be positive.
Since the sign of each label indicates its class, label $\small \tilde{t}_n$ is incorrect if its sign differs from $\small s_n$ , or $\small \tilde{t}_n s_n \leq 0$ for $\small s_n \neq 0$. We determine the optimal value of $\small h_1$ by minimizing the total error, given by the number of incorrect and undecided labels. This is done in linear time proportional to the number of initially known labels $\small s_n = 0$, since each constraint

$\small \begin{align} \tilde{t}_n s_n = \left( t_n + h_1 \sum_{k \in N_n} t_k \right) s_n \leq 0 \end{align}$
is a linear inequality constraint on $\small h_1$.
To propagate labels to all nodes, we repeatedly feed them through $\small P$ filters of the form

$\small h^{(p)}(\mathbf{A}) = \mathbf{I}_N + h_p \mathbf{A}$,
each time optimizing the value of $\small h_p$ using the greedy approach discussed above. The obtained adaptive classification filter is

$\small h(\mathbf{A}) = (\mathbf{I}_N + h_P \mathbf{A}) (\mathbf{I}_N + h_{P1} \mathbf{A}) \ldots (\mathbf{I}_N + h_1 \mathbf{A})$.
In experiments, we set $\small P = 10$, since we observed that the classification filter above converges quickly, and in many cases, $\small h p = 0$ for $\small p > 10$, which is similar to the actual graph’s diameter of $\small 8$. After the classification filter is constructed, we apply it to all known labels $\small s$, and classify all $\small N$ nodes based on the signs of resulting labels $\small \tilde{s} = h(\mathbf{A})s$.
In our experiments, we considered two methods for selecting nodes to be labeled initially: random selection, and selection of blogs with most hyperlinks. As Table I shows, our algorithm achieves high accuracy for both methods. In particular, assigning initial labels $\small s to only 2% of blogs with most hyperlinks leads to the correct classification of 93 % of unlabeled blogs.
[Image source. Click image to open in new window.]Their second example involves customer behavior prediction, an example of a mobile service provider that is interested in keeping its customers. The company wants to predict which users will stop using their services in the near future, and offer them incentives for staying with the provider (improved call plan, discounted phones, etc.). … [continued on pp. 910, here].
Section III.E (p. 10) in Graph Signal Processing: Overview, Challenges and Applications [9] discusses Graph Learning.

“Much recent work on graph signal processing assumes that the graph is given or can be defined in a reasonable way based on the nature of the application. As an example, in communication or social networks the connectivity of the network (directed or undirected) can be used to define the graph. In other applications edge weights between nodes can be chosen as a decreasing function of distance, e.g., physical distance between sensors in the case of sensor networks or distance in feature space in the case of learning applications.
“Recent work has been considering alternative techniques where the goal is to learn graphs from data. This work is motivated by scenarios where (i) no reasonable initial graph exists or (ii) it is desirable to modify a known graph (based on network connectivity for example) by selecting weights derived from data. The key idea in these approaches is to select a graph such that the most likely vectors in the data (the graph signals) correspond to the lowest frequencies of the GFT [graph Fourier transform: “smoothness”] or to the more likely signals generated by Gauss Markov Random Field (GMRF) related to the graph.
“… The basic idea in the latter approaches is to identify GMRF models such that the inverse covariance (precision) matrix has the form of a graph Laplacian (e.g., combinatorial or generalized). Note that this work extends popular approaches for graph learning (e.g., graphical Lasso) to precision matrices restricted to have a Laplacian form (corresponding to a graph with positive edge weights). Other approaches have addressed graph selection under the assumption that the observed data was obtained through graph based diffusion. … While not explicitly a graph learning problem, the related question of blind identification of graph filters has also been studied.
Those statements are relevant to the following graph signal processing examples, provided in that paper [9].

“Sensor Networks. … A first approach to define a graph associated to a sensor network is to choose edge weights as a decreasing function of distance between nodes (sensors). … In some cases, relations between sensor readings are not exclusively explained by the distance between sensor locations, or by some actual network constraints. …
“In some cases the phenomena that can explain these relations between measurements are latent and this leads to the challenging problem of learning a graph (see also Section IIIE) that can explain the data observations under signal smoothness or other signal model assumptions. This allows inferring system features and behaviors that are hidden in the measured datasets (e.g., ozone datasets in ref. [165]).”

“Biological Networks. … Biological networks have also proved to be a popular application domain for graph signal processing, with recent research works focusing on the analysis of data from systems known to have a network structure, such as the human brain, and also on the inference of a priori unknown biological networks.
“It should finally be noted that brain networks are not the only biological networks where GSP offers promising solutions. Graph signal processing elements and biological priors are combined to infer networks and discover meaningful interactions in gene regulatory networks, as in refs. [182] & [183]. The inference of the structure of protein interaction networks has also been addressed with help of spectral graph templates. In particular, the observed matrix of mutual information can be approximated by some (unknown) analytic matrix function of the unobserved structure to be recovered. Observed data is then used to obtain the eigenvectors of the matrix representation of the graph model, and then the eigenvalues are estimated with the help of sparsity priors.
The above examples are only some illustrations of the recent works that attempt to infer structures of biological networks using a GSP perspective. Biological networks that cannot be explicitly recorded and measured are potentially good applications for graph learning and inference methods in particular, which can uncover unknown interactions in the biological data.”
Introduction: Sources
[1] The Emerging Field of Signal Processing on Graphs
[2] A Tutorial on Spectral Clustering
[3] When Slepian Meets Fiedler: Putting a Focus on the Graph Spectrum
[4] Spectral Graph Theory and Deep Learning on Graphs
[5] [Research Seminar] Graph Signal Processing: Fundamentals and Applications
[6] Kernels and Regularization on Graphs [local copy; slides]
[7] Wavelets on Graphs via Spectral Graph Theory [local copy]
[8] Discrete Signal Processing on Graphs: Graph Fourier Transform
[9] Graph Signal Processing: Overview, Challenges and Applications
[10] Discrete Signal Processing on Graphs
Background  Introduction  Applications  Excerpts  References
Applications of Graph Signal Processing
See also the examples (“Sensor Networks;” “Biological Networks;” “Image and 3D Point Cloud Processing;” “Machine Learning and Data Science”) in Section IV (pp. 11+), “Graph Signal Processing Applications,” in Graph Signal Processing: Overview, Challenges and Applications.
Applications of graph signal processing include signal processing and machine learning tasks including:
 signal compression,
 denoising (e.g. of noisy fMRI images),
 semisupervised learning, data {classification  clustering  dimensionality reduction},
 linear prediction,
 customer behavior prediction,
 etc. (additional examples below).
Application domains include neuroimaging, social network analysis (community detection, recommendation, link prediction, citation networks, …), urban computing (traffic patterns, …), computer graphics and computer vision, geoscience and remote sensing, biological networks (epidemiological, molecular, gene regulation, …), etc.
While deep learning models have been particularly successful when dealing with signals such as speech, images, or video in which there is an underlying Euclidean structure, recently there has been a growing interest in trying to apply learning on nonEuclidean geometric data [2]. Such kinds of data arise in numerous applications; for instance, in social networks, the characteristics of users can be modeled as signals on the vertices of the social graph. Sensor networks are graph models of distributed interconnected sensors, whose readings are modelled as timedependent signals on the vertices. In genetics, gene expression data are modeled as signals defined on the regulatory network. In neuroscience, graph models are used to represent anatomical and functional structures of the brain. In computer graphics and vision, 3D objects are modeled as Riemannian manifolds (surfaces) endowed with properties such as color texture.
Biological Networks
Biological networks have also proved to be a popular application domain for graph signal processing, with recent research focusing on the analysis of data from systems known to have a network structure (such as the human brain), and also on the inference of a priori unknown biological networks [1]. For example, human brain activity signals – e.g., functional magnetic resonance imaging (fMRI) data – can be mapped on a network (graph) where each node corresponds to a brain region, and the edges (edge weights) represent the structural connectivity or the functional coherence between brain regions. Low frequencies (for example) in the graph signal represent similar activities in regions that are highly connected in functional brain networks, while high frequencies denote very different activities in those brain regions.
The GSP framework has also been proposed for the classification of brain graph signals and the analysis of anomalies or diseases [1]. For example, source localization algorithms based on sparse regularization can be used to localize the possible origins of Alzheimer’s disease based on a large set of repeated magnetic resonance imaging (MRI) scans. This can help understand the dynamics and origin of dementia, which is an important step towards developing effective treatment of neurodegenerative diseases.
Graph signal processing elements and biological priors are combined to infer networks and discover meaningful interactions in gene regulatory networks [1]. The inference of the structure of protein interaction networks has also been addressed with help of spectral graph templates. In particular, the observed matrix of mutual information can be approximated by some (unknown) analytic matrix function of the unobserved structure to be recovered. Observed data is then used to obtain the eigenvectors of the matrix representation of the graph model, and then the eigenvalues are estimated with the help of sparsity priors. The above examples are only some illustrations of the recent works that attempt to infer structures of biological networks using a GSP perspective. Biological networks that cannot be explicitly recorded and measured are potentially good applications for graph learning and inference methods in particular, which can uncover unknown interactions in the biological data.
[Continued on pp. 1112 in Graph Signal Processing: Overview, Challenges and Applications.]
Computer Vision and Graphics:
The computer vision community has recently shown an increasing interest in working with 3D geometric data [2]. Many machine learning techniques successfully working on images were tried “as is” on 3D geometric data, represented for this purpose in some way “digestible” by standard frameworks, e.g. as range images or rasterized volumes. The main drawback of such approaches is their treatment of geometric data as Euclidean structures. First, for complex 3D objects, Euclidean representations such as depth images or voxels may lose significant parts of the object or its fine details, or even break its topological structure. Second, Euclidean representations are not intrinsic, and vary when changing pose or deforming the object. Achieving invariance to shape deformations, a common requirement in many vision applications, demands very complex models and huge training sets due to the large number of degrees of freedom involved in describing nonrigid deformations (left side of figure 5, below).
[Image source. Click image to open in new window.]In the domain of computer graphics, on the other hand, working intrinsically with geometric shapes is a standard practice. In this field, 3D shapes are typically modeled as Riemannian manifolds and are discretized as meshes. [ … snip … ] By resorting to intrinsic deep neural networks, the invariance to isometric deformations is automatically built into the model, thus vastly reducing the number of degrees of freedom required to describe the invariance class.
[Continued on pp. 1718 in Geometric Deep Learning: Going Beyond Euclidean Data.]
Image and 3D Point Cloud Processing
GSP is widely used in applications such as image restoration or denoising, and image/video compression [1]. Graphs are used to capture the geometric structure in images, such as contours that carry crucial visual information, in order to avoid blurring them during the filtering process.
[Continued on pp. 1213 in Graph Signal Processing: Overview, Challenges and Applications.]
Machine Learning and Data Science
Graph methods have long played an important role in machine learning applications, as they provide a natural way to represent the structure of a dataset [1]. In this context, each vertex represents one data point to which a label can be associated, and a graph can be formed by connecting vertices with edge weights that are assigned based on a decreasing function of the distance between data points in the feature space. Graph signal processing then enables different types of processing, learning or filtering operations on values associated to graph vertices. When data labels are presented as signals on a (nearest neighbor) graph, graph signal regularization techniques can be used in the process of estimating labels, optimizing the prediction of unknown labels in classification or semisupervised learning problems.
Data clustering or community detection can also benefit from tools developed under the GSP framework. For example graph transforms, and especially graph wavelets, have been used to solve the classical problem of community detection. … the extension of clustering or community detection tasks to large scale systems generally relies on sampling or randomized strategy where GSP methods can also be very helpful. For example, fast graphbased filtering of random signals can be used to estimate the graph structure, and in particular to approximate eigenvectors that are often crucial in the design of clustering algorithms and other machine learning tasks.
[Continued on pp. 1314 in Graph Signal Processing: Overview, Challenges and Applications.]
Medical Imaging:
An application area where signals are naturally collected on nonEuclidean domains and where the methodologies we reviewed could be very useful is brain imaging [2]. A recent trend in neuroscience is to associate functional MRI traces with a precomputed connectivity rather than inferring it from the traces themselves. In this case, the challenge consists in processing and analyzing an array of signals collected over a complex topology, which results in subtle dependencies. In a recent work from Imperial College, graph CNN were used to detect disruptions of the brain functional networks associated with autism.
Molecule Design
See p.18 in Geometric Deep Learning: Going Beyond Euclidean Data.
Network Analysis
One of the classical examples used in many works on network analysis are citation networks [2]. A citation network is a graph where vertices represent papers and there is a directed edge $\small (i,j)$ if paper $\small i$ cites paper $\small j$. Typically, vertexwise features representing the content of the paper (e.g. histogram of frequent terms in the paper) are available. A prototypical classification application is to attribute each paper to a field. Traditional approaches work vertexwise, performing classification of each vertex’s feature vector individually. More recently, it was shown that classification can be considerably improved using information from neighbor vertices, e.g. with a CNN on graphs. The figure below shows an example of application of spectral and spatial graph CNN models on a citation network.
[Image source. Click image to open in new window.]Another fundamental problem in network analysis is ranking and community detection [2]. These can be estimated by solving an eigenvalue problem on an appropriately defined operator on the graph: for instance, the Fiedler vector (the eigenvector associated with the smallest nontrivial eigenvalue of the Laplacian) carries information on the graph partition with minimal cut, and the popular PageRank algorithm approximates page ranks with the principal eigenvector of a modified Laplacian operator. In some contexts, one may want develop datadriven versions of such algorithms, that can adapt to model mismatch and perhaps provide a faster alternative to diagonalization methods. By unrolling power iterations, one obtains a Graph Neural Network architecture whose parameters can be learnt with backpropagation from labeled examples, similarly to the Learnt Sparse Coding paradigm. We are currently exploring this connection by constructing multiscale versions of graph neural networks.
Particle Physics and Chemistry
See p.18 in Geometric Deep Learning: Going Beyond Euclidean Data.
Recommender Systems
Recommending movies on Netflix, friends on Facebook, or products on Amazon are a few examples of recommender systems that have recently become ubiquitous in a broad range of applications [2]. Mathematically, a recommendation method can be posed as a matrix completion problem, where columns and rows represent users and items, respectively, and matrix values represent a score determining whether or not a user would like an item. Given a small subset of known elements of the matrix, the goal is to fill in the rest.
[Continued on pp.1617 in Geometric Deep Learning: Going Beyond Euclidean Data.]
Sensor Networks
A graph represents the relative positions of sensors in the environment, and the application goals include compression, denoising, reconstruction, or distributed processing of the sensor data [1]. A first approach to define a graph associated to a sensor network is to choose edge weights as a decreasing function of distance between nodes (sensors). Then, data observations that are similar at neighboring nodes lead naturally to a smooth (lowpass) graph signal. Such a smooth graph signal model makes it possible to detect outliers or abnormal values by high pass filtering and thresholding, or to build effective signal reconstruction methods from sparse set of sensor readings, which can potentially lead to significant savings in energy resources, bandwidth and latency in sensor network applications.
In one application, urban data processing relies on data that naturally live on networks, such as energy, transportation or road networks [1]. In these applications cases, GSP has been used, for example, to monitor urban air pollution, or to monitor and analyze power consumption. Graph signal processing tools have also been used for analyzing traffic and mobility in large cities; for example, wavelets on graphs can serve to extract useful traffic patterns to detect disruptive traffic events such as congestion. Graph wavelet coefficients at different scales permit the inference of useful information such as the origin, propagation, and the span of traffic congestion.
[Continued on p. 11 in Graph Signal Processing: Overview, Challenges and Applications.]
Sources for the preceding material (and references therein)
[1] Graph Signal Processing: Overview, Challenges and Applications
[2] Geometric Deep Learning: Going Beyond Euclidean Data
Background  Introduction  Applications  Excerpts  References
Relevant Excerpts
See also these specific literature discussions.
To give an indication of the application of graph signal processing, here are some selected examples from
[Image source. Click image to open in new window.]
Graph signals – genetic profiles:
[Image source. Click image to open in new window.]Frequencies of the Laplacian:
[Image source. Click image to open in new window.]Importance of the underlying graph (note: re discussions of multiview graphs, …):
[Image source. Click image to open in new window.]Network of economic sectors of the United States:
[Image source. Click image to open in new window.]Disaggregated GDP of the United States:
[Image source. Click image to open in new window.]Application: cancer subtype classification:
[Image source. Click image to open in new window.]Genetic network:
[Image source. Click image to open in new window.]Genetic profiles:
[Image source. Click image to open in new window.]Improving knearest neighbor classification:
[Image source. Click image to open in new window.]Distinguishing power:
[Image source. Click image to open in new window.]Increasing accuracy by selecting the best frequencies:
[Image source. Click image to open in new window.]Application: explaining human learning rates:
[Image source. Click image to open in new window.]Measuring brain state variability:
[Image source. Click image to open in new window.]Diffusion as lowpass filtering:
[Image source. Click image to open in new window.]Computing correlation for three signals:
[Image source. Click image to open in new window.]Lowpass filtering reveals correlation:
[Image source. Click image to open in new window.]Summary:
[Image source. Click image to open in new window.]Background  Introduction  Applications  Excerpts  References
References / Additional Reading
See also:
 [Alex Smola] Kernels and Regularization on Graphs (2003) [local copy; slides]
 [arXiv] A Tutorial on Spectral Clustering (Nov 2007)
 Wavelets on Graphs via Spectral Graph Theory (2011) [local copy]
 The wavelet approach from this paper was later used (same authors in each of the following)
 Graph Convolutional Neural Networks via Scattering (Mar 2018)
 Graph Generation via Scattering (Sep 2018)
 [arXiv] Discrete Signal Processing on Graphs (Dec 2012)
 Discrete Signal Processing on Graphs: Graph Fourier Transform (2013)
 [arXiv] The Emerging Field of Signal Processing on Graphs (Mar 2013)
 [arXiv: Yann LeCun] Spectral Networks and Deep Locally Connected Networks on Graphs (May 2014)
 Bandlimited graph signal reconstruction by diffusion operator (2016)
 [arXiv: Yann LeCun] Geometric Deep Learning: Going Beyond Euclidean Data (May 2017)
 When Slepian Meets Fiedler: Putting a Focus on the Graph Spectrum (Jul 2017)
 [M.Sc. thesis?] Spectral Graph Theory and Deep Learning on Graphs (2017)
 [arXiv: Yann LeCun] Graph Signal Processing: Overview, Challenges and Applications (Mar 2018)
 [arXiv] Spectral Domain Sampling of Graph Signals (May 2018)
 [arXiv] Graph Spectral Regularization for Neural Network Interpretability (Sep 2018)
 [arXiv] Weighted Spectral Embedding of Graphs (Sep 2018) [code]
 [arXiv] A Note on Spectral Clustering and SVD of Graph Data
 [arXiv] Graph Convolutional Neural Networks via Scattering (Mar 2018)
 [arXiv] Graph Generation via Scattering (Sep 2018)
 [arXiv] Hypergraph Neural Networks (Sep 2018)
 [arXiv] Network Distance Based on Laplacian Flows on Graphs (Oct 2018)
 [arXiv] Graph Classification with Geometric Scattering (Oct 2018)
 [arXiv] Higherorder Spectral Clustering for Heterogeneous Graphs (Oct 2018)
[arXiv] Graph Laplacian Mixture Model (Oct 2018) [Summary]“We propose a novel generative model that explains a collection of distinct data naturally living on different graphs. We assume the mapping of data to graphs is not known and investigate the problem of jointly clustering a set of data and learning a graph for each of the clusters. …
“In this paper, we precisely consider data that naturally forms clusters, where signals from each of the clusters live on a different graph. This allows analysis of more complex datasets where simple graph learning methods would suffer from intertwined data and thus lose the ability to capture a meaningful graph structure. In particular, we study the problem of multiplegraph inference from a general group of signals that are an unknown combination of data of different structures. Namely, we propose a generative model for data represented as a mixture of signals naturally living on a collection of different graphs. As is often the case with data, separation of these signals into clusters is assumed to be unknown. We thus propose an algorithm that will jointly cluster the signals and infer multiple graph structures, one for each of the clusters. Numerical simulations suggest the effectiveness of our method in realworld datasets in both clustering signals and recovering the structure of corresponding graphs. As we will deal with clustering in large dimensionalities in these settings, it is worth noting that inherently high dimensional clustering problems often suffer from the curse of dimensionality and poor interpretability. …“
Source: Graph Laplacian Mixture Model[Conference] 3rd Graph Signal Processing Workshop  GSP’18 [Program: includes slides, video]
[Research Seminar] Graph Signal Processing: Fundamentals and Applications
 [slides] Graph Signal Processing: Filterbanks, Sampling and Applications to Machine Learning and Video Coding [local copy]
 [slides] A Short Tutorial on Graph Laplacians, Laplacian Embedding, and Spectral Clustering [local copy]
 [slides] Spectral Algorithms [local copy]
[slides] Graph Signal Processing for Image Coding & Restoration [local copy]
GSP: interactive, online demo: scroll down to the “Example”
 [Glossary] Geometric deep learning
[Glossary] Hyperbolic Embeddings
 [code] Graph Signal Processing toolbox
 [code] PyGSP: Graph Signal Processing in Python
 [code] GitHub repositories
Videos
[Video source. Click video to open a fullscreen version.]
[Video source. Click video to open a fullscreen version.]