Common Principal Component Subspace and Clustering Symmetric Positive Semi-Definite Matrices

Hanchao Zhang

Division of Biostatistics, Department of Population Health, Grossman School of Medicine, New York University

Thaddeus Tarpey

Division of Biostatistics, Department of Population Health, Grossman School of Medicine, New York University

Refresh Clustering Algorithm: K-Means

K-Means Algorithm

\[\begin{align*} & \underset{\mathscr C}{\text{minimize }} \underbrace{\frac{1}{n}\sum_{j=1}^m \sum_{k \in \mathscr C_j} \big\Vert x_k - \bar x_{\mathscr C_j}\big\Vert^2}_{\text{within cluster sum of squares}} \\ &\textbf{ OR } \\ &\underset{\mathscr C}{\text{maximize}} \underbrace{\sum_{i=m} \frac{n_{\mathscr C_j}}{n}\cdot\big\Vert\bar x_{\mathscr C_j}\big\Vert^2 }_{\text{between cluster sum of squares}} \end{align*}\] Minimize the Euclidean distance between the data points to its closet centroid

K-Means Algorithm

\[\begin{align*} & \underset{\mathscr C}{\text{minimize }} \underbrace{\frac{1}{n}\sum_{j=1}^m \sum_{k \in \mathscr C_j} \big\Vert x_k - \bar x_{\mathscr C_j}\big\Vert^2}_{\text{within cluster sum of squares}} \\ &\textbf{ OR } \\ &\underset{\mathscr C}{\text{maximize}} \underbrace{\sum_{i=m} \frac{n_{\mathscr C_j}}{n}\cdot\big\Vert\bar x_{\mathscr C_j}\big\Vert^2 }_{\text{between cluster sum of squares}} \end{align*}\] Minimize the Euclidean distance between the data points to its closet centroid

Clustering Centers

\[\begin{align*} \mathbf u_j & = \mathscr E[\mathbf X \vert \mathbf X \in \mathscr C_j] \\ & = \frac{1}{n_j}\sum_{i=1}^{n_j} \mathbf x_i \cdot \delta(\mathbf x_i \in \mathscr C_j), \end{align*}\] where the centers of cluster \(\mathscr C_j\) is the average of the data points that is classified to the group \(\mathscr C_j\).

Symmetric Positive Semi-Definite (SPD) Matrix-Valued Data

Functional Connectivity Matrix

Functional Connectivity. [Gillebert et al., 2013]

Functional connectivity is defined as the temporal coincidence of spatially distant neurophysiological events.

For each participant \(i\), let \(\mathbf{y}_{ij} \in \mathbb{R}^{\mathscr T}\) be the longitudinal measurement of blood oxygen level-dependent (BOLD) signal on the region of interest \(j\), \(j = 1,2,\ldots,p\).

The functional connectivity matrix for participant \(i\) is the covariance matrix of \(\mathbf y_{i}\): \(\mathbf{\Sigma}_i = \textbf{Cov}(\mathbf{y}_i) \succeq \mathbf{0}\)

We want to link the functional connectivity matrix to the clinical outcomes using clustering methods.

SPD Matrices and K-Tensors Algorithm: Clustering SPD Matrices

Positive Semi-Definite Matrices

Examples of Positive Semi-Definite Matrices

  • \(\mathbf \psi_{2\times 2} = \begin{bmatrix} 1 & 0.5 \\ 0.5 & 1 \end{bmatrix}\)
  • \(\mathbf \psi_{3\times 3} = \begin{bmatrix} 1 & 0.5 & 0.3 \\ 0.5 & 1 & 0.2 \\ 0.3 & 0.2 & 1 \end{bmatrix}\)
  • \(\mathbf \psi_{p\times p} = \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1p} \\ x_{21} & x_{22} & \cdots & x_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ x_{p1} & x_{p2} & \cdots & x_{pp} \end{bmatrix}\)

Visualization

A ellipsoid centered at origin is defined by the set of points \(\mathbf x \in \mathbb R^p\) such that \(\mathbf x^T \mathbf \psi \mathbf x = 1\).

Positive Semi-Definite Matrices Visualization