utils module
- mvcluster.utils.datagen(dataset: str) → Tuple[List[ndarray], List[ndarray], ndarray]
Dataset loader dispatcher: selects and runs the loader for the dataset named by dataset.
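The dispatcher pattern can be sketched as a registry that maps dataset names to loader functions. Everything in this sketch is hypothetical: the registry, the `toy` dataset and its loader, and the reading of the return tuple as (per-view adjacency matrices, per-view feature matrices, labels) are assumptions for illustration, not mvcluster's actual datasets or code.

```python
from typing import Callable, Dict, List, Tuple

import numpy as np

# Hypothetical return layout: (per-view adjacency, per-view features, labels).
DatasetTuple = Tuple[List[np.ndarray], List[np.ndarray], np.ndarray]


def _load_toy(n: int = 6, seed: int = 0) -> DatasetTuple:
    """Hypothetical loader: two random views over the same n nodes."""
    rng = np.random.default_rng(seed)
    As, Xs = [], []
    for dim in (4, 3):
        a = (rng.random((n, n)) > 0.5).astype(float)
        As.append(np.maximum(a, a.T))  # make the adjacency symmetric
        Xs.append(rng.random((n, dim)))
    labels = np.arange(n) % 2
    return As, Xs, labels


# Registry mapping dataset names to loader functions (names are illustrative).
_LOADERS: Dict[str, Callable[[], DatasetTuple]] = {
    "toy": _load_toy,
}


def datagen_sketch(dataset: str) -> DatasetTuple:
    """Look up and call the loader registered for `dataset`."""
    try:
        loader = _LOADERS[dataset.lower()]
    except KeyError:
        raise ValueError(
            f"Unknown dataset {dataset!r}; expected one of {sorted(_LOADERS)}"
        ) from None
    return loader()
```

A registry keeps the dispatcher itself unchanged when a new dataset is added; only a new entry in the dictionary is needed.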
- mvcluster.utils.init_G_F(XW: ndarray, k: int) → tuple
Initialize cluster assignments G and centroids F using KMeans.
- Parameters:
XW – Array [n_samples, embedding_dim], data to cluster.
k – Number of clusters.
- Returns:
Tuple (G, F) where:
G – 1D array of length n_samples, the initial cluster labels.
F – 2D array [k, embedding_dim], the initial cluster centroids.
- Return type:
(np.ndarray, np.ndarray)
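A minimal sketch of this initialization, assuming scikit-learn's `KMeans` is the KMeans implementation. The `seed` parameter and `n_init=10` are assumptions added for reproducibility, not documented behavior of init_G_F.

```python
import numpy as np
from sklearn.cluster import KMeans


def init_G_F_sketch(XW: np.ndarray, k: int, seed: int = 0) -> tuple[np.ndarray, np.ndarray]:
    """Run KMeans on the embedded data and return (labels, centroids)."""
    km = KMeans(n_clusters=k, n_init=10, random_state=seed)
    G = km.fit_predict(XW)   # shape (n_samples,), values in 0..k-1
    F = km.cluster_centers_  # shape (k, embedding_dim)
    return G, F
```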
- mvcluster.utils.init_W(X: ndarray, f: int) → ndarray
Initialize projection matrix W using truncated SVD.
- Parameters:
X – Array [n_samples, n_features], input data matrix.
f – Target embedding dimension.
- Returns:
Projection matrix [n_features, f].
- Return type:
np.ndarray
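One way to build such a projection is to keep the top f right singular vectors of X. The sketch below uses NumPy's thin SVD and truncates it; whether mvcluster centers X first or uses a randomized solver such as scikit-learn's `TruncatedSVD` is not stated, so treat this as an illustration of the idea rather than the library's code.

```python
import numpy as np


def init_W_sketch(X: np.ndarray, f: int) -> np.ndarray:
    """Return the top-f right singular vectors of X as an [n_features, f] matrix."""
    # full_matrices=False gives the thin SVD: Vt has shape (min(n, d), d),
    # with rows ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    if f > Vt.shape[0]:
        raise ValueError(f"f={f} exceeds the rank bound {Vt.shape[0]}")
    return Vt[:f].T  # columns are orthonormal directions in feature space
```

Projecting with X @ W then yields an [n_samples, f] array, presumably the kind of XW input that init_G_F expects.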
- mvcluster.utils.prepare_embeddings_from_views(As: list[spmatrix], Xs: list[ndarray], tf_idf: bool = False, beta: float = 1.0) → list[ndarray]
Preprocess all (A, X) pairs and compute final embeddings (H = A @ X).
- Parameters:
As (list of sp.spmatrix) – Adjacency matrices for each view.
Xs (list of np.ndarray) – Feature matrices for each view.
tf_idf (bool) – Whether to apply TF-IDF transformation to features.
beta (float) – Scaling factor for self-loops in adjacency normalization.
- Returns:
List of processed H embeddings (one per view).
- Return type:
list of np.ndarray
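The per-view loop can be sketched as follows, assuming the adjacency normalization is the symmetric form D^{-1/2}(A + βI)D^{-1/2}, a common GCN-style choice; the exact formula is not documented here. The TF-IDF option is left out for brevity, and `prepare_embeddings_sketch` and `_sym_normalize` are hypothetical names.

```python
import numpy as np
import scipy.sparse as sp


def _sym_normalize(adj: sp.spmatrix, beta: float) -> sp.csr_matrix:
    """Assumed normalization: D^{-1/2} (A + beta*I) D^{-1/2}."""
    a = sp.csr_matrix(adj, dtype=float) + beta * sp.eye(adj.shape[0], format="csr")
    deg = np.asarray(a.sum(axis=1)).ravel()
    inv_sqrt = np.zeros_like(deg)
    nz = deg > 0  # leave isolated nodes at zero instead of dividing by zero
    inv_sqrt[nz] = deg[nz] ** -0.5
    d = sp.diags(inv_sqrt)
    return (d @ a @ d).tocsr()


def prepare_embeddings_sketch(As, Xs, beta: float = 1.0) -> list[np.ndarray]:
    """Normalize each view's adjacency and propagate its features: H = A_hat @ X."""
    if len(As) != len(Xs):
        raise ValueError("As and Xs must have one entry per view")
    Hs = []
    for A, X in zip(As, Xs):
        A_hat = _sym_normalize(A, beta)
        H = A_hat @ np.asarray(X, dtype=float)  # sparse @ dense gives a dense array
        Hs.append(np.asarray(H))
    return Hs
```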
- mvcluster.utils.preprocess_dataset(adj: spmatrix, features: ndarray, tf_idf: bool = False, beta: float = 1.0, max_features: int = 5000) → tuple[spmatrix, ndarray]
Normalize the adjacency matrix and the feature matrix.
- Parameters:
adj (sp.spmatrix) – Sparse adjacency matrix.
features (np.ndarray) – Feature matrix (dense or sparse).
tf_idf (bool, optional) – Whether to apply TF-IDF transformation, by default False.
beta (float, optional) – Scaling factor for self-loops, by default 1.0.
max_features (int, optional) – Maximum number of feature columns to retain, by default 5000.
- Returns:
Tuple containing the normalized adjacency and processed features.
- Return type:
tuple[sp.spmatrix, np.ndarray]
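A sketch of these steps for a single view. Several details are assumptions because the docstring does not specify them: the adjacency is normalized as D^{-1/2}(A + βI)D^{-1/2}, max_features keeps the highest-variance columns, and column selection runs before TF-IDF. TF-IDF uses scikit-learn's `TfidfTransformer`, which treats the features as term counts; `preprocess_sketch` is a hypothetical name.

```python
import numpy as np
import scipy.sparse as sp
from sklearn.feature_extraction.text import TfidfTransformer


def preprocess_sketch(adj: sp.spmatrix, features, tf_idf: bool = False,
                      beta: float = 1.0, max_features: int = 5000) -> tuple[sp.csr_matrix, np.ndarray]:
    # Features: accept dense or sparse input and work densely below.
    X = np.asarray(features.toarray() if sp.issparse(features) else features, dtype=float)
    if X.shape[1] > max_features:
        # Assumption: keep the highest-variance columns, in their original order.
        keep = np.sort(np.argsort(X.var(axis=0))[::-1][:max_features])
        X = X[:, keep]
    if tf_idf:
        # TF-IDF reweighting with L2-normalized rows (scikit-learn defaults).
        X = TfidfTransformer().fit_transform(sp.csr_matrix(X)).toarray()
    # Adjacency: symmetric normalization with beta-weighted self-loops (assumed).
    a = sp.csr_matrix(adj, dtype=float) + beta * sp.eye(adj.shape[0], format="csr")
    deg = np.asarray(a.sum(axis=1)).ravel()
    inv_sqrt = np.where(deg > 0, 1.0 / np.sqrt(np.maximum(deg, 1e-12)), 0.0)
    D = sp.diags(inv_sqrt)
    return (D @ a @ D).tocsr(), X
```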
- mvcluster.utils.visualize_clusters(X, labels, method='pca', title='Cluster Visualization', view_index=None)
Visualize clustering results using PCA, SVD, or t-SNE.
- Parameters:
X (array-like or list of arrays) – Feature data or multi-view data.
labels (array-like) – Predicted labels.
method (str) – Dimensionality-reduction method: ‘pca’, ‘svd’, or ‘tsne’.
title (str) – Title of the plot.
view_index (int or None) – If X is multi-view, choose which view to plot (None = concatenate all views).
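A sketch of the same behavior using scikit-learn reducers and Matplotlib's object-oriented `Figure` API, so it runs without a display. Returning the figure and the 2D coordinates is an assumption made for testability; the documented function's return value is not specified. Note that `TruncatedSVD` needs more than two input features, and t-SNE requires a perplexity below the number of samples.

```python
import numpy as np
from matplotlib.figure import Figure
from sklearn.decomposition import PCA, TruncatedSVD
from sklearn.manifold import TSNE


def visualize_clusters_sketch(X, labels, method="pca",
                              title="Cluster Visualization", view_index=None):
    # Multi-view input: pick one view, or concatenate all views column-wise.
    if isinstance(X, (list, tuple)):
        X = X[view_index] if view_index is not None else np.hstack([np.asarray(v) for v in X])
    X = np.asarray(X, dtype=float)
    method = method.lower()
    if method == "pca":
        reducer = PCA(n_components=2)
    elif method == "svd":
        reducer = TruncatedSVD(n_components=2)
    elif method == "tsne":
        reducer = TSNE(n_components=2, perplexity=min(30.0, X.shape[0] - 1), random_state=0)
    else:
        raise ValueError("method must be 'pca', 'svd', or 'tsne'")
    coords = reducer.fit_transform(X)  # shape (n_samples, 2)
    fig = Figure(figsize=(6, 5))
    ax = fig.subplots()
    scatter = ax.scatter(coords[:, 0], coords[:, 1], c=labels, cmap="tab10", s=15)
    ax.legend(*scatter.legend_elements(), title="Cluster")
    ax.set_title(title)
    return fig, coords
```

Call `fig.savefig("clusters.png")` to write the plot to disk.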