utils module

mvcluster.utils.clustering_accuracy(y_true: list, y_pred: list) → float
mvcluster.utils.clustering_f1_score(y_true: list, y_pred: list, **kwargs) → float
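Neither metric is documented here. A common way to score a clustering against ground truth is to first match predicted clusters to true classes with the Hungarian algorithm, then compute accuracy or F1 on the matched labels. The sketch below follows that approach using scipy's linear_sum_assignment. Two things are assumptions: that mvcluster.utils uses this matching, and that **kwargs is forwarded to sklearn.metrics.f1_score. The helper names are hypothetical.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.metrics import f1_score


def _match_clusters(y_true, y_pred):
    """Return the contingency table and the optimal cluster-to-class matching."""
    t_ids, t_idx = np.unique(np.asarray(y_true), return_inverse=True)
    p_ids, p_idx = np.unique(np.asarray(y_pred), return_inverse=True)
    # counts[i, j] = number of samples in predicted cluster i with true class j
    counts = np.zeros((len(p_ids), len(t_ids)), dtype=np.int64)
    np.add.at(counts, (p_idx, t_idx), 1)
    # Negating turns the min-cost assignment into a max-overlap matching
    rows, cols = linear_sum_assignment(-counts)
    return counts, rows, cols, p_ids, t_ids


def accuracy_sketch(y_true, y_pred):
    counts, rows, cols, _, _ = _match_clusters(y_true, y_pred)
    # Fraction of samples that fall on a matched (cluster, class) pair
    return float(counts[rows, cols].sum() / counts.sum())


def f1_sketch(y_true, y_pred, **kwargs):
    # Assumes no more predicted clusters than true classes, so every
    # cluster receives a class label in the matching.
    # Assumption: kwargs are passed straight through to sklearn's f1_score.
    _, rows, cols, p_ids, t_ids = _match_clusters(y_true, y_pred)
    mapping = dict(zip(p_ids[rows], t_ids[cols]))
    remapped = np.array([mapping[p] for p in np.asarray(y_pred)])
    return float(f1_score(y_true, remapped, **kwargs))
```

With more than two classes, pass a multiclass average such as average="macro" through kwargs, because f1_score defaults to binary averaging.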
mvcluster.utils.cmat_to_psuedo_y_true_and_y_pred(cmat: ndarray) → tuple
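The function name suggests it expands a confusion matrix back into a pair of label arrays. This sketch assumes that reading: each cell (i, j) with count c yields c samples with true label i and predicted label j. It also assumes rows index true labels and columns index predicted labels, which is scikit-learn's convention. The function name cmat_to_labels_sketch is hypothetical.

```python
import numpy as np


def cmat_to_labels_sketch(cmat):
    # Assumption: rows = true labels, columns = predicted labels
    cmat = np.asarray(cmat, dtype=np.int64)
    rows, cols = np.indices(cmat.shape)
    counts = cmat.ravel()
    # Cell (i, j) contributes counts[i, j] copies of i to y_true and j to y_pred
    y_true = np.repeat(rows.ravel(), counts)
    y_pred = np.repeat(cols.ravel(), counts)
    return y_true, y_pred
```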
mvcluster.utils.datagen(dataset: str) → Tuple[List[ndarray], List[ndarray], ndarray]

Dataset loader dispatcher.

mvcluster.utils.init_G_F(XW: ndarray, k: int) → tuple

Initialize cluster assignments G and centroids F using KMeans.

Parameters:
  • XW – Array [n_samples, embedding_dim], data to cluster.

  • k – Number of clusters.

Returns:

Tuple (G, F) where:

  • G – 1D array of length n_samples, initial cluster labels.

  • F – 2D array [k, embedding_dim], initial cluster centroids.

Return type:

(np.ndarray, np.ndarray)
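A minimal sketch of the KMeans initialization described above, using scikit-learn's KMeans. This is not the library's own code. The seed parameter is a hypothetical addition for reproducibility, and n_init=10 is an assumed setting.

```python
import numpy as np
from sklearn.cluster import KMeans


def init_G_F_sketch(XW, k, seed=0):
    # seed is hypothetical; n_init=10 runs KMeans 10 times and keeps the lowest-inertia fit
    km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(XW)
    G = km.labels_            # shape (n_samples,), values in {0, ..., k-1}
    F = km.cluster_centers_   # shape (k, embedding_dim)
    return G, F
```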

mvcluster.utils.init_W(X: ndarray, f: int) → ndarray

Initialize projection matrix W using truncated SVD.

Parameters:
  • X – Array [n_samples, n_features], input data matrix.

  • f – Target embedding dimension.

Returns:

Projection matrix [n_features, f].

Return type:

np.ndarray
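One way to obtain a [n_features, f] projection by truncated SVD is to keep the top f right singular vectors of X as columns. The sketch below does this with numpy's thin SVD and does not center X. Whether init_W centers the data, or uses a randomized solver such as sklearn's TruncatedSVD, is not stated here. The function name is hypothetical.

```python
import numpy as np


def init_W_sketch(X, f):
    # Thin SVD: X = U @ diag(S) @ Vt, where Vt has shape (min(n, d), n_features)
    _, _, Vt = np.linalg.svd(np.asarray(X, dtype=np.float64), full_matrices=False)
    # Top-f right singular vectors as columns -> shape (n_features, f);
    # requires f <= min(n_samples, n_features)
    return Vt[:f].T
```

Because the columns of W are orthonormal, X @ W gives the coordinates of each sample along the f leading singular directions.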

mvcluster.utils.ordered_confusion_matrix(y_true: list, y_pred: list) → ndarray
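This function is undocumented. A plausible meaning of "ordered" is that the predicted-cluster columns are permuted so that the best cluster-to-class matching lies on the diagonal. The sketch below assumes that interpretation and finds the permutation with the Hungarian algorithm. The function name is hypothetical.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.metrics import confusion_matrix


def ordered_confusion_matrix_sketch(y_true, y_pred):
    # Rows = true labels, columns = predicted labels (square, over the label union)
    cm = confusion_matrix(y_true, y_pred)
    # Assumption: reorder columns to maximize the trace (max-overlap matching)
    _, cols = linear_sum_assignment(-cm)
    return cm[:, cols]
```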
mvcluster.utils.prepare_embeddings_from_views(As: list[spmatrix], Xs: list[ndarray], tf_idf: bool = False, beta: float = 1.0) → list[ndarray]

Preprocess all (A, X) pairs and compute final embeddings (H = A @ X).

Parameters:
  • As (list of sp.spmatrix) – Adjacency matrices for each view.

  • Xs (list of np.ndarray) – Feature matrices for each view.

  • tf_idf (bool) – Whether to apply TF-IDF transformation to features.

  • beta (float) – Scaling for self-loops in adjacency normalization.

Returns:

List of processed H embeddings (one per view).

Return type:

list of np.ndarray
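A minimal sketch of the per-view step H = A @ X, where A is normalized first. The normalization used here is an assumption: add beta-weighted self-loops, then apply symmetric degree normalization D^-1/2 (A + beta I) D^-1/2, a common choice in graph propagation. The TF-IDF option is omitted, and both function names are hypothetical.

```python
import numpy as np
import scipy.sparse as sp


def normalize_adj_sketch(A, beta=1.0):
    # Assumption: beta scales the self-loops added to the diagonal
    A_hat = sp.csr_matrix(A, dtype=np.float64) + beta * sp.eye(A.shape[0], format="csr")
    deg = np.asarray(A_hat.sum(axis=1)).ravel()
    d_inv_sqrt = np.zeros_like(deg)
    nz = deg > 0
    d_inv_sqrt[nz] = deg[nz] ** -0.5  # isolated nodes keep a zero scale
    D = sp.diags(d_inv_sqrt)
    return (D @ A_hat @ D).tocsr()


def prepare_embeddings_sketch(As, Xs, beta=1.0):
    Hs = []
    for A, X in zip(As, Xs):
        A_norm = normalize_adj_sketch(A, beta)
        # One propagation step: each node's row becomes a weighted mix of its neighbors' features
        Hs.append(np.asarray(A_norm @ X))
    return Hs
```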

mvcluster.utils.preprocess_dataset(adj: spmatrix, features: ndarray, tf_idf: bool = False, beta: float = 1.0, max_features: int = 5000) → tuple[spmatrix, ndarray]

Normalize adjacency matrix and feature matrix.

Parameters:
  • adj (sp.spmatrix) – Sparse adjacency matrix.

  • features (np.ndarray) – Feature matrix (dense or sparse).

  • tf_idf (bool, optional) – Whether to apply TF-IDF transformation, by default False.

  • beta (float, optional) – Scaling factor for self-loops, by default 1.0.

  • max_features (int, optional) – Maximum number of feature columns to retain, by default 5000.

Returns:

Tuple containing the normalized adjacency and processed features.

Return type:

tuple[sp.spmatrix, np.ndarray]
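The sketch below covers only the feature half of this function. It makes two assumptions. First, max_features keeps the columns with the largest total value, similar to how scikit-learn's CountVectorizer applies max_features. Second, tf_idf uses sklearn's TfidfTransformer, treating rows as documents and columns as term counts. The function name is hypothetical.

```python
import numpy as np
import scipy.sparse as sp
from sklearn.feature_extraction.text import TfidfTransformer


def preprocess_features_sketch(features, tf_idf=False, max_features=5000):
    X = sp.csr_matrix(features, dtype=np.float64)
    if X.shape[1] > max_features:
        # Assumption: retain the max_features columns with the largest column sums,
        # kept in their original order
        col_mass = np.asarray(X.sum(axis=0)).ravel()
        keep = np.sort(np.argsort(col_mass)[::-1][:max_features])
        X = X[:, keep]
    if tf_idf:
        # Assumption: TF-IDF with sklearn defaults (smoothed idf, L2 row normalization)
        X = TfidfTransformer().fit_transform(X)
    return X.toarray()
```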

mvcluster.utils.visualize_clusters(X, labels, method='pca', title='Cluster Visualization', view_index=None)

Visualize clustering results using PCA, SVD, or t-SNE.

Parameters:
  • X (array-like or list of arrays) – Feature data or multi-view data.

  • labels (array-like) – Predicted labels.

  • method (str) – Projection method used to reduce X to 2D: ‘pca’, ‘svd’, or ‘tsne’.

  • title (str) – Title of the plot.

  • view_index (int or None) – If X is multi-view, choose which view to plot (None = concatenate all views).
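A minimal sketch of the view selection and 2D projection described above, plus a small plotting wrapper. It assumes that 'pca' means SVD of column-centered data, 'svd' means the same without centering, and 'tsne' means sklearn's TSNE. matplotlib and TSNE are imported lazily so the projection itself needs only numpy. The function names are hypothetical.

```python
import numpy as np


def project_2d_sketch(X, method="pca", view_index=None):
    # Multi-view input: pick one view, or concatenate all views column-wise
    if isinstance(X, (list, tuple)):
        X = X[view_index] if view_index is not None else np.hstack(X)
    X = np.asarray(X, dtype=np.float64)
    if method == "pca":
        # PCA scores = SVD of the column-centered data
        U, S, _ = np.linalg.svd(X - X.mean(axis=0), full_matrices=False)
        return U[:, :2] * S[:2]
    if method == "svd":
        # Truncated SVD without centering
        U, S, _ = np.linalg.svd(X, full_matrices=False)
        return U[:, :2] * S[:2]
    if method == "tsne":
        from sklearn.manifold import TSNE  # only needed for t-SNE
        return TSNE(n_components=2, init="pca", random_state=0).fit_transform(X)
    raise ValueError(f"unknown method: {method!r}")


def plot_clusters_sketch(X, labels, method="pca", title="Cluster Visualization", view_index=None):
    import matplotlib.pyplot as plt  # only needed when actually plotting
    Z = project_2d_sketch(X, method, view_index)
    fig, ax = plt.subplots()
    ax.scatter(Z[:, 0], Z[:, 1], c=labels, cmap="tab10", s=10)
    ax.set_title(title)
    return fig
```

Concatenating views with view_index=None requires every view to have the same number of rows (samples).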