utils module

mvcluster.utils.clustering_accuracy(y_true: list, y_pred: list) → float

mvcluster.utils.clustering_f1_score(y_true: list, y_pred: list, **kwargs) → float

mvcluster.utils.cmat_to_psuedo_y_true_and_y_pred(cmat: ndarray) → tuple

mvcluster.utils.datagen(dataset: str) → Tuple[List[ndarray], List[ndarray], ndarray]

Dataset loader dispatcher.

mvcluster.utils.init_G_F(XW: ndarray, k: int) → tuple

Initialize cluster assignments G and centroids F using KMeans.

Parameters:

XW – Array [n_samples, embedding_dim], data to cluster.
k – Number of clusters.

Returns:

Tuple (G, F), where:

G – 1D array of length n_samples, initial cluster labels.
F – 2D array [k, embedding_dim], initial cluster centroids.

Return type:

(np.ndarray, np.ndarray)
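
As an illustration of the KMeans-based initialization described above, here is a minimal sketch. It is not the library's implementation: the name `init_G_F_sketch`, the `seed` argument, and the `n_init` setting are assumptions made for this example; only the use of KMeans and the shapes of G and F come from the documentation.

```python
import numpy as np
from sklearn.cluster import KMeans


def init_G_F_sketch(XW: np.ndarray, k: int, seed: int = 0):
    """Hypothetical stand-in for init_G_F: labels and centroids from KMeans."""
    km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(XW)
    G = km.labels_           # 1D array, one cluster label per sample
    F = km.cluster_centers_  # 2D array [k, embedding_dim]
    return G, F


# Two well-separated blobs in a 3-dimensional embedding space.
rng = np.random.default_rng(0)
XW = np.vstack([
    rng.normal(0.0, 0.1, size=(20, 3)),
    rng.normal(5.0, 0.1, size=(20, 3)),
])
G, F = init_G_F_sketch(XW, k=2)
print(G.shape, F.shape)
```
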

mvcluster.utils.init_W(X: ndarray, f: int) → ndarray

Initialize projection matrix W using truncated SVD.

Parameters:

X – Array [n_samples, n_features], input data matrix.
f – Target embedding dimension.

Returns:

Projection matrix [n_features, f].

Return type:

np.ndarray
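
One common way to obtain a projection matrix of shape [n_features, f] from a truncated SVD is to keep the top f right singular vectors of X. The sketch below takes that approach; whether init_W does exactly this is an assumption, and `init_W_sketch` is a hypothetical name used only here. It requires f to be at most min(n_samples, n_features).

```python
import numpy as np


def init_W_sketch(X: np.ndarray, f: int) -> np.ndarray:
    """Hypothetical stand-in for init_W: top-f right singular vectors of X."""
    # Thin SVD: X = U @ diag(s) @ Vt, singular values in descending order.
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    # Rows of Vt are right singular vectors; keep the first f as columns.
    return Vt[:f].T


rng = np.random.default_rng(0)
X = rng.normal(size=(50, 8))  # [n_samples, n_features]
W = init_W_sketch(X, f=3)     # [n_features, f]
XW = X @ W                    # projected data, [n_samples, f]
print(W.shape, XW.shape)
```

The columns of W produced this way are orthonormal, so XW is a rank-f projection of the data that could then be passed to a clustering step.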

mvcluster.utils.ordered_confusion_matrix(y_true: list, y_pred: list) → ndarray

mvcluster.utils.prepare_embeddings_from_views(As: list[spmatrix], Xs: list[ndarray], tf_idf: bool = False, beta: float = 1.0) → list[ndarray]

Preprocess all (A, X) pairs and compute final embeddings (H = A @ X).

Parameters:

As (list of sp.spmatrix) – Adjacency matrices for each view.
Xs (list of np.ndarray) – Feature matrices for each view.
tf_idf (bool) – Whether to apply TF-IDF transformation to features.
beta (float) – Scaling for self-loops in adjacency normalization.

Returns:

List of processed H embeddings (one per view).

Return type:

list of np.ndarray
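
The per-view product H = A @ X can be sketched as follows. This example deliberately omits the preprocessing step (normalization, TF-IDF) and shows only the multiplication; `embeddings_sketch` is a hypothetical name, not part of mvcluster.

```python
import numpy as np
import scipy.sparse as sp


def embeddings_sketch(As, Xs):
    """Hypothetical sketch: one embedding H = A @ X per (A, X) view pair."""
    Hs = []
    for A, X in zip(As, Xs):
        # A sparse matrix times a dense array yields a dense array.
        Hs.append(np.asarray(A @ X))
    return Hs


# Two views over the same two nodes.
A_swap = sp.csr_matrix(np.array([[0.0, 1.0], [1.0, 0.0]]))  # each node sees the other
A_self = sp.identity(2, format="csr")                       # each node sees itself
X1 = np.array([[1.0, 2.0], [3.0, 4.0]])
X2 = np.array([[5.0], [6.0]])

Hs = embeddings_sketch([A_swap, A_self], [X1, X2])
print(Hs[0])  # rows of X1 swapped: [[3. 4.] [1. 2.]]
```

Each row of H aggregates the features of that node's neighbours, weighted by the corresponding adjacency entries.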

mvcluster.utils.preprocess_dataset(adj: spmatrix, features: ndarray, tf_idf: bool = False, beta: float = 1.0, max_features: int = 5000) → tuple[spmatrix, ndarray]

Normalize adjacency matrix and feature matrix.

Parameters:

adj (sp.spmatrix) – Sparse adjacency matrix.
features (np.ndarray) – Feature matrix (dense or sparse).
tf_idf (bool, optional) – Whether to apply TF-IDF transformation, by default False.
beta (float, optional) – Scaling factor for self-loops, by default 1.0.
max_features (int, optional) – Maximum number of feature columns to retain, by default 5000.

Returns:

Tuple containing the normalized adjacency and processed features.

Return type:

tuple[sp.spmatrix, np.ndarray]
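
The documentation does not spell out the normalization scheme, so the following is only a sketch under stated assumptions: self-loops are added as A + beta * I, the result is row-normalized so each row sums to 1, and the feature matrix is capped by keeping its first max_features columns. The function name `preprocess_sketch` is hypothetical, and the TF-IDF option is left out.

```python
import numpy as np
import scipy.sparse as sp


def preprocess_sketch(adj, features, beta=1.0, max_features=5000):
    """Hypothetical sketch: self-loops, row normalization, column cap."""
    n = adj.shape[0]
    # Assumption: beta scales the identity added as self-loops.
    adj = sp.csr_matrix(adj) + beta * sp.eye(n, format="csr")
    deg = np.asarray(adj.sum(axis=1)).ravel()
    # Invert nonzero degrees only, to avoid dividing by zero.
    d_inv = np.zeros_like(deg, dtype=float)
    nonzero = deg > 0
    d_inv[nonzero] = 1.0 / deg[nonzero]
    adj_norm = sp.diags(d_inv) @ adj
    # Assumption: the cap keeps the leading columns.
    feats = np.asarray(features)[:, :max_features]
    return adj_norm, feats


# Path graph on three nodes: 0 - 1 - 2.
adj = sp.csr_matrix(np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]]))
features = np.arange(12.0).reshape(3, 4)
adj_norm, feats = preprocess_sketch(adj, features, beta=1.0, max_features=2)
print(adj_norm.toarray())
print(feats.shape)
```
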

mvcluster.utils.visualize_clusters(X, labels, method='pca', title='Cluster Visualization', view_index=None)

Visualize clustering results using PCA, SVD, or t-SNE.

Parameters:

X (array-like or list of arrays) – Feature data or multi-view data.
labels (array-like) – Predicted labels.
method (str) – Dimensionality reduction method: ‘pca’, ‘svd’, or ‘tsne’.
title (str) – Title of the plot.
view_index (int or None) – If X is multi-view, choose which view to plot (None = concatenate all views).
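
To make the view_index behaviour concrete, here is a sketch of the data preparation such a function might perform before plotting. It assumes that all views share the same samples and that "concatenate" means stacking feature columns side by side; the helper names `select_view` and `project_2d` are hypothetical, and the plotting itself is omitted.

```python
import numpy as np


def select_view(X, view_index=None):
    """Hypothetical helper: pick one view, or concatenate all along features."""
    if isinstance(X, (list, tuple)):
        data = np.hstack(X) if view_index is None else X[view_index]
    else:
        data = X
    return np.asarray(data, dtype=float)


def project_2d(data):
    """PCA to two components via SVD of the mean-centered data."""
    centered = data - data.mean(axis=0)
    _, _, Vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ Vt[:2].T


rng = np.random.default_rng(0)
views = [rng.normal(size=(10, 3)), rng.normal(size=(10, 4))]

all_views = select_view(views)               # shape (10, 7)
one_view = select_view(views, view_index=1)  # shape (10, 4)
coords = project_2d(all_views)               # shape (10, 2), ready to scatter-plot
print(all_views.shape, one_view.shape, coords.shape)
```
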
