kNN
- class scdiffeq.tools._knn.kNN(adata: AnnData, use_key: str = 'X_pca', n_neighbors: int = 20, metric: str = 'euclidean', space: Space | None = None)
Bases: object
k-Nearest Neighbors container using the voyager backend.
This class provides a kNN graph interface for single-cell data stored in AnnData objects. It uses voyager (Spotify’s HNSW implementation) for efficient approximate nearest neighbor search.
- Parameters:
adata – AnnData object containing the data.
use_key – Key to fetch data from adata (e.g., “X_pca”). Default: “X_pca”.
n_neighbors – Number of neighbors to return in queries. Default: 20.
metric – Distance metric to use. One of “euclidean”, “cosine”, “inner_product”. Default: “euclidean”.
space – Alternative to metric: a voyager.Space specified directly. If provided, it overrides the metric parameter.
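The precedence between metric and space described above can be sketched as follows. This is a minimal illustration, not the library's code: `resolve_space` is a hypothetical helper, the `Space` enum is a local stand-in for voyager.Space so the snippet runs without voyager installed, and raising `ValueError` for an unknown metric name is an assumption.

```python
from enum import Enum
from typing import Optional


class Space(Enum):
    """Local stand-in for voyager.Space (assumption: same three member names)."""

    Euclidean = "euclidean"
    Cosine = "cosine"
    InnerProduct = "inner_product"


def resolve_space(metric: str = "euclidean", space: Optional[Space] = None) -> Space:
    """Hypothetical helper: an explicit space wins, otherwise map the metric name."""
    if space is not None:
        # A directly supplied space overrides the metric string.
        return space
    try:
        return Space(metric)
    except ValueError:
        valid = ", ".join(s.value for s in Space)
        raise ValueError(f"Unknown metric {metric!r}; expected one of: {valid}") from None


default_space = resolve_space()
overridden = resolve_space("cosine", space=Space.InnerProduct)
```

Here `overridden` is the inner-product space even though "cosine" was passed, mirroring the documented rule that space takes priority.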
- adata
The AnnData object.
- use_key
Key used to fetch data.
- n_neighbors
Number of neighbors for queries.
- space
The voyager.Space used for distance computation.
- __init__(adata: AnnData, use_key: str = 'X_pca', n_neighbors: int = 20, metric: str = 'euclidean', space: Space | None = None)
- property index: Index
The voyager Index object.
- query(X_query: ndarray, n_neighbors: int | None = None, include_distances: bool = False) → ndarray | Tuple[ndarray, ndarray]
Query the kNN index for nearest neighbors.
- Parameters:
X_query – Query points of shape (n_queries, n_dim).
n_neighbors – Number of neighbors to return. If None, uses self.n_neighbors.
include_distances – If True, also return distances.
- Returns:
If include_distances is False: neighbors, an array of shape (n_queries, n_neighbors) with neighbor indices.
If include_distances is True: a tuple of (neighbors, distances) arrays.
- Return type:
ndarray | Tuple[ndarray, ndarray]
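To make the two return shapes of query() concrete, here is a small sketch that mimics its contract with an exact, brute-force Euclidean search over plain Python lists. It is only an illustration under stated assumptions: `brute_force_query` is a hypothetical name, and the class itself uses an approximate voyager index, so its results and distance values may differ.

```python
import math
from typing import List, Sequence, Tuple, Union

Neighbors = List[List[int]]
Distances = List[List[float]]


def brute_force_query(
    X_ref: Sequence[Sequence[float]],
    X_query: Sequence[Sequence[float]],
    n_neighbors: int = 20,
    include_distances: bool = False,
) -> Union[Neighbors, Tuple[Neighbors, Distances]]:
    """Exact Euclidean kNN (hypothetical stand-in for an approximate index)."""
    neighbors: Neighbors = []
    distances: Distances = []
    for q in X_query:
        # Rank every reference point by its distance to this query point.
        ranked = sorted((math.dist(q, x), i) for i, x in enumerate(X_ref))
        top = ranked[:n_neighbors]
        neighbors.append([i for _, i in top])
        distances.append([d for d, _ in top])
    if include_distances:
        return neighbors, distances
    return neighbors


X_ref = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (5.0, 5.0)]
X_query = [(0.1, 0.0), (4.0, 4.0)]

# One row of neighbor indices per query point.
idx = brute_force_query(X_ref, X_query, n_neighbors=2)
# With include_distances=True the result is a (neighbors, distances) pair.
idx_d, dist = brute_force_query(X_ref, X_query, n_neighbors=2, include_distances=True)
```

Each row of `idx` holds indices into the reference data, which is the form that count() expects as query_result.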
- count(query_result: ndarray, obs_key: str, max_only: bool = False, n_neighbors: int | None = None) → List[dict] | List[str]
Count neighbor annotations from query results.
- Parameters:
query_result – Array of neighbor indices from query().
obs_key – Key in adata.obs to count.
max_only – If True, return only the most frequent annotation per query.
n_neighbors – Number of neighbors (for reshaping). If None, uses self.n_neighbors.
- Returns:
If max_only is False: list of dicts mapping annotation values to counts.
If max_only is True: list of most frequent annotation values.
- Return type:
List[dict] | List[str]
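The counting step can be pictured with `collections.Counter`. The sketch below is an assumption-laden illustration rather than the method's implementation: `count_annotations` is a hypothetical name, `labels` stands in for the adata.obs column selected by obs_key, and the neighbor indices are assumed to be already arranged as one row per query.

```python
from collections import Counter
from typing import Dict, List, Sequence, Union


def count_annotations(
    query_result: Sequence[Sequence[int]],
    labels: Sequence[str],
    max_only: bool = False,
) -> Union[List[Dict[str, int]], List[str]]:
    """Hypothetical sketch: tally the labels of each query's neighbors."""
    counts = [Counter(labels[i] for i in row) for row in query_result]
    if max_only:
        # Counter.most_common breaks ties by first-encountered order;
        # how the library resolves ties is not stated in the source.
        return [c.most_common(1)[0][0] for c in counts]
    return [dict(c) for c in counts]


labels = ["HSC", "HSC", "Mono", "Neu", "Mono"]  # one annotation per cell
query_result = [[0, 1, 2], [2, 4, 3]]           # neighbor indices per query

per_query_counts = count_annotations(query_result, labels)
top_labels = count_annotations(query_result, labels, max_only=True)
```

With max_only left at False each query yields a full tally, while max_only=True collapses each tally to a single label.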
- aggregate(X_query: ndarray, obs_key: str, max_only: bool = False, n_neighbors: int | None = None) → DataFrame
Query neighbors and aggregate annotation counts.
Combines query() and count() into a single operation.
- Parameters:
X_query – Query points of shape (n_queries, n_dim).
obs_key – Key in adata.obs to aggregate.
max_only – If True, return only the most frequent annotation per query.
n_neighbors – Number of neighbors. If None, uses self.n_neighbors.
- Returns:
DataFrame with aggregated counts or most frequent annotations.
- multi_aggregate(X_query: ndarray, obs_key: str, max_only: bool = False, n_neighbors: int | None = None) → List[DataFrame] | DataFrame
Aggregate annotations for multiple query sets.
- Parameters:
X_query – Multiple query sets of shape (n_sets, n_queries, n_dim).
obs_key – Key in adata.obs to aggregate.
max_only – If True, return only the most frequent annotation per query.
n_neighbors – Number of neighbors. If None, uses self.n_neighbors.
- Returns:
If max_only is False: list of DataFrames, one per query set.
If max_only is True: single DataFrame with columns for each query set.
- Return type:
List[DataFrame] | DataFrame
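The difference between the two return forms is easiest to see on a toy input of shape (n_sets, n_queries, n_dim). The following sketch is illustrative only and makes several assumptions: `multi_aggregate_sketch` and `nearest_label_counts` are hypothetical names, search is exact Euclidean rather than voyager's approximate search, and plain lists and dicts stand in for the DataFrames the method returns.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Union

Point = Sequence[float]


def nearest_label_counts(
    X_ref: Sequence[Point], labels: Sequence[str], X_query: Sequence[Point], k: int
) -> List[Counter]:
    """Label tallies among the k nearest reference points, one Counter per query."""
    tallies = []
    for q in X_query:
        order = sorted(range(len(X_ref)), key=lambda i: math.dist(q, X_ref[i]))
        tallies.append(Counter(labels[i] for i in order[:k]))
    return tallies


def multi_aggregate_sketch(
    X_ref: Sequence[Point],
    labels: Sequence[str],
    X_sets: Sequence[Sequence[Point]],
    n_neighbors: int = 2,
    max_only: bool = False,
) -> Union[List[List[Dict[str, int]]], Dict[int, List[str]]]:
    """Hypothetical sketch: aggregate each query set independently."""
    per_set = [nearest_label_counts(X_ref, labels, X_q, n_neighbors) for X_q in X_sets]
    if max_only:
        # One "column" per query set, one entry per query point.
        return {s: [c.most_common(1)[0][0] for c in cs] for s, cs in enumerate(per_set)}
    # One table of per-query counts for each query set.
    return [[dict(c) for c in cs] for cs in per_set]


X_ref = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
labels = ["A", "A", "B", "B"]
# Two query sets, two query points each, two dimensions: shape (2, 2, 2).
X_sets = [[(0.0, 0.2), (5.0, 5.5)], [(0.1, 0.9), (0.2, 0.1)]]

tables = multi_aggregate_sketch(X_ref, labels, X_sets)
columns = multi_aggregate_sketch(X_ref, labels, X_sets, max_only=True)
```

`tables` holds one entry per query set, whereas `columns` is a single mapping keyed by query set, which parallels the list-of-DataFrames versus single-DataFrame distinction above.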