`kNN`

class scdiffeq.tools._knn.kNN(adata: AnnData, use_key: str = 'X_pca', n_neighbors: int = 20, metric: str = 'euclidean', space: Space | None = None)

Bases: object

k-nearest-neighbors (kNN) container backed by voyager.

This class provides a kNN graph interface for single-cell data stored in AnnData objects. It uses voyager (Spotify’s HNSW implementation) for efficient approximate nearest neighbor search.

Parameters:

adata – AnnData object containing the data.
use_key – Key to fetch data from adata (e.g., “X_pca”). Default: “X_pca”.
n_neighbors – Number of neighbors to return in queries. Default: 20.
metric – Distance metric to use. One of “euclidean”, “cosine”, “inner_product”. Default: “euclidean”.
space – Alternative to metric: a voyager.Space specified directly. If provided, it overrides the metric parameter.
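
The precedence between metric and space can be sketched as follows. This is a minimal illustration, not the class's actual code: the `Space` enum here is a local stand-in for voyager.Space so the snippet runs without voyager installed, `resolve_space` is a hypothetical helper name, and raising `ValueError` for an unknown metric is an assumption.

```python
from enum import Enum
from typing import Optional


class Space(Enum):
    """Local stand-in for voyager.Space (assumption: same three members)."""

    Euclidean = "euclidean"
    Cosine = "cosine"
    InnerProduct = "inner_product"


def resolve_space(metric: str = "euclidean", space: Optional[Space] = None) -> Space:
    """Hypothetical helper: pick the distance space from the two arguments."""
    # An explicitly supplied space overrides the metric string.
    if space is not None:
        return space
    try:
        return Space(metric)
    except ValueError:
        # Assumption: unknown metric names are rejected.
        raise ValueError(f"unknown metric: {metric!r}") from None


default_space = resolve_space()
cosine_space = resolve_space("cosine")
overridden = resolve_space("cosine", space=Space.InnerProduct)
```

Here `overridden` is `Space.InnerProduct` even though `metric="cosine"` was passed, which mirrors the documented rule that space takes priority.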

adata: The AnnData object.

use_key: Key used to fetch data.

n_neighbors: Number of neighbors for queries.

space: The voyager.Space used for distance computation.

__init__(adata: AnnData, use_key: str = 'X_pca', n_neighbors: int = 20, metric: str = 'euclidean', space: Space | None = None)

property X: ndarray: Fetch the data array from adata.

property n_dim: int: Number of dimensions in the data.

property n_obs: int: Number of observations (cells) in the index.

property index: Index: The voyager Index object.

_build() → None: Build the kNN index by adding all items from the data.

query(X_query: ndarray, n_neighbors: int | None = None, include_distances: bool = False) → ndarray | Tuple[ndarray, ndarray]

Query the kNN index for nearest neighbors.

Parameters:

X_query – Query points of shape (n_queries, n_dim).
n_neighbors – Number of neighbors to return. If None, uses self.n_neighbors.
include_distances – If True, also return distances.

Returns:

If include_distances is False: neighbors, an array of shape (n_queries, n_neighbors) with neighbor indices.
If include_distances is True: a tuple of (neighbors, distances) arrays.

Return type:

ndarray | Tuple[ndarray, ndarray]
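
To make the two return shapes concrete, here is a dependency-free sketch that uses an exact brute-force search in place of the approximate voyager index. The function name `brute_force_query` and the toy data are illustrative only. Plain lists stand in for ndarray, and the distances are ordinary Euclidean values, which may not match what the backend reports.

```python
import math
from typing import List, Sequence, Tuple, Union

Point = Sequence[float]


def brute_force_query(
    X: Sequence[Point],
    X_query: Sequence[Point],
    n_neighbors: int = 2,
    include_distances: bool = False,
) -> Union[List[List[int]], Tuple[List[List[int]], List[List[float]]]]:
    """Exact Euclidean kNN; a stand-in for the approximate index lookup."""
    neighbors, distances = [], []
    for q in X_query:
        # Rank every indexed point by its distance to the query point.
        ranked = sorted((math.dist(q, x), i) for i, x in enumerate(X))[:n_neighbors]
        neighbors.append([i for _, i in ranked])
        distances.append([d for d, _ in ranked])
    return (neighbors, distances) if include_distances else neighbors


X = [(0.0, 0.0), (1.0, 0.0), (0.0, 3.0), (5.0, 5.0)]  # 4 indexed points, n_dim = 2
X_query = [(0.1, 0.0), (4.0, 4.0)]                    # shape (n_queries, n_dim)

idx = brute_force_query(X, X_query, n_neighbors=2)
idx_d, dist = brute_force_query(X, X_query, n_neighbors=2, include_distances=True)
print(idx)  # [[0, 1], [3, 2]]
```

Each row of the result corresponds to one query point and holds `n_neighbors` indices into the indexed data, so the output has shape (n_queries, n_neighbors) in both modes.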

_count_values(col: Series) → dict: Count value occurrences in a Series.

_max_count(col: Series) → str: Get the most frequent value in a Series.

count(query_result: ndarray, obs_key: str, max_only: bool = False, n_neighbors: int | None = None) → List[dict] | List[str]

Count neighbor annotations from query results.

Parameters:

query_result – Array of neighbor indices from query().
obs_key – Key in adata.obs to count.
max_only – If True, return only the most frequent annotation per query.
n_neighbors – Number of neighbors (for reshaping). If None, uses self.n_neighbors.

Returns:

If max_only is False: a list of dicts mapping annotation values to counts.
If max_only is True: a list of the most frequent annotation value per query.

Return type:

List[dict] | List[str]
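
The counting step can be illustrated with the standard library alone. In this sketch, `count_annotations` is a hypothetical name, the `labels` list stands in for the adata.obs[obs_key] column, and ties are broken by first occurrence (the behavior of `Counter.most_common`); the real method may resolve ties differently.

```python
from collections import Counter
from typing import Dict, List, Sequence, Union


def count_annotations(
    query_result: Sequence[Sequence[int]],
    labels: Sequence[str],
    max_only: bool = False,
) -> Union[List[Dict[str, int]], List[str]]:
    """Tally the labels of each query's neighbors."""
    out = []
    for row in query_result:
        # Look up the annotation of every neighbor index in this row.
        counts = Counter(labels[i] for i in row)
        out.append(counts.most_common(1)[0][0] if max_only else dict(counts))
    return out


labels = ["HSC", "HSC", "Mono", "Neu"]  # illustrative stand-in for adata.obs[obs_key]
query_result = [[0, 1, 2], [3, 2, 3]]   # neighbor indices for two queries

counts = count_annotations(query_result, labels)
top = count_annotations(query_result, labels, max_only=True)
print(counts)  # [{'HSC': 2, 'Mono': 1}, {'Neu': 2, 'Mono': 1}]
print(top)     # ['HSC', 'Neu']
```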

aggregate(X_query: ndarray, obs_key: str, max_only: bool = False, n_neighbors: int | None = None) → DataFrame

Query neighbors and aggregate annotation counts.

Combines query() and count() into a single operation.

Parameters:

X_query – Query points of shape (n_queries, n_dim).
obs_key – Key in adata.obs to aggregate.
max_only – If True, return only the most frequent annotation per query.
n_neighbors – Number of neighbors. If None, uses self.n_neighbors.

Returns:

DataFrame with aggregated counts or most frequent annotations.

multi_aggregate(X_query: ndarray, obs_key: str, max_only: bool = False, n_neighbors: int | None = None) → List[DataFrame] | DataFrame

Aggregate annotations for multiple query sets.

Parameters:

X_query – Multiple query sets of shape (n_sets, n_queries, n_dim).
obs_key – Key in adata.obs to aggregate.
max_only – If True, return only the most frequent annotation per query.
n_neighbors – Number of neighbors. If None, uses self.n_neighbors.

Returns:

If max_only is False: a list of DataFrames, one per query set.
If max_only is True: a single DataFrame with one column per query set.

Return type:

List[DataFrame] | DataFrame
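
The difference between the two output layouts is easiest to see on a toy input of shape (n_sets, n_queries, n_dim). The sketch below is a plain-Python approximation under stated assumptions: exact search replaces the voyager index, nested lists and a dict replace the DataFrames, and keying the combined result by set position is a guess at how the columns might be labeled. The names `neighbor_counts` and `multi_aggregate_sketch` are hypothetical.

```python
import math
from collections import Counter


def neighbor_counts(X, labels, q, k):
    """Label counts among the k exact nearest neighbors of q."""
    nearest = sorted(range(len(X)), key=lambda i: math.dist(q, X[i]))[:k]
    return Counter(labels[i] for i in nearest)


def multi_aggregate_sketch(X, labels, X_query, k=2, max_only=False):
    """Aggregate each query set separately, then choose the output layout."""
    per_set = [
        [neighbor_counts(X, labels, q, k) for q in query_set]
        for query_set in X_query
    ]
    if not max_only:
        # One table of counts per query set.
        return [[dict(c) for c in counts] for counts in per_set]
    # One combined table: a column of top labels per query set.
    return {
        s: [c.most_common(1)[0][0] for c in counts]
        for s, counts in enumerate(per_set)
    }


X = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
labels = ["A", "A", "B", "B"]
X_query = [
    [(0.0, 0.2), (5.0, 5.2)],  # query set 0
    [(4.9, 5.0), (0.1, 0.0)],  # query set 1
]

tables = multi_aggregate_sketch(X, labels, X_query)
combined = multi_aggregate_sketch(X, labels, X_query, max_only=True)
print(combined)  # {0: ['A', 'B'], 1: ['B', 'A']}
```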

__repr__() → str: String representation of the kNN instance.
