kNN

class scdiffeq.tools._knn.kNN(adata: AnnData, use_key: str = 'X_pca', n_neighbors: int = 20, metric: str = 'euclidean', space: Space | None = None)

Bases: object

k-Nearest Neighbors container using the voyager backend.

This class provides a kNN graph interface for single-cell data stored in AnnData objects. It uses voyager (Spotify’s HNSW implementation) for efficient approximate nearest neighbor search.

Parameters:
  • adata – AnnData object containing the data.

  • use_key – Key to fetch data from adata (e.g., “X_pca”). Default: “X_pca”.

  • n_neighbors – Number of neighbors to return in queries. Default: 20.

  • metric – Distance metric to use. One of “euclidean”, “cosine”, “inner_product”. Default: “euclidean”.

  • space – Alternative to metric: a voyager.Space passed directly. If provided, it overrides the metric parameter.

adata

The AnnData object.

use_key

Key used to fetch data.

n_neighbors

Number of neighbors for queries.

space

The voyager.Space used for distance computation.

__init__(adata: AnnData, use_key: str = 'X_pca', n_neighbors: int = 20, metric: str = 'euclidean', space: Space | None = None)

property X: ndarray

Fetch the data array from adata.

property n_dim: int

Number of dimensions in the data.

property n_obs: int

Number of observations (cells) in the index.

property index: Index

The voyager Index object.

_build() → None

Build the kNN index by adding all items from the data.

query(X_query: ndarray, n_neighbors: int | None = None, include_distances: bool = False) → ndarray | Tuple[ndarray, ndarray]

Query the kNN index for nearest neighbors.

Parameters:
  • X_query – Query points of shape (n_queries, n_dim).

  • n_neighbors – Number of neighbors to return. If None, uses self.n_neighbors.

  • include_distances – If True, also return distances.

Returns:

If include_distances is False, an array of shape (n_queries, n_neighbors) with neighbor indices. If include_distances is True, a tuple of (neighbors, distances) arrays.

Return type:

ndarray | Tuple[ndarray, ndarray]
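The two return shapes of query() can be illustrated with a small exact search. This is only a sketch: brute_force_query is a hypothetical helper, it uses nested lists instead of ndarrays, and it ranks every reference point by Euclidean distance rather than using voyager's approximate HNSW index, so it shows the shape of the result and not the library's implementation.

```python
import math


def brute_force_query(X_ref, X_query, n_neighbors, include_distances=False):
    """Exact stand-in for a kNN query (hypothetical helper, not part of scdiffeq)."""
    neighbors, distances = [], []
    for q in X_query:
        # Rank every reference point by its Euclidean distance to the query point.
        ranked = sorted(range(len(X_ref)), key=lambda i: math.dist(q, X_ref[i]))
        nearest = ranked[:n_neighbors]
        neighbors.append(nearest)
        distances.append([math.dist(q, X_ref[i]) for i in nearest])
    if include_distances:
        # Mirrors the documented tuple form: (neighbors, distances).
        return neighbors, distances
    return neighbors


# Four reference points and two query points in two dimensions.
X_ref = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [5.0, 5.0]]
X_query = [[0.1, 0.0], [4.0, 4.0]]

neighbors = brute_force_query(X_ref, X_query, n_neighbors=2)
neighbors_d, distances = brute_force_query(
    X_ref, X_query, n_neighbors=2, include_distances=True
)
print(neighbors)  # one row of neighbor indices per query point
```

Each row holds the reference indices for one query point, nearest first, which is the form count() expects as query_result.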

_count_values(col: Series) → dict

Count value occurrences in a Series.

_max_count(col: Series) → str

Get the most frequent value in a Series.

count(query_result: ndarray, obs_key: str, max_only: bool = False, n_neighbors: int | None = None) → List[dict] | List[str]

Count neighbor annotations from query results.

Parameters:
  • query_result – Array of neighbor indices from query().

  • obs_key – Key in adata.obs to count.

  • max_only – If True, return only the most frequent annotation per query.

  • n_neighbors – Number of neighbors (for reshaping). If None, uses self.n_neighbors.

Returns:

If max_only is False, a list of dicts mapping annotation values to counts. If max_only is True, a list of the most frequent annotation values.

Return type:

List[dict] | List[str]
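The counting step can be sketched with the standard library. The helper count_annotations and the plain list of labels (standing in for a column of adata.obs) are assumptions made for illustration; how the real method breaks ties is not stated here, and this sketch simply keeps the label seen first.

```python
from collections import Counter


def count_annotations(query_result, labels, max_only=False):
    """Tally the labels of each query's neighbors (hypothetical helper)."""
    # One Counter per query row: label -> number of neighbors carrying it.
    counts = [Counter(labels[i] for i in row) for row in query_result]
    if max_only:
        # Keep only the most frequent label for each query.
        return [c.most_common(1)[0][0] for c in counts]
    return [dict(c) for c in counts]


# Labels for five reference cells, indexed like the rows of the kNN index.
labels = ["HSC", "HSC", "Monocyte", "Neutrophil", "Neutrophil"]
# Neighbor indices for two queries, three neighbors each.
query_result = [[0, 1, 2], [3, 4, 2]]

per_query_counts = count_annotations(query_result, labels)
top_labels = count_annotations(query_result, labels, max_only=True)
print(per_query_counts)
print(top_labels)
```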

aggregate(X_query: ndarray, obs_key: str, max_only: bool = False, n_neighbors: int | None = None) → DataFrame

Query neighbors and aggregate annotation counts.

Combines query() and count() into a single operation.

Parameters:
  • X_query – Query points of shape (n_queries, n_dim).

  • obs_key – Key in adata.obs to aggregate.

  • max_only – If True, return only the most frequent annotation per query.

  • n_neighbors – Number of neighbors. If None, uses self.n_neighbors.

Returns:

DataFrame with aggregated counts or most frequent annotations.

multi_aggregate(X_query: ndarray, obs_key: str, max_only: bool = False, n_neighbors: int | None = None) → List[DataFrame] | DataFrame

Aggregate annotations for multiple query sets.

Parameters:
  • X_query – Multiple query sets of shape (n_sets, n_queries, n_dim).

  • obs_key – Key in adata.obs to aggregate.

  • max_only – If True, return only the most frequent annotation per query.

  • n_neighbors – Number of neighbors. If None, uses self.n_neighbors.

Returns:

If max_only is False, a list of DataFrames, one per query set. If max_only is True, a single DataFrame with a column for each query set.

Return type:

List[DataFrame] | DataFrame
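For the max_only=True case, the idea of one column per query set can be sketched as follows. Everything here is an assumption for illustration: top_label and multi_top_labels are hypothetical helpers, the search is exact rather than approximate, and a dict keyed by set position stands in for the DataFrame, whose actual column names are not documented here.

```python
import math
from collections import Counter


def top_label(q, X_ref, labels, k):
    """Most frequent label among the k nearest reference points (exact search)."""
    nearest = sorted(range(len(X_ref)), key=lambda i: math.dist(q, X_ref[i]))[:k]
    return Counter(labels[i] for i in nearest).most_common(1)[0][0]


def multi_top_labels(X_sets, X_ref, labels, k):
    """Build one column of top labels per query set, keyed by set position."""
    return {
        s: [top_label(q, X_ref, labels, k) for q in queries]
        for s, queries in enumerate(X_sets)
    }


# Six labeled reference points forming two clusters.
X_ref = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [9.0, 9.0], [9.0, 10.0], [10.0, 9.0]]
labels = ["A", "A", "B", "C", "C", "B"]

# Query sets of shape (n_sets, n_queries, n_dim) = (2, 2, 2).
X_sets = [
    [[0.1, 0.1], [9.1, 9.1]],
    [[0.2, 0.0], [8.0, 8.0]],
]

table = multi_top_labels(X_sets, X_ref, labels, k=3)
print(table)  # column per query set, row per query
```

Query sets of this shape could, for example, hold the same cells at successive time points, so that each column records the majority annotation of their neighborhoods at one step.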

__repr__() → str

String representation of the kNN instance.