Preprocessing

preprocessing.dendrogram(adata, cluster_header, *, tl_kwargs={}, pl_kwargs={}, save=False, figsize=None, output_folder='', outputfilename_suffix='')

Generating a dendrogram from the AnnData object.

Parameters

adata: AnnData

Annotated data matrix.

cluster_header: str

Column in adata.obs storing cell annotation. Passed into scanpy’s dendrogram as groupby.

tl_kwargs: dict

Additional parameters to pass to sc.tl.dendrogram.

pl_kwargs: dict

Additional parameters to pass to sc.pl.dendrogram.

save: bool | str (default: False)

Whether to save the plot in output_folder. If a string, it selects the file type to save as (‘png’ (default), ‘svg’, ‘pdf’).

figsize: tuple (default: (12, 2))

figure.figsize for plt.rc_context.

output_folder: str (default: “”)

Output folder. Created if it doesn’t exist.

outputfilename_suffix: str (default: “”)

Suffix for all output files.

Returns

Does not return anything. Adds adata.uns[“dendrogram_{cluster_header}”] to the passed-in adata.
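The interplay of save, output_folder, and outputfilename_suffix can be illustrated with a minimal sketch. The helper name resolve_save_path, the basename argument, and the file naming scheme are hypothetical and are not part of this package; only the meaning of the three parameters is taken from the descriptions above.

```python
import os


def resolve_save_path(save, output_folder="", outputfilename_suffix="",
                      basename="dendrogram"):
    """Hypothetical helper: map the documented `save` argument to a file path.

    Returns None when nothing should be saved.
    """
    if save is False or save is None:
        return None
    # True selects the documented default format; a string names the format.
    ext = "png" if save is True else str(save).lstrip(".")
    if ext not in ("png", "svg", "pdf"):
        raise ValueError(f"unsupported format: {ext!r}")
    if output_folder:
        # Mirrors the documented behaviour: the folder is created if missing.
        os.makedirs(output_folder, exist_ok=True)
    # The naming scheme below is illustrative only (an assumption).
    filename = f"{basename}{outputfilename_suffix}.{ext}"
    return os.path.join(output_folder, filename)
```

Under these assumptions, save=True and save="png" resolve to the same path, while save=False skips writing entirely.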

preprocessing.prep_medians(adata, cluster_header, use_mean=False, positive_genes_only=True, plot=False)

Calculating the median expression matrix. Subsetting adata if positive_genes_only = True.

Parameters

adata: AnnData

Annotated data matrix.

cluster_header: str

Column in adata.obs storing cell annotation.

use_mean: bool (default: False)

Whether to use the mean instead of the median for the minimum gene expression threshold.

positive_genes_only: bool (default: True)

Whether to subset AnnData to only have genes with median/mean expression greater than 0.

Returns

adata: AnnData

AnnData with median expression values stored in adata.varm[“medians_{cluster_header}”].
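As a rough illustration of the positive_genes_only filter, the sketch below works on a plain gene-by-cluster DataFrame rather than an AnnData object. It assumes that a gene counts as positive when its median is greater than 0 in at least one cluster; the exact criterion used by prep_medians is not stated here, so treat that rule as an assumption.

```python
import pandas as pd

# Toy gene-by-cluster median matrix (rows: genes, columns: clusters).
medians = pd.DataFrame(
    {"c1": [2.0, 0.0, 0.0], "c2": [3.0, 0.0, 1.5]},
    index=["geneA", "geneB", "geneC"],
)

# Assumed rule: keep genes with a median above 0 in at least one cluster.
keep = (medians > 0).any(axis=1)
positive = medians.loc[keep]

print(list(positive.index))
```

Here geneB is dropped because its median is 0 in every cluster.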

preprocessing.get_medians(adata, cluster_header, use_mean=False)

Calculating the median (mean) expression per gene for each cluster_header.

Parameters

adata: AnnData

Annotated data matrix.

cluster_header: str

Column in adata.obs storing cell annotation.

use_mean: bool (default: False)

Whether to use the mean instead of the median for the minimum gene expression threshold.

Returns

cluster_medians: pd.DataFrame

Gene-by-cluster median (mean) expression dataframe.
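The shape of the returned dataframe can be pictured with a small pandas sketch. It is not the package's implementation: the function name cluster_medians_sketch and the toy inputs are hypothetical, and a dense cells-by-genes DataFrame stands in for adata.X together with the adata.obs[cluster_header] column.

```python
import numpy as np
import pandas as pd


def cluster_medians_sketch(expr, labels, use_mean=False):
    """Per-gene median (or mean) within each cluster.

    expr: cells-by-genes DataFrame; labels: one cluster label per cell.
    """
    grouped = expr.groupby(labels)
    per_cluster = grouped.mean() if use_mean else grouped.median()
    # Transpose so rows are genes and columns are clusters.
    return per_cluster.T


expr = pd.DataFrame(
    np.array([[0.0, 2.0], [4.0, 2.0], [1.0, 0.0], [3.0, 0.0], [5.0, 6.0]]),
    columns=["geneA", "geneB"],
)
labels = pd.Series(["c1", "c1", "c2", "c2", "c2"], name="cluster")

medians = cluster_medians_sketch(expr, labels)
means = cluster_medians_sketch(expr, labels, use_mean=True)
```

In this toy input, geneB has a median of 0 in cluster c2 but a mean of 2, which shows why the use_mean switch can change which genes pass a positive-expression filter.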

preprocessing.prep_binary_scores(adata, cluster_header, medians_header='medians_')

Calculating the binary scores of each gene per cluster_header.

Parameters

adata: AnnData

Annotated data matrix.

cluster_header: str

Column in adata.obs storing cell annotation.

medians_header: str (default: “medians_{cluster_header}”)

Key in adata.varm storing median expression matrix.

Returns

adata: AnnData

AnnData with binary scores stored in adata.varm[“binary_scores_{cluster_header}”].

preprocessing.plot_varm(adata, varm_key, nonzero=False, scale=None, figsize=(6, 4), show=True, save=False, output_folder='')

Plotting a histogram of the per-gene, per-cluster values stored in adata.varm[varm_key] (medians or binary scores).

Parameters

adata: AnnData

Annotated data matrix.

varm_key: str

Key in adata.varm storing calculated medians or binary scores.

nonzero: bool

Whether to remove zeros from histogram.

scale: str

How to scale the y-axis.

figsize: tuple

Width and height of plot.

show: bool

Whether to show the plot.

save: bool | str (default: False)

Whether to save plot. If string, choose the type of file to save as (“png”, “svg”, “pdf”).

output_folder: str (default: “”)

Output folder for output files.

Returns

fig: matplotlib.pyplot.figure

Histogram of adata.varm[varm_key].
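The effect of the nonzero flag can be sketched with NumPy alone. The helper histogram_values is hypothetical and the matrix is toy data; the sketch only assumes that the histogram is built from all entries of the gene-by-cluster matrix flattened into one vector.

```python
import numpy as np


def histogram_values(matrix, nonzero=False):
    """Flatten a gene-by-cluster matrix into the values to be binned."""
    values = np.asarray(matrix, dtype=float).ravel()
    if nonzero:
        # Drop exact zeros so they do not dominate the first bin.
        values = values[values != 0]
    return values


matrix = np.array([[0.0, 0.5], [1.0, 0.0], [0.25, 0.75]])
all_values = histogram_values(matrix)
nonzero_values = histogram_values(matrix, nonzero=True)

# Bin counts as a plotting backend would compute them.
counts, edges = np.histogram(nonzero_values, bins=4, range=(0.0, 1.0))
```

With sparse expression data most entries are zero, so removing them usually makes the rest of the distribution easier to read.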

preprocessing.spaceTx_genefilter(adata, lower_percentile=0.1, upper_percentile=0.99, min_txLength=700, species='human', species_dict=None, gencode_folder='gencode_annotation')

Filtering genes for spatial gene probe panel design.

Parameters

adata: AnnData

Annotated data matrix.

lower_percentile: float (default: 0.1)

Lower percentile cutoff used to filter nonzero median gene expression.

upper_percentile: float (default: 0.99)

Upper percentile cutoff used to filter nonzero median gene expression.

min_txLength: int (default: 700)

Minimum transcript length.

species: [“human”, “mouse”, “other”] (default: “human”)

Species relating to gencode_annotation.

Returns

adata: AnnData

Subset AnnData based on lower_percentile, upper_percentile, min_txLength.
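A minimal sketch of this kind of filter, assuming one expression summary value per gene and a per-gene transcript length table, might look as follows. The function percentile_length_filter, its argument names, the inclusive bounds, and the toy data are all hypothetical; the package's actual handling of the GENCODE annotation and of the median matrix may differ.

```python
import pandas as pd


def percentile_length_filter(gene_expr, tx_length, lower=0.1, upper=0.99,
                             min_length=700):
    """Return gene names that pass the expression and length cutoffs.

    gene_expr: per-gene expression summary (Series indexed by gene).
    tx_length: per-gene transcript length (Series indexed by gene).
    """
    # Cutoffs are computed on nonzero values only, as the parameters describe.
    nonzero = gene_expr[gene_expr > 0]
    lo, hi = nonzero.quantile(lower), nonzero.quantile(upper)
    in_range = (gene_expr >= lo) & (gene_expr <= hi)
    # Genes missing from the length table compare as False and are dropped.
    long_enough = tx_length.reindex(gene_expr.index) >= min_length
    return gene_expr.index[in_range & long_enough]


genes = [f"g{i}" for i in range(11)]
gene_expr = pd.Series([float(i) for i in range(11)], index=genes)
tx_length = pd.Series(1000, index=genes)
tx_length["g5"] = 500  # too short to pass min_length

selected = percentile_length_filter(gene_expr, tx_length)
```

In this toy input the lowest and highest expressed genes fall outside the cutoffs, and g5 is removed for its short transcript.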