Preprocessing

preprocessing.dendrogram(adata, cluster_header, *, tl_kwargs={}, pl_kwargs={}, save=False, figsize=None, output_folder='', outputfilename_suffix='')

Generating a dendrogram from the AnnData object.

Parameters

adata: AnnData
Annotated data matrix.

cluster_header: str
Column in adata.obs storing cell annotation. Passed into scanpy’s dendrogram as groupby.

tl_kwargs: dict
Additional parameters to pass to sc.tl.dendrogram.

pl_kwargs: dict
Additional parameters to pass to sc.pl.dendrogram.

save: bool | str (default: False)
Whether to save the plot in output_folder. If a string, it selects the file type to save as (‘png’ (default), ‘svg’, ‘pdf’).

figsize: tuple (default: (12, 2))
figure.figsize for plt.rc_context.

output_folder: str (default: “”)
Output folder. Created if it doesn’t exist.

outputfilename_suffix: str (default: “”)
Suffix for all output files.

Returns

None. Adds adata.uns[“dendrogram_{cluster_header}”] to the passed-in adata.

preprocessing.prep_medians(adata, cluster_header, use_mean=False, positive_genes_only=True, plot=False)

Calculating the median expression matrix. Subsetting adata if positive_genes_only = True.

Parameters

adata: AnnData
Annotated data matrix.

cluster_header: str
Column in adata.obs storing cell annotation.

use_mean: bool (default: False)
Whether to use the mean (vs median) for minimum gene expression threshold.

positive_genes_only: bool (default: True)
Whether to subset AnnData to only have genes with median/mean expression greater than 0.

Returns

adata: AnnData: AnnData with median expression values stored in adata.varm[“medians_{cluster_header}”].

preprocessing.get_medians(adata, cluster_header, use_mean=False)

Calculating the median (mean) expression per gene for each cluster_header.

Parameters

adata: AnnData
Annotated data matrix.

cluster_header: str
Column in adata.obs storing cell annotation.

use_mean: bool (default: False)
Whether to use the mean (vs median) for minimum gene expression threshold.

Returns

cluster_medians: pd.DataFrame: Gene-by-cluster median (mean) expression dataframe.
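
The per-cluster summary described above can be illustrated with a small, dependency-free sketch. It is not the library's implementation: the helper name cluster_summary, the dict-based inputs, and the toy values are hypothetical, and the "positive in at least one cluster" rule used to mimic positive_genes_only is an assumption about how that option is applied.

```python
from collections import defaultdict
from statistics import mean, median


def cluster_summary(expression, labels, use_mean=False):
    """Return {gene: {cluster: median (or mean) expression}}.

    expression: {gene: [one value per cell]} (hypothetical input layout)
    labels:     [one cluster label per cell], same order as the values
    """
    summarize = mean if use_mean else median
    result = {}
    for gene, values in expression.items():
        # Group this gene's per-cell values by cluster label.
        by_cluster = defaultdict(list)
        for label, value in zip(labels, values):
            by_cluster[label].append(value)
        result[gene] = {c: summarize(v) for c, v in by_cluster.items()}
    return result


# Toy data: six cells in two clusters, two genes.
labels = ["A", "A", "A", "B", "B", "B"]
expression = {
    "gene1": [5.0, 7.0, 6.0, 0.0, 0.0, 1.0],
    "gene2": [0.0, 0.0, 3.0, 0.0, 0.0, 0.0],
}

medians = cluster_summary(expression, labels)
means = cluster_summary(expression, labels, use_mean=True)

# Assumed reading of positive_genes_only: keep a gene if its summary
# value is greater than 0 in at least one cluster.
positive_genes = [g for g, m in medians.items() if any(v > 0 for v in m.values())]
```

With these toy values, gene2 has a median of 0 in both clusters and would be dropped under the median, while its mean in cluster A is 1.0, so it would be kept with use_mean=True.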

preprocessing.prep_binary_scores(adata, cluster_header, medians_header='medians_')

Calculating the binary scores of each gene per cluster_header.

Parameters

adata: AnnData
Annotated data matrix.

cluster_header: str
Column in adata.obs storing cell annotation.

medians_header: str (default: “medians_{cluster_header}”)
Key in adata.varm storing median expression matrix.

Returns

adata: AnnData: AnnData with binary scores stored in adata.varm[“binary_scores_{cluster_header}”].
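
This section does not state how a binary score is computed. The sketch below assumes the definition given in the NS-Forest publications: for a gene and a target cluster, average over the other clusters the quantity 1 - (other median / target median), with negative terms clipped to 0. The function name binary_score, the zero-median handling, and the toy medians are hypothetical; the library's own code may differ in detail.

```python
def binary_score(medians_by_cluster, target):
    """Binary score of one gene for one target cluster (assumed definition).

    medians_by_cluster: {cluster: median expression of the gene}
    """
    target_median = medians_by_cluster[target]
    others = [m for c, m in medians_by_cluster.items() if c != target]
    # Assumption: a gene not expressed in the target cluster scores 0.
    if target_median <= 0 or not others:
        return 0.0
    # Each other cluster contributes 1 when the gene is silent there and
    # 0 when it is expressed at least as highly as in the target.
    terms = [max(0.0, 1.0 - m / target_median) for m in others]
    return sum(terms) / len(others)


# Toy medians for a single gene across three clusters.
gene_medians = {"A": 4.0, "B": 1.0, "C": 0.0}
scores = {cluster: binary_score(gene_medians, cluster) for cluster in gene_medians}
```

Under this definition a score near 1 means the gene is expressed in the target cluster and close to absent everywhere else.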

preprocessing.plot_varm(adata, varm_key, nonzero=False, scale=None, figsize=(6, 4), show=True, save=False, output_folder='')

Plotting a histogram of the values stored in adata.varm[varm_key], for example median expression or binary scores per gene per cluster.

Parameters

adata: AnnData
Annotated data matrix.

varm_key: str
Key in adata.varm storing calculated medians or binary scores.

nonzero: bool
Whether to remove zeros from histogram.

scale: str
How to scale the y-axis.

figsize: tuple
Width and height of plot.

show: bool
Whether to show the plot.

save: bool | str (default: False)
Whether to save plot. If string, choose the type of file to save as (“png”, “svg”, “pdf”).

output_folder: str (default: “”)
Output folder for output files.

Returns

fig: matplotlib.pyplot.figure: Histogram of adata.varm[varm_key]

preprocessing.spaceTx_genefilter(adata, lower_percentile=0.1, upper_percentile=0.99, min_txLength=700, species='human', species_dict=None, gencode_folder='gencode_annotation')

Filtering genes for spatial gene probe panel design.

Parameters

adata: AnnData
Annotated data matrix.

lower_percentile: float (default: 0.1)
Lower percentile cutoff for filtering genes by nonzero median expression.

upper_percentile: float (default: 0.99)
Upper percentile cutoff for filtering genes by nonzero median expression.

min_txLength: int (default: 700)
Minimum transcript length.

species: [“human”, “mouse”, “other”] (default: “human”)
Species relating to gencode_annotation.

Returns

adata: AnnData: Subset AnnData based on lower_percentile, upper_percentile, min_txLength.
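
As a rough illustration of the kind of filtering these parameters describe, the sketch below keeps genes whose expression value lies between two cutoffs computed over the nonzero values and whose transcript is long enough. Several details are assumptions, not taken from the library: the cutoffs are treated as fractions (0.1 and 0.99) with linear interpolation, each gene is represented by a single expression value, comparisons are inclusive, and the names quantile, filter_genes, expr, and tx_length are hypothetical.

```python
import math


def quantile(sorted_values, q):
    """Linearly interpolated quantile of a sorted list, with 0 <= q <= 1."""
    pos = q * (len(sorted_values) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] + frac * (sorted_values[hi] - sorted_values[lo])


def filter_genes(expr, tx_length, lower=0.1, upper=0.99, min_tx_length=700):
    """Return genes passing both the expression and the length filter."""
    # Cutoffs are computed over nonzero expression values only.
    nonzero = sorted(v for v in expr.values() if v > 0)
    low_cut = quantile(nonzero, lower)
    high_cut = quantile(nonzero, upper)
    return [
        gene
        for gene, value in expr.items()
        if low_cut <= value <= high_cut and tx_length[gene] >= min_tx_length
    ]


# Toy per-gene expression values and transcript lengths (made up).
expr = {"g1": 0.0, "g2": 1.0, "g3": 2.0, "g4": 3.0, "g5": 4.0, "g6": 100.0}
tx_length = {"g1": 900, "g2": 900, "g3": 900, "g4": 500, "g5": 1200, "g6": 2000}

kept = filter_genes(expr, tx_length)
```

Here g1 and g2 fall below the lower cutoff, g6 exceeds the upper cutoff, and g4 is removed for its short transcript, leaving g3 and g5.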