Preprocessing
- preprocessing.dendrogram(adata, cluster_header, *, plot=False, save=False, figsize=(12, 2), output_folder='', outputfilename_suffix='', **kwargs)
Generating a dendrogram from the AnnData object.
Parameters
- adata: AnnData
Annotated data matrix.
- cluster_header: str
Column in adata.obs storing cell annotation. Passed into scanpy’s dendrogram as groupby.
- plot: bool (default: False)
Whether to use sc.pl.dendrogram instead of sc.tl.dendrogram.
- save: bool | str (default: False)
Whether to save the plot in output_folder. If a string, it sets the file type to save as (‘png’ (default), ‘svg’, ‘pdf’).
- figsize: tuple (default: (12, 2))
figure.figsize for plt.rc_context.
- output_folder: str (default: “”)
Output folder. Created if it doesn’t exist.
- outputfilename_suffix: str (default: “”)
Suffix for all output files.
- kwargs: dictionary (default: None)
Additional parameters to pass to sc.tl.dendrogram or sc.pl.dendrogram.
Returns
Does not return anything. Adds adata.uns[“dendrogram_{cluster_header}”] to the passed-in adata.
- preprocessing.prep_medians(adata, cluster_header, use_mean=False, positive_genes_only=True)
Calculating the median expression matrix. Subsetting adata if positive_genes_only = True.
Parameters
- adata: AnnData
Annotated data matrix.
- cluster_header: str
Column in adata.obs storing cell annotation.
- use_mean: bool (default: False)
Whether to use the mean (vs median) for minimum gene expression threshold.
- positive_genes_only: bool (default: True)
Whether to subset AnnData to only have genes with median/mean expression greater than 0.
Returns
- adata: AnnData
AnnData with median expression values stored in adata.varm[“medians_{cluster_header}”].
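The effect of positive_genes_only can be illustrated on a small gene-by-cluster table. This is a minimal sketch in plain pandas, not the library's implementation: the table `medians` and the gene and cluster names are made up, and it assumes that "expression greater than 0" means greater than 0 in at least one cluster.

```python
import pandas as pd

# Hypothetical gene-by-cluster medians (rows: genes, columns: clusters).
medians = pd.DataFrame(
    {"c1": [2.0, 0.0, 1.0], "c2": [0.0, 0.0, 7.0]},
    index=["geneA", "geneZ", "geneB"],
)

# Assumption: a gene is kept if its median is > 0 in at least one cluster.
keep = (medians > 0).any(axis=1)
positive = medians.loc[keep]

print(positive.index.tolist())
```

Here geneZ has a median of 0 in every cluster, so it is the only gene dropped.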
- preprocessing.get_medians(adata, cluster_header, use_mean=False)
Calculating the median (mean) expression per gene for each cluster_header.
Parameters
- adata: AnnData
Annotated data matrix.
- cluster_header: str
Column in adata.obs storing cell annotation.
- use_mean: bool (default: False)
Whether to use the mean (vs median) for minimum gene expression threshold.
Returns
- cluster_medians: pd.DataFrame
Gene-by-cluster median (mean) expression dataframe.
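A gene-by-cluster median (mean) table of the kind described above can be sketched with a pandas groupby. This is only an illustration under stated assumptions: `cluster_summary`, `expr`, and `labels` are hypothetical names, with `expr` standing in for a dense cells-by-genes expression matrix and `labels` for the adata.obs column named by cluster_header.

```python
import pandas as pd

def cluster_summary(expr: pd.DataFrame, labels: pd.Series, use_mean: bool = False) -> pd.DataFrame:
    """Return a gene-by-cluster table of per-cluster medians (or means)."""
    grouped = expr.groupby(labels)
    summary = grouped.mean() if use_mean else grouped.median()
    return summary.T  # transpose so genes are rows and clusters are columns

# Toy data: six cells, two genes, two clusters.
expr = pd.DataFrame(
    {
        "geneA": [0.0, 2.0, 4.0, 0.0, 0.0, 0.0],
        "geneB": [1.0, 1.0, 3.0, 5.0, 7.0, 9.0],
    },
    index=[f"cell{i}" for i in range(6)],
)
labels = pd.Series(["c1", "c1", "c1", "c2", "c2", "c2"], index=expr.index, name="cluster")

medians = cluster_summary(expr, labels)
means = cluster_summary(expr, labels, use_mean=True)
print(medians)
```

For geneB in cluster c1 the median is 1.0 while the mean is 5/3, which shows how use_mean changes the summary for skewed expression values.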
- preprocessing.prep_binary_scores(adata, cluster_header, medians_header='medians_')
Calculating the binary scores of each gene per cluster_header.
Parameters
- adata: AnnData
Annotated data matrix.
- cluster_header: str
Column in adata.obs storing cell annotation.
- medians_header: str (default: “medians_{cluster_header}”)
Key in adata.varm storing median expression matrix.
Returns
- adata: AnnData
AnnData with binary scores stored in adata.varm[“binary_scores_{cluster_header}”].
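This reference does not spell out how a binary score is computed. The sketch below assumes one definition: for a gene and a target cluster, each other cluster contributes 1 minus the ratio of its median to the target median, negative contributions are set to 0, and the sum is divided by the number of other clusters. The function `binary_scores` and the toy table are hypothetical and are not taken from the library's code.

```python
import numpy as np
import pandas as pd

def binary_scores(medians: pd.DataFrame) -> pd.DataFrame:
    """Assumed binary score per gene and target cluster (see lead-in)."""
    vals = medians.to_numpy(dtype=float)
    n_clusters = vals.shape[1]
    scores = np.zeros_like(vals)
    for j in range(n_clusters):
        target = vals[:, j]
        # Avoid dividing by zero: genes not expressed in the target become NaN.
        safe = np.where(target > 0, target, np.nan)
        contrib = np.clip(1.0 - vals / safe[:, None], 0.0, None)
        contrib[:, j] = 0.0  # the target cluster does not count against itself
        total = contrib.sum(axis=1) / (n_clusters - 1)
        # Genes with zero median in the target cluster get a score of 0.
        scores[:, j] = np.nan_to_num(total, nan=0.0)
    return pd.DataFrame(scores, index=medians.index, columns=medians.columns)

# Toy gene-by-cluster medians.
medians = pd.DataFrame(
    {"c1": [2.0, 4.0], "c2": [0.0, 4.0], "c3": [1.0, 4.0]},
    index=["geneA", "geneB"],
)
scores = binary_scores(medians)
print(scores)
```

Under this assumed definition, a score near 1 means the gene is expressed in the target cluster and nearly absent elsewhere, while a gene with equal medians everywhere (geneB) scores 0 in every cluster.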