Evaluating

evaluating.DecisionTree(adata, cluster_header, markers_dict, *, medians_header='medians_', beta=0.5, combinations=False, use_mean=False, save=False, save_supplementary=False, output_folder='', outputfilename_prefix='')

Calculates sklearn.metrics’s fbeta_score, precision_score, recall_score, and confusion_matrix for each cluster’s marker genes (genes_eval) in markers_dict.
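The per-cluster metrics can be sketched in plain Python. This is a minimal illustration, not NS-Forest’s implementation. It assumes each cluster is scored as a binary problem (cells in the target cluster vs. all other cells), and that a cell is predicted positive only when every marker gene meets a per-gene expression threshold. The helper names (predict_on_target, binary_scores), the toy data, and the thresholds are hypothetical. The formulas follow the definitions used by sklearn.metrics’s precision_score, recall_score, fbeta_score, and confusion_matrix for the positive class.

```python
def predict_on_target(cells, markers, thresholds):
    """Hypothetical rule: a cell is called positive for the target cluster
    only if every marker gene meets its expression threshold."""
    return [all(cell[g] >= thresholds[g] for g in markers) for cell in cells]


def binary_scores(y_true, y_pred, beta=0.5):
    """Confusion-matrix counts, precision, recall and F-beta for boolean labels.

    Follows the sklearn.metrics definitions for the positive class. A zero
    denominator yields 0.0, as sklearn does by default (without its warning).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(t and p for t, p in pairs)
    fp = sum((not t) and p for t, p in pairs)
    fn = sum(t and (not p) for t, p in pairs)
    tn = sum((not t) and (not p) for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    b2 = beta ** 2
    denom = b2 * precision + recall
    fbeta = (1 + b2) * precision * recall / denom if denom else 0.0
    return {"tn": tn, "fp": fp, "fn": fn, "tp": tp,
            "precision": precision, "recall": recall, "fbeta": fbeta}


# Toy data: expression of two markers in four cells (values are made up).
cells = [{"A": 5, "B": 3}, {"A": 4, "B": 0}, {"A": 0, "B": 2}, {"A": 6, "B": 0}]
in_target = [True, True, False, False]  # ground truth: first two cells are in the cluster
thresholds = {"A": 3, "B": 1}           # hypothetical per-gene thresholds

y_pred = predict_on_target(cells, ["A", "B"], thresholds)  # [True, False, False, False]
scores = binary_scores(in_target, y_pred, beta=0.5)
```

Here precision is 1.0 and recall is 0.5, so F0.5 ≈ 0.833 while F1 would be ≈ 0.667: a beta below 1 rewards precision over recall.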

Parameters

adata: AnnData
Annotated data matrix.

cluster_header: str
Column in adata.obs storing cell annotation.

markers_dict: dict
Dictionary mapping each cluster name in cluster_header to its list of marker genes ({clusterName: [markers]}).

medians_header: str (default: “medians_{cluster_header}”)
Key in adata.varm storing median expression matrix.

beta: float (default: 0.5)
beta parameter in sklearn.metrics’s fbeta_score.

combinations: bool (default: False)
Whether to find the combination of genes_eval with the highest fbeta_score.

use_mean: bool (default: False)
Whether to use the mean (vs. median) for the minimum gene expression threshold.

save: bool (default: False)
Whether to save df_results as CSV and pickle (.pkl) files in output_folder.

save_supplementary: bool (default: False)
Whether to save additional supplementary CSV files.

output_folder: str (default: “”)
Output folder. Created if it does not exist.

outputfilename_prefix: str (default: “”)
Prefix for all output files.
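For reference, the F-beta score controlled by the beta parameter is defined (as in sklearn.metrics’s fbeta_score) from precision P and recall R as follows. With the default beta = 0.5, precision is weighted more heavily than recall; beta = 1 gives the F1 score.

```latex
F_\beta = \frac{(1 + \beta^2)\, P \, R}{\beta^2 P + R},
\qquad P = \frac{TP}{TP + FP},
\qquad R = \frac{TP}{TP + FN}
```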

Returns

df_results: pd.DataFrame
NS-Forest results. Includes classification metrics (f_score, precision, recall, onTarget).

evaluating.add_fraction(adata, df_results, cluster_header, medians_header='medians_', use_mean=False, save_supplementary=False, output_folder='', outputfilename_prefix='')

Calculates sklearn.metrics’s fbeta_score, precision_score, and confusion_matrix for each genes_eval combination, and returns the set of genes and scores with the highest score sum.
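The combination search described above can be sketched as an exhaustive search with itertools.combinations. This is a minimal sketch, not NS-Forest’s code: it assumes “score sum” means the sum of the metric values computed for one gene subset, and it takes the scoring function as a parameter. The names best_combination and toy_table are hypothetical, and the scores in toy_table are made up.

```python
from itertools import combinations


def best_combination(genes, score_fn):
    """Score every non-empty subset of `genes` and return the subset whose
    metric values have the largest sum, together with those metric values."""
    best_genes, best_scores, best_total = None, None, float("-inf")
    for k in range(1, len(genes) + 1):
        for combo in combinations(genes, k):
            scores = score_fn(combo)      # e.g. {"fbeta": ..., "precision": ...}
            total = sum(scores.values())
            if total > best_total:        # strict '>' keeps the first subset on ties
                best_genes, best_scores, best_total = list(combo), scores, total
    return best_genes, best_scores


# Hypothetical precomputed metrics for each subset of two marker genes.
toy_table = {
    ("A",): {"fbeta": 0.6, "precision": 0.5},
    ("B",): {"fbeta": 0.4, "precision": 0.7},
    ("A", "B"): {"fbeta": 0.8, "precision": 0.9},
}
genes, scores = best_combination(["A", "B"], lambda combo: toy_table[combo])
```

An exhaustive search scores 2^n − 1 subsets for n genes, which stays cheap only because each cluster typically has a handful of candidate markers.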

Parameters

adata: AnnData
Annotated data matrix.

df_results: pd.DataFrame
NS-Forest results. Contains classification metrics (f_score, precision, recall, onTarget).

cluster_header: str
Column in adata.obs storing cell annotation.

medians_header: str (default: “medians_{cluster_header}”)
Key in adata.varm storing median expression matrix.

use_mean: bool (default: False)
Whether to use the mean (vs. median) for the minimum gene expression threshold.

save_supplementary: bool (default: False)
Whether to save additional supplementary CSV files.

output_folder: str (default: “”)
Output folder.

outputfilename_prefix: str (default: “”)
Prefix for all output files.

Returns

df_results: pd.DataFrame
NS-Forest results. Contains classification metrics (f_score, precision, recall, onTarget).