Evaluating

evaluating.DecisionTree(adata, cluster_header, markers_dict, *, medians_header='medians_', beta=0.5, combinations=False, use_mean=False, save=False, save_supplementary=False, output_folder='', outputfilename_prefix='')

Calculates sklearn.metrics’s fbeta_score, precision_score, recall_score, and confusion_matrix for the marker genes (genes_eval) of each cluster in markers_dict.
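As a rough illustration of what scoring one cluster involves, the sketch below treats a marker set as a binary classifier for a single target cluster and computes the confusion-matrix counts, precision, recall and F-beta by hand. The thresholding rule is an assumption made for illustration (a cell is called positive only when every marker reaches that gene's median expression in the target cluster); NS-Forest's own rule may differ. The function name evaluate_cluster and the toy data are hypothetical.

```python
from statistics import median


def evaluate_cluster(expr, labels, target, markers, beta=0.5):
    """Score one cluster's markers as a binary classifier (illustrative sketch).

    expr maps gene -> per-cell expression values; labels holds each cell's cluster.
    """
    # Assumed threshold: each marker's median expression within the target cluster.
    target_idx = [i for i, lab in enumerate(labels) if lab == target]
    thresholds = {g: median(expr[g][i] for i in target_idx) for g in markers}

    tp = fp = fn = tn = 0
    for i, lab in enumerate(labels):
        # Predict "in target" only if every marker reaches its threshold.
        predicted = all(expr[g][i] >= thresholds[g] for g in markers)
        actual = lab == target
        if predicted and actual:
            tp += 1
        elif predicted:
            fp += 1
        elif actual:
            fn += 1
        else:
            tn += 1

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    b2 = beta ** 2
    denom = b2 * precision + recall
    f_score = (1 + b2) * precision * recall / denom if denom else 0.0
    return {"f_score": f_score, "precision": precision, "recall": recall,
            "TP": tp, "FP": fp, "FN": fn, "TN": tn}


# Toy data: six cells, two hypothetical genes, two clusters.
expr = {"GENE_A": [5, 6, 7, 0, 1, 6], "GENE_B": [3, 4, 5, 0, 4, 0]}
labels = ["c1", "c1", "c1", "c2", "c2", "c2"]
result = evaluate_cluster(expr, labels, target="c1", markers=["GENE_A", "GENE_B"])
```

For the toy data this gives TP=2, FP=0, FN=1, TN=3, so precision is 1.0, recall is 2/3 and F0.5 is 10/11 (about 0.909). With real data, the same counts can be obtained by passing binary true/predicted labels to sklearn.metrics.confusion_matrix.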

Parameters

adata: AnnData

Annotated data matrix.

cluster_header: str

Column in adata.obs storing cell annotation.

markers_dict: dict

Dictionary mapping each cluster name (a value in adata.obs[cluster_header]) to its list of marker genes (clusterName: list of markers).
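The expected shape of markers_dict is easiest to see with a small example. The cluster and gene names below are hypothetical, and check_markers_dict is a hypothetical pre-flight helper, not part of NS-Forest. With real data, cluster_names would typically be set(adata.obs[cluster_header]) and gene_names set(adata.var_names).

```python
# Hypothetical cluster and gene names, for illustration only.
markers_dict = {
    "cluster_1": ["GENE_A", "GENE_B"],
    "cluster_2": ["GENE_C"],
}


def check_markers_dict(markers_dict, cluster_names, gene_names):
    """List problems in a markers_dict before evaluation (illustrative check)."""
    problems = []
    for cluster, genes in markers_dict.items():
        if cluster not in cluster_names:
            problems.append(f"unknown cluster: {cluster}")
        for gene in genes:
            if gene not in gene_names:
                problems.append(f"{cluster}: unknown gene {gene}")
    return problems


clusters = {"cluster_1", "cluster_2"}
genes = {"GENE_A", "GENE_B", "GENE_C"}
problems = check_markers_dict(markers_dict, clusters, genes)  # -> []
```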

medians_header: str (default: “medians_”, i.e. the key “medians_{cluster_header}”)

Key in adata.varm storing median expression matrix.
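In AnnData, adata.varm holds arrays aligned to the genes (one row per variable), so the median expression matrix is presumably genes × clusters. The sketch below computes such a table from plain Python data and forms the key name, assuming the key is “medians_” followed by cluster_header as the default suggests. The helper cluster_medians and the column name “cell_type” are hypothetical; how NS-Forest itself builds and stores the matrix is not shown here.

```python
from statistics import median


def cluster_medians(expr, labels):
    """Return {gene: {cluster: median expression}}, i.e. a genes x clusters table."""
    clusters = sorted(set(labels))
    return {
        gene: {
            c: median(v for v, lab in zip(values, labels) if lab == c)
            for c in clusters
        }
        for gene, values in expr.items()
    }


cluster_header = "cell_type"  # hypothetical adata.obs column
medians_key = "medians_" + cluster_header  # -> "medians_cell_type"

expr = {"G1": [1, 2, 3, 10], "G2": [0, 0, 5, 5]}
labels = ["a", "a", "b", "b"]
medians = cluster_medians(expr, labels)
```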

beta: float (default: 0.5)

The beta parameter passed to sklearn.metrics’s fbeta_score. Values below 1 weight precision more heavily than recall.
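To see why the default beta of 0.5 favours precise markers, the sketch below evaluates the standard F-beta formula, (1 + β²)·P·R / (β²·P + R), which is what sklearn.metrics.fbeta_score computes (sklearn additionally handles zero denominators through its zero_division argument). The helper fbeta and the two precision/recall pairs are hypothetical.

```python
def fbeta(precision, recall, beta=0.5):
    """F-beta score from precision and recall; returns 0.0 if both are zero."""
    b2 = beta ** 2
    denom = b2 * precision + recall
    return (1 + b2) * precision * recall / denom if denom else 0.0


# Two hypothetical marker sets with swapped precision and recall.
precise = fbeta(precision=0.9, recall=0.5)    # about 0.776
sensitive = fbeta(precision=0.5, recall=0.9)  # about 0.549
```

With beta=1 the two cases would score the same (the F1 score is symmetric in P and R); lowering beta rewards the marker set that produces fewer false positives.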

combinations: bool (default: False)

Whether to evaluate combinations of the marker genes (genes_eval) and report the combination with the highest fbeta_score.
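A combination search of this kind can be sketched as an exhaustive loop over every non-empty subset of a cluster's markers, keeping the highest-scoring one. This assumes an exhaustive search; the helper best_combination, the scoring function and the toy scores are all hypothetical stand-ins for the real per-subset F-beta evaluation.

```python
from itertools import combinations


def best_combination(markers, score):
    """Exhaustively score every non-empty subset of markers; return the best."""
    best_subset, best_score = None, float("-inf")
    for k in range(1, len(markers) + 1):
        for subset in combinations(markers, k):
            s = score(subset)
            if s > best_score:  # strict ">" keeps the first subset on ties
                best_subset, best_score = subset, s
    return best_subset, best_score


# Hypothetical F-beta scores per marker subset (unlisted subsets score 0.0).
toy_scores = {
    frozenset({"GENE_A"}): 0.60,
    frozenset({"GENE_B"}): 0.50,
    frozenset({"GENE_A", "GENE_B"}): 0.80,
    frozenset({"GENE_A", "GENE_B", "GENE_C"}): 0.70,
}
best = best_combination(
    ["GENE_A", "GENE_B", "GENE_C"],
    lambda subset: toy_scores.get(frozenset(subset), 0.0),
)
# best == (("GENE_A", "GENE_B"), 0.8)
```

An exhaustive search evaluates 2^n − 1 subsets for n markers, which stays cheap only because marker lists per cluster are short.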

use_mean: bool (default: False)

Whether to use the mean (instead of the median) as the minimum gene expression threshold.
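The choice matters most for skewed, zero-inflated expression. The sketch below contrasts the two statistics for one gene in one cluster; the helper expression_threshold and the values are hypothetical, and it assumes the threshold is taken over that cluster's per-cell expression.

```python
from statistics import mean, median


def expression_threshold(values, use_mean=False):
    """Minimum expression threshold for one gene in one cluster (sketch)."""
    return mean(values) if use_mean else median(values)


# Hypothetical, zero-inflated expression with one high outlier.
values = [0.0, 0.0, 1.0, 2.0, 12.0]
median_threshold = expression_threshold(values)               # 1.0
mean_threshold = expression_threshold(values, use_mean=True)  # 3.0
```

A single highly expressing cell pulls the mean up, which raises the bar for every cell; the median is more robust to such outliers.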

save: bool (default: False)

Whether to save df_results to output_folder as CSV and pickle (.pkl) files.

save_supplementary: bool (default: False)

Whether to save additional supplementary CSV files.

output_folder: str (default: “”)

Output folder. Created if it does not exist.

outputfilename_prefix: str (default: “”)

Prefix for all output files.
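How output_folder and outputfilename_prefix combine can be sketched with the standard library: the folder is created on demand and the prefix is prepended to each file name. The exact file names NS-Forest writes are not listed here, so the helper save_results, the "results" base name and the example row are hypothetical; with pandas, df_results.to_csv and df_results.to_pickle would do the writing.

```python
import csv
import os
import tempfile


def save_results(rows, output_folder="", outputfilename_prefix="", name="results"):
    """Write rows (a list of dicts) to <output_folder>/<prefix><name>.csv."""
    if output_folder:
        # Mirror "Created if it does not exist": make the folder on demand.
        os.makedirs(output_folder, exist_ok=True)
    path = os.path.join(output_folder, f"{outputfilename_prefix}{name}.csv")
    with open(path, "w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=list(rows[0]))
        writer.writeheader()
        writer.writerows(rows)
    return path


rows = [{"clusterName": "c1", "f_score": 0.91}]  # hypothetical result row
with tempfile.TemporaryDirectory() as tmp:
    out_dir = os.path.join(tmp, "nsforest_out")  # does not exist yet
    path = save_results(rows, out_dir, outputfilename_prefix="run1_")
    written = os.path.isfile(path)
    filename = os.path.basename(path)
```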

Returns

df_results: pd.DataFrame

NS-Forest results. Includes classification metrics (f_score, precision, recall, onTarget).

evaluating.add_fraction(adata, df_results, cluster_header, medians_header='medians_', use_mean=False, save_supplementary=False, output_folder='', outputfilename_prefix='')

Calculates sklearn.metrics’s fbeta_score, precision_score, and confusion_matrix for each genes_eval combination, and returns the set of genes and scores with the highest score sum.
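Selecting the candidate with the highest score sum reduces to a max over summed columns. Which scores enter the sum is not spelled out above, so this sketch assumes f_score plus precision; the helper highest_score_sum and the candidate rows are hypothetical.

```python
def highest_score_sum(candidates, score_keys=("f_score", "precision")):
    """Pick the candidate whose listed scores sum highest (sketch)."""
    return max(candidates, key=lambda row: sum(row[k] for k in score_keys))


candidates = [
    {"genes": ("GENE_A",), "f_score": 0.70, "precision": 0.80},
    {"genes": ("GENE_A", "GENE_B"), "f_score": 0.75, "precision": 0.90},
    {"genes": ("GENE_B",), "f_score": 0.60, "precision": 0.95},
]
best = highest_score_sum(candidates)  # the ("GENE_A", "GENE_B") row, sum 1.65
```

Summing rewards candidates that are good on both scores at once; note that the highest single precision (0.95) does not win here.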

Parameters

adata: AnnData

Annotated data matrix.

df_results: pd.DataFrame

NS-Forest results. Contains classification metrics (f_score, precision, recall, onTarget).

cluster_header: str

Column in adata.obs storing cell annotation.

medians_header: str (default: “medians_”, i.e. the key “medians_{cluster_header}”)

Key in adata.varm storing median expression matrix.

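use_mean: bool (default: False)

Whether to use the mean (instead of the median) as the minimum gene expression threshold.

save_supplementary: bool (default: False)

Whether to save additional supplementary CSV files.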

output_folder: str (default: “”)

Output folder.

outputfilename_prefix: str (default: “”)

Prefix for all output files.

Returns

df_results: pd.DataFrame

NS-Forest results. Contains classification metrics (f_score, precision, recall, onTarget).